Hello, World

October 10, 2023 · 15 min read

Co-founder of Debox

This is Debox.

We'll soon get to the part where we explain what Debox is, but first imagine the following:

You are able to securely access your files on all your devices without having to rely on any centralized cloud storage provider. You can securely stash your digital assets in a dependable decentralized network that would enable you to easily retrieve them at any time from any device. You and only you have the keys to your data; you can share it whenever and with whomever. Finally, this is all done in a streamlined, ergonomic way, avoiding all the notorious complexities of interfacing with decentralized networks.

In a nutshell, that is our vision for Debox. That is also the starting point to understanding Debox's value proposition.

In this introductory article, we share a brief overview of the current larger decentralized storage landscape and, ultimately, how we believe Debox fits into it.

But first, let's give some background context.

"everyone needs storage"

Data storage has become an inseparable part of life in our modern world. We all rely on it and use it in at least some capacity on a daily basis. The ongoing proliferation of storage devices and cloud services shows no sign of slowing down.

Let's look at the numbers. According to market research, the global cloud storage market was valued at around USD 90 billion in 2022 (surpassing Statista's forecast from 2021) and will be around USD 100 billion in 2023. It is projected to reach between USD 300 billion and USD 500 billion by 2030, exhibiting a CAGR (Compound Annual Growth Rate) between 17.4% and 23.4%.
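As a rough sanity check, the growth rate implied by two endpoint values can be computed directly. The sketch below is our own arithmetic over the rounded figures quoted above (USD 100 billion in 2023, USD 300 to 500 billion in 2030). The function names are ours, and the cited 17.4% to 23.4% range comes from separate market reports with their own baselines, so exact agreement should not be expected.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# Rounded figures from the text, in USD billions (2023 -> 2030 is 7 years).
low = cagr(100, 300, 7)
high = cagr(100, 500, 7)
print(f"implied CAGR: {low:.1%} to {high:.1%}")
```

With these rounded inputs, the implied rate lands at roughly 17% to 26% per year, which is in the same ballpark as the reported range.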

That's quite a substantial market.

"what's the issue with current storage solutions?"

Why fix something that's not broken?

Well, it's fundamentally an issue of trust. Who do you trust with your data? Is it a good thing that the aforementioned market is essentially dominated by a handful of corporations that do not have the best interests of the billions of users who depend on them at heart?

The consolidation and centralization of data leads to complete dependence on these third-party entities for storing and sharing data, in both personal and professional contexts. Even though these corporations boast secure data centers and close to 100% uptime, if something were to go wrong (e.g. power loss, infrastructure failure), users could potentially lose their data.

Furthermore, these corporations cannot guarantee complete data security or immunity from data leaks and breaches. They cannot be independently audited (i.e. they are all completely closed source). This is coupled with a lack of tangible guarantees regarding user privacy. Nothing stops these platforms from having direct access to client data and metadata, regardless of what they promise in their terms and conditions.

In many countries we see the proliferation of government surveillance and censorship activities. Globally, we've seen ISPs restrict access to various services that don't comply with regional government policies, potentially affecting file storage/sharing access.

In short, regardless of one's position on data privacy, it is impossible to ignore these issues.

"what is the alternative?"

Decentralized systems seem promising as a fundamentally new paradigm and a potential solution to some of the world's systemic issues (including trust, consensus, privacy, security, permanence, etc.). Nonetheless, a more thorough evaluation of their capabilities often demonstrates that they still fall short of (perhaps initial) expectations, especially in direct comparison to current Web2 solutions.

Let's start with a simple down-to-earth example that most people can relate to:

"I have a bunch of files. I want to store (and easily retrieve) them without relying on a 3rd-party company (i.e. cloud storage provider). I want a similar experience to using something like Dropbox or Google Drive."

At the moment, this is not possible. There are no out-of-the-box, consumer-oriented, completely decentralized alternatives that mirror the experience (and robustness) of platforms such as Dropbox or Google Drive.

There is, however, much development happening in the problem space of decentralized storage, bringing the initially articulated vision closer to reality.

So what does the current decentralized storage landscape look like?

All projects in this space can be roughly divided into two main groups:

group 1: projects that are independently tackling this problem, while being supported by small communities with limited developers and resources
group 2: projects that have emerged around the evolving IPFS/Filecoin ecosystem

Even though projects from the first group present a range of novel approaches to tackling the problem of decentralized storage, the latter group has arguably demonstrated the most potential in actually realizing a sustainable approach. It has fostered a community in which most of the development around decentralized storage has occurred.

Projects from the first group include:

This is by no means an exhaustive list of all the projects that are out there; however, it is more or less representative of the general non-IPFS/Filecoin decentralized storage landscape.

Before diving into the second group, we want to emphasize a simple, yet potentially overlooked concept when it comes to conceptualizing and evaluating solutions to problems in the decentralization problem space.

"think in terms of an ecosystem"

The complexity & dynamics of decentralized systems necessitate shifting from a mental model of isolated Web2 processes to one that conceptualizes and assumes the participation of multiple independent actors.

Instead of thinking in terms of user-facing self-contained services (usually provided by a single entity), think in terms of an ecosystem of independently functioning networks. Each network is responsible for solving one aspect of the problem, and all of them are inextricably linked together via evolving crypto-economic incentives, altogether yielding the actual solution to the problem. The solution itself can be conceptualized as the convergence of the dynamics of the ecosystem.

So whereas the functionality of storing, sharing, and retrieving data is provided by a single entity via a simple interface in Web2, the same functionality in Web3 can be modeled as an ecosystem of networks (made up of many thousands of entities), where multiple networks address various sub-problems of providing storage, such as:

receiving and encrypting user data
determining where the data is allocated (with appropriate redundancy)
ensuring that the data is always stored in a provable manner and accessible for retrieval
expediently retrieving the data to the authorized user
providing quick access to publicly available data
ensuring that all network participants are aptly rewarded for their resource contributions
and many others...

This is exactly what IPFS/Filecoin has done.

To get a better understanding of how this ecosystem actually works, we must understand the core protocol that lies at the heart of the ecosystem: IPFS.

"so, what is IPFS?"

We won't go into too much detail here. We're just trying to give an overview of the landscape of the IPFS/Filecoin ecosystem. (Stay tuned for future, more in-depth articles on IPFS and other parts of the ecosystem.)

"IPFS is a modular suite of protocols for organizing and transferring data, designed from the ground up with the principles of content addressing and peer-to-peer networking." (Refer to the official docs for a great overview of IPFS and all related concepts. )

Let's emphasize what IPFS actually does (and does not do):

IPFS provides a network for transferring data between any of its nodes
IPFS addresses all data (based on its content, not location)
IPFS does NOT provide content encryption
IPFS is NOT itself a storage provider, i.e. IPFS does not offer offline data persistence
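To make the idea of content addressing more concrete, here is a minimal sketch in Python. It is a deliberate simplification: real IPFS content identifiers (CIDs) use multihash and multibase encodings and split large files into blocks, none of which is modeled here. The `ContentStore` class and its method names are hypothetical and exist only for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes themselves."""

    def __init__(self) -> None:
        self._blocks = {}

    @staticmethod
    def address_of(data: bytes) -> str:
        # Simplified stand-in for a CID: the hex SHA-256 digest of the content.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        address = self.address_of(data)
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # The requester can verify that the bytes match the requested address.
        if self.address_of(data) != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr_a = store.put(b"hello, world")
addr_b = store.put(b"hello, world")   # same content -> same address
addr_c = store.put(b"hello, world!")  # different content -> different address
```

Because the address depends only on the content, it does not matter which node serves the data, and the data is stored here in plain form, which mirrors the point that encryption is not part of the addressing scheme.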

To elaborate on the last point: if an IPFS node goes offline, the data that was being shared by it via the IPFS network is no longer accessible. There is no automatic network-initiated replication of your data across the decentralized "ether".

(On a side note: if you wanted to set up an IPFS node that would continually be online, you could make your own IPFS "cloud". But that is just another version of a self-hosted cloud storage solution. And there are many good open source solutions for that, especially when considering that IPFS lacks out-of-the-box content encryption & mounting functionality.)

Other IPFS nodes can "pin" the data that I am sharing, which would increase the redundancy of the data and help ensure that it is accessible even when my node is not. However, that raises the question: who is going to "pin" my data, and what would their incentive be? Obviously, some form of monetary compensation, if that can be arranged.
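The availability argument can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Node` class and `is_retrievable` function are hypothetical, and real IPFS discovery and transfer (DHT lookups, Bitswap) are not modeled.

```python
class Node:
    """Toy IPFS-like node: holds a set of content addresses and is online or offline."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.pinned = set()

    def pin(self, address: str) -> None:
        self.pinned.add(address)


def is_retrievable(address: str, nodes: list) -> bool:
    """Content is reachable only if at least one online node holds it."""
    return any(node.online and address in node.pinned for node in nodes)


mine = Node("my-laptop")
other = Node("pinning-provider")
network = [mine, other]

mine.pin("example-address")
mine.online = False  # my node goes offline
before_pinning = is_retrievable("example-address", network)

other.pin("example-address")  # someone else agrees to keep a copy
after_pinning = is_retrievable("example-address", network)
```

In this model the data becomes unreachable the moment the only node holding it goes offline, and reachable again only once another online node has pinned it. Nothing in the model replicates data automatically.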

And yes, that is not only possible, it in fact has already been done. Here's a list of featured providers:

Each of these providers manages a cluster (or several clusters) of IPFS nodes that are used to "host" client data via "pinning".

But wasn't this supposed to be a decentralized storage ecosystem? This looks no different from traditional cloud providers, while being more expensive and lacking the robustness that traditional cloud providers guarantee. Also, if any of these providers decides to close shop (such precedents have been set by Textile and Estuary), it's on you to adapt to whatever situation that puts you in. Given the dynamics of nascent markets & evolving ecosystems, such developments are expected. Regardless, pinning services promote the proliferation of the underlying technologies, which is crucial for the expansion of the ecosystem's infrastructure. We are by no means discouraging the usage of these platforms; we're merely reiterating the fact that they do not actually provide truly decentralized storage, even though their infrastructure is based on IPFS.

In short, IPFS gives us a foundation for decentralized data look-up and transfer, but needs to be further complemented with functionality related to provable storage, storage redundancy, encryption, etc. to bring us closer to actual decentralized storage.

"this is where Filecoin comes in"

Filecoin is a parallel p2p network that combines the same IPFS protocol stack with blockchain consensus and crypto-economic incentives to provide dependable provable storage. (Forgoing the details in this article, we highly recommend taking a look at the official docs for more information, especially the section that introduces Filecoin basics - What is Filecoin.)

At a high level, Filecoin introduces decentralized data persistence to the ecosystem by enabling network participants to "offload" their data to other network participants that have joined the network as storage providers. The network ensures that the data is actually being stored in a provable manner and that storage providers are compensated by data owners.

Sectors, sealing, unsealing, provable storage...

The current implementation of decentralized data persistence comes with certain practical limitations. The process of persisting one's data in the network is still somewhat manual (though there are utilities that partially expedite it, such as Boost). Some parts of the process exhibit high latency. Some of these initial limitations are being solved via ecosystem extensions, such as retrieval networks (more on that in the next section).

To elaborate:

WHAT THE USER WANTS:

a simple interface to upload variable-sized files to the network
a simple way to pay for the amount of storage that is being used
low latency retrieval of data from the network

WHAT THE USER IS CURRENTLY EXPECTED TO DO:

find a storage provider & ask for price (via Filecoin+, other?)
negotiate a deal (off-chain), based on the data volume, length of storage, etc. and commit funds to the deal (while storage provider stakes collateral)
if all conditions are met, wait for the deal to be published to the Filecoin blockchain; the data is then transferred to the storage provider
if the user wants redundancy, they must repeat with another storage provider
to retrieve the data, the user must issue a request for retrieval and pay? this takes a long time
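The manual steps above can be sketched as a small simulation. This is an illustration only, not the Filecoin deal protocol: the `Provider` and `Deal` types, the prices, and the minimum deal sizes are all hypothetical, and on-chain publication, collateral, and data transfer are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    name: str
    price_per_gib_month: float  # hypothetical quoted price, arbitrary units
    min_deal_gib: float         # hypothetical minimum deal size


@dataclass
class Deal:
    provider: str
    size_gib: float
    months: int
    cost: float


def negotiate(provider: Provider, size_gib: float, months: int) -> Optional[Deal]:
    """One manual round: ask for a price, check conditions, commit funds."""
    if size_gib < provider.min_deal_gib:
        return None  # provider declines data below its minimum size
    cost = provider.price_per_gib_month * size_gib * months
    return Deal(provider.name, size_gib, months, cost)


def store_with_redundancy(providers, size_gib, months, copies):
    """Repeat the whole negotiation once per desired copy."""
    deals = []
    for provider in providers:
        if len(deals) == copies:
            break
        deal = negotiate(provider, size_gib, months)
        if deal is not None:
            deals.append(deal)
    return deals


providers = [
    Provider("sp-a", 0.002, 1.0),
    Provider("sp-b", 0.003, 0.5),
    Provider("sp-c", 0.001, 8.0),
]
deals = store_with_redundancy(providers, size_gib=2.0, months=12, copies=2)
small_file_deals = store_with_redundancy(providers, size_gib=0.1, months=12, copies=2)
```

Even in this simplified form, redundancy means one negotiation per copy, and a small upload (0.1 GiB here) finds no provider willing to take it under the assumed minimums.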

And this doesn't mention other caveats such as:

provable storage uses sectors of 32/64GB, forcing storage providers to set minimum file size thresholds (why?)
though encouraged, storage providers are not obligated to store a copy of unsealed data
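A back-of-the-envelope calculation shows how small a typical personal file is relative to a sector. The sketch treats the sector sizes quoted above as 32 GiB and 64 GiB and ignores any padding or encoding overhead; the 5 MiB photo is a hypothetical example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

SECTOR_SIZES = {"32 GiB": 32 * GIB, "64 GiB": 64 * GIB}


def sector_fill_fraction(file_size: int, sector_size: int) -> float:
    """Fraction of a sector that a single file would occupy."""
    return file_size / sector_size


def files_to_fill_sector(file_size: int, sector_size: int) -> int:
    """How many whole files of this size fit into one sector."""
    return sector_size // file_size


photo = 5 * MIB  # hypothetical personal file
for label, size in SECTOR_SIZES.items():
    fraction = sector_fill_fraction(photo, size)
    count = files_to_fill_sector(photo, size)
    print(f"{label}: one photo fills {fraction:.4%}, {count} photos fit in a sector")
```

Under these assumptions, a single photo occupies a tiny fraction of a percent of a sector, which gives a sense of why providers tend to prefer large deals.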

The economics of storage essentially prohibit average users from becoming storage providers due to the high upfront infrastructure costs of fulfilling the minimum requirements. [Link?] The average person will not make enough money to even cover costs. storage.market? Calculation? Storage provider onboarding is non-trivial. [Link?] There is more committed provider capacity (12 exabytes) than current demand.

From a practical perspective, Filecoin functions as a decentralized archive. Even with all the initiatives, it is still not what people understand and anticipate as decentralized storage. The evolving ecosystem is trying to further complement Filecoin's functionality and extend its use case as the backbone of the ecosystem.

Initiatives that address these issues, by adding more to the ecosystem...

storage on-ramps (boost, etc.)
FVM ([link?], perpetual storage, replication)
retrieval markets...

This brings us to one of the main frontiers and, possibly, one of the greatest current challenges in decentralized storage development: retrieval. Specifically, low latency retrieval of data that is stored in a provable manner.

"retrieval is... complicated"

Specifically, retrieval is complicated within a complex ecosystem of decentralized networks working together to provide decentralized storage to a wide array of end users, whose data needs range from a few megabytes to multiple petabytes.

A fundamental part of a usable decentralized storage solution is completely dependable low latency data retrieval. Users obviously expect to be guaranteed continual access to their data for quick retrieval.

Currently, once your data is in Filecoin, you must issue a request for it to be returned. Since storage providers are not obligated (even though they are encouraged) to store a duplicate unsealed copy of your data for faster retrieval, there is no guarantee that you will receive your data immediately upon request. As noted in the previous section, retrieval is a manual process (i.e. a retrieval deal), mirroring the initial upload phase (i.e. a storage deal). [Link?] One can simplify the process with [Boost], but it remains non-trivial. Furthermore, data in Filecoin is stored in large (32GB?) sectors. It takes time for the storage provider to unseal them and return the unsealed data to the user. [Link?]

Again, conceptualizing the overall solution in terms of the interplay between various decentralized layers within an ecosystem, Filecoin has solved the decentralized storage problem up to the point of retrieval. Retrieval is possible, but only for limited use cases where latency is not an issue and users don't mind paying separately for retrieval (e.g. retrieving archival data).

If we conceptualize retrieval as a further extension of the ecosystem, functioning independently of Filecoin while complementing its functionality, we require additional networks that cater to various categories of retrieval. This subset of the ecosystem is known as the retrieval market. (Refer here for a more in-depth overview of the various projects that are part of the retrieval market initiative within the IPFS/Filecoin ecosystem.)

Economics of retrieval, coupled with the economics of Storage. Low prices...

As mentioned in the retrieval.markets roadmap, we should see integration with the FVM in the near future, which would allow L2 networks to streamline retrieval from Filecoin.

Currently most of the work in the retrieval market is focused on bringing decentralized CDNs to the IPFS/Filecoin ecosystem with Saturn as the initial proof of concept. Even though Saturn's crypto-economics are still being finalized and formalized, we can already get a sense of the incentive dynamics [Link?].

CDNs are just one type of retrieval, optimized for small, publicly available data that is in high demand, such as websites and media.

Roughly speaking, we can divide retrieval into the following categories:

CDNs for quick retrieval of websites, media, and other high-demand publicly available data
large volumes in long-term storage, both public & private, such as scientific data sets and archives
low-demand, variable-volume data in the range of a few MBs to a few GBs, such as personal files, photos, and documents
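As a rough illustration, the three categories can be expressed as a simple classification rule. This is a sketch under our own assumptions: the `categorize` function, the enum names, and the 100 GiB cut-off between personal and large-volume data are all hypothetical and not part of any retrieval market specification.

```python
from enum import Enum

GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical cut-off between "personal" and "large archival" volumes.
LARGE_VOLUME_BYTES = 100 * GIB


class RetrievalCategory(Enum):
    CDN = "high-demand public data"
    ARCHIVE = "large volumes in long-term storage"
    PERSONAL = "low-demand personal data"


def categorize(size_bytes: int, is_public: bool, high_demand: bool) -> RetrievalCategory:
    """Map a piece of data to one of the three rough retrieval categories."""
    if is_public and high_demand:
        return RetrievalCategory.CDN
    if size_bytes >= LARGE_VOLUME_BYTES:
        return RetrievalCategory.ARCHIVE
    return RetrievalCategory.PERSONAL


website_asset = categorize(2 * MIB, is_public=True, high_demand=True)
science_dataset = categorize(5 * 1024 * GIB, is_public=True, high_demand=False)
family_photos = categorize(3 * GIB, is_public=False, high_demand=False)
```

Each category stresses a different property (latency, volume, or privacy), which is why a single retrieval network is unlikely to serve all three equally well.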

This last category is addressed by a handful of projects that offer a centralized "wrapper" over IPFS/Filecoin. Examples:

They offer a decent experience; however, they are not truly decentralized. They are a stepping stone in the right direction.

We anticipate more development in this last area...

"overall, a promising future"

As we have seen up to this point, in spite of the remaining pain points, the IPFS/Filecoin ecosystem is headed in a promising direction. As part of our contribution to the community, we will be covering many of the already mentioned aspects as well as aspects that we have not presented in this overview. Each part deserves an entire article (or even a series) to do justice to all the nuances.

To conclude, we recommend a thorough look at the following compilation to get a sense for all the groups and projects that are part of the IPFS/Filecoin ecosystem:

"so, what about Debox?"

Remember the vision that we started with?

That is ultimately what we are trying to realize within the IPFS/Filecoin ecosystem.

We want to be the Dropbox of this ecosystem. We are working on providing a solution that enables the average user to seamlessly transition from Web2 cloud storage providers to Web3 decentralized storage, including integration with the user's OS and integration with import.

We are planning on launching our own (L2) complementary decentralized storage & retrieval network for data volumes geared towards the average user.

We are looking to provide value to the IPFS/Filecoin ecosystem and community, since we are building on top of these technologies.

Currently, we offer a prototype CLI as our way of starting this conversation with the community. We invite feedback. We want to be transparent. We want to grow.

http://debox.network
X.com
discord
github