What's IPFS and why should you care?

IPFS is a decentralized protocol that we use to store data.

In “explain like I’m 5” terms: it’s a tool that anyone can use to share data between many computers.

Here’s how it works:

You install the IPFS software on many computers and make sure they know about each other. The software running on these machines forms an IPFS network.

This IPFS network can take any shape:

Maybe there is only one computer: it’s a rather simple network. Maybe there are a million computers.
Maybe they are all connected to each other: it’s a very dense network. Maybe each one only knows about ONE or TWO other peers: it’s a rather “skeletal” network.

No matter the network size and shape, IPFS makes a few promises. The most basic one is the following:

Any computer on the network can upload files to it and download files from the whole network. The same goes for a client that you start and connect to one of the network’s machines: it has access to all the files on every machine of the network.

In the end, IPFS turns a network of computers into a single HUGE hard drive that can store and retrieve files.

That’s a distributed storage network. The storage is distributed amongst many machines and it acts as one big entity.

We rely on a network of computers

Now that we have taken the time to set up this network of computers so that it acts as a single storage device, what have we gained? Was it worth the effort?

IPFS makes more promises:

First of all, as long as there is a path between two members of the network, they can exchange data and files. So if I’m on one side of the network, I can safely assume that I’ll be able to get files from the other side of the network. That means the network acts as a single storage system.
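The first promise can be pictured as a reachability question on a graph of peers. Below is a minimal Python sketch of that idea, not of how IPFS actually locates content (real nodes rely on a distributed hash table and the Bitswap protocol rather than a hop-by-hop search). The peer names, the `PEERS` and `STORED` maps, and the `find_route` helper are hypothetical, made up for illustration.

```python
from collections import deque

# Hypothetical "skeletal" network: each peer only knows one or two others.
PEERS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],  # isolated peer: no path to anyone
}

# Hypothetical placement of a file on a single peer (toy data).
STORED = {"D": {"holiday.jpg": b"...jpeg bytes..."}}


def find_route(peers, start, name, stored):
    """Breadth-first search for the nearest peer that stores `name`.

    Returns the list of hops from `start` to that peer, or None when
    no reachable peer has the file.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        peer = route[-1]
        if name in stored.get(peer, {}):
            return route
        for neighbor in peers[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(route + [neighbor])
    return None


print(find_route(PEERS, "A", "holiday.jpg", STORED))  # ['A', 'B', 'C', 'D']
print(find_route(PEERS, "E", "holiday.jpg", STORED))  # None
```

Peer A only knows B, yet it still reaches the file three hops away; peer E, which has no path to anyone, gets nothing.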

Second, if any machine on the network tries to tamper with the data, the software on the other end, and eventually the user, will know about it. Data is requested by an identifier derived from its content (a cryptographic hash), so when a client retrieves a piece of data, like a file, it can easily verify that the content is actually what it asked for. Tampering can’t go unnoticed.
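To make the verification step concrete, here is a minimal Python sketch that assumes the identifier is simply the SHA-256 digest of the bytes. Real IPFS content identifiers (CIDs) also carry a version, a codec and a multihash prefix, and large files are split into chunks; the `content_id` and `verify` helpers are hypothetical simplifications.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified identifier: the hex SHA-256 digest of the bytes.

    Real CIDs wrap the hash with extra metadata; this sketch skips that.
    """
    return hashlib.sha256(data).hexdigest()


def verify(requested_id: str, received: bytes) -> bool:
    """Re-hash whatever a peer sent and compare it with what we asked for."""
    return content_id(received) == requested_id


original = b"hello IPFS"
cid = content_id(original)

print(verify(cid, original))        # True
print(verify(cid, b"hello IPFS!"))  # False: the altered copy is detected
```

The client never has to trust the peer that answered: changing even one byte changes the hash, so the comparison fails.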

Third, the network is designed to keep working when a machine leaves. A machine can join or leave the network without any kind of ceremony: no need to register it, update a central database, or tell everyone in the network. This does mean that a file stored only on that machine can disappear if no one else has replicated it.
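The trade-off of ceremony-free departures fits in a few lines. In this hypothetical toy model each peer is just a set of content identifiers (the strings are made up), and a piece of data stays available only while at least one remaining peer holds a copy.

```python
# Hypothetical toy model: each peer holds a set of content identifiers.
network = {
    "A": {"cid-report"},
    "B": {"cid-report", "cid-photo"},
    "C": {"cid-photo"},
}


def is_available(network, cid):
    """Content is retrievable while at least one remaining peer holds it."""
    return any(cid in held for held in network.values())


def leave(network, peer):
    """A peer leaves without any ceremony: it simply vanishes from the map."""
    network.pop(peer, None)


leave(network, "B")
print(is_available(network, "cid-report"))  # True: A still has a copy
print(is_available(network, "cid-photo"))   # True: C still has a copy

leave(network, "A")
print(is_available(network, "cid-report"))  # False: no replica is left
```

Nobody had to be told that B or A left; the report only became unreachable once its last copy was gone.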

Fourth, the network is designed to be efficient in terms of storage: if you store the same data twice in the network, it will be KNOWN as a single piece of data with a single “identifier”. The network will always see a single piece of data, so it won’t double the storage or bandwidth needed to handle it.
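Deduplication falls out of content addressing: if the key is derived from the bytes, identical bytes can only ever land under one key. The `ContentStore` class below is a hypothetical minimal sketch, assuming a plain SHA-256 hex digest as the identifier; it is not the IPFS datastore API.

```python
import hashlib


class ContentStore:
    """Hypothetical content-addressed store: blocks are keyed by their hash."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so a second put stores nothing new.
        self.blocks.setdefault(cid, data)
        return cid

    def get(self, cid: str) -> bytes:
        return self.blocks[cid]


store = ContentStore()
first = store.put(b"same report")
second = store.put(b"same report")

print(first == second)    # True: one identifier
print(len(store.blocks))  # 1: stored once
```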

Fourth and a half, the network is designed to be efficient in terms of bandwidth. If a client “requests” a piece of data from the network, the machines closest to it will answer. Instead of moving bits from one side of the ocean to the other, you might download them directly from your neighbors if they have them. It happens automatically.
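As a rough illustration of “the closest machines answer”, this sketch treats round-trip time as the measure of closeness and picks the lowest-latency peer that holds the requested block. The peer names, latencies and the `closest_provider` helper are assumptions for illustration; real IPFS nodes ask many peers for blocks and do not necessarily rank them this way.

```python
# Hypothetical snapshot: round-trip time (ms) to each peer and the blocks it holds.
PEER_INFO = {
    "neighbor": {"rtt_ms": 4, "blocks": {"cid-video"}},
    "same-country": {"rtt_ms": 25, "blocks": {"cid-video", "cid-doc"}},
    "across-the-ocean": {"rtt_ms": 140, "blocks": {"cid-video", "cid-doc"}},
}


def closest_provider(peers, cid):
    """Among peers that hold `cid`, return the one with the lowest round-trip time."""
    holders = [name for name, info in peers.items() if cid in info["blocks"]]
    if not holders:
        return None
    return min(holders, key=lambda name: peers[name]["rtt_ms"])


print(closest_provider(PEER_INFO, "cid-video"))  # neighbor
print(closest_provider(PEER_INFO, "cid-doc"))    # same-country
```

The ocean-crossing peer is only used when nobody nearer has the block.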

With all these promises, we can implement decentralized distribution

I’ll ignore the “efficiency” part. In my opinion, these are incremental improvements: nice to have, but they do not fundamentally change what the network can do.

What is interesting here is that, with these promises, anyone in the network has the same “weight” as anyone else. That is the basis for decentralization.

If I were to build an application and store its data on IPFS, then as long as you have access to the network I’m using, you could access the data and replicate it. And if you were willing to pay to host the data (someone has to replicate the data at some point), you would get the same “power” as I have.

The network would even become stronger, since there would now be more machines duplicating and distributing the data.