As Graydon Hoare says:
This post is in response to Mark Boas' P2P Web Apps - Brace yourselves, everything is about to change.
The old client-server web model is out of date, it’s on its last legs, it will eventually die. We need to move on, we need to start thinking differently if we are going to create web applications that are robust, independent and fast enough for tomorrow’s generation. We need to decentralise and move to a new distributed architecture
I've come to the conclusion that something like Git is the right architecture to move forward. This post tries to explain why and how.
It's somewhat disorganized, but I want to get this out of the Drafts folder.
A Git overview
Git is extremely simple. You have:
- an object store, mapping content addresses to content objects
- a set of references, mutable pointers to certain objects
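Here's a minimal sketch of those two pieces in Python. The Repo class and its method names are made up for illustration. Real Git prepends a small header to the content before hashing, so the addresses below are not the ones Git would compute.

```python
import hashlib


class Repo:
    """Toy repository: an object store plus a set of mutable references."""

    def __init__(self):
        self.objects = {}  # content address -> content (bytes)
        self.refs = {}     # reference name -> content address

    def put(self, content: bytes) -> str:
        # The address is derived from the content itself, so storing
        # the same bytes twice yields the same address.
        address = hashlib.sha1(content).hexdigest()
        self.objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        return self.objects[address]


repo = Repo()
first = repo.put(b"hello")
repo.refs["main"] = first    # a reference points at an object...
second = repo.put(b"hello, world")
repo.refs["main"] = second   # ...and can be moved to another one later
```

Objects never change once stored; only the references move.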
Git uses trees to create filesystem-like hierarchical structures.
By knowing the root of the tree, we know the whole tree, because each tree object records the content addresses of its children. (The connection to persistent data structures is interesting.)
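To make that concrete, here's a rough sketch that computes the address of a nested directory. The address_of function and the "blob:"/"tree:" encoding are my own simplification, not Git's actual on-disk format. The point is only that a tree's address is a hash over its children's addresses.

```python
import hashlib


def address_of(node) -> str:
    """Return the content address of a file (bytes) or a directory (dict)."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob:" + node).hexdigest()
    # A directory is hashed over its sorted (name, child address) entries,
    # so any change below it changes its own address too.
    entries = "".join(
        f"{name}:{address_of(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(b"tree:" + entries.encode()).hexdigest()


site = {"index.html": b"<h1>hi</h1>", "posts": {"first.txt": b"hello"}}
edited = {"index.html": b"<h1>hi</h1>", "posts": {"first.txt": b"hello!"}}

# One changed byte deep in the hierarchy gives a different root address,
# while the untouched file keeps its address.
assert address_of(site) != address_of(edited)
assert address_of(site["index.html"]) == address_of(edited["index.html"])
```

Two peers holding the same root address can assume they hold the same tree without comparing anything else.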
Commits are just another kind of content object, which point to a particular tree. Commits also point to their parent commits, creating a chain of commits, recording the history of the repository over time.
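A small sketch of that chain, again in Python. The put, commit and history helpers and the JSON encoding are assumptions for illustration; Git stores commits in its own text format.

```python
import hashlib
import json


def put(store: dict, obj: dict) -> str:
    """Store an object under the hash of its canonical JSON encoding."""
    data = json.dumps(obj, sort_keys=True).encode()
    address = hashlib.sha1(data).hexdigest()
    store[address] = obj
    return address


def commit(store: dict, tree: str, parents: list, message: str) -> str:
    """A commit is just another object: it names a tree and its parents."""
    return put(store, {"tree": tree, "parents": parents, "message": message})


def history(store: dict, head: str) -> list:
    """Walk from a commit back through its parents, collecting messages."""
    seen, messages, stack = set(), [], [head]
    while stack:
        address = stack.pop()
        if address in seen:
            continue
        seen.add(address)
        messages.append(store[address]["message"])
        stack.extend(store[address]["parents"])
    return messages


store = {}
tree1 = put(store, {"README": "v1"})
c1 = commit(store, tree1, [], "first")
tree2 = put(store, {"README": "v2"})
c2 = commit(store, tree2, [c1], "second")

assert history(store, c2) == ["second", "first"]
```

Because a commit's address covers its parents' addresses, knowing the latest commit pins down the entire history.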
Git's architecture has huge benefits:
- The integrity of arbitrarily large and complex data structures can be verified with a single hash comparison.
- Synchronization between two repositories is extremely efficient: the repo that wants to sync sends the remote the IDs of the commits it already has, which tells the remote which objects the repo is missing. The remote then sends those objects to the repo in a compressed packfile.
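The sync step can be sketched roughly like this. It's a big simplification: the real Git protocol negotiates over several rounds and uses its own packfile format, whereas here the "pack" is just zlib-compressed JSON. The object IDs ("c1", "t1", ...), the "links" field and the function names are all made up for the example.

```python
import json
import zlib


def reachable(store: dict, roots: list) -> set:
    """All object IDs reachable from the given roots by following links."""
    seen, stack = set(), list(roots)
    while stack:
        address = stack.pop()
        if address in seen or address not in store:
            continue
        seen.add(address)
        stack.extend(store[address]["links"])
    return seen


def make_pack(remote_store: dict, want: str, have: list) -> bytes:
    """Remote side: bundle what the peer lacks, given the commits it has."""
    missing = reachable(remote_store, [want]) - reachable(remote_store, have)
    payload = {address: remote_store[address] for address in sorted(missing)}
    return zlib.compress(json.dumps(payload).encode())


def apply_pack(local_store: dict, pack: bytes) -> None:
    """Local side: unpack the received objects into the local store."""
    local_store.update(json.loads(zlib.decompress(pack)))


remote = {
    "t1": {"links": [], "data": "tree v1"},
    "c1": {"links": ["t1"], "data": "first commit"},
    "t2": {"links": [], "data": "tree v2"},
    "c2": {"links": ["t2", "c1"], "data": "second commit"},
}
local = {"t1": remote["t1"], "c1": remote["c1"]}

pack = make_pack(remote, want="c2", have=["c1"])
apply_pack(local, pack)
assert local == remote
```

Only "c2" and "t2" travel over the wire here; everything reachable from "c1" is known to be on the other side already.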
Git for P2P Web Apps
What's a P2P web app? I'm thinking of things like del.icio.us, wikis, Twitter - but serverless.
Well, there may be servers, but they're not the central component anymore. They can act as caches, and as always-on storage.
How could this work?
- Local storage: A user's data is stored locally in the browser.
- Browser-2-browser communications: There's a way to talk to other users' browsers. (Initially this will have to run through an in-cloud relay server, but I'm sure we'll get browser-2-browser comms soon. See for example IETF's Real-Time Communication in WEB-browsers.)
- Efficient sync: Use a Git-like protocol to retrieve other users' data and cache it locally.
- Cloud-based caches/storage: Servers are just peers in the network that are usually online.
- Indexing: Peers index interesting data using IndexedDB to provide views like timelines and search.
- Security: This is really a huge issue, but I think it can be solved with a liberal sprinkling of public-key cryptography. ;)
One problem Git has is with very large directories. But we can easily work around this using something like a date-based YYYY/MM/DD hierarchical structure.
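For example, something like the following keeps any single directory small. The dated_path and insert helpers and the file names are hypothetical; the nested dict just stands in for Git trees.

```python
from datetime import date


def dated_path(day: date, name: str) -> str:
    """Build a YYYY/MM/DD/name path for an item created on the given day."""
    return f"{day:%Y/%m/%d}/{name}"


def insert(tree: dict, path: str, content: bytes) -> None:
    """Place content into a nested dict, creating directories as needed."""
    *directories, leaf = path.split("/")
    for directory in directories:
        tree = tree.setdefault(directory, {})
    tree[leaf] = content


timeline = {}
insert(timeline, dated_path(date(2011, 3, 7), "post-1.json"), b"{}")
insert(timeline, dated_path(date(2011, 3, 8), "post-2.json"), b"{}")

# Each day gets its own small directory instead of one huge flat one.
assert sorted(timeline["2011"]["03"]) == ["07", "08"]
```

A new post then only touches the trees for its day, month and year, plus the root.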
The main thing to get is that integrity and efficient transfer of changes fall right out of the Git model (or any similar content-addressed model).
The great aesthetic which will inaugurate the twenty-first century will be the utterly invisible quality of intellectual integrity; the integrity of the individual dealing with his scientific discoveries; the integrity of the individual in dealing with conceptual realization of comprehensive interrelatedness of all events; the integrity of the individual dealing with the only experimentally arrived at information regarding invisible phenomena; and finally integrity of all those who formulate invisibly within their respective minds and invisibly with the only mathematically dimensionable, advanced technologies, on the behalf of their fellow men.
- Buckminster Fuller, 1973