plans within plans within plans: May 2011

This post is in response to Mark Boas' P2P Web Apps - Brace yourselves, everything is about to change.

Mark writes:

The old client-server web model is out of date, it’s on its last legs, it will eventually die. We need to move on, we need to start thinking differently if we are going to create web applications that are robust, independent and fast enough for tomorrow’s generation. We need to decentralise and move to a new distributed architecture

I've come to the conclusion that something like Git is the right architecture to move forward. This post tries to explain why and how.

It's somewhat disorganized, but I want to get this out of the Drafts folder.

A Git overview

Git is extremely simple: you have:

an object store, mapping content addresses to content objects
a set of references, mutable pointers to certain objects

Git uses trees to create filesystem-like hierarchical structures.

By knowing the root of the tree, we know the whole tree. (The connection to persistent data structures is interesting.)

Commits are just another kind of content object, which point to a particular tree. Commits also point to their parent commits, creating a chain of commits, recording the history of the repository over time.

Git's architecture has huge benefits:

The integrity of arbitrarily large and complex data structures can be verified with a single hash comparison.
Synchronization between two repositories is made extremely efficient: the repo that wants to sync with a remote repository sends the IDs of commits it has to the remote, which tells the remote what objects the repo is missing. These are then sent to the repo in a compressed packfile.

Git for P2P Web Apps

What's a P2P web app? I'm thinking of things like del.icio.us, wikis, Twitter - but serverless.

Well, there may be servers, but they're not the central component anymore. They can act as caches, and as always-on storage.

How could this work?

Local storage: A user's data is stored locally in the browser.
Browser-2-browser communications: There's a way to talk to other users' browsers. (Initially this will have to run through a in-cloud relay server, but I'm sure we'll get browser-2-browser comms soon. See for example IETF's Real-Time Communication in WEB-browsers.)
Efficient sync: Use Git-like protocol to retrieve other users' data and cache it locally.
Cloud-based caches/storage: Servers are just peers in the network, that are usually online.
Indexing: Peers index interesting data using IndexedDB to provide views like timelines and search.
Security: This is really a huge issue, but I think it can be solved with a liberal sprinkling of public-key cryptography. ;)

One problem Git has is with very large directories. But we can easily work around this using something like a date-based YYYY/MM/DD hierarchical structure.

The main thing to get is that integrity and efficient transfer of changes falls right out of the Git model (or any similar content-based model.)

2011-05-24

CCN@PARC update

2011-05-20

Serverless Social Web Apps

2011-05-18

Elsewhere

Feeds

Archive