CCN@PARC update

As Graydon Hoare says:
CurveCP + CCNx = future. Hurry fast new-topology-net, we need you.

(Via neuraxon77.)


Serverless Social Web Apps

This post is in response to Mark Boas' P2P Web Apps - Brace yourselves, everything is about to change.

Mark writes:
The old client-server web model is out of date, it’s on its last legs, it will eventually die. We need to move on, we need to start thinking differently if we are going to create web applications that are robust, independent and fast enough for tomorrow’s generation. We need to decentralise and move to a new distributed architecture
I've come to the conclusion that something like Git is the right architecture to move forward. This post tries to explain why and how.

It's somewhat disorganized, but I want to get this out of the Drafts folder.

A Git overview

Git is extremely simple. You have:
  • an object store, mapping content addresses to content objects
  • a set of references, mutable pointers to certain objects
Git uses trees to create filesystem-like hierarchical structures.

By knowing the root of the tree, we know the whole tree. (The connection to persistent data structures is interesting.)

Commits are just another kind of content object, which point to a particular tree. Commits also point to their parent commits, creating a chain of commits, recording the history of the repository over time.
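
To make the model concrete, here is a minimal sketch of an object store with blobs, trees, commits and one reference. It is a toy, not Git's actual format: the JSON serialization, the `ObjectStore` class and the field names are all assumptions made for illustration.

```python
import hashlib
import json


class ObjectStore:
    """Toy content-addressed store: an object's key is the hash of its bytes."""

    def __init__(self):
        self.objects = {}  # content address (hex SHA-1) -> serialized object
        self.refs = {}     # mutable name -> content address

    def put(self, kind, payload):
        # Serialize deterministically so equal content always gets the same address.
        # (Assumption: JSON stands in for Git's real object encoding.)
        data = json.dumps({"kind": kind, "payload": payload}, sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = data
        return oid

    def get(self, oid):
        return json.loads(self.objects[oid])


store = ObjectStore()

# Blobs hold file content; a tree maps names to the addresses of blobs or other trees.
readme = store.put("blob", "hello world\n")
notes = store.put("blob", "first note\n")
docs = store.put("tree", {"notes.txt": notes})
root = store.put("tree", {"README": readme, "docs": docs})

# A commit points to a root tree and to its parent commits.
c1 = store.put("commit", {"tree": root, "parents": [], "message": "initial"})

# Changing one file changes its blob, every tree above it, and hence the root.
notes2 = store.put("blob", "first note, edited\n")
docs2 = store.put("tree", {"notes.txt": notes2})
root2 = store.put("tree", {"README": readme, "docs": docs2})
c2 = store.put("commit", {"tree": root2, "parents": [c1], "message": "edit notes"})

# A reference is the only mutable part: it just names the latest commit.
store.refs["master"] = c2
```

Note that the unchanged README blob is shared between both trees, which is the persistent-data-structure connection: new versions reuse everything that did not change.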

Git's architecture has huge benefits:
  • The integrity of arbitrarily large and complex data structures can be verified with a single hash comparison.
  • Synchronization between two repositories is extremely efficient: the repo that wants to sync sends the IDs of the commits it already has to the remote, which tells the remote which objects the repo is missing. The remote then sends those objects to the repo in a compressed packfile.
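
Both benefits can be sketched in a few lines. This is a simplified illustration, not Git's wire protocol: the dict-based stores, the object layout, and the helper names (`address`, `pack_for`, `receive`) are hypothetical, and the "pack" here is an uncompressed dict rather than a real packfile.

```python
import hashlib
import json


def address(obj):
    """Content address of an object: SHA-1 over its canonical JSON form."""
    return hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def put(store, obj):
    oid = address(obj)
    store[oid] = obj
    return oid


def links(obj):
    """IDs an object points to: a commit's tree and parents, a tree's entries."""
    if obj["kind"] == "commit":
        return [obj["tree"]] + obj["parents"]
    if obj["kind"] == "tree":
        return list(obj["entries"].values())
    return []


def reachable(store, roots):
    """All object IDs reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(links(store[oid]))
    return seen


def pack_for(remote, tip, have):
    """Objects the remote must send to a peer that already has the `have` commits."""
    known = reachable(remote, [h for h in have if h in remote])
    return {oid: remote[oid] for oid in reachable(remote, [tip]) - known}


def receive(local, pack):
    """Store received objects, rejecting any whose content does not match its ID."""
    for oid, obj in pack.items():
        if address(obj) != oid:
            raise ValueError("corrupt object " + oid)
        local[oid] = obj


remote = {}
blob_a = put(remote, {"kind": "blob", "data": "a"})
tree1 = put(remote, {"kind": "tree", "entries": {"a.txt": blob_a}})
c1 = put(remote, {"kind": "commit", "tree": tree1, "parents": []})
blob_b = put(remote, {"kind": "blob", "data": "b"})
tree2 = put(remote, {"kind": "tree", "entries": {"a.txt": blob_a, "b.txt": blob_b}})
c2 = put(remote, {"kind": "commit", "tree": tree2, "parents": [c1]})

# The local peer already has everything up to c1 and asks for c2.
local = {oid: remote[oid] for oid in reachable(remote, [c1])}
pack = pack_for(remote, c2, have=[c1])
receive(local, pack)
```

Only the new blob, the new tree and the new commit travel over the wire; the blob for a.txt is already known to the peer and is skipped.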

Git for P2P Web Apps

What's a P2P web app? I'm thinking of things like del.icio.us, wikis, Twitter - but serverless.

Well, there may be servers, but they're not the central component anymore. They can act as caches, and as always-on storage.

How could this work?
  • Local storage: A user's data is stored locally in the browser.
  • Browser-2-browser communications: There's a way to talk to other users' browsers. (Initially this will have to run through an in-cloud relay server, but I'm sure we'll get browser-2-browser comms soon. See for example IETF's Real-Time Communication in WEB-browsers.)
  • Efficient sync: Use a Git-like protocol to retrieve other users' data and cache it locally.
  • Cloud-based caches/storage: Servers are just peers in the network that are usually online.
  • Indexing: Peers index interesting data using IndexedDB to provide views like timelines and search.
  • Security: This is really a huge issue, but I think it can be solved with a liberal sprinkling of public-key cryptography. ;)
One problem Git has is with very large directories. But we can easily work around this using something like a date-based YYYY/MM/DD hierarchical structure.
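
A quick sketch of the date-based workaround, assuming a hypothetical layout of YYYY/MM/DD/slug and a nested dict standing in for the tree objects:

```python
from datetime import date


def entry_path(day, slug):
    """Hypothetical layout: YYYY/MM/DD/slug, so no single directory grows unbounded."""
    return f"{day:%Y/%m/%d}/{slug}"


def insert(tree, path, value):
    """Place a value in a nested dict, creating intermediate directories as needed."""
    parts = path.split("/")
    for part in parts[:-1]:
        tree = tree.setdefault(part, {})
    tree[parts[-1]] = value


root = {}
insert(root, entry_path(date(2011, 2, 19), "ccn-notes"), "blob-id-1")
insert(root, entry_path(date(2011, 2, 20), "locker"), "blob-id-2")

print(entry_path(date(2011, 2, 19), "ccn-notes"))  # 2011/02/19/ccn-notes
```

Each day's directory only holds that day's entries, so a change touches a handful of small trees instead of one huge one.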

The main thing to get is that integrity and efficient transfer of changes fall right out of the Git model (or any similar content-based model).


The great aesthetic which will inaugurate the twenty-first century will be the utterly invisible quality of intellectual integrity; the integrity of the individual dealing with his scientific discoveries; the integrity of the individual in dealing with conceptual realization of comprehensive interrelatedness of all events; the integrity of the individual dealing with the only experimentally arrived at information regarding invisible phenomena; and finally integrity of all those who formulate invisibly within their respective minds and invisibly with the only mathematically dimensionable, advanced technologies, on the behalf of their fellow men.

- Buckminster Fuller, 1973


Towards independent web apps

Werner Vogels is stripping away dependencies from his weblog app in a way that all Winerians will find pleasing.

So now we're serving static HTML like it's the 1990s all over again. (Only this time, we're saving the files to the file server via HTTP, not FTP.)

This has the nice effect of not tying us into some weird, proprietary server - the only thing it needs to understand is GET and PUT, basically.

But this time, it's the 2010s, and we now have browsers powerful enough to run whatever apps we desire.

So: what about running all the HTML generation stuff in the browser, and using the server only as dumb storage? Just an idea.

The Intimate Internet: You're gonna need a thick skin

Due to social networking, we're all neighbors now. (And it's only getting worse.)

I find that this deprives me of a lot of options for jokes. The people (and their projects) I want to joke about are just one hyperlink away. I myself am quite sensitive to criticism, so I also hold back criticism of others.

But criticism is important, lest we all become a bunch of homogeneous circlejerkers.

In the old days, artists, and everybody else who put stuff out into public view, needed a very thick skin. It came with the territory. If you were an artist, you had to live with those blood-sucking critics.

But because the internet is so low-key, we're all just normal persons now, putting our projects out into public view.

But that shouldn't stop us from criticizing each other. I think we all need to grow thicker skins, so that we can allow the criticism that invariably comes from public attention.

Get ready to be dissed!


Content-centric networking and microdata

PARC's content-centric networking (CCN), aka named data networking, is a "master idea" about how to re-architect the internet, given what we've learned since the WWW took off.

Like Git, CCN uses content-based addressing. A piece of content has a symbolic name (e.g. /com/nytimes/front-page/2011/02/19), which is cryptographically bound to a signature of the creator of the content (e.g. NYT Corp), and to a hash of the data.

Symbolic names have hierarchical structure, and are late-bound: a common example is /room/projector which is an interface to the projector in the current room. (Plan 9, anyone?)

There are two kinds of packets used in the CCN protocol: Interest and Data packets.
  • An interest packet is analogous to an HTTP GET, and expresses the client's interest in a piece of content given its symbolic name.
  • A data packet is analogous to an HTTP response, and satisfies an interest. It contains arbitrary content, cryptographically bound to its creator via a signature and to its symbolic name. The receiver can thus easily verify that the content is legit.
CCN can not only be used for classic WWW-style applications, but also for multimedia protocols like VoIP. Results show adequate performance compared to stock protocols. From CCN's content-based architecture we get a lot of benefits for free (e.g. caching is much easier).
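
Here is a toy sketch of the Interest/Data exchange, to show why caching comes for free. Loud assumptions: this is not the CCNx API or wire format; the `Interest`, `Data` and `Node` names are made up; names are matched exactly (real CCN matches on name prefixes); and HMAC with a shared key stands in for a real public-key signature, which needs no shared secret.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Interest:
    name: str


@dataclass(frozen=True)
class Data:
    name: str
    content: bytes
    signature: bytes  # binds the name and the content hash to the publisher's key


def sign(key, name, content):
    # Stand-in for a public-key signature: HMAC-SHA256 over name and content hash.
    digest = hashlib.sha256(content).digest()
    return hmac.new(key, name.encode() + b"\0" + digest, hashlib.sha256).digest()


def verify(key, data):
    return hmac.compare_digest(data.signature, sign(key, data.name, data.content))


class Node:
    """Answers interests from its cache, or asks upstream and caches the reply."""

    def __init__(self, upstream=None):
        self.cache = {}
        self.upstream = upstream

    def publish(self, data):
        self.cache[data.name] = data

    def express(self, interest):
        if interest.name in self.cache:
            return self.cache[interest.name]
        if self.upstream is None:
            return None
        data = self.upstream.express(interest)
        if data is not None:
            self.cache[data.name] = data  # any node on the path may keep a copy
        return data


key = b"nyt-demo-key"  # hypothetical publisher key
name = "/com/nytimes/front-page/2011/02/19"

origin = Node()
origin.publish(Data(name, b"headline", sign(key, name, b"headline")))

router = Node(upstream=origin)
data = router.express(Interest(name))  # fetched from origin, now cached at router
forged = Data(name, b"fake news", data.signature)
```

Because the data packet carries its own proof of authenticity, the receiver does not care whether it came from the origin or from a cache along the way: `verify(key, data)` succeeds, while `verify(key, forged)` fails.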

Content-centric architectures seem ideally suited to the problems of our current, heavily fragmented social web experience. We still need to figure out the details, though.

See also: camlistore, Content-Addressable Multi-Layer Indexed Storage, a "gitified" database, by the Google frat pack.


Dazed and slightly confused notes on the future of web platforms

First, there's the Locker Project, motto: "I am the platform".

Even though it's by the creator of Jabber, it's not about streaming XML stanzas, thank you very much.

Locker uses JavaScript connectors that know how to talk to a service to retrieve and sync your stuff (Flickr, Twitter, ...). Think Emacs modes for different services.

Locker seems to run on the server, using node.js, so until every one of us has their FreedomBox plugged in, this will not really give us freedom.

What if we ran this in the browser, using client-local storage? And then, as Dave Winer would tell us, we push that stuff as RSS, err, Atom feeds, maybe connected using Atom Paging and Archiving, to a dumb cloud store, hopefully free as in freedom? (This is where Content-centric networking comes in, as a simple way to dedup, among other awesomeness.)

Doing the crawling of a user's feeds in the browser has the big benefit that we have cycles to burn and bandwidth to waste there. Scalable following is damn hard, which can be seen from the fact that it's one of the few algorithms that's not provided as a shrink-wrapped, commoditized package. Yet.

The second piece I find wildly interesting is homomorphic encryption. As far as I can tell, the promise of homomorphic encryption re cloud storage is that the cloud store can sort and search encrypted data. I.e., data it can't even look at can still be subjected to the Big Data treatment, aka Google's infinitely-scalable b-tree. (Links: Order-preserving Encryption, Cryptographic Constructions for Secure, Privacy-preserving Distributed Information Sharing.)

Exciting times!

Of course, if we want to do to microdata and the web what Emacs did to plain text and Unix, first we need a real Lisp in the browser. Thankfully, that's exactly what I'm working on at the moment. Stay tuned.