2010-01-21

Looking for the Super-Structure of Information Organization

An incomplete investigation into the be-all/end-all of information organization structures.

Successful Information Organization Structures

File Systems

Classical file systems have directories and files arranged in a tree.

Directories have dual roles: they contain files, and they provide a namespace for files.
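
To make the two roles concrete, here is a minimal sketch in Python. The Directory class and its add/resolve methods are hypothetical names made up for illustration, not how any real file system is implemented: containment is the nesting of objects, and namespacing is the rule that a name is unique within its directory.

```python
class Directory:
    """Hypothetical toy directory: a mapping from names to entries."""

    def __init__(self):
        self.entries = {}  # name -> Directory, or bytes for a file

    def add(self, name, entry):
        # Namespacing: a name may be used only once per directory.
        if name in self.entries:
            raise FileExistsError(name)
        # Containment: the entry now lives inside this directory.
        self.entries[name] = entry
        return entry

    def resolve(self, path):
        # Walk the tree one path component at a time.
        node = self
        for part in path.strip("/").split("/"):
            node = node.entries[part]
        return node


root = Directory()
home = root.add("home", Directory())
home.add("notes.txt", b"hello")
# The same name can be reused in another directory without a clash.
root.add("tmp", Directory()).add("notes.txt", b"scratch")
```

Both files are called notes.txt, yet resolve("/home/notes.txt") and resolve("/tmp/notes.txt") return different contents, which is the namespace at work.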

File systems are frequently criticized, but containment and namespacing are critical features, so I don't believe they will ever go away.

Wikis

The most prominent feature of wikis is their emphasis on naming and ergonomic linking; that is, links between pages can be created easily.

Some wikis provide backlinks, but for some reason these are usually not featured prominently.
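
Backlinks are cheap to derive once the forward links are known. A minimal sketch, assuming the wiki's links are available as a plain mapping from page name to the set of pages it links to (the backlinks function and the sample data are hypothetical):

```python
from collections import defaultdict


def backlinks(links):
    """Invert page -> linked pages into page -> pages that link to it."""
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)
    return dict(incoming)


# Sample data, invented for illustration.
links = {
    "Home": {"Projects", "About"},
    "Projects": {"About"},
    "About": set(),
}
```

Here backlinks(links)["About"] contains both Home and Projects, while Home has no entry because nothing links to it.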

Blogs

Blogs put (reverse) chronology center stage, and have been hugely successful with that simple device.

Outlines

Outlines, like directories, provide containment, but unlike directories, no namespacing.

Outlines are most useful for their graphical properties: collapsing stuff you don't want to see, and expanding stuff you want to see.

Databases

Databases' main features from an organizational perspective are that they usually store complex objects (tuples, documents, ...) and provide sorting by attribute and sometimes more complex queries.

Unlike the other systems, databases are usually of no use for ad-hoc work, and instead require programmers to create a user interface for the stored information.

Tagging Systems

Tagging systems associate keywords with items, and can return all items with one or more keywords.

Tagging systems can be viewed as a special case of search engines that only index terms the user has chosen for indexing, which leads to interesting social effects and good results in many cases (cf. Delicious).
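
A tagging system in this sense fits in a few lines. This is only a sketch under my own assumptions: the TagIndex class is hypothetical, and I read "one or more keywords" as "items that carry all of the given keywords".

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag index: tag -> set of items."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        # Only terms the user explicitly chose get indexed.
        for t in tags:
            self._items_by_tag[t].add(item)

    def find(self, *tags):
        """Return the items that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._items_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("paper.pdf", "research", "todo")
index.tag("budget.xls", "todo")
```

Asking for "todo" returns both items; asking for "todo" and "research" together narrows the result to paper.pdf.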

Search Engines

Search engines take in a corpus of unstructured documents and answer similarly unstructured queries, usually ranking the results with an algorithm such as PageRank.

Search engines are different from all the other systems, in that they don't require the user to organize information herself, but rather impose some organization of their own.
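
The core mechanism can be sketched as an inverted index. Note the assumptions: build_index and search are hypothetical names, and the ranking here is a plain count of matching terms, which is a stand-in and not PageRank.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to the documents containing it, with occurrence counts."""
    index = defaultdict(Counter)  # term -> {doc_id: occurrences}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Rank documents by how often the query terms occur in them."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Sample corpus, invented for illustration.
docs = {
    "a": "tree of files",
    "b": "files files and links",
    "c": "blog posts",
}
index = build_index(docs)
```

The user never organized these documents; the organization (which terms point to which documents, and in what order) is imposed entirely by the index.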

Typed Links

Systems that support typed links allow items to be connected arbitrarily with edges that each carry a type (a label such as "child" or "tag"), and to follow incoming and outgoing edges from an item.
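
A minimal sketch of such a store, assuming items are identified by strings and edge types are plain labels (the LinkStore class and its method names are hypothetical). Each edge is recorded in both directions so that incoming edges are as cheap to follow as outgoing ones.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical store of typed, directed links between items."""

    def __init__(self):
        self._out = defaultdict(set)  # source -> {(type, target)}
        self._in = defaultdict(set)   # target -> {(type, source)}

    def link(self, source, link_type, target):
        self._out[source].add((link_type, target))
        self._in[target].add((link_type, source))

    def outgoing(self, item, link_type=None):
        # With link_type=None, edges of every type are followed.
        return {t for lt, t in self._out.get(item, ()) if link_type in (None, lt)}

    def incoming(self, item, link_type=None):
        return {s for lt, s in self._in.get(item, ()) if link_type in (None, lt)}


store = LinkStore()
store.link("report", "cites", "dataset")
store.link("report", "written-by", "alice")
store.link("slides", "cites", "dataset")
```

Following "cites" backwards from dataset yields report and slides; following all outgoing edges from report yields dataset and alice.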

Spreadsheets (I can't believe I forgot those in the first version!)

Spreadsheets let you put data into a two-dimensional row/column form, and then filter, sort, and otherwise manipulate the data. Spreadsheets also come with large formula libraries for calculating with that data.

Spreadsheets are often abused, but are still a major workhorse of information organization.

Common Hybrids

Many attempts have been made to combine two or more of the above structures:

Outline + Database

Many advanced outliners let users add attributes to items, which are displayed in columns.

Wiki + Blog

"Bliki" systems are wikis that usually display a blog on their front page.

File System + Database

An example would be BeFS, which indexes user-defined attributes on files.

Search Engine + Database

The goal here is to extend a search engine so that it can also answer queries for attributes of items, and interpret e.g. numeric attributes.

Wiki + Database

Wikis with database functions allow users to add attributes to pages, upon which one then can sort and filter pages.

Anything + Tagging System
Anything + Search Engine

Tagging systems and search engines can easily be added to any other structure.

Can we combine them all?

File System + Wiki + Blog + Outlines + Tags
+ Typed Links + Spreadsheet + Database + Search Engine

Let's get rid of wikis and blogs:

Wiki = File system with only one directory + Simple Linking

Blog = Database query for items, sorted by time

So, if we provide simple linking in the user interface, and keep file system and database functionality, we can drop wikis and blogs from the list.

File System + Outlines + Tags + Typed Links + Database + Spreadsheet + Search Engine
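
Both reductions can be sketched in a few lines. Everything here is an illustrative assumption: the flat pages dictionary stands in for the single directory, the [[Name]] link syntax is just one common wiki convention, and the items table with its title and created columns is invented for the example.

```python
import re
import sqlite3

# Wiki = one flat namespace of pages + simple linking.
pages = {
    "Home": "See [[Projects]] and [[About]].",
    "Projects": "Back to [[Home]].",
}


def wiki_links(page):
    """Return the page names linked from the given page."""
    return re.findall(r"\[\[(.+?)\]\]", pages[page])


# Blog = database query for items, sorted by time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("File systems", "2010-01-05"),
        ("Typed links", "2010-01-21"),
        ("Spreadsheets", "2010-01-12"),
    ],
)


def blog_front_page(conn, limit=10):
    """Newest items first; ISO dates sort correctly as plain text."""
    rows = conn.execute(
        "SELECT title FROM items ORDER BY created DESC LIMIT ?", (limit,)
    )
    return [title for (title,) in rows]
```

Neither function needs a dedicated "wiki" or "blog" structure: the first is a lookup in a namespace plus link extraction, the second is an ordinary sorted query.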

Let's get rid of outlines and tags, shall we?

Outline = Items have typed links to child items

Tagging = Items have typed links to tag items
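
As a sketch of these two reductions, assume links are stored as (source, type, target) triples and that "child" and "tag" are just two link types among many (the helper names and the sample data are hypothetical):

```python
# Sample links, invented for illustration.
links = [
    ("Projects", "child", "Website"),
    ("Projects", "child", "Book"),
    ("Book", "child", "Chapter 1"),
    ("Website", "tag", "urgent"),
    ("Chapter 1", "tag", "urgent"),
]


def targets(source, link_type):
    """Follow outgoing links of one type."""
    return [t for s, lt, t in links if s == source and lt == link_type]


def sources(target, link_type):
    """Follow incoming links of one type."""
    return [s for s, lt, t in links if t == target and lt == link_type]


def outline(item, depth=0):
    """Outline = recursively follow 'child' links, indenting by depth."""
    lines = ["  " * depth + item]
    for child in targets(item, "child"):
        lines.extend(outline(child, depth + 1))
    return lines


def tagged(tag):
    """Tagging = follow 'tag' links backwards from the tag item."""
    return sources(tag, "tag")
```

The outline view and the tag query read the same list of links; they differ only in which link type they follow and in which direction.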

This means that if we keep typed links, we can drop outlines and tags from the list:

File System + Typed Links + Database + Spreadsheet + Search Engine

While we're at it, we can also drop spreadsheets from the list, as

Spreadsheet = 2D view of items with attributes

Note that this doesn't cover every (ab)use of spreadsheets, but should do for now.
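
A rough sketch of that view, assuming items are plain dictionaries of attributes (the as_grid function and the sample items are hypothetical). Rows are items, columns are attribute names, and a missing attribute becomes an empty cell.

```python
def as_grid(items, columns, sort_by=None):
    """Lay out items as rows and the chosen attributes as columns."""
    if sort_by is not None:
        items = sorted(items, key=lambda item: item[sort_by])
    header = list(columns)
    rows = [[item.get(col, "") for col in columns] for item in items]
    return [header] + rows


# Sample items, invented for illustration.
items = [
    {"name": "Website", "owner": "bob"},
    {"name": "Book", "owner": "alice", "pages": 120},
]
```

Calling as_grid(items, ["name", "pages"], sort_by="name") gives a header row followed by the Book and Website rows, with an empty pages cell for Website. Formulas and free-form cell layout are not covered, which matches the caveat above.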

Super-Structure = File System + Typed Links + Database + Search Engine ???

To be continued...

(Yeah, I know, this isn't super-convincing just yet.)

4 comments:

Fred Blasdel said...

Spotlight absolutely does not index user-defined attributes, and neither do any of its competitors. Spotlight does better by having the courtesy to let you add your own arbitrary attributes, but it only indexes the ones it ships with. In practice, everyone overloads the standard indexed 'comment' attribute with freeform tags.

Relational databases are terrible at this kind of multitenancy, but Graph Databases are perfect — it's too bad the Semantic Web pissants poisoned that well too.

Manuel Simoni said...

Oh, thanks for the correction.

I replaced Spotlight with good old BeFS (both by Dominic Giampaolo), which indexed user-defined attrs.

Craig said...

I was distracted, then rushed writing the following, so it may not make any sense:

Unification for me requires a superposition of states: the ability to represent the spectrum of simple (or primitive) and complex states in a unified whole.

One of my main dislikes of file systems is their lack of a unified extensible namespace that can be composed. Only through composition/decomposition of simple identifiers can superposition and all possible complex states be obtained, and directory or tree containment imposes inhibiting hierarchy that impedes this composition. Which is all good and well if all you need is a centralised storage system for the practical but outdated mode of structured static file or document storage we use today.

Structure you add at the bottom, inevitably limits complexity at the top.

The efficiency of any system involves how much and how often one must dismantle in order to rebuild and thus evolve the system.

Unnecessary hierarchy leads to Towers of Babel and chaos. Which is why I rarely tell this to people, because I like seeing the fallouts. :D

Manuel Simoni said...

Craig, what do you think of Plan 9's union directories? In Plan 9 they obviate the need for symlinks for example.

Generally, being able to union arbitrary directories makes directories a less cumbersome abstraction, in that the hierarchical limitations basically "go away".

I guess I have to plead guilty to just wanting the "practical but outdated mode of structured static file or document storage"... Once that's taken care of, we can work on the next step.

Like the bit about the fallout -- hehe. ;)