Archive for August 11th, 2010

Tidbit: More technical

In the previous post I covered the ideas behind tidbit. In this post I will try and cover the technical aspects of the tidbit system. Currently the work is very exploratory, so everything may change.

Tidbit record structure

This is a typical tidbit which was generated using the Rhythmbox plugin:

TIDBIT/0.1; libtidbit/0.1; Rhythmbox Tidbit Plugin v0.1
tidbit_userkey==usePzEg4Cl4g1ASdzpssVHtQ1hJJilS+ryiBWjF...
tidbit_table==audio/track
tidbit_created:=1281479640
tidbit_expires:=1313037240
artist==Arcade Fire
title==Keep The Car Running
album:=Indie/Rock Playlist: May (2007)
genre:=Indie
year:=2007
play_count:=34
rating:=0.8
tidbit_signed:=JyJ1fIwhRL5t3y9CACmshm/UibYVhvInxh7XVx4...

The first line is the header. It states the version of the tidbit followed by the user agent. The rest of the record is composed of key-value pairs. The key has a strict format of lower-case letters and underscore. The value can contain any character above 0x1F, and is terminated by a new line. Other characters must be escaped. The first  four pairs are compulsory and they all contain “tidbit_” at the start to distinguish them from normal data. The userkey is a unique(ish) 1024 bit RSA key the user uses to identify themselves and also serves as the public portion of their signing key. It is base64 encoded and in the text above it is clipped but in reality it is over 100 characters long. Table is compulsory field which designates the subject matter. The created and expires values state when the record was created (must be in the past) and when it will expire. Expired records are no longer valid. These are currently using Unix time, but a more general format will be used in the future. This is followed by a number of values specific to the record type. Finally, the record is completed by a signature which signs the body up to that point (also base64 encoded). The signature is generated using the user key which signs an SHA512 hash of the record (up to that line). There is a hard limit of 2KB per record to prevent abuse.

The separation between the key and the value is either ‘==’ or ‘:=’. These signify if to search for that value, or overwrite the value. When inserting a new record, a search is performed for any records which match all the key/value pairs with the ‘==’ separator. These records are discarded as they are overwritten by the new record. To ensure the correct sequence in cases where an old record is re inserted into the database, the created date is checked. This allows a record to be updated by destroying an older version.

Library

A library (libtidbit) handles most of the complexity of creating tidbits, key handling, communicating with databases and performing queries. Keys are stored in a gnome-keyring. There are also python bindings which make creating plugins simple. Here is partial mock-up of an example use in Rhythmbox:

In this plugin, forming tidbits and passing the out is very easy. Presenting the data is the hard part.

Databases

There are several database backends used in tidbit:

  • Memory database is used to cache recently accessed records.
  • Fork database is not a real database but rather a connection to two, which fetches records from the local database to minimise long distance transactions.
  • D-Bus database is a service which allows several applications to share a single cache, and minimise external accesses.
  • HTTP database is the method used for long distance transactions with the global servers.
  • Sqlite database allows cached records to be saved between sessions.

The default database supplied for libtidbit access is a caching fork of a memory database and a D-Bus connection. The D-Bus service wakes up automatically to connect the applications to the global servers.

There are just three database commands at the moment:

  • Insert to push new tidbits into the system
  • Query to ask for tidbit GUIDs which match a query
  • Fetch to get the full record from a GUID

The GUID is actually the signature and is unique(ish) to each record.

Example

Lets do a 2 minute into of how to create and post a tidbit for a fictional TV application. The following should be the same in both C and Python (although C requires types).

Step 1: Get a key

key = tidbit_key_get ("mytv", "MyTV v1.2");

Here we supply the name of out application twice. The first should never change so we pick up the same key each time, and the second is used for the user agent.

Step 2: Get a database

database = tidbit_database_default_new ();

This gets the default database on the system.

Step 3: Create the record

record = tidbit_record_new ("television/episode");

This creates a new record we can put data into. The table name is compulsory so we supply it here.

Step 4: Add the data

tidbit_record_add_element (record, "series_name", "Ugly Betty", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "episode_name", "The Butterfly Effect (Part 1)", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "rating", "0.6", TIDBIT_RECORD_ELEMENT_TYPE_VALUE);

Note the difference between the key and value entries (as the ‘==’ and ‘:=’ before). We may change our rating later, so that is a value, and so overwrite the records which match on the keys.

Step 5: Sign the record

tidbit_record_sign (record, key);

Once a record is signed, it cannot be altered.

Step 6: Insert it into the database

tidbit_database_insert (database, record);

Step 7: Tidy up

tidbit_record_unref (record);

Now we are finished with this record, we free it. By now, the record is happily on its way around the world.

Development

If you have interests in the semantic web/distributed hashtables, you have an idea for an awesome application, you found a fundamental error or you just want to have a bit of a play, then the source is available.

Tidbit: A global database for exchanging signed tidbits of information

Social everything

Many of us, use a range of range of so-called Web2.0 services.

  • Social bookmarking which enables you to recommend sites as well as tag sites with relevant words to make searching easier.
  • Microblogging services allowing you to inform your friends (and others) of your status, while attaching tags to the message.
  • Systems which note the music you have listened to recently and share that with the community, recommending other music and events.
  • You can declare yourself as going to an event and check if your friends are too.

This is a system which will keep expanding and undoubtedly within a couple years your bike will send out a message to say you are stuck in traffic which warns your friends that you will be late, while telling others to avoid your route. As you take a photo of the space invader mosaic, your phone will ping out the image with its GPS position to an urban art site with the tag of the artist, while informing you that there is another one just round the corner.

Fear of clouds

Great! The future is awesome! Well, not quite. There are several weaknesses to these systems.

  • Each system requires a sign-up. There are solutions like OpenID which make this easier, but generally you cannot use them anonymously very easily.
  • There are multiple providers for each kind of service, so you may have to keep several profiles up to date and post your data to several services.
  • The data is transferred to the service owners so only one company can make use of it. Users are giving this data out for free, and that’s the way they would like to keep it.
  • Services close. If you have built up a massive profile of contributions with millions of followers and the service dies, you are left with nothing. No you can’t take the data and create your own.
  • Competition is stifled. Imagine that you thought of a system like Facebook but better. Who would sign up for that? There is no chance of cooperation between companies to allow new competitors.
  • It is difficult to queue up data when not connected to the internet. You have to wait till you get home to write a review of that restaurant in Thailand which does great tofu.

So, this “Tidbit” thing?

The principle is pretty simple. You don’t send your data directly to the service provides, but to a distributed open database. Each piece of information is a “tidbit”. Anyone can post, read and search for these tidbits. If you wish to provide a service, you read the tidbits that are of interest to you. No one gets to keep a monopoly on the data and everyone has the opportunity to to use the data to make new inspired products.

Anatomy of a tidbit

Each tidbit contains:

  • Your username. The username is actually your public signing key. You can generate a new one whenever you like and is completely private (unless you reveal your identity to someone).
  • The date the tidbit was created and when it should expire. Most data becomes irrelevant after a year so that is the default unless you set it to be longer.
  • The table the data belongs in. For example “audio/track” would be talking about an audio track you have listened to.
  • A set of key value pairs which hold the data you wish to tell the world. There is no fixed structure so your tidbit can contain fields which will be ignored by some applications.
  • A signature to make sure it was you that generated that tidbit. It is impossible to adjust the data without damaging the signature, so no one can spoof as you.

You can’t trust this

Stop! Reality time! This is bound to be abused by spammers, robots etc, just like the current services, but worse. I can’t trust anyone.

On top of this system, you can extend a web of trust. You can post a tidbit stating your trust of someone. Say you only fully trust the 10 people you know, but they trust 10 more and so on. You might only trust an individual a little (since they are several friends away), but if you combine a whole group of people you trust a bit, you get a fairly sensible picture. You can also partly trust someone who you have only a little confidence in due to information they posted, and perhaps only for some kinds of information (music taste only). Producers of original content are thus rewarded with respect of their audience, while building a network that gives people confidence in the data.

I want my privacy

Privacy is at the core of the system. You may choose to only reveal your username to your friends. Only they will know who you are. All applications work with a different auto-generated username, so unless you manually set your movie watching application to use the same username as your dating profile, you essentially remain as two different people. Obviously, all data you post is open for anyone to read, so posting personal information is a bad idea. This is not a system which sensibly replaces private social networks.

Let’s get technical

The next post will be somewhat more technical and explain the system in glorious geeky detail. There is a git repository you can take a look at and if you have questions there is a room #tidbit on irc.freenode.net, or leave a comment or email me.