Archive for the ‘Linux’ Category

Valgrind T-shirt

This morning I received an unexpected delivery: someone bought me a Valgrind T-shirt, and the bill doesn’t say who. I do love Valgrind, but only a handful of people know that my obsession extends to a promise to name my first-born after it (Valgrind Brej may get bullied somewhat). To whoever bought it: thank you! I shall wear it with pride.

UPDATE: Mystery solved. Matt Horsnell was offered it for spotting and fixing a bug and he knew about my secret love.

Tidbit: More technical

In the previous post I covered the ideas behind tidbit. In this post I will try to cover the technical aspects of the tidbit system. The work is currently very exploratory, so everything may change.

Tidbit record structure

This is a typical tidbit which was generated using the Rhythmbox plugin:

TIDBIT/0.1; libtidbit/0.1; Rhythmbox Tidbit Plugin v0.1
artist==Arcade Fire
title==Keep The Car Running
album:=Indie/Rock Playlist: May (2007)

The first line is the header. It states the version of the tidbit format, followed by the user agent. The rest of the record is composed of key-value pairs. A key has a strict format of lower-case letters and underscores. A value can contain any character above 0x1F and is terminated by a newline; other characters must be escaped.

The first four pairs are compulsory, and their keys all start with “tidbit_” to distinguish them from normal data. The userkey is a unique(ish) 1024-bit RSA key which the user uses to identify themselves; it also serves as the public portion of their signing key. It is base64 encoded; in the text above it is clipped, but in reality it is over 100 characters long. The table is a compulsory field which designates the subject matter. The created and expires values state when the record was created (which must be in the past) and when it will expire. Expired records are no longer valid. These currently use Unix time, but a more general format will be used in the future. The compulsory fields are followed by a number of values specific to the record type. Finally, the record is completed by a signature (also base64 encoded), generated by using the user key to sign an SHA-512 hash of the record up to that line. There is a hard limit of 2KB per record to prevent abuse.
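To make the record format concrete, here is a minimal Python sketch of a parser for the layout described above. It is not libtidbit’s parser: `parse_tidbit` is a hypothetical name, the 2KB limit is assumed to mean 2048 bytes of UTF-8, the header is assumed to be split on “; ”, escapes are not decoded, and neither the compulsory “tidbit_” fields nor the signature are checked.

```python
import re

MAX_RECORD_BYTES = 2048  # the 2KB hard limit, assumed to mean 2048 bytes
KEY_RE = re.compile(r"[a-z_]+")  # keys: lower-case letters and underscores
SEPARATORS = ("==", ":=")  # '==' marks a search key, ':=' an overwritable value


def parse_tidbit(text):
    """Split a tidbit into (header fields, [(key, separator, value), ...])."""
    if len(text.encode("utf-8")) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 2KB limit")
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # a trailing newline just ends the last value
    if not lines or not lines[0].startswith("TIDBIT/"):
        raise ValueError("missing TIDBIT header")
    # Header: format version followed by the user agent.
    header = lines[0].split("; ")
    pairs = []
    for line in lines[1:]:
        # Valid keys contain no '=' or ':', so the earliest separator ends the key.
        found = [(line.find(sep), sep) for sep in SEPARATORS if sep in line]
        if not found:
            raise ValueError("no separator in line %r" % line)
        index, sep = min(found)
        key, value = line[:index], line[index + len(sep):]
        if not KEY_RE.fullmatch(key):
            raise ValueError("bad key %r" % key)
        if any(ord(ch) < 0x20 for ch in value):
            raise ValueError("unescaped control character in %r" % key)
        pairs.append((key, sep, value))
    return header, pairs


sample = (
    "TIDBIT/0.1; libtidbit/0.1; Rhythmbox Tidbit Plugin v0.1\n"
    "artist==Arcade Fire\n"
    "title==Keep The Car Running\n"
    "album:=Indie/Rock Playlist: May (2007)\n"
)
header, pairs = parse_tidbit(sample)
for key, sep, value in pairs:
    print(key, "search key" if sep == "==" else "value", value)
```

Note that the album value contains a colon; only the first “==” or “:=” on a line is treated as the separator.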

The separator between the key and the value is either ‘==’ or ‘:=’. This signifies whether the pair is something to search on or a value to overwrite. When inserting a new record, a search is performed for any records which match all of the key/value pairs with the ‘==’ separator. These records are discarded, as they are overwritten by the new record. To keep the correct ordering in cases where an old record is re-inserted into the database, the created date is checked. This allows a record to be updated by destroying an older version.
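The overwrite rule can be sketched in a few lines of Python. This is an illustration rather than libtidbit’s code: `insert_record` and the record layout (a dict with `created`, `keys` for the ‘==’ pairs and `values` for the ‘:=’ pairs) are assumptions, matching only looks at the stored records’ ‘==’ pairs, expiry is ignored, and a new record that is older than any record it matches is assumed to be dropped, which is one reading of the created-date check.

```python
def insert_record(store, new):
    """Insert `new` into `store` (a list of records), applying the overwrite rule.

    A record is a dict {"created": unix_time, "keys": {...}, "values": {...}},
    where "keys" holds the '==' pairs and "values" the ':=' pairs.
    Returns True if the new record was kept.
    """

    def matches(old):
        # An old record is overwritten if it agrees on every '==' pair of `new`.
        return all(old["keys"].get(k) == v for k, v in new["keys"].items())

    matching = [old for old in store if matches(old)]
    # Re-inserting an old record must not destroy a newer version.
    if any(old["created"] > new["created"] for old in matching):
        return False
    store[:] = [old for old in store if not matches(old)]
    store.append(new)
    return True


store = []
first = {"created": 100, "keys": {"series_name": "Ugly Betty"}, "values": {"rating": "0.6"}}
second = {"created": 200, "keys": {"series_name": "Ugly Betty"}, "values": {"rating": "0.8"}}
insert_record(store, first)
insert_record(store, second)  # replaces `first`
insert_record(store, first)   # rejected: older than the stored record
print([r["values"]["rating"] for r in store])  # ['0.8']
```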


A library (libtidbit) handles most of the complexity of creating tidbits, handling keys, communicating with databases and performing queries. Keys are stored in gnome-keyring. There are also Python bindings which make creating plugins simple. Here is a partial mock-up of an example use in Rhythmbox:

In this plugin, forming tidbits and passing them out is very easy. Presenting the data is the hard part.


There are several database backends used in tidbit:

  • Memory database is used to cache recently accessed records.
  • Fork database is not a real database but rather a connection to two databases; it fetches records from the local one where possible to minimise long-distance transactions.
  • D-Bus database is a service which allows several applications to share a single cache, and minimise external accesses.
  • HTTP database is the method used for long distance transactions with the global servers.
  • SQLite database allows cached records to be saved between sessions.

The default database supplied for libtidbit access is a caching fork of a memory database and a D-Bus connection. The D-Bus service wakes up automatically to connect the applications to the global servers.
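One way a caching fork could behave is sketched below in Python: a fetch is answered by the local database where possible, falls through to the remote one only on a miss, and caches whatever comes back. `MemoryDatabase`, `ForkDatabase`, `store` and `fetch` are hypothetical names, not libtidbit’s API; the remote side can be anything with a `fetch` method, standing in for the D-Bus or HTTP database.

```python
class MemoryDatabase:
    """In-memory record cache keyed by GUID."""

    def __init__(self):
        self.records = {}

    def store(self, guid, record):
        self.records[guid] = record

    def fetch(self, guid):
        return self.records.get(guid)


class ForkDatabase:
    """Fetch from the local database first; go to the remote one only on a miss."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def fetch(self, guid):
        record = self.local.fetch(guid)
        if record is None:
            record = self.remote.fetch(guid)  # the long-distance transaction
            if record is not None:
                self.local.store(guid, record)  # cache it for next time
        return record


remote = MemoryDatabase()  # stands in for the D-Bus/HTTP side
remote.store("sig1", "record body")
db = ForkDatabase(MemoryDatabase(), remote)
print(db.fetch("sig1"))        # fetched remotely, then cached locally
print(db.local.fetch("sig1"))  # now available without a remote call
```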

There are just three database commands at the moment:

  • Insert to push new tidbits into the system
  • Query to ask for tidbit GUIDs which match a query
  • Fetch to get the full record from a GUID

The GUID is actually the signature and is unique(ish) to each record.
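The three commands, with the signature doubling as the GUID, map onto a very small interface. This Python sketch assumes records are flat dicts with a `signature` field and that a query is a dict of fields which must all match exactly; `ToyDatabase` is hypothetical, the signature shown is a placeholder, and the overwrite and expiry rules are left out. Returning GUIDs from a query and fetching bodies separately presumably lets a client skip records it already has cached.

```python
class ToyDatabase:
    """Hypothetical in-memory database offering insert, query and fetch."""

    def __init__(self):
        self._records = {}  # GUID (the record's signature) -> record

    def insert(self, record):
        guid = record["signature"]
        self._records[guid] = record
        return guid

    def query(self, conditions):
        """Return the GUIDs of records whose fields match every condition."""
        return [
            guid
            for guid, record in self._records.items()
            if all(record.get(key) == value for key, value in conditions.items())
        ]

    def fetch(self, guid):
        """Return the full record for a GUID, or None if it is unknown."""
        return self._records.get(guid)


db = ToyDatabase()
db.insert({"tidbit_table": "audio/track", "artist": "Arcade Fire", "signature": "c2lnMQ=="})
for guid in db.query({"artist": "Arcade Fire"}):
    print(guid, db.fetch(guid)["tidbit_table"])
```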


Let’s do a two-minute intro to creating and posting a tidbit for a fictional TV application. The following should be the same in both C and Python (although C requires types).

Step 1: Get a key

key = tidbit_key_get ("mytv", "MyTV v1.2");

Here we supply the name of our application twice. The first should never change, so that we pick up the same key each time, and the second is used for the user agent.

Step 2: Get a database

database = tidbit_database_default_new ();

This gets the default database on the system.

Step 3: Create the record

record = tidbit_record_new ("television/episode");

This creates a new record we can put data into. The table name is compulsory so we supply it here.

Step 4: Add the data

tidbit_record_add_element (record, "series_name", "Ugly Betty", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "episode_name", "The Butterfly Effect (Part 1)", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "rating", "0.6", TIDBIT_RECORD_ELEMENT_TYPE_VALUE);

Note the difference between the key and value entries (these correspond to the ‘==’ and ‘:=’ separators described earlier). We may change our rating later, so it is a value; a later record with a new rating will overwrite any records which match on the keys.

Step 5: Sign the record

tidbit_record_sign (record, key);

Once a record is signed, it cannot be altered.

Step 6: Insert it into the database

tidbit_database_insert (database, record);

Step 7: Tidy up

tidbit_record_unref (record);

Now that we are finished with this record, we free it. By now, the record is happily on its way around the world.
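To show roughly what steps 3 to 5 produce, here is a self-contained Python mock of a record object. It is not the libtidbit Python binding, whose module and class names are not shown in this post: `MockRecord`, the separators chosen for the compulsory fields, the one-year default expiry (taken from the general description of tidbits) and the use of a bare base64 SHA-512 digest in place of an RSA signature are all assumptions, made so the example runs without a key. The user key field is left out.

```python
import base64
import hashlib

KEY = "=="    # plays the role of TIDBIT_RECORD_ELEMENT_TYPE_KEY
VALUE = ":="  # plays the role of TIDBIT_RECORD_ELEMENT_TYPE_VALUE
ONE_YEAR = 365 * 24 * 60 * 60  # assumed default lifetime, in seconds


class MockRecord:
    """Hypothetical stand-in for a tidbit record (not the libtidbit binding)."""

    def __init__(self, table, user_agent, created):
        self.header = "TIDBIT/0.1; libtidbit/0.1; " + user_agent
        # Compulsory fields (the user key is omitted); separators are guesses.
        self.elements = [
            ("tidbit_table", KEY, table),
            ("tidbit_created", VALUE, str(created)),
            ("tidbit_expires", VALUE, str(created + ONE_YEAR)),
        ]
        self.signature = None

    def add_element(self, key, value, kind):
        if self.signature is not None:
            raise RuntimeError("a signed record cannot be altered")
        self.elements.append((key, kind, value))

    def body(self):
        lines = [self.header] + [k + sep + v for k, sep, v in self.elements]
        return "\n".join(lines) + "\n"

    def sign(self):
        # Placeholder: base64 of the SHA-512 hash of the body. The real system
        # signs that hash with the user's RSA key instead.
        digest = hashlib.sha512(self.body().encode("utf-8")).digest()
        self.signature = base64.b64encode(digest).decode("ascii")


record = MockRecord("television/episode", "MyTV v1.2", created=1262304000)
record.add_element("series_name", "Ugly Betty", KEY)
record.add_element("episode_name", "The Butterfly Effect (Part 1)", KEY)
record.add_element("rating", "0.6", VALUE)
record.sign()
print(record.body())
```

Any further `add_element` call after `sign` raises, mirroring the rule that a signed record cannot be altered.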


If you are interested in the semantic web or distributed hash tables, have an idea for an awesome application, have found a fundamental error or just want to have a bit of a play, then the source is available.

Tidbit: A global database for exchanging signed tidbits of information

Social everything

Many of us use a range of so-called Web 2.0 services:

  • Social bookmarking, which enables you to recommend sites as well as tag them with relevant words to make searching easier.
  • Microblogging services, allowing you to inform your friends (and others) of your status while attaching tags to the message.
  • Systems which note the music you have listened to recently and share that with the community, recommending other music and events.
  • Event services, where you can declare yourself as going to an event and check if your friends are too.

This is a system which will keep expanding, and undoubtedly within a couple of years your bike will send out a message saying you are stuck in traffic, warning your friends that you will be late while telling others to avoid your route. As you take a photo of a Space Invader mosaic, your phone will ping out the image with its GPS position to an urban art site, tagged with the artist, while informing you that there is another one just round the corner.

Fear of clouds

Great! The future is awesome! Well, not quite. There are several weaknesses to these systems.

  • Each system requires a sign-up. There are solutions like OpenID which make this easier, but generally you cannot use them anonymously very easily.
  • There are multiple providers for each kind of service, so you may have to keep several profiles up to date and post your data to several services.
  • The data is transferred to the service owners so only one company can make use of it. Users are giving this data out for free, and that’s the way they would like to keep it.
  • Services close. If you have built up a massive profile of contributions with millions of followers and the service dies, you are left with nothing. No, you can’t take the data and create your own.
  • Competition is stifled. Imagine that you thought of a system like Facebook but better. Who would sign up for that? There is no chance of cooperation between companies to allow new competitors.
  • It is difficult to queue up data when not connected to the internet. You have to wait till you get home to write a review of that restaurant in Thailand which does great tofu.

So, this “Tidbit” thing?

The principle is pretty simple. You don’t send your data directly to the service providers, but to a distributed open database. Each piece of information is a “tidbit”. Anyone can post, read and search for these tidbits. If you wish to provide a service, you read the tidbits that are of interest to you. No one gets to keep a monopoly on the data, and everyone has the opportunity to use the data to make new, inspired products.

Anatomy of a tidbit

Each tidbit contains:

  • Your username. The username is actually your public signing key. You can generate a new one whenever you like, and it is completely private (unless you reveal your identity to someone).
  • The date the tidbit was created and when it should expire. Most data becomes irrelevant after a year so that is the default unless you set it to be longer.
  • The table the data belongs in. For example “audio/track” would be talking about an audio track you have listened to.
  • A set of key value pairs which hold the data you wish to tell the world. There is no fixed structure so your tidbit can contain fields which will be ignored by some applications.
  • A signature to make sure it was you that generated that tidbit. It is impossible to adjust the data without damaging the signature, so no one can impersonate you.

You can’t trust this

Stop! Reality time! This is bound to be abused by spammers, robots, etc., just like the current services, but worse. I can’t trust anyone.

On top of this system, you can extend a web of trust. You can post a tidbit stating your trust of someone. Say you only fully trust the 10 people you know, but they trust 10 more and so on. You might only trust an individual a little (since they are several friends away), but if you combine a whole group of people you trust a bit, you get a fairly sensible picture. You can also partly trust someone who you have only a little confidence in due to information they posted, and perhaps only for some kinds of information (music taste only). Producers of original content are thus rewarded with respect of their audience, while building a network that gives people confidence in the data.
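The post doesn’t say how partial trust is combined, so the following Python sketch is just one plausible scheme, not tidbit’s design. Trust values lie between 0 and 1, trust along a chain of friends is the product of its links (found with a Dijkstra-style search for the strongest chain), and the opinions of everyone who rated the target are combined as 1 − ∏(1 − t), so several people you trust a little add up to a fairly confident picture. The graph format and the function names are hypothetical.

```python
import heapq


def best_chain_trust(graph, source):
    """Strongest trust chain from `source` to every reachable person.

    `graph[a][b]` is how much a trusts b, in [0, 1]. Chain trust is the
    product of the links; since every link is <= 1, always expanding the
    most trusted person first (as in Dijkstra's algorithm) is correct.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_trust, person = heapq.heappop(heap)
        trust = -neg_trust
        if trust < best.get(person, 0.0):
            continue  # stale heap entry
        for friend, link in graph.get(person, {}).items():
            candidate = trust * link
            if candidate > best.get(friend, 0.0):
                best[friend] = candidate
                heapq.heappush(heap, (-candidate, friend))
    return best


def combined_trust(graph, source, target):
    """Combine every truster's opinion of `target` with a noisy-OR."""
    if target == source:
        return 1.0
    doubt = 1.0
    for person, trust_in_person in best_chain_trust(graph, source).items():
        link = graph.get(person, {}).get(target)
        if link is not None and person != target:
            doubt *= 1.0 - trust_in_person * link
    return 1.0 - doubt


graph = {
    "me": {"ann": 1.0, "bob": 1.0},  # friends I fully trust
    "ann": {"zed": 0.5},             # each trusts zed a little
    "bob": {"zed": 0.5},
}
print(combined_trust(graph, "me", "zed"))  # 0.75: two half-trusts combined
```

Trusting someone only for some kinds of information could be modelled by keeping a separate graph per topic, for example one for music taste. Because chains can share links, this combination can overstate trust in densely connected groups.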

I want my privacy

Privacy is at the core of the system. You may choose to only reveal your username to your friends. Only they will know who you are. All applications work with a different auto-generated username, so unless you manually set your movie watching application to use the same username as your dating profile, you essentially remain as two different people. Obviously, all data you post is open for anyone to read, so posting personal information is a bad idea. This is not a system which sensibly replaces private social networks.

Let’s get technical

The next post will be somewhat more technical and explain the system in glorious geeky detail. There is a git repository you can take a look at and if you have questions there is a room #tidbit on, or leave a comment or email me.

Migrating to Google apps

I have always hosted my own mail server on my home machine, and over the years I have come to rely on it more and more. But one thing always worried me a little. I have a dynamic IP address; changes are checked for every 5 minutes and the DNS entry is automatically updated (thanks to the guys at). So at most there is a five-minute window during which I cannot reach my home computer, and neither can my mail. Mail servers have an automatic cool-off and retry system, so if they cannot contact the target system, they will retry a few minutes, hours and eventually days later. This is great because an email simply doesn’t disappear if a server goes down for a while. You can always tell someone is lying if they say that the email they sent must have got lost (I have never seen this happen). But what can happen, in my case, is that a mail server attempts to connect to some random machine which has been assigned my old IP address. Luckily these rarely run mail servers themselves, so nothing bad usually happens, but nevertheless there is always a chance, and I like to sleep at night knowing it is all fine.

Google Apps

I was a little put off by the thought of someone else running my mail server for two reasons: firstly, I was scared that many of the options I relied on would not be there, and secondly, I am always afraid that I will get lazy, stop being able to manage things like mail servers, get a Mac and consume my own brain.

I already have a Gmail account, but you cannot deliver the mail of an entire domain to a Gmail account. Instead, you can use Google Apps. It is designed for businesses and organisations, but for individual (and small group) use it is free. You have to control the domain’s DNS entries: first you have to prove that you control the domain before the setup is allowed to begin. Setting up a single user and sending all otherwise uncaught mail to that account was fairly straightforward. Once that is set up, you can flip the switch and point your MX entries at the Gmail servers. The site has a guide, including images, showing how to do this with most domain providers, although the images are very blurry (not sure why). Then it is a case of waiting a couple of hours while the DNS caches are refreshed and your mail starts trickling to the new server. IMAP is fairly easy to set up. There is a folder called [Gmail] that holds the normal set of default folders, so in Thunderbird (or any other mail client) you have to set the Drafts and Sent folders to point to those. There is no support for nested folders, which is a shame, but the folders themselves are just representations of tags, so nesting may not make as much sense. The biggest job is setting up the filters.

I have a set of procmail filters which made prioritising very simple. Replicating this in Gmail took a few more filters. Most mailing lists make this easy by adding a list name to the headers, and Gmail recognises these and suggests the right filter. What I didn’t realise at first is that filters can have reasonably powerful logical expressions, but you have to use the rather generic-sounding “has words” field. There is an implicit AND between the different fields, so using this field is the only way of getting an OR between a subject and a from field, e.g. from:( OR subject:(“[foo]” OR “bar”).

One annoyance is that it matches on whole words (with underscore counting as a letter), and I still haven’t found a method of matching “CS_Newsletter_2010” with any number at the end. The second annoyance is that the outgoing SMTP server “corrects” your From address to that of the account you logged in with. This is annoying as I like to send from different addresses, but I guess I can keep my home sendmail setup for that, or create an account (forwarding to the master account) for each outgoing address. (UPDATE) Actually, it couldn’t be easier. You just have to validate that an email address belongs to you by entering a code sent to it. It works with addresses on other domains too.

TexMex evening

Sorry about the delay but, finally, here are some photos from the TexMex evening.

You know it is going to be a silly night when your drinks acquire worms from the very first bottle.

Will proudly placed himself in charge of making the margaritas. These were incredibly strong (and personally I found them quite horrid). Strange that we managed to get through three bottles of tequila, yet we still had plenty of limes. I suspect Will was not sticking to the correct measures.

But still he managed to find a steady stream of willing victims.

And then there was the Tux piñata.

Tux will be remembered for his bravery in the revolutionary cause (and for sharing his sweets).

But the point of the night was the food. Lots of it. This is just one of the many bowls of salsa I spent four hours chopping.

This is only about half the food items. Shame I have no photos of the table when full as it was literally brimming with food. This was the first course of wraps and tacos.

This was followed by chili con carne (two types), expertly carried by Mai Anh (who also deserves thanks for helping me make the guacamole).

Altogether there were 35 people there, which is a personal record. I even invited some of the better students round to try and bully them into doing something amazing over the summer.

Here is a misbehaving pair of banditos.

Sadly, this photo was taken while I was carrying Tux to the bin for his unceremonious funeral.

The brave little lappy managed to play Mariachi music for some 5 hours without dying (note the Dynamplifier).

And the final course was the nachos, which were eaten indoors as it was very dark outside by that hour. Because we ran out of salsa, I (foolishly, considering the drinks Will forced me to have) decided to chop up some more. Thanks to John for taking that job over while I tried to stem the bleeding.

Tux piñata

Following the success of the Indian night, I am hosting a TexMex party. As the party invitation points out, “I have never been to Mexico, but I have been to Texico and I have watched a lot of Speedy Gonzales, so I imagine it is a bit like that”. Apparently one thing people have at Mexican parties is a piñata. I have never seen a piñata in real life, so how to make one is complete guesswork.

The body is made of papier-mâché. I was hoping to find a balloon of exactly the right shape, but instead I had to use a large balloon for the body and a second balloon for the head. I covered the body balloon on all sides but the base, then turned it upside down, placed the smaller balloon on top and added more and more paper strips to stick the two balloons together. You really need three hands for this task.

After the first layer, I let it dry in front of a fan for a couple of hours before adding the second layer, including a beak made of card. There were three layers altogether. I used a flour-and-water glue mixture; the second batch worked a lot better as it was a bit thicker. This is the end of day one, as it then takes about 24 hours to dry completely.

Then it is on to the crêpe paper. I found the easiest way was to take a full folded roll of the paper, cut it lengthwise in two and add cuts to make the loops. Then draw a line of liquid paper glue and stick the strip to it.

Work from the bottom up, otherwise each strip gets in the way of the last. I also kept some areas fur-free: the bottom and the face. Here I glued a single layer of black crêpe paper. The beak needed about three layers so that the text underneath did not show through.

At this point, I did the surgery to fill the penguin and attach the rope. I was worried that the rope would just rip the head apart, so I tied it to a pencil and fed it through a hole in a CD. That distributed the force around a ring in the head. At one point the back caved in a little, but with the weight of the sweets inside, it was possible to push it back out.

Then finally, attach the wings and feet. I punched two holes in the body and the wing/foot and used coloured cable ties. These work very well as you can trim them off.

Add a decorated hitting stick, and there you go: a Tux piñata. I shall see tomorrow if it works.

UPDATE: Pictures and video of the party including Tux.

Gstreamer to the rescue

A week ago, I sent my laptop, the Utopium test board and samples with Will and Andrew to the Async symposium, to present the chips at the demo session. I didn’t go, but I was planning to control the laptop remotely from the office and just needed Andrew to connect everything up for the demo. I configured the laptop to auto-login to a special user who had all the necessary programs on the desktop. Most importantly, that user had desktop sharing turned on. In case you didn’t know, this uses VNC screen 0 on port 5900. When you connect to WiFi hotspots, you don’t tend to get publicly visible IP addresses, so I also added a “phone home” button which would ssh to my work desktop machine and forward a port back to the laptop’s desktop-sharing VNC port. If you have set up your ssh keys correctly, it won’t ask for a password. For voice and video I set up an Ekiga account so I could call the laptop.

What I didn’t expect was for just about every outgoing port under the sun to be filtered. This was rather scary because no one could contact me, as all instant messenger ports were blocked. Eventually, Andrew managed to get word through to tell me. The most useful port to get through is ssh (port 22): if you can ssh out, you can do pretty much anything. Port forwarding has become much easier nowadays with the GUI tools. If you compare setting up some forwarding in the GUI

to the equivalent in the iptables rules

-A PREROUTING -i eth+ -p tcp --dport 5920 -j DNAT --to-destination :5900
-A PREROUTING -i eth+ -p tcp --dport 995 -j DNAT --to-destination :22

you can see how easy we have it now. The GUI is so nice and customisable that I translated my custom iptables rule files for the group machines into it. So at this point, Andrew could ssh out by adding “-o Port=995” to the ssh command line.

But this was only half the solution: I could only get a couple of ports through and had limited time (some 20 minutes before the demo to fix everything), so there was little chance of getting Ekiga working. Instead I quickly created a poor person’s internet telephony solution using Gstreamer. First you need to establish an ssh connection forwarding one port in each direction. I used 5000 and 5001, and it makes things simpler if you do the port swap in the ssh pipe like so:

ssh -L5000:localhost:5001 -R5000:localhost:5001 remote_machine.domain

This transfers connections to local port 5000 to the remote port 5001 (and vice-versa). So the servers should sit on port 5001 and clients should connect to port 5000. To create a server run:

gst-launch -v alsasrc ! audio/x-raw-int,rate=16000,channels=1 ! audioconvert ! speexenc ! tcpserversink host= port=5001

One server is needed at each end (assuming you wish to talk both ways). Only once the server has been created can a client connect to it using:

gst-launch -v  tcpclientsrc host= port=5000 ! speexdec ! alsasink sync=false

And there you have two-way audio communication. It is possible to disconnect from and reconnect to the server. I also noticed that when there was a complete network cut-out (which happened several times) the audio would cut out, but then resynchronise and return.
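The port-swap trick extends to any number of streams: each one needs a -L and an -R option with the same pair of ports. This small Python helper only builds the argument list (it does not run ssh); `swap_forward_args` is a hypothetical name, and the optional `port` argument simply adds the “-o Port=995” option used above.

```python
def swap_forward_args(host, pairs, port=None):
    """Build an ssh command line that swap-forwards each (client, server) port pair.

    For each pair, -L sends connections on the local `client` port to the
    remote `server` port and -R does the reverse, so on both machines the
    servers listen on `server` and the clients connect to `client`.
    """
    args = ["ssh"]
    if port is not None:
        args += ["-o", "Port=%d" % port]  # e.g. 995 when only that port gets out
    for client, server in pairs:
        args.append("-L%d:localhost:%d" % (client, server))
        args.append("-R%d:localhost:%d" % (client, server))
    args.append(host)
    return args


print(" ".join(swap_forward_args("remote_machine.domain", [(5000, 5001), (5002, 5003)], port=995)))
```

Passing the list to `subprocess.run` would open the tunnel; a second pair such as 5002/5003 could carry another stream, for example video.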

You can do the same with video by swap forwarding ports 5002 and 5003 and running the server:

gst-launch -v v4l2src ! video/x-raw-yuv,framerate=\(fraction\)5/1 ! smokeenc threshold=1000 ! tcpserversink host= port=5003

and the client:

gst-launch -v tcpclientsrc host= port=5002 ! smokedec ! xvimagesink sync=false

This uses “smoke”, which is like motion JPEG but detects where there were no changes between frames.
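The idea of only sending what changed can be illustrated with a toy block-difference check in Python. This is not smokeenc’s actual algorithm, and the meaning given to `threshold` here (summed absolute pixel difference per block) is an assumption: the frames are split into square blocks, and only blocks whose difference exceeds the threshold are reported as needing to be sent.

```python
def changed_blocks(previous, current, block=8, threshold=1000):
    """Return (row, col) indices of blocks that differ by more than `threshold`.

    Frames are equally sized lists of rows of 0-255 greyscale values.
    """
    height, width = len(current), len(current[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            diff = 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    diff += abs(current[y][x] - previous[y][x])
            if diff > threshold:
                changed.append((top // block, left // block))
    return changed


still = [[0] * 16 for _ in range(16)]
moved = [row[:] for row in still]
for y in range(12, 16):
    for x in range(12, 16):
        moved[y][x] = 255  # a bright 4x4 patch appears in the bottom-right block
print(changed_blocks(still, moved))  # [(1, 1)]
```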

So with that in place, the demo went quite smoothly, with some people happily asking the laptop questions and others being a bit surprised at voices coming from nowhere.

So, thank you Gstreamer! You saved my demo.

Utopiums are back

After months of being manufactured, the Utopiums are back! I will explain more about what they are in another post, but for now here are some photos.

Here are the packaged chips (20 of them).

They also send you the remaining unpackaged dies. These are excellent at confusing the camera’s autofocus.

The full die is 5mm by 3mm.

And this is what they look like under a microscope. They do get dirty very quickly when exposed to a dusty room.

On the bottom right of the chip logo are the thank-yous. The Tux and the Fedora logo are about 0.5 mm tall (perhaps the smallest ever?). You can see the diffraction grating giving a nice secondary colour.

At different angles, they look very different.

And here is a wise comment left by one of the Async symposium reviewers.

I am still testing the beast, but it does work. It has executed a number of programs, and the wagging slices do become bypassable. The biggest worry was the reset, as that is quite complicated, but it seems fine. I will open-source the design and the tool set some time next month.

Falling blocks game in Plymouth

So, you have sat down at your computer and you’re waiting for it to boot, when suddenly you realise that it is doing a full fsck which is going to take a few minutes. What to do? You have two options:

  1. Sit quietly watching the little bar move slowly across
  2. Plymouth falling blocks game!

This is not a serious proposal, I just wanted to exercise the scripting system to see if I could find any bugs, but if you want to have a play with it, the script is available.

Fedora on USB sticks

I ordered some USB sticks to give away to the better students to encourage them to contribute to open source software. The idea is that they can run their own installation where they can install development libraries etc. I’ll write more about this in a few weeks when I know how successful this has been.

Installing Fedora on the disks is relatively easy. Nowadays I install computers from a USB drive, simply by dd-ing the ISO directly to the device.

dd if=Fedora-12-i686-Live.iso of=/dev/sdb

The target USB stick will look just like any other hard drive. You just have to make sure you install the bootloader onto the target stick by overriding the BIOS boot order in the grub installation screen.

Once installed, I didn’t want to actually boot the device, as I wanted the students to go through the first-boot process of setting up their own user. But I did want to install some development packages and do a full system update. This can be done by mounting the device, chrooting and running yum commands. The live image has a /mnt/sysimage directory which is already set up for something like this, with /proc and /dev correctly set up.

mount /dev/sdb1 /mnt/sysimage
chroot /mnt/sysimage

The biggest issue with running from USB sticks is that they have no on-device cache, so each fsync call takes absolutely ages. Yum, correctly, makes heavy use of fsync to make sure it leaves the system in a sensible state even if interrupted. To speed things up I tried libeatmydata, which worked surprisingly well: the update ran several times faster. LibEatMyData is named thusly because of its real ability to screw things up royally, but in this situation, if anything went wrong, I could just restart. Perhaps a yum developer could say whether this is outright dangerous, or a fairly safe trick if you can guarantee no interruptions.

Of course, at this point I only have one stick installed, and making six this way is downright boring. So long as the other disks are the same size (or larger), you can clone the disks from one to another. You need a bit of storage space, so it is best to do this from another machine.

dd if=/dev/sdb of=master_image
dd if=master_image of=/dev/sdc
dd if=master_image of=/dev/sdd
dd if=master_image of=/dev/sde

Watch out though if you use this method, all the partitions will gain the same UUID, which will confuse the system when more than one is plugged into a single machine.
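The repeated dd commands can be folded into one pass over the master image. The Python sketch below (`clone_image` and `device_size` are hypothetical names) first checks that every target is at least as large as the image, then copies it chunk by chunk to all targets and fsyncs them. It makes a byte-for-byte copy, so the duplicate-UUID caveat above still applies, and it will happily overwrite whatever device names you give it.

```python
import os
from contextlib import ExitStack


def device_size(path):
    """Size in bytes of a regular file or block device (found by seeking to the end)."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def clone_image(image, targets, chunk_size=4 * 1024 * 1024):
    """Write `image` to the start of every target in a single pass."""
    size = device_size(image)
    for target in targets:
        if device_size(target) < size:
            raise ValueError("%s is smaller than the image" % target)
    with ExitStack() as stack:
        src = stack.enter_context(open(image, "rb"))
        # "r+b" writes in place without truncating the target.
        outputs = [stack.enter_context(open(t, "r+b")) for t in targets]
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            for out in outputs:
                out.write(chunk)
        for out in outputs:
            out.flush()
            os.fsync(out.fileno())  # make sure the data has reached the stick


# Example (needs root, and overwrites the sticks!):
# clone_image("master_image", ["/dev/sdc", "/dev/sdd", "/dev/sde"])
```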

The postage costs are annoying so I went with, who offer free postage (which is nice). What was ridiculous is that they post each item in a separate box which is way too big. For a tiny piece of plastic, there is a Kingston presentation box, each placed in its own massive cardboard box and posted separately. I hear this is because they have some kind of tax loophole where parcels of value below some threshold are not taxed.

The entire CS department has been refitted with awful Dell machines which have some screwy USB chipsets that, for some manufacturers’ memory sticks, only allow booting from the back ports. I did something really stupid by accidentally mentioning to the duty office that it was possible to boot the departmental machines off a USB device. Now they are going to go through and disable this feature (Grrr).