Valgrind T-shirt

This morning I received an unexpected delivery. Someone bought me a Valgrind T-shirt. The bill doesn’t say who. I do love Valgrind but only a handful of people know that my obsession extends to the promise that I will name my first-born after it (Valgrind Brej may get bullied somewhat). To whoever it was that bought it, I would like to say Thank You! I shall wear it with pride.

UPDATE: Mystery solved. Matt Horsnell was offered it for spotting and fixing a bug and he knew about my secret love.

Coding discipline

Over years of programming I have learned how to be most productive, but I find it strange that we do not teach this to students. We teach obscure features of academic systems they will never see again, but we fail to teach them how to tackle large projects. Course lab tasks usually consist of writing around 30 lines of code, which doesn’t expose students to the challenges of large code bases. Here are some things I tried to teach this year in COMP20252.

The one hour cycle

I code in cycles, during which I try to disconnect from environmental distractions as much as possible (headphones on). These cycles are roughly one hour long, including a five-minute break at the end to commit the code to the repository, have a biscuit and a stretch. There are three reasons for this, and I think it makes sense for students to learn it as a method of staying productive.

Set-up and burnout

When starting a new change on the code base, I spend about five to ten minutes trying to locate all the areas that will need to be changed and generally planning how to tackle the problem. After about 15 minutes I am in full swing, with everything I need to know cached in my head. At this point I can code for hours straight, but I generally aim to finish after a further half an hour and move on to testing and debugging. The reason for stopping so early is that there is a good chance you will burn out (become tired and clumsy).

The first ten minutes are not productive, yet most students will only work for ten minutes at a time before updating their status to “Bored now” and having a chat with someone, then starting again from the beginning.

Testing and debugging

In half an hour of coding you will probably write about 50 lines scattered between several files. The rule of thumb is one bug every ten lines of code, so you now have five bugs to find in 50 lines. Testing while you still remember what you may have broken is very easy compared with testing something written by someone else, or by you months ago. Knowing the bugs are somewhere in the small blocks of code you have just written gives you a massive head start in locating them. This is the main reason why you should never abruptly walk away from the computer without doing some testing. Testing and debugging will hopefully take five or ten minutes, but can often take an hour. If it does take an hour, you will be grateful that you didn’t carry on coding until you were exhausted.

There is an even greater sin than walking away from a computer leaving untested code, and that is leaving the code in a broken state. Each programming task involves breaking an already working piece of code in order to add functionality, change the behaviour, etc. During the set-up you will probably want to examine the current behaviour of the system to ascertain the areas that need to be changed. This is very difficult if the system doesn’t work.

Students rarely do any testing as they never see their code coming back to them. After writing something and finding that it broke the system, they would generally walk away hoping the bug would fix itself in their absence. The next week they would ask a demonstrator to fix it for them while saying “I can’t remember what I did”. Their original buggy code coming back week after week scared many students in COMP20252. In the feedback forms, that was one of their main criticisms. I say “GOOD! Be afraid. Very afraid”.

Divide and conquer

Forcing yourself to have a working system every hour partitions large tasks into sensibly sized components. Many of the students, from the start, wanted to create a large system involving several ambitious components. They would begin by writing a massive monolithic block containing all the features, expecting that once they wrote the last line of code, the program would work. This is bad on two counts. Firstly, it won’t work, due to the bug problems outlined above, and secondly, there is no way you are going to keep that much state in your head. Even if you can keep track of the state of every variable you’ve used, all the possible input combinations and every possible error that could arise, you are making the job unnecessarily hard for yourself and for anyone else reading your code.

There is one more bonus reason why you want to return to working code sooner rather than later, and that is the frequency of random unblockable interrupts. I receive a “stop what you’re doing and help me with this” request several times a day, and with small changes it is still possible to revert the changes (infinite undo in nedit) and play them back to remind myself what I was doing. If I do have to revert and start again, then I have wasted at most half an hour, but at least I know what I am doing the second time round.

Tidbit: More technical

In the previous post I covered the ideas behind tidbit. In this post I will try and cover the technical aspects of the tidbit system. Currently the work is very exploratory, so everything may change.

Tidbit record structure

This is a typical tidbit which was generated using the Rhythmbox plugin:

TIDBIT/0.1; libtidbit/0.1; Rhythmbox Tidbit Plugin v0.1
tidbit_userkey==usePzEg4Cl4g1ASdzpssVHtQ1hJJilS+ryiBWjF...
tidbit_table==audio/track
tidbit_created:=1281479640
tidbit_expires:=1313037240
artist==Arcade Fire
title==Keep The Car Running
album:=Indie/Rock Playlist: May (2007)
genre:=Indie
year:=2007
play_count:=34
rating:=0.8
tidbit_signed:=JyJ1fIwhRL5t3y9CACmshm/UibYVhvInxh7XVx4...

The first line is the header. It states the version of the tidbit format followed by the user agent. The rest of the record is composed of key-value pairs. The key has a strict format of lower-case letters and underscores. The value can contain any character above 0x1F and is terminated by a new line; other characters must be escaped.

The first four pairs are compulsory and all start with “tidbit_” to distinguish them from normal data. The userkey is a unique(ish) 1024-bit RSA key with which the user identifies themselves, and it also serves as the public portion of their signing key. It is base64 encoded and clipped in the text above, but in reality it is over 100 characters long. The table is a compulsory field which designates the subject matter. The created and expires values state when the record was created (which must be in the past) and when it will expire; expired records are no longer valid. These currently use Unix time, but a more general format will be used in the future. This is followed by a number of values specific to the record type. Finally, the record is completed by a signature (also base64 encoded) which covers the body up to that point: the user key signs an SHA512 hash of the record up to that line. There is a hard limit of 2KB per record to prevent abuse.

The separator between the key and the value is either ‘==’ or ‘:=’. It signifies whether the pair is used to search for existing records or simply carries a value to be overwritten. When a new record is inserted, a search is performed for any records which match all of the key-value pairs with the ‘==’ separator; those records are discarded, overwritten by the new record. To ensure the correct sequence in cases where an old record is re-inserted into the database, the created date is checked. This allows a record to be updated by destroying an older version.
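
As an illustration (a hypothetical follow-up to the record above, trimmed to the interesting pairs), suppose the plugin later posts:

artist==Arcade Fire
title==Keep The Car Running
play_count:=35
rating:=0.8

The search on the ‘==’ pairs matches the earlier record on artist and title, the new record wins on its later created date, and the old play count and rating are discarded along with it.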

Library

A library (libtidbit) handles most of the complexity of creating tidbits, key handling, communicating with databases and performing queries. Keys are stored in gnome-keyring. There are also Python bindings which make creating plugins simple. Here is a partial mock-up of an example use in Rhythmbox:

In this plugin, forming tidbits and passing them out is very easy. Presenting the data is the hard part.

Databases

There are several database backends used in tidbit:

  • Memory database is used to cache recently accessed records.
  • Fork database is not a real database but rather a connection to two others; it fetches records from the local database where possible to minimise long-distance transactions.
  • D-Bus database is a service which allows several applications to share a single cache, and minimise external accesses.
  • HTTP database is the method used for long-distance transactions with the global servers.
  • Sqlite database allows cached records to be saved between sessions.

The default database supplied for libtidbit access is a caching fork of a memory database and a D-Bus connection. The D-Bus service wakes up automatically to connect the applications to the global servers.

There are just three database commands at the moment:

  • Insert to push new tidbits into the system
  • Query to ask for tidbit GUIDs which match a query
  • Fetch to get the full record from a GUID

The GUID is actually the signature, and is unique(ish) to each record.
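
To give a feel for the read side, here is a hedged C sketch. I must stress that only tidbit_database_insert appears in this post; the query and fetch function names, the type names and the use of GLib lists are all my guesses for illustration, not the real API:

/* Hypothetical sketch: the real query/fetch API may well differ */
TidbitQuery *query = tidbit_query_new ("television/episode");     /* guessed name */
tidbit_query_add_match (query, "series_name", "Ugly Betty");      /* guessed name */

GList *guids = tidbit_database_query (database, query);           /* GUIDs of matching records */
for (GList *l = guids; l != NULL; l = l->next) {
    /* Fetch the full record behind each GUID (the signature) */
    TidbitRecord *record = tidbit_database_fetch (database, l->data);
    /* ... read the elements we care about, e.g. the rating ... */
    tidbit_record_unref (record);
}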

Example

Let’s do a two-minute intro on how to create and post a tidbit for a fictional TV application. The following should be the same in both C and Python (although C requires types).

Step 1: Get a key

key = tidbit_key_get ("mytv", "MyTV v1.2");

Here we supply the name of our application twice. The first should never change, so that we pick up the same key each time; the second is used for the user agent.

Step 2: Get a database

database = tidbit_database_default_new ();

This gets the default database on the system.

Step 3: Create the record

record = tidbit_record_new ("television/episode");

This creates a new record we can put data into. The table name is compulsory so we supply it here.

Step 4: Add the data

tidbit_record_add_element (record, "series_name", "Ugly Betty", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "episode_name", "The Butterfly Effect (Part 1)", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
tidbit_record_add_element (record, "rating", "0.6", TIDBIT_RECORD_ELEMENT_TYPE_VALUE);

Note the difference between the key and value entries (the ‘==’ and ‘:=’ separators from earlier). We may change our rating later, so that is a value, and inserting this record will overwrite any records which match on the keys.

Step 5: Sign the record

tidbit_record_sign (record, key);

Once a record is signed, it cannot be altered.

Step 6: Insert it into the database

tidbit_database_insert (database, record);

Step 7: Tidy up

tidbit_record_unref (record);

Now that we are finished with this record, we free it. By now, the record is happily on its way around the world.
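
Putting the seven steps together, a whole program in C might look like the following sketch. Only the function calls above come from the real API; the header name, the type names and the main() scaffolding are my assumptions:

#include <tidbit.h>     /* assumed header name */

int main (void)
{
    /* Step 1: pick up (or create, on first run) our application's key */
    TidbitKey *key = tidbit_key_get ("mytv", "MyTV v1.2");

    /* Step 2: the default database stack (memory cache plus D-Bus) */
    TidbitDatabase *database = tidbit_database_default_new ();

    /* Step 3: a fresh record in its compulsory table */
    TidbitRecord *record = tidbit_record_new ("television/episode");

    /* Step 4: '==' keys identify the episode; the ':=' rating may change later */
    tidbit_record_add_element (record, "series_name", "Ugly Betty", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
    tidbit_record_add_element (record, "episode_name", "The Butterfly Effect (Part 1)", TIDBIT_RECORD_ELEMENT_TYPE_KEY);
    tidbit_record_add_element (record, "rating", "0.6", TIDBIT_RECORD_ELEMENT_TYPE_VALUE);

    /* Steps 5 and 6: sign the record (freezing it) and send it off */
    tidbit_record_sign (record, key);
    tidbit_database_insert (database, record);

    /* Step 7: drop our reference */
    tidbit_record_unref (record);

    return 0;
}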

Development

If you have an interest in the semantic web or distributed hash tables, you have an idea for an awesome application, you have found a fundamental error, or you just want to have a bit of a play, then the source is available.

Tidbit: A global database for exchanging signed tidbits of information

Social everything

Many of us use a range of so-called Web 2.0 services.

  • Social bookmarking which enables you to recommend sites as well as tag sites with relevant words to make searching easier.
  • Microblogging services allowing you to inform your friends (and others) of your status, while attaching tags to the message.
  • Systems which note the music you have listened to recently and share that with the community, recommending other music and events.
  • Event services let you declare yourself as going to an event and check if your friends are going too.

This is a space which will keep expanding, and undoubtedly within a couple of years your bike will send out a message to say you are stuck in traffic, warning your friends that you will be late while telling others to avoid your route. As you take a photo of the space invader mosaic, your phone will ping out the image with its GPS position to an urban art site, tagged with the artist, while informing you that there is another one just round the corner.

Fear of clouds

Great! The future is awesome! Well, not quite. There are several weaknesses to these systems.

  • Each system requires a sign-up. There are solutions like OpenID which make this easier, but generally you cannot use them anonymously very easily.
  • There are multiple providers for each kind of service, so you may have to keep several profiles up to date and post your data to several services.
  • The data is transferred to the service owners, so only one company can make use of it. Users give this data away for free, and the owners would like to keep it that way.
  • Services close. If you have built up a massive profile of contributions with millions of followers and the service dies, you are left with nothing. No, you can’t take the data and create your own.
  • Competition is stifled. Imagine that you thought of a system like Facebook but better. Who would sign up for that? There is no chance of cooperation between companies to allow new competitors.
  • It is difficult to queue up data when not connected to the internet. You have to wait till you get home to write a review of that restaurant in Thailand which does great tofu.

So, this “Tidbit” thing?

The principle is pretty simple. You don’t send your data directly to the service providers, but to a distributed open database. Each piece of information is a “tidbit”. Anyone can post, read and search for these tidbits. If you wish to provide a service, you read the tidbits that are of interest to you. No one gets to keep a monopoly on the data, and everyone has the opportunity to use the data to make new inspired products.

Anatomy of a tidbit

Each tidbit contains:

  • Your username. The username is actually your public signing key. You can generate a new one whenever you like, and it is completely private (unless you reveal your identity to someone).
  • The date the tidbit was created and when it should expire. Most data becomes irrelevant after a year so that is the default unless you set it to be longer.
  • The table the data belongs in. For example “audio/track” would be talking about an audio track you have listened to.
  • A set of key value pairs which hold the data you wish to tell the world. There is no fixed structure so your tidbit can contain fields which will be ignored by some applications.
  • A signature to make sure it was you that generated the tidbit. It is impossible to adjust the data without damaging the signature, so no one can spoof you.

You can’t trust this

Stop! Reality time! This is bound to be abused by spammers, robots, etc., just like the current services, but worse: here you can’t trust anyone.

On top of this system, you can extend a web of trust. You can post a tidbit stating your trust of someone. Say you fully trust only the ten people you know, but they trust ten more, and so on. You might trust each individual only a little (since they are several friends away), but if you combine a whole group of people you trust a bit, you get a fairly sensible picture. You can also extend partial trust to someone based on the information they have posted, perhaps only for some kinds of information (music taste, say). Producers of original content are thus rewarded with the respect of their audience, while building a network that gives people confidence in the data.

I want my privacy

Privacy is at the core of the system. You may choose to only reveal your username to your friends. Only they will know who you are. All applications work with a different auto-generated username, so unless you manually set your movie watching application to use the same username as your dating profile, you essentially remain as two different people. Obviously, all data you post is open for anyone to read, so posting personal information is a bad idea. This is not a system which sensibly replaces private social networks.

Let’s get technical

The next post will be somewhat more technical and explain the system in glorious geeky detail. There is a git repository you can take a look at and if you have questions there is a room #tidbit on irc.freenode.net, or leave a comment or email me.

Slanted monitors

At my desk I have two 20″ 1600×1200 (4:3) monitors which I have got rather used to. Unfortunately the hard drive in my machine failed and I have been waiting over two weeks for a replacement (not sure if it is the university or MicroDirect being useless). Normally I would buy one and claim it back, and the problem would be solved within an hour, but the new university austerity measures forbid this. Instead, I have moved onto Christian’s desk and am experiencing his two wide-screen monitor setup. It feels a bit weird having pages which are very wide but not very tall, and having one of the monitors vertical is equally creepy. So I came up with a compromise: have the monitors slanted by ten degrees. This makes them both taller and wider, since a rectangle rotated by ten degrees has a bounding box of w·cos 10° + h·sin 10° by w·sin 10° + h·cos 10° (for a 1680×1050 panel, say, roughly 1837 by 1326).

If you wish to try this yourself, just browse through this slanted frame page and turn your monitors ten degrees to the right. Press F11 for full screen mode to make it look believable. This should work in Firefox and Webkit based browsers.

ACSD in Braga

We are working very hard at the ACSD conference in Braga.

But because Doug and Will are involved, beer is never far away.

This is Will’s attempt at reproducing the Isle of Man symbol.

Several acts of silliness, including Doug attempting to destroy the pool.

I will try and upload the video as soon as I can over this awful connection.

http://brej.org/blog/wp-content/uploads/2010/06/braga_pool

Yes, accidents will happen.

Migrating to Google apps

I have always hosted my own @brej.org mail server on my home machine, and I have become more and more reliant on it over the years. But one thing always worried me a little. I have a dynamic IP address, which is checked every five minutes, and the DNS entry is automatically updated when it changes (thanks to the guys at afraid.org). So at most there is a five-minute window during which I cannot reach my home computer, and neither can my mail. Mail servers have an automatic cool-off and retry system, so if they cannot contact the target system, they will retry a few minutes, hours and eventually days later. This is great, because an email simply doesn’t disappear if a server goes down for a while. You can always tell someone is lying when they say the email they sent must have got lost (I have never seen this happen). What can happen in my case, though, is that a mail server attempts to connect to some random machine assigned my old IP address. Luckily these rarely run mail servers themselves, so nothing bad usually happens, but nevertheless there is always a chance, and I like to sleep at night knowing it is all fine.

Google Apps

I was a little put off by the thought of someone else running my mail server, firstly because I was scared that many of the options I relied on were not going to be there, and secondly because I am always afraid that I will get lazy, stop being able to manage things like mail servers, get a mac and consume my own brain.

I already have a gmail account, but you cannot deliver the mail of an entire domain to a gmail account. Instead, you can use Google Apps. It is designed for businesses and organisations, but for individual (and small group) use it is free. You do need control of the domain’s DNS entries: before the setup is allowed to begin, you have to prove that you control the domain. Setting up a single user and directing all uncaught mail to that account was fairly straightforward. Once that is set up, you can flip the switch and point your MX entries at the gmail servers. The site has a guide, including images, of how to do this with most domain providers, although the images are very blurry (not sure why). Then it is a case of waiting a couple of hours while the DNS caches are refilled and your mail starts trickling to the new server.

IMAP is fairly easy to set up. There is a folder called [Gmail] which holds the normal set of default folders, so in Thunderbird (or any other mail client) you have to point the drafts and sent folders at those. There is no support for nested folders, which is a shame, but the folders themselves are just representations of tags, so nesting may not make as much sense. The biggest job is setting up the filters.

I have a set of procmail filters which made prioritising very simple. Replicating this in Gmail took a few more filters. Most mailing lists make this easy by adding a list name to the headers, and gmail recognises these and suggests the right filter. What I didn’t realise at first is that filters can have reasonably powerful logical expressions, but you have to use the rather generic-sounding “has words” field. There is an implicit AND between the different fields, so using this one field is the only way of getting an OR between a subject and a from field, e.g. from:(a@foo.org) OR subject:(“[foo]” OR “bar”).

One annoyance is that matching is on whole words (with underscore being a valid letter) and I still haven’t found a method of matching “CS_Newsletter_2010” but with any number at the end. The second annoyance is that the outgoing SMTP “corrects” your from address to that of the account you logged in with. This is annoying as I like to send from different addresses, but I guess I can still keep my home sendmail setup for that, or create an account (with forwarding to the master account) for each outgoing address. UPDATE: Actually it couldn’t be easier. You validate that an email address really does belong to you by entering a code sent to it. It works with addresses of other domains too.

TexMex evening

Sorry about the delay but, finally, here are some photos from the TexMex evening.

You know it is going to be a silly night when your drinks acquire worms from the very first bottle.

Will proudly placed himself in charge of making the margaritas. These were incredibly strong (and, personally, I found them quite horrid). Strange that we managed to get through three bottles of tequila, yet we still had plenty of limes. I suspect Will was not sticking to the correct measures.

But still he managed to find a steady stream of willing victims.

http://brej.org/blog/wp-content/uploads/2010/06/tux_pinata

And then there was the Tux piñata.

Tux will be remembered for his bravery in the revolutionary cause (and for sharing his sweets).

But the point of the night was the food. Lots of it. This is just one of the many bowls of salsa I spent four hours chopping.

This is only about half the food items. Shame I have no photos of the table when full, as it was literally brimming with food. This was the first course of wraps and tacos.

This was followed by chili con carne (two types), expertly carried by Mai Anh (who also deserves thanks for helping me make the guacamole).

Altogether there were 35 people there, which is a personal record. I even invited some of the better students round to try and bully them into doing something amazing over the summer.

Here is a misbehaving pair of banditos.

Sadly this photo was taken while I was carrying Tux to the bin for his unceremonious funeral.

The brave little lappy managed to play mariachi music for some five hours without dying (note the Dynamplifier).

And the final course was the nachos, which we had indoors as it was very dark outside by that hour. Because we ran out of salsa, I (foolishly, considering the drinks Will forced me to have) decided to chop up some more. Thanks to John for taking that job over while I tried to stem the bleeding.

Tux piñata

Following the success of the Indian night, I am hosting a TexMex party. As the party invitation points out, “I have never been to Mexico, but I have been to Texico and I have watched a lot of Speedy Gonzales, so I imagine it is a bit like that”. Apparently one thing people have at Mexican parties is a piñata. I have never seen a piñata in real life, so how to make one is complete guesswork.

The body is made of papier-mâché. I was hoping to find a balloon of exactly the right shape, but instead I had to go for a large balloon for the body and a second balloon for the head. I covered the body balloon from all sides but the base, then turned it upside down, placed the smaller balloon on top, and added more and more paper strips to stick the two balloons together. You really need three hands for this task.

After the first layer, I let it dry in front of a fan for a couple of hours before adding the second layer, including a beak made of card. There were three layers altogether. I used the flour-and-water glue mixture, of which the second batch worked a lot better as it was a bit thicker. This was the end of day one, as it then takes about 24 hours to dry completely.

Then it is onto the crepe paper. I found the easiest way was to take a full folded roll of the paper, cut it lengthwise in two and add cuts to make the loops. Then draw a line of liquid paper glue and stick the strip to it.

Work from the bottom up, otherwise each strip gets in the way of the last. I also kept some areas fur-free: the bottom and the face. Here I glued a single layer of black crepe paper. The beak needed about three layers to hide the newspaper text underneath.

At this point I did the surgery to fill the penguin and attach the rope. I was worried that the rope would just rip the head apart, so I tied it to a pencil and fed it through a hole in a CD. That distributed the force around a ring in the head. At one point the back caved in a little, but with the weight of the sweets inside, it was possible to push it back out.

Then finally, attach the wings and feet. I punched two holes in the body and in each wing/foot and used coloured cable ties. These work very well as you can trim them off.

Add a decorated hitting stick, and there you go: a Tux piñata. I shall see tomorrow if it works.

UPDATE: Pictures and video of the party including Tux.

Gstreamer to the rescue

A week ago I sent my laptop, the Utopium test board and samples with Will and Andrew to the Async symposium, to present the chips at the demo session. I didn’t go, but I was planning to control the laptop remotely from the office and just needed Andrew to connect everything up at the demo. I configured the laptop to auto-login to a special user who had all the necessary programs on the desktop. Most importantly, that user had desktop sharing turned on. In case you didn’t know, this uses VNC screen 0 on port 5900. When you connect to WiFi spots, you don’t tend to get a web-visible IP address, so I also added a “phone home” button which would ssh to my work desktop machine and forward a port there to the desktop VNC port. If you have set your ssh keys up correctly, it won’t ask for a password. For voice and video I set up an Ekiga account so I could call the laptop.
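
The phone home button amounts to little more than a single remote forward, along these lines (the machine name and the choice of port 5920 here are placeholders):

ssh -N -R 5920:localhost:5900 work_machine.domain

With the keys in place this runs silently, and a VNC viewer pointed at port 5920 on the work machine then lands on the laptop’s shared desktop.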

What I didn’t expect was for just about every outgoing port under the sun to be filtered. This was rather scary, because no one could contact me as all the instant messenger ports were blocked. Eventually, Andrew managed to get a word through to tell me. The most useful port to get through is ssh (port 22): if you can ssh out, you can do pretty much anything. Port forwarding has become much easier nowadays with the GUI tools. If you compare setting up some forwarding in the GUI

to the equivalent in the iptables rules

*nat
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -i eth+ -p tcp --dport 5920 -j DNAT --to-destination :5900
-A PREROUTING -i eth+ -p tcp --dport 995 -j DNAT --to-destination :22
COMMIT

you can see how easy we have it now. It is so nice and customisable that I translated my custom iptables rules files for the group machines into the GUI ones. So at this point, Andrew could ssh out by adding “-o Port=995” to the ssh line.

But this was only half the solution. I could only get a couple of ports through and, with limited time (I had some 20 minutes before the demo to fix everything), there was little chance of getting Ekiga working. Instead I quickly created a poor person’s internet telephony solution using Gstreamer. First you need to establish an ssh connection forwarding one port in each direction. I used 5000 and 5001, and it makes things simpler if you do the port swap in the ssh pipe like so:

ssh -L5000:localhost:5001 -R5000:localhost:5001 remote_machine.domain

This forwards connections to local port 5000 to remote port 5001 (and vice versa). So the servers should sit on port 5001 and clients should connect to port 5000. To create a server, run:

gst-launch -v alsasrc ! audio/x-raw-int,rate=16000,channels=1 ! audioconvert ! speexenc ! tcpserversink host=127.0.0.1 port=5001

One server is needed at each end (assuming you wish to talk both ways). Only once the server has been created can a client connect to it using:

gst-launch -v  tcpclientsrc host=127.0.0.1 port=5000 ! speexdec ! alsasink sync=false

And there you have two-way audio communication. It is possible to disconnect and reconnect to the server. I also noticed that when there was a complete network cut-out (which happened several times), the audio would drop, but then resynchronise and return.

You can do the same with video by forwarding ports 5002 and 5003 in the same swapped fashion and running the server:

gst-launch -v v4l2src ! video/x-raw-yuv,framerate=\(fraction\)5/1 ! smokeenc threshold=1000 ! tcpserversink host=127.0.0.1 port=5003

and the client:

gst-launch -v tcpclientsrc host=127.0.0.1 port=5002 ! smokedec ! xvimagesink sync=false

This uses “smoke”, which is like motion JPEG but detects where there were no changes between frames.

So with that in place, the demo went quite smoothly, with some people happily asking the laptop questions and others being a bit surprised at voices coming from nowhere.

So, thank you Gstreamer! You saved my demo.