Misusing this blog to write up a personal project again, I thought I’d explain the basics behind a simple eye tracker I hacked together a while back (this video from August 08 shows my first version):

I’ve had quite a few questions about how to implement this in Python, so I thought I would explain the general process:
Detecting the Face
The first stage is to take the image from the webcam and estimate where the face is. There are good tutorials on using the Haar cascade classifiers that come with OpenCV to detect faces, and I would recommend reading a few of those.
The important thing is that this stage has to be really fast and must rarely report a face that isn’t there, so I tuned the detector’s parameters for a low false positive rate, accepting that it then found the face in only about one frame in ten on average. Since the face isn’t going to be moving much, I reused the previous face position whenever no new one was detected. I did not care too much about positional accuracy at this stage, though.
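The "reuse the last face" idea can be sketched roughly as below. This is only an illustration, not my original code: `FaceTracker` and `make_haar_detector` are made-up names, and the `scaleFactor` and `minNeighbors` values are assumptions you would need to tune for your own webcam.

```python
class FaceTracker:
    """Remembers the last detected face and reuses it when detection fails."""

    def __init__(self, detect):
        # detect: any callable mapping a frame to a list of (x, y, w, h) boxes
        self.detect = detect
        self.last_face = None

    def update(self, frame):
        faces = self.detect(frame)
        if len(faces) > 0:
            # Keep the largest box; assumes the user is the closest face
            largest = max(faces, key=lambda f: f[2] * f[3])
            self.last_face = tuple(int(v) for v in largest)
        # On a miss this is simply the previous position (or None at start-up)
        return self.last_face


def make_haar_detector(min_neighbors=8):
    """Build a detector from the frontal-face cascade bundled with OpenCV."""
    import cv2  # imported here so the tracker logic works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def detect(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # A higher minNeighbors gives fewer false positives but more misses
        return cascade.detectMultiScale(
            gray, scaleFactor=1.2, minNeighbors=min_neighbors
        )

    return detect
```

With a webcam you would create `FaceTracker(make_haar_detector())` once and call `update(frame)` on every frame read from `cv2.VideoCapture`.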
Detecting the Eye Positions
This stage is (again) very important, but unlike face detection it has to be done with a very high level of accuracy for the simple method I used. Since the eyes are expected to be in a certain position on the head, I ran the Haar classifier for eyes over only the top two thirds of the “face” detected before, which massively reduces the processing time for this stage.
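A small sketch of that cropping step, assuming boxes are `(x, y, w, h)` tuples as OpenCV returns them. The helper names are hypothetical. The second function is there because anything detected inside the cropped region has coordinates relative to that region, which is easy to forget.

```python
def eye_search_region(face):
    """Return the top two thirds of a face box as (x, y, w, h)."""
    x, y, w, h = face
    return (x, y, w, (2 * h) // 3)


def to_frame_coords(eye, region):
    """Shift an eye box found inside `region` back into full-frame coordinates."""
    ex, ey, ew, eh = eye
    return (region[0] + ex, region[1] + ey, ew, eh)
```

For a NumPy image `gray`, the region `(x, y, w, h)` corresponds to the slice `gray[y:y + h, x:x + w]`, which is what you would hand to the eye classifier.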
I also found that the classifier regularly suggested several candidate eye regions (possibly because I used a low-quality classifier), so I enforced some hard-coded rules to try to reduce the number of regions to two, while ensuring that they were in fact eyes being detected.
For example, I checked that the two regions didn’t intersect, and that they were roughly the same size. I also checked how likely each position was given the previously recorded eye positions, and so on.
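Rules of this kind might look like the sketch below. It is not my original rule set: the 30% size tolerance is an arbitrary assumption, and the "probability given the previous position" check is replaced by a much cruder stand-in, namely picking the pair closest to the previous pair.

```python
from itertools import combinations


def intersects(a, b):
    """True if two (x, y, w, h) boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def similar_size(a, b, tolerance=0.3):
    """True if the smaller box area is within `tolerance` of the larger."""
    area_a, area_b = a[2] * a[3], b[2] * b[3]
    return min(area_a, area_b) >= (1 - tolerance) * max(area_a, area_b)


def centre_distance(a, b):
    """Euclidean distance between the centres of two boxes."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return (dx * dx + dy * dy) ** 0.5


def pick_eye_pair(candidates, previous=None):
    """Return the best (left, right) pair of boxes, or None if no pair passes."""
    best, best_cost = None, None
    for a, b in combinations(candidates, 2):
        if intersects(a, b) or not similar_size(a, b):
            continue
        pair = tuple(sorted((a, b)))  # order by x so the pair is (left, right)
        # Stand-in for a proper probability: distance moved since last frame
        cost = 0.0
        if previous is not None:
            cost = sum(centre_distance(p, q) for p, q in zip(pair, previous))
        if best is None or cost < best_cost:
            best, best_cost = pair, cost
    return best
```

If `pick_eye_pair` returns `None`, the simplest option is to fall back to the previous pair, just as with the face.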
Analysing the Pupils
Once we have the eye positions, we only look at properties of the image around the eyes. (You can see an example image of an eye region being shown in the video. I analysed both eyes, but I only displayed one.)
I really did take a very basic technique here: rather than relying on having a light in front to reflect a white dot in the pupils (which is the standard trick), I simply analyse the normalised moments of the pixel values.
(I just realised that sentence doesn’t sound like “simply” should be included, so here’s a bit more background.) Basically, when you’re looking at a distribution (like the distribution of pixel values along the x-axis of an image), the “first moment” of the distribution is the average position, the second moment (taken about that average) is the variance, and the third, once normalised, is the skew.
The variance does not tell us anything about the direction of the pupils, so I used the first and third moments along the X and Y axes as inputs to decide where the eyes are looking.
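To make the moments concrete, here is a dependency-free sketch of how they could be computed for one eye region. It is an illustration under a few assumptions rather than my original code: the region is a greyscale image stored as nested lists, "normalised" is taken to mean dividing by the total intensity and standardising the third moment, and the mean is divided by the region size so it does not depend on how big the eye box is. The function names are made up.

```python
def axis_moments(weights):
    """Mean, variance and skewness of positions 0..n-1 weighted by `weights`."""
    total = float(sum(weights))
    if total == 0:
        raise ValueError("region is entirely black; moments are undefined")
    mean = sum(i * w for i, w in enumerate(weights)) / total
    var = sum((i - mean) ** 2 * w for i, w in enumerate(weights)) / total
    if var == 0:
        return mean, 0.0, 0.0  # all weight on one position: no skew
    third = sum((i - mean) ** 3 * w for i, w in enumerate(weights)) / total
    return mean, var, third / var ** 1.5


def eye_features(eye):
    """Return (mean_x, skew_x, mean_y, skew_y) for a 2D greyscale eye region."""
    height, width = len(eye), len(eye[0])
    # Collapse the image onto each axis by summing pixel values
    col_sums = [sum(row[x] for row in eye) for x in range(width)]
    row_sums = [sum(row) for row in eye]
    mean_x, _, skew_x = axis_moments(col_sums)
    mean_y, _, skew_y = axis_moments(row_sums)
    # Divide the means by the region size so they do not scale with the box
    return (mean_x / width, skew_x, mean_y / height, skew_y)
```

Doing this for both eyes gives eight numbers per frame, which are the inputs to whatever decides where the eyes are looking.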
Feeding this into a simple linear learning algorithm worked very well. I did try a few more complicated algorithms, but my quick tests showed an almost linear relationship between the measured moments and the (true) on-screen position being looked at (probably due to the wonderful accuracy of small angle approximations).
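As an illustration of what "simple linear" can mean here, the sketch below fits an ordinary least-squares model from feature vectors to one screen coordinate. Treat the choice of least squares as an assumption, since any linear learner would fit the description. It is written in plain Python via the normal equations to stay self-contained; `fit_linear` and `predict` are hypothetical names.

```python
def fit_linear(features, targets):
    """Least-squares fit of target ~ w . feature + bias; returns [w..., bias]."""
    rows = [list(f) + [1.0] for f in features]  # constant 1.0 gives the bias
    n = len(rows[0])
    # Normal equations: (X^T X) weights = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("calibration samples are degenerate")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    weights = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (b[i] - known) / a[i][i]
    return weights


def predict(weights, feature):
    """Apply a fitted model to one feature vector."""
    return sum(w * f for w, f in zip(weights, list(feature) + [1.0]))
```

Calibration would then consist of asking the user to look at known points on the screen, recording the moment features at each point, and fitting one model for the screen X coordinate and another for Y.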
Question: Open Source Eye Tracking
What astounded me was the simplicity of the project: I think it took me about two evenings to get as far as I did. Why then does all quality eye-tracking software cost an arm and a leg? Sure, there are algorithms and bits of code out there as part of theses, but they all have to be built from source and none of them really have a nice interface.
In my opinion, it would be a massive boost to the FOSS community if someone focused on building such a system: something that integrates screen capture, a key/mouse logger, and eye tracking. Unfortunately my code for this was really a quick hack, and I think it would be better to start from scratch than to refactor the code (which is why I haven’t made it available yet).