Yesterday I found out that Google have open-sourced their Sawzall programming language.
Unlike the Go Programming Language that Google released a while ago, Sawzall is a DSL tailored for a fairly specific kind of job: processing large amounts of log data in a map-reduce style to get statistical summary data.
Working on the Viral Ad Network, I do quite a lot of this kind of work, so when I first came across the Sawzall research paper a few years ago I was quite interested.
I’ve seen a few opinions about Sawzall from ex-Googlers online, and the opinions are at completely different ends of the spectrum. Now that I’ve had a chance to play around with the language myself, here are my initial thoughts:
Firstly, the language itself isn’t exactly pretty (I’m comparing it to Python here), but as far as I’m concerned it’s actually quite good compared to languages such as R or GLSL.
Sawzall is a remarkably compact language for some kinds of jobs though. I quickly ported a simple Hadoop streaming map-reduce step from Python to Sawzall, and (to my surprise) what was a 55-line Python program just for the mapper became a 22-line Sawzall program! (Admittedly that’s because half of the Python code deals with reading key,value pairs in and emitting the values, which are pretty much built in to Sawzall.)
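To give a feel for the boilerplate involved, here is a minimal sketch of a Hadoop streaming mapper in Python. It is not the original 55-line program: the record layout (a key, a tab, then a path and a status code) and the function names `parse_record` and `map_records` are assumptions made up for illustration. Note how much of it is just reading key,value pairs and writing them back out.

```python
import sys
from typing import Iterable, Iterator, Tuple


def parse_record(line: str) -> Tuple[str, str]:
    """Split one tab-separated streaming input line into (key, value)."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def map_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) per record.

    Assumes a hypothetical value layout of "<path> <status>".
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _key, value = parse_record(line)
        fields = value.split()
        if len(fields) < 2:
            continue  # skip malformed records
        yield fields[1], 1


def main() -> None:
    # Hadoop streaming feeds records on stdin and reads
    # tab-separated key/value pairs back from stdout.
    for key, count in map_records(sys.stdin):
        sys.stdout.write(f"{key}\t{count}\n")


if __name__ == "__main__":
    main()
```

Only the line that picks out `fields[1]` is actual per-row logic; everything else is input parsing and output plumbing, which is roughly the part a Sawzall program would not need to spell out.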
One thing that did surprise me was that, once I had ported my code to Sawzall, my application was slower than the original Python version. (That’s surprising because Sawzall is supposed to generate native code.) I’m assuming this is something I could optimize away, and that the overhead would become negligible with more complex processing for each row.
Of course, the Sawzall compiler and runtime aren’t really designed to be run by themselves; they’re designed to be embedded into a larger application as part of a distributed map-reduce flow. I haven’t had time to try integrating them into a larger application yet, but I’m guessing time will tell.





















