Author Archive

CRU hacked

Tim Wintle - November 21st, 2009

I’m amazed that this has barely been covered in the UK press, but the Climate Research Unit at the University of East Anglia has been hacked, and a large amount of data (and emails) has been released onto the web.

Why is this so interesting? Well for one thing, the CRU have come under question quite a lot in the past for refusing access to their raw data – effectively refusing other scientists the chance to validate or reject their research. The Wall Street Journal has read the emails released, and reports:

“Many of the email exchanges discussed ways to decline such [FOIA] requests for information, on the grounds that the data was confidential or was intellectual property. In other email exchanges related to the FOIA requests, some U.K. researchers asked foreign scientists to delete all emails related to their work for the upcoming IPCC summary. In others, they discussed boycotting scientific journals that require them to make their data public.”

Now that a lot of the data is publicly available, it will be interesting to see what independent researchers make of it.

The leak is also incredibly important because the CRU is one of the main research departments in the world on climate change – and its research changes both public and national opinion.

Closure Compiler – Javascript optimising compiler

Tim Wintle - November 20th, 2009

While I still haven’t got around to releasing my python port of the yui-compressor (I will soon, I promise), my plan had originally been to extend the compressor into an optimising compiler for javascript – but I just stumbled upon a Google project that seems to have beaten me to it.

The Closure Compiler (google code page) does exactly that – it takes your javascripts and optimises them for both filesize and run-time.

I haven’t looked too deep yet, but it seems that it uses the parser from Rhino, and augments it with an implementation of a javascript AST, and an optimiser that works on the generated AST.

The Closure Compiler is written in Java – anyone who feels like working on a python version is likely to find pynoceros (the python version of Rhino’s parser I announced a few weeks ago) useful.

Dilbert does cloud computing

Tim Wintle - November 19th, 2009

[Comic strip: Dilbert does cloud computing]

Optimisation – it’s sometimes needed.

Tim Wintle - November 10th, 2009

Here at Team Rubber, we pride ourselves on working in a fairly agile manner. For the viral ad network in particular, this means that the most important thing at the end of an iteration is to ship working features, rather than to wait until everything is perfect before shipping.

As a side effect of this, optimisation is generally left until it becomes a noticeable issue. Obviously we worry about the complexity of our algorithms, and not designing ourselves into a corner – but I’m generally happy to take a constant speed reduction in exchange for faster development.

The code I’ve been working on this iteration was a different story, however, and I thought this story might be useful to people in the same situation.

My first implementation took about six CPU hours (in userspace alone!) to process just 1.5 GB of data. Sure, it scales sub-linearly, but I don’t want to have to bring loads of extra hardware on-line just to support this program.

The first thing to look at was the profiling information. pstats showed me that we were spending over 35 minutes looking up entries in my custom cache class – which is shared between different applications. This has knock-on effects on the rest of the system, as our cached items have to expire within a set time – this 35 minute delay means at least 10,000 extra (expensive) cache misses during this run. Each cache miss takes an average of 0.03 CPU seconds, so that’s an extra five minutes on top

(more…)
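For anyone who hasn't used pstats before, here is a minimal sketch of the kind of profiling run described above. It is not the real code: SlowCache and process are hypothetical stand-ins, with a deliberately slow linear lookup so that the cache shows up near the top of the report.

```python
import cProfile
import io
import pstats


class SlowCache:
    """Hypothetical stand-in for a cache class with a slow, linear-time lookup."""

    def __init__(self):
        self._items = []

    def put(self, key, value):
        self._items.append((key, value))

    def get(self, key):
        # Scans every stored item, so lookups get slower as the cache fills.
        for stored_key, value in self._items:
            if stored_key == key:
                return value
        return None


def process(cache, n):
    """Hypothetical workload: fill the cache, then read every entry back."""
    for i in range(n):
        cache.put(i, i * 2)
    return sum(cache.get(i) for i in range(n))


def profile_report(n=500):
    """Profile process() and return its result plus a pstats text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    total = process(SlowCache(), n)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return total, out.getvalue()


total, report = profile_report()
print(report)
```

In the report, the row for get is the one to look at: its call count and cumulative time show how much of the run goes on cache lookups.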

Announcing Pynoceros

Tim Wintle - October 30th, 2009

I haven’t mentioned any free time coding on this blog for a while, but I thought some people might be interested in a project I released a couple of days ago.

Pynoceros is a python port of the javascript parser from the Rhino javascript interpreter. (It’s a fairly straightforward language conversion at the moment – so don’t expect it to be pythonic!)

Why? Well, Rhino is a stable and well-used code base, and it forms the basis for the YUI Compressor – which I’ve often wanted a python version of.

My version of the YUI compressor is almost done (I’ve just got to find time to add all the license information and prepare the release), but in the meantime I thought I’d release pynoceros for anyone who’s interested.

Marketers over-valuing Twitter?

Tim Wintle - October 18th, 2009

I’ve been arguing for a while that some marketers massively over-rate twitter when trying to measure on-line opinion. A majority of the “social media monitoring tools” put far too much emphasis on twitter in my opinion, and now two press releases from Hitwise strongly support my argument.

To summarise, I feel that focusing on twitter ends up creating a very bad sample for any kind of opinion research, practically ignoring the effect of Facebook, Bebo, Myspace, Youtube, search engines, news sites, email, blogs, forums, instant messaging, and all the millions of other websites on-line.

What is more, I believe that focusing on twitter so strongly is what throws twitter’s collective opinion out of line with the rest of the internet. Online marketers going to twitter to measure internet opinion is like a market research team only inviting people who work for market research companies to give feedback on a product.

(more…)

US bloggers required to disclose Endorsements

Tim Wintle - October 13th, 2009

The US FTC recently updated their guidelines on endorsements and testimonials, bringing in specific examples regarding bloggers who are endorsing a product as part of an advertising campaign.

The guidelines were last updated in 1980, and it appears the FTC wanted to clarify the implications for online advertising.

An example from the updated guidelines:

The advertiser requests that a blogger try a new body lotion and write a review of the product on her blog. Although the advertiser does not make any specific claims about the lotion’s ability to cure skin conditions and the blogger does not ask the advertiser whether there is substantiation for the claim, in her review the blogger writes that the lotion cures eczema and recommends the product to her blog readers who suffer from this condition. The advertiser is subject to liability for misleading or unsubstantiated representations made through the blogger’s endorsement. The blogger also is subject to liability for misleading or unsubstantiated representations made in the course of her endorsement. The blogger is also liable if she fails to disclose clearly and conspicuously that she is being paid for her services.

In order to limit its potential liability, the advertiser should ensure that the advertising service provides guidance and training to its bloggers concerning the need to ensure that statements they make are truthful and substantiated. The advertiser should also monitor bloggers who are being paid to promote its products and take steps necessary to halt the continued publication of deceptive representations when they are discovered.

According to the Wall Street Journal,

Regulators say they haven’t seen a wave of abuses involving endorsements by bloggers but wanted to establish clear rules to prevent any problems in the future.

The FTC announcement can be read here, where you can also find the full FTC guidelines.

Do you really need those Keyword args? (python optimisation)

Tim Wintle - September 14th, 2009

I’ve been reading the python interpreter fairly closely recently (more on that to come in a later blog post), and I was surprised by how much optimisation there is around function calls, dispatching them differently depending on their parameters (expected and supplied)*.

I was going to include this in a later post, but to keep that shorter here’s a quick example. Let’s take these two massively simplified functions.


def a(spam, eggs):
    return

def b(spam, eggs=None):
    return

Calling a(0,1) is fast because the interpreter skips the keyword argument tests and pushes the parameters directly onto the stack for the new function.

Next comes b(0,1) – which takes roughly 10% longer on my machine as there is more for the interpreter to set up.

Calling with keyword arguments is far slower though – with:

a(spam = 0, eggs = 1)

and

b(spam = 0, eggs = 1)

both taking 50% longer than the fastest option (with a negligible difference between the two). Obviously a large part of the 50% increase is setting up the names of the parameters on the stack – but if you read the interpreter source you’ll see there’s far more than that at play (including quick lookup of “self” for bound methods etc.).

(obviously most non-trivial functions will spend a significant time within the function body – which will reduce the relative performance boost – but there’s sure to be the odd situation where it’s worth knowing this.)
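If you want to try this on your own machine, here is a rough sketch of how the four calls could be timed with the standard timeit module. The time_calls helper and the CALLS list are my own names for illustration, and the ratios you get will depend heavily on the interpreter version, so don't expect them to match the figures above.

```python
import timeit


def a(spam, eggs):
    return


def b(spam, eggs=None):
    return


# The four call styles compared in the text.
CALLS = [
    "a(0, 1)",
    "b(0, 1)",
    "a(spam=0, eggs=1)",
    "b(spam=0, eggs=1)",
]


def time_calls(number=200_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each call style."""
    namespace = {"a": a, "b": b}
    results = {}
    for stmt in CALLS:
        timings = timeit.repeat(
            stmt, globals=namespace, number=number, repeat=repeat
        )
        # The minimum is the measurement least disturbed by other processes.
        results[stmt] = min(timings)
    return results


if __name__ == "__main__":
    results = time_calls()
    fastest = min(results.values())
    for stmt, seconds in results.items():
        print(f"{stmt:<20} {seconds:.4f}s  ({seconds / fastest:.2f}x)")
```

Taking the minimum of several repeats, rather than the mean, is the usual way to keep background noise out of micro-benchmarks like this one.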

* n.b. – I was looking at the 2.5 tag of python, but as far as I can tell none of this code seems to have changed so far in the 3.x trunk.

The Team…

Tim Wintle - July 17th, 2009

As time goes by it’s getting tougher and tougher to get a photo of the people that make up Team Rubber, so I thought I’d use the opportunity of a fire drill today to try to snap a photo of the office.

Unfortunately it seems half the office was out today (and our fire procedure isn’t strict enough to get the London office to travel down to Bristol just to line up outside for ten minutes) – but here’s a snap of those of us that were in.

[Photo: the team, 17 July 2009]

For those interested – here is the last time we actually managed to snap the majority of the team (in December ’07)

[Photo: the team, December ’07]

Python tail-optimisation using byteplay

Tim Wintle - April 20th, 2009

(I’m going to start off by emphasizing that this is not for production use, it is just a bit of harmless fun while I was looking at the structure of python’s bytecode, and I thought it might be interesting reading for others)

There have been quite a few hacks in the past to add tail-call optimisation to python – normally in cross-interpreter python, but while I was playing with the byteplay module I thought I’d have a go at writing a function that re-compiles a function with basic tail-call optimisation inserted.

My method is basic, and converts tail-recursive calls in a (pure) function into jump statements.
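To make the idea concrete, here is a hand-written sketch of what that transformation amounts to at the source level. This is not the byteplay code itself (that works on bytecode); factorial_recursive and factorial_loop are hypothetical examples I've written out by hand, where the loop version is roughly what you'd get once the tail call becomes a jump back to the top of the function.

```python
def factorial_recursive(n, acc=1):
    """Tail-recursive: the recursive call is the last thing the function does."""
    if n <= 1:
        return acc
    return factorial_recursive(n - 1, acc * n)


def factorial_loop(n, acc=1):
    """The same function with the tail call replaced by a jump.

    Instead of calling itself, it rebinds its arguments and goes back to
    the top, so the stack never grows.
    """
    while True:
        if n <= 1:
            return acc
        # Equivalent to: return factorial_loop(n - 1, acc * n)
        n, acc = n - 1, acc * n


print(factorial_recursive(5), factorial_loop(5))
```

The loop version can handle inputs far deeper than the interpreter's recursion limit, which is the whole point of the optimisation.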

(more…)