Author Archive

Innovative viral marketing

Tim Wintle - March 24th, 2009

I just got sent an interesting request from a friend of mine – he’s applying for a job and as part of the application process he’s been asked to film a job interview talking about the company, put it on YouTube, and drive traffic to it to see who can get the most views.

The result is that thousands of people will be getting emails from friends asking them to watch a video of one of their friends recommending the company (a great way to mix viral marketing and word of mouth!)

You can see my friend Alex telling the worst joke ever here. (It’s worth it – it’s about dwarfs.)

We can’t hide it – gender test

Tim Wintle - March 24th, 2009

Just found the gender genie – a web app that tries to determine if the author of a passage of text is male or female.

It seems to work very well – it worked out that Adam was male from his last post, that Helen was female from this post, and that Ben was male from this post.

On the other hand, it did say that Michaela was Male from this post – so it’s not perfect.

Nothing stays hidden on-line

Tim Wintle - March 17th, 2009

All these blog posts from colleagues in texas; how come this hasn’t been posted here yet?

[Photo: Nothing stays hidden on-line]

(Thanks to Clare Reddington for pointing out the flickr photostream)

YouTube, Google and counting views.

Tim Wintle - March 13th, 2009

As many people in industries such as ours will have noticed – YouTube is being slow at updating the view count for some videos at the moment. Luckily we have our own numbers to go by, so it’s not affecting us as much as it is affecting many companies, but I thought I’d put my explanation up here so we can refer people to it.

According to YouTube, this seems to be due to an algorithm change made on the 25th of February (they have made similar comments elsewhere).

Quote:

We’ve made a change in our public-facing view counts across the site
that will enable us to consistently reflect what is considered a
‘view,’ based upon video consumption, video streaming and spam
filtering. This only affects view counts from February 25 moving
forward. Implementing this change also caused view count updates to slow down a
bit in general; many people have noticed this and we’re aware of the
issue.

This raises some very interesting points (these are my observations, have not been confirmed with Google and may not reflect the opinions of Team Rubber):

First, for people who don’t deal with software like this every day (like I do for the viral ad network), I’ll explain the common way that numbers like this are updated:

  • There are one or more “tracking servers”, running all over the place – these are the servers that actually record a “view”, “hit”, or “action” – and they simply record lots of information about each action, which will be looked over later.
  • Every few minutes the main algorithm runs over all the data it hasn’t looked at yet and updates the numbers that are shown on the dashboards.

The important thing to notice is that the views are recorded right at the beginning and they will be updated at some point. Even if the main algorithm is stopped entirely for a few days, it will carry on in the future if you’re patient.
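As a rough illustration of that two-stage process (a toy sketch, not YouTube's actual system; the class and method names here are made up), the raw events can be logged immediately while the displayed totals only change when the batch job runs:

```python
from collections import Counter


class ViewCounter:
    """Toy model: raw events are logged at once, totals are updated in batches."""

    def __init__(self):
        self.event_log = []         # written by the "tracking servers"
        self.processed = 0          # index of the first event not yet counted
        self.displayed = Counter()  # numbers shown on the dashboard

    def record_view(self, video_id):
        # Tracking step: just append the raw event; no counting happens here.
        self.event_log.append(video_id)

    def run_batch(self):
        # Batch step: count everything logged since the last run.
        new_events = self.event_log[self.processed:]
        self.displayed.update(new_events)
        self.processed = len(self.event_log)


counter = ViewCounter()
for _ in range(3):
    counter.record_view("video-a")
before = counter.displayed["video-a"]  # the batch job has not run yet
counter.run_batch()
after = counter.displayed["video-a"]   # the delayed views now appear
```

The views are never lost while the batch job is paused; they simply stay in the log until the next run.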

Prioritizing videos (“Why does this only happen once I reach 200/300 views?”)

You may have noticed that the view count has always been updated more quickly for videos with few views than for videos with many. For example, a newly uploaded video will normally update its view count within a few minutes of being watched, whereas a video that has already had several thousand views will update its view count more slowly.

This suggests that when Google run their main script, they update the numbers for videos with fewer views more often than for videos with a higher number of views – and leave the other data to be processed less often (say every few hours).

This makes a lot of sense, because people with 50 views are more likely to be watching their numbers every few minutes to see if they have another 5 views than people who have had 200,000 views – who may only care about their views increasing by 1,000. It keeps users happier.

This explains why we (and others affected by this issue) have seen view counts rising as normal until they get above 200-300 views – at which point the numbers appear “stuck”.
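A prioritisation like this could be sketched as a refresh interval that grows with the view count. The thresholds and intervals below are pure guesses for illustration (only the rough 200-300 figure comes from what we have observed), and the function names are hypothetical:

```python
def update_interval_minutes(view_count):
    """Hypothetical refresh schedule: the more views, the less often we recount."""
    if view_count < 300:
        return 5      # small videos: refresh every few minutes
    if view_count < 100_000:
        return 120    # mid-sized videos: every couple of hours
    return 720        # very popular videos: a couple of times a day


def is_due(view_count, minutes_since_update):
    """True when a video's displayed count should be refreshed again."""
    return minutes_since_update >= update_interval_minutes(view_count)
```

If the slower tier falls behind, a video that crosses the first threshold would look "stuck" even though its views are still being recorded.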

Balancing the work (“Why doesn’t this affect all videos?”)

Clearly a massive site like YouTube, getting so many views, needs more than one computer running to update these numbers. I’m going to assume that Google run this over their normal map-reduce system.

They may have tens, hundreds, or even thousands of computers running their view-counting algorithm (and I don’t expect to ever find out…), but all views for a video have to be counted by the same computer, so they need some manner of splitting up the millions of views they have recorded into batches of work to be done.

They almost certainly do this using some form of hash function – you can picture this as saying that every video on YouTube is grouped into various buckets – each of these buckets will have its views processed on the same machine (or at the same time).

The problem comes when a hash function doesn’t split up the items equally (i.e. one “bucket” has significantly more or fewer videos in it than another one). This appears to be the problem here – only some videos have been affected, and my assumption is that this is because one of these “buckets” has ended up with far more views than the others – meaning that one set of machines (or one job) gets over-loaded and ends up being incredibly slow.
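To make the bucket picture concrete, here is a small sketch. It assumes an MD5-based bucket assignment and made-up video IDs; nothing here is known about Google's real hash function. Even when the videos themselves are spread evenly, one very popular video drags all of its views into a single bucket:

```python
import hashlib
from collections import Counter


def bucket_for(video_id, num_buckets):
    """Assign a video to a bucket by hashing its ID (illustrative choice of hash)."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def bucket_loads(view_log, num_buckets):
    """Total number of views that each bucket has to process."""
    loads = Counter()
    for video_id in view_log:
        loads[bucket_for(video_id, num_buckets)] += 1
    return [loads[b] for b in range(num_buckets)]


# 1,000 videos with one view each, plus one video with 5,000 views.
view_log = [f"video-{i}" for i in range(1000)] + ["popular-video"] * 5000
loads = bucket_loads(view_log, num_buckets=8)
```

Whichever bucket "popular-video" lands in has several times the work of the others, so the job for that bucket finishes last.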

Lessons Learned

For me, working with a similar system to the above, the number one thing that I have learned is that for tasks like this that might be incredibly sensitive to hash functions it’s not safe to assume that a hash function that’s theoretically good is going to remain good.

I don’t know if they are able to, but the situation would be better if YouTube chose the hash function at the beginning of each main job. I.e. each time that they run the main script that updates the information on the dashboards, they would choose a different hash function. This way, if a video ends up in a bucket that’s overloaded one time, it will end up in a different bucket next time (which shouldn’t be overloaded).
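One common way to get "a different hash function per job" is to mix a per-job salt into the key before hashing. This is only a sketch of that idea under my own assumptions (the salt strings, bucket count and function name are hypothetical):

```python
import hashlib


def salted_bucket_for(video_id, num_buckets, salt):
    """Bucket assignment that changes whenever the per-job salt changes."""
    key = f"{salt}:{video_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % num_buckets


video_ids = [f"video-{i}" for i in range(200)]
run_1 = {v: salted_bucket_for(v, 16, salt="job-1") for v in video_ids}
run_2 = {v: salted_bucket_for(v, 16, salt="job-2") for v in video_ids}

# Number of videos that ended up in a different bucket on the second run.
moved = sum(1 for v in video_ids if run_1[v] != run_2[v])
```

Within one job the assignment is still deterministic, so all views for a video are counted together, but a video that shared an overloaded bucket on one run is unlikely to share it on the next.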

Of course, this is all theoretical, and is based on a large number of assumptions – YouTube may perform their hashing at a far earlier stage, and they may not be able to change the hash function each time they run the job.

Tim Wintle

What is “Viral Marketing”? (and language->semantic effects)

Tim Wintle - January 24th, 2009

Reading through the RubberRepublic blog, I thought I’d point the whole of team rubber at the article on the semantics of the word “viral” when applied to marketing.

This is a very interesting topic to me. With my leaning towards very specific definitions, I’m going to start right from the start, and explain that I’m certainly not a believer in Wittgenstein’s views on natural language.

To me, it’s not unreasonable to define a strict subset of natural language with a single, well defined, 1-1 semantic value function for discussing technical matters (and I believe the definition of “viral” should fit into such a subset), in the same way that we define mathematical terms in first order logic (I’m not going to get into provability here).

i.e. I think that it’s possible, and reasonable, to define the meaning of individual words which are indisputable and fixed when talking in technical language.

For this reason, it really drives me up the wall when two people talk about something, use the same word, but are actually discussing different things.

An example is how we have recently changed the naming for our “Syndicated Ad Units” (Previously “Content Units”).

When we define terms for such a large system, we are effectively defining our own language (or at least the non-common alphabet – technically the set from which words are taken rather than set from which letters are taken). By changing the naming for the system, we have effectively created a second language.

These languages are technically as distinguishable as programming languages are – and switching between them requires the same work as switching between programming languages.

Obviously we don’t want too much redundancy in our alphabet, or we end up

  • having to remember a much larger set of nouns
  • diverging strongly from a 1-1 semantic value function, which is indistinguishable from the effect of losing orthogonality of the semantic values of individual nouns

To explain the second point: if we mark the semantic value function of a language “I” (for “interpretation”), and we have an alphabet that consists only of the words “abc”, “def” and “ghi”, then if we have orthogonality in the semantic values of the nouns, changing the meaning of “abc” – marked I(“abc”) – (through adding functionality to the thing we call “abc”) changes I(“abc def”) in the same way as it changes I(“abc ghi”). It also implies that I(“zyx…wv”) does not change unless “abc” is part of “zyx…wv”.
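A toy model may help here. Assume (and this is only one possible way to model it) that each word contributes its own independent set of properties and that I(phrase) is simply the combination of them; the property names below are invented for the example:

```python
def interpret(phrase, word_meanings):
    """I(phrase): merge the properties contributed by each word in the phrase."""
    meaning = {}
    for word in phrase.split():
        meaning.update(word_meanings[word])
    return meaning


old = {
    "abc": {"size": "small"},
    "def": {"kind": "text"},
    "ghi": {"colour": "blue"},
}
# Change I("abc") only, by adding a property to the thing we call "abc".
new = dict(old, abc={"size": "small", "rss": True})


def changes(phrase):
    """Properties of I(phrase) that differ between the old and new meanings."""
    before, after = interpret(phrase, old), interpret(phrase, new)
    return {k: v for k, v in after.items() if before.get(k) != v}
```

Here `changes("abc def")` and `changes("abc ghi")` are identical, and `changes("def ghi")` is empty, which is the behaviour that orthogonality is meant to guarantee.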

We did a very interesting thing while we changed the language used for syndication – we changed which objects have their own names. This changes the set of concepts that can be described with a single word – {I(a) for a in the alphabet}, rather than simply changing the strings used for the objects in the alphabet.

This fundamentally changes the language, and changes the effort required when semantic value functions change (as they always will on a long term software project).

For example, let’s take some of the objects that have changed names (you can see what the semantic values of these names are here):

Old Name | New Name
Med. Rect. Gadget Content Unit | Med. Rect. Fun Unit (Gadget)*
Med. Rect. Text Link Content Unit | Med. Rect. Text Link Fun Links
Med. Rect. Text/Image Link Content Unit | Med. Rect. Text and Image Link Fun Links
Fun Link of the day Text Link Content Unit | Fun Link of the day Fun Links

* “(gadget)” is added internally for this type of Fun Unit.

By enforcing these changes, several changes to the alphabet are implied – firstly the fact that we do not use “gadget” externally creates two alphabets, and hence two new languages – but the aim is to keep one language a subset of the other one.

Previous alphabet:

“Med. Rect.”, “Gadget”, “Content Unit”, “Text Link”, “Text/Image Link”, “Fun Link of the Day”

New alphabet (internal):

“Med. Rect.”, “Fun unit”, “Gadget”, “Text Link”, “Fun Links”, “Text and Image Link”, “Fun Link of the Day”

New alphabet (external):

“Med. Rect.”, “Fun unit”, “Text Link”, “Fun Links”, “Text and Image Link”, “Fun Link of the Day”

Now let’s look at how orthogonal the meanings of these are…

If we suddenly decided to break years of internet tradition and say that a “Med. Rect.” was actually 1024 pixels wide and one pixel high – that would affect all names with “Med. Rect.” in them equally. In fact, the meaning has not changed at all between the two languages – it defines the size that the syndicated placement will take up on your website.

Similarly, the meaning of “Text Link” has not changed (and although we changed the string “Text/Image Link” to “Text and Image Link”, the actual interpretation of these strings has not changed).

If you are looking closely though, you will have noticed that “Fun Link of the day Text Link Content Unit” has changed to “Fun Link of the day Fun Links” – this is an important change, since they have the same interpretation, the meaning of “Text Link” has not changed, and yet “Text Link” is not in the new name. This means that this semantic value must be associated with the phrase “Fun Link of the Day” – and so it is. But “Fun Link of the day” is also associated with a size (this is more obvious in the original naming conventions). This is a many-1 mapping between the old language and the new one, and as such it changes the implied syntax quite dramatically.

Now for the most interesting strings – “Content Unit” has been changed to either “Fun Unit”, or “Fun Links”. This is very clearly a 1-many mapping during the changeover, which again changes the syntax of the language dramatically.

For example, let’s imagine (and this is purely imaginary), we decided to add a feature that (describing the semantic change in the old language) allowed you to “place your content unit in an RSS feed”.

In the old language, we have just updated the semantic value of “content unit”, however in the new language we have updated the semantic value of “Fun Unit” and “Fun Links”. Thus they are non-orthogonal (in fact, in terms of this change they would be parallel!).

For a user (of the language) who understands the new semantic value of “Fun Unit”, they cannot know that the semantic value of “Fun Links” has changed unless they have some prior knowledge about the language.

But how do we describe this intra-linguistical knowledge in the language itself? We cannot say “All content units have …”, because “content unit” is not in the new language. Rather, we would have to state that “Fun Units and Fun Links have …” – but this requires updated semantic values to two strings from the alphabet. This might not seem like much, but by talking in a specific language we actually train parts of our brains to translate from this language into semantics (this was explained by Derek Smith at the Bristol Knowledge Unconference much better than I could explain it). This is an actual change to our brain that we are requiring – and we are requiring two changes in the new language.

To avoid this, we have added another string to the alphabet of our new language – “Ad Unit”. An “Ad Unit” can be a “Fun Unit” or “Fun Links” – but not both – so the interpretation of “Ad Unit” is the common interpretation between a “Fun Unit” and “Fun Links”.

But then if the semantics have remained the same between the two languages, the interpretation of “Ad Unit” must be the same interpretation as “Content Unit” was before…

That would mean that “Fun Unit” and “Fun Links” (both new words) are completely irrelevant words – since they don’t add any semantic information to the language!

Well, to explain this one we have to go back to the reason that we actually replaced the language in the first place. The first language was defined by myself and Andy as we were thinking over the technical requirements, for use in implementing the systems. The second language came about after our sales and network teams mentioned that they thought users would get confused over the meanings of phrases.

We took this as a sign that we actually had two languages in use already – since there was obviously some concept that was essential to how this second group of people viewed the system that was not a concept to the technical team.

After some very long discussions, this concept had still not appeared to me, but the non-technical users described something that resembled the new language as describing their concepts better. This is a very interesting point to have come to in development terms – since it appeared to be a sign of what the users want to use the system for, which is far better than anything that could be got out of a simple user-interview.

After hammering this out for several days, we finalised the new language. The above examples are only a small sub-section, but they are the section that causes the largest change in the allowed syntax of the language.

What was the difference?

Firstly let me quote the definition that the non-technical users decided on for “Fun Units” and “Fun Links” (I would have been far more strict over the definition, but if you’ve read this far then there’s a good chance that you would have as well; we may re-visit this definition):

  • “A Fun Link is an Ad Unit into which the Viral Ad Network can place a link to site hosting an asset”
  • “A Fun Unit is an Ad Unit into which the Viral Ad Network can insert actual content”

And that seems to be the difference – to me as a developer, there is a difference between the A,IMG,OBJECT,SCRIPT tags etc, – but that difference is contained in the selection between “Text Link”, “Text/Image Link”, “Video Player” (not mentioned previously) and “Gadget” (“Text” can only contain A tags, “Text/Image” can contain A and IMG tags, “Video Player” can only contain an actual video file, and “Gadget” can contain whatever you want).

For a non-technical user, really focusing on “this could be content” is more important – even though you could say that that information is already contained in the language, it’s so important that they want to say it twice.

Hence we’ve got the “old” and the “new” languages describing these things – the old language is cognitively simpler to learn, contains less vocabulary to learn, requires less mental work when new features are implemented, contains a single induced syntactic structure (at least over the vocabulary mentioned here) due to the orthogonality of semantic values, and does not require intra-linguistical “meta-knowledge” to talk about subsets of the language. It also has very low redundancy due to the

The new language has enforced redundancy, non-static syntactical structure, a larger vocabulary, non-orthogonal nouns, and thus has no strict subsets (when talking about Ad Units) that are languages capable of describing semantic updates in themselves. On the other hand, it focuses the mind on what the non-technical users found most important by repeating itself.

As you may have guessed, developers will continue to use the “old” language, and not just because switching over to the new language would require thousands of lines of code to be re-worked.

Tim Wintle

(oh, BTW, My definition of “viral” was brought here)

Basic Eyetracking using Python – overview

Tim Wintle - January 18th, 2009

Mis-using this blog to write up a personal project again, I thought I’d explain the basics behind a simple eye-tracker I hacked together a while back (this video from August 08 shows my first version):

[Embedded video: Basic Eyetracking using Python]

I’ve had quite a few questions about how to implement this in Python, so I thought I would explain the general process:

Detecting the Face

The first stage is to take the image from the webcam and estimate where the face is. There are good tutorials on using the Haar classifiers that come with OpenCV to detect faces, and I would recommend reading a few of those.

The important thing is that this stage has to be really fast and give a low false positive rate – so I tweaked the parameters so that the false positive rate was low, and it accurately detected a face position on average about one in ten frames. Since the face isn’t going to be moving much, I used the previous face positions if I had not detected a new one. I did not care too much about accuracy for this stage, though.
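The "reuse the previous position" part can be sketched independently of the detector. This is not my original code; `FaceTracker` and the stand-in detector are made up for illustration, and in practice `detect` would be a thin wrapper around OpenCV's Haar cascade face detection, tuned to be strict:

```python
class FaceTracker:
    """Keep the last detected face box when the detector finds nothing."""

    def __init__(self, detect):
        self.detect = detect   # callable: frame -> list of (x, y, w, h) boxes
        self.last_face = None

    def update(self, frame):
        faces = self.detect(frame)
        if faces:
            # Keep the largest box; a strict detector should rarely return several.
            self.last_face = max(faces, key=lambda box: box[2] * box[3])
        return self.last_face


# Stand-in detector for illustration: it only "sees" a face in the first frame.
fake_results = {"frame-0": [(100, 80, 200, 200)], "frame-1": [], "frame-2": []}
tracker = FaceTracker(lambda frame: fake_results[frame])
positions = [tracker.update(f) for f in ("frame-0", "frame-1", "frame-2")]
```

Because the head barely moves between frames, a detector that only fires on a fraction of frames is still enough to keep a usable face position.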

Detecting the Eye positions

This is (again) very important, but this step has to be done with a very high level of accuracy for the simple method I used. Since the eyes are expected to be in a certain position on the head, I ran the Haar classifier for eyes over the top 2/3 of the “face” detected before, which massively reduces the processing time for this stage.

I also found that I regularly had several suggested regions where the eyes were (possibly because I used a low-quality classifier), so I enforced some hard-coded rules to try to reduce the number of regions to two, while ensuring that they were in fact eyes being detected.

For example, I checked that the two regions didn’t intersect, and that they were roughly the same size. I also checked the probability that they were in that position based on the previous recorded position etc.
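A sketch of rules along those lines follows. The helper names and the 50% size tolerance are assumptions for the example rather than the values I actually used, and the check against previous positions is left out:

```python
def eye_search_region(face):
    """Top two thirds of the face box (x, y, w, h), where the eyes should be."""
    x, y, w, h = face
    return (x, y, w, (2 * h) // 3)


def intersects(a, b):
    """True when two (x, y, w, h) boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def similar_size(a, b, tolerance=0.5):
    """True when the smaller box has at least `tolerance` times the larger area."""
    area_a, area_b = a[2] * a[3], b[2] * b[3]
    return min(area_a, area_b) >= tolerance * max(area_a, area_b)


def pick_eye_pair(candidates):
    """Return the first pair of boxes that do not overlap and are of similar size."""
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if not intersects(a, b) and similar_size(a, b):
                return tuple(sorted((a, b)))  # box with the smaller x first
    return None


candidates = [(30, 40, 40, 20), (35, 42, 38, 22), (120, 41, 42, 21), (80, 90, 10, 5)]
pair = pick_eye_pair(candidates)
```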

Analysing the pupils

Once we have the eye positions, we only look at properties of the image around the eyes. (you can see an example image of an eye region being shown in the video – I analysed both, but I only displayed one)

I really did take a very basic technique here – rather than relying on having a light in front to reflect a white dot in the pupils (which is the standard trick), I simply analyse the normalised moments of the pixel values.

(I just realised that sentence doesn’t sound like “simply” should be included, so here’s a bit more background). Basically, when you’re looking at a distribution (like the distribution of values along the x-axis of an image), the “first moment” of the distribution is the average position, the second moment is the variance, and the third moment is the skew.

The variance does not tell us anything about the direction of the pupils, so I used the first and third moments in the X and Y axis as inputs to decide where the eyes are looking.
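In plain Python the features could be computed roughly like this. Weighting each pixel by its darkness (so that the pupil dominates) is an assumption made for this sketch, as are the function names:

```python
def axis_moments(weights):
    """Mean and skewness of position along one axis, weighted by `weights`."""
    total = sum(weights)
    if total == 0:
        return 0.0, 0.0
    mean = sum(i * w for i, w in enumerate(weights)) / total
    variance = sum(w * (i - mean) ** 2 for i, w in enumerate(weights)) / total
    third = sum(w * (i - mean) ** 3 for i, w in enumerate(weights)) / total
    skew = third / variance ** 1.5 if variance > 0 else 0.0
    return mean, skew


def eye_features(image):
    """image: rows of grey values (0 = black). Returns (mean_x, skew_x, mean_y, skew_y)."""
    darkness = [[255 - p for p in row] for row in image]
    column_weights = [sum(col) for col in zip(*darkness)]  # profile along x
    row_weights = [sum(row) for row in darkness]           # profile along y
    return axis_moments(column_weights) + axis_moments(row_weights)


# A 3x5 "eye" with a dark pupil in the second column from the right.
image = [[255, 255, 255, 0, 255]] * 3
mean_x, skew_x, mean_y, skew_y = eye_features(image)
```

The mean shifts as the pupil moves across the eye region, and the skew captures which side the dark mass leans towards.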

Feeding this into a simple linear learning algorithm worked very well. I did try a few more complicated algorithms, but my quick tests did show that I got almost linear behavior between the moments measured and the (true) pupil position on-screen (probably due to the wonderful accuracy of small angle approximations).
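For anyone wanting to try the linear step, an ordinary least-squares fit is one simple option. It is swapped in here as an example and is not necessarily the learning algorithm I used; the calibration numbers are synthetic:

```python
def fit_linear(features, targets):
    """Least-squares weights for target = w . features + bias (normal equations)."""
    rows = [list(f) + [1.0] for f in features]  # append a constant bias input
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature):
    """Apply the fitted weights (the last weight is the bias)."""
    return sum(w * x for w, x in zip(weights, feature)) + weights[-1]


# Synthetic calibration data: (mean_x, skew_x) -> horizontal screen position.
features = [(1.0, 0.2), (2.0, -0.1), (3.0, 0.0), (4.0, 0.3)]
targets = [100 * m - 50 * s + 20 for m, s in features]
weights = fit_linear(features, targets)
```

In a real calibration the targets would come from asking the user to look at known points on the screen, with one fit for the x position and another for y.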

Question: Open Source Eye Tracking

What astounded me was the simplicity of the project – I think it took me about two evenings to do as much as I had done. Why then does all quality eye-tracking software cost an arm and a leg? Sure, there are algorithms and bits of code out there as part of theses, but they all have to be made from source and none of them really have a nice interface.

In my opinion, it would be a massive boost to the FOSS community if someone would focus on building such a system. Something that integrates screen capture, a key/mouse logger, and eyetracking. Unfortunately my code for this was really a quick hack – and I think it would be better to start from scratch than to re-factor the code (which is why I haven’t made it available yet).

Mini-LD #6

Tim Wintle - January 10th, 2009

(3:33 am)

I didn’t have anything planned for this weekend, so I thought I’d take part in the Mini Ludum Dare – the monthly version of the infamous Ludum Dare 48 hour game-programming challenge.

You can keep up to date on my progress over the weekend in my Ludum Dare Blog, and you can see all the entrants in the competition blog.

At the last minute I got an arrangement for Saturday, so with only 24 hours of code time (I need to sleep tonight if I’ve got things to do tomorrow…), I decided not to attempt anything too complicated. It’s roughly nine hours in, and after a few hours thought I chose the secondary theme “Infection” to go with the primary theme of “monochrome”.

Working in Python with PyGame, here’s what my attempt looks like so far (remembering I’m not a graphics type of person):

[Screenshot: Mini LD #6]

At the moment it’s fairly much like Asteroids – the player is the cell in the middle of the screen, and the nucleus points the direction you fire. The aim is to shoot all the infections with capsules of medicine.

I’ve still got a long way to go – loads of ideas for features to code, graphics to do/improve, and sound effects to make, but it’s on its way there.

If you want to comment, I’ll be blogging frequently on the ludum dare site whenever I’m online this weekend.

Going Viral (or not…)

Tim Wintle - December 22nd, 2008


Here’s a little Christmas present for all the media-buffs who read our blog – a colourful diagram from a simulation I ran of the distribution mechanism of viral marketing. The colour reflects the number of viewers the content ends up with, with Red being the most and Blue being the least.

[Figure: simulation results for the UK, with labels]

Explanation:

The x axis measures the quality of the viral content – i.e. the chance that someone will pass the content on to a friend when they see it.

The y axis measures a slightly abstract property of graphs. In this situation, given I could pass a viral to you, what is the probability that you could pass a viral to me?

The two extremes are Facebook and Bloggers:

[Figure: Facebook example]

On Facebook there are only really “undirected” (two-way) connections – If you’re my friend then I’m your friend, and we both have the same ability to pass the content to the other person.

[Figure: bloggers example]

The Blogosphere is different – If I write a blog post (like this one) you can comment, but the vast majority of readers will not, so the connections are virtually all one-way (“directed”). I am far more likely to pass content to the readers of this blog than a random reader is to pass content back to me.
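The y-axis property can be estimated for any list of directed connections. A minimal sketch, with two tiny made-up networks standing in for the two extremes:

```python
def reciprocity(edges):
    """Fraction of directed edges (a, b) for which (b, a) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    returned = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return returned / len(edge_set)


# Every friendship goes both ways.
facebook_like = [("me", "you"), ("you", "me"), ("me", "sam"), ("sam", "me")]

# One blogger reaching four readers, only one of whom ever passes anything back.
blog_like = [
    ("blogger", "reader-1"), ("blogger", "reader-2"),
    ("blogger", "reader-3"), ("blogger", "reader-4"),
    ("reader-1", "blogger"),
]
```

The first network scores 1.0; with more silent readers, the blog-like network moves towards 0.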

It turns out that this difference can be really important in the amount of traffic you can get from seeding a viral marketing campaign, but (in my model) there is a significant change in what this difference will mean for your campaign depending on the quality of the content.

Looking at the main plot again, it’s clear that there is a bifurcation in the results (commonly known as the “tipping-point”) half way along the x-axis (It’s only half-way because I have chosen the range on the x-axis to place it in the centre).

This is the point where many people will say the content has “gone viral” – although very little content ever reaches this level. It is close to the point where you expect every person that sees your content to pass it on to exactly one friend (see “Jamie Oliver starts to get viral marketing”).

This is often chosen as the point of “viral”, since it is a distinct point that can be predicted in models – even though it cannot possibly be measured in reality (perhaps that is also part of the reason…).
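The tipping point can be reproduced with a much cruder model than the simulation behind the plot. The sketch below is a standard branching-process style approximation, not the model I ran: it ignores the one-way/two-way distinction entirely and assumes each new viewer passes the content on to `pass_on_rate` people on average, some of whom have already seen it:

```python
def expected_viewers(seed, pass_on_rate, population, max_generations=1000):
    """Approximate unique viewers reached from `seed` initial viewers."""
    total = current = float(seed)
    for _ in range(max_generations):
        # Only people who have not yet seen the content can become new viewers.
        current = current * pass_on_rate * (1 - total / population)
        if current < 0.5:
            break
        total = min(population, total + current)
    return round(total)


below = expected_viewers(1000, 0.5, 60_000_000)  # each viewer passes to 0.5 people
above = expected_viewers(1000, 2.0, 60_000_000)  # each viewer passes to 2 people
```

Below a pass-on rate of one the campaign stays within a small multiple of the seeding pool; above it, the audience grows until it starts running out of people who have not yet seen the content.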

But notice how the behaviour changes at that point – on the left (where the vast majority of content lies) your content will do better if you focus on Blogger-like seeding, with one-way arrows. On the right your content may do better with Facebook-style seeding.

[Figure: simulation results for the UK, with labels]

More information:

This simulation was run assuming a population of 60,000,000 (approximately the size of the UK internet population), and with an initial seeding pool of 1,000 viewers. The colours shown are the predicted number of unique viewers of the content. For more information, contact Rubber Republic.

Tim

Removing the new YouTube Search Box

Tim Wintle - December 4th, 2008

YouTube added a new search box to the YouTube embedded player today.

Update: YouTube appear to have listened to users’ comments, and made a second update in a day – the search box now only displays when you hover over the video.

Here’s what it looks like:

[Embedded video]

Understandably, many people might want to remove this “feature” – so here’s how:
(The YouTube API has been updated to mention this, so it appears to be a supported method)

  1. In your embed code, find the url for the flash player (it is in there twice if you are using the standard YouTube embed code)
  2. Add the parameter showsearch=0 (by adding “&showsearch=0” to the url)
  3. There is no step three – the video shouldn’t be showing the search box any more.
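If you have a lot of embeds to fix, the steps above can be scripted. This is a small sketch that assumes the standard embed code format shown in this post (player URLs of the form http://www.youtube.com/v/...); the function name is made up:

```python
import re

# Matches the player URL used in the Flash embed code (assumed format).
PLAYER_URL = re.compile(r'http://www\.youtube\.com/v/[^"\s]+')


def hide_search_box(embed_code):
    """Append &showsearch=0 to every player URL that does not already have it."""
    def add_param(match):
        url = match.group(0)
        return url if "showsearch=" in url else url + "&showsearch=0"
    return PLAYER_URL.sub(add_param, embed_code)


embed = (
    '<param name="src" value="http://www.youtube.com/v/4Cq4O_z5Blo&hl=en&fs=1" />'
    '<embed src="http://www.youtube.com/v/4Cq4O_z5Blo&hl=en&fs=1"></embed>'
)
patched = hide_search_box(embed)
```

Running it twice is harmless, since URLs that already carry a showsearch parameter are left alone.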

The video should now look like this:

[Embedded video]

Code:

<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0">
<param name="allowFullScreen" value="true" />
<param name="allowscriptaccess" value="always" />
<param name="src" value="http://www.youtube.com/v/4Cq4O_z5Blo&hl=en&fs=1&showsearch=0" />
<embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/4Cq4O_z5Blo&hl=en&fs=1&showsearch=0" allowscriptaccess="always" allowfullscreen="true"></embed>
</object>

I’m sure some people will find that useful when they look at their blogs today …

Tim W

They don’t make them like this any more…

Tim Wintle - October 19th, 2008

My first laptop was a Dell 320N, which I recently recovered from where it’s been gathering dust for over a decade. When I first unpacked it I found that it was reporting read errors from the hard disk, so I was going to start over and install Linux on it. Luckily, having taken it to pieces (which was tough – the secret to removing the 320’s case is to undo the screws hidden under the rubber base) I found that there was a dusty pin on the IDE connector, and having fixed that it booted fine.

I am proud to say that booting up into DOS (and then Windows) brought back all the excitement I remember feeling 15 years ago at the power at my fingertips in such a small package.

[Photo]

The 320 runs an AMD 386 at a whopping 20 MHz clock speed (that’s roughly 300 times slower than my current laptop) – giving up to 10 MIPS (Mega Instructions Per Second) – roughly 2,250 times less than my current laptop. With the extension module giving me 2 MB of RAM, and a 30 MB hard disk, I felt like I had enough power to do anything I could dream of (although I never bought the Maths Coprocessor).

Having booted it up I couldn’t help but try out my favourite game – “SkiFree”. This skiing game really packed a punch (notice the screen colours are inverted to increase readability)

[Embedded video]