Junk data: why we still have no idea what the DfT’s most popular websites are

A couple of stories in the Telegraph and Daily Mail this week have hailed data released by the Department for Transport about the websites most often visited by its staff.

But if you look a little more closely at the raw data, it quickly becomes clear that these figures are being badly misrepresented by the newspapers involved. There’s a very important note on the last page of the data PDF (fascinatingly, missing from the Mail’s repost). It says:

Note : “number of hits” includes multiple components (e.g. text, images, videos), each of which are counted.

The difference between page views, visits and hits in web analytics is fairly important. Page views is the number of individual pages on a site that have been viewed; visits is the number of separate browsing sessions that have occurred. And hits is the number of individual files that are requested by the browser.

An individual page view can include dozens, or even hundreds, of hits. A single page view of the Telegraph front page, for instance, includes at least 18 hits in the header alone. That’s before we get to any images or ads: there are about another 40 image files requested. It’s fair to suggest you could rack up the hits very quickly on most news websites, whereas very simple, single-purpose sites might register 10 or fewer per page view.

Also important to note: if a website serves files from other domains, such as advertisements or tracking code, those domains will register a hit despite never actually being seen by the person doing the browsing.

That explains why the second “most popular” site on the list is www.google-analytics.com – a domain that is impossible to visit, but which serves incredibly popular tracking code on millions of other websites. It’s probably safe to assume that it also explains other anomalies: stats.bbc.co.uk, static.bbc.co.uk, news.bbcimg.co.uk and cdnedge.bbc.co.uk are all in the top 10, and all impossible to actually visit. There are two IP addresses in the top 11 “most popular” sites, too.
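To make the distinction concrete, here is a minimal Python sketch using an invented browsing log; none of these domains or numbers come from the DfT data. Each hypothetical page view lists every file the browser requested: the site’s own files plus one tracking request to www.google-analytics.com. The 58 files per view for the made-up news site loosely echo the Telegraph example (18 header files plus about 40 images). Counting page views and hits side by side shows how a domain nobody browses can climb a hits table, and how a file-heavy site can outrank a simpler site that was actually viewed more often.

```python
from collections import Counter

# Hypothetical tracking domain loaded on every page, as analytics code often is.
TRACKER = "www.google-analytics.com"


def page_view(site, own_files):
    """Build one hypothetical page view.

    Returns (site the reader actually saw, domains of every file requested).
    The browser fetches `own_files` files from the site itself plus one
    tracking request to TRACKER, which the reader never sees.
    """
    return (site, [site] * own_files + [TRACKER])


# Invented log: 2 views of a file-heavy news site (58 files each),
# 3 views of a simple site and 2 views of a small blog (2 files each).
PAGE_VIEWS = (
    [page_view("news-site.example", 58)] * 2
    + [page_view("simple-site.example", 2)] * 3
    + [page_view("blog.example", 2)] * 2
)


def count_page_views(log):
    """Page views: how often each site was actually looked at."""
    return Counter(site for site, _files in log)


def count_hits(log):
    """Hits: every file request, credited to the domain that served it."""
    hits = Counter()
    for _site, files in log:
        hits.update(files)
    return hits


if __name__ == "__main__":
    views = count_page_views(PAGE_VIEWS)
    # Rank by hits, as the DfT list does, and show page views alongside.
    for domain, hits in count_hits(PAGE_VIEWS).most_common():
        # A Counter returns 0 for domains that were never viewed directly.
        print(f"{domain:26} hits={hits:4}  page views={views[domain]}")
```

Ranked by hits, the tracker comes second with zero page views, while simple-site.example, the most viewed site in this toy log, drops to third. The sketch leaves out visits (browsing sessions), which would need timestamps to model.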

As David Higgerson points out (in comments), there are some interesting patterns in the data. But unless you know how many hits each page generated at the time it was viewed, and which ads were served from which other sites at the time, any straight comparison of the figures is meaningless. And the data itself is so noisy that any conclusions are dubious at best.

We can say that the BBC website is certainly popular, that the Bears Faction Lorien Trust LARP site probably got more visits than you might expect, and that civil servants do seem to like their news. Beyond that, the Mail’s claims of “cyberslacking”, of gambling sites (common advertisers) being popular and of there being six separate BBC sites in the top 10 are at best unsupported and at worst downright misleading.

Stealing the story: the death of the News of the World

The News of the World is dead, Rebekah Brooks has so far survived, Andy Coulson has been arrested and the British media is in overdrive, hunting down the next revelation about phone and voicemail hacking, covert surveillance, police bribery and political corruption. That’s the story that’s been obsessing me since it began to break on Monday night, with the Guardian revelation that murder victim Milly Dowler’s phone had been hacked by News of the World journalists.

Today the final edition of the 168-year-old News of the World hit the stands, and 200 people woke up without jobs, thanks to the decision by News International on Thursday to close the paper.

Killing the News of the World, along with its many other possible benefits for Rupert Murdoch, is an attempt to grab control of the story back – or at least to dilute it. Suddenly, instead of dissecting past issues of the paper to look for more evidence of illegal (or at least immoral) behaviour, journalists are dissecting the final issue. Instead of debating the possible guilt of former editors, we find ourselves discussing the relative innocence of Colin Myler and his current staff. [Edit: see also Roy Greenslade’s look at the final edition.]

The gesture also attempts to make martyrs of the newspaper and its existing journalists. Suddenly it’s almost churlish to write furious diatribes about the past, when 200 forlorn journalist faces are staring out at you from the last ever newsroom photograph. The urge now is to eulogise, to sum up the paper’s 168-year life – and that turns the narrative away from exposing the illegal and immoral activities of years past and towards a gentler summation, one that lauds the good as well as discussing the bad.

It’s a hugely expensive and risky smokescreen to throw in front of a hungry set of journalists, but the result is still to change the terms of the narrative. The focus has shifted.

The political implications of this scandal are immensely complicated and far-reaching, but what I find most fascinating is the idea that the Murdoch empire had an interest in keeping politicians corrupt. If your power rests in part on your ability to unmask corruption – in selectively dishing dirt on those politicians who don’t do what you want – then in fact you have an incentive to ensure that there is a skeleton in everyone’s closet, and that you have the ability to expose it. You have a vested interest in building up the careers of celebrities whose secrets you can use to sell papers. The more corrupt the people at the top – the more dirty secrets you have on the most powerful politicians and policemen – the more control and power you wield.

Thanks to its 2.7m circulation and an estimated readership of about 8m, the News of the World was a kingmaker and a kingbreaker. But those readers won’t just disappear into the ether. The media landscape in the UK is undergoing seismic change, not just because of the newspaper’s closure and the potential damage to other News International titles, but also because we don’t know where those loyal tabloid readers will end up. Presumably a Sunday edition of the Sun would snap them up immediately – so long as it wasn’t dead in the water from the News of the World fallout. But it will be very interesting to see whether the other Sunday papers get a circulation bump in the wake of the death of the Screws, and whether the paper’s online readers migrate to other mainstream titles or disappear off to celebrity blogs and fragmented new media.

If the mass audience fragments, that could permanently reshape the hierarchy of power in this country in ways that are impossible to predict. We have already seen the power of the network in driving the story forwards. We have already seen a massive shift in power, with politicians openly attacking Rupert Murdoch, a man who seemed untouchable this time last week.

What happens next is anyone’s guess.

————————————–

Somewhere in the middle of all this, I start at the Guardian tomorrow as SEO Subeditor. I don’t know what next week holds but I’m immensely excited to be part of it – sad to leave Citywire, hugely so, but so excited.