Time vs the news

Jason Kint, in an interesting piece at Digiday, argues that page views are rubbish and we should use time-based metrics to measure online consumption.

Pageviews and clicks fuel everything that is wrong with a clicks-driven Web and advertising ecosystem. These metrics are perfectly suited to measure performance and direct-response-style conversion, but tactics to maximize them inversely correlate to great experiences and branding. If the goal is to measure true consumption of content, then the best measurement is represented by time. It’s hard to fake time as it requires consumer attention.

Some issues here. Time does not require attention: I can have several browser tabs open and also be making a cup of tea elsewhere. TV metrics have been plagued by the assumption that TV on === attentively watching, and it’s interesting to see that fallacy repeated on the web, where a branching pathway is as easy as ctrl+click to open in a new tab. It’s also easy to game time on site by simply forcing every external link to open in a new tab. That’s awful UX, but if the market moves to time as the primary measurement, in the way that ad impressions are currently used, I guarantee that trick will be widely used – along with others, like design gimmicks at bailout points and autorefresh, to stretch the measured visit out as long as possible. Time is just as game-able as a click.

It’s worth noting that Kint is invested in selling this vision of time-based metrics to the market. That doesn’t invalidate what he says out of hand, of course, but it is important to remember that if someone is trying to sell you a hammer they are unlikely to admit that you might also need a screwdriver.

In a conversation on Twitter yesterday Dave Wylie pointed me to a Breaking News post which discusses another time-based metric – time saved. It’s a recognition that most news consumers don’t actually want to spend half an hour clicking around your site: they want the piece of information they came for, and then they want to get on with their lives. Like Google, which used to focus on getting people through the site as fast as possible to what they needed. Or like the inverted pyramid of news writing, which focusses on giving you all the information you need at the very top of the piece, so if you decide you don’t need all the details you can leave fully informed.

There’s a truism in newsroom analytics: the newsier the day – and the more traffic you get from Google News and other breaking news sources – the less likely those readers are to click around. That doesn’t necessarily mean you’re failing those readers or that they’re leaving unsatisfied; it may in fact make them more likely to return later, if the Breaking News theory holds true for other newsrooms. Sometimes the best way to serve readers is by giving them less.

Medium’s reading time

“I think of competing for users’ attention as a zero-sum game. Thanks to hardware innovation, there is barely a moment left in the waking day that hasn’t been claimed by (in no particular order) books, social networks, TV, and games. It’s amazing that we have time for our jobs and families.

“There’s no shortage of hand-wringing around what exactly “engagement” means and how it might be measured — if it can be at all. Of course, it depends on the platform, and how you expect your users to spend their time on it.

“For content websites (e.g., the New York Times), you want people to read. And then come back, to read more.

“A matchmaking service (e.g., OkCupid) attempts to match partners. The number of successful matches should give you a pretty good sense of the health of the business.

“What about a site that combines both of these ideas? I sometimes characterize Medium as content matchmaking: we want people to write, and others to read, great posts. It’s two-sided: one can’t exist without the other. What is the core activity that connects the two sides? It’s reading. Readers don’t just view a page, or click an ad. They read.

“At Medium, we optimize for the time that people spend reading.”

- Medium’s metric that matters: Total Time Reading

Medium, as a magazine-style publisher(/platform/hybrid thing), wants a browsing experience in which every article is fully read and digested, and where each piece follows on serendipitously from the one before. News publishers don’t necessarily want that, or at least not across the board. For features the approach makes a lot of sense, but for news that’s geared towards getting the important facts across in the first paragraphs – even the first sentence – it’s fundamentally at odds with the writer’s goals. News that aims to be easy to read shouldn’t, and doesn’t, take a lot of time to consume. So generalist publishers have to balance metrics for success that are often in direct conflict. (This is one of many reasons why page views are actually pretty useful – with all the necessary caveats about not using stupid tricks to inflate them and then calling it success, of course.)

Newsrooms also have to use – buzzwordy as the phrase is – actionable metrics. It doesn’t matter what your numbers say if no one can use them to make better decisions. And newsrooms have something that Medium doesn’t: control over content. Medium doesn’t (for the most part) get to dictate what writers write, how it’s structured, the links it contains or the next piece that ought to follow on from it. So the questions it wants to answer with its metrics are different from those of editors in most newsrooms. Total time reading is most useful for news publishers in the hands of devs and designers, those who can change the furniture around the words in order to improve the reading experience and alter the structure of the site to improve stickiness and flow. Those are rarely editorial decisions.

The clue’s in the headline – it’s Medium’s metric that matters. Not necessarily anyone else’s.

Syria vs Cyrus: The Onion on CNN’s editorial judgment

Elegant, pointed rant in the Onion, courtesy of CNN’s decision to put Miley Cyrus’s VMAs appearance in the top slot of their site:

There was nothing, and I mean nothing, about that story that related to the important news of the day, the chronicling of significant human events, or the idea that journalism itself can be a force for positive change in the world. For Christ’s sake, there was an accompanying story with the headline “Miley’s Shocking Moves.” In fact, putting that story front and center was actually doing, if anything, a disservice to the public. And come to think of it, probably a disservice to the hundreds of thousands of people dying in Syria, those suffering from the current unrest in Egypt, or, hell, even people who just wanted to read about the 50th anniversary of Martin Luther King’s “I Have A Dream” speech.

But boy oh boy did it get us some web traffic.

The argument’s not one that purely applies to online news (see also: the recent Sun front page, or every Daily Express splash for quite some time). Appealing to a mass audience is one way of trying to keep generalist news organisations in business so they can also do the harder, less grabby, more worthy news – selling sweets to fund the broccoli business. There’s a strong argument too that people are interested in many things: I can care about Syria and Cyrus simultaneously, and if someone who comes for the twerking can be persuaded to stick around for the complex international news coverage then that can be a valid growth tactic for generalist outlets. But that doesn’t mean it’ll happen organically, without a strategy to convert the Miley fans into long-term readers. And it doesn’t mean that giving the two stories equivalent billing is wise, unless you’re making a Mail-style move towards a very specific editorial tone.

You don’t have to stop doing the fun stuff, the cheeky and irreverent things, the entertainment journalism, in order to be a serious news outlet – but packaging and context do still matter online. People who go to your front page – your most loyal readers – notice shifts in your editorial approach to these sorts of stories, and will judge you on these things. It’s a deliberate, conscious editorial decision to put twerking in the top slot or rosy cheeks on the front page. There’s definite sense in promoting what your readers most want to read, and in using data to guide editorial strategy. But the key word in that sentence is guide, not subvert or overrule. It’s not the traffic play the Onion’s really angry about – it’s the editorial strategy that underlies it.

10 things I learned from a web traffic spike

Look Robot wordpress stats

Last week, my other half wrote a rather amusing blog post about the Panasonic Toughpad press conference he went to in Munich. He published on Monday afternoon, and by the time he went out on Monday evening the post had had just over 600 views. I texted him to tell him when it passed 800, making it the best single day in his blog’s sporadic, year-long history.

Next day it hit 45,000 views, and broke our web hosting. Over 72 hours it got more than 100,000 views, garnered 120 comments, was syndicated on Gizmodo and brought Grant about 400 more followers on Twitter. Here’s what I learned.

1. Site speed matters

The biggest limit we faced during the real spike was CPU usage. We’re on Evohosting, which uses shared servers and allots a certain amount of usage per account. With about 180-210 concurrent visitors and 60-70 page views a minute, according to Google Analytics real-time stats, the site had slowed to a crawl and was taking about 20 seconds to respond.

WordPress is a great CMS, but it’s resource-heavy. Aside from single-serving static HTML sites, I was running Look Robot, this blog, Zombie LARP, and, when I checked, five other WordPress installations that were either test sites or dormant projects from the past and/or future. Some of them had caching on, some didn’t; Grant’s blog was one of the ones that didn’t.

So I fixed that. Excruciatingly slowly, of course, because everything took at least 20 seconds to load. Deleting five WordPress sites, deactivating about 15 or 20 non-essential plugins, and installing WP Super Cache sped things up to a load time between 7 and 10 seconds – still not ideal, but much better. The number of concurrent visitors on site jumped up to 350-400, at 120-140 page views a minute – no new incoming links, just more people bothering to wait until the site finished loading.
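(Incidentally, if you ever need to check load times properly rather than judging by how sluggish the admin screens feel, a few lines of Python will do it. This is only a sketch of the sort of thing you might run before and after enabling a caching plugin – the URL and run count are placeholders, and it only times the HTML response, not the images and scripts that load afterwards.)

```python
# Rough response-time check: time a few full-page fetches and report the spread.
# The URL is a placeholder; swap in the page you want to test.
import time
import urllib.request

URL = "https://example.com/"  # placeholder
RUNS = 5

timings = []
for _ in range(RUNS):
    start = time.perf_counter()
    with urllib.request.urlopen(URL) as response:
        response.read()  # fetch the whole HTML body, not just the headers
    timings.append(time.perf_counter() - start)

print(f"average over {RUNS} runs: {sum(timings) / RUNS:.2f}s")
print(f"slowest: {max(timings):.2f}s, fastest: {min(timings):.2f}s")
```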

2. Do your site maintenance before the massive traffic spike happens, not during

Should be obvious, really.

3. Things go viral in lots of places at once

Grant’s post started out on Twitter, but spread pretty quickly to Facebook off the back of people’s tweets. From there it went to Hacker News (where it didn’t do well), then Metafilter (where it did), then Reddit, then Fark, at the same time as sprouting lots of smaller referrers, mostly tech aggregators and forums. The big spike of traffic hit when it was doing well from Metafilter, Fark and Reddit simultaneously. Interestingly, the Fark spike seemed to have the longest half-life, with Metafilter traffic dropping off more quickly and Reddit more quickly still.

4. It’s easy to focus on activity you can see, and miss activity you can’t

Initially we were watching Twitter pretty closely, because we could see Grant’s tweet going viral. Being able to leave a tab open with a live search for the link meant we could watch it spread from person to person. Tweeters with large follower counts tended to post the link themselves rather than retweeting, and often did so without attribution, making it hard to work out how and where they’d come across it. But it was possible to track back individual tweets based on the referrer string, thanks to the t.co URL wrapper. From some quick and dirty maths, it looks to me like the more followers you have, the smaller the click-through rate on your tweets – but the greater the likelihood of retweets, for obvious reasons.
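For what it’s worth, the quick and dirty maths looks something like this: tally clicks per t.co referrer from your analytics or access logs, note the follower count of whoever posted each wrapped link, and divide. The sketch below uses made-up numbers purely to show the shape of the calculation.

```python
# Hypothetical figures: clicks recorded per t.co referrer, plus the follower
# count of the account that posted each wrapped link. Real numbers would come
# from an analytics tool or the server logs.
clicks_by_referrer = {
    "https://t.co/abc123": 900,
    "https://t.co/def456": 350,
    "https://t.co/ghi789": 40,
}
followers_by_referrer = {
    "https://t.co/abc123": 250_000,
    "https://t.co/def456": 12_000,
    "https://t.co/ghi789": 800,
}

for referrer, clicks in sorted(clicks_by_referrer.items(),
                               key=lambda item: item[1], reverse=True):
    followers = followers_by_referrer[referrer]
    ctr = clicks / followers  # clicks per follower: a crude click-through rate
    print(f"{referrer}: {clicks} clicks from {followers} followers, CTR {ctr:.2%}")
```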

Around midday, Facebook overtook Twitter as a direct referrer. We’d not been looking at Facebook at all. Compared to Twitter and Reddit, Facebook is a bit of a black box when it comes to analytics. Tonnes of traffic was coming in, but who from? I still haven’t been able to find out.

5. The more popular an article is, the higher the bounce rate

This doesn’t *always* hold true, though I can’t personally think of a time when I’ve seen the opposite. Reddit in particular is a very high-bounce referrer by its nature, and news as a category tends to see very high bounce rates, especially from article pages; but it does seem to hold that the more popular something is, the more likely people are to leave without reading further. Look, Robot’s bounce rate went from about 58% across the site to 94% overall in 24 hours.

My feeling is that this is down to the ways people come across links. Directed searching for information is one way: that’s fairly high-bounce, because a reader hits your site and either finds what they’re looking for or doesn’t. Second clicks are tricky to get. Then there’s social traffic, where a click tends to come in the form of a diversion from an existing path: people are reading Twitter, or Facebook, or Metafilter, they click to see what people are talking about, then they go straight back to what they were doing. Getting people to break that path and browse your site instead – distracting them, in effect – is a very, very difficult thing to do.
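For clarity about what that 58-to-94 jump actually measures: bounce rate is just single-page sessions divided by total sessions. A toy calculation with invented session counts shows how a flood of one-and-done social visitors drags the sitewide figure up.

```python
# Bounce rate = single-page sessions / total sessions.
def bounce_rate(single_page_sessions: int, total_sessions: int) -> float:
    return single_page_sessions / total_sessions

# Invented session counts: a normal day versus a day dominated by one viral post.
normal_day = bounce_rate(single_page_sessions=580, total_sessions=1_000)
spike_day = bounce_rate(single_page_sessions=47_000, total_sessions=50_000)

print(f"normal day: {normal_day:.0%}")  # ~58%
print(f"spike day:  {spike_day:.0%}")   # ~94%
```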

Look Robot referrals

The head of a rather long tail.

6. Fark leaves a shadow 

Fark’s an odd one – not a site that features frequently in roundups of traffic drivers, but it can still be a big referrer to unusual, funny or plain daft content. It works like a sort of edited Reddit – registered users submit links, and editors decide what goes on the front page. Paying subscribers to the site can see everything that’s submitted, not just the edited front. I realised before it happened that Grant was about to get a link from their Geek front, when the referrer total.fark.com/greenlit started to show up in incoming traffic – that URL, behind a paywall, is the place where links that have been OKed are queued to go on the fronts.
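If you wanted to watch for the same tell, a few lines scanning your recent referrers for that prefix would do it – a sketch, assuming you already have referrer strings to hand from your logs or analytics export:

```python
# Flag referrers suggesting a Fark link has been greenlit and is queued for the
# front pages. The referrer list here is purely illustrative.
GREENLIT_PREFIX = "total.fark.com/greenlit"

recent_referrers = [
    "https://www.google.com/",
    "http://total.fark.com/greenlit/12345",
    "https://t.co/abc123",
]

for ref in recent_referrers:
    bare = ref.split("://", 1)[-1]  # strip the scheme so http and https both match
    if bare.startswith(GREENLIT_PREFIX):
        print(f"Heads up: a Fark front-page link looks imminent ({ref})")
```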

7. The front page of Digg is a sparsely populated place these days

I know that Grant’s post sat on the front page of Digg for at least eight hours. In total, it got just over 1,000 referrals. By contrast, the post didn’t make it to the front page of Reddit, but racked up more than 20,000 hits mostly from r/technology.

8. Forums are everywhere

I am always astonished at the vast plethora of niche-interest forums on the internet, and the amount of traffic they get. Much like email, they’re not particularly sexy – no one is going to write excitable screeds about how forums are the next Twitter or how exciting phpBB technology is – but millions of people use them every day. They’re not often classified as ‘social’ referrers by analytics tools, despite their nature, because identifying what’s a forum and what’s not is a pretty tricky task. But they’re everywhere, and while most only have a few users, in aggregate they work to drive a surprising amount of traffic.

Grant’s post got picked up on forums on Bad Science, RPG.net, Something Awful, the Motley Fool, a Habbo forum, Quarter to Three, XKCD and a double handful of more obscure and fascinating places. As with most long tail phenomena, each one individually isn’t a huge referrer, but the collection gets to be surprisingly big.

9. Timing is everything…

It’s hard to say what would have happened if that piece had gone up this week instead, but I don’t think it would have had the traffic it did. Grant’s post struck a chord – the ludicrous nature of tech events – and tapped into post-CES ennui and the utter daftness that was the Qualcomm keynote this year.

10. …but anything can go viral

Last year I was on a games journalism panel at the Guardian, and I suggested that it was a good idea for aspiring journalists to write on their own sites as though they were already writing for the people they wanted to be their audience. I said something along the lines of: you never know who’s going to pick it up. You never know how far something you put online is going to travel. You never know: one thing you write might take off and put you under the noses of the people you want to give you a job. It’s terrifying, because anything you write could explode – and it’s hugely exciting, too.

Why liveblogs almost certainly don’t outperform articles by 300%

In response to this study, linked to by journalism.co.uk among many others.

  1. The sample size is 28 pieces of content across 7 news stories – that content includes liveblogs, articles and picture galleries. That’s a startlingly small number for a sample which is meant to be representative.
  2. The study does not look at how these stories were promoted, or whether they were running stories (suited to live coverage), reaction blogs, or other things.
  3. The traffic sample is limited to news stories, and does not include sports, entertainment or other areas where liveblogs may be used, and that may have different traffic profiles.
  4. The study compares liveblogs, which often take a significant amount of time and editorial resource, with individual articles and picture galleries, some of which may take much less time and resource. If a writer can create four articles in the time it takes to create a liveblog, then the better comparison is between a liveblog and the equivalent amount of individual, stand-alone pieces.
  5. The study is limited to the Guardian. There’s no way to compare the numbers with other publications that might treat their live coverage differently, so no way to draw conclusions on how much of the traffic is due to the way the Guardian specifically handles liveblogs.
  6. The 300% figure refers to pageviews. Leaving aside the fact that this is not necessarily the best metric for editorial success, the Guardian’s liveblogs autorefresh, inflating the pageview figure for liveblogs. (A rough worked example of points 4 and 6 follows this list.)
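To make points 4 and 6 concrete, here’s a rough worked example – the numbers are entirely invented, and are there purely to show the shape of the calculation: discount the liveblog’s pageviews for autorefresh, then compare pageviews per hour of editorial effort rather than per piece.

```python
# Entirely hypothetical numbers, purely to illustrate the shape of the argument.
article_pageviews = 10_000    # one stand-alone article
liveblog_pageviews = 40_000   # raw figure: "300% better" than the article
article_hours = 2             # writer time for one article
liveblog_hours = 8            # writer time for the liveblog (roughly four articles' worth)

# Point 6: autorefresh means one reader can register several pageviews.
# If the average liveblog visit triggers, say, 2.5 page loads, discount accordingly.
refreshes_per_visit = 2.5
adjusted_liveblog_pageviews = liveblog_pageviews / refreshes_per_visit

# Point 4: compare output per hour of editorial effort, not per piece.
article_pv_per_hour = article_pageviews / article_hours
liveblog_pv_per_hour = adjusted_liveblog_pageviews / liveblog_hours

print(f"article:  {article_pv_per_hour:,.0f} pageviews per hour of effort")
print(f"liveblog: {liveblog_pv_per_hour:,.0f} pageviews per hour of effort")
```

On those made-up figures, the liveblog that looks four times as successful per piece actually delivers fewer pageviews per hour of effort – which is exactly why points 4 and 6 matter before accepting the headline number.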

All that shouldn’t diminish the study’s other findings, and of course it doesn’t mean that the headline figure is necessarily wrong. But I would take it with a hefty pinch of salt.

Requesting politely to stay in the dark will not serve journalism


At Salon, Richard constantly analyzed revenue per thousand page views vs. cost per thousand page views, unit by unit, story by story, author by author, and section by section. People didn’t want to look at this data because they were afraid that unprofitable pieces would be cut. It was the same pushback years before with basic traffic data. People in the newsroom didn’t want to consult it because they assumed you’d end up writing entirely for SEO. But this argument assumes that when we get data, we dispense with our wisdom. It doesn’t work that way. You can continue producing the important but unprofitable pieces, but as a business, you need to know what’s happening out there. Requesting politely to stay in the dark will not serve journalism.

- from Matt Stempeck’s liveblog of Richard Gingras’s Nieman Foundation speech

Junk data: why we still have no idea what the DfT’s most popular websites are

A couple of stories in the Telegraph and Daily Mail this week have hailed data released by the Department for Transport about the websites visited most often by staff at the department.

But if you look a little more closely at the raw data, it quickly becomes clear that these figures are being badly misrepresented by the newspapers involved. There’s a very important note on the last page of the data PDF (fascinatingly, missing from the Mail’s repost). It says:

Note : “number of hits” includes multiple components (e.g. text, images, videos), each of which are counted.

The difference between page views, visits and hits in web analytics is fairly important. Page views count the individual pages on a site that have been viewed; visits count the separate browsing sessions that have occurred; and hits count the individual files requested by the browser.

An individual page view can include dozens, or even hundreds, of hits. A single page view of the Telegraph front page, for instance, includes at least 18 hits in the header of the page alone – before we get to any images or ads – and around another 40 image files are called on top of that. It’s fair to suggest you could rack up the hits very quickly on most news websites, whereas very simple, single-purpose sites might register 10 or fewer per pageview.

It’s also important to note that if a website serves files from other sites – such as advertisements, or tracking code – those sites will register hits despite never actually being seen by the person doing the browsing.
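Put those two things together and the numbers start to make sense. Here’s a toy tally of the requests behind a single page view, grouped by the domain they’re served from – the resource list is invented, but the shape is typical of a news front page:

```python
from collections import Counter
from urllib.parse import urlparse

# An invented resource list for one page view of a typical news front page:
# the HTML itself, then stylesheets, scripts, images, ads and tracking code.
requests_for_one_pageview = (
    ["https://www.example-news.com/index.html"]
    + [f"https://www.example-news.com/css/style{i}.css" for i in range(4)]
    + [f"https://www.example-news.com/js/script{i}.js" for i in range(14)]
    + [f"https://www.example-news.com/img/photo{i}.jpg" for i in range(40)]
    + ["https://www.google-analytics.com/ga.js"]
    + [f"https://ads.example-adnetwork.com/ad{i}.js" for i in range(6)]
)

hits_by_domain = Counter(urlparse(url).netloc for url in requests_for_one_pageview)

print(f"1 page view = {len(requests_for_one_pageview)} hits")
for domain, hits in hits_by_domain.most_common():
    print(f"  {domain}: {hits} hits")
```

One visit to one page, and the tracking and ad domains have already racked up hits of their own.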

That explains why the second “most popular” site on the list is www.google-analytics.com – a domain that is impossible to visit, but which serves incredibly popular tracking code on millions of other websites. It’s probably safe to conjecture that it also explains the presence of other abnormalities – for instance, stats.bbc.co.uk, static.bbc.co.uk, news.bbcimg.co.uk, and cdnedge.bbc.co.uk, all in the top 10 and all impossible to actually visit. There are two IP addresses in the top 11 “most popular” sites, too.

As David Higgerson points out (in comments), there are some interesting patterns in the data.  But unless you know the number of hits per page, at the time the pages were viewed, as well as which ads were served from which other sites at the time, any straight comparison of the figures is meaningless. And the data itself is so noisy that any conclusions are dubious at best.

We can say that the BBC website is certainly popular, that the Bears Faction Lorien Trust LARP site probably got more visits than you might expect, and that civil servants do seem to like their news. Beyond that, the Mail’s claims of “cyberslacking”, of gambling sites (common advertisers) being popular and of there being six separate BBC sites in the top 10 are at best unsupported and at worst downright misleading.