Newsrooms still tend to be weird about data. Not external data, that comes from governments or companies as part of a news story – we’re getting much better at that. But internal data, the numbers we generate ourselves. Web stats tend to be left to teams who use them mostly for commercial stuff. They exist as Big Numbers that newsrooms want to increase but don’t really understand why or how to do so. Editorial people who want to know their data tend to be given the numbers that are useful for commercial or sales teams, because those are the ones that matter to the business’s bottom line; they don’t tend to get much training in how to dig into the numbers themselves, or in analysing and applying that data to their own domain. Which is, to be honest, a bit of a shame.
Stijn Debrouwere recently wrote an excellent talk about cargo cult analytics – about the problem of newsrooms assembling data packages that look great, but that don’t bring anyone any closer to actually acting on the data in useful ways.
The most frustrating thing about this situation is that most newsrooms are stuffed with people who spend all day approaching things with curiosity, interest and an eye towards discovering something of use. Newsrooms today tend to include data journalists, people with at least a basic stats background, people who know the internet incredibly well, people who are excellent at sifting information and then distilling it down to the bits that matter. But those skills aren’t always applied to the data news organisations themselves produce.
So this is a basic guide for journalists who want to get started on web stats – not for the technical skills you need, but the sorts of questions to ask, ways to approach the problems involved.
Which numbers matter?
This is probably the most important question, and one most companies have a very hard time answering. It’s always tempting simply to look at the top line – the biggest collection of data you can find – perhaps split by referrer or by section if you’re working on a large site. But the top line isn’t generally very helpful, unless you know what it’s made from and how much of it is actually useful traffic. The definition of ‘useful’ varies depending on what you’re trying to do – useful for an editor and useful for an advertiser are not the same thing. Useful in live stats and useful in long term analysis are not the same.
The numbers that matter aren’t going to be the biggest ones you can find, unsegmented page views or monthly unique users. Those hide a multitude of smaller and much more interesting numbers. Numbers with a financial value attached; numbers that are too small, or too big, or unbalanced in some way. Numbers that are interesting in part because they’re not what you want them to be. Numbers you can work to change. Those are the interesting ones.
Commissioning editor? Try Stijn’s monthly active users to daily active users ratio. Front page editor? Try loyal bounces – people who hit your front page for a second or third time in a day but only view one page before leaving. Journalist? Try daily referrals from your Twitter account.
There are dozens of interesting numbers, when you start looking. Try non-registered users who view more than three pages per visit. Try email traffic. Try people who didn’t come back last month. (That one will almost certainly upset you.)
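For anyone who wants to see what two of those smaller numbers look like in practice, here is a rough Python sketch. Everything in it is an assumption made for illustration: the shape of the visit log, the sample rows, the function names, and the exact definitions. In particular, the ratio here is taken as monthly active users divided by the average number of daily active users on days with any traffic, and a loyal bounce is taken as a one-page visit landing on the front page when that user has already landed there earlier the same day. Your analytics tool may define both differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit log, in time order:
# (user_id, visit_date, landing_page, pages_viewed)
VISITS = [
    ("a", date(2013, 9, 1), "/", 1),
    ("a", date(2013, 9, 1), "/", 1),
    ("a", date(2013, 9, 2), "/news/x", 3),
    ("b", date(2013, 9, 1), "/", 4),
    ("b", date(2013, 9, 1), "/", 1),
    ("c", date(2013, 9, 3), "/news/y", 1),
]


def mau_to_dau_ratio(visits):
    """Monthly active users divided by average daily active users.

    Assumes `visits` is non-empty and covers a single month. Days with
    no visits at all are left out of the daily average.
    """
    users_by_day = defaultdict(set)
    for user, day, _landing, _pages in visits:
        users_by_day[day].add(user)
    monthly_active = len(set().union(*users_by_day.values()))
    average_daily = sum(len(u) for u in users_by_day.values()) / len(users_by_day)
    return monthly_active / average_daily


def loyal_bounces(visits, front_page="/"):
    """Count one-page front page visits that are a repeat landing that day.

    Assumes `visits` is sorted by time.
    """
    front_page_landings = defaultdict(int)
    count = 0
    for user, day, landing, pages in visits:
        if landing != front_page:
            continue
        front_page_landings[(user, day)] += 1
        if pages == 1 and front_page_landings[(user, day)] >= 2:
            count += 1
    return count


print(mau_to_dau_ratio(VISITS))
print(loyal_bounces(VISITS))
```

The point is less the code than the habit: write down the definition of the number first, then check that the figure your dashboard reports actually matches it.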
What’s normal?
You won’t know whether you’re seeing something new or unusual until you understand what normal is. What does an average day look like? What shapes and patterns would you expect to see in the data? Is it normal to have 25% of your users on mobile, or 10% from Belgium, or are those surprisingly high or low? And what about other comparable sites – is your data broadly similar to your peers or are there unexpected elements that make you special?
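One rough way to put a number on "normal" is to take a run of recent days, work out the average and the spread, and flag anything that sits well outside it. The sketch below does that in Python. The daily mobile percentages are made up, the function names are hypothetical, and the two-standard-deviations threshold is a common rule of thumb rather than anything specific to web stats.

```python
from statistics import mean, stdev

# Hypothetical share of visits from mobile devices (percent), one value per day.
MOBILE_SHARE = [24.0, 25.5, 26.0, 24.5, 25.0, 25.5, 24.5]


def baseline(values):
    """Return (mean, sample standard deviation) for a run of daily values."""
    return mean(values), stdev(values)


def is_unusual(value, values, threshold=2.0):
    """True if `value` is more than `threshold` standard deviations from the mean."""
    centre, spread = baseline(values)
    return abs(value - centre) > threshold * spread


print(baseline(MOBILE_SHARE))
print(is_unusual(31.0, MOBILE_SHARE))
print(is_unusual(25.5, MOBILE_SHARE))
```

A week of data is only enough to illustrate the idea. A baseline you can trust needs enough history to cover weekday and weekend patterns, and ideally the same period in previous years.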
Why’s normal?
This is a bit more difficult to work out, but it’s worth trying to understand. Why is the status quo the way it is? Why do you see spikes in traffic at 9am, midday, and midnight on weekdays? Why do you get most of your traffic from Google, or Facebook, or a suite of obscure cycling forums? This is your current audience: understanding why it operates the way it does, the context and the wider web in which your data fits, will help you understand which levers you can pull to change it.
What questions can I ask?
Just being in possession of a bunch of numbers isn’t that interesting. What matters is how you can use them. This is pre-interview prep work, and a basic data journalism skill. Which questions can you ask that are going to get you a story? How and why are more interesting than what and when, on the whole, but also much harder to answer. Knowing that something’s unusual, or knowing a trend exists, is less interesting than understanding why that is, whether it’s a good or bad thing, and how it might be changed.
What’s the context?
Contextualising data helps you understand it, and identify potentially-useful patterns. Seeing weird Google-related activity around a particular piece on a celebrity at certain times of day or week? Check the TV guide. I remember once discovering that a four-year-old article about young entrepreneurs was getting unexpected fresh traffic because it ranked well for the name of a man who featured in a Google Chrome TV ad – in which he Googles his own name. Every time the ad aired, we had spikes of people copying him and clicking through. That’s a great opportunity to make sure that piece has good related links, and acts as a decent first page for someone hitting the site.
Think about seasonality, and geography. Seasonal traffic changes can be vital context, and so can the fact that seasons aren’t the same all over the world. Behaviour online isn’t divorced from behaviour offline, and TV in particular influences it far more than might be obvious (see also: Miley Cyrus). Knowing the context for a number helps you identify causes and work out whether something is a fluke, something you can influence, or something you can take advantage of.
Grind fine and experiment
Seeing behaviour on your site in real time is genuinely useful – if you have a newsroom setup that lets you react to it. There’s no point having a dataset that you can’t, or won’t, use. Watching a big number on a graph climb is great for morale, and watching it sink is hard work; but neither requires much response in most newsrooms aside from checking your watch to see if you’re on the way to a daily spike or a daily trough. Watching a small subset of the data change because of something you did, though, is immensely powerful.
There are not many tools out there right now that let you dig deep enough to be able to understand what effects you’re having in real time. Chartbeat’s dashboard is, for all its prettiness, built around the big number; real time Google Analytics is built for watching, and not really built for publishers, though you can hack it with advanced segments to make it much more useful.
If you’re lucky enough to have a tool that lets you go deep, don’t just use it to look at the big number at the top. Go for the detail. Grind fine. Experiment at small scale and document your results. And then spread out your discoveries to the rest of the newsroom. Use the numbers to change practice. Far easier said than done.
This post was written with help from the folks at Help Me Write. If you want to suggest post ideas or encourage me to blog more frequently about stuff you’re interested in, I’d be massively grateful.