Good metrics vs bad measurement

My former colleague Chris Moran has lots of sensible things to say about what makes a good metric, as do the many people he’s enlisted in that linked post to talk about the characteristics they value in measurement. I wanted to build on one of them: the capacity for people to actually use the metric to change something.

Functionally, plenty of things aren’t easy to measure. Some are almost impossible — much as Chris says, I have lost count of the blisteringly smart people trying to work out how to measure things like quality or impact when it comes to journalism. Anything that involves qualitative surveys is probably too high cost for a small project. Anything that requires you to implement completely new analytics software is unlikely to be valuable unless it’s genuinely transformative (and even then, you risk the business equivalent of redesigning your revision timetable rather than actually revising). Anything that relies on people giving an unbiased assessment of the value of their work — like asking editors to assign an “importance” score to a story, say, or Google’s now-defunct “standout” meta tag — is doomed to failure, because an individual can’t accurately assess the relative nature of their work in the context of the whole system. Key point from Chris’s post: if you were going to game your measure, how would you game it? Do you trust everyone involved to act purely in the interests of good data, even when that gets in the way of their own self-interest?

In one team I managed, I once ran an OKR that focused on making sure we were known and appropriately involved as internal experts by the rest of the organisation. We discussed how to measure that, and ended up deciding that we’d know if we were succeeding based on the number of surprises that happened to us in a week. We were accustomed to finding out about projects too late for us to really be helpful — and, to a lesser extent, we were finding that our work sometimes surprised other people who’d benefit from being involved earlier on.

How do you measure surprises? We could have spent weeks working that one out. But for the sake of just getting on with it, we built a Google form with three inputs: what’s the date, who was surprised, who did the surprising. Team leads took the responsibility of filling in the form when it happened. That’s all you really need in order to know roughly what’s going on, and in order to track the trajectory of a metric like that. But because we measured it — really, honestly, mostly because we talked about it every week as a measure of whether we were doing our best work, and that led to thinking about how we could change it, which led to action — it improved.
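
If a Google form feels like too much ceremony, the same tracking fits in a few lines of code. Here’s a minimal sketch of the idea; the file name, field layout and weekly roll-up are my own illustration, not what we actually built:

```python
import csv
from collections import Counter
from datetime import date

# Hypothetical log mirroring the three form inputs:
# the date, who was surprised, who did the surprising.
SURPRISE_LOG = "surprises.csv"

def log_surprise(surprised, surpriser):
    """Append one surprise; a team lead calls this whenever it happens."""
    with open(SURPRISE_LOG, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), surprised, surpriser])

def weekly_counts():
    """Roll the log up by ISO week, so the trajectory is visible over time."""
    counts = Counter()
    with open(SURPRISE_LOG) as f:
        for day, _surprised, _surpriser in csv.reader(f):
            year, week, _ = date.fromisoformat(day).isocalendar()
            counts[f"{year}-W{week:02d}"] += 1
    return counts

log_surprise("our team", "the homepage project")
for week, n in sorted(weekly_counts().items()):
    print(week, n)
```

The tooling is beside the point, though: three fields and a weekly count are enough to see whether the number is moving, and the movement is what you talk about.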

Conversely, if you don’t care about something you measure, it’s almost certainly not going to change at all. If you spend enormous organisational energy and effort agreeing and creating a single unified metric for loyalty, say, but then you don’t mention it in any meetings or use it to define success or make any decisions about your products or your output… why bother measuring it at all? Data in isolation is just noise. What matters is what you use it for.

So if you’re going to actually make decisions about quality, or impact, or loyalty, or surprises, the key isn’t to spend ages defining a perfect metric. It’s getting 80% of the way there with as little effort as you can pull off, and then doing the work. It means working out what information you (or your teams, or your editors, or your leaders) don’t have right now but need in order to make those decisions. Then finding a reasonable, rational, most-of-the-way-there metric you can use that unblocks those decisions. Eventually you might find you need a better measure because the granularity or the complexity of the decisions has changed. But you might equally find that you don’t really need anything other than your first sketch, because the real value is in the conversations it prompts and the change in the output that happens as a result. Precision tends to be what data scientists naturally want to prioritise, but it’s usually missing the point.

Crossposted to Medium because there’s traffic and discussion over there.

13 things I learned from six years at the Guardian

… in which I went from SEO subeditor to executive editor for audience, via Sydney and New York.

This post is cross-posted from Medium for archival purposes.

I started at the Guardian in 2011 as an SEO subeditor, working out how to bring the Guardian’s journalism to the widest possible relevant audience; in 2013 I moved to Australia to launch the local edition, taking on a much broader audience development role. After nearly two years there, having built one of the most widely read news sites in the country, I moved to New York to do the same thing but with more resources (and a lot more news). Towards the end of 2015 I came home to take on the global challenge and bring a holistic approach to audience development to the broader Guardian. Now I’ve made the difficult decision to move on, and I’m leaving behind a brilliant set of people well equipped to take on the challenges of the future.

I’ve learned an enormous amount during my time and my travels, and I hope I’ve taught some people some useful things too. Here are 13 of the most important things I know now that I didn’t know six years ago.

1. Data isn’t magic, it’s what you do with it that counts.

There’s a tendency for news organisations (and a lot of other organisations) to get very excited and very suspicious around numbers. People who understand how linear regression works are clearly dangerous wizards, and getting involved with data at all used to be seen as something dirty — something that could taint you. This is patently daft, because numbers don’t remove people’s brains or their editorial sensibilities. We make better decisions when we’re better informed, and all data is, is information.
The flip side of that is that data isn’t sufficient to make improvements in how we work or what we do. The only thing that matters is the decisions we take in response to the numbers. I’ve been lucky to be involved in the development of Ophan, the Guardian’s in-house live stats tool, and the most common misconception about it is that it’s just a data display. It’s never been that: it’s a cultural change tool. It’s not just about putting numbers into the hands of editorial people — it’s explicitly about getting them to change the way they make decisions, and to make them better. It’s a tool for enhancing journalistic instinct, and one of the reasons why we can be so cavalier about demonstrating it everywhere is that the commercial advantage it brings is not written on the screen. The advantage is in how we use it, and that’s a years-long project no other organisation will be able to imitate.

2. People are more important than stories.

You’d think this wasn’t controversial, but it is. We journalists have a tendency to work ourselves into the ground, to ignore our own needs and to push ourselves incredibly hard to get stories. That’s part and parcel of the job, a lot of the time.
But if you’re a manager or an editor (or, more likely, both), you have to watch out for that tendency in others and in yourself. Good people who go above and beyond what’s asked of them for a story are worth protecting and supporting, and they are probably going to need some time to recover after massive events that take a lot out of them. They need to be able to take time out without feeling on edge about a story breaking that they might miss. Nothing is served by letting the best people burn out. Nothing is served by burning out yourself.

3. Management is a technology.

Management style is built, not intuited; it is actively and deliberately created, not naturally occurring. It is a technology, something that can be improved to make organisations more efficient or better, and that can be implemented in many different ways.

Making all managers within an organisation work out what management ought to be like for themselves is about as efficient as making every journalist design their own CMS. News organisations — especially on the editorial side — tend to have a healthy scepticism about management-speak and corporate bullshit, but that can’t be allowed to stand in the way of solid leadership approaches that can be universally understood and adopted.

4. Change is for everyone.

The news business has changed immeasurably in just the last decade, since I started. For those who started as journalists before the internet took hold, it can be almost unrecognisable. Change is constant, and innovation never ceases; there is a dramatic urgency about most news organisations’ efforts to change, and those on the cutting edge are often incredibly impatient for others to get on with it.

But if you find yourself thinking about how much everything needs to change, stop for a moment and look inwards at yourself. Chances are that you’re right — that everything does need to change, and that the folks around you are changing more slowly than you are. But that doesn’t mean that you don’t need to do your own work. You can’t always hurry things along, but often you can model the impact of those changes in your own way. Whether that’s altering your own newsgathering practices, implementing different techniques in your own team, or going out and getting the skills you think you might need tomorrow — you can probably make a bigger difference than you realise by working on yourself, not just the people around you.

5. Attention is the only thing that’s scarce on the internet.

You can get more of everything online except human attention. If you’re lucky enough to work in a business that aims to attract people’s attention for positive reasons — and good enough at what you do to succeed at it — then treat it with respect. The most important commodity most people have to spend online is their attention. If you want to gain their trust, don’t screw about with it.

6. Pivoting to video is not a strategy.

Video isn’t a strategy. “More video” isn’t a strategy. “More video with more video ads on it”: also not a strategy. What kinds of stories are you going to tell? Do people actually want those stories in that format? How are you going to reach people, how are you differentiating your work from all the other things on the internet, and why should anyone trust you in a market so crowded with terrible, useless video right now? Stop pivoting, start planning.

7. Platforms are not strategies, and they won’t save news.

Seriously. If someone else’s algorithm change could kill your traffic and/or your business model, then you’re already dead. Google and Facebook are never going to subsidise news providers directly, and nor should they. Stop waiting for someone to make it go back to the way it was before. If what you do is essential to your audience, so essential that their lives wouldn’t be the same without it, then you should be able to monetise that. If it’s not, your first priority should be to admit that and then get on with changing it.

8. Quality journalism can be a strategy.

Making good stuff that people want to read — or watch — is a valid strategy, if it also includes monetising that attention effectively. So is choosing which platforms to focus on based on where your intended audience is and what you can do with them there. Good journalism — especially good reportage — gives people something important for which there is no substitute. (So does good entertainment, of course.) Many people value it enormously and, if you’re known for providing it, they’ll come to expect it and trust you more as a result. There’s no law that says people will only read celebrity news or stuff you’ve nicked off the front page of Reddit.

The vast majority of the Guardian’s most read pieces of all time are high quality journalism on serious topics. Many of them are live blogs of breaking news. I remember very fondly launching a 7,000-word piece by the former prime minister of Australia at 10am on a Saturday, when the internet is basically empty, and watching it smash our local traffic records. I remember the day when a piece about the death of capitalism went viral. Not every big hit is a long read or a deeply serious bit of journalism, of course, but if you write for the audience you want, and you respect people’s attention and intelligence, you might be pleasantly surprised by the long term results.

9. The internet is made of humans.

You can’t predict the future, or know which scientific innovations might become dramatically important in the coming decades. You can maybe make some educated guesses about the next 18 months, but even that could be thrown out of the window by a major news event or a Zuckerbergian whim.

You can, however, understand a great deal about human motivations and behaviour, and filter your approach to new technologies based on what you know about people. A great deal of the work involved in predicting the future is really just understanding people and systems, and especially systems made up of people.

10. It’s often better to improve a system than develop one brilliant thing.

Making systems better is not particularly sexy work. It tends to be incremental, slow and messy, taking knotty problems and carefully unknitting them. In the time it takes to make a widely-used system very slightly better, you could probably make half a dozen gorgeous one-off pieces of journalism that the world would love.

But if you make the system better, you potentially make lots of people’s jobs easier, or you save dozens of person-hours in a month, or you make hundreds of pieces of journalism work slightly more effectively. It’s not flashy, and probably most people won’t even be aware of what you’ve done. Most organisations need people doing both, because without the brilliant beautiful one-off pieces, how would you know what the system needs to be able to do in the long run? But people who do the flashy things are plentiful, and people willing and able to graft on the stuff that just incrementally makes things better are in sadly short supply.

11. Radical transparency helps people work with complexity.

In a fast-moving environment where everything is constantly changing (eg: the internet, the news, and/or social media) you have no way of knowing what someone else might need to know in order to do their job well. The only way to deal with this is to be a conduit for information, and not bottle anything up or hide it unless it’s genuinely confidential. I can’t possibly know what information I come across might turn out to be helpful in a few months’ time, and I certainly can’t make that judgment on anyone else’s behalf. People often need different data in order to get context for what they’re trying to achieve, and if you’re trying to communicate a specific message or a particular approach, you’re going to need to keep saying it over and over again. It’s basically impossible to communicate too much.

12. Most obvious dichotomies are false.

SEO isn’t dead; social isn’t pointless. Loyalty and reach both matter. Lifestyle journalism can exist alongside serious pieces. In fact, in each case, the two apparent sides of the argument are interrelated in hugely positive ways, and elements of each will support the other. While we always need to be careful about what we prioritise and where we spend resources, it’s always worth thinking about the systemic ways that behaviours can reinforce each other and finding opportunities to efficiently do more than one thing.

13. What you say matters far less than what you do.

This should be obvious, but it probably isn’t. It doesn’t matter what you say you want, it’s what you do to make it happen that makes a difference in the world. You have so much power right now. It’s up to you to do something meaningful with it.

The New York Times package mapper

From Nieman Lab, an interesting look at how the NYT maps traffic between stories, and analyses why and how things are providing onward traffic or causing people to click away from the site.

One example has been in our coverage of big news events, which we tend to blanket with all of the tools at our disposal: articles (both newsy and analytical) as well as a flurry of liveblogs, slideshows, interactive features, and video. But we can’t assume that readers will actually consume everything we produce. In fact, when we looked at how many readers actually visited more than a single page of related content during breaking news the numbers were much lower than we’d anticipated. Most visitors read only one thing.

This tool’s been used to make some decisions and change stories, individually, to improve performance in real time. That’s the acid test of tools like this – do they actually get used?

But the team that uses it is the data team, not the editorial team – yet. Getting editors to use it regularly is, it seems, about changing these data-heavy visualisations into something editors are already used to seeing as part of their workflow:

we’re thinking about better ways to automatically communicate these insights and recommendations in contexts that editors are already familiar with, such as email alerts, instant messenger chat bots, or perhaps something built directly into our CMS.

It’s not just about finding the data. It’s also about finding ways to use it and getting it to the people best placed to do so in forms that they actually find useful.
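
The mechanic underneath is simple enough to sketch. Assuming you can export pageviews with their internal referrers from your analytics (the records below are invented for illustration), the package map is just an aggregated edge list:

```python
from collections import Counter

# Invented pageview records with internal referrers; a referrer of None
# means the reader arrived from outside, a page of None means they left.
pageviews = [
    {"referrer": None, "page": "/news/election-results"},
    {"referrer": "/news/election-results", "page": "/live/election-night"},
    {"referrer": "/news/election-results", "page": "/live/election-night"},
    {"referrer": "/live/election-night", "page": None},
]

# Count traffic along each story-to-story edge.
edges = Counter(
    (pv["referrer"], pv["page"])
    for pv in pageviews
    if pv["referrer"] is not None
)

def onward(page):
    """Of the readers who saw this page, where did they go next?"""
    flows = ((dest, n) for (src, dest), n in edges.items() if src == page)
    return sorted(flows, key=lambda pair: -pair[1])

print(onward("/news/election-results"))  # [('/live/election-night', 2)]
```

The Nieman Lab finding drops straight out of a structure like this: if most nodes in a breaking-news package have no onward edges, most visitors read only one thing.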

Forensic stylometry

Fascinating post on Language Log about the analysis of Robert Galbraith’s The Cuckoo’s Calling, and how the analyst reached the conclusion that JK Rowling was a possible author.

For the past ten years or so, I’ve been working on a software project to assess stylistic similarity automatically, and at the same time, test different stylistic features to see how well they distinguish authors. De Morgan’s idea of average word lengths, for example, works — sort of. If you actually get a group of documents together and compare how different they are in average word length, you quickly learn two things. First, most people are average in word length, just as most people are average in height. Very few people actually write using loads of very long words, and few write with very small words, either. Second, you learn that average word length isn’t necessarily stable for a given author. Writing a letter to your cousin will have a different vocabulary than a professional article to be published in Nature. So it works, but not necessarily well. A better approach is not to use average word length, but to look at the overall distribution of word lengths. Still better is to use other measures, such as the frequency of specific words or word stems (e.g., how often did Madison use “by”?), and better yet is to use a combination of features and analyses, essentially analyzing the same data with different methods and seeing what the most consistent findings are. That’s the approach I took.

It’s interesting not just for its insight into a field that rarely comes into the public eye, but also for what’s written between the lines about how authors write. It suggests that, unless we really make an effort to disguise it, most writers have a linguistic fingerprint of sorts: a set of choices that we tend to make in roughly similar ways, often enough for a machine to notice when taken in aggregate. A writer’s voice goes beyond stylistic choices, genre and word choice, and comes down to the basic mechanics of the language they use.
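
The function-word feature mentioned in the quote is easy to demonstrate. Here’s a toy sketch of that single feature (the corpus file names are hypothetical, and real stylometry combines many more features and methods, as the post says):

```python
import re
from math import sqrt

# A handful of common function words; real analyses use hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "by", "in", "that", "it", "was", "a"]

def profile(text):
    """Relative frequency of each function word: a crude stylistic fingerprint."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return [words.count(w) / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Similarity between two profiles, independent of document length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

known = profile(open("rowling_sample.txt").read())      # hypothetical files
disputed = profile(open("galbraith_sample.txt").read())
print(f"similarity: {cosine(known, disputed):.3f}")
```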

Why liveblogs almost certainly don’t outperform articles by 300%

In response to this study, linked to by journalism.co.uk among many others.

  1. The sample size is 28 pieces of content across 7 news stories – that content includes liveblogs, articles and picture galleries. That’s a startlingly small number for a sample which is meant to be representative.
  2. The study does not look at how these stories were promoted, or whether they were running stories (suited to live coverage), reaction blogs, or other things.
  3. The traffic sample is limited to news stories, and does not include sports, entertainment or other areas where liveblogs may be used, and that may have different traffic profiles.
  4. The study compares liveblogs, which often take a significant amount of time and editorial resource, with individual articles and picture galleries, some of which may take much less time and resource. If a writer can create four articles in the time it takes to create a liveblog, then the better comparison is between a liveblog and the equivalent amount of individual, stand-alone pieces.
  5. The study is limited to the Guardian. There’s no way to compare the numbers with other publications that might treat their live coverage differently, so no way to draw conclusions on how much of the traffic is due to the way the Guardian specifically handles liveblogs.
  6. The 300% figure refers to pageviews. Leaving aside the fact that this is not necessarily the best metric for editorial success, the Guardian’s liveblogs autorefresh, inflating the pageview figure for liveblogs.
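
That last point is worth making concrete. A back-of-envelope sketch, where the dwell time and refresh interval are assumptions for illustration rather than Guardian figures:

```python
# Assumed figures, for illustration only.
DWELL_MINUTES = 20        # a reader camps on a liveblog during breaking news
AUTOREFRESH_MINUTES = 2   # each automatic refresh registers another pageview

article_pvs_per_visit = 1  # one visit to a normal article = one pageview
liveblog_pvs_per_visit = 1 + DWELL_MINUTES // AUTOREFRESH_MINUTES

# One liveblog visit looks like 11 pageviews: a huge apparent "lift"
# without a single extra reader.
print(article_pvs_per_visit, liveblog_pvs_per_visit)  # 1 11
```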

All that shouldn’t diminish the study’s other findings, and of course it doesn’t mean that the headline figure is necessarily wrong. But I would take it with a hefty pinch of salt.

Junk data: why we still have no idea what the DfT’s most popular websites are

A couple of stories in the Telegraph and Daily Mail this week have hailed data released by the Department for Transport about the websites visited most often by workers at their department.

But if you look a little more closely at the raw data, it quickly becomes clear that these figures are being badly misrepresented by the newspapers involved. There’s a very important note on the last page of the data PDF (fascinatingly, missing from the Mail’s repost). It says:

Note : “number of hits” includes multiple components (e.g. text, images, videos), each of which are counted.

The difference between page views, visits and hits in web analytics is fairly important. Page views count the individual pages on a site that have been viewed; visits count the separate browsing sessions that have occurred; and hits count the individual files requested by the browser.

An individual page view can include dozens, or even hundreds, of hits. A single page view of the Telegraph front page, for instance, includes at least 18 hits in the header of the page alone. That’s before we get to any images or ads: roughly another 40 image files are called. It’s fair to suggest you could rack up the hits very quickly on most news websites – whereas very simple, single-purpose sites might register 10 or fewer per pageview.
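
To put rough numbers on that multiplication, here’s a sketch; every count in it is an assumption for illustration, not measured data:

```python
# Assumed hits generated by a single page view (illustrative only).
news_site = {"html": 1, "scripts_and_css": 18, "images": 40, "ads_and_trackers": 25}
simple_site = {"html": 1, "scripts_and_css": 3, "images": 4}

news_hits_per_pv = sum(news_site.values())      # 84 hits per page view
simple_hits_per_pv = sum(simple_site.values())  # 8 hits per page view

# Identical readership, wildly different "hit" totals: ranking sites
# by hits tells you more about page weight than popularity.
print(1_000 * news_hits_per_pv, 1_000 * simple_hits_per_pv)  # 84000 8000
```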

Also important to note – if a website serves files from other sites, such as advertisements or tracking codes, those sites will register hits despite never actually being seen by the person doing the browsing.

That explains why the second “most popular” site on the list is www.google-analytics.com – a domain that is impossible to visit, but which serves incredibly popular tracking code on millions of other websites. It’s probably safe to conjecture that it also explains the presence of other abnormalities – for instance, stats.bbc.co.uk, static.bbc.co.uk, news.bbcimg.co.uk, and cdnedge.bbc.co.uk, all in the top 10 and all impossible to actually visit. There are two IP addresses in the top 11 “most popular” sites, too.

As David Higgerson points out (in comments), there are some interesting patterns in the data. But unless you know the number of hits per page, at the time the pages were viewed, as well as which ads were served from which other sites at the time, any straight comparison of the figures is meaningless. And the data itself is so noisy that any conclusions are dubious at best.

We can say that the BBC website is certainly popular, that the Bears Faction Lorien Trust LARP site probably got more visits than you might expect, and that civil servants do seem to like their news. Beyond that, the Mail’s claims of “cyberslacking”, of gambling sites (common advertisers) being popular and of there being six separate BBC sites in the top 10 are at best unsupported and at worst downright misleading.