Good metrics vs bad measurement

My former colleague Chris Moran has lots of sensible things to say about what makes a good metric, as do the many people he’s enlisted in that linked post to talk about the characteristics they value in measurement. I wanted to build on one of them: the capacity for people to actually use the metric to change something.

Functionally, plenty of things aren’t easy to measure. Some are almost impossible – as Chris says, I have lost count of the blisteringly smart people I’ve seen trying to work out how to measure things like quality or impact in journalism. Anything that involves qualitative surveys is probably too high-cost for a small project. Anything that requires you to implement completely new analytics software is unlikely to be valuable unless it’s genuinely transformative (and even then, you risk the business equivalent of redesigning your revision timetable rather than actually revising). Anything that relies on people giving an unbiased assessment of the value of their own work – asking editors to assign an “importance” score to a story, say, or Google’s now-defunct “standout” meta tag – is doomed to failure, because an individual can’t accurately assess the relative value of their work in the context of the whole system. The key point from Chris’s post: if you were going to game your measure, how would you game it? Do you trust everyone involved to act purely in the interests of good data, even when that gets in the way of their own self-interest?

In one team I managed, I once ran an OKR focused on making sure the rest of the organisation knew us as internal experts and involved us appropriately. We discussed how to measure that, and ended up deciding we’d know we were succeeding based on the number of surprises that happened to us in a week. We were accustomed to finding out about projects too late to really be helpful – and, to a lesser extent, our work sometimes surprised other people who’d have benefited from being involved earlier on.

How do you measure surprises? We could have spent weeks working that one out. But for the sake of just getting on with it, we built a Google form with three inputs: the date, who was surprised, and who did the surprising. Team leads took responsibility for filling in the form whenever a surprise happened. That’s all you really need to know roughly what’s going on and to track the trajectory of a metric like that. But because we measured it – really, honestly, mostly because we talked about it every week as a measure of whether we were doing our best work, which led to thinking about how we could change it, which led to action – it improved.
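
To make the point concrete: a metric like that needs almost no machinery. Here’s a minimal sketch of turning the form’s rows into a weekly trajectory, assuming rows with the three fields above; the type and field names are mine for illustration, not our actual schema.

```ts
// Illustrative row shape for the three-field form described above.
type SurpriseRow = {
  date: string;       // ISO date, e.g. "2015-03-02"
  surprised: string;  // who was surprised
  surpriser: string;  // who did the surprising
};

// Bucket rows into year-week keys so the weekly trajectory is easy to chart.
function surprisesPerWeek(rows: SurpriseRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const d = new Date(row.date);
    const jan1 = new Date(d.getFullYear(), 0, 1);
    // Approximate week number: days elapsed since Jan 1, divided by 7.
    const week = Math.floor((d.getTime() - jan1.getTime()) / (7 * 86_400_000)) + 1;
    const key = `${d.getFullYear()}-W${String(week).padStart(2, "0")}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```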

Conversely, if you don’t care about something you measure, it’s almost certainly not going to change at all. If you spend enormous organisational energy and effort agreeing and creating a single unified metric for loyalty, say, but then you don’t mention it in any meetings or use it to define success or make any decisions about your products or your output… why bother measuring it at all? Data in isolation is just noise. What matters is what you use it for.

So if you’re going to actually make decisions about quality, or impact, or loyalty, or surprises, the key isn’t to spend ages defining a perfect metric. It’s to get 80% of the way there with as little effort as you can pull off, and then do the work. That means working out what information you (or your teams, or your editors, or your leaders) don’t have right now but need in order to make those decisions, then finding a reasonable, rational, most-of-the-way-there metric that unblocks them. Eventually you might find you need a better measure because the granularity or the complexity of the decisions has changed. But you might equally find that you don’t need anything beyond your first sketch, because the real value is in the conversations it prompts and the change in the output that happens as a result. Precision tends to be what data scientists naturally want to prioritise, but it usually misses the point.

Crossposted to Medium because there’s traffic and discussion over there.

Reach and impact: news metrics you can use

I’ve been thinking about this Tow study for a while now. It looks at how stats are used at the New York Times, Gawker and Chartbeat; in Chartbeat’s case, it examines how the company builds its real-time product, and in the two newsrooms’, how that product feeds in (or fails to feed in) to a culture around analytics. There’s lots to mull over if your work, like mine, includes communication and cultural work around numbers. The most interesting parts are all about feelings.

Petre says:

Metrics inspire a range of strong feelings in journalists, such as excitement, anxiety, self-doubt, triumph, competition, and demoralization. When devising internal policies for the use of metrics, newsroom managers should consider the potential effects of traffic data not only on editorial content, but also on editorial workers.

Which seems obvious, but probably isn’t. Any good manager has to think about the effect of making some performance data – the quantifiable stuff – public, easily accessible, and updated minute by minute. The fears about numbers in newsrooms aren’t all about data driving decisions – the “race to the bottom”-style rhetoric that used to be a common knee-jerk reaction against audience data. Some fears are certainly about how analytics will be used, and whether they’ll push editorial decision-making in unhelpful ways, but most newsroom fear these days seems far simpler and more personal. If I write an incredible, important story and only a few people read it, is it worth writing?

Even Buzzfeed, who seem to have their metrics and their purpose mostly aligned, appear to be wrestling with this. Dao Nguyen has spoken publicly about their need to measure impact in terms of real-life effects, and about how big a story is by those standards. The idea of quantifying the usefulness of a piece of news, or its capacity to engender real change, is seductive but tricky: how do you build a scale that runs from leaving one person better informed about the world to, say, changing the world’s approach to international surveillance? How do you measure a person losing their job, a change in the voting intention of a small proportion of readers, a decision to make an arrest? For all functional purposes it’s impossible.

But it matters. For the longest time, journalism has been measured by its impact as much as its ability to sell papers. Journalists have measured themselves by the waves they make within the communities upon which they report. So qualitative statements are important to take into account alongside quantitative measurements.

The numbers we report are expressions of value systems. Petre’s report warns newsrooms against unquestioningly accepting the values of an analytics company when picking a vendor – the affordances of a dashboard like Chartbeat can have a huge impact on the emotional attitude towards the metrics used. Something as simple as how many users a tool allows affects how its numbers are perceived. Something as complex as which numbers are reported to whom, and how, has a similarly complex effect on culture. Fearing the numbers isn’t the answer; understanding that journalists are humans and react in human ways to metrics and measurement can help a great deal. Making the numbers actionable – giving everyone ways to affect things, and helping them understand how they can use them – helps even more.

Part of the solution – there are only partial solutions – to the problem of reach vs impact is to consider the two together, and to look at the types of audiences each piece of work is reaching. If only 500 people read a review of a small art show, but 400 of those either have visited the show or are using that review to make decisions about what to see, that piece of work is absolutely valuable to its audience. If a story about violent California surfing subcultures reaches 20,000 people, mostly young people on the west coast of the US, then it is reaching an audience who are more likely to have a personal stake in its content than, say, older men in Austria might.
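
Here’s a sketch of what segmenting reach might look like in practice, assuming you already have visits tagged with some audience segmentation; the shapes and names are invented for illustration, not a real analytics API.

```ts
// Hypothetical visit records: one row per reader of a piece, tagged with an
// audience segment from whatever segmentation you already have.
type Visit = { articleId: string; segment: string };

// What fraction of a piece's readers came from the audience it was for?
function targetReachShare(visits: Visit[], articleId: string, target: string): number {
  const readers = visits.filter(v => v.articleId === articleId);
  if (readers.length === 0) return 0;
  const inTarget = readers.filter(v => v.segment === target).length;
  return inTarget / readers.length;
}

// The art-show review above: 500 readers, 400 of them in the segment that can
// act on the review. Raw reach of 500, but a target share of 0.8.
```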

Shortly after I arrived in New York I was on a panel which discussed the problems of reach as a metric. One person claimed cheerfully that reach was a vanity metric, to some agreement. A few minutes later we were discussing how important it was to reach snake people (sorry, millennials) – and to measure that reach.

Reach is only a vanity metric if you fail to segment it. Thinking about which audiences need your work and measuring whether it’s reaching them – that’s useful. And much less frightening for journalists, too.

Time vs the news

Jason Kint, in an interesting piece at Digiday, argues that page views are rubbish and we should use time-based metrics to measure online consumption.

Pageviews and clicks fuel everything that is wrong with a clicks-driven Web and advertising ecosystem. These metrics are perfectly suited to measure performance and direct-response-style conversion, but tactics to maximize them inversely correlate to great experiences and branding. If the goal is to measure true consumption of content, then the best measurement is represented by time. It’s hard to fake time as it requires consumer attention.

Some issues here. Time does not require attention: I can have several browser tabs open while making a cup of tea elsewhere. TV metrics have been plagued by the assumption that a TV being on === someone attentively watching, and it’s interesting to see that fallacy repeated on the web, where branching off is as easy as ctrl+clicking to open a new tab. It’s also easy to game time on site by simply forcing every external link to open in a new tab: it’s awful UX, but if the market moves to time as the primary measurement, in the way ad impressions are used now, I guarantee that trick will be widespread, along with others like design gimmicks at bailout points and autorefresh to stretch the measured visit as long as possible. Time is just as game-able as a click.
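
For illustration, here’s roughly what attention-gated time could look like in the browser, under the assumption that “attention” means the tab is visible and the reader has interacted recently. This is a sketch, not any vendor’s actual measurement.

```ts
// Accumulate time only while the tab is visible AND the reader has
// interacted recently; raw clock time and attentive time diverge otherwise.
let activeMs = 0;
let lastTick = Date.now();
let lastInteraction = Date.now();
const IDLE_CUTOFF_MS = 15_000; // assume attention lapses after 15s of no input

for (const evt of ["scroll", "keydown", "pointermove", "click"]) {
  window.addEventListener(evt, () => { lastInteraction = Date.now(); });
}

setInterval(() => {
  const now = Date.now();
  const attentive =
    document.visibilityState === "visible" &&
    now - lastInteraction < IDLE_CUTOFF_MS;
  if (attentive) activeMs += now - lastTick; // count only attentive time
  lastTick = now;
}, 1_000);
```

Even a crude gate like this separates the open-tab-while-the-kettle-boils case from actual reading, which raw time on site cannot do.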

It’s worth noting that Kint is invested in selling this vision of time-based metrics to the market. That doesn’t invalidate what he says out of hand, of course, but it is important to remember that if someone is trying to sell you a hammer they are unlikely to admit that you might also need a screwdriver.

In a conversation on Twitter yesterday Dave Wylie pointed me to a Breaking News post which discusses another time-based metric – time saved. It’s a recognition that most news consumers don’t actually want to spend half an hour clicking around your site: they want the piece of information they came for, and then they want to get on with their lives. Like Google, which used to focus on getting people through the site as fast as possible to what they needed. Or like the inverted pyramid of news writing, which focusses on giving you all the information you need at the very top of the piece, so if you decide you don’t need all the details you can leave fully informed.

There’s a truism in newsroom analytics: the more newsy the day and the more traffic you get from Google News or other breaking-news sources, the less likely those readers are to click around. That doesn’t necessarily mean you’re failing those readers or that they’re leaving unsatisfied; it may in fact make them more likely to return later, if the Breaking News theory holds true for other newsrooms. Sometimes the best way to serve readers is by giving them less.

Medium’s reading time

“I think of competing for users’ attention as a zero-sum game. Thanks to hardware innovation, there is barely a moment left in the waking day that hasn’t been claimed by (in no particular order) books, social networks, TV, and games. It’s amazing that we have time for our jobs and families.

“There’s no shortage of hand-wringing around what exactly “engagement” means and how it might be measured – if it can be at all. Of course, it depends on the platform, and how you expect your users to spend their time on it.

“For content websites (e.g., the New York Times), you want people to read. And then come back, to read more.

“A matchmaking service (e.g., OkCupid) attempts to match partners. The number of successful matches should give you a pretty good sense of the health of the business.

“What about a site that combines both of these ideas? I sometimes characterize Medium as content matchmaking: we want people to write, and others to read, great posts. It’s two-sided: one can’t exist without the other. What is the core activity that connects the two sides? It’s reading. Readers don’t just view a page, or click an ad. They read.

“At Medium, we optimize for the time that people spend reading.”

Medium’s metric that matters: Total Time Reading

Medium, as a magazine-style publisher(/platform/hybrid thing), wants a browsing experience in which every article is fully read through and digested, and where the next piece follows on serendipitously from the one before. News publishers don’t necessarily want that, or at least not across the board. For features the approach makes a lot of sense, but for news that’s geared towards getting the important facts across in the first paragraphs – even the first sentence – it’s fundamentally at odds with the writer’s goals. News that aims to be easy to read shouldn’t, and doesn’t, take a lot of time to consume. So generalist publishers have to balance metrics for success that are often in direct conflict. (This is one of many reasons why, actually, page views are pretty useful – with all the necessary caveats about not using stupid tricks to inflate them and then calling it success, of course.)
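
To make that conflict concrete, here’s a deliberately crude sketch; the field names and thresholds are entirely invented. The point is only that the same dwell-time number reads as success for a feature and says nothing much about a well-written news story.

```ts
// Invented shape and thresholds, purely to illustrate the conflict.
type ArticleStats = {
  kind: "news" | "feature";
  medianDwellSeconds: number;    // how long readers actually stayed
  estimatedReadSeconds: number;  // length-based estimate of a full read
  pageViews: number;
};

function lookedSuccessful(a: ArticleStats): boolean {
  if (a.kind === "news") {
    // News succeeds by informing many people quickly; long dwell isn't the goal.
    return a.pageViews >= 10_000;
  }
  // A feature succeeds when readers get most of the way through it.
  return a.medianDwellSeconds >= 0.6 * a.estimatedReadSeconds;
}
```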

Newsrooms also have to use – buzzwordy as the phrase is – actionable metrics. It doesn’t matter what your numbers say if no one can use them to make better decisions. And newsrooms have something that Medium doesn’t: control over content. Medium doesn’t (for the most part) get to dictate what writers write, how it’s structured, the links it contains or the next piece that ought to follow on from it. So the questions it wants to answer with its metrics are different from those of editors in most newsrooms. Total time reading is most useful for news publishers in the hands of devs and designers, those who can change the furniture around the words in order to improve the reading experience and alter the structure of the site to improve stickiness and flow. Those are rarely editorial decisions.

The clue’s in the headline – it’s Medium’s metric that matters. Not necessarily anyone else’s.