Good metrics vs bad measurement

My former colleague Chris Moran has lots of sensible things to say about what makes a good metric, as do the many people he’s enlisted in that linked post to talk about the characteristics they value in measurement. I wanted to build on one of them: the capacity for people to actually use the metric to change something.

Functionally, plenty of things aren’t easy to measure. Some are almost impossible — much as Chris says, I have lost count of the number of blisteringly smart people working out how to measure things like quality or impact when it comes to journalism. Anything that involves qualitative surveys is probably too high cost for a small project. Anything that requires you to implement completely new analytics software is unlikely to be valuable unless it’s genuinely transformative (and even then, you risk the business equivalent of redesigning your revision timetable rather than actually revising). Anything that relies on people giving an unbiased assessment of the value of their work — like asking editors to assign an “importance” score to a story, say, or Google’s now-defunct “standout” meta tag — is doomed to failure, because an individual can’t accurately assess the relative nature of their work in the context of the whole system. Key point from Chris’s post: if you were going to game your measure, how would you game it? Do you trust everyone involved to act purely in the interests of good data, even when that gets in the way of their own self-interest?

In one team I managed, I once ran an OKR that focused on making sure we were known and appropriately involved as internal experts by the rest of the organisation. We discussed how to measure that, and ended up deciding that we’d know if we were succeeding based on the number of surprises that happened to us in a week. We were accustomed to finding out about projects too late for us to really be helpful — and, to a lesser extent, we were finding that our work sometimes surprised other people who’d benefit from being involved earlier on.

How do you measure surprises? We could have spent weeks working that one out. But for the sake of just getting on with it, we built a Google form with three inputs: what’s the date, who was surprised, who did the surprising. Team leads took the responsibility of filling in the form when it happened. That’s all you really need in order to know roughly what’s going on, and in order to track the trajectory of a metric like that. But because we measured it (really, honestly, mostly because we talked about it every week as a measure of whether we were doing our best work, and that led to thinking about how we could change it, which led to action) it improved.
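For what it’s worth, the analysis side of a three-field form like that can stay just as rough as the form itself. Here’s a minimal sketch of how the weekly count might be pulled together, assuming the responses have been exported as (date, who was surprised, who did the surprising) rows. The sample rows, team names and the `surprises_per_week` function are all made up for illustration; this isn’t what we actually ran.

```python
from collections import Counter
from datetime import date

# Hypothetical form responses: (date, who was surprised, who did the surprising).
# These rows are invented for illustration, not real data.
responses = [
    (date(2019, 3, 4), "our team", "product"),
    (date(2019, 3, 6), "our team", "marketing"),
    (date(2019, 3, 7), "editorial", "our team"),
    (date(2019, 3, 12), "our team", "product"),
]


def surprises_per_week(rows):
    """Count surprises per ISO week, keyed by (year, week number)."""
    counts = Counter()
    for when, _surprised, _surpriser in rows:
        iso = when.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    # Sort by week so the trajectory reads in order.
    return dict(sorted(counts.items()))


weekly = surprises_per_week(responses)
for (year, week), count in weekly.items():
    print(f"{year} week {week}: {count} surprise(s)")
```

A count per week is about as far as this needs to go: the point is having one number to talk about in the weekly meeting, not a dashboard.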

Conversely, if you don’t care about something you measure, it’s almost certainly not going to change at all. If you spend enormous organisational energy and effort agreeing and creating a single unified metric for loyalty, say, but then you don’t mention it in any meetings or use it to define success or make any decisions about your products or your output… why bother measuring it at all? Data in isolation is just noise. What matters is what you use it for.

So if you’re going to actually make decisions about quality, or impact, or loyalty, or surprises, the key isn’t to spend ages defining a perfect metric. It’s getting 80% of the way there with as little effort as you can pull off, and then doing the work. It means working out what information you (or your teams, or your editors, or your leaders) don’t have right now that they need in order to make those decisions. Then finding a reasonable, rational, most-of-the-way-there metric you can use that unblocks those decisions. Eventually you might find you need a better measure because the granularity or the complexity of the decisions has changed. But you might equally find that you don’t really need anything other than your first sketch, because the real value is in the conversations it prompts and the change in the output that happens as a result. Precision tends to be what data scientists naturally want to prioritise, but it’s usually missing the point.

Crossposted to Medium because there’s traffic and discussion over there.

Conscious incompetence

Career changes are complicated, and since I started at the BBC I’ve gone through various emotional cycles. It’s simultaneously exciting and frightening to push yourself outside your comfort zone, and I think it’s fair to say that I’d gotten very accustomed to being an expert in my domain. Not being an expert all the time is still, after more than six months, something I’m getting used to.

I’m also in an unusual position in terms of how my job’s set up. Unlike my previous role, where I led a department of several dozen people, I currently have no direct line management reports – though plenty of leadership responsibility. I’m in a position where most of my time needs to be spent persuading and influencing, rather than actively doing, and that adjustment also takes some time and forces me to exercise different skills.

This week, thanks to a colleague, I was reminded of the concept of the four stages of competence. This is a longstanding theory about how we learn new skills and new things. We start with unconscious incompetence, where we have no idea of what we’re doing and we also don’t know how much we don’t know. From there we move to conscious incompetence – the “oh, shit” period – where the scale of what we don’t know starts to become clear. Then, with luck, we get to conscious competence, where we can do the thing but we have to focus and it takes effort. Then finally unconscious competence, where the whole thing just feels natural.

I think I’ve reached the point of conscious incompetence with a lot of new things. The key is finding joy in that, and knowing that it’s a crucial step towards building expertise.

Reddit meltdown: how not to build a community

Reddit is having a bit of a meltdown. Volunteer moderators have taken many of the site’s most popular and trafficked communities to private, making them impossible to read or participate in. Many others are staying open based on their purpose (to inform or to educate) but making clear statements that they support the issues raised.

The shutdown was triggered in protest at the sudden dismissal of Victoria Taylor, Reddit’s director of communications, who coordinated the site’s Ask Me Anything feature. But it’s more than that: the reason communities beyond r/IAmA are going dark is about longstanding issues with the treatment of moderators, communication problems and moderation tools, according to many prominent subreddit mods.

Really good community management matters. Communication matters. Being heard matters enormously to users, and the more work an individual is doing for the site, the more it matters to them personally.

Relying solely on volunteer moderators and community self-organisation limits what’s possible, because without the company’s support – both negative, in terms of banning and sanctioning, and positive in terms of tools, recognition and organisation – its users can’t effect significant change. What’s possible with buy-in from Reddit staff is far more interesting than what’s possible without – the AMAs Victoria supported are the prime example. It should be concerning for Reddit that there are so few others.

Communities grow and evolve through positive reinforcement, not just punishment when they contravene the rules. If the only time they get attention is when they push the boundaries, users will likely continue to push boundaries rather than creating constructively. They act out. Encouraging positive behaviour is vitally important if you want to shape a community around certain positive activities – say, asking questions – rather than focussing on its negatives.

That encouragement extends to offering the community leaders the tools they need to lead. The majority of moderators of Reddit’s default communities – the most popular ones on the site – use third-party tools because the site’s own architecture makes their work impossible. That should not be the case.

And evolving communities need consistent procedures and policies, and those have to be implemented by someone with power as well as the trust and respect of the community. Power is relatively easy; any Reddit admin or employee has power, in the eyes of the community. Trust and respect is incredibly difficult. It has to be earned, piece by piece, often from individuals disinclined to trust or respect because of the power differential. That work doesn’t scale easily and can’t be mechanised; it’s about relationships.

Today’s meltdown isn’t just about u/chooter, though what’s happened to her is clearly the catalyst. It’s about the fact that she’s (rightly or wrongly) perceived to be the only Reddit admin to have both power and trust. She was seen as the sole company representative who listened, who worked with the community rather than above or around them. She was well-known and, crucially, well-liked.

Reddit needs more Victorias on its staff, not fewer. It needs more admins who are personally known within the community, more people who respond to messages and get involved on an individual level with the mods it relies on to do the hard work of maintaining its communities. It needs internal procedures to pass community issues up the chain and get work done for its super users and those who enable its communities to exist. It needs more positive reinforcement from those in power, especially in the light of increasing (and, I’d say, much-needed) negative reinforcement for certain behaviours; the community needs to see what ‘good’ looks like as well as ‘bad’. Not just spotlighting subreddits and blog posts about gift exchanges – actual, human engagement with the humans using the site.

Firing the figurehead for Reddit-done-right is not a good way to start.

Reach and impact: news metrics you can use

I’ve been thinking about this Tow study for a while now. It looks at how stats are used at the New York Times, Gawker and Chartbeat: for Chartbeat, it examines how the company builds its real-time product, and for the two newsrooms, how that product feeds in (or fails to feed in) to a newsroom culture around analytics. There’s lots to mull over if part of your work, like mine, includes communication and cultural work about numbers. The most interesting parts are all about feelings.

Petre says:

Metrics inspire a range of strong feelings in journalists, such as excitement, anxiety, self-doubt, triumph, competition, and demoralization. When devising internal policies for the use of metrics, newsroom managers should consider the potential effects of traffic data not only on editorial content, but also on editorial workers.

Which seems obvious, but probably isn’t. Any good manager has to think about the effects of making some performance data – the quantifiable stuff – public and easily accessible, on a minute-by-minute basis. The fears about numbers in newsrooms aren’t all about the data coming to affect decisions – the “race to the bottom”-style rhetoric that used to be very common as a knee-jerk reaction against audience data. Some of the fears are about how analytics will be used, and whether it will drive editorial decision-making in unhelpful ways, for sure, but the majority of newsroom fear these days seems to be far simpler and more personal. If I write an incredible, important story and only a few people read it, is it worth writing?

Even Buzzfeed, whose metrics and purpose seem mostly aligned, appears to be having issues with this. Dao Nguyen has spoken publicly about their need to measure impact in terms of real-life effects, and how big news is by those standards. The idea of quantifying the usefulness of a piece of news, or its capacity to engender real change, is seductive but tricky: how do you build a scale that runs from leaving one person better informed about the world to, say, changing the world’s approach to international surveillance? How do you measure a person losing their job, a change in the voting intention of a small proportion of readers, a decision to make an arrest? For all functional purposes it’s impossible.

But it matters. For the longest time, journalism has been measured by its impact as much as its ability to sell papers. Journalists have measured themselves by the waves they make within the communities upon which they report. So qualitative statements are important to take into account alongside quantitative measurements.

The numbers we report are expressions of value systems. Petre’s report warns newsrooms against unquestioningly accepting the values of an analytics company when picking a vendor – the affordances of a dashboard like Chartbeat can have a huge impact on the emotional attitude towards the metrics used. Something as simple as how many users a tool can have affects how something is perceived. Something as complex as which numbers are reported to whom and how has a similarly complex effect on culture. Fearing the numbers isn’t the answer; understanding that journalists are humans and react in human ways to metrics and measurement can help a great deal. Making the numbers actionable – giving everyone ways to affect things, and helping them understand how they can use them – helps even more.

Part of the solution – there are only partial solutions – to the problem of reach vs impact is to consider the two together, but to look at the types of audiences each piece of work is reaching. If only 500 people read a review of a small art show, but 400 of those either have visited the show or are using that review to make decisions about what to see, that piece of work is absolutely valuable to its audience. If a story about violent California surfing subcultures reaches 20,000 people, mostly young people on the west coast of the US, then it is reaching an audience who are more likely to have a personal stake in its content than, say, older men in Austria might.

Shortly after I arrived in New York I was on a panel which discussed the problems of reach as a metric. One person claimed cheerfully that reach was a vanity metric, to some agreement. A few minutes later we were discussing how important it was to reach snake people (sorry, millennials) – and to measure that reach.

Reach is only a vanity metric if you fail to segment it. Thinking about which audiences need your work and measuring whether it’s reaching them – that’s useful. And much less frightening for journalists, too.
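To make the segmenting point concrete, here’s a rough sketch of the arithmetic behind the art show example above: 500 readers, 400 of whom are the audience the review is actually for. The segment labels and the `segment_share` function are assumptions for illustration only; in practice the segments would come from whatever your analytics tool can tell you about readers.

```python
def segment_share(readers, target_segment):
    """Return the fraction of readers who fall into the target segment."""
    if not readers:
        return 0.0
    hits = sum(1 for segment in readers if segment == target_segment)
    return hits / len(readers)


# Hypothetical audience for the art show review: 400 readers who have seen
# the show or are deciding whether to go, and 100 who are neither.
readers = ["show_audience"] * 400 + ["other"] * 100

total_reach = len(readers)
share = segment_share(readers, "show_audience")

print(f"Total reach: {total_reach}")
print(f"Share in the audience that needs it: {share:.0%}")
```

Total reach on its own says 500, which looks small. The segmented number says four in five of those readers were the people the piece was written for, which is a far more useful thing to put in front of a journalist.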

Pocket Lint #27: dragons

If you’d like to get Pocket Lint as a regular-ish email on most Fridays you can sign up here. Delivery will be sporadic for a while.

Five stories about how, why and where we move.

That Dragon, Cancer – in its final hours of funding on Kickstarter.

“Mr. Jones, 64, has an intellectual disability and a swollen right hand that aches from 40 years of hanging live turkeys on shackles that swing them to their slaughter. His wallet contains no photos or identification, as if, officially, he does not exist. And yet he is more than just another anonymous grunt in a meat factory. Mr. Jones may be the last working member of the so-called Henry’s Boys — men recruited from Texas institutions decades ago to eviscerate turkeys, only to wind up living in virtual servitude, without many basic rights.”

What it’s like to be pregnant, bipolar and at significant risk of post-partum depression in the UK.

The truth about the teenager who disappeared in the Michigan State University steam tunnels, sparking a national panic about Dungeons and Dragons.

“Do your best. Every deity and the spirits of your dead comrades are watching you intently. It is essential that you do not shut your eyes for a moment so as not to miss the target. Many have crashed into the targets with wide-open eyes. They will tell you what fun they had.”

Parable of the polygons: a playable post on how a small amount of bias leads to segregation in society.

“Whites can live, love, study, work, play and die in segregation … and still profess that race has no meaning in their lives.”

The woman who saw dragons.

“The Magic Circle is set inside a high-fantasy remake of a fictional 80s text adventure, also called The Magic Circle, that has been in production for many years. The original was made by Ish Gilder, “a mild-mannered game designer who won’t let us call him a genius” in the words of a fake website for the game. An audio diary early on (of course there are audio diaries) reveals that The Magic Circle has been in production for two decades. It’s vapourware. The environment resembles a whiteboard outline that has been erased and redrawn – a beautiful, sinewy vision of sketch lines and uncertain clouds, with only occasional flashes of colour showing live elements.”

The Rosetta landing, cartooned live, in gif form.

“The moon is much larger than it appears to be. This is worth remembering because next time you are looking at the moon you can say in a deep and mysterious voice, “The moon is much larger than it appears to be,” and people will know that you are a wise person who has thought about this a lot.”

Poem of the week: Us Two, A. A. Milne

Game of the week: Detritus, which I made last time I moved 10,000 miles around the world.

Tumblr of the week is an Instagram instead: Follow Me.

Pocket Lint #26: implicit

If you’d like to get Pocket Lint as a regular-ish email on most Fridays you can sign up here.

“The 22 bus is the only route that runs 24 hours in Silicon Valley and it has become something of an unofficial shelter for the homeless. They call it Hotel 22.”

The problems with capitalism, as explained by a Minecraft hedge fund manager.

A medical actor who fakes illnesses so trainee doctors learn empathy. “Empathy means realizing no trauma has discrete edges. Trauma bleeds. Out of wounds and across boundaries. Sadness becomes a seizure. Empathy demands another kind of porousness in response.”

What does an orgasm look like?

Every email is a ghost story.

Harvard’s tests for implicit associations about race, gender and sexual orientation.

“With my form of multiple sclerosis, I quickly started to realise that while the disease would likely never kill me by itself, the sheer weight of time could still do some serious damage. You know. All the wonderful things I should have done before. All the terrible things that might now happen. Playing Spelunky’s genuinely refreshing in this regard: it’s a reminder that the only moment that really matters is the moment that’s currently unfolding. Strategy dries up and blows away in the present, and in its place you’re left with tactics, with what to do for the next thirty seconds. Forget the City of Gold – how do I handle this frog that’s blocking the exit? Forget my plans for the afternoon, what’s with this stutter?”

On being a black male, six feet four inches tall, in America in 2014.

“Capturing Knight was the human equivalent of netting a giant squid. He was an uncontacted tribe of one.”

Revenge bento.

It’s a bad fence.

Poem of the week: Epistle: Leaving, Kerrin McCadden

Game of the week: Crossy Road

Tumblr of the week: To My Unborn Son

Pocket Lint #24: hills of beans

If you’d like to get Pocket Lint as a regular-ish email on Fridays you can sign up here.

“People do still donate at churches and other places. We literally have ten tonnes of beans. We have a ‘bean room’ at our central storehouse. People for some reason associate a foodbank with beans. But actually what we need is coffee, sugar, UHT milk, tinned fruit, tinned fish. A whole range of things. So we try and get the shopping list to people and ask them to buy from it. But whatever you do, you still get lots of beans.”

The rise and fall of Default Man: “When we talk of identity, we often think of groups such as black Muslim lesbians in wheelchairs. This is because identity only seems to become an issue when it is challenged or under threat. Our classic Default Man is rarely under existential threat; consequently, his identity remains unexamined. It ambles along blithely, never having to stand up for its rights or to defend its homeland.”

Of GamerGate and disco demolition.

Trouble at the Kool-Aid point.

The paranoid style in gaming misogyny.

I Know This Sounds Like Spam, But I Really Did Double My Mass In TWO WEEKS And Now Women Can’t Get Enough Of Me And I’m SCARED

If you only come into contact with one thing about GamerGate, make it this video.

Robert Webb on growing up male: ‘Nobody ever told me: you don’t have to waste years trying to figure out how to be a “man” because the whole concept is horseshit. We are people, individuals comprising a variety of sexes, races, shifting sexualities and all the rest of it. Every convention that tries to reinforce this difference is a step back. Notions of gender pointlessly separate men from women, but also mothers from daughters and fathers from sons. The whole thing is – at best – just a stupefying waste of everyone’s time.’

Shadows of Mordor, Watch Dogs, and the politics of NPC agency.

‘The first funeral parlour he went to, in Hoxton, east London, told him they needed £2,500 upfront for the church and the vicar. “I said: ‘I can’t afford that.’ They said: ‘You won’t be able to bury him without that money’,” he says, at his flat. His father’s finch whistles in a cage by the window. Griffin is a former landscape gardener who gave up his job to care for his father when he became unwell. Another firm quoted him costs of around £4,000, asking for £1,000 upfront. “I told the funeral office I would just go to the unemployment office to see what they can give me. They said, ‘Oh no, we need the money upfront’. That’s when I started to get worried.”’

Tip sheet and resources for journalists – and others – dealing with graphic images and material.

What to expect when the internet tries to ruin your life.

Poem of the week: Proserpina, Going Deeper, by Jack Hollis Marr

Game of the week: Realistic Kissing Simulator

No, good content is not enough for Facebook success

At the recent ONA conference, Liz Heron, who oversees Facebook’s news partnerships, came in for some questioning about how news organisations can do well on the platform – something that’s a cause of some consternation for many, as it becomes increasingly clear how important it is as a mass distribution service. This is one of her responses:

This is a familiar line from Facebook – I’ve been on panels with other employees who’ve said exactly the same thing. But while I have the greatest respect for Heron and understand that she has to present Facebook’s best side in public – and that a tweet may be cutting context away from a larger argument – this statement is demonstrably false. Even skimming the rather fraught question of what exactly “good” means in this context, it’s questionable whether quizzes and lists such as those that have brought Playbuzz its current success are in any meaningful way replicable for most news organisations.

It’s not that Playbuzz is “gaming the algorithm” necessarily, though it may be. It’s that the algorithm is not designed to promote news content. Facebook’s recent efforts to change that are, quite literally, an admission of that fact. Facebook itself knows that good – as in newsworthy, important, relevant, breaking, impactful, timely – is not sufficient for success on its platform; it sees that as a problem, now, and is moving to fix it.

In the mean time, creating “good” content will certainly help, but it won’t be sufficient. You can bypass that process completely by getting your community to create mediocre content that directly taps into questions of identity, like Playbuzz’s personality quizzes, and giving every piece absolutely superbly optimised headlines and sharing tools. You can cheerfully bury excellent work by putting it under headlines that don’t explain what on earth the story’s about, or are too long to parse, or are simply on subjects that people will happily read for hours but don’t want to associate themselves with publicly.

Time and attention are under huge pressure online. Facebook are split testing everything you create against everything else someone might want to see, from family photos to random links posted by people they’ve not met since high school, and first impressions matter enormously. “Good” isn’t enough for the algorithm, or for people who come to your site via their Facebook news feed. It never has been. Facebook should stop pretending that it is.

Further reading: Mathew Ingram has context and a longer discussion.

Reddit thinks it’s a government, but doesn’t want to govern

In non-spoof news, today Reddit’s CEO posted a blog post about why it wasn’t going to take down a community specifically devoted to sharing naked photos of celebrities acquired by hackers and very much not endorsed by those pictured. Then, having drawn a line in the sand, it promptly banned the community. That caused, unsurprisingly, a lot of users to react with confusion and not a little anger, pointing out – among other things – that the ban was more than a little hypocritical if Reddit was going to continue not to police other problematic communities (pro-anorexia and self harm communities, for instance), and suggesting that Reddit’s response was only because of the status, profile and power of the victims in this instance (the site doesn’t take down revenge porn, for example). There’s been another round of explanation, which boils down to: Reddit got overwhelmed and therefore had to take action. That actually bolsters some of the arguments made by users – that it’s only the high-profile nature of this incident that forced action – but if the first post is to be believed, Reddit doesn’t see that as a problem. It wants the community to choose to be “virtuous” rather than being compelled to do so – it wants its users to govern themselves. But it also thinks it’s a government. Yishan says:

… we consider ourselves not just a company running a website where one can post links and discuss them, but the government of a new type of community. The role and responsibility of a government differs from that of a private corporation, in that it exercises restraint in the usage of its powers.

Yishan simultaneously argues that Reddit users must arrive at their own self-policing anarchic nirvana in which no bad actors exist, and that Reddit is not a corporation but a governing force which has both the right to police and, strangely, the responsibility not to do so. Of course Reddit is a corporation, subject to US and international laws. Of course its community is not a state, and its users are not citizens. Yishan is dressing up a slavish devotion to freedom of speech regardless of consequence as a lofty ideal rather than the most convenient way to cope with a community rife with unpleasant, unethical and often unlawful behaviour. Doxxing, revenge porn, copyright infringement so rampant it’s a running joke, r/PicsOfDeadKids: none of these things are dealt with according to the social norms and laws of the societies of which Reddit is, in reality, a part. Only when admins become overwhelmed is action taken to police its community, and at the same time the CEO declares the site to be, effectively, the creator of its own laws. This would be nothing but self-serving nonsense if it weren’t for the way it’s being used to justify ignoring harmful community behaviours. Reddit’s users are right to point out that the company only acts on high-profile issues, that Reddit’s lack of moral standards for its users allows these situations to develop and makes it much harder for the company to police them when they do, and that the site’s users suffer as a result of its haphazard approach:

This is just what happens when your stance is that anything goes. If you allow subreddits devoted to sex with dogs, of course people will be outraged when you take down something else. If you allow subreddits like /r/niggers,of course they’re going to be assholes who gang up to brigade. The fine users of /r/jailbait are sharing kiddy porn? What a shocking revelation. The point is, you can’t let the inmates run the asylum and then get shocked when someone smears shit on the wall. Stand up for standards for a change. Actually make a stance for what you want reddit to be. You’ll piss off some people but who cares? They’re the shitty people you don’t want anyway. Instead you just alienate good users who are sick of all of the shit on the walls.

If Reddit thinks it’s a government, it should be considering how to govern well, not how to absolve itself of the responsibility to govern at all.

Pocket Lint #21: sound and fury

If you’d like to get Pocket Lint as a regular-ish email on Fridays you can sign up here.

The Times of London is pumping increasingly frenetic typewriter noises into its newsroom.

An excellent primer on this week’s implosion in videogame culture.

“Today, videogames are for everyone. I mean this in an almost destructive way. Videogames, to read the other side of the same statement, are not for you. You do not get to own videogames. No one gets to own videogames when they are for everyone. They add up to more than any one group.”

“When there’s no immediate threat to our understanding of the world, we change our beliefs. It’s when that change contradicts something we’ve long held as important that problems occur.”

Stereotype lift persists online with virtual, gendered avatars.

The Man Without a Mask: How the drag queen Cassandro became a star of Mexican wrestling.

How social media silences debate.

“Banning comments—or moderating with an iron fist—is not squelching honest and open debate in the public sphere, anymore than refusing to publish every letter to the editor, unedited, in a print publication. Telling people to take their bullshit to Reddit is not a harbinger of Orwellian dystopia.”

Classic first lines from novels in emojis.

How to listen to the radio properly: BBC guidance from 1940.

Tumblr of the week: Slug Solos.

Poem of the week: Mr. Grumpledump’s Song, Shel Silverstein.

Game of the week: Gridland, a match-3 game with a building/fighting day/night cycle.