Data journalism at the Frontline

Last week I went to a Frontline Club event on data journalism. There’s a live stream of the talks and questions here along with a good brief rundown of who said what. It’s well worth watching all the way through for four good talks exploring four very different aspects of data journalism.

First up, Simon Rogers of the Guardian Datablog, talking about crowdsourcing, wading through the MPs’ expenses releases and the Wikileaks Afghanistan files, and working to visualise the data they receive. My highlights:

  • A huge amount of the hard work on the MPs’ expenses project was done by one person. Some individuals give a great deal.
  • For the Wikileaks files, they set up tasks and asked people to complete them. Simon didn’t say whether this had much of an effect on bounce rates or completion – but it’s likely that it did. And if so, perhaps the next step is gamifying the process.
  • But the Guardian hasn’t yet had any useful data come from crowdsourcing. Maybe they’re asking the wrong questions or requesting that people do the wrong tasks; maybe the wisdom of crowds is a different beast from the action of crowds.

Second, Julian Burgess, currently developing at the Times and soon to be off to New York, talking about how to tackle data. This was perhaps the most directly practical talk, with pointers to some great tools and tips on techniques – his slides are online here. My highlights:

  • When presented with data, don’t panic. Take your time, work through a sensible series of steps to analyse and work out the best approach to what’s in front of you.
  • There are buckets and buckets of tools out there, most if not all of them free. They’re big, they’re powerful, they’re incredibly useful. I took away the impression that a key skill for data/CAR journalists is knowing which tools are going to be good for handling which datasets – when to use Wordle vs ManyEyes, when to use Freebase Gridworks rather than Google Spreadsheets.
  • Metadata. This is where my little linked-data brain lit up as a puzzle piece fell into place. Hidden metadata is still part of a dataset – and it can tell stories about how the data came to be in this place, in this way, shape and format. That’s useful.

David McCandless, whose gorgeous data visualisations can be found at Information Is Beautiful, spoke about visualising data and the stories it can reveal. My highlights:

  • Visualising something really well takes a lot of time and a lot of hard graft. It takes – this is going to sound obvious – a vision.
  • Making something that’s both beautiful and conveys information is hugely difficult and walks a very fine line between appearance and utility, but it’s more than worth the balancing act when it works. Successful data journalism needs to be interesting, easy, beautiful and true.
  • You don’t just tell stories with visualisation, you find them too. Weird spikes, unusual patterns, data points that look like anomalies – they all prompt further questions. By asking why something is the way it is, you get stories.

Finally, Michael Blastland, freelance journalist and creator of BBC Radio 4’s More Or Less, discussed the problems with numbers. My highlights:

  • Numbers are slippery. Where do they come from and why? Just like quotes from sources, every stat is compiled by someone with an agenda and a purpose and most of them are biased in ways we can’t begin to guess till we start digging. Don’t use data just because it’s convenient.
  • Sometimes the story behind the number is more interesting – and more in the public interest – than any story based on the number.
  • I need to learn more maths. Specifically, statistics.

Most journalists haven’t been taught the skills they need to do what these guys do (and they were all guys; that’s not a bad thing per se, but worthy of a note that more women on the stage would be good to see next time). Every speaker told the audience that there’s no special education required to work in their field. We don’t need to be programmers, or designers, or statisticians. But we need to be interested and open-minded and both willing and ready to learn.

But doing this well takes a team, and it takes time. Most journalists will never get the chance to learn or teach themselves. And even if they do – you can be a jack of all trades, you can take a project through from finding the numbers to analysing the data to making it look amazing and simple and easy to use, but it takes a harsh amount of time and is punishingly frustrating to do alone.

Data journalists need support. Time, resources, connections, and people. I’ve not yet met anyone who can do all of this – or even most – alone; certainly not in their spare time, working in the gaps, at the ends of long days. All the people who spoke at the Frontline Club were at the top of the market, doing brilliant work that reaches people, making useful journalism. We need more like them – but we also need the support systems that allow people like them to grow and thrive. Next time, I’d like to see conversations about company culture, about how to evangelise to your newsdesk, about time management and learning and how, exactly, we free up time and space for data work in newsrooms all over the country, from the ground up.

On a related note, the next meetup of Hacks and Hackers London is on October 20. If all goes well, I’ll be there. Come join in.

While We Were Here – turning a festival into a newspaper

What.

While We Were Here is a 16-page free souvenir newspaper with a print run of 4,000. It was put together by a small team of volunteers during this year’s Greenbelt Festival. It included a 4-page black and white comic pull-out in the centre of the paper. You can download a copy of the main paper or the comic in PDF format.

Where.

Greenbelt Festival takes place over four days at the end of August every year at Cheltenham race course. There’s no accommodation on site that’s not under canvas – so the newspaper team were camping out on the course along with about 20,000 other festival-goers. We appropriated a small box that’s normally used for watching the races and turned it into a newsroom, with two design Macs and three or four laptops at any given time. There were not enough chairs, the carpet went half-way up the walls, and we were constantly watched by pictures of small men on large horses.

Who.

In total there were ten people involved in making the main paper. We didn’t have much to do with the comic guys – they did their own thing and arrived perfectly on time with all their spreads in PDF form. Our team was brought together by Matt Patterson as hands-on managing editor and James Stewart as hands-off. I was the editor. James Weiner and Paul Abbott worked on data and infographics for the paper. Ben Weiner, Will Quirk, Geraldine Nassieu-Maupas and Oliver Mayes made up our design and layout team, and Wilf Whitty dealt with some last-minute front-cover design issues.

The rest of the team were primarily design-minded folks and I was (as far as I know) the only one with newsroom experience. As a result partly of that and partly the fact that I’ll organise anything if it stands still near me for long enough, I took charge of content planning and making sure we had something interesting, well-written and appropriate for print on every page.

Why.

As a tangible souvenir, something to commemorate the experience of being at Greenbelt for those who were there and something to express a little of what it was like for those who weren’t. Something that’s separate from the blog or the Flickr stream or the Twitter conversations, a document that physically exists and can be handed around families, shown to children, given to grandparents, in a way that the internet still can’t.

And, in a very real way, we did it because we could.

When. How.

I was one of the last of the team to arrive on site, on Friday morning. At 2.30pm the team met for the first time and found out our general brief. Over the next four hours we hammered out a page plan for the paper, focussing on what we felt were the major themes and events from the Festival that people would recognise and want to read about. We decided who would be covering what in terms of writing content specifically for the paper. I briefed the Festival’s photographers about what we’d need and when. We made up a flat plan and stuck it to various pictures of horses, and I wrote up a schedule working backwards from our hard deadline – 6pm on Sunday.

We made the paper in just over two days. The design team did a lot of work on Friday night and Saturday morning putting templates and grids together, while I did vox pops and got quotes from various festival punters. I started to put content together on Saturday afternoon, which is when it became clear that we couldn’t use most of the content from the two people who were blogging the festival over the weekend. One person’s writing was very long-form, personal and intellectual, while the other’s was very short-form and timely – both made for great blog posts but wouldn’t work in print. I started roping in people to write reviews and snippets of content, as did managing editor James Stewart. The infographics team finally managed to get hold of some data they could use and started drawing golf buggies in Illustrator.

By Sunday lunchtime we had about half of what we needed copy-edited and in formats ready to put on the page, and we had two neat infographics ready to place. I spent the next three or four hours writing, helping choose pictures, deciding what content needed to go in which boxes, copy-editing and being very rude to other people’s work so it would fit in print-sized boxes, while next to me the layout team collaborated to pull it all into InDesign and make it look perfect. By about 4pm we had collected all the content we needed; the next two hours involved me pacing around the newsroom, making sure we had everything in the right place, picking different pictures when the ones we had didn’t work out, and occasionally taking a seat and making changes to the text or the design when things simply wouldn’t fit right.

Matt started uploading it at about 6.45pm. Network sloth meant it finally finished at about 8pm. The printers in Peterborough turned their presses on for about a minute and a half, and we had a print run of 4,000 copies. Four hours later, thanks to some strangers who drove through the night for us, it was back on site ready for the first copies to be distributed at the last show of the evening.

Lessons learned.

  • Planning is vital, much more so for print than for online journalism. If a blog post doesn’t go up or goes up late, few people will notice. If there’s a hole in your print paper, they definitely will. Thematic planning for something like this is crucial too – content should fit together, images should complement each other, pages should balance. That’s impossible to do with slapdash content delivered at the last minute.
  • Briefing, therefore, is another crucial element. You can’t simply say “Write me 450 words about the music scene.” You need to make deadlines clear and make sure you’ve agreed which bits of the music scene are necessary. You need to talk about tone, audience, readability, style, voice. You need to make clear what’s needed, even when you’re both up against deadline, so that the content you get back is useful and takes the minimum of editing or rewriting.
  • Build in redundancy. One of the reasons the paper worked well despite some of the content-related setbacks we had is that we did our best to get hold of more content than we needed – about half as much again. If I was doing it again I’d be shooting for twice as much, if not more. If it’s not used in the paper, it could go online; if it’s something that works better online, we wouldn’t have to force it into a print style. And if it doesn’t turn up, it doesn’t matter.
  • Get data well in advance. Infographics are awesome but they can’t be created without data. If you have a tight deadline and you’re including data-driven charts or graphics, that’s the bit you should sort out first. We didn’t, and that’s why we only have two in the paper.
  • Basic newspaper design skills are invaluable, even if you’re not a designer. If you’re planning content for pages, you need to understand how boxes fit together on a page, how headline size and positioning alters layout, what a baseline grid is, the difference between a 3-col and 4-col layout for a page, and a dozen other little things that don’t bother you while you’re writing but that become vital as soon as you’re laying out. You need to know the rules, what they are, how they can be bent and when they can be broken. Otherwise you end up coming in and asking questions like “Are we really wedded to a serif font?” and “Do we really need to lock to grid?” half an hour before final deadline. (Yes, this happened. No, it wasn’t me.)
  • If you’re distributing content across multiple channels, a convergent newsroom is potentially a huge timesaver. This would have prevented completely the problems we had with last-minute content and having to repurpose pieces that were not right for print in their original forms – but it takes a lot of advance planning. Having a pool of writers – not necessarily bloggers or writers for print, just writers – who could be briefed individually by the blog editor and the newspaper editor, and whose work could be pulled to be used in one or both formats, would have been very valuable. Doing the same with images and video could mean a converged team in three parts: content creators at one end, putting their work into a big pool; editors in the middle, picking out the best of the bunch or the most appropriate for their medium; and distributors at the other end, feeding that work into the newspaper, the blog, Twitter, Flickr, Vimeo, the various other channels including feeding out to the magazine shows and round-up events on site – and making it easy for the press office to pass out the best of what’s on offer too. I think this is the biggest thing I’ve taken from the experience – I grok convergence much better now I’ve seen it from the editor’s point of view.

The paywall debate

An interesting post extolling the virtues of the paywall by Julien Rath as part of journalism.co.uk’s excellent TNTJ group blog has really gotten me thinking. Not because I agree – far from it – but it’s finally forced me to put into words my own views on the massive paywall debate. I don’t like paywalls.

I don’t think that most papers have ever been bought on the basis of the news content – or even the op-ed and columns. (Sometimes the columns – Bridget Jones springs to mind – but rarely, and certainly not enough to subsidise an entire paper.) Asking people to pay on the web for things they don’t necessarily value enough to pay for in print – this seems pointless to me.

There’s a laundry list of ideological complaints about paywalls. They trap journalism behind a wall, cutting off access to information in a terribly anti-open-web sort of way. They create gated communities where dissent is unlikely and where the turbulent streams of the open web can’t intrude – for better or worse. They ensure a sort of private members’ club that cuts off those who can’t or don’t want to pay, which can be a blessing or a curse depending on your point of view.

Ideology aside, my most basic reason for disliking paywalls is business based. We have declining circulation in print, which means very few new paper readers will come to our websites based on what we’ve put in our newspapers. One of the obvious ways to gather new readers therefore is online, getting young people used to seeing our content linked on Facebook, Twitter, social networks they belong to and appreciate, in the hope that we can drive brand loyalty through those platforms and maybe, eventually, a few of those people will start reading the paper. What happens to that model if there’s no accessible content online? It dies. What’s the plan to attract new readers to your brand above all others if it’s all behind a paywall? I haven’t yet seen one that works. It doesn’t matter how well-written or wonderful your editorials are – if no one can link to them they aren’t going to drive new traffic to your site.

Breaking news content online will rarely if ever be unique outside exceptionally specialist circles. Commentary, analysis, feature articles are more “valuable”, but very rarely irreplaceable given the vast amount of alternative and specialist content available for free elsewhere. And many news consumers now read what their social circle reads and links. We come through that to like personalities or subject-specific content, but that’s not the same as brand loyalty – I read Charlie Brooker and the Guardian Datablog regularly, but that doesn’t mean I ever read the Guardian homepage. Paying for the whole Times website when I just want Caitlin Moran doesn’t make a lot of sense to me – especially when I can’t search for Times content using my normal methods (Google) and no one else links me to it because it’s all behind a wall, so I’d have to go hunting for it specifically if I wanted to include it in my daily reading. If many other net users are like me then they won’t be willing to pay for a whole bundle when what they want is one strand.

I’m more open to the idea of limited paywalls on sites like the proposed New York Times one, where only very regular readers – the folks who are already brand loyal – get charged for content. I still think they do more harm than good, because at that point you’re essentially punishing people for liking you too much. If the expectation is that content is free, suddenly charging is going to irritate people and drive them away from engaging too strongly.

Yes, journalists need to be paid for what we do. We need to eat and live, after all. I’m interested in the idea of micropayment systems that let me pay pennies at a time for content from any one of hundreds of news sources – from specialist science papers via Athens through the Financial Times through the Sun, I suppose, pretty soon. I’m interested in untapped affiliation potential – ticket sales, restaurant bookings, holidays, iTunes links next to band reviews. We can still make money from picture sales, family notices and so on, but we can do it in new ways – like the death notices my paper has set up where a single payment gets you not just the notice in the paper but also a living page that remains as a permanent and changing tribute. And that’s before we get into serious targeted advertising solutions, or the content changes that have got the Mail Online to where it is today.

[Edit to clarify: I’m not suggesting that any one of these is a magic bullet that will save the news industry. I’m simply pointing to possible multiple revenue streams that I feel are worth exploring to see whether they could go some way towards paying for news.]

I’m not Rupert Murdoch. I haven’t sat in front of the figures or done the maths with real audience numbers, so like most other people I’m just having a good old reckon. Still, I reckon there are better ways forward than paywalls. What do you think?