Last week I went to a Frontline Club event on data journalism. There’s a live stream of the talks and questions here along with a good brief rundown of who said what. It’s well worth watching all the way through for four good talks exploring four very different aspects of data journalism.
First up, Simon Rogers of the Guardian Datablog, talking about crowdsourcing, wading through the MPs’ expenses releases and the Wikileaks Afghanistan files, and working to visualise the data they receive. My highlights:
- A huge amount of the hard work done on the MPs’ expenses project was done by one person. Some individuals give a great deal.
- For the Wikileaks files, they set up tasks and asked people to complete them. Simon didn’t say whether this had much of an effect on bounce rates or completion – but it’s likely that it did. And if so, perhaps the next step is gamifying the process.
- But the Guardian hasn’t yet had any useful data come from crowdsourcing. Maybe they’re asking the wrong questions or requesting that people do the wrong tasks; maybe the wisdom of crowds is a different beast from the action of crowds.
Second, Julian Burgess, currently developing at the Times and soon to be off to New York, talking about how to tackle data. This was perhaps the most directly practical talk, with pointers to some great tools and tips on techniques – his slides are online here. My highlights:
- When presented with data, don’t panic. Take your time, work through a sensible series of steps to analyse and work out the best approach to what’s in front of you.
- There are buckets and buckets of tools out there, most if not all of them free. They’re big, they’re powerful, they’re incredibly useful. I took away the impression that a key skill for data/CAR journalists is knowing which tools are going to be good for handling which datasets – when to use Wordle vs ManyEyes, when to use Freebase Gridworks rather than Google Spreadsheets.
- Metadata. This is where my little linked-data brain lit up as a puzzle piece fell into place. Hidden metadata is still part of a dataset – and it can tell stories about how the data came to be in this place, in this way, shape and format. That’s useful.
Third, David McCandless, whose gorgeous data visualisations can be found at Information Is Beautiful, spoke about visualising data and the stories it can reveal. My highlights:
- Visualising something really well takes a lot of time and a lot of hard graft. It takes – this is going to sound obvious – a vision.
- Making something that’s both beautiful and conveys information is hugely difficult and walks a very fine line between appearance and utility, but it’s more than worth the balancing act when it works. Successful data journalism needs to be interesting, easy, beautiful and true.
- You don’t just tell stories with visualisation, you find them too. Weird spikes, unusual patterns, data points that look like anomalies – they all prompt further questions. By asking why something is the way it is, you get stories.
Finally, Michael Blastland, freelance journalist and creator of BBC Radio 4’s More Or Less, discussed the problems with numbers. My highlights:
- Numbers are slippery. Where do they come from and why? Just like quotes from sources, every stat is compiled by someone with an agenda and a purpose and most of them are biased in ways we can’t begin to guess till we start digging. Don’t use data just because it’s convenient.
- Sometimes the story behind the number is more interesting – and more in the public interest – than any story based on the number.
- I need to learn more maths. Specifically, statistics.
Most journalists haven’t been taught the skills they need to do what these guys do (and they were all guys; that’s not a bad thing per se, but it’s worth noting that more women on the stage would be good to see next time). Every speaker told the audience that there’s no special education required to work in their field. We don’t need to be programmers, or designers, or statisticians. But we need to be interested and open-minded and both willing and ready to learn.
But doing this well takes a team, and it takes time. Most journalists will never get the chance to learn or teach themselves. And even if they do – you can be a jack of all trades, you can take a project through from finding the numbers to analysing the data to making it look amazing and simple and easy to use, but it takes a harsh amount of time and is punishingly frustrating to do alone.
Data journalists need support. Time, resources, connections, and people. I’ve not yet met anyone who can do all of this – or even most – alone; certainly not in their spare time, working in the gaps, at the ends of long days. All the people who spoke at the Frontline Club were at the top of the market, doing brilliant work that reaches people, making useful journalism. We need more like them – but we also need the support systems that allow people like them to grow and thrive. Next time, I’d like to see conversations about company culture, about how to evangelise to your newsdesk, about time management and learning and how, exactly, we free up time and space for data work in newsrooms all over the country, from the ground up.
On a related note, the next meetup of Hacks and Hackers London is on October 20. If all goes well, I’ll be there. Come join in.
There’s an NHS Local hackday on Weds Oct 20 in Bham and the second Bham Hacks/Hackers that evening if you want to come (translation: you’re hugely welcome, as always!)
I’ll have to pick whether I go to London or Birmingham it seems – thanks for the welcome. Does Brum H&H have a meetup page somewhere?
Nice write-up, mine is here: http://scijourntraining.wordpress.com/2010/09/23/data-at-the-frontline/
I’ve just started in a new position based at the Royal Statistical Society, coordinating science/stats training for journalists. Would love to hear from you about what and how you’d like to learn about stats.
I’d like resources I can refer to online. Glossary type things are always handy – what is a standard deviation and why do I care? What does “statistically significant” mean? What are false positives and what do they mean for my story? Those are pretty basic questions but a Stats 101 resource would get a lot of use from journalists working on science stories, I reckon.
Then there’s folks like me who want to learn data manipulation and might benefit from tutorials and examples. I’d love an online workbook – something I could work through that’d let me sharpen the skills I’ve got and practise the ones I don’t without potentially messing up a project by doing it wrong. Something practical.
Is that helpful?
Yes, thanks! Have you seen the Making Sense of Statistics booklet? (http://www.senseaboutscience.org.uk/PDF/MSofStatistics.pdf) I think it does some of what you ask. I’m also collecting together useful online resources on my website at http://scijourntraining.wordpress.com/resources