Turning data into stories

Welcome to the world of data

Now that you know what data is, this lesson shows how journalists and civil society organisations can use it to create public service stories.

To begin with, we’ll look at open data and how it can be used to source and verify stories. A number of case studies will demonstrate how data from various sectors have been used to create data-driven stories.

In this lesson, you will learn:

  • How to use data to answer questions about important global issues
  • What open data is
  • How to go through the data journalism process

Getting started

Progress on MDG 4. Source: The Guardian

Let’s take a look at an issue that’s tremendously important across Africa. One of the Millennium Development Goals (MDGs) that countries across the world set out to achieve was to reduce the under-five mortality rate by two-thirds between 1990 and 2015. Has the world achieved this goal? Has Africa?

Data helps us answer this question. Not only that, it helps uncover insights that may help fill the gaps that remain in addressing this goal. Data journalists have used this data to tell engaging stories to help citizens and policymakers to make better decisions for the future. The image above is an example from The Guardian, which clearly tells the story of which regions did and did not achieve the MDG in this area.
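The two-thirds target is simple arithmetic: a region meets it if its 2015 rate is no more than one-third of its 1990 rate. The short Python sketch below shows that check. The function names and the two regions are made up for illustration, and the figures are hypothetical, not real statistics.

```python
def mdg4_target(rate_1990: float) -> float:
    """Target rate after a two-thirds reduction from the 1990 level."""
    return rate_1990 / 3


def met_mdg4(rate_1990: float, rate_2015: float) -> bool:
    """True if the 2015 rate is at or below the target."""
    return rate_2015 <= mdg4_target(rate_1990)


def percent_reduction(rate_1990: float, rate_2015: float) -> float:
    """Percentage fall in the rate between 1990 and 2015."""
    return (rate_1990 - rate_2015) / rate_1990 * 100


# Hypothetical under-five mortality rates (deaths per 1,000 live births).
regions = {
    "Region A": (180.0, 55.0),
    "Region B": (90.0, 45.0),
}

for name, (rate_1990, rate_2015) in regions.items():
    status = "met" if met_mdg4(rate_1990, rate_2015) else "did not meet"
    fall = percent_reduction(rate_1990, rate_2015)
    print(f"{name} {status} the target (reduction: {fall:.1f}%)")
```

With real figures from an open data portal in place of the hypothetical ones, the same check could be run for every region in a dataset.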


What is “open data”?

Open data is data that has been collected by a government body, NGO or private organisation, and published online for others to use.

With a global push for open data, many governments and international organisations are creating their own open data portals to publish the data they have aggregated and opened. These portals are a rich source of information for civic engagement and for generating public interest stories. They are by no means the only sources, however: much data still lives on the websites of individual government ministries.

Some important datasets that are (or could be) open come from personal data about individuals that has been aggregated and “anonymised” through a process of removing personally identifiable data. In other words, although you may have access to a dataset about HIV/AIDS infection rates, it should be impossible to work out if a specific citizen suffers from the disease using that data.

Much statistical information ultimately comes from surveys of individuals, but the end results are heavily aggregated so that individuals can’t be identified.
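As a rough sketch of what this aggregation can look like, the Python example below counts survey responses per district and hides any count below a minimum size, so that very small groups are not exposed. The records, the field name `district` and the threshold of 5 are all assumptions made for illustration; real statistical offices set their own rules.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical threshold, chosen only for this example


def aggregate_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Count records per group, replacing small counts with None."""
    counts = Counter(record[key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Hypothetical individual-level survey records.
survey = [{"district": "North"}] * 6 + [{"district": "South"}] * 2

summary = aggregate_counts(survey, "district")
for district, count in summary.items():
    print(district, count if count is not None else "suppressed")
```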

If you’re planning to publish your own data, and you should, you need to think carefully about whether individuals can be identified through it and how to protect their privacy.

Even datasets that have been anonymised can sometimes be reverse engineered to reveal personal details that should not be in the public domain. Bear this in mind when publishing your own data – don’t inadvertently reveal private details about members of the public without good cause.
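One simple way to look for this risk before publishing is to count how many records share each combination of indirectly identifying fields, such as age band and town. Combinations shared by very few records are the ones most easily traced back to a person. The Python sketch below is a basic check in the style of k-anonymity, not a complete privacy audit; the field names, the sample rows and the value of `k` are hypothetical.

```python
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=3):
    """Return field-value combinations shared by fewer than k rows."""
    combos = Counter(
        tuple(row[field] for field in quasi_identifiers) for row in rows
    )
    return {combo: n for combo, n in combos.items() if n < k}


# Hypothetical "anonymised" rows: names removed, other details kept.
rows = [
    {"age_band": "30-39", "town": "Town A"},
    {"age_band": "30-39", "town": "Town A"},
    {"age_band": "30-39", "town": "Town A"},
    {"age_band": "70-79", "town": "Town B"},
]

risky = risky_combinations(rows, ["age_band", "town"], k=3)
for combo, n in risky.items():
    print(f"Only {n} record(s) match {combo}: consider coarser categories")
```

If the check flags a combination, broader categories (wider age bands, regions instead of towns) usually reduce the risk.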

The data journalism process

The data journalism process, as described by Paul Bradshaw at Online Journalism Blog.

From child mortality to budgets, many issues of public interest can be understood and explained with data in a manner that’s engaging and incisive. So how does dense data in tables, say in an Excel sheet or a report, turn into engaging narratives and visuals? It takes a process that is still evolving across newsrooms in various countries – including in Africa. CSOs and NGOs have also applied similar methods to work with data. Here are the steps a typical process may involve:

  • Compile: Data journalism begins in one of two ways: either you have a question that needs data or a dataset that needs questioning. Whichever it is, the compilation of data is what defines it as an act of data journalism.
  • Clean: Having data is just the beginning. Being confident in the stories hidden within it means being able to trust the quality of the data – and that means cleaning it. Cleaning typically takes two forms: removing human error and converting the data into a format that is consistent with other data you are using.
  • Context: Like any source, data cannot always be trusted. It comes with its own histories, biases, and objectives. So like any source, you need to ask questions of it: who gathered it, when, and for what purpose? How was it gathered? Who can explain the data?
  • Combine: Good stories can be found in a single dataset, but often you will need to combine two together. After all, given the choice between a single-source story and a multiple-source one, which would you prefer?
  • Communicate: In data journalism the all-too-obvious thing to do at this point is to visualise the results – on a map, in a chart, an infographic, or an animation. But there’s a lot more here to consider – from the classic narrative to news apps, case studies and personalisation.
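To make the Clean and Combine steps above more concrete, here is a minimal Python sketch. It tidies inconsistent region names and number formats in two small tables, then joins them to work out health spending per person. Both tables, the field names and the helper functions are hypothetical; real datasets will need their own cleaning rules.

```python
def clean_name(name: str) -> str:
    """Collapse stray spaces and standardise capitalisation."""
    return " ".join(name.split()).title()


def clean_number(value: str) -> float:
    """Turn a string such as '1,200,000' into a number."""
    return float(value.replace(",", "").strip())


def combine(budget_rows, population_rows):
    """Join the two tables on region and compute spending per person."""
    population = {
        clean_name(row["region"]): clean_number(row["population"])
        for row in population_rows
    }
    per_person = {}
    for row in budget_rows:
        region = clean_name(row["region"])
        if region in population:  # skip regions missing from either table
            spend = clean_number(row["health_budget"])
            per_person[region] = spend / population[region]
    return per_person


# Hypothetical data, with the kind of inconsistencies cleaning removes.
budget = [
    {"region": " north  province ", "health_budget": "1,200,000"},
    {"region": "South Province", "health_budget": "900,000"},
]
population = [
    {"region": "North Province", "population": "400,000"},
    {"region": "south province", "population": "150,000"},
]

result = combine(budget, population)
for region, value in result.items():
    print(f"{region}: {value:.2f} per person")
```

Without the cleaning step, "north  province" and "North Province" would be treated as different regions and the join would silently drop rows, which is one reason cleaning comes before combining.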

In the next lesson, we’ll look at some specific case studies around what makes a good data story.