Open data workflow


How open data is created

Typically, the open data workflow involves three distinct roles.

Data producers are responsible for design, methodology, data collection, validation, and production of final datasets and supporting documentation.

Finished data then goes to data curators, who manage the open data catalogue and are responsible for publishing, maintaining and distributing the data to end users.

End users access and use the data. They may have questions or feedback for the curators, who can act as intermediaries with the producers.

Data curators therefore play a vital role in the virtuous cycle discussed in the lesson "An introduction to open data".

Data workflow stages

Here is a summary of the open data workflow:

Stage 1

Data producer: Design, methodology, collection, validation and production of data + supporting documentation.

Stage 2

Data curator: Manage open data catalogue + publish and distribute data.

Stage 3

Users: Access and use data.

Stage 4

Feedback: Users send questions and feedback to curators, who pass them on to producers.

Data workflow in practice: Transport data example

Let’s use transportation data to illustrate a typical open data workflow.

Data producers in government ministries might start by collecting and calculating statistics covering multiple districts and modes of transport. Data producers then work with data curators to publish the finished data in standard open data formats with accurate metadata, which we’ll discuss later in this lesson.

Users can then access the data and may use it for analysis or to build new data services, such as travel guides or congestion forecasts.

Roles during the feedback stage

During the feedback stage, data producers and data curators each have specific roles in responding to users.

Users may request additional documentation, more detailed data, or the data in a different format.

It is then up to the data producers to make these improvements in future updates. Data curators can work with the producers to respond to such requests and make the data more useful over time.

For data such as transportation statistics, updates are frequent, so this process may repeat monthly, weekly, or even daily.

For high-frequency updates, the transfer from data producer to data curator is often automated.
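To make the idea of an automated handoff more concrete, here is a minimal Python sketch under stated assumptions: the producer exports the transport figures as a CSV file together with a metadata record, and the curator checks that the metadata is complete before adding the dataset to the catalogue. The column names, the REQUIRED_FIELDS list, the produce and curate functions, the publisher name and the in-memory catalogue are all hypothetical; real catalogues define their own metadata schemas and publishing tools.

```python
import csv
import io
import json
from datetime import date

# Hypothetical minimum metadata; a real catalogue defines its own schema.
REQUIRED_FIELDS = ("title", "publisher", "modified", "format")


def produce(rows):
    """Producer side (hypothetical): serialise rows to CSV and describe them with metadata."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["district", "mode", "trips"])
    writer.writeheader()
    writer.writerows(rows)
    metadata = {
        "title": "Trips by district and mode of transport",
        "publisher": "Ministry of Transport",  # hypothetical producer
        "modified": date.today().isoformat(),   # date of this update
        "format": "text/csv",
    }
    return buffer.getvalue(), metadata


def curate(csv_text, metadata, catalogue):
    """Curator side (hypothetical): reject incomplete metadata, else publish to the catalogue."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError(f"metadata missing: {', '.join(missing)}")
    # Keyed by title, so each automated update replaces the previous version.
    catalogue[metadata["title"]] = {"metadata": metadata, "data": csv_text}
    return catalogue


rows = [
    {"district": "North", "mode": "bus", "trips": 1200},
    {"district": "North", "mode": "rail", "trips": 800},
]
catalogue = {}
csv_text, metadata = produce(rows)
curate(csv_text, metadata, catalogue)
print(json.dumps(catalogue[metadata["title"]]["metadata"], indent=2))
```

In an automated setup, a script along these lines would typically be run by a scheduler each time new figures are ready, so routine updates need no manual step; the metadata check is where the curator's quality control is written down as rules.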
