Blog article: Migrate, Automate, Optimize: An easier, faster and more efficient way forward

Open data has taken significant strides forward thanks to the efforts of City staff and agencies. Now, we want to improve the open data process for both producers and consumers. For producers, we want to reduce the amount of work it takes to prepare data for publication. For consumers, we want to reduce the time and effort required to clean and restructure data before it is ready for use.

First steps

We evaluated each dataset on toronto.ca/open to better understand where improvements can be made. Here’s a snapshot of where we are today:
  • Total datasets on toronto.ca/open – 292
  • Total unique dataset files – 1329
  • Total unique file formats – 21
  • Total % machine-readable – 40%
  • Total % in open format – 27%
  • Total % published last year – 38%

Glossary

  • Total datasets: the total number of unique datasets hosted on toronto.ca/open. A dataset is a collection of files that share the same metadata and are produced by the same source.
  • Total unique dataset files: each dataset can consist of multiple files. This metric refers to the total number of individual files a user can download from the portal.
  • Total unique file formats: datasets are provided in a range of file formats. The same data can be presented as a .CSV, an .XLS, or a geospatial format like GeoJSON.
  • Machine readability: for data to be truly ‘open’, it must be readily consumable digitally. This means that the files can be automatically read and processed by a computer.
  • Open format: data can be available in two types of formats: open and non-open (or proprietary). Proprietary formats are usually difficult to open without paid solutions like Microsoft Word. Open formats are editable and don’t need any specialty software for access.
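
For readers who want to run a similar assessment on their own catalogue, here is a minimal Python sketch of how the snapshot metrics above could be computed from a file inventory. It is not the method used for the City's evaluation. The format classifications, the dataset and file names, and the choice to calculate percentages per file are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed classifications, for illustration only. The article does not list
# which formats were counted as machine-readable or open.
MACHINE_READABLE = {"csv", "json", "geojson", "xml", "xls", "xlsx"}
OPEN_FORMATS = {"csv", "json", "geojson", "xml", "txt"}


def file_format(filename: str) -> str:
    """Return the lowercase file extension without the dot, e.g. 'csv'."""
    return PurePosixPath(filename).suffix.lstrip(".").lower()


def summarize(inventory: dict[str, list[str]]) -> dict[str, float]:
    """Compute snapshot-style metrics from {dataset name: [file names]}."""
    # A file is identified by its dataset and its name, so two datasets
    # may each contain a file called "data.csv".
    files = {(ds, name) for ds, names in inventory.items() for name in names}
    formats = Counter(file_format(name) for _, name in files)
    total = len(files)

    def pct(allowed: set[str]) -> float:
        # Assumption: percentages are shares of files, not of datasets.
        matching = sum(n for fmt, n in formats.items() if fmt in allowed)
        return round(100 * matching / total, 1) if total else 0.0

    return {
        "total_datasets": len(inventory),
        "total_unique_files": total,
        "total_unique_formats": len(formats),
        "pct_machine_readable": pct(MACHINE_READABLE),
        "pct_open_format": pct(OPEN_FORMATS),
    }


# Hypothetical inventory: two datasets with two files each.
example = {
    "bike-racks": ["bike-racks.csv", "bike-racks.geojson"],
    "budget": ["budget-2017.xlsx", "budget-notes.pdf"],
}
print(summarize(example))
```

Changing the two sets at the top is enough to test a different definition of machine-readable or open.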

What else did we learn?

Our assessment also considered the overall design and function of the open data catalogue. Users told us that they struggled to find datasets with the existing search. They were also unsure how to use some of the available file formats.

This led us to ask ourselves: How can we make the process of finding and using data more intuitive? How can we publish machine-readable datasets without significant manual effort? Most importantly, how can we make it easier for technical and non-technical users to explore the true potential of open data?

These questions were the key drivers for the migration, optimization and automation processes. The criteria we compiled will help guide staff through the steps required for a successful transition to the new portal. These efforts will result in higher-quality, easier-to-use, instantly accessible open data.