Step 4: Maintaining open data


The open data journey does not stop once a dataset is published; divisions are expected to maintain and, where possible, improve their existing open datasets over time. 

Roles and responsibilities

During the maintenance phase, divisions should: 
  • Ensure datasets are updated according to their stated schedules
  • Work to improve the data quality of published datasets where necessary
  • Build automated publishing pipelines for datasets that are currently published manually, where possible and appropriate (a minimal pipeline sketch follows this list)
  • Notify the Open Data Team if a dataset needs to be revised, retired or removed
  • Notify the Open Data Team if a source dataset is subject to disposition per the City’s retention policy

The Open Data Team provides support by: 
  • Providing guidance on how best to maintain open data and deliver a valuable experience to data users
  • Collaborating with divisions to update or improve datasets
  • Collaborating with corporate SMEs regarding the retention requirements of datasets
  • Notifying divisions of any public requests or questions related to their data
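
For divisions exploring automation, the sketch below shows, at a high level, what a minimal publishing pipeline can look like: extract data from a source system, validate it, then push it to the portal. It is a Python sketch under stated assumptions; the URLs, dataset ID, and credential handling are illustrative placeholders rather than the portal’s actual API, and the Open Data Team should be consulted on the real publishing mechanism.

    # A minimal publishing-pipeline sketch. The source URL, upload endpoint,
    # dataset ID and credential below are illustrative placeholders only.
    import csv
    import io

    import requests  # third-party HTTP library

    SOURCE_EXPORT_URL = "https://example.org/source-system/export.csv"  # placeholder
    PORTAL_UPLOAD_URL = "https://example.org/open-data/api/upload"      # placeholder
    DATASET_ID = "example-dataset"                                      # placeholder

    def extract() -> str:
        """Pull the latest export from the divisional source system."""
        response = requests.get(SOURCE_EXPORT_URL, timeout=60)
        response.raise_for_status()
        return response.text

    def validate(csv_text: str) -> None:
        """Fail fast if the export is empty or has no data rows."""
        rows = list(csv.reader(io.StringIO(csv_text)))
        if len(rows) < 2:
            raise ValueError("Export has no data rows; aborting publish.")

    def publish(csv_text: str) -> None:
        """Push the refreshed file to the (placeholder) portal endpoint."""
        response = requests.post(
            PORTAL_UPLOAD_URL,
            files={"file": (DATASET_ID + ".csv", csv_text)},
            headers={"Authorization": "Bearer <api-key>"},  # placeholder credential
            timeout=120,
        )
        response.raise_for_status()

    if __name__ == "__main__":
        data = extract()
        validate(data)
        publish(data)

A pipeline like this is typically run on a schedule (for example, by a corporate job scheduler) so the dataset refreshes without manual steps.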

Data quality guidelines

To help divisions assess data quality and identify opportunities for improvement, every open dataset on the City’s portal is assigned an automated quality score based on five weighted factors:  

  • Freshness (35%): is the data being updated according to the stated schedule?  

    For example, if a dataset is supposed to be updated weekly, but it hasn’t been updated in two months, its score will be reduced.  

  • Metadata (35%): has the required metadata for the dataset been provided by the data owner?  

    A dataset’s score will be reduced if required metadata fields are empty, or if insufficient metadata has been provided.  

  • Accessibility (15%): is the data appropriately tagged so users can easily find it, is it automatically updated, and can it be easily previewed or visualized by users?   

    A dataset’s score will be reduced if it lacks appropriate tags, requires manual updates, or is not stored in the Open Data database (which allows files to be easily previewed and accessed in multiple formats).  

  • Completeness (10%): is the data exhaustive, per the City’s policy, or is data missing or inconsistent?   

    A dataset’s score will be reduced if it contains more than 50% null values (a null value indicates the absence of a value, which is not the same as a zero value); a simplified check for this is sketched after this list.   

  • Usability (5%): is the data organized in a way that can be easily understood by users?   

    A dataset’s score will be reduced if fewer than one fifth of the column names have meaningful English components, or if all of a single column’s values contain “NA” or a similar placeholder.   
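
The completeness and usability factors lend themselves to simple automated checks. The Python sketch below is a simplified illustration of the two penalties described above; it is not the portal’s actual scoring code, and the word-list test for “meaningful English” column names is an approximation.

    # Simplified illustrations of the completeness and usability penalties.
    # Treating "", "NA" and "N/A" as null, and checking column names against a
    # small word list, are assumptions made for this sketch.

    def null_ratio(values: list) -> float:
        """Fraction of values that are null or NA-like (distinct from zero)."""
        if not values:
            return 1.0
        nulls = sum(1 for v in values
                    if v is None or str(v).strip() in ("", "NA", "N/A"))
        return nulls / len(values)

    def fails_completeness(columns: dict[str, list]) -> bool:
        """Completeness penalty: more than 50% of all values are null."""
        all_values = [v for column in columns.values() for v in column]
        return null_ratio(all_values) > 0.5

    def fails_usability(columns: dict[str, list], english_words: set[str]) -> bool:
        """Usability penalty: fewer than one fifth of column names contain a
        recognizable English word, or some column is entirely NA-like."""
        names = list(columns)
        meaningful = sum(
            1 for name in names
            if any(part.lower() in english_words
                   for part in name.replace("_", " ").split())
        )
        any_all_na = any(null_ratio(column) == 1.0 for column in columns.values())
        return (meaningful / max(len(names), 1)) < 0.2 or any_all_na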

The scores for each individual factor are combined into a weighted overall score (a sample calculation follows the list below):  

  • Datasets with a score of 80% or above are gold   
  • Datasets with a score of 60% to 79% are silver  
  • Datasets scoring 59% or lower are bronze   
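
The weights and medal thresholds above translate directly into a small calculation. In the sketch below, each factor score is represented as a fraction between 0 and 1; this input format is an assumption made for illustration rather than a detail stated in the guidelines.

    # Weighted overall score and medal tier, using the weights and thresholds
    # listed above. Factor scores as fractions in [0, 1] is an assumption.
    WEIGHTS = {
        "freshness": 0.35,
        "metadata": 0.35,
        "accessibility": 0.15,
        "completeness": 0.10,
        "usability": 0.05,
    }

    def overall_score(factor_scores: dict[str, float]) -> float:
        """Weighted sum of the five factor scores, as a percentage."""
        return 100 * sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)

    def medal(score_pct: float) -> str:
        """Map an overall score (0-100) to its medal tier."""
        if score_pct >= 80:
            return "gold"
        if score_pct >= 60:
            return "silver"
        return "bronze"

    # Example: fresh, well-documented data that is hard to use can still rate gold.
    scores = {"freshness": 1.0, "metadata": 1.0, "accessibility": 0.5,
              "completeness": 0.6, "usability": 0.2}
    print(overall_score(scores), medal(overall_score(scores)))  # 84.5 gold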

These scores and the information used to calculate them are shared publicly on each dataset’s page. 

Timeliness guidelines

When an open dataset is published, the owning division must specify an update frequency. This update schedule is included in the dataset’s metadata and communicated to users on the open data portal.  For example, the Open Data Team maintains an open dataset of web analytics for the City’s open data portal, and it is updated monthly.  

Divisions are responsible for ensuring datasets are updated according to their listed schedule. If a dataset is consistently not updated on schedule, its data quality score will be impacted.   
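
A division can spot an overdue dataset by comparing its last refresh date against its listed frequency. The sketch below assumes a simple mapping from frequency labels to a maximum number of days; the labels and their day equivalents are illustrative, and the portal’s freshness scoring applies its own rules.

    # Overdue check based on the listed update frequency. The frequency labels
    # and their day equivalents are assumptions made for this sketch.
    from datetime import date, timedelta

    FREQUENCY_DAYS = {
        "daily": 1,
        "weekly": 7,
        "monthly": 31,
        "quarterly": 92,
        "annually": 366,
    }

    def is_overdue(last_refreshed: date, frequency: str, today: date | None = None) -> bool:
        """True if the dataset has gone longer than its stated schedule allows."""
        today = today or date.today()
        return today - last_refreshed > timedelta(days=FREQUENCY_DAYS[frequency])

    # Example: a weekly dataset last refreshed two months ago is overdue.
    print(is_overdue(date(2024, 1, 5), "weekly", today=date(2024, 3, 5)))  # True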

It is best practice for open datasets to be updated automatically and as frequently as possible, to ensure the public has access to the most timely and relevant data.   

If there is demand from the public or an identified business need to update a dataset more frequently than its listed schedule, the Open Data Team can help.