Step 1: Identify and prioritize data


The first step in the open data process is to consider the data you collect, analyze and use, and identify potential open datasets. Then you can decide how to prioritize those datasets for publication.

Roles and responsibilities

When identifying and prioritizing data, divisions should:
  • Continuously add to their division’s dataset inventory and, on an annual basis, prioritize the data in accordance with the City’s Open Data Policy, Data Accountability Framework and Information and Data Governance Policy.
  • Notify the Open Data Team about digital tools (e.g., interactive dashboards) to ensure embedded data is on the Open Data Portal (2019 GL8.22).
  • Ensure internal datasets that substantially inform staff reports are made available on the Open Data Portal or tracked for future publication (2021 EX22.13).

The Open Data Team provides support by:
  • Providing templates and tools to help divisions identify, inventory and prioritize potential open datasets.
  • Consulting with divisions on the appropriateness of data for publication, including by identifying possible privacy or other issues.
  • Helping divisions respond to any open data-related requests from Council or committees.
  • Sharing information about how divisional open data is being used, as well as data requests from the public.

Flagging open data in staff reports

City Councillors may expect that data referenced in a staff report is also available on the Open Data Portal (see 2021 EX22.13).

For assistance in assessing the appropriateness, value, and priority of data referenced in a staff report, consult our Guidance for Staff Report Writers.

We recommend assessing data that:

  • Is City-held;
  • AND substantially informs any “hot” or “major/strategic” report intended for a standing committee.  

In such cases, contact the Open Data Team and flag “Open Data Implications” in the Agenda Forecasting System.
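As a rough illustration, the two recommended criteria combine into a single check. This is a minimal Python sketch, not a City tool: the `ReportDataCheck` class, its fields, the `needs_open_data_flag` function and the report-type labels are hypothetical names chosen for this example, and the Agenda Forecasting System itself is not modelled.

```python
from dataclasses import dataclass

# Report types the guidance singles out (string labels are assumed for this sketch).
FLAGGED_REPORT_TYPES = {"hot", "major/strategic"}


@dataclass
class ReportDataCheck:
    city_held: bool               # Is the data held by the City?
    substantially_informs: bool   # Does it substantially inform the report?
    report_type: str              # e.g. "hot", "major/strategic", "routine" (hypothetical labels)
    to_standing_committee: bool   # Is the report intended for a standing committee?


def needs_open_data_flag(check: ReportDataCheck) -> bool:
    """True when the data is City-held AND substantially informs a hot or
    major/strategic report intended for a standing committee."""
    return (
        check.city_held
        and check.substantially_informs
        and check.report_type in FLAGGED_REPORT_TYPES
        and check.to_standing_committee
    )


# A City-held dataset behind a "hot" committee report should be flagged.
print(needs_open_data_flag(ReportDataCheck(True, True, "hot", True)))  # True
```

If the check returns `True`, the next step in this sketch's reading is the manual one described above: contact the Open Data Team and flag "Open Data Implications".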

Where appropriate, the Open Data Team will help develop an accelerated timeline to open the data at the same time as, or shortly after, the report is published. In other instances, the potential open dataset should be added to the division’s data inventory, the contents of which are prioritized annually.

Inventorying data

As part of their participation in the Open Data program, divisions must create inventories of the datasets held in their trust. These inventories are reviewed annually and updated with new datasets surfaced through divisional activities and communications. The resulting inventories are also made available on the Open Data Portal.

Datasets should not be excluded from divisional inventories based on privacy or confidentiality concerns. The goal is to provide a holistic sense of what data is available, so the City can make informed decisions about which data to prioritize for publication.  

Inventories should include key information about the data, including:  

  • The name and description of the dataset;  
  • The dataset’s sensitivity (including whether the source data contains personal or confidential information);
  • The relative value of the data to users and its publishing priority (see the prioritization section for more information);  
  • The data’s source system (e.g., the enterprise system in which the data is stored), if applicable;
  • Information about who owns, administers or stewards the data.  
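One way to picture an inventory entry is as a record with one field per item above, exported as a spreadsheet-style CSV file. The sketch below assumes Python; the `InventoryEntry` class, its field names and allowed values, and the `to_csv` helper are all hypothetical, and the City’s actual inventory template may be organized differently.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class InventoryEntry:
    name: str
    description: str
    sensitivity: str                   # "Low", "Medium", "High" or "Critical"
    has_personal_or_confidential: bool
    value: str                         # "Low", "Medium" or "High"
    priority: str                      # "P1" to "P4"
    source_system: Optional[str]       # None when not applicable
    steward: str                       # who owns, administers or stewards the data


def to_csv(entries: List[InventoryEntry]) -> str:
    """Serialize inventory entries as CSV text, one row per dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))  # the csv module writes None as an empty cell
    return buffer.getvalue()
```

Because `csv` writes `None` as an empty string, an entry with no applicable source system simply leaves that column blank.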

Prioritizing open data

The City’s Open Data Policy focuses on quality over quantity. Divisions are not expected to publish high volumes of data, but they are expected to prioritize publishing high-value data, i.e., data that the public wants and is likely to use.

When prioritizing potential open datasets – whether as part of the annual inventorying process or in more ad hoc cases such as staff reports or dashboards – divisions should use their best judgment to rank the data’s value and sensitivity. These rankings can then be used to calculate a dataset’s priority for publication.   

This process ensures that high-value datasets that can be opened easily and responsibly are published first.

Value

Value is a measure of a dataset’s demand and potential for impact, and can be ranked high, medium or low.  

High
  • There are existing and ongoing requests for this data;
  • OR this data addresses pressing information needs or pain points (inside or outside the City);
  • OR the division has heard compelling examples of how this data could be used.
Medium
  • This data may be useful for other divisions or for people external to the City;
  • OR the division occasionally receives requests for this information;
  • OR the division has heard some examples of how this data could be used.
Low
  • This data has unclear value for either the public or other City divisions;
  • OR the division has never received requests for this data;
  • OR the division has never heard of a use case for this data.
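The value criteria can be read as tiers checked from the top down: any one High criterion makes a dataset High value, otherwise any one Medium criterion makes it Medium, and everything else is Low. The sketch below encodes that reading in Python. The `DemandEvidence` fields and the `rank_value` function are hypothetical, and the tiered interpretation is an assumption; the guidance itself leaves the final ranking to divisional judgment.

```python
from dataclasses import dataclass


@dataclass
class DemandEvidence:
    # High-value signals
    ongoing_requests: bool = False
    pressing_need: bool = False           # addresses pressing information needs or pain points
    compelling_use_cases: bool = False
    # Medium-value signals
    useful_beyond_division: bool = False  # useful to other divisions or external users
    occasional_requests: bool = False
    some_use_cases: bool = False


def rank_value(evidence: DemandEvidence) -> str:
    """Return the highest tier for which at least one criterion holds."""
    if evidence.ongoing_requests or evidence.pressing_need or evidence.compelling_use_cases:
        return "High"
    if evidence.useful_beyond_division or evidence.occasional_requests or evidence.some_use_cases:
        return "Medium"
    return "Low"  # none of the High or Medium signals apply
```

Checking the High tier first means a dataset with both ongoing and occasional requests is ranked High, not Medium.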

Sensitivity

A dataset’s sensitivity is a determination of how suitable the dataset is for public release, and whether review or mitigation is required to safeguard private, confidential or sensitive information.  

For additional information on how to assess a dataset’s sensitivity, please consult the City’s Information Protection Classification Standard.

IM protection classifications, with their description, sensitivity and open data considerations:

Public
  • Description: Records that are or can be available to the public without restriction, including any records that can be accessed without a Freedom of Information request or routine disclosure request.
  • Sensitivity: Low
  • Open data considerations: Can be published as open data without review.
Routinely Disclosed
  • Description: Operational and administrative records that can be released without a Freedom of Information request, including any information available in a division’s routine disclosure plan.
  • Sensitivity: Low
  • Open data considerations: Can be published as open data without review.
Exempt-for-review
  • Description: Records that are exempt under Part 1 of MFIPPA. For examples, please consult the City’s Information Protection Classification Standard (page 8).
  • Sensitivity: Medium
  • Open data considerations: May be able to be published as open data, but only after review, and only if sufficient mitigations can be put in place.
Excluded
  • Description: Sensitive or confidential information that has restrictions on its access. For examples, please consult the City’s Information Protection Classification Standard (page 8).
  • Sensitivity: High
  • Open data considerations: May be able to be published as open data, but only after review, and only if sufficient mitigations can be put in place.
Personal or Personal Health Information
  • Description: Recorded information about an identifiable individual. For examples, please consult the City’s Information Protection Classification Standard (page 8).
  • Sensitivity: Critical
  • Open data considerations: Cannot be published as open data unless personal information can be de-identified and/or aggregated, and only pending review by relevant SMEs.
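The classification-to-sensitivity mapping above fits naturally in a lookup table. In this hypothetical Python sketch, `Classification`, `SENSITIVITY`, `needs_review` and `dataset_sensitivity` are illustrative names. The `dataset_sensitivity` helper assumes that when a dataset mixes record types, the most sensitive one governs; that rule is an assumption made for this example, not part of the standard.

```python
from enum import Enum
from typing import Iterable


class Classification(Enum):
    PUBLIC = "Public"
    ROUTINELY_DISCLOSED = "Routinely Disclosed"
    EXEMPT_FOR_REVIEW = "Exempt-for-review"
    EXCLUDED = "Excluded"
    PERSONAL = "Personal or Personal Health Information"


# classification -> (sensitivity, review required before publication?)
SENSITIVITY = {
    Classification.PUBLIC: ("Low", False),
    Classification.ROUTINELY_DISCLOSED: ("Low", False),
    Classification.EXEMPT_FOR_REVIEW: ("Medium", True),
    Classification.EXCLUDED: ("High", True),
    Classification.PERSONAL: ("Critical", True),
}
LEVELS = ["Low", "Medium", "High", "Critical"]  # least to most sensitive


def needs_review(c: Classification) -> bool:
    return SENSITIVITY[c][1]


def dataset_sensitivity(classes: Iterable[Classification]) -> str:
    """Assumption: a dataset is as sensitive as its most sensitive record type."""
    return max((SENSITIVITY[c][0] for c in classes), key=LEVELS.index)
```

Under this sketch's assumption, a dataset combining Public records with a single Excluded field would be treated as High sensitivity.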

Restricted data

For the purposes of inventorying and prioritizing data, divisions may also classify a dataset as restricted. This should only be done in cases where:

  • The dataset contains information that, if released, could harm the public or jeopardize City operations;
  • AND either no mitigations are available to effectively safeguard, de-identify or remove that information, OR the available mitigations would eliminate the dataset’s usefulness to the public.

If a division decides to classify a dataset as restricted, the dataset should be listed in the division’s inventory and marked as restricted, and a rationale for the restriction must be noted.
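The restricted-data conditions can be read as: the data could cause harm, AND either no effective mitigation exists or mitigation would make the data useless to the public. The Python sketch below assumes that reading. `classify_restricted` and its parameters are hypothetical names, and raising an error when the rationale is missing is one possible way to reflect the requirement to note a rationale.

```python
def classify_restricted(
    could_cause_harm: bool,
    mitigation_available: bool,
    mitigation_destroys_usefulness: bool,
    rationale: str = "",
) -> bool:
    """Return True if the dataset qualifies as restricted under the reading
    harm AND (no effective mitigation OR mitigation removes public usefulness)."""
    restricted = could_cause_harm and (
        not mitigation_available or mitigation_destroys_usefulness
    )
    if restricted and not rationale.strip():
        # The inventory must record why a dataset is restricted.
        raise ValueError("A restricted dataset needs a rationale in the inventory.")
    return restricted
```

A dataset that could cause harm but can be safely de-identified without losing its usefulness returns `False` here, so it goes through the normal value and sensitivity ranking instead.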

Priority

A dataset’s priority is the order in which it should be made available as open data relative to other datasets belonging to the division. A dataset’s initial priority is automatically calculated based on its value and sensitivity. 

                     Value
Sensitivity          Low    Medium    High
Low                  P2     P2        P1
Medium               P3     P2        P2
High/Critical        P4     P3        P2
Priority 1
Public datasets that are in high demand should be first on the list for publication.
Priority 2
These datasets range from high to low demand and may need to be published carefully to protect private and sensitive information.
Priority 3
These datasets should not be prioritized for publication unless all P1 and P2 datasets have been published.
Priority 4
These datasets do not need to be published until the rest of the datasets in your inventory have been opened. In some cases, we may choose not to publish these datasets, or may only make them available internally.
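Because the initial priority is a straight lookup on value and sensitivity, the matrix above can be reproduced as a dictionary. This Python sketch copies its nine cells; `PRIORITY_MATRIX` and `initial_priority` are hypothetical names, and the only logic beyond the lookup is folding Critical sensitivity into the shared High/Critical row.

```python
# (sensitivity row, value column) -> initial priority, copied from the matrix above
PRIORITY_MATRIX = {
    ("Low", "Low"): "P2",    ("Low", "Medium"): "P2",    ("Low", "High"): "P1",
    ("Medium", "Low"): "P3", ("Medium", "Medium"): "P2", ("Medium", "High"): "P2",
    ("High", "Low"): "P4",   ("High", "Medium"): "P3",   ("High", "High"): "P2",
}


def initial_priority(value: str, sensitivity: str) -> str:
    """Look up a dataset's initial priority from its value and sensitivity."""
    row = "High" if sensitivity == "Critical" else sensitivity  # High/Critical share a row
    return PRIORITY_MATRIX[(row, value)]


print(initial_priority("High", "Low"))      # P1
print(initial_priority("Low", "Critical"))  # P4
```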

While this provides a simple way to evaluate a dataset’s priority, other factors may affect a division’s ability to publish a dataset.   

For example, if it is very easy to publish and maintain a P2 dataset, it may be moved up to P1. Or if there are major concerns about data quality or accuracy for a P1 dataset, it may be wise to adjust it to P2 until those concerns are addressed.   

Divisions are only expected to provide an adjusted priority in cases where that adjustment influences their annual publication plan (e.g. when a P1 dataset is adjusted to be a lower priority for that year). If a dataset’s priority is adjusted, a rationale should be included in the division’s inventory.   
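A division’s publication plan can then be ordered by each dataset’s effective priority: the adjusted priority where one is recorded, otherwise the calculated one. In this hypothetical Python sketch, `Dataset`, `effective_priority` and `publication_order` are illustrative names. Requiring a rationale for every adjustment mirrors the guidance above, while keeping ties in inventory order is an arbitrary choice for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    initial_priority: str                  # "P1" to "P4", from the value/sensitivity matrix
    adjusted_priority: Optional[str] = None
    adjustment_rationale: str = ""

    def effective_priority(self) -> str:
        """Use the adjusted priority when one is recorded, otherwise the initial one."""
        if self.adjusted_priority is None:
            return self.initial_priority
        if not self.adjustment_rationale.strip():
            raise ValueError(f"{self.name}: an adjusted priority needs a rationale")
        return self.adjusted_priority


def publication_order(datasets: List[Dataset]) -> List[str]:
    """Order dataset names for publication, P1 first.

    "P1" < "P2" < "P3" < "P4" also holds as plain string comparison, and
    sorted() is stable, so datasets with equal priority keep inventory order.
    """
    return [d.name for d in sorted(datasets, key=lambda d: d.effective_priority())]
```

For example, a P2 dataset that is very easy to publish, recorded with `adjusted_priority="P1"` and a rationale, would move ahead of the unadjusted P2 datasets.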

The considerations below may influence the decision to move an individual dataset up or down the priority list.

Factors that may increase priority
  • The data is easy to publish and will require little effort to update or maintain.
  • The data pertains to activities with strong ties to the City’s Corporate Strategic Plan and priorities.
  • Releasing the data would support or enable socio-demographic analyses (including race, gender and disability).
  • The data is frequently used in the preparation of staff reports.
  • The data is used to enable applications or visualizations, including dashboards, on Toronto.ca or other public-facing City communications.
  • Elected officials have requested the data through motions at Council or committee.

Factors that may decrease priority*
  • There is an imminent migration to a new backend system (publishing now would create additional work, because the publication would have to be automated twice).
  • You have major data quality or accuracy concerns such that the data is not usable.
  • The data is not available in a structured manner (e.g., it is not in a database or an unformatted spreadsheet).

*If any of these concerns are present, data owners should raise them with the Open Data Team, as support may be available to help address the issue.