Step 2: Developing open data
Once data has been identified, high-priority datasets must be prepared for publication on the Open Data Portal. The Open Data Team will work with divisions to decide on the structure of the open dataset, develop good metadata, mitigate privacy or sensitivity concerns, and provide context to help the public understand and use the data effectively.
Roles and responsibilities
| During the development phase, divisions should… | The Open Data Team provides support by… | 
  | 
  | 
Technical guidelines
To participate effectively in the open data program – and to reduce the administrative burden of publishing and maintaining open datasets – divisions are expected to provide data in accordance with the City’s open data guidelines.
The guidelines are maintained by the Open Data Team and represent the technical best practices of open data programs across Canada and around the world.
If a division has concerns about meeting any of the guidelines, the Open Data Team can help.
File formats
The City’s open data portal can host nearly any file type. However, to provide the best possible experience for open data users, the Open Data Team recommends data be provided in one of the following formats:
- CSV
 - JSON or GeoJSON (provided there are no nested structures within the file)
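As a rough illustration of the "no nested structures" condition, the sketch below flags records whose values are themselves objects or arrays. It assumes the JSON file is laid out as an array of row objects; that layout and the function names are assumptions made for this example, not a format the portal prescribes.

```python
import json


def is_flat_record(record):
    """Return True if no value in the record is itself an object or array."""
    return all(not isinstance(value, (dict, list)) for value in record.values())


def find_nested_records(json_text):
    """Return the indexes of records that contain nested structures.

    Assumes the file is a JSON array of objects, one object per row.
    """
    records = json.loads(json_text)
    return [i for i, record in enumerate(records) if not is_flat_record(record)]


# Hypothetical sample: the second record holds a list, so it is not flat.
sample = '[{"id": 1, "ward": "10"}, {"id": 2, "tags": ["a", "b"]}]'
print(find_nested_records(sample))  # [1]
```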
 
While it is not recommended, divisions can provide open data in Microsoft Excel format (such as XLS), provided the data is tabular and unformatted. By ‘unformatted’ we mean:
- There are no merged cells;
 - There are no line breaks or summary rows within the data;
 - There are no charts, graphics, or other inserted features or media in the file.
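One way to spot some of these problems is to export the worksheet to CSV and check that every row has the same number of cells as the header. The sketch below is a minimal check along those lines, using only Python's standard library; the function name is hypothetical, and it only catches blank rows and rows of uneven width, so merged cells or full-width summary rows would still need a manual review.

```python
import csv
import io


def check_tabular(csv_text):
    """Return a list of problems found in CSV text exported from a spreadsheet."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    width = len(rows[0])  # the header row defines the expected width
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number}: blank row")
        elif len(row) != width:
            problems.append(f"row {number}: expected {width} cells, found {len(row)}")
    return problems


# Hypothetical export with a blank separator row and a one-cell summary row.
for problem in check_tabular("name,count\nA,1\n\nTotal\n"):
    print(problem)
```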
 
If a division cannot provide data in the formats listed above, the Open Data Team can work with them to find alternate methods of hosting the data. There may, however, be restrictions on how the data can be presented, maintained and used.
Geographic Data
If an open dataset contains geographic data – that is, data that could be put on a map – it should be prepared according to the following guidelines and best practices.
If you are unsure how best to format geographic data, please contact the Open Data Team or the City’s Geospatial Competency Centre.
Coordinate Reference System
If a dataset includes geographic coordinates, they should use the WGS84 coordinate reference system (EPSG:4326). If coordinates are not in WGS84, there will be limitations on how the data can be presented on the open data portal. A street address is not the same as coordinates: coordinates are expressed as a latitude and a longitude, which are decimal numbers such as latitude 43.729805682334 and longitude -79.4037270745336.
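When coordinates are written out as GeoJSON, the order of the two numbers matters: the GeoJSON specification (RFC 7946) stores each position as longitude first, then latitude, in WGS84. The sketch below builds a point feature with basic range checks. The function name is hypothetical, and the range checks cannot tell whether the two values have been swapped, only whether they are valid at all.

```python
import json


def to_geojson_point(latitude, longitude, properties=None):
    """Build a GeoJSON Feature for a WGS84 (EPSG:4326) point."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude], not [latitude, longitude].
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": properties or {},
    }


feature = to_geojson_point(43.729805682334, -79.4037270745336)
print(json.dumps(feature, indent=2))
```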
Geocoding addresses
Many datasets may contain address data. To provide the best experience for users of open data, it is recommended that address data be converted to coordinates; this is called geocoding. You can do this manually by using this intranet site managed by the Geospatial Competency Centre.
If you want your data geocoded automatically, reach out to the Open Data Team to discuss options.
Data accuracy guidelines
Datasets published on the open data portal should be accurate. However, given the volume and complexity of City data, accuracy can be challenging to define.
In general, if data is considered accurate enough to be used in City decision-making or reporting activities, then it can likely be released as open data.
Minor concerns about data accuracy should not be considered a blocker to opening data; such concerns can often be addressed by providing the appropriate metadata or context on the dataset’s page.
For example, Toronto Water provides the Sewer Gravity Mains dataset while making clear that it is not a substitute for diligent field examinations, and Toronto Shelter Services’ Daily Shelter and Overnight Service Occupancy dataset notes that the data may not reflect actual capacity at City shelters and should not be used to determine whether a bed is available.
The Open Data Team can also help you determine the best method for mitigating any risks associated with data quality issues:
- Describing the limitations of the dataset in the metadata. Stating known limitations on the dataset’s page encourages responsible use by the open data community.
 - Providing a reliable subset of data. If the data is of high value, divisions can consider publishing only the reliable portion of a data source (e.g., some trails but not all).
 
If a division has serious concerns about the accuracy of a dataset, they may delay publication until accuracy issues can be addressed. If the data is considered high priority, divisions should prioritize addressing those concerns as soon as possible.
Metadata guidelines
Because open data is used by the public for a wide variety of purposes, it is important that open datasets possess good metadata. Sharing information about what the data is, how it was collected, and its technical specifications helps people use open data effectively and responsibly.
Metadata is also used to populate a given dataset’s page on the open data portal.
In general, data submitted for publication on the open data portal should conform to the City’s Descriptive Metadata Standard.
Open datasets must include the following metadata fields:
- Dataset Name: A plain-English name that will be used both as a title for your dataset page and as a part of a unique ID for this open dataset.
 - Owner Division: The division that owns this data. If multiple divisions are involved, you should identify the division most associated with the Public Contact Email below.
 - Informational Website: A webpage where people can learn more about this data or your division.
 - Public Contact Email: A group or shared email address (not an individual person’s email address) where questions about this data can be managed. Questions received here that concern the Open Data Portal, rather than the data, can be forwarded to the Open Data Team.
 - Dataset Description: A block of formattable text that describes and contextualizes a dataset. This is where data owners can address questions from the public before they come to your inbox.
 - Subject: A list of keywords that describe the topic and content of the dataset. For more information on keywords, see the City of Toronto’s subject thesaurus and metadata standard.
 - Created: The date the open dataset was first published; this field will be added when the dataset is published to the Open Data Portal.
 - Update Frequency: A schedule for how often the dataset is updated (e.g. in real time, daily, weekly, monthly or annually).
 - Classification: The records classification that applies to the dataset. Classifications can be found in Toronto Municipal Code Chapter 217, Schedule A, an authority that comprises a description of each body of records, a retention period for those records, and a disposition rule stating whether, at the expiry of the retention period, the records are to be destroyed or preserved by the City of Toronto Archives.
 - Limitation of Use: A block of formattable text that outlines restrictions on the use of the dataset, and/or known limitations in its application, interpretation or use.
 - Excerpt: A short, one-sentence version of your description.
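A division preparing a submission could check a draft metadata record against the list above before sending it to the Open Data Team. The sketch below is one possible check, not an official validator: the function names and the sample values are hypothetical, and "Created" is left out because that field is added at publication.

```python
# Field names taken from the list above; "Created" is added at publication.
REQUIRED_FIELDS = [
    "Dataset Name",
    "Owner Division",
    "Informational Website",
    "Public Contact Email",
    "Dataset Description",
    "Subject",
    "Update Frequency",
    "Classification",
    "Limitation of Use",
    "Excerpt",
]


def is_blank(value):
    """Treat None, whitespace-only strings and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_metadata(metadata):
    """Return the required field names that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if is_blank(metadata.get(field))]


# Hypothetical draft: no keywords yet, and no excerpt.
draft = {
    "Dataset Name": "Example Park Benches",
    "Owner Division": "Example Division",
    "Informational Website": "https://example.org/benches",
    "Public Contact Email": "opendata-example@example.org",
    "Dataset Description": "Locations of benches in example parks.",
    "Subject": [],
    "Update Frequency": "Monthly",
    "Classification": "Example classification",
    "Limitation of Use": "Locations are approximate.",
}
print(missing_metadata(draft))  # ['Subject', 'Excerpt']
```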
 
You can also connect with the Vocabulary and Metadata Program in Corporate Information Management Services, City Clerk’s Office (cvsupport@toronto.ca) for assistance creating quality, accurate metadata for your open datasets.
Privacy guidelines
The City must balance its commitments to transparency and accountability with its duty to protect sensitive, confidential, personal, or personal health information, aligned with the obligations found in the Municipal Freedom of Information and Protection of Privacy Act (MFIPPA) and Personal Health Information Protection Act (PHIPA).
The City does not publish open data in cases where such information cannot be adequately protected, or where the privacy protections would make the dataset unusable.
Divisions should work with the Open Data Team and other corporate subject matter experts (e.g. City Clerk’s Office, Legal Services or the Office of the Chief Information Security Officer) to ensure data published on the open data portal does not contain:
- Information deemed exempt-for-review, excluded, personal information or personal health information under the City’s Information Protection Classification Standard. See our section on sensitivity for more information about how the standard intersects with open data.
 - Information subject to a mandatory MFIPPA exemption, such as third-party information and information received in confidence from other governments.
 - Data that could result in harm to the public or jeopardize City operations if released.
 - Data collected by a third party, where the data-sharing agreement does not allow the City to make that data public.
 
A dataset that contains any of the above information is not automatically excluded from open data. Datasets can often be altered to remove sensitive information while still providing value to users.
For example, Toronto Fire Services’ Emergency Incident dataset includes locations of paramedic calls, which are considered personal health information. Fire Services worked with Paramedic Services and the Open Data Team to aggregate location data into intersections and forward sortation areas (the first three characters of postal codes). The resulting dataset protects personal privacy while still fulfilling Fire Services’ commitment to open data.
Methods to protect privacy in open data
The Open Data Team, in collaboration with the City Clerk’s Office, can help you determine the best method for de-identifying or safeguarding your data, including, but not limited to:
- De-identifying, masking or suppressing data: Manually or automatically removing personal, sensitive or confidential information before publication.
 - Aggregating data: Reducing the precision of numerical or demographic data, such as age or income, or aggregating location data to protect privacy (e.g. converting addresses into major intersections or forward sortation areas).
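The two methods above can be sketched on a single record: direct identifiers are dropped, the postal code is reduced to its forward sortation area, and the exact age is replaced by a ten-year band. This is an illustration only. The column names, the band width and the sample record are all hypothetical, and a real release would still need the risk review described in this section.

```python
# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone"}


def forward_sortation_area(postal_code):
    """Return the first three characters of a Canadian postal code."""
    cleaned = postal_code.replace(" ", "").upper()
    if len(cleaned) != 6:
        raise ValueError(f"unexpected postal code: {postal_code!r}")
    return cleaned[:3]


def age_band(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record):
    """Drop direct identifiers, then generalize postal code and age."""
    public = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    public["fsa"] = forward_sortation_area(public.pop("postal_code"))
    public["age_band"] = age_band(public.pop("age"))
    return public


# Hypothetical source record.
record = {
    "name": "A. Person",
    "phone": "555-0100",
    "postal_code": "M5H 2N2",
    "age": 34,
    "service": "intake",
}
print(deidentify(record))
# {'service': 'intake', 'fsa': 'M5H', 'age_band': '30-39'}
```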
 
If a division believes the release of a dataset may pose a security, privacy, or confidentiality risk, they should consult with their Legal Services solicitor or the City Clerk’s Privacy Unit (privacy@toronto.ca). The Open Data Team can also propose mitigations, or help connect divisions with the relevant subject matter experts for consultation or review.
For a deeper dive into data-related privacy issues, consult the Ontario Information and Privacy Commissioner’s De-identification Guidelines for Structured Data.
Third-party data guidelines
The Open Data Portal typically hosts data created by and/or owned by the City. However, there may be cases where hosting third-party data creates value for users.
In order for third-party data to be considered for hosting on the portal, the following criteria should be met:
- A division has an existing relationship with a data partner and is using their data in a way that meaningfully contributes to City decision-making or service delivery.
 - The division believes that having their partner’s data accessible on the portal would streamline or add value to their operations.
 - The data is not readily available, or available only in a limited form, elsewhere.
 - The data is determined to be of interest or utility to the public.
 - The external data partner is open to having their data hosted on the portal and can commit to regularly updating and maintaining the data.
 - The City is able to share the data publicly, in accordance with all relevant laws and policies.
 
If a division is interested in hosting third-party data on the Open Data Portal, they should contact the Open Data Team.