Toronto’s $1.7B Parking Enforcement System

Toronto’s parking ticket dataset — 37 million records across 17 years — is one of the largest municipal enforcement datasets in Canada. It tracks how we share curb space, how bylaws are enforced, and where the city focuses its attention.

Open data should tell stories, not sit in spreadsheets. The City provides a CSV file with 37 million rows, but that alone doesn’t tell you much. So I built a system that turns that mountain of tickets into something anyone can explore: a searchable map and a set of live analytics showing how parking enforcement actually works in Toronto.

The Challenge: 37 Million Rows, Zero Context

Here’s what’s in the raw data:

  • 37 million tickets (2008–2024)
  • 658 different infraction types
  • 722,000 variations of street names
  • $1.72 billion in total fines
  • 2.29 million unique violation-street combinations

If you’ve ever opened one of these datasets, you know the problem. It’s complete, but it’s chaos. Streets are misspelled. Codes don’t line up. And simple questions — Where do most tickets happen? — take minutes or hours to answer.

That’s what I wanted to fix. I wanted a way to ask, “What does enforcement in Toronto actually look like?” and get a clear answer in under a second.

Step 1: Start With Trustworthy Data

Before doing anything else, I checked the data quality. All 37 million records had valid infraction descriptions, with no blanks or corrupted fields, which saved a ton of time.
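That validation pass is simple to reproduce. Here's a minimal sketch in Python (the field name below is illustrative, not the dataset's actual column name):

```python
# Minimal data-quality check: count blank or missing infraction descriptions.
# The column name "infraction_description" is hypothetical, for illustration only.

def count_invalid(records, field="infraction_description"):
    """Return how many records have a missing or blank value for `field`."""
    return sum(
        1 for r in records
        if not str(r.get(field) or "").strip()
    )

tickets = [
    {"infraction_description": "PARK ON PRIVATE PROPERTY"},
    {"infraction_description": "   "},   # blank -> invalid
    {"infraction_description": None},    # missing -> invalid
]
print(count_invalid(tickets))  # -> 2
```

On the real dataset, a check like this returned zero invalid rows.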

Then I started matching violations to streets and locations. That’s when something interesting showed up: 4700 Keele Street alone had over 100,000 tickets, most for parking on private property. It was an early sign that enforcement isn’t evenly spread across the city.

Step 2: Build the Foundation

Once I knew the data was solid, I focused on making it usable.

Street names were a mess — “King St W,” “King Street West,” “KING STR W,” and a dozen other forms, all meaning the same place.

To fix this, I used Toronto’s Centreline dataset as the reference and ran fuzzy matching using Levenshtein distance to find the closest real street for every ticket.

That process collapsed 722,000 messy variations into about 10,000 clean street segments. Suddenly, maps worked and geographic patterns became visible.
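The matching step can be sketched as follows. This is a simplified illustration, not the production pipeline; at 37 million rows you would use an optimized library such as rapidfuzz and block candidates by prefix, but the core idea is plain edit distance:

```python
# Sketch of the normalization step: match a messy street string to the closest
# canonical Centreline name by Levenshtein (edit) distance.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_street(raw: str, canonical: list[str]) -> str:
    """Return the canonical street name with the smallest edit distance."""
    key = raw.strip().upper()
    return min(canonical, key=lambda name: levenshtein(key, name.upper()))

streets = ["King Street West", "Queen Street West", "Keele Street"]
print(closest_street("KING STR W", streets))  # -> King Street West
```

In practice you would also set a maximum-distance threshold, so strings with no plausible match get flagged for manual review instead of being forced onto the nearest street.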

Database and Caching

The database runs on PostgreSQL with PostGIS for spatial queries. I added materialized views for common summaries: top violations, total tickets and fines, and the exact locations of the most-ticketed streets.

Then I layered Redis caching on top. The cache holds the most frequent queries: things like “most ticketed streets” or “top neighbourhoods”.

Cached results refresh automatically every day or week, depending on the query. The goal was simple: no waiting to see your results.
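Under the hood this is the standard cache-aside pattern. Here's a minimal sketch with an in-memory dict standing in for Redis (the real system would rely on Redis key TTLs, e.g. redis-py's setex; the names and TTLs below are illustrative):

```python
import time

# Cache-aside with per-entry TTL. A plain dict stands in for Redis here;
# with redis-py you'd use r.get(key) / r.setex(key, ttl, value) instead.
_cache: dict[str, tuple[float, object]] = {}

def cached_query(key: str, compute, ttl_seconds: float):
    """Return a cached result for `key`, recomputing only after it expires."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and now < hit[0]:
        return hit[1]                       # fresh cached value
    value = compute()                       # the expensive DB aggregation
    _cache[key] = (now + ttl_seconds, value)
    return value

# A daily summary might use a 24-hour TTL; weekly rollups a 7-day one.
top_streets = cached_query(
    "top_streets",
    lambda: ["KING ST W", "QUEEN ST W"],    # placeholder for the real SQL query
    ttl_seconds=24 * 3600,
)
print(top_streets)  # -> ['KING ST W', 'QUEEN ST W']
```

The design choice is that stale-but-fast beats fresh-but-slow for aggregate stats: enforcement totals barely move within a day, so a daily refresh costs nothing in accuracy.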

Step 3: Show the Patterns

Numbers tell you what happened, but visuals show you how.

I built a set of charts to surface these patterns.

Each chart tells a small part of the story: Toronto’s enforcement system is predictable, consistent, and concentrated.

Step 4: What the Data Says

A few big takeaways stood out:

  • Enforcement is stable. After 2015, total volume barely changed year to year.
  • A handful of violations dominate. Ten types of infractions make up 80% of tickets.
  • Summer is the busy season. Warm months bring 60% more enforcement.
  • Geography matters. A few streets — and even individual addresses — drive a disproportionate share of all enforcement activity.
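Findings like the seasonal split fall out of a simple month-level aggregation. Here's a sketch, with made-up dates standing in for the real ticket data:

```python
from collections import Counter
from datetime import date

# How a seasonal split can be computed: bucket tickets by month, then compare
# summer (Jun-Aug) to winter (Dec-Feb). These dates are illustrative, not real data.
tickets = [
    date(2023, 1, 5), date(2023, 2, 11),                      # winter
    date(2023, 6, 3), date(2023, 7, 20), date(2023, 8, 14),   # summer
]

by_month = Counter(d.month for d in tickets)
summer = sum(by_month[m] for m in (6, 7, 8))
winter = sum(by_month[m] for m in (12, 1, 2))
print(summer, winter)  # -> 3 2
```

On the real data, the same comparison shows summer running roughly 60% ahead of winter.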

Why This Matters

This dataset doesn’t explain why certain areas get more attention — that needs more context, like traffic, zoning, or complaints data — but it gives everyone a factual starting point.

Key Metrics

  • Total Tickets: 37.0 M (2008–2024)
  • Distinct Violations: 658 descriptions, 270 codes
  • Average Fine: $46.39
  • Fine Levels: most fines are between $30 and $50; only 5% go above $100
  • Yearly Trend: ticketing grew fast from 2008–2014, then leveled off
  • Top Violation: “Park-Signed Highway Prohibit” (3.3 M tickets)
  • Seasonal Shift: summer ≈ 60% higher than winter
  • Normalized Streets: ~10 K canonical segments

You can explore the same data on open.toronto.ca.

The full analysis and exports live at github.com/monuit/toronto-parking.

Questions or ideas? Reach out at hi@monuit.dev

About Me

I’m Mohammad Abdulhussain (Mo) — a data scientist who works on civic analytics and open government. I built this project to make Toronto’s enforcement data accessible and useful, not just available.

When I’m not cleaning datasets, I’m usually thinking about how cities can use open data to make fairer, smarter decisions.

What other portals are doing that we think is cool! 

Imagine trying to bake a cake without ever tasting other cakes from other bakers. You’d figure out some things on your own, but you’d probably miss out on tricks, shortcuts, and flavours that others have already mastered. That’s what it’s like running an open data portal without looking at what the best ones around the world are doing. 

We wanted to see what’s out there, how other portals are set up, the tools and features they use, and the ways they make data useful. So we went on a bit of a global tour, scanning some of the most respected portals from national sites to smaller city portals. What we found gave us fresh ideas and a better sense of where we could go next. 

A global tour of portals worth learning from 

From the largest players like the European Union, the UK, and the US federal government, to national leaders like France, Finland, and Singapore, and then down to city portals in New York City, San Francisco, Vancouver, Ottawa, and Seattle, we chose these for their track record of innovation, the breadth of their datasets, and how they’ve turned open data into something more than just a catalogue.
 

  • NYC, USA – data.cityofnewyork.us (Socrata)
  • Gov.UK – data.gov.uk (CKAN)
  • U.S. Data Portal – data.gov (CKAN)
  • Helsinki, Finland – hri.fi (CKAN)
  • Paris, France – opendata.paris.fr (CKAN)
  • European Data Portal – data.europa.eu (CKAN)
  • San Francisco, USA – data.sfgov.org (Socrata)
  • Singapore – data.gov.sg (CKAN)
  • Vancouver, Canada – opendata.vancouver.ca (CKAN)
  • Seattle, USA – City of Seattle Open Data portal (Socrata)

Off the bat, we found that across the list, these portals: 

  • Publish high volumes of quality datasets. 
  • Offer advanced search, visualization, and developer tools. 
  • Have governance models and active engagement strategies. 
  • Influence global best practices through open data policy (shout out to SF – you rocked there!) and technology. 

What we’re doing right! 

One thing this review made clear is that we’re not starting from scratch. Many of the features we saw elsewhere are already part of our own portal, from solid metadata standards to clear publishing processes. And in some areas, we’re ahead of the pack. Our Data Quality Score is something we didn’t spot on any other portal. It’s a simple but powerful way to help people quickly understand the reliability of a dataset, something most sites leave to guesswork.

Main features we saw on most portals: 

At a quick glance, a handful of features stood out page after page. They make a portal easier to use, more transparent, or more interactive:

  • Tagging and categorization improvements 
  • Better search, filtering, and browsing  
  • User submission tools or request tracking 
  • Dashboard metrics 
  • Dataset versioning and change history 
  • Dedicated sections for APIs and other technical documentation 
  • Built-in tools for creating graphs, maps, and charts directly from datasets 

Without further ado, here are some of the key features we found across the board.

New York City has a full catalog overview with compliance metrics for agency datasets. This includes:

  • Scheduled dataset releases – tracks upcoming datasets each agency plans to publish  
  • Total dataset inventory – shows how many datasets each agency has made available  
  • Delayed releases – flags datasets that were planned but not published on time  

Another nice feature is a standard request form: it helps users ask the right question, routes each request to the proper person, and makes incoming requests easy to track. NYC’s form lets you:

  • Request a new dataset   
  • Ask a question 
  • Report an error 
  • General inquiry 

Another government links its divisions’ and agencies’ datasets right alongside each dataset page. Some even surface related data with phrases like “you might also like…” when one dataset is related to another. For example, a tree permit dataset might show you a related dataset called “tree canopy”.

In Finland, they have chosen to highlight their top dataset of the year, including honourable mentions, which I think showcases partnerships and encourages participants to release more data.

In Paris, they outline categories of tools to use:   

  • Create a map  
  • Create a chart  
  • Access all APIs  
  • Documentation 

In Singapore, they make it easy to share data by letting you embed charts and tables directly into blogs or articles. It’s a great way to make data more accessible—not just for experts, but for anyone curious about the numbers without having to dig through raw datasets. 

One of the slickest data features comes from Vancouver’s Open Data portal. Shout out to our fellow Canadians! Their dataset pages have everything: data currency, accuracy, number of downloads, search keywords, change logs, last-modified dates, links to further information, and more.
 
 

But THE cake is Seattle’s homepage. I really like how clean and easy to read this page is: more icons, fewer words. It highlights key sections like “About Open Data,” “API Docs,” and “Suggest a Dataset” right in the middle, making them easy to find. Maybe I’m being subjective, but I really like the flow.

I also appreciate how they promote related services like the FOI office, which ties in well with open data. It’s a layout that feels more intuitive and user-friendly. They also highlight different dashboards and agency links, which is a great collaborative tool for users who want a one-stop shop.  

Exploring global open data portals reminded me that innovation thrives on collaboration and curiosity. By studying what others are doing well (from NYC’s compliance dashboards to Singapore’s embeddable charts) we’ve gathered fresh inspiration to improve our own portal. Toronto’s already ahead in some areas, and with ideas like dataset request tracking and our unique Data Quality Score, we’re not just keeping up; I think we’re helping set the pace.

What features do you think we should introduce or incorporate next on our Toronto Open Data portal? 

Toronto Open Data Awards 2025 

Purpose

Recognizing projects and people that push open data forward in Toronto. From impactful community tools to smart visualizations, this is your chance to showcase how open data made a difference. 

Key Dates 

Nominations Open: June 2025 
Deadline to Submit: November 30, 2025 @ 5:00 p.m. (EST) 
Winners Announced: January 2026 
Award Ceremony: February/March 2026 

Where

Winners and select submissions will be featured on the City of Toronto Open Data Portal and showcased in the Open Data Gallery. Awards will be presented at a ceremony in early 2026, with possible citywide representation and delegates. Winners will also be highlighted across our social media channels.

How to Apply

You can apply using the form here. Submissions and nominations will be open for five months. Feel free to nominate others or share the form with anyone working on something noteworthy. 

Criteria for Submission

To help us fairly evaluate submissions, please structure your nomination to answer the following questions. Note: All projects will be judged using a rubric aligned to these questions.

Criteria 1: Impact 

  1. What problem does this project aim to solve? Provide background and a clear problem statement.

  2. What potential impact has the project made? Include any known measurable outcomes, estimated cost/time savings, downloads, improvements in quality of life or any other metrics.

Criteria 2: User-centricity 

  1. Who is the user or audience? Briefly describe the target users the project was built for. Include equity considerations if applicable.

  2. What kind of user engagement or feedback was part of the work? Describe the nature and level of collaboration with users or partners.

Criteria 3: Innovation 

  1. What resources and timeline were involved? Share context about the effort required, collaborators, and how the project came together. If appropriate, tell us about the technology used to execute the project.
  2. What makes this project stand out? Highlight the ‘X-factor’ that differentiates it from others. How does it make use of open data in a new or novel way?

Award Categories

The top awards will be given to the three highest scoring projects, and honourable mentions will be given to the projects that score highest in each of the three criteria (e.g. the most impactful project, the most user-centred project and the most innovative project).

We will also award an honourable mention to the top-scoring project submitted by a post-secondary student.

Eligibility

  • Open to everyone: individuals, students, developers, researchers, civic groups, public servants, companies, and start-ups.
  • Projects must use at least one Toronto open dataset from open.toronto.ca.
  • Submissions can be self-nominated or nominated by others.
  • Projects from the last three years are eligible, and repeat entries from past years are welcome.
  • Active projects from past years may enter again, but past winners are not eligible to submit the same project.

Need Help?

We’re happy to answer questions. Email us at opendata@toronto.ca 
Submit by November 30, 2025 @ 5:00 p.m. (EST) 


 

Exploring the Future of Open Data: Hosting Community-Generated Datasets on Toronto’s Open Data Portal 

Written by Toronto Open Data team member and Toronto Urban Fellow Angel Li.


When we think about open data, we often think about government-published datasets—like transit schedules, water quality data, or demographic information. But what if Toronto’s open data portal could go beyond simply offering city-owned data and start hosting datasets created by community organizations, researchers, and academic institutions? 

That’s the big idea the Open Data team has been exploring. Bringing community-generated data into the City’s open data portal could lead to positive impacts for civic engagement, innovation, and collaboration.  

However, it also comes with challenges that must be carefully considered to ensure it’s done equitably and responsibly. 

Learning from Other Cities 

For the purpose of this blog, community-generated data refers to data voluntarily created by non-governmental entities, such as community organizations, private entities, and academic institutions, without government direction or oversight. 

To get a sense of what’s possible, we looked at how other jurisdictions approach community-generated data. Here are some key insights: 

  • France has developed a national geo-referenced address database where citizens can report address information, helping improve data quality and accuracy. 
  • Spain operates a national open data portal that includes datasets from the private sector and academia, not just government agencies. 
  • Finland allows anyone to upload datasets to its open data portal. While this open model fosters inclusivity, it has also led to challenges with data ownership and content moderation. 
  • Ottawa and Montreal take a more controlled approach, primarily sharing datasets from organizations that already have formal partnerships with the city. 

Each of these models presents different benefits and challenges, and there’s no one-size-fits-all solution. The key takeaway? There’s value in exploring this idea further, but careful planning is necessary to understand if – and how best – it could work in Toronto. 

What the Community Thinks 

We also spoke with members of Civic Tech Toronto (CTTO), a local group dedicated to civic technology and data projects, to gain insights from those actively involved in creating community-generated data. Their response? Enthusiastic but cautious. Here’s what they highlighted: 

Potential Benefits 

  • Access to Resources: Community groups often lack the resources to maintain and host large datasets. The City’s open data portal could ease this burden. 
  • More Data, More Collaboration: Easier data sharing is expected to generate more datasets, spark new partnerships and drive innovative projects. 
  • Greater Visibility: It is believed that the City’s platform could help community-generated datasets reach a wider audience. 
  • Credibility and Validation: Data owners feel that having their data hosted on the City’s portal could increase its legitimacy, encouraging policymakers to use it in decision-making. 

Potential Challenges 

  • Political Sensitivities: Many of the groups creating community data are advocacy organizations who want the City to adopt a specific policy or approach. There’s still an open question about whether it’s appropriate for the City to host such data, and how that relationship might work. 
  • Data Ownership and Control: How much control would community groups have over updating or removing their data? 
  • Sustainability: Could data owners commit to keeping their datasets current and accurate over time, in line with the City’s approach to its own data? 

Moving Forward Responsibly 

The idea of making Toronto’s open data portal a place to find information about the City, not just from the City, is interesting, but it requires a thoughtful approach.

Based on our research so far, here are some potential paths forward: 

  1. Begin with Trusted Partners: Should the City decide to host community-generated data, it might be wise to follow other jurisdictions’ lead and start by collaborating with well-established organizations that have strong data governance practices in place. This could include universities, or larger NGOs that already have existing data relationships with the City.
  2. Develop Clear Inclusion Criteria: Regardless of where a dataset comes from, it’s important that users feel that data on the portal is reasonably accurate and trustworthy. But at the same time, the City shouldn’t be assuming responsibility for third-party data. Navigating that starts with developing clear standards and expectations that community-generated data would need to meet to be included on the open data portal.
  3. Communicate Clearly: Users of the open data portal should be able to easily understand whether a dataset comes from the City or a third party. That way they can decide whether and how best to use the data. If the City does dip its toes into hosting community-generated data, we should clearly differentiate it from City-owned data, and include a disclaimer noting that hosting data doesn’t imply endorsement, and that third-party organizations are responsible for their data’s quality and content.

What’s Next? 

We’re continuing to explore the feasibility of hosting community-generated data on the open data portal. There’s potential value in making the City’s open data ecosystem more inclusive, but it’s essential to strike the right balance between inclusivity, accountability, and utility. 

If you have thoughts on this initiative, we’d love to hear from you! How do you see community-generated data playing a role in Toronto’s open data landscape? Reach out to opendata@toronto.ca with any questions or ideas —let’s keep the conversation going!  

Announcing the 2024 Toronto Open Data Award Winners

Hi, everyone! We’re excited to share the highlights and wrap-up of our first-ever Open Data Awards. It’s been an incredible journey, and we couldn’t have done it without all of you who participated and supported this initiative.

Let’s dive into what happened and what we’ve learned along the way. But first, here are the winners of the 2024 Toronto Open Data Awards.

The Public Project Winners:

  • Winner: Automatic Detection & Display of Unplanned TTC Detours (The Transit App and Toronto Transit Commission)  – This project tackled the challenge of unplanned TTC detours by enhancing the Transit app to provide real-time detour maps and updates. By extending TTC’s GTFS (General Transit Feed Specification) data to include detour details and using machine learning to detect unplanned route changes, the app ensures riders have accurate, dynamic information about service disruptions and temporary routes.
  • Honourable mention: Open Water Data (Mitch Bechtel) – This platform informs open water swimmers and other water recreation users with a wealth of information about popular beaches in Toronto, and around the world so they can make informed decisions about where and when to swim outdoors. The goal was to make existing information available from various sources easier to find and understand, and to gather and share information that is not otherwise available.

The City (internal staff, division teams) Project Winners:

  • Winner: Social Development and Finance Administration’s Neighbourhood Wellbeing Data Suite – This interactive mapping platform uses neighbourhood-level data to map out and layer a wide range of indicators related to health, housing, safety, and overall quality of life. Users can quickly find what neighbourhood they live in and retrieve the socioeconomic indicators they are interested in. This tool is a common platform used by both community members and internal City staff to power targeted neighbourhood planning for a healthier, more equitable city.
  • Honourable mentions (there was a three-way tie):

    • Toronto Public Health’s Open Data App – This internal TPH app facilitates data access for a number of projects without the need for any development effort. Over 250 distinct RESTful endpoints are available, providing data from 12 datasets, serving multiple internal business teams, making data available to TPH websites like Dinesafe, Bodysafe, Swimsafe and ChemTRAC, and also providing data to the Open Data group and other external partners. Over 100,000 daily requests are serviced.

    • Transportation Services’ Open Datasets – The Transportation Data & Analytics Unit has published over 10 transportation and mobility datasets, developing modern tools for staff to manage and access data, upgrading legacy systems, and prioritizing data quality, automation, and open access.

    • Housing Secretariat’s Dashboard – This unit released datasets, dashboards, and maps on key topics such as the stock of affordable homes, the pipeline for new housing, and the housing waitlist. These tools were designed to meet the needs of Torontonians through clear and accessible data storytelling.

The Student Project Winner:

  • Student Project Award: Toronto SUMO Networks – an open-source project designed to simulate and analyze traffic networks for the City of Toronto using the Simulation of Urban Mobility (SUMO) tool. 

Submissions and Participation

We closed the submission form on December 4th and received an impressive variety of entries. In total, we received 51 external, internal City and nominated project submissions.

These projects covered a range of subjects, showing the diverse ways people are using open data to solve real-world problems. Topics included recreation and leisure, transit and mobility, urban planning and housing, public safety, health and community services, and environmental awareness. Stay tuned for many of the projects to be showcased on our revamped Gallery page.


Reviewing the Entries

The immediate Open Data team spent time carefully sifting through every submission. Using a detailed rubric, we scored and ranked the projects to narrow the pool to the top finalists. Our team’s expertise, ranging from back-end development to policy analysis, transformation consulting, and communications, helped ensure a balanced review process. We narrowed it down to a shortlist of top projects, and then we brought in a panel of esteemed external judges (see their bios below ⏬) to evaluate the finalists and decide the winners. These judges spent a day with us, providing their insights and perspectives. It was a collaborative process that not only refined the selection, but also gave us valuable lessons for next time.

Judging Criteria

Here’s the rubric we used to evaluate all the external submissions:

Impact (30%): Does the project address an identified civic issue and provide evidence of its impact or uptake?

Innovation (30%): Does the project use open data in new ways to create insights, solutions, or enable future innovation?

Design (20%): Is the project thoughtfully designed, intuitive, and accessible for diverse users?

Community Engagement (20%): Does the project consider and involve users and the community meaningfully throughout development?

Lessons Learned

Some of our key takeaways from running this award:

Communication is key: Clear guidelines and examples would make it easier to assess submissions. Sharing the rubric before submissions start would help applicants tailor their responses to the criteria, making them more focused and easier to compare.

Comparing Projects: Open data is used in so many ways—from startups to students and hobbyists, building everything from user tools to backend systems. Comparing such different projects wasn’t easy, and we’re going to put some more rigorous thought into how we categorize projects, refine criteria for submissions, and recognize the varieties of projects in the future.

Clearer submission questions & criteria: A key takeaway is that submission questions should guide applicants to provide more context about the problem they’re addressing, who their users are, and an objective view of their work. This will help us better understand their projects and evaluate them fairly.

~~~~

So, we aim for progress, not perfection! It’s clear there’s a lot of subjectivity in reading, critiquing, analyzing, and scoring these applications. Most projects showcased great vision and an impressive variety of approaches. These lessons will help us create clearer guidelines, improve fairness in assessments, and better celebrate the incredible work being done with open data.


A Big Thank You to Our Judges

Helen Huang is a social impact entrepreneur driven by the belief that innovation happens best when diverse perspectives come together. As the Co-Founder of Co.Lab, she has pioneered an experiential learning platform that empowers non-traditional talent to thrive in the tech industry. Recognized as a Forbes 30 Under 30 Honoree and DMZ Woman of the Year, Helen combines her product experience from Microsoft and Zynga with her passion for inspiring others to create positive change through technology.

Dr. Mark Fox is a Distinguished Professor of Urban Systems Engineering, Professor of Industrial Engineering and Computer Science, and Founding Director of the Centre for Social Services Engineering at the University of Toronto. He is actively involved in the development of international standards for city data  based on his urban ontology research, including ISO/IEC 21972 “Upper level ontology for smart city indicators” and the ISO/IEC 5087 series of “City data model” standards. He has published over 250 papers. 

Candice Sarnecki is the Sr. Director of Sales at Miovision, responsible for the sales team covering Canada and the 22 largest cities in North America, solving congestion and safety challenges in support of Vision Zero commitments. Candice joined Miovision in 2023 after 22 years in the telecommunications industry supporting sales and business development for IoT, domestically and globally, as well as global carrier relations. Candice has a master’s in Urban Planning and is a long-time resident of the Greater Toronto Area.

Dorothy Eng is the Chief Executive Officer of Code for Canada, a national nonprofit using tech and design to improve life in Canada. She works with governments, nonprofits and corporations to develop public-interest technologies that make the delivery of services to the public more effective and efficient, and better meet people where they are.

Keith McDonald has a storied past with the City including being part of the early days of the Open Data initiative and ending up as our first Open Data lead in 2014. Keith retired in 2017 when he founded “the literacy AI project” to teach community audiences about the impacts of AI – becoming a bridge between AI creators and citizen consumers. 

Fartash Haghani is the Director of Enterprise Data and AI for the City of Toronto. He leads transformative initiatives to unify city data, enhance cross-departmental collaboration, and responsibly integrate AI technologies. With a rich background in software engineering, data science, and AI, he is dedicated to improving municipal operations and citizen services through innovation.


Rachel Weiss (she/her) is a Business Intelligence Consultant in the City of Toronto’s Data for Equity Unit. She is passionate about leveraging data to identify and address inequalities. Before joining the public service in 2022, Rachel worked as a mixed-methodology researcher and data consultant in the private sector. 


With twenty-five years of experience at the City of Toronto, James Elliott brings in-depth knowledge of the City’s diverse geospatial datasets. Since 2015, he has been part of the Geospatial Competency Centre, overseeing the Topographic Mapping program and Aerial Imagery datasets. Recently, he has led various geoanalytics initiatives, such as SolarTO, to find better, more cost-effective ways to deliver services to Toronto’s citizens. Outside of work, he enjoys exploring the world and capturing its beauty through photography.


John Griffin is a Program Manager at Open North where he works with local governments across Canada. He is committed to helping facilitate the direct connection between local governments and the communities they serve through improved data governance practices and capacity building. 


What’s Next?

We’re planning to feature all the submissions in an online gallery launching next month. It’s our way of recognizing everyone who contributed and inspiring others to explore open data.

We will invite the 2024 award winners to present their projects to senior management teams, division data managers, and staff (details to be finalized). They will receive recognition and their awards during this event.

To everyone who submitted, judged, or cheered us on: thank you for making this a success! Stay tuned for the revamped gallery and updates on this year’s awards.

Top Open Data Moments in 2024

Public Presentations

In September, Technology Services invited Civic Tech Toronto to host their weekly hack-nights out of Metro Hall, creating an opportunity for engaged, tech-savvy residents and public servants to meet and learn from each other. Across four weeks, more than 200 people joined in online and in person.

The events were a chance to showcase innovative technology work happening at the City, including our latest climate models, how we’re using data to optimize our vehicle fleet or improve shelter services, and our efforts to bring 311 into the digital era.

If you’d like to learn more about Civic Tech Toronto, attend one of their upcoming hacknights, join their Slack, or just stay tuned; we’re hoping to host the group again in 2025!



The Requested Datasets

In 2023, Toronto City Council mandated that Toronto Open Data share a list of datasets that it plans on publishing. In May of 2024, Toronto Open Data published our Toronto Open Data Intake dataset. This dataset lists every unit of work (i.e., tickets in a queue) that the Open Data Team has worked on and is working on to investigate, publish, or update datasets on the portal.

While that dataset published all the details someone would need to see the Open Data Team’s current and past workload, it was not convenient for users looking for quick insight. To address this, the team added user interfaces over the dataset and placed them throughout the portal.



The 2024 Open Data Awards

This November marked the launch of the Toronto Open Data Annual Awards, celebrating innovative uses of open data. The campaign encouraged submissions from internal staff, residents, students, and businesses who demonstrated impactful applications of the City’s datasets. Judging is underway, and the final award winners will be announced in early 2025.



Data Quality Score Update

Building on the Dataset Quality Framework introduced in 2019, the team made significant updates to the scoring system in 2024. These updates help ensure datasets meet higher standards and better serve users’ needs.



Future Path

Near the end of this year, we also did a team theory of change activity aimed at linking our everyday activities with the big-picture goals of the Open Data team.

How does our work to uplift important metadata standards amongst our colleagues help ensure open datasets can be used to create a better City?

How does our work with open-source software like CKAN contribute to the open government / open source / open data ecosystem locally and abroad?  

We’re still finalizing the map, but hope it’ll be a useful tool going forward, as we think more about how we monitor our impact and create feedback loops to continuously improve the program.


Community Datasets

We gathered discussions and insights from the experiences of Open Data teams in Ottawa, Montreal, and Finland on integrating community datasets. A policy recommendation paper on community datasets, including a jurisdictional scan, is currently in development. Once completed, it may be shared as a blog post or considered for inclusion in the Open Data policy framework.



By the Numbers

  • We handled a total of 212 tickets (individual requests or inquiries submitted to the Open Data team) in our queue, 47 of which were open data inquiries
  • 120 significant updates to existing datasets
  • We’ve introduced 48 brand new datasets
  • Out of all the tickets received, we “closed” 50 of them – a closure indicates that no further action was necessary or possible in response to the request


The Who’s Who…

We welcomed 8 new members to the team in 2024, who are doing awesome and much needed work! Here they are, in no particular order.

  • Luke Simcoe – Who’s handling our Policy Refresh and serving as a go-to resource across multiple projects
  • Veronica Yeung – Who’s working on a number of service design tasks, including designing the connection of staff reports to council and committee to Open Data
  • Angel Li – Our Toronto Urban Fellow who’s helping us handle parts of our Policy Refresh
  • Adam Foord – Our frontend developer who’s helping publish our intake queue and refresh our dataset page
  • Jamie Beverley – Our backend developer who’s helping us scale the portal to handle bigger data and complicated workloads
  • Anson Liang – Who’s helping us design, build and scale infrastructure
  • Swati Arora – Who’s helping build a data modeling practice at the city
  • Mohamed Shakeel – Who’s also helping us design, build and scale infrastructure

We are excited for this growing and talented team. Can’t wait to see all the great things they pump out in 2025.

Toronto Open Data Team
The Toronto Open Data Team 2024 (not all members pictured). From left to right: Yanan, Angel, Luke, Mohammed, Denis, Mac, Reham, and Veronica.

Announcing the Toronto Open Data Awards

We’re excited to share something new with you! Whether you’re a journalist, academic, volunteer, data enthusiast, a data whiz, a community organizer, data lover, parent, youth, city staff, CEO, (how many more titles can I come up with?!) or just passionate about using data to improve our city, this is your chance to make an impact – and get recognized for it!

To celebrate the 15th anniversary of Toronto’s Open Data program, we’re launching the Toronto Open Data Awards! (throw the confetti!).

This is our inaugural year and we want to honour projects done since the program’s inception. If you’ve ever used Toronto’s Open Data to solve a local problem, build a cool tool, or create an insightful visualization—whether recently or in the past—this Award Submission Campaign is for you!

As part of the City’s Open Data Master Plan, we’re rolling out these awards to not only increase awareness of open data (that’s Master Plan goal 4b) but also to incentivize teams to publish data (Master Plan goal 2c). The idea is simple: by recognizing outstanding projects from both external contributors and internal teams, we’re helping to show just how transformative open data can be.

And here’s the fun part – whether your project is small, big, or somewhere in between, we want to see it! If you don’t have a project to submit yourself, you can still get involved by nominating someone else’s work. Sharing is caring, after all!

What We’re Looking For

We’re on the lookout for projects that bring Toronto Open Data to life. When we evaluate submissions, we’ll be considering the following (these are just some ideas to get you started—you don’t need to cover them all!):

  • Are you using Toronto Open Data in a new way or presenting it from a fresh perspective?
  • Is your project visually engaging and user-friendly (think apps, maps, or visualizations)?
  • Does your project offer insights, tutorials, or code to help others learn?
  • Does the project help users better understand a specific dataset or solve a problem or important civic issue relevant to Toronto?
  • Does your project benefit or involve Toronto’s community in a meaningful way?

How It Works

  • Submissions Open: From November 1st to 30th, you can submit your project or nominate someone else’s work using our simple submission form (coming soon).

  • Judging: In December, a panel of judges made up of community leaders and open data champions will review the top submissions.

  • Awards: In January, the top projects will receive special recognition, be invited to present to our Senior Leader Teams and be featured on our revamped Toronto Open Data Gallery, launching in January 2025.

Stay tuned for more updates, and don’t forget to follow us on X and keep an eye on our home page for all the latest announcements. We can’t wait to see the amazing projects you’ll submit! Submissions open November 1st!

SPREAD THE WORD!!!

A Decade of Progress on City-Owned Buildings and Facilities

Sami El Sabri is a recent graduate from the University of Toronto’s Faculty of Arts and Sciences, specializing in public health and computational cognitive science. With strong skills in data-driven insights and statistical analysis, Sami is eager to advance public health research. This paper was prepared for Professor Rohan Alexander’s Methods of Data Analysis (STA302) course at the University of Toronto, applying analytical techniques to real-world urban issues.


This post explores the trends in renewable energy installations on City-owned buildings and facilities in Toronto over the past decade. Using data from the City of Toronto’s Open Data Portal, this analysis examines whether the city is on track to meet its commitment to an energy transition. This data is easily accessible at this link and has been analyzed using R and its various statistical analysis packages, along with QGIS for producing map visualizations.

Renewable energy is vital for the environmental health of a city like Toronto. Adopting renewable energy sources reduces greenhouse gas emissions, improves air quality, and fosters a more sustainable urban environment (Perea-Moreno et al., 2018). These environmental benefits are essential for the physical and social well-being of urban populations in the long term. Additionally, the equitable distribution of renewable energy installations ensures that all communities within the city can reap the benefits of cleaner energy sources, which can lead to improved economic opportunities and quality of life for residents (Robinson, 2020). The change must start somewhere, and when local governments take a top-down initiative, they can shift the culture and mobilize public support for renewable energy (van Staden, 2017).

The findings of this analysis highlight a significant increase in renewable energy installations both in quantity and quality, particularly in 2017 and 2018, reflecting the city’s intensified efforts towards its carbon neutrality goals. The data also reveals a diversification in the types of installations and a strategic expansion beyond the downtown core.

Data Source

For this study, I used a dataset from the City of Toronto’s Open Data Portal, specifically focusing on renewable energy installations on city-owned buildings and facilities. The dataset, titled “Renewable Energy Installations,” was last updated on April 22, 2022, and includes data from 2010 to 2019 collected by the Facilities Management Division. It details various types of renewable energy systems installed, such as photovoltaic panels, solar pool heating, and geothermal systems. It also shows if an installation fell under the FIT (Feed-In Tariff) or microFIT programs, which incentivize the use of renewable energy by guaranteeing long-term financing and contracts for both large and small installations, as well as energy storage solutions and Net Metering (NM) installations.

Key Findings

One of the most interesting findings from the data is the variation in the size and type of renewable energy installations over the years.

Figure 1: Size of new City-Owned Renewable Energy Installations in Toronto between 2010 and 2019 with fitted regression line.

Figure 1 illustrates the sizes of new installations over time, showing significant variation, particularly in 2016 and 2018. Due to this inconsistency, a simple trend line cannot reliably predict an overall increase in installation size over the years. When we look closer at the data, we see that the size of installations is closely related to the type of installation. For instance, in 2017, most new installations were smaller MicroFIT systems, which resulted in smaller average sizes for that year (see Figure 2). In contrast, 2018 had more large FIT installations, leading to larger average sizes. This variation highlights how the city is balancing both small-scale and large-scale renewable energy systems to diversify its energy portfolio.
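As a rough illustration of the fitted trend line in Figure 1, a least-squares fit shows why a single trend line is unreliable when installation sizes swing this widely year to year. The numbers below are made up for demonstration, not drawn from the real dataset (the original analysis was done in R; this is a Python sketch):

```python
import numpy as np

# Illustrative installation sizes (kW) by year -- not the real dataset
years = np.array([2010, 2012, 2014, 2016, 2017, 2018, 2019], dtype=float)
sizes = np.array([50.0, 80.0, 60.0, 300.0, 40.0, 500.0, 120.0])

# Ordinary least-squares trend line, like the one fitted in Figure 1
slope, intercept = np.polyfit(years, sizes, 1)

# Large residual spread is exactly why the trend line predicts poorly here
residual_sd = np.std(sizes - (slope * years + intercept))
print(f"slope: {slope:.1f} kW/year, residual sd: {residual_sd:.1f} kW")
```

With variation like this, the residual spread dwarfs the yearly trend, which is why the post looks at installation *type* rather than the trend line alone.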

Figure 2: Count of new City-Owned Renewable Energy Installations in Toronto by type of installation between 2010 and 2019

Figure 3 maps the geographical distribution of renewable energy installations across Toronto. Installations are evenly spread throughout the city, with strategic efforts to expand into peripheral areas. Interestingly, most of the city’s larger installations are in these areas, such as North York and Scarborough (Figure 4). As of 2019, there were still notable areas with low installation densities such as Toronto-Danforth, Etobicoke Centre, Scarborough-Rouge Park, and Don Valley City Wards. However, this expansion over time demonstrates a deliberate strategy to ensure equitable access to renewable energy across the city, reflecting a commitment to inclusive and widespread sustainability efforts.

Figure 3: Geographical Distribution of City-Owned Renewable Energy Installations in Toronto between 2010 and 2019

Figure 4: Geographical Distribution of City-Owned Renewable Energy Installations in Toronto by size

Discussion

While this dataset and analysis provide valuable insights into the Toronto City Government’s dedicated efforts to expand and diversify the city’s renewable energy infrastructure, it’s important to acknowledge the limitations. Since the dataset primarily focuses on city divisions and excludes data from other city agencies or public-private partnerships involved in renewable energy initiatives, the analysis may overlook significant installations managed by other entities. However, the approach taken in this analysis can be applied easily to similar datasets, such as considering installation types when interpreting size trends.


Moreover, the data only covers installations up until 2019, leaving out more than five years of potential progress. Toronto’s efforts are part of a broader commitment to addressing climate change, aligning with Canada’s $964 million investment in renewable energy projects under the Paris Agreement. As the largest city in Canada, Toronto stands at the forefront of sustainable urban development with initiatives like SolarTO and the Conservation Authority’s Renewable Energy Program. The city’s Net Zero Carbon Plan aims to achieve net zero emissions in city buildings by 2040. While this goal and current action are commendable, the city must continue or even intensify its efforts to ensure it meets its ambitious targets and sets an example for other cities worldwide. This analysis contributes to a better understanding of Toronto’s commitment to renewable energy, paving the way for informed discussions on the city’s progress and future endeavours in combating climate change.

Sources

UrbanToronto: Seeing the Future of the City with Data

A great example of partnership, and arguably a great use case for how open data is used in big cities, comes from one of our key collaborators: UrbanToronto, a comprehensive resource for tracking new developments across the Greater Toronto Area.

This blog post delves into our successful partnership with UrbanToronto, highlighting how our open data fuels their innovative services and how, together, we are helping citizens and professionals alike see the future of our city through data. Written by UrbanToronto.


If you ever search for a new condo development, you’ll likely come across UrbanToronto. We track every large new project in the GTA (and beyond) in four different ways: a quantitative database page for the project; a discussion thread in our highly active discussion forum; a pin on our detailed map; a news story about the project written by one of our journalists.

All four of these information services rely heavily on open data from the City of Toronto. We transform this data into a standardized format to allow searching, filtering, and plotting, which fuels the rest of the services we offer. 

While our business is predominantly based on data today, it wasn’t always the case. Here is the story of how open data transformed UrbanToronto’s business. 

History of UrbanToronto

You know how there are train geeks, movie geeks, and history geeks—people who love to learn and talk about these topics in great detail?

Skyscraper geeks exist, too: people who love to talk about the newest high-rise developments in the city. But unlike most discussions on TV or in the newspapers about condos, skyscraper geeks don’t care so much about prices. Instead, they focus on the design of the building, the construction process, and the urban planning involved.

UrbanToronto began more than 20 years ago as a discussion forum for skyscraper geeks in Toronto. Much like the city we cover, our business has grown a lot in that time, too. 

Soon after the forum started taking off, we added a news component to the website. We cover breaking news in the development industry, as well as feature articles highlighting new technologies, new policies, and innovative builders, suppliers, and designers in the field. 

Our Data Origin Story 

As the popularity of our community grew, so did the needs of our forum members and journalists. We have thousands of threads for individual projects, and some have hundreds of pages of comments with thousands of posts. The posts include construction photos, but also images of the architectural plans, and other data about the project. If you wanted to know how tall a building was, but there were 300 posts into the discussion, you would struggle to find the information. 

Enter the UrbanToronto database, version 1.0. Especially popular projects got a dedicated page that listed their crucial information: some renderings, the heights, unit mix, and the developer and architect. All of this data came from City of Toronto planning documents. What’s more, we also started plotting these projects on a map, colour-coded by construction status. That way, you could see where the big projects in the city were going up.

Soon enough, this database grew to over a thousand projects. At this point, the database itself, and especially the map that was built on top of it, had grown a dedicated user base of their own. We realized it made sense to invest more into our map and database service, which involved hiring a bigger team, investing in new technologies, and expanding the scope and depth of our data. 

Today, UrbanToronto data is available for free, as well as through a premium subscription package called UTPro. We track over 5000 projects across the Greater Golden Horseshoe, although thanks to the City of Toronto’s excellent open data products, our coverage in Toronto is the deepest and most accurate.

What Data Does UrbanToronto Track? 

Unlike other real estate data providers, we rely almost exclusively on publicly available documents. While we track a wide variety of sources, including building permits and the heritage registry, our main source of data is development applications: rezoning, site plan approvals, Official Plan amendments, and so on.

There is a vast wealth of information in these applications, and many people and organizations link to and track them. However, they can comprise 60 different documents, most of which are in PDF format. This lowers the digital legibility of the documents, which is where UrbanToronto comes in. Through a combination of manual and automated processes, we read every one of those PDFs and input the data into a standardized format, which makes filtering, sorting, and plotting the data much easier.

Our data is used by three types of users: (1) those looking to buy or develop new properties, including developers, urban planners, land assemblers, and retail condo investors; (2) realtors, tradespeople, and suppliers looking for leads on new developments and projects under construction; and (3) thousands of enthusiasts who, for their own interests, consult our map and database to stay up to date with what’s going on in their neighbourhood.

The Future for UrbanToronto and Open Data

While we are constantly expanding our database with new projects (as well as updating existing projects), our focus has historically been on “large” developments—typically townhouses and above. As the City is changing policy to permit more infill development, our database will be expanding to include these smaller projects as well. We are also in the process of expanding the features of our map to include more functionality in terms of filtering and exporting the data, as well as new layers to supplement the investment decision processes. 

We’re looking forward to continuing to build our relationship with the City’s Open Data team, as we deepen our coverage of development in Toronto.

How the City is Winning the War Against Lead Contamination in Drinking Water

Our next student guest blog was written by Abbass Sleiman, a third-year undergraduate student at the University of Toronto in the Mathematical Applications in Economics and Finance Specialist. Abbass has experience as a Teaching Assistant in mathematics and wrote this paper for STA302H1 (Methods of Data Analysis) at the University of Toronto with Professor Rohan Alexander.


Background

Lead in drinking water isn’t something most of us think about when we turn on the tap – it certainly hadn’t crossed my mind until I read a CBC News article from 2014. The article, aptly titled “High lead levels found in some Toronto drinking water”, claimed that after analyzing 15,000 water samples provided to the city by homeowners between 2008 and 2014 through the Residential Lead Testing Program, 13% of Toronto households exceeded Health Canada’s standards for lead exposure – an alarmingly high portion when you consider the potential health risks associated with lead exposure.

Lead exposure can have serious health consequences, especially for children. It can cause damage to the brain and nervous system, slowed growth and development, learning and behavior problems, and decreased IQ. For adults, long-term exposure can lead to cardiovascular issues, kidney damage, and reproductive problems. This, coupled with the fact that lead can’t be seen, smelled, or tasted, means that testing our water becomes all the more crucial. So, how did the city of Toronto address this issue? Back in 2011, Toronto’s City Council took action against this silent threat by approving a strategy to reduce lead in our drinking water. Fast forward to 2014, and the city began adding phosphate to the water treatment process which forms a protective coating in the pipes, helping to keep lead from leaching into the water we drink every day.

But did it work? To find out, I delved into a dataset of 12,810 water samples from Toronto homes, collected between 2014 and 2024, analyzing the data with the programming language R. My goal was to determine whether the phosphate treatment made a difference in reducing dangerous lead levels in our water, how efficient it may have been at doing so, and whether certain areas in Toronto are more at risk than others.

Understanding Lead Levels

To understand the impact of lead concentrations on our drinking water, we need to first understand how these levels are measured. Lead concentrations are generally measured in parts per billion (ppb), effectively how many parts of lead are present in every billion parts of water, or equivalently in micrograms per litre.

Back in 2014, Health Canada set the safety threshold of lead exposure in water at 10 ppb, meaning that the CBC news article’s claim that 13% of Toronto households were facing high lead levels essentially meant that 13% of households provided water samples with a lead concentration of at least 10 ppb. However, in 2019, Health Canada tightened the standard, lowering it to just 5 ppb. So, when we talk about ‘high lead concentrations’, we’ll distinguish between whether we’re referencing the older threshold or the updated one.

The Data

The data used in this analysis was derived from Open Data Toronto, under “Non Regulated Lead Sample”. Published by Toronto Water, this dataset features records from Toronto’s Residential Lead Testing Program – the same source data used in the CBC news article. It includes various households’ lead concentrations based on water samples provided by the households themselves. The data is refreshed daily, and the particular dataset used was up-to-date as of January 22, 2024.


The raw data set features the lead concentration in parts per million (ppm) of 12,810 water samples, where 1 ppm is equivalent to 1000 ppb, the date that each sample was collected, and the household’s partial postal code (only the first three digits of the resident’s postal code for privacy reasons).

Cleaning the Data

Given that I was interested in the after-effects of the phosphate addition to the drinking water treatment process in 2014, all entries from 2014 were eliminated to ensure the analysis focused on water samples taken after the policy was put into effect. Additionally, entries with missing values were removed, and the column for lead concentrations was converted to ppb for consistency.


Extreme outliers were also excluded to avoid skewing the analysis. In the context of this study, I defined a lead concentration as an outlier if it was at or above 100 ppb – 20 times Health Canada’s standard of 5 ppb – making it reasonable to assume that such an extreme value resulted from an error in that household’s sample collection process. This left us with a cleaned dataset of 9,302 samples ready for analysis. A sample of this cleaned data can be seen in Table 1 below, and all observations are visualized in the scatter plot in Figure 1.
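The cleaning steps above (dropping 2014 entries and missing values, converting ppm to ppb, and excluding values at or above 100 ppb) can be sketched in a few lines. The original analysis was done in R; this is an illustrative Python/pandas version, and the column names (`sample_date`, `lead_ppm`, `postal_prefix`) are hypothetical stand-ins for the real dataset’s fields:

```python
import pandas as pd

# Illustrative sketch of the cleaning steps described above, on toy data.
raw = pd.DataFrame({
    "sample_date": ["2014-06-01", "2015-03-15", "2016-07-20", "2018-01-05"],
    "lead_ppm": [0.0021, 0.0009, None, 0.150],  # 0.150 ppm = 150 ppb (outlier)
    "postal_prefix": ["M4C", "M6H", "M2L", "M5V"],
})

cleaned = (
    raw.dropna(subset=["lead_ppm"])                           # remove missing values
       .assign(year=lambda d: pd.to_datetime(d["sample_date"]).dt.year,
               lead_ppb=lambda d: d["lead_ppm"] * 1000)       # convert ppm to ppb
       .query("year > 2014")                                  # keep post-2014 samples only
       .query("lead_ppb < 100")                               # drop outliers (>= 100 ppb)
)
print(cleaned[["year", "lead_ppb", "postal_prefix"]])
```

Of the four toy rows, only the 2015 sample survives: the 2014 row, the missing value, and the 150 ppb outlier are all filtered out, mirroring how 12,810 raw samples became 9,302 cleaned ones.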

Summary Statistics of the Data

Before diving further into the analysis, it was important to examine the structure of the dataset, particularly the number of observations per year, to ensure that any conclusions drawn later would be based on a sufficient amount of data. Table 2 shows that we have far fewer data points for 2020, likely as a result of the COVID-19 pandemic, and only a single observation for 2024, meaning that any conclusions about that particular year are likely meaningless and should be taken with a grain of salt.

Then, to get a better sense of the data as a whole, both the mean and standard deviation of lead concentrations were calculated and are presented in Table 3. The mean lead concentration of all samples was approximately 1.04 ppb, well below both the old and new safety limits. However, the standard deviation of the lead concentrations, essentially a measure of how spread out the lead concentrations are from the mean, was a rather large 4.05 ppb, indicating considerable variability.

Examining the Portion of Households Exceeding the Lead Concentration Limit

I was mainly interested in whether the portion of households exceeding the lead concentration limit of 10 ppb had changed from the past portion of 13%. Additionally, it was important to examine the portion of households exceeding the newer limit of 5 ppb. To do so, I decided to look at the distribution of households across four lead concentration categories (<5 ppb, 5-10 ppb, 10-20 ppb, >20 ppb), all of which are shown in Table 4.
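Binning samples into the four concentration categories is a one-liner with pandas. This is an illustrative sketch on made-up values, not the real dataset:

```python
import pandas as pd

# Hypothetical lead concentrations (ppb) for a handful of samples
lead_ppb = pd.Series([1.2, 0.4, 6.0, 11.5, 25.0, 3.3])

# The four categories described above: <5, 5-10, 10-20, >20 ppb
bins = [0, 5, 10, 20, float("inf")]
labels = ["<5 ppb", "5-10 ppb", "10-20 ppb", ">20 ppb"]
category = pd.cut(lead_ppb, bins=bins, labels=labels)

# Share of samples in each category, mirroring the structure of Table 4
shares = category.value_counts(normalize=True).sort_index()
print(shares)
```

On these toy values, half the samples fall below 5 ppb; on the real cleaned data, Table 4 shows far more extreme proportions.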

Using this information, we can see that the vast majority of water samples (98.73%) contained a lead concentration below the previous limit of 10 ppb, a striking improvement from the original 87%. Even more encouraging, approximately 97.14% of water samples fall below the new limit of 5 ppb. Overall, this provides fairly compelling evidence in favour of the benefits of adding phosphate to the water treatment plan.

Investigating the Relationship Between Time and Lead Concentration

But how quickly did the lead concentrations improve? Were they instantaneous or were they gradual? These are key questions to consider when evaluating the effectiveness of the phosphate treatment. To answer this, I created a scatter plot (Figure 2) to see how the mean lead concentration changed across each year.

We can see a clear and consistent decline over time, with 2015 featuring the highest mean lead concentration of 1.53 ppb, and 2024 with the lowest at a mere 0.15 ppb (though the 2024 data point is based on only one observation and should be viewed cautiously). To gain a deeper insight, I examined how the proportion of households with water samples exceeding lead concentrations of 10 ppb and 5 ppb changed over time in Figures 3 and 4, respectively (note that 2024 was omitted given that there is only one observation).

Though these figures show a slight rise between 2019 and 2022, we can still see an overall fairly consistent decline over time in the portion of households exceeding both the 10 ppb and 5 ppb limits. Moreover, the portion of households across all years with high lead concentrations remains well below the 13% found in 2014, indicating that the addition of phosphate was indeed effective in improving water quality.

Exploring the Relationship Between Location and Lead Concentration

Lastly, I wanted to uncover whether certain locations were more prone to contaminated water than others. Given that the dataset had dozens of unique partial postal codes, I grouped them by their first two characters (e.g., M2L and M2K both fall under the “M2-” group) to create a more readable figure. Doing so allowed me to calculate the mean lead concentration for households in each group, showcased in Figure 5.

While we do see some variation in lead concentrations between locations, with the highest mean concentration being 1.27 ppb in the “M6-” group and the lowest being 0.29 ppb in the “M8-” group, the range of mean concentrations is within 1 ppb. This suggests that location doesn’t significantly impact the likelihood of a household having high lead concentrations.
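The grouping by postal-code prefix can be sketched as follows. Again, the values are hypothetical and the column names are illustrative, not the real dataset’s:

```python
import pandas as pd

# Hypothetical samples: partial postal code and lead concentration (ppb)
samples = pd.DataFrame({
    "postal": ["M2L", "M2K", "M6H", "M6P", "M8V"],
    "lead_ppb": [0.8, 1.0, 1.5, 1.1, 0.3],
})

# Group by the first two characters of the partial postal code
# (e.g. M2L and M2K both map to "M2-")
samples["area"] = samples["postal"].str[:2] + "-"
mean_by_area = samples.groupby("area")["lead_ppb"].mean()
print(mean_by_area)
```

Collapsing dozens of three-character prefixes into a handful of two-character groups is what makes Figure 5 readable at a glance.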

Summary

Consuming water contaminated with lead can cause serious health issues, including brain damage and slowed growth, especially in children. To tackle this problem, Toronto started adding phosphate to its drinking water treatment process in 2014 to reduce lead contamination. After analyzing water samples from post-2014, it’s clear that mean lead concentrations have decreased, and fewer households are exceeding the safe lead exposure limit. There’s also little evidence to suggest that where the water samples were taken from has a significant impact on lead levels.

Next Steps

Given that the data used was based on samples collected by the individuals residing in each household, future analysis could be better improved by incorporating sources of data from more controlled sources, collected by qualified individuals.

Another improvement could be the use of time-series data showcasing changes in lead concentrations over time from the same source. This could provide deeper, and possibly more accurate, insights into the effects of various water treatments on water quality and safety.

Exploring Equity in Child Care: A Data-Driven Analysis of Access and Demand

Thomas Fox is a 4th year undergraduate student at the University of Toronto’s Faculty of Information. His research focuses on information policy and the human impact of socio-technical systems. This paper was written for Professor Rohan Alexander’s course Worlds Become Data (INF312) at the University of Toronto. 


This post explores issues of equity, access, and demand surrounding licensed child care spaces throughout Toronto’s 25 wards. By examining information made available through Open Data Toronto related to licensed child care centres as well as the demographic information related to each ward from the 2021 Canada census, a detailed analysis is made possible. This data is easily accessible on Open Data Toronto’s website and is a useful resource for researchers and community members to explore and locate child care information. An interactive map can be found in the “Data Preview” tab at this link: https://open.toronto.ca/dataset/licensed-childcare-centres/.

Child care is essential to the social and economic health of a community such as the City of Toronto. Access to child care has been shown to have a positive impact on occupational and educational opportunities for parents, especially those in lower income brackets (Gunaseelan 2021). These economic advantages for parents and families bring benefits to their physical and social well-being. Child care access also has a positive impact on the health and development of children attending these facilities, especially those who are most vulnerable (Rhijn et al. 2021; Underwood and Frankel 2012). Equitable access to child care is therefore a vital facet of community health and development.

The findings of this analysis demonstrate inequitable access to child care across the city of Toronto based on each ward’s average household income, English language prevalence, and proportion of the population identifying as racialized. As child care plays a central role in the social and economic well-being of communities and has an especially positive impact on vulnerable populations, these findings support measures and initiatives aimed at ensuring more equitable access to child care in the city of Toronto.

The “Licensed Child Care Centres” data set is provided to Open Data Toronto by the City of Toronto’s Children’s Services division. The data is updated on an ongoing basis with the data used in this post being current as of April 19, 2024. The data set contains one entry for each licensed child care facility in Toronto. Variables in the data set include facility names and addresses, the ward number in which each facility is located, and the total number of individual child care spaces at the facility, amongst others. The data set also contains each facility’s operation type, whether it be non-profit, commercial, or public (City operated). Information about whether each facility has a fee subsidy contract or participates in the Canada-Wide Early Learning & Child Care (CWELCC) system is also included. Figure 1 utilizes the data set’s location information to map each of Toronto’s licensed child care facilities.

Figure 1. Map Showing the Location of Each Licensed Child Care Centre in Toronto.

The “Ward Profiles (25-Ward Model)” data set is provided to Open Data Toronto by Toronto City Planning. The data set of interest found through this resource is the “2023-WardProfiles-2011-2021-CensusData” set which is used to determine demographic information of interest for each ward. Variables of interest for this analysis include ward numbers and names, total population, population under 15, average yearly household income, number of households where English is spoken most often, and population identifying as racialized. This data set is useful when exploring relationships between social or demographic factors and the various data sets made available through Open Data Toronto.

To explore the effect that household income, language, and racialized population may have on child care access, both data sets are used to determine the number of children under the age of 15 for every existing child care space in each ward, as shown in Figure 2. These values help indicate the demand for child care spaces across Toronto’s 25 wards.
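The children-per-space calculation described above can be sketched with a couple of toy tables. The column names and values here are illustrative only, not the exact schema of the Open Data Toronto data sets:

```python
import pandas as pd

# Hypothetical slices of the two data sets (illustrative columns, not
# the portal's exact schema).
centres = pd.DataFrame({
    "ward": [1, 1, 2],
    "spaces": [60, 40, 50],      # total child care spaces per facility
})
wards = pd.DataFrame({
    "ward": [1, 2],
    "pop_under_15": [700, 400],  # children under 15 per ward
})

# Total licensed spaces per ward, then children per existing space
spaces_by_ward = centres.groupby("ward", as_index=False)["spaces"].sum()
merged = wards.merge(spaces_by_ward, on="ward")
merged["children_per_space"] = merged["pop_under_15"] / merged["spaces"]
print(merged)
```

With these made-up numbers, ward 1 has 700 children competing for 100 spaces (7 children per space) and ward 2 has 400 children for 50 spaces (8 per space).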

Figure 2. Number of Children for Every Existing Licensed Child Care Space in Each of Toronto’s 25 Wards

Figure 3 aims to assess the relationship between average household income and children per child care space in Toronto’s 25 wards. The plot shows a negative correlation between average household income and children per child care space. The seven wards with the lowest average household income have an average of 6.8 children per child care space, while the seven wards with the highest average household income have an average of 3.9. These findings suggest that there is increased competition for child care spaces in wards with lower incomes and decreased competition for spaces in wards with higher incomes.

Figure 3. Relationship Between Income and Child Care Spaces

Figure 4 explores the relationship between English-speaking household populations and children per child care space in Toronto’s 25 wards. The plot shows a negative correlation between the proportion of English-speaking populations and children per child care space. The seven wards with the lowest proportion of English-speaking households have an average of 6.42 children per child care space, while the seven wards with the highest proportion of English-speaking households have an average of 3.9. These findings suggest that wards with larger English-speaking population proportions have decreased competition for child care spaces.

Figure 4. Relationship Between Language and Child Care Spaces

Figure 5 examines the relationship between racialized populations and children per child care space in Toronto’s 25 wards. The plot shows a negative correlation between the proportion of populations identifying as non-racialized and children per child care space. The seven wards with the lowest proportion of non-racialized residents have an average of 6.62 children per child care space, while the seven wards with the highest proportion of non-racialized residents have an average of 3.8. These findings suggest that wards with large proportions of their populations identifying as non-racialized have decreased competition for, and an increased supply of, child care spaces.

Figure 5. Relationship Between Non-Racialized Population by Ward and Child Care Spaces.

As outlined in Figure 3, a negative correlation exists between average household income and children per child care space across Toronto. In Figure 4, a similar trend is displayed with fewer children per existing child care space in wards with higher proportions of English-speaking households. Figure 5 shows a negative relationship between the proportion of residents identifying as non-racialized and the number of children per space. These findings suggest that wards with higher incomes, higher proportions of English-speaking households, and lower proportions of racialized residents contain fewer children per existing child care space, and therefore offer increased accessibility to licensed child care.


These findings are troubling for a variety of reasons. Children with at least one parent who speaks a language other than English in the home benefit disproportionately from child care access when compared to children from English-speaking households (Park, Hofstetter, and Giang 2022). Income disparity related to child care access is especially concerning, as both dual-language and racialized children are more likely to experience poverty (Tome 2021). With Toronto having the highest rate of income inequality between racialized and non-racialized individuals in Canada, these findings are particularly relevant (Tome 2021). As quality child care is an instrumental facet of community, family, and individual health, these findings support any steps taken to ensure equitable access to quality child care across the city of Toronto.

This post was derived from the paper: “Inequitable Access: An Analysis of Licensed Child Care in Toronto’s 25 Wards in 2024”, where these topics are explored in more detail. The paper can be found here: https://github.com/ThomasWilliamFox/child_care_access.git.

Gunaseelan, Vinusha. 2021. “A New Normal for Child Care in Canada: Accessible, Affordable, Universal.” Wellesley Institute. https://www.wellesleyinstitute.com/children-youth/a-newnormal-for-child-care-in canada-affordable-accessible-universal/.

Park, Maki, Jacob Hofstetter, and Ivana Tú Nhi Giang. 2022. “Overlooked but Essential: Language Access in Early Childhood Programs.” https://www.migrationpolicy.org/sites/default/files/publications/mpi_ecec-language-access-2022_final.pdf.

Rhijn, Tricia van, Kathryn Underwood, Elaine Frankel, Donna S. Lero, Karen Spalding, Magdalena Janus, Martha Friendly, and Arlene Haché. 2021. “Role of Child Care in Creating Inclusive Communities and Access for All.” Canadian Public Policy 47 (3): 399–409. https://doi.org/10.3138/cpp.2021-010.

Tome, Samantha. 2021. “Racialization of Poverty.” https://horizonsforyouth.org/blog/racializationofpoverty#:~:text=Racial%20disparities%20therefore%20ccur%20in,the%20%27racialization%20of%20poverty%27.

Underwood, Kathryn, and Elaine B Frankel. 2012. “The Developmental Systems Approach to Early Intervention in Canada.” Infants & Young Children 25 (4): 286–96. https://doi.org/10.1097/IYC.0b013e3182673dfc.

Disease Outbreak Concerns in Toronto’s Long-Term Care Homes

Our first student guest blog was written by Benny Rochwerg, a fourth-year undergraduate student at the University of Toronto, St. George Campus in the Chemical Physics Specialist, Statistics Minor, and Mathematics Minor programs. He has several years of professional and volunteer tutoring experience in mathematics, physics, and chemistry. This paper was written for STA302H1 (Methods of Data Analysis) at the University of Toronto with Professor Rohan Alexander.


Background

Following the World Health Organization’s declaration of the COVID-19 pandemic in 2020, 81% of deaths from COVID-19 in Canada occurred among long-term care residents. Since Toronto recently experienced high influenza and COVID-19 activity, it is critical to evaluate disease outbreaks in Toronto healthcare facilities.

Toronto Public Health defines an outbreak as “a localized increase (e.g. in an institution, or a specific ward or floor within an institution) in the rate of infection or illness, above that which is expected.” To gain insight into this issue, the R programming language and several R packages were used to examine Toronto Public Health “Outbreaks in Toronto Healthcare Institutions” open data from 2023.

Results

Figure 1. Number of outbreaks at each Toronto healthcare location type in 2023

As demonstrated in Figure 1, the majority of Toronto healthcare outbreaks in 2023 occurred in long-term care homes, followed by retirement homes, chronic care hospital settings, acute care hospital settings, psychiatric hospital settings, and transitional care facilities.

Figure 2. Number of outbreaks of each type in Toronto healthcare facilities in 2023

Also, Figure 2 highlights that approximately 95% of Toronto healthcare outbreaks in 2023 were respiratory with relatively few enteric outbreaks or other outbreak types.

Figure 3. Number of outbreaks for each first known cause in Toronto healthcare facilities in 2023.

Moreover, Figure 3 shows that COVID-19 was the first known cause of approximately two-thirds of Toronto healthcare outbreaks in 2023. In contrast, other agents were represented to a much lesser extent.

Discussion

As illustrated in the Results section, most of the Toronto healthcare facility outbreaks occurred in long-term care homes, were respiratory, and had COVID-19 as their first known cause. These outcomes may have been exacerbated by the Government of Ontario’s 2022 decision to eliminate mandatory masking in long-term care homes for visitors and caregivers. This is despite the fact that, as of 2018, long-term care residents tended to be at least 65 years old, an age group that is more susceptible to worse health impacts from COVID-19.

The outbreak data examined here was likely an underestimate of the true total given that asymptomatic disease may not have been detected and recorded and that secondary drivers of each outbreak were not assessed. Consequently, long-term care home disease outbreaks in Toronto and in Canada overall should be investigated to gain a better understanding of this significant issue.

Attribution Statement

“Contains information licensed under the Open Government Licence – Toronto.”


Original Paper

This blog post is a summary of my 2024 paper titled “Long-term care homes were hit hardest by 2023 disease outbreaks in Toronto healthcare facilities” (available here). The GitHub Repository associated with this paper is available here.

Wrapping Up the Year: Our Journey in 2023 

As we enter a new year, it’s a great time to reflect on the strides we’ve made in enhancing our services and engaging with our community. Our team has been buzzing with activity throughout 2023, and we’re excited to share some highlights with you.


Community Engagement and Knowledge Sharing 

Our commitment to staying connected with our audience and sharing knowledge has remained a priority. We’ve published blog stories from our community showcasing how the public uses open data. We’ve also facilitated and attended a two-day hackathon at Brain Station, delivered two Civic Tech Toronto presentations (How good is Toronto’s open data? and How does Toronto use open data?), and gave a CKAN Monthly Live presentation. Finally, our presence at conferences like the AWS Summit 2023 and the Big Data and AI Conference in Toronto allowed us to engage with and learn from others in the field, keeping our fingers on the pulse of our technical environment.


Enhancing Our Digital Presence 

This year saw significant updates to our portal pages, ensuring they remain clean, user-friendly, and up-to-date. We’ve also refreshed our gallery, showcasing new apps and maps that utilize our data, demonstrating the practical applications of our work. 


Enhancing our Data Management and Quality Score 

In a major step towards efficiency and scalability, we moved our intake process to JIRA. This enhances our ability to communicate, track, update, and be accountable for the work we do.  

Moreover, we’ve upgraded our Data Quality Score (DQS) methodology, ensuring the highest standards of data integrity and reliability. 


Technical Advancements 

Our technical team has been particularly busy. We’ve automated parts of our JIRA publishing pipeline, significantly reducing manual work in the data publishing process.  


Public Queue  

We’re gearing up to display the status history of our dataset intake process openly, providing both a comprehensive dataset and an interactive dashboard. This way, you’ll be able to see where we stand in our progress of publishing or updating datasets on the portal.


By the Numbers 

To give you a sense of our productivity this year, since December 2022: 

  • We’ve handled a total of 199 tickets in our queue, addressing a wide range of requests and inquiries.
  • We’ve made 102 significant updates to existing datasets, including updates to schemas, metadata, or formats. This does not include adding records, which we do dozens of times daily.
  • We’ve introduced 32 brand-new datasets.
  • Out of the tickets received, we’ve “closed” 42 of them. This closure indicates instances where no further action was necessary or possible in response to the requests made.

As we bid farewell to 2023, we’re grateful for the challenges we’ve overcome and the milestones we’ve achieved. Our journey continues, and we look forward to another year of innovation, engagement, and excellence in serving our residents, visitors, and the general public.  

How Fast Can You Go Places Using the TTC?

Welcome to our newest blog post, where we’re exploring a game-changer in Toronto’s public transit scene! Meet time2reach, an interactive transit map created by university student Henry Nguyen. This clever tool uses data from Open Data, including TTC Routes and Schedules, along with information from other regional transit agencies, to give you the best travel routes and times across the city. From Scarborough to Clarkson, time2reach makes navigating Toronto’s public transit system a breeze. Whether you’re a daily commuter or just planning a day out, this map is your new go-to guide. Join us as we delve into how this innovative map is simplifying travel in Toronto! Written by Henry Nguyen.


Time2reach is a map that shows you how fast you can travel from a place using public transit. The more “yellow” an area is, the quicker you can reach it from your starting point. Hover over any point to see which trains or buses you need to take to get there. The default starting location is Union Station. To change the starting point, double-click anywhere on the map. The map updates to show commute times from your starting point.

The interactive map uses a colour-coded scale from yellow — which indicates shorter trip durations — to dark purple, which indicates longer trip durations. 
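As a rough illustration of how such a scale might work, here is a linear blend between two endpoint colours. The specific RGB values and the 90-minute cap are my own assumptions, not time2reach’s actual palette:

```python
def duration_to_colour(minutes, max_minutes=90):
    """Blend linearly from yellow (short trips) to dark purple (long trips).
    Endpoint colours and the 90-minute cap are illustrative assumptions."""
    yellow = (253, 231, 37)   # RGB for the shortest durations
    purple = (68, 1, 84)      # RGB for the longest durations
    t = min(max(minutes / max_minutes, 0.0), 1.0)  # clamp to [0, 1]
    return tuple(round(y + (p - y) * t) for y, p in zip(yellow, purple))

print(duration_to_colour(0))    # pure yellow
print(duration_to_colour(90))   # dark purple
```

Any duration at or beyond the cap maps to the darkest colour, so far-away areas all read as deep purple.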

Just for fun, I included a few filters that let you choose your mode of transit. For example, if you want to exclude Pearson Express and GO trips, then you can deselect the agencies from the right menu. The resulting map without GO Transit looks like this:

As another example, to see where you can go from Sheppard-Yonge without using any subway line (e.g. if Line 1 is down), you can also unselect the “Subway” mode.

The resulting map looks like this:

There are a lot of other filters for you to play around with as well! For example, you can change the time (midnight transit options are different!) and adjust the maximum duration of the trip as well. 

Just for fun, I also tried to make an animation of the transit access map over the course of a day. I thought it’d be interesting to see how transit changes around our commuting patterns. Here is the 24 hour animation for Toronto.

If you notice the ~15-minute blips at Weston / Pearson Airport, that’s because of the Union Pearson Express! The trains have a 15-minute frequency, so it’s extremely quick to reach Pearson Airport if you time the train just right. You might also notice lower-frequency blips near the GO stations in Mississauga or on the Lakeshore line. If you time your departures just right to catch the train, then it’s extremely fast to reach any station along the line.

Just for comparison, here is a similar 24 hour timeline for New York City.

The animations show the “heartbeat” of each city. Frequent rapid transit makes large distances feel small. It’s also cool how this perceived distance changes over the course of a day. The Finch area is normally very accessible, but not past midnight when the trains stop running. But above all, the animations show how transit infrastructure connects neighbourhoods. This is the best evidence for running buses and trains more frequently, or for 24-hour transit. Without transit, the city literally becomes farther apart.

Open Data for Open Water Swimmers

Dive into the world of ‘Open Water Data,’ a must-have digital companion for every open water swimmer and enthusiast. Developed by Mitch Bechtel, this platform offers real-time data on water quality, temperature, and weather conditions for beaches in Toronto and around the globe. With an interactive map to explore popular swim spots and an array of data sources, you’re assured a safer and more informed swim. Let’s explore this essential tool that’s making waves in the swimming community.


Open Water Data informs open water swimmers and other water recreation users with a wealth of information about popular beaches in Toronto, and around the world.

Overview

Water quality, temperature, waves and weather are all critical to open water swimmers. Open Water Data informs open water swimmers and other water recreation users about:
• Water quality pass/fail
• E. coli & enterococcus levels
• Water temperature
• Wave height, period and direction
• Wind speed, gusts and direction
• Air temperature
• Weather

The website can be accessed at https://www.openwaterdata.com/

The Map

The map displays nearby beaches and swim spots. Map pins indicate nice places to swim, boat or just relax by the water, and show whether recent water quality tests passed (green), failed (red), or are not available (grey).

Beach Details

Click a map pin to get more details, including the latest known conditions, historic graphs, open data downloads, and even where people have been swimming in the water. A beach photo, description, features and links to relevant information are also available for most beaches.

Swim Data


If you have a Garmin smartwatch, you can also track and share your swims with others. Review past swims, including locations, routes, distance, duration, pace, heart rate, strokes per minute and water temperature. Your swim paths can also be shared anonymously to let others see where people swim at the same location.


Data Sources

In addition to swim paths and water temperature data collected by participating swimmers, data is collected from a number of third-party data sources, including:

• City of Toronto
• City of Hamilton
• Niagara Region
• Swim Drink Fish
• Surfrider Foundation
• WindFinder
• Weather API
• and many more municipalities and agencies around the world

Smart Buoys

Open Water Data has also designed, built and deployed several IoT water temperature buoys at beaches throughout the Greater Toronto Area, which update the website with near-real-time water temperature measurements at their corresponding locations. This is particularly useful in Lake Ontario, where the water can “flip”, causing the temperature to drop more than 10 degrees in a matter of hours. For example, check out early October at Kew-Balmy Beach.
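A temperature “flip” like that is straightforward to flag from a series of buoy readings. This is a toy sketch, not Open Water Data’s actual pipeline, and the six-hour window and 10-degree threshold are my own assumptions:

```python
def flag_flips(readings, window_hours=6, threshold=10.0):
    """Flag 'flips': water temperature dropping more than `threshold`
    degrees within `window_hours`. `readings` is a chronological list
    of (hour, temperature) tuples."""
    flips = []
    for i, (t1, temp1) in enumerate(readings):
        for t2, temp2 in readings[i + 1:]:
            if t2 - t1 > window_hours:   # past the comparison window
                break
            if temp1 - temp2 > threshold:
                flips.append((t1, t2, temp1 - temp2))
                break                    # report each start point once
    return flips

# A 21 C morning dropping to 9.5 C within six hours gets flagged.
readings = [(0, 21.0), (3, 20.5), (6, 9.5), (9, 9.0)]
print(flag_flips(readings))
```

A real implementation would work on timestamped sensor data, but the core check (a large drop inside a short window) is the same.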

Open Data

All open data collected by the site from various data sources, including the smart buoys, is aggregated and shared as open data. Each open data beach measurement can be downloaded in CSV and JSON format from the corresponding beach location and data tile. In addition, open data is also available from a powerful API, available upon request. For more information about open data available from the Open Water Data website and API, see https://www.openwaterdata.com/open-data/.

Photo Attribution

Promo Photo by Todd Quackenbush on Unsplash.

BikeSpace – Mapping Toronto’s bike parking needs

Cycling is a pivotal aspect of urban commuting, but one recurring challenge for riders is finding suitable parking. Even in a city as expansive as Toronto, safe and accessible bicycle parking spots are often difficult to find. Imagine a tool that highlights those gaps, allowing cyclists to voice their parking concerns and provide actionable data to the city. Enter BikeSpace: a community-driven solution that pinpoints where Toronto can improve its bike parking infrastructure. Through this web app, riders can report issues and suggest potential parking spots, actively shaping Toronto’s cycling future. Dive in to discover how BikeSpace works, its roots in the Civic Tech community, and what the future holds for Toronto’s cycle enthusiasts. Come out and meet (support) the volunteers behind this app – http://civictech.ca


(more…)

Towards an updated Data Quality Score in Open Data

Why and how Open Data Toronto is updating its score to assess data quality  

In 2020, we at Open Data Toronto started assigning and publishing Data Quality Scores (DQS) for a number of datasets. Now, in 2023, we’re updating some of the finer details of how we calculate and present these scores to benefit data users and owners alike.

How it started 

Way back when our portal started, we measured success in the number of datasets; the more open datasets we had, the better we thought we were doing.

This was wrong for a few reasons:

  1. Users don’t care how many datasets we have per se – they care about the datasets they want 
  2. Having more datasets doesn’t necessarily mean more data is being used 
  3. Data publishers thought that publishing more was key, and we worried this incentivized them to publish more datasets instead of better datasets 

If we stayed the course of “more is better”, then we risked making a swamp: a catalog of hundreds of out-of-date datasets in less-than-open formats, with no metadata and no context. It would be like a huge library where you can never find what you want, and most of the books are in a language you can’t even read.

Aiming to curtail this, we started assessing a basic measure of “quality” of our datasets. If you’re curious about our initial inspiration for doing DQS and how it worked, see the articles below for more: 

In short, though, we started with evaluating 5 general “dimensions” of data quality: 

  • Usability – how easy is the data to work with? 
  • Metadata – Is the data well described? 
  • Freshness – Is the dataset up to date? 
  • Completeness – Is there lots of missing data? 
  • Accessibility – Is the data easy to access for different kinds of users? 

We would evaluate the data based on those 5 dimensions, assigning it a score for each dimension. We would then sum those scores, weighing each dimension differently, to create a final score (out of 100) and grade (Gold, Silver, or Bronze) based on that score. 

Every week, we would recalculate that score for each dataset in our database (which is not every dataset – some are stored as files – more on that later) and present the grade on each dataset page.

This is a screenshot of the dataset page showing where the DQS is displayed (current view).

How it’s going – We changed how we calculate scores

We’re keeping the paradigm of 5 dimensions. This is largely for consistency’s sake, but also because the existing dimensions do a good job of organizing what we’re measuring. We did, however, change how each dimension is weighted.

We have changed the underlying metrics quite a bit; we’ve added some new ones and edited or removed some of the old ones. Each dimension’s score is the mean of its metrics’ scores. The dimension names, their weights in the overall score, and the exact metrics being evaluated are below:

Freshness (35%) 

  • Has the data been refreshed on schedule?
    • For example, if a dataset is supposed to be updated weekly, but it hasn’t been updated in 2 months, it gets its score penalized 
  • Has the data been left unrefreshed for more than 2 years? 
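The on-schedule check above can be sketched as a single function. The proportional penalty formula here is a hypothetical simplification; only the two-year cutoff comes from the rule stated above:

```python
from datetime import date

def freshness_score(last_refreshed, expected_interval_days, today=None):
    """Hypothetical sketch: full marks when refreshed within the expected
    interval, a proportional penalty when overdue, and zero once the
    dataset has gone unrefreshed for more than two years."""
    today = today or date.today()
    age = (today - last_refreshed).days
    if age > 730:                       # unrefreshed for more than 2 years
        return 0.0
    if age <= expected_interval_days:   # refreshed on schedule
        return 1.0
    # Overdue: penalize proportionally to how far past schedule it is.
    return max(0.0, 1.0 - (age - expected_interval_days) / 730)

# A weekly dataset refreshed four days ago scores full marks.
print(freshness_score(date(2023, 11, 1), 7, today=date(2023, 11, 5)))
```

The real metric may weigh overdue refreshes differently; the point is that lateness is measured against each dataset’s own declared schedule.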

Metadata (35%) 

  • Has all required metadata been provided by the data owner?
    • To be specific, on the left hand side of each page we show some information about the dataset, like:
      • Which division owns the data? 
      • How can this data be categorized in the context of our other datasets (is it about public health? Transportation? Parks and Recreation? Etc) 
      • Is there a website where users can learn more? 
      • And so on… 
    • If any of these are missing, this score gets penalized proportionally to the number of missing metadata 
  • Is the contact email associated with the dataset connected to the data owners team, or is it to a placeholder email like opendata@toronto.ca? 
  • Is the “Learn More” URL a valid URL? 
  • Are data definitions missing?
    • Each column in a dataset has an English definition. If those are missing, the score gets penalized 

Accessibility (15%) 

  • Are there any tags (keywords associated with a dataset to make it easier to find) on the dataset?
    • We use these tags behind the scenes to help the search bar on our homepage find you the datasets you’re looking for 
    • If there are no tags, this score is penalized 
  • Is this dataset manually updated by Open Data or automatically updated?
    • Some datasets, behind the scenes, stay up-to-date automatically. Others need to be manually updated by the Open Data team. This latter group gets penalized 
  • Is the data stored as a file or in the Open Data database?
    • If data is stored in the Open Data database, our site can provide it to you in multiple formats and give you a preview of the data 
    • Data not in the Open Database will be penalized 

Completeness (10%) 

  • Does the data consist of more than 50% null values?
    • This is penalized based on the percentage of missing values, so long as that value is over 50% 

Usability (5%) 

  • Do columns have meaningful names?
    • Scores are penalized if less than 1/5th of columns have meaningful English components 
    • Scores are penalized based on the number of columns in the dataset 
  • Do columns have constant (each value is the same) values?
    • If all of a single column’s values contain, for example, “NA”, the score is penalized 
    • Scores are penalized based on the number of columns in the dataset 
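The constant-column check is easy to picture in code. This is a sketch of the idea, not the portal’s implementation, and the sample data is made up:

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list:
    """Return the columns in which every value is identical
    (e.g. a column that is all 'NA')."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

# Hypothetical dataset slice: 'source' is constant and would be penalized.
sample = pd.DataFrame({
    "ward_name": ["Spadina", "Davenport", "Beaches"],
    "source": ["NA", "NA", "NA"],
})
print(constant_columns(sample))
```

Counting distinct values with `nunique(dropna=False)` also catches columns that are entirely null.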

The change in weight is important! Since our first iteration of the DQS, we’ve learned that users value the freshness and metadata of datasets a lot (no one wants stale data, and people want to understand data they’re consuming). For that reason, we made those dimensions responsible for the lion’s share of a dataset’s DQS score. 

Similar to before, we assign the grade based on set thresholds:  

  • A score of 80% and above gets Gold 
  • 60% to 79% receives Silver 
  • Everything below 60% gets Bronze 

We changed how we display scores

Previously, each dataset page showed its grade. We’ve kept that, but we’re adding a new “Data Quality” section of the page, where we break down the details of why a dimension was given its score. 

This is a screenshot of the DQS displayed on the open data package page. This displays the score of each criteria and its explanation.

In this new section, we detail when the DQS was last refreshed, along with both its grade and its underlying score. Then, below that, we show each dimension’s score, its definition, and why the dimension was not scored at 100%.

In our initial user testing, a lot of people wanted to see information about metrics, thresholds, and weighting directly on the page, so we added it into info icons scattered throughout. 

As we did before, we put all these scores together into a single “Catalog Quality Scores” dataset on our portal. It can be downloaded like any other dataset on our portal, and it contains dimension scores, and the reasons why scores weren’t 100%, over time. 

We changed what gets a score 

Before, we would only score datasets that were in Open Data’s database. If the data was only a file (it was labelled as a “Document” on our portal, and could only be downloaded in a single format) we would not give it a DQS. We changed that – now it will receive a score. It’s important to note here that Documents will be scored on only 3 dimensions; automatically scoring Completeness and Usability with files of a varying number of formats is difficult, so we put it out of scope. 

Additionally, some of our datasets will have multiple downloads, or “resources”. We used to give a dataset an overall DQS, but now we give each resource a DQS. 

Who is DQS for? 

We made DQS with the data consumer in mind, for sure. However, this newer iteration of DQS also considers the data owner, too. 

You’ll notice that some of the new metrics, while insightful for data consumers, aren’t always actionable for them. Take the example of a broken “Learn More” URL. Sure, it’s good to know, but it probably won’t change how someone uses the data once they get their hands on it.

The data owner, though, will now have this identified for them once it occurs. Additionally, we think the idea of improving public-facing scores and grades will incentivize data owners to keep data quality high.

Finally, this metric is for us at Open Data Toronto, too. The “Catalog Quality Scores” dataset mentioned above will let us monitor data quality over time and identify trends in where data quality issues are.  We’re hoping that if we identify trends now, we can get ahead of them in the future. 

Thoughts for future iterations 

We know this won’t be the last version of DQS. There are so many ideas that we knew we wouldn’t be able to stuff into this deployment. These include:

Reusable DQS Logic 

Our logic can absolutely be reused by anybody – there’s nothing unique about it, and we share it on our GitHub. Because the code is tailored to Open Data Toronto’s environment, though, you can’t really copy-paste or fork much of it.

We considered turning this code into a CKAN extension so that other organizations that leverage the same portal backend as us (province of Ontario, Government of Canada, and dozens more) can also reuse our DQS easily.  

DQS over Time 

We currently track DQS for each dataset over time. However, we don’t visualize this anywhere on our portal. We’d love to add this so that both owners and consumers can get an idea as to whether a dataset is collecting dust or being kept evergreen. 

Considering Popularity/Usage 

We currently track dataset usage on our portal. However, we didn’t integrate those statistics into how we calculate DQS. In our heads, it was tricky to balance the importance of how “clean” or “up-to-date” a dataset is versus how popular it is, and then combine that into a single, coherent score… especially one that would enable data owners to improve their data. That being said, we would like to make that dataset usage more visible to our users, be it as part of the DQS or otherwise. 

Measuring if a dataset can be combined with others 

This is a computationally expensive and somewhat complicated idea, but we know it’s possible. If we could integrate foreign key analyses between attributes in many datasets, it would be a really useful metric, either for DQS or otherwise. 

We have to emphasize that we know this would be a tall order. However, getting this kind of information gives us line of sight on value-adds to data, and would let us enable our users to make simple data models based on our catalog. Finally, this would be a useful tool for us to identify datasets that are related to, or even duplicates of one another. 

When Pandemic Met Data – A Journalist’s Journey into the Open Data Portal

In this blog post: Matt Elliott, Publisher of City Hall Watcher newsletter and Toronto Star contributor, shares his experience of how the city’s Open Data Portal transformed his reporting during the COVID-19 pandemic. Stranded without traditional municipal news to cover, Elliott dove into a wealth of available datasets, discovering a myriad of untold stories about Toronto. Grab a cup of coffee, sit back, and join Matt as he takes you on an intriguing journey of discovery, right here in Toronto.


(more…)

Decoding Transit Delays: A Data-Driven Dive into the Toronto Transit Commission (2014-2022)

TTC streetcar

We are delighted to welcome Ehsan Kaviani, a seasoned data analyst, as our next guest blogger. Kaviani offers a deep-dive into the Toronto Transit Commission (TTC) public transportation system. His comprehensive data analytics report scrutinizes subway, streetcar, and bus delay times from 2014 to 2022. Through a series of data visualizations, Kaviani identifies significant factors influencing delay times and provides insightful recommendations for enhancing service quality. Enjoy reading through his analysis, as he explores TTC’s operations and the power of data analytics in driving urban efficiency. Get in touch with Ehsan: https://www.linkedin.com/in/ehsankaviani/ and as always, if you have a powerful story to tell through your data analysis, we’d love to feature your story.



Measuring Sound from the Bedroom Window

In this captivating blog post, Ingrid, a local resident, shares her personal journey of uncovering the impact of noise pollution on her life and her quest to better understand and quantify its effects. Amid the COVID-19 pandemic, Ingrid noticed changes in her soundscape that negatively affected her quality of life, leading her to take action. Ingrid’s story aligns with the goals of the Open Data Master Plan, advocating for data-driven solutions and emphasizing the importance of including community-sourced and third-party datasets. This approach not only encourages innovation but also bolsters civic engagement, empowering citizens to actively contribute to improving their urban environment. Ingrid’s efforts to collect citizen-submitted data showcase the power of grassroots initiatives and the significance of community involvement in creating better urban spaces.

So, if you too have embarked on your own journey of data collection and would like to share your story, please reach out to us at opendata@toronto.ca. We’d love to hear from you! Now, dive into her story.


My story (Ingrid’s story)

Before COVID, I started to notice changes in my soundscape. From my bathroom I could hear a car’s muffler on the highway, and bikes and cars became so loud that the noise interrupted conversations with my partner on the 25th floor of my condo. I couldn’t enjoy my balcony anymore. Then the noise started to wake me up at night, more and more frequently. My residence hadn’t changed, so something in my environment had.

One morning around 3am, during the lockdown when our streets and highways were mostly empty, I was woken up by one single motorbike on the highway. I heard it for 10 minutes (for about 7 km) and knew where it was by the sound. For example, it got softer when it was in a depression and louder when it used an elevated ramp to connect to another highway. 

This one driver, this one engine, this single source of sound, disturbed my sleep and surely the sleep of many others. I wondered how many people’s sleep had been disturbed? How loud was it? This was one driver passing by my place, one time. What about when there are more? How many people does one motorbike, or one modified vehicle affect in its daily use? What are the health effects of this noise, other than feeling terrible the next day? I decided to learn about sound – and unwanted sound – called noise. 

Health Effects of Noise

Toronto Public Health commissioned a Noise Study in 2016; the results are in How Loud is Too Loud. I learned noise pollution is the #2 urban environmental health hazard, right after air pollution. Negative health effects start at 55 decibels (dB), and the World Health Organization (WHO) recommends 45 dB for restorative sleep. Noise not only affects our quality of life but also has other health impacts, such as cardiovascular effects, cognitive impacts, sleep disturbance and mental health effects. The Ontario Ministry of Environment and Climate Change has recommendations for road-related noise thresholds: for sensitive land uses, such as residential uses, mitigation measures are required if outdoor levels at the centre of a window or door opening exceed 55 dBA daytime or 50 dBA nighttime. The study found that 88.7% of the population is estimated to be exposed to levels above 55 dBA during the day, and 43.4% is estimated to be exposed above this level at night.

In the report, heat maps showed that our highways and arterial roads produce between 70 and 90 dB, yet they didn’t reflect what I was living with. I wondered if I was alone and started tweeting about excessive vehicle noise. I learned that I was not the only one annoyed at the small percentage of people creating this unnecessary noise and impacting our health and quality of life. My anxiety climbed along with my level of annoyance at the increasing frequency of noise events, most days and nights.

Data Collection

My background in HR and IT taught me that data tells a story, so I knew I needed to start collecting data about what I was experiencing. I bought an environmental sound meter in July 2021 and immediately felt vindicated when it logged multiple noise spikes through the night, jumping from 60 or 70 dB to 80 and even over 90 dB.

Then a friend forwarded a link to Toronto Metropolitan University’s YouTube video on the Health Effects of Urban Traffic Noise featuring Tor Oiamo. Not only was it informative; when he said that “we don’t know enough about noise at the bedroom window,” I shouted, “Yes!” The models use averages, because without access to a “bedroom window” this was the best data they had. Here was my niche. First, I needed to measure my own soundscape, and eventually other people’s bedroom windows too, to understand what we live with. Modelling can only go so far; it was time to collect citizen data.

I learned that when taking measurements, the average sound level (Leq) is like a high-water mark on a lake, obscuring what lies below. It is the events that may not break the surface, but are audibly different in frequency and character, that annoy us and impact our health.

I was lucky enough to have my meter on a balcony when the Gardiner was closed for the Ride for Brain Health.  In the graph below, you can see the drop of the average when the Gardiner was closed at 2:30am, which exposed the resulting noise events that a model will not show. The orange line is the World Health Organization’s recommended level for a restorative sleep – 45 db. The measurement below was taken with an Environmental Sound Meter from Cirrus Research.

These spikes are caused by vehicles and motorbikes. Combined, there were over 70 spikes between when the highway was closed at 2:30 am and reopened at 2 pm. Consider the spike at 12:30 am from 78 to 98 dB: because sound is logarithmic, we perceive that 20 dB jump as roughly four times as loud (perceived loudness roughly doubles with every 10 dB), even though the sound energy itself increases 100-fold. It is like going from normal conversation to someone pointing a hairdryer directly at your ear. Now it was easier to “see” what we feel and live with.
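As a quick sanity check on that arithmetic (my own sketch, not from the original study; the helper names are made up), the common decibel rules of thumb can be written out in a few lines:

```python
# Sketch of the usual decibel rules of thumb. Assumes the common
# approximation that perceived loudness doubles roughly every 10 dB.

def intensity_ratio(db_increase: float) -> float:
    """Physical sound-intensity ratio for a given dB increase."""
    return 10 ** (db_increase / 10)

def loudness_ratio(db_increase: float) -> float:
    """Approximate perceived-loudness ratio (doubling per 10 dB)."""
    return 2 ** (db_increase / 10)

# A spike from 78 dB to 98 dB is a 20 dB increase:
print(intensity_ratio(20))  # 100.0 -> 100x the sound energy
print(loudness_ratio(20))   # 4.0   -> roughly 4x as loud to the ear
```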


The graph below is from the location with the loudest spikes, which was the 15th floor at Sherbourne and Adelaide. Over the course of 7 days there were at least 44 spikes over 90 db (double the WHO recommendation) both day and night, due to sound bouncing off of the buildings. This measurement was taken with a noise sentry meter built by Convergence Instruments.

I have found only one location that met the WHO guideline of 45 dB: a leafy residential neighbourhood, The Kingsway.

I have now collected thousands of hours of recordings from my balcony and other locations, producing CSVs of sound levels by the second. I am working with people from Civic Tech Toronto to build a database to query this data and possibly make it available to the public.
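As an illustration of what querying those second-by-second logs might look like (a sketch only; the column names, values and threshold are hypothetical, not the actual Civic Tech Toronto schema):

```python
import pandas as pd

# Hypothetical second-by-second sound-level log, standing in for a
# meter's CSV export. Column names ("timestamp", "db") are made up.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-07-01 02:00:00", "2021-07-01 02:00:01", "2021-07-01 02:00:02",
        "2021-07-01 02:00:03", "2021-07-01 02:00:04", "2021-07-01 02:00:05",
    ]),
    "db": [52.0, 54.5, 71.2, 88.9, 66.0, 53.1],
})

WHO_NIGHT_LIMIT = 45  # WHO-recommended level for restorative sleep (dB)

# Count seconds that spike well above the limit, e.g. 20 dB over it
spikes = df[df["db"] > WHO_NIGHT_LIMIT + 20]
print(len(spikes))  # seconds above 65 dB
```

With real meter exports, the same filter applied over a night of data gives the spike counts described above.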

City of Toronto Noise Bylaws

The city’s noise bylaws were being discussed at City Hall in Spring of 2022, specifically vehicle noise. The city received over 900 emails regarding noise from combustion engines. I heard from both the city and people that catching moving violations is difficult for everyone and that residents had lost hope in reporting noise complaints. 

To help people know that they were not alone, and perhaps to help TPS target problem areas, I created the Not 311 Noise Report. Through this web app, people can identify the location of the noise source, what produced it, and when. Unlike 311, people can see the location of all noise reports in the Noise Report Dashboard. In two months, we had almost 1,800 reports. In the graph below you can see a chart of the different noise sources by day of the week; cars, pickup trucks and motorcycles were the most common complaints.

This graph shows the number of noise reports by day of the week, with vehicles being the most-reported source.

I am pleased to announce that I have just released a more comprehensive “Not 311” Noise Report in advance of the full launch on May 4th, in conjunction with the Hot Docs Citizen Minutes Program.

You can view the Not 311 Noise Report Dashboard and learn more here: No More Noise Toronto.

Conclusion

We learned during COVID that cities aren’t loud, vehicles are loud. And I also learned that what changed in my environment was the removal of the Drive Clean program by the Provincial Government, eliminating vehicle inspections. 

With the transition to EVs there is an opportunity to advocate for stronger noise regulations at all levels of government, to ensure that those who need to hear vehicles do, and those of us who don’t need to hear them for our safety don’t. Dodge should not be able to produce a vehicle that emits 126 dB of unnecessary vanity noise.

We also know that Toronto is building more condos than any other city in North America, and these residents will be living above any type of noise mitigation (trees, green spaces and noise barriers). While noise dissipates over distance, clearly one engine can be heard from 7 km away and over 50 storeys up. This will place a greater burden on our healthcare system, reduce quality of life, and negatively impact the health of those living within earshot.

We can make a healthier, calmer city and citizen submitted data is the story that will help us do that.

This supports the city’s vision of an inclusive and collaborative open data ecosystem, where diverse contributors can actively participate and add value to the wealth of information available to the public.

Portal Analytics

As members of the Open Data Team, we were thrilled to dive into our own dataset and analyze the numbers to gain insights about our own portal’s performance and impact. The process was enlightening, and we were eager to share our findings with others.

To shed light on our journey of self-discovery, we interviewed Reza Ghasemzadeh, a technical expert on our team who led the analysis. Our conversation delved into the details of his analysis, revealing key findings and insights that we believe will be valuable when working with this dataset. Let’s first define some terms before jumping in:

  • A session is a set of interactions from a single identified user with a given page within a given time frame.
  • A visit refers to a single instance of a user accessing a webpage.
  • A download refers to clicking the “Download” button on a dataset page.
  • A dataset (dataset package) is a collection of structured or unstructured data.
  • A file is a collection of information or data that is typically characterized by a specific format (e.g. csv, xlsx).
  • Dataset visits are analyzed for the entire year, while dataset downloads are analyzed from March to December of 2022 (the system did not track download data for January and February of 2022).
  • At the time of analysis, we assumed the City had 41 divisions; we have since confirmed the City has 44 divisions, some of which were added in 2022. We used 41 because at the time it was the number of distinct Publishers/Divisions on the open data portal.
  • Civic issues are based on datasets that have been tagged with a civic issue. Datasets that are not tagged are not part of this analysis.
  • The data is available for you to analyze here: Open Data Web Analytics – City of Toronto Open Data Portal

Top 10 Divisional Datasets visits

The City has 41 divisions (not including Agencies, Boards & Corporations). Some have plenty of datasets; other divisions don’t have as many. Our first exercise involved examining the traffic and engagement patterns of datasets across the various divisions that publish data on the portal.

The total number of dataset visits on our portal was 174,593 sessions in 2022. To put this into perspective, that’s around 500 dataset visits per day, on average, throughout 2022.

This analysis helped us answer questions such as, “which divisions attract the most traffic?” The following table displays the most popular divisions based on the sum of dataset visits (sorted by total sessions) in 2022: 

Owner division                                 Sessions total   Sessions share pct
Transportation Services                        19,888           11.39
Social Development, Finance & Administration   19,631           11.24
Toronto Public Health                          16,405           9.40
City Planning                                  13,498           7.73
Information & Technology                       13,210           7.57
Municipal Licensing & Standards                12,788           7.32
City Clerk’s Office                            11,848           6.79
Toronto Police Services                        11,019           6.31
Shelter, Support & Housing Administration      9,302            5.33
Parks, Forestry & Recreation                   8,314            4.76
Table: City of Toronto Divisions, total sessions and sessions share percentage

The sessions share percentage (pct) is each division’s sessions divided by the total sessions, multiplied by 100. This makes the numbers easier to compare.

Reza thought it would be interesting to create a cumulative sum plot of these share percentages, after sorting them from highest to lowest. He took the highest share percentage (11.39%) as the first point in the plot, so x=1 and y1=11.39%. For the second point (x=2), he added the second-highest share percentage (11.24%), so y2=11.39+11.24=22.63%. He continued until all 41 division rows were exhausted.
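The procedure can be sketched directly from the table above (a minimal illustration using the top 10 of the 41 divisions, not Reza’s actual code):

```python
# Cumulative share of dataset-visit sessions, top 10 divisions from
# the table above, already sorted highest to lowest.
shares = [11.39, 11.24, 9.40, 7.73, 7.57, 7.32, 6.79, 6.31, 5.33, 4.76]

cumulative = []
total = 0.0
for s in shares:
    total += s
    cumulative.append(round(total, 2))

print(cumulative[3])  # 39.76 -> top 4 divisions: ~40% of sessions
print(cumulative[9])  # 77.84 -> top 10 divisions: ~78% of sessions
```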

The resulting graph presented below, shows us that the top 4 divisions (out of 41) are responsible for 40% of visits; the top 10 divisions attract nearly 80% of our portal dataset visit traffic in terms of total number of sessions.  

Four City Divisions account for forty percent of all web sessions to open.toronto.ca; while ten Divisions account for nearly eighty percent.

Analyzing this data provided us with some insights into which datasets are most popular or useful for different groups of users, as well as identifying potential areas for improvement or further development, which led us to our next analysis.

Top 5 most visited dataset packages 

We then asked, which datasets/packages are the most frequently accessed, what types of data are most popular, and which areas of interest are most represented among users? Analyzing the top 5 most visited dataset packages on the open data portal provides valuable insights for optimizing the user experience, as well as informing strategic decision-making around resource allocation. 

Currently we are hosting more than 430 dataset packages on our portal.

It was interesting to find that the most visited dataset package – Neighbourhood Profiles – did not belong to the most visited division, Transportation Services, as seen in the analysis of top divisional dataset visits above.

The top visited datasets were the following: 

Neighbourhood profiles – The Neighbourhood Profiles provide a portrait of the demographic, social and economic characteristics of the people and households in each City of Toronto neighbourhood. The data is based on tabulations of 2016 Census of Population data from Statistics Canada.  

3D Massing – This is a geospatial 3D ESRI shapefile / 3D CAD file of building shapes for the City of Toronto.

Outbreaks in Toronto Healthcare Institutions – This dataset includes a list of outbreaks in Toronto healthcare institutions – including hospitals, long-term care homes, and retirement homes – that are currently active or have been declared over for the current calendar year. Year-to-date data for the current year are updated weekly, each Thursday.

Identifying Trends Across Datasets 

We also analyzed the trendline of the top 5 most visited datasets of 2022 based on session count. We observed that Outbreaks in Toronto’s Healthcare Institutions was particularly popular in January, which coincided with the peak of cold and flu season. Additionally, Elections saw a surge in views in October, aligning with the timing of the fall 2022 elections. Interestingly, 3D massing maintained consistently high levels of views throughout the year, suggesting a high level of interest among urban planners, architects, real estate developers, and geospatial analysts.  

From the above plot, we could also identify potential trends or patterns in user behavior, such as whether certain datasets are more likely to be accessed during certain times of the year.  

Top 5 most downloaded datasets  

As we delved into the realm of data analytics, it was intriguing to explore trends in dataset downloads. The accompanying bar plot represents the total number of downloads for each available file within a dataset, highlighting the five most frequently downloaded datasets in 2022. A noteworthy observation is that while some dataset pages may have high visit counts, the most downloaded datasets can differ significantly: only two datasets, “Outbreaks in Toronto Healthcare Institutions” and “Neighbourhood Profiles,” appear in both top-five lists.

Top 5 most downloaded files 

This analysis helped us understand which files are the most popular, what types of data are in high demand, and which areas of interest are most represented among users. It could also help identify areas for potential improvements, such as improving the accessibility or usability of certain files. 

Outbreaks in Toronto Healthcare Institutions was by far the most downloaded dataset file in 2022, most likely because of the surge in COVID-19 cases.

Civic issues people are most concerned about 

The Civic Issue Campaign aimed to prioritize the release of datasets based on five key civic issues: affordable housing, poverty reduction, fiscal responsibility, climate change, and mobility. These issues were identified through a combination of strategic priorities for the city. We’ve written previously about them and our civic issues campaign. The campaign considered the impact of these issues to determine which datasets to release next. 

According to the available data and related tags, it appears that certain civic issues are of greater interest to people than others. These issues have been ranked based on the number of times they were visited, with Mobility being the most popular, followed closely by Affordable Housing, Poverty Reduction, Climate Change, and finally Fiscal Responsibility.

There are a few possible reasons why Mobility is the most searched civic issue. One reason could be that Toronto is a large and densely populated city, which often leads to traffic and transportation being major concerns for residents. Another explanation may be that the Transportation Services division was the most visited divisional data on the portal, which suggests that people are particularly interested in issues related to transportation in Toronto. 

(Keep in mind that not every dataset is associated with a civic issue tag; this count therefore covers only 189 of 443 datasets, about 42% of the entire catalogue as of when this blog was written.)

Based on the insights and findings shared in this blog post, what are some potential actions or use cases that you can think of for these analytics? How might this information be helpful to you or your organization? We’re always happy to collaborate.

3 Questions…

Why reinvent the wheel? We have some frequent questions that were asked through our Reddit forum, administered a few years back with the team. We’ve sifted through them and found 3 more general questions we’ve received throughout the years. Also, check out our FAQ page with some other common questions. What specific questions do you have for our team? Send them to us via opendata@toronto.ca.

What do you recommend as a tutorial to best explore your data? For example, when someone first stumbles upon a dataset, how do you recommend going about extracting valuable information from it?

Everybody has their own approach to data exploration, since it’s a bit of art and science. Although we don’t have a video tutorial, we are creating “data stories” to share how we analyze data. These are a new concept and we are still refining them.

Further, here’s roughly what we suggest when you first stumble upon a new dataset:

  1. Learn about the context. Why was it collected, and how? What are the known limitations of it? This usually helps minimize confusion later on in the process (e.g. maybe there is some data missing, or values are defaults, or how it was collected changed at some time period so standardization will be needed)
  2. Review the data attributes (e.g. columns in a table) to get an idea of what the data contains. I make note of datetime fields at this stage because durations (i.e. time between datetimes) may be possible.
  3. Identify questions I would like to answer. The focus is not on what can actually be answered; rather, this helps keep our thinking from becoming too narrow at the start. If working in a team, we do this separately first to get better and more diverse ideas.
  4. Narrow down to questions I think can be answered with the data and given the timeframe available. These are not really “final,” but they provide guidance while exploring the data; without them, it’s too easy to get caught in a never-ending data exploration cycle. The questions also serve as a finish line.
  5. Prepare the data. This includes initial cleaning, such as standardizing date formats and ensuring the attributes are treated correctly (e.g. numbers are not treated as text); reshaping it so it can be visualized; and transforming it by creating attributes as needed, such as durations from date fields.
  6. View each feature individually to better understand it and identify outliers (e.g. if it’s a number, I’d look at the distribution of values). Profiling tools such as the Python Pandas Profiling library (https://github.com/pandas-profiling/pandas-profiling) make this easy (we are also working on the Pandas Exploration Toolkit but it’s very early stages and still customized for our use: https://github.com/open-data-toronto/petk)
  7. Visualize the data, now that it’s prepared to work with the software, using the research questions to guide the exploration. As I learn more about the data throughout this process, I update my questions and assumptions. Here’s an example of a visualization dashboard for data exploration from our first data story, built in Tableau: https://public.tableau.com/profile/carlos.hernandez#!/vizhome/BuildingPermits-SampleExplorationDashboard/BuildingPermits-DataExplorationDashboard
  8. I make note of all the exceptions, assumptions, and questions about the data that have come up from steps 5-7 to bring up to the expert of the dataset, if fortunate enough to have access to one.
  9. After all this, with a much better understanding of the data, full-fledged analysis starts. I’ve depicted it as a linear process for ease of communication, but it’s a very cyclical one.
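As a toy illustration of steps 5 and 6 above (the dataset and column names here are invented for the example, not from an actual portal dataset):

```python
import pandas as pd

# Step 5: prepare the data - ensure attributes are treated correctly.
# A toy building-permits table; column names are made up.
df = pd.DataFrame({
    "issued_date": ["2021-01-05", "2021-02-10", "2021-02-11"],
    "permit_value": ["1000", "250000", "18000"],
})
df["issued_date"] = pd.to_datetime(df["issued_date"])   # text -> datetime
df["permit_value"] = pd.to_numeric(df["permit_value"])  # text -> number

# Create a derived attribute, e.g. days since the previous permit
df["days_since_prev"] = df["issued_date"].diff().dt.days

# Step 6: view each feature individually to spot outliers
print(df.dtypes)
print(df["permit_value"].describe())
```

A profiling tool like the one mentioned in step 6 automates this per-column summary across a whole dataset.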

How can we get more involved with using Open Data?

Come out to meetups, hackathons, events, and co-design sessions! We’d suggest finding a social issue you’re interested in, such as ridesharing (for example), and thinking about the ways in which you can participate meaningfully. Everyone comes with a breadth of skills that can make them integral to planning processes, whether that’s research, analysis, product design or facilitating conversations.

Since there’s a lot of public concern surrounding data collection, analysis, and usage; do you publish a document with best practices for your purposes?

Our mandate is to help release good-quality data that contains no confidential or personal information. The data released is collected from across the City and our agencies. The divisions are the data stewards and subject-matter experts for the data collected and maintained in City repositories.

When we initially embark with a division on releasing a data set, we collect all sorts of metadata around that particular data. Examples are: Collection method, storage location, descriptions, limitations, data dictionaries, etc. We publish readme files or data dictionaries that help the user understand the content of the data. If a user has any further needs or clarification, we help them get in touch with the division who supplied the data for more information or analysis. We also host our policy document on the open data site.

We’re back and we’ve missed you…

It’s been a minute, but we’re back ON! We’re a little older and more mature. Lots has happened in the last couple of years, but the good news is that we’re still kicking and UBER interested in what you’ve been up to. We’re sure you’re also wondering how all the data the City collects has been, and is being, used. Well, we’re curious too!

As members of the open data community and gate-keepers of the data, we are well aware of the power that data can have in driving change and improving our City. In recent years, Toronto has been at the forefront of this movement, with a renewed interest and focus on analyzing open data to gain new insights into the functioning of the City and identify areas for improvement.

We hope you’ll be seeing us take up more space here, as the Toronto open data portal is an exciting development for those of us who are passionate about using data to gain insights and drive change. With the abundance of data now available through the portal (over 400 datasets, to be exact), there are countless opportunities to explore and analyze.

We’re excited to share your stories on our portal, and to hear from our readers about their own experiences using open data.

We’re particularly interested in publishing analysis on topics that have affected our beautiful City, and even the wider world:

  1. How open data is being used in the fight against COVID-19, such as tracking the spread of the virus and identifying hotspots
  2. The impact of open data on small businesses and start-ups, such as helping them to identify new opportunities and improve their operations
  3. The use of open data in urban planning and transportation, such as identifying areas for improvement and reducing congestion
  4. The housing crisis & identifying neighbourhoods that need more resources.

If you have any ideas or analysis that you would like to share, please don’t hesitate to reach out to us. We’re always looking for new perspectives and insights on how Toronto’s data sets are being used.

We are all familiar with the endless possibilities that open data can offer, and Toronto’s success in this area is a testament to the hard work and dedication of the open data community – that’s YOU! With open data initiatives constantly evolving and growing, we can expect to see even more innovative and impactful uses of this valuable resource in the future. Let’s continue to push the boundaries of what open data can do and drive positive change in our cities. Get in touch with us opendata@toronto.ca or on twitter https://twitter.com/Open_TO!

Analyzing results from the civic issues campaign survey

Background: What is the civic issue campaign?

In 2019, the Open Data team launched a campaign to help us identify and prioritize the release of high-quality, in-demand data linked to the City’s 5 priority civic issues. By doing this, we not only look to target the release of datasets that align to these issues, but also want to understand how these datasets may help mitigate or solve them. The Mayor’s Office and the Open Data team have identified the 5 priority areas as: Affordable Housing, Poverty Reduction, Climate Change, Fiscal Responsibility and Mobility. Together, these efforts align with our commitments outlined in Section 1b of our Open Data Master Plan.

Data Collection Methodology

Through the creation and distribution of a public questionnaire, we collected a total of 875 dataset requests, each of which indicates the type of data a respondent would like to see available, what civic issue(s) the dataset aligns to, as well as their specific output.
Screenshot of a Checkmarket question asked the user to identify the data they feel should be made available in order to address fiscal responsibility issues.
We shared the questionnaire with the public through our social media channels (Twitter, LinkedIn) and newsletters (The Open Data Update, TransformTO) between July 16 and October 2, 2019. Each request submitted through the questionnaire will be measured against a recently developed priority framework, which uses an algorithm to determine which datasets are most important to release. This allows us to be more strategic when engaging with Divisions/Agencies, allocating time and resources towards the release of datasets that provide the greatest value. The prioritization framework will eventually allow us to assign a ranking score to every request we receive. We plan to make these scores publicly available, providing greater transparency around the requests we receive and their place in the queue. The algorithm used to calculate priority is made up of 4 groups, and each element within each group has a unique weighting factor:
  1. Output Source: is the data in a database somewhere or is it in a spreadsheet on someone’s desktop? This impacts the level of effort needed to prepare the dataset for publication, therefore flat files will render a lower score.
  2. Civic Issue: does the dataset align to a civic issue? Since each civic issue renders an equal score of 1, priority in this category is given to datasets that align to the greatest variety of civic issues
  3. Requester: who requested the dataset? Is it requested by council or a member of the public? Different requesters are given different scoring.
  4. Output: what will it be used for? Education? Media? Government city report? Is it for research? Will it be used to create a by-product, like an app?

Priority framework algorithm used to calculate data request scores

Group Element Weight
Civic issue Affordable housing 1
Civic issue Climate change 1
Civic issue Fiscal responsibility 1
Civic issue Mobility 1
Civic issue Poverty reduction 1
Output Application development 0.69
Output City 0.75
Output Education 0.61
Output Media 0.59
Output Personal 0.37
Output Research 0.59
Requester Council 1
Requester Decision support 0.85
Requester Public 0.75
Requester Other 0.25
Source Yes 1
Source No 0.25
Ultimately, the higher the score, the greater priority the dataset holds. This gives the Open Data team a more strategic approach when identifying datasets for release through the Portal.
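The framework above can be sketched as a simple weight lookup. Note that the post doesn’t say how the four group scores are combined, so summing them in this sketch is an assumption, and the function name is made up:

```python
# Sketch of the priority framework. Weights come from the table above;
# summing the four group scores is an assumption, since the post does
# not state how the groups are aggregated.
WEIGHTS = {
    "civic_issue": {"affordable_housing": 1, "climate_change": 1,
                    "fiscal_responsibility": 1, "mobility": 1,
                    "poverty_reduction": 1},
    "output": {"application_development": 0.69, "city": 0.75,
               "education": 0.61, "media": 0.59,
               "personal": 0.37, "research": 0.59},
    "requester": {"council": 1, "decision_support": 0.85,
                  "public": 0.75, "other": 0.25},
    "source": {"yes": 1, "no": 0.25},  # yes = data already in a database
}

def priority_score(civic_issues, output, requester, source):
    """Higher score = higher release priority."""
    issue_score = sum(WEIGHTS["civic_issue"][i] for i in civic_issues)
    return (issue_score
            + WEIGHTS["output"][output]
            + WEIGHTS["requester"][requester]
            + WEIGHTS["source"][source])

# A public research request aligned to two civic issues, data in a database:
print(priority_score(["mobility", "climate_change"], "research", "public", "yes"))
```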

Limitations:

There are a few limitations we encountered that required us to modify the application of the priority framework for the purposes of this campaign:
  • Since the questionnaire was distributed publicly through newsletters and social media, we classified the requestor type as public for each response received.
  • The questionnaire was designed to collect dataset requests per each civic issue, rather than having the ability to indicate alignment to multiple civic issues for a single request. To keep things simple, each request was given an equal score of 1 under the category of civic issue alignment.
  • Respondents of the questionnaire were not asked to identify the type of source system a dataset possesses, as this detail would be sourced through engagements with Divisional/Agency data stewards later in the campaign.
For these reasons, we will solely focus on Output when applying the priority framework for all requests received.

Data Preparation: Extracting, Cleansing and Tagging

The first step in preparing the data for analysis was extracting the raw data from Checkmarket, the survey tool used to design the questionnaire. To work with the data, I opted for Google Sheets: a free, accessible and fairly intuitive tool that is ideal for basic data preparation and cleansing. As you can see, the raw data extract from Checkmarket wasn’t pretty:
Screenshot of raw, complex data extracted from Checkmarket, an online survey tool
In order to clean the data, I separated all requests by civic issue. I then reviewed each response and split requests that referenced multiple datasets in a single request.
Screenshot of raw survey data requests for Housing
Once I extracted individual requests from all the responses received through the questionnaire, I began assigning each request a thematic ‘tag’: for instance, spending, taxes, construction, water or pollution.
Screenshot of raw survey data with tags applied
This exercise allowed me to begin clustering similar requests under a shared theme. The challenge was avoiding near-duplicate tags (e.g. Energy use, Energy Spending, Energy Consumption), as well as clustering requests objectively with only limited context available.
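One lightweight way to keep near-duplicate tags from splintering a theme is to normalize each tag and fold known variants onto a canonical form. A minimal Python sketch, where the alias map is purely illustrative rather than the one actually used in the campaign:

```python
from collections import Counter

# Illustrative alias map folding near-duplicate tags onto one canonical
# form; these aliases are hypothetical examples, not the campaign's list.
TAG_ALIASES = {
    "energy spending": "energy use",
    "energy consumption": "energy use",
    "neighbourhood profile": "neighbourhood profiles",
}

def canonical_tag(tag):
    """Lowercase, trim whitespace, then map known aliases onto one tag."""
    t = tag.strip().lower()
    return TAG_ALIASES.get(t, t)

raw_tags = ["Energy use", "Energy Spending", "energy consumption ", "Pollution"]
print(Counter(canonical_tag(t) for t in raw_tags).most_common())
# → [('energy use', 3), ('pollution', 1)]
```

The alias map still has to be maintained by hand, but it makes each clustering decision explicit and repeatable instead of relying on memory while tagging.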

Analysing the data for key insights

Before diving into the analysis, I identified a number of research questions I was interested in answering:
  1. Which civic issue received the most requests?
  2. What are the top data requests that were identified for each civic issue?
  3. What else does the data reveal?
When reviewing the total number of requests received per civic issue, the breakdown was as follows:
  • 256 – Climate Change
  • 225 – Affordable Housing
  • 154 – Poverty Reduction
  • 146 – Mobility
  • 94 – Fiscal Responsibility
I then looked at the top tags collected per civic issue. Again, each tag represents the thematic grouping of similar dataset requests. The tables below display the most popular tags associated with each civic issue, along with how many individual requests each tag represents.

Individual tags by count

Affordable Housing

Tag | Count
N/A | 25
Housing availability | 24
Rent cost | 18
Housing sales | 12
Neighbourhood profile | 9
Short-term rentals | 8
Affordable housing waitlist | 8
Official plan | 6
Vacancy tax | 6
Ward profiles | 5
Home ownership | 5
Rent registry | 5

Poverty reduction

Tag | Count
Neighbourhood profiles | 30
N/A | 16
Food by ward | 10
Childcare availability | 8
Ward profiles | 8
Pedestrian network | 6
Childcare costs | 4
Employment | 4
Housing costs | 3
Poverty reduction strategy | 3
Transit | 3

Mobility

Tag | Count
N/A | 28
Accessibility | 10
Bike network | 10
Traffic | 8
Collision/fatalities | 6
Road restrictions | 5
Construction projects | 4
Presto | 4
TTC routes and schedules | 4

Fiscal Responsibility

Tag | Count
N/A | 25
Budget | 13
Taxes | 9
Spending | 8
Revenue | 4
Section 37 | 2
City debt | 2
Voting record | 1
Transit project budget | 1
Transform TO | 1

Climate change

Tag | Count
Pollution | 64
N/A | 37
Flooding | 13
Energy use | 13
Weather | 11
Tree | 9
Waste | 8
Green | 8
Natural heritage system | 8
Utility spending | 5
Land use | 4
Water quality | 4
Energy | 4
In total, there were 54 unique tags for climate change, 52 for affordable housing, 48 for poverty reduction, 46 for mobility, and 31 for fiscal responsibility.
Screenshot of count of all tags per civic issue request
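Once the tagged rows are exported from the sheet, tallies like these take only a few lines to reproduce. A hypothetical sketch, where the rows are stand-ins for the real spreadsheet data:

```python
from collections import Counter

# Hypothetical (civic_issue, tag) rows standing in for the tagged
# spreadsheet; the real data lives in the Google Sheet described above.
rows = [
    ("Climate change", "Pollution"), ("Climate change", "Flooding"),
    ("Climate change", "Pollution"), ("Mobility", "Bike network"),
    ("Mobility", "Traffic"), ("Mobility", "Bike network"),
]

# Group tag counts under each civic issue.
by_issue = {}
for issue, tag in rows:
    by_issue.setdefault(issue, Counter())[tag] += 1

# Unique-tag count and top tags per civic issue.
for issue, tags in by_issue.items():
    print(f"{issue}: {len(tags)} unique tags; top: {tags.most_common(2)}")
```

The same grouping could be done directly in Google Sheets with a pivot table; a script just makes it easier to rerun as new responses come in.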
What was surprising was the number of N/A tags applied to the requests received for each civic issue. Requests tagged N/A were in fact requests for information, policies or procedures; requests too general or broadly phrased (e.g. ‘water data,’ ‘budget data’); or not dataset requests at all (e.g. a rant or complaint). This leads me to conclude that open information is highly desirable and needs to be made more accessible to the general public. What is also interesting is that many of the requests received were for datasets already made openly available through the Portal. An example of this is the numerous requests for demographic and income data, such as income by neighbourhood, which already exists in our Neighbourhood Profile dataset. These findings indicate a need for improved search and discoverability of datasets on the Portal, particularly those that align to civic issues.
Screenshot of neighbourhood profile tags visualized using a bar chart.

Conclusions & Next Steps:

This is a simple, preliminary analysis of the requests received through the civic issue campaign, which has resulted in some interesting, unexpected findings, such as requests for data already available through our Open Data portal. In terms of next steps, we will analyze all requests received in order to determine their level of priority, particularly in relation to the recent motion passed by the General Government and Licensing Committee.

How you can help:

Want to take a stab at analyzing which requests received through the campaign will have the greatest priority score using our prioritization framework? Check out the dataset on our portal, and be sure to share your findings with us by emailing opendata@toronto.ca.