Blog article: Powering open data with civic issues

Powering open data with civic issues

Article text

Hey Reham! Let’s start by telling our readers a little bit about you and your role with Toronto Open Data. I’m Reham Youssef. I’m the Marketing and Communications Lead for the open data team, and I’ve been with the City for over fifteen years but in the last ten years, I’ve really been focusing in on open data. Tell us about the project you’re currently working on. Sure. At open data, we always want to know: what do users want? In the past we would accept dataset requests through email and twitter on an ad-hoc basis. We didn’t really have a concise way of gathering data from the public on what they want when it comes to open data and we’ve been trying to establish different criteria that will help us evaluate and prioritize dataset releases. The Mayor recently announced three new city directives. We took that list and identified two more: climate change, which is an important global issue right now, and poverty reduction, which we heard as a consistent theme from the civic tech community. This gave us a list of five total civic issues we wanted to prioritize dataset releases around: affordable housing, poverty reduction, fiscal responsibility, climate change, and mobility. We’re now considering the impact of these civic issues in determining what we release next. Why is it important to know how to prioritize what datasets get released? It’s just a better way to understand data requests, and we wanted to make sure that incoming data requests align as much as possible with these priorities. We can’t release everything at once. Unfortunately we have limited resources and (wo)manpower, and we also have to work with 44 different City divisions. As you can imagine, there are numerous units within the individual divisions that are responsible for data. When we prioritize data, we link it to the city priorities which you can then narrow down to 5 to 10 divisions, which makes it much easier for us to discuss with those divisions instead of all 44. Often times it’s tricky to ask for data when we don’t even know how or where it exists, let alone if it exists in the first place. So we decided to focus in on the most important things for the city. We want to establish a consensus on what users want, and specifically go after that data. Great! So you started by developing a survey that would be filled out by users to inform what civic issues and datasets they wanted. How did that go? Well, when we first went out the door with the initial civic issues survey, we unfortunately didn’t get what we were looking for the first time. Why is that? There were just too many questions, and we were asking for too much detail at the time. People didn’t understand how to respond. We really wanted people to tell us as much as possible, right down to the data attributes, and exactly what they wanted. We started to see that while people know they want to solve problems like affordable housing, they don’t always necessarily know what data they need to address that problem. So what did you do when you found that users weren’t filling out the civic issues survey as expected? After a month, we took the survey down and switched things around. That’s how we got to the second questionnaire, where we simplified it down to 3 simple questions. The wording had to change for people to think differently, and asking people what they wanted and what they wanted to do with the data was the main driver. The reason we asked for what they will do with the data is because of a fun little algorithm we created that I’ll go into detail later. What was the result? Unexpectedly, we were overwhelmed with responses! 875 in total. We received enough information from our users the second time around that we could plug right into the prioritization framework. This gave each dataset request a score as well as informed us as to how many people were asking for the same data in different words. Can you tell our readers a little bit about the prioritization framework? The prioritization framework was developed by the open data team as a way to assess and prioritize upcoming dataset releases. The algorithm was developed by Ryan Garnett, who manages the open data team as well as the geospatial competency center (Editor’s note: Ryan writes frequently for the Open Data Knowledge Center; you can read his articles here) in the form of a dynamic spreadsheet. The framework focuses on five outputs. Each output is given a specific score. The algorithm applies a weighting to all these scores and then outputs an overall ranking. The primary assessment metrics are:


Is the data in a database somewhere or is it in a spreadsheet on someone’s desktop?

Civic Issue

A dataset request that isn’t related to one of the five main civic issues would receive a lower score than one that does.


Who requested the dataset? Is it requested by council or a member of the public? Different requesters are given different scoring.


What will it be used for? Education? Media? Government city report? Is it for research? Will it be used to create a by-product, like an app?
So the framework lets us figure out where the priority lies for every request. The higher the number, the more we will focus in on getting that data from the appropriate division or unit. Do you think the scores will be made public anytime soon? They’re already public but they aren’t presently linked to the prioritization framework! We’ve got a few more things to do first. We’re reviewing all of our requests, separating one single request perhaps into two or more data requests, manually tagging them with topics or themes and separating them out by civic issue. We’re actively working on releasing that data to the public in a raw format. It’ll contain every single request with the requester listed. What is the benefit of releasing that data to the public? It’ll be our first actual open dataset from open data. Data about the data! Yes, exactly. This will give people the opportunity to take a look and play around with the data. We’re going to analyze the data and figure out how many requests there are based on tags. We’ll plug each one into the framework, then come up with a score that we then report back to the senior management team (SMT) or appropriate division. That we hope to release soon, publically, and ultimately this is how we want to report back. That seems fairly well-aligned with the mandate of open data, which includes transparency as well as data-informed decision making. That’s right. So what are your next steps as this seems like a fairly mammoth undertaking. We’re going to clean the data for the release. My personal next steps are to perform a little bit of an analysis so we can have a cool data story to report back with. I’m not a data scientist, so I’ve never worked with so many sources before, but I’m excited to venture off and try this. I want to have some sort of summary story that tells us the total number of requests, unique tags, and breakdown of civic issues. Ultimately we want users to understand how we took nearly a thousand requests and categorized them with tags and top civic issues. Thank you, Reham, for explaining the open data civic issues campaign and prioritization framework. We welcome reader comments and questions to Subscribe to our monthly newsletter to stay on top of articles and content like this.