UrbanToronto: Seeing the Future of the City with Data

This partnership is a great example, and arguably a model use case, of how open data is used in big cities. One of our key collaborators in this endeavor is UrbanToronto, a comprehensive resource for tracking new developments across the Greater Toronto Area.

This blog post delves into our successful partnership with UrbanToronto, highlighting how our open data fuels their innovative services and how, together, we are helping citizens and professionals alike see the future of our city through data. Written by UrbanToronto.


If you ever search for a new condo development, you’ll likely come across UrbanToronto. We track every large new project in the GTA (and beyond) in four different ways: a quantitative database page for the project; a discussion thread in our highly active discussion forum; a pin on our detailed map; a news story about the project written by one of our journalists.

All four of these information services rely heavily on open data from the City of Toronto. We transform this data into a standardized format to allow searching, filtering, and plotting, which fuels the rest of the services we offer. 

While our business is predominantly based on data today, it wasn’t always the case. Here is the story of how open data transformed UrbanToronto’s business. 

History of UrbanToronto

You know how there are train geeks, movie geeks, and history geeks—people who love to learn and talk about these topics in great detail?

Skyscraper geeks exist, too: people who love to talk about the newest high-rise developments in the city. But unlike most discussions about condos on TV or in the newspapers, skyscraper geeks don’t care so much about prices. Instead, they focus on the design of the building, the construction process, and the urban planning involved.

UrbanToronto began more than 20 years ago as a discussion forum for skyscraper geeks in Toronto. Much like the city we cover, our business has grown a lot in that time, too. 

Soon after the forum started taking off, we added a news component to the website. We cover breaking news in the development industry, as well as feature articles highlighting new technologies, new policies, and innovative builders, suppliers, and designers in the field. 

Our Data Origin Story 

As the popularity of our community grew, so did the needs of our forum members and journalists. We have thousands of threads for individual projects, and some have hundreds of pages of comments with thousands of posts. The posts include construction photos, but also images of the architectural plans and other data about the project. If you wanted to know how tall a building was, but the discussion was 300 posts deep, you would struggle to find the information.

Enter the UrbanToronto database, version 1.0. Especially popular projects got a dedicated page that listed their crucial information: some renderings, the height, the unit mix, and the developer and architect. All of this data came from City of Toronto planning documents. What’s more, we also started plotting these projects on a map, colour-coded by construction status. That way, you could see where the big projects in the city were going up.

Soon enough, this database grew to over a thousand projects. At this point, the database itself, and especially the map that was built on top of it, had grown a dedicated user base of their own. We realized it made sense to invest more into our map and database service, which involved hiring a bigger team, investing in new technologies, and expanding the scope and depth of our data. 

Today, UrbanToronto data is available both for free and through a premium subscription package called UTPro. We track over 5,000 projects across the Greater Golden Horseshoe, although thanks to the City of Toronto’s excellent open data products, our coverage in Toronto is the deepest and most accurate.

What Data Does UrbanToronto Track? 

Unlike other real estate data providers, we rely almost exclusively on publicly available documents. While we track a wide variety of sources, including building permits and the heritage registry, our main source of data is development applications: rezoning, site plan approvals, Official Plan amendments, and so on.

There is a vast wealth of information in these applications, and many people and organizations link to and track them. However, a single application can comprise 60 different documents, most of which are in PDF format. This lowers the digital legibility of the documents, which is where UrbanToronto comes in. Through a combination of manual and automated processes, we read every one of those PDFs and input the data into a standardized format, which makes filtering, sorting, and plotting the data much easier.
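The standardization step described above can be sketched as follows. This is a minimal illustration, not UrbanToronto’s actual pipeline: the field names, record shape, and normalization rules are all assumptions for the sake of the example.

```python
# Hypothetical sketch: values pulled from planning-document PDFs arrive in
# loosely formatted strings, and are normalized into one typed record so the
# data can be filtered, sorted, and plotted. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    height_m: float  # normalized to metres
    storeys: int
    status: str      # e.g. "proposed", "under construction"

def normalize(raw: dict) -> Project:
    """Coerce loosely formatted values from a parsed PDF into typed fields."""
    height = raw.get("height", "0").lower().replace("m", "").strip()
    return Project(
        name=raw["name"].strip(),
        height_m=float(height),
        storeys=int(raw.get("storeys", 0)),
        status=raw.get("status", "proposed").strip().lower(),
    )

projects = [
    normalize({"name": "Example Tower ", "height": "298.5 m",
               "storeys": "85", "status": "Proposed"}),
    normalize({"name": "Sample Condos", "height": "45 m",
               "storeys": "12", "status": "Under Construction"}),
]

# Once standardized, records support sorting and filtering directly.
tallest_first = sorted(projects, key=lambda p: p.height_m, reverse=True)
```

The payoff of a step like this is that every downstream service (search, filters, the map) can assume one consistent schema instead of re-parsing PDFs.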

Our data is used by three types of users: (1) those looking to buy or develop new properties, including developers, urban planners, land assemblers, and retail condo investors; (2) realtors, tradespeople, and suppliers looking for leads on new developments and projects under construction; and (3) thousands of enthusiasts who, for their own interests, consult our map and database to stay up to date with what’s going on in their neighbourhood.

The Future for UrbanToronto and Open Data

While we are constantly expanding our database with new projects (as well as updating existing projects), our focus has historically been on “large” developments—typically townhouses and above. As the City is changing policy to permit more infill development, our database will be expanding to include these smaller projects as well. We are also in the process of expanding the features of our map to include more functionality in terms of filtering and exporting the data, as well as new layers to supplement the investment decision processes. 

We’re looking forward to continuing to build our relationship with the City’s Open Data team, as we deepen our coverage of development in Toronto.

Powering open data with civic issues

Hey Reham! Let’s start by telling our readers a little bit about you and your role with Toronto Open Data.

I’m Reham Youssef. I’m the Marketing and Communications Lead for the open data team, and I’ve been with the City for over fifteen years, but in the last ten years I’ve really been focusing in on open data.

Tell us about the project you’re currently working on.

Sure. At open data, we always want to know: what do users want? In the past, we would accept dataset requests through email and Twitter on an ad-hoc basis. We didn’t really have a concise way of gathering data from the public on what they want when it comes to open data, and we’ve been trying to establish different criteria that will help us evaluate and prioritize dataset releases. The Mayor recently announced three new city directives. We took that list and identified two more: climate change, which is an important global issue right now, and poverty reduction, which we heard as a consistent theme from the civic tech community. This gave us a list of five total civic issues we wanted to prioritize dataset releases around: affordable housing, poverty reduction, fiscal responsibility, climate change, and mobility. We’re now considering the impact of these civic issues in determining what we release next.

Why is it important to know how to prioritize what datasets get released?

It’s just a better way to understand data requests, and we wanted to make sure that incoming data requests align as much as possible with these priorities. We can’t release everything at once. Unfortunately, we have limited resources and (wo)manpower, and we also have to work with 44 different City divisions. As you can imagine, there are numerous units within the individual divisions that are responsible for data. When we prioritize data, we link it to the city priorities, which narrows things down to 5 to 10 divisions; that makes it much easier for us to discuss with those divisions instead of all 44.
Oftentimes it’s tricky to ask for data when we don’t even know how or where it exists, let alone if it exists in the first place. So we decided to focus in on the most important things for the city. We want to establish a consensus on what users want, and specifically go after that data.

Great! So you started by developing a survey that would be filled out by users to inform what civic issues and datasets they wanted. How did that go?

Well, when we first went out the door with the initial civic issues survey, we unfortunately didn’t get what we were looking for the first time.

Why is that?

There were just too many questions, and we were asking for too much detail at the time. People didn’t understand how to respond. We really wanted people to tell us as much as possible, right down to the data attributes and exactly what they wanted. We started to see that while people know they want to solve problems like affordable housing, they don’t always necessarily know what data they need to address that problem.

So what did you do when you found that users weren’t filling out the civic issues survey as expected?

After a month, we took the survey down and switched things around. That’s how we got to the second questionnaire, where we simplified it down to three simple questions. The wording had to change for people to think differently, and asking people what they wanted and what they wanted to do with the data was the main driver. The reason we asked what they will do with the data is because of a fun little algorithm we created that I’ll go into in detail later.

What was the result?

Unexpectedly, we were overwhelmed with responses! 875 in total. We received enough information from our users the second time around that we could plug it right into the prioritization framework. This gave each dataset request a score, as well as telling us how many people were asking for the same data in different words.
Can you tell our readers a little bit about the prioritization framework?

The prioritization framework was developed by the open data team as a way to assess and prioritize upcoming dataset releases. The algorithm was developed by Ryan Garnett, who manages the open data team as well as the geospatial competency center (Editor’s note: Ryan writes frequently for the Open Data Knowledge Center; you can read his articles here), in the form of a dynamic spreadsheet. Each output is given a specific score. The algorithm applies a weighting to all these scores and then outputs an overall ranking. The primary assessment metrics are:

Source

Is the data in a database somewhere or is it in a spreadsheet on someone’s desktop?

Civic Issue

A dataset request that isn’t related to one of the five main civic issues receives a lower score than one that is.

Requester

Who requested the dataset? Was it requested by council or by a member of the public? Different requesters are given different scores.

Output

What will it be used for? Education? Media? Government city report? Is it for research? Will it be used to create a by-product, like an app?
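The weighted scoring described above could look something like the sketch below. To be clear, the real framework is a dynamic spreadsheet whose weights and category scores are not public; every number and category name here is an illustrative assumption.

```python
# Hypothetical sketch of the prioritization framework: each metric's
# categorical answer maps to a score, a weighting is applied to all the
# scores, and the sum gives the overall ranking. Weights and scores
# below are invented for illustration, not the City's actual values.

WEIGHTS = {"source": 0.20, "civic_issue": 0.35, "requester": 0.20, "output": 0.25}

SCORES = {
    "source": {"database": 1.0, "spreadsheet": 0.4},
    "civic_issue": {"priority_issue": 1.0, "other": 0.3},
    "requester": {"council": 1.0, "public": 0.7},
    "output": {"app": 1.0, "research": 0.8, "media": 0.6},
}

def rank_request(answers: dict) -> float:
    """Apply the weighting to each metric's score and sum into one ranking."""
    return sum(WEIGHTS[m] * SCORES[m][answers[m]] for m in WEIGHTS)

# A request from the public, tied to a priority civic issue, with the data
# already in a database and destined for an app, scores near the top.
score = rank_request({
    "source": "database",
    "civic_issue": "priority_issue",
    "requester": "public",
    "output": "app",
})
```

The higher the resulting number, the higher the request sits in the release queue.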
So the framework lets us figure out where the priority lies for every request. The higher the number, the more we will focus in on getting that data from the appropriate division or unit.

Do you think the scores will be made public anytime soon?

They’re already public, but they aren’t presently linked to the prioritization framework! We’ve got a few more things to do first. We’re reviewing all of our requests, separating one single request perhaps into two or more data requests, manually tagging them with topics or themes, and separating them out by civic issue. We’re actively working on releasing that data to the public in a raw format. It’ll contain every single request with the requester listed.

What is the benefit of releasing that data to the public?

It’ll be our first actual open dataset from open data.

Data about the data!

Yes, exactly. This will give people the opportunity to take a look and play around with the data. We’re going to analyze the data and figure out how many requests there are based on tags. We’ll plug each one into the framework, then come up with a score that we then report back to the senior management team (SMT) or appropriate division. We hope to release that soon, publicly, and ultimately this is how we want to report back.

That seems fairly well aligned with the mandate of open data, which includes transparency as well as data-informed decision making.

That’s right.

So what are your next steps? This seems like a fairly mammoth undertaking.

We’re going to clean the data for the release. My personal next steps are to perform a little bit of an analysis so we can have a cool data story to report back with. I’m not a data scientist, so I’ve never worked with so many sources before, but I’m excited to venture off and try this. I want to have some sort of summary story that tells us the total number of requests, unique tags, and the breakdown of civic issues.
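A summary of that kind, covering total requests, unique tags, and the breakdown by civic issue, could be sketched with a few lines of Python. The request records and field names below are illustrative only; the real dataset had not yet been released when this interview took place.

```python
# Hypothetical sketch of the summary analysis described above: count total
# requests, collect the unique tags, and break requests down by civic issue.
# The records and field names are invented for illustration.
from collections import Counter

requests = [
    {"tags": ["transit", "delays"], "civic_issue": "mobility"},
    {"tags": ["shelters"], "civic_issue": "poverty reduction"},
    {"tags": ["transit", "fares"], "civic_issue": "mobility"},
]

total_requests = len(requests)
unique_tags = {tag for r in requests for tag in r["tags"]}
issue_breakdown = Counter(r["civic_issue"] for r in requests)
```

Even this simple tally answers the three headline questions of the summary story, and `Counter.most_common()` would surface the dominant civic issues directly.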
Ultimately, we want users to understand how we took nearly a thousand requests and categorized them with tags and top civic issues.

Thank you, Reham, for explaining the open data civic issues campaign and prioritization framework. We welcome reader comments and questions at opendata@toronto.ca.

Civic Issues Initiative

In this four-part series, we introduce our readers to the Civic Issues campaign. This campaign highlights some of the most important socio-political issues impacting Toronto residents, including hot-button items like housing affordability and poverty reduction.
What do civic issues have to do with open data? It turns out, quite a bit. It’s important that open data releases reflect the concerns and interests of the city’s residents. Releasing in-demand open data is one way to increase community participation in civic tech, increase data literacy, and activate data-driven decision-making. Solving complex civic problems means ensuring there is a seat at the table for underrepresented voices. Join us as we take you through how we are improving the way open data is created and shared in the city to bring you more of the data you want, when you want it.

How do we currently acquire data requests?

Since the start of Toronto’s Open Data program in 2009, we’ve used many ways to determine what to publish, and when. Some of these ways include keeping tabs on formal and community requests through our e-mail inbox, public consultations, and requests from the media. Once we’re alerted to a dataset request, we connect with the appropriate division to find out if they have the data that’s being asked for, and assess how much effort is involved in acquiring it prior to publishing it. The Open Data team has a highly engaged following on Twitter, which has served as one of the primary ways in which the community can tell us about what they’re interested in. We also track current events and the media to establish the demand for a specific set of related information. We recently launched our monthly newsletter as well, The Open Data Update, which encourages our readers to contact us with requests for data.

Following a request, how is the data acquired?

City divisions like Transportation Services or Parks, Forestry and Recreation can periodically provide us with a ready-to-go real-time data feed, but not always! A lot of this data is subject to the technical limitations of the time at which it was collected, so much of it is buried. Even when we have access, the formats might be out of date, and there might be issues with consistency. We see these issues in many of the datasets currently hosted on open data. This means that the data may need to be cleaned up, undergo an extensive privacy review, and/or be digitized prior to release. It’s a lot of work. There are an estimated 9 petabytes of data in the City, and not all of it can be made open, due to privacy, licensing, or technical restrictions. So it’s essential that we prioritize our releases based on the value they provide to service provisioning.

How does all this relate to civic issues?

As a civic campaign, we’re obligated to demonstrate the socioeconomic value of open data in our reporting. Like other cities, we struggle with demonstrating the true social value of the data that we provide. Many times, we aren’t able to truly demonstrate just how impactful open data can be to a typical resident of the city. It can seem too technical or too bureaucratically inaccessible. How do we change this perspective and democratize access to open data?

Let’s consider some non-technical challenges. Often, the decision-makers in the room don’t represent the groups we most need to provide services to. These are groups with limited data literacy, limited mobility, or economic insecurity. Lived experience is often the best way to understand the unique experience of someone who may be under-housed or struggle with transit affordability. We need to make sure that we don’t overlook the importance of these communities, and so we want to prioritize the release of data that can positively influence change and provide opportunities for improvement. Evidence shows that decision-making models that involve affected communities and prioritize their needs are typically the most sustainable and scalable.

How do we understand social value?

Let’s pause for a moment and think about a common experience many residents have. Prior to the existence of smartphones, it was difficult to predict transit delays, and commuters had few options outside of waiting. Through access to historical transit data, a frustrated commuter was able to develop an app-based solution that predicts the arrival time of your bus with a high level of accuracy. This example, and countless others from our community, demonstrate the value of open data through case studies that are relatable to a diverse range of city residents.

As such, we have a responsibility to dismantle the barriers that contribute to the under-representation of marginalized communities in civic technology. So how do we truly engage a diverse audience? How do we ensure everyone gets a seat at the table? How do we ensure that we balance feelings with facts to create policies that benefit the residents of Toronto? Simple. We listen. In the interest of working within a data-driven government model, sometimes this will mean delving into the uncomfortable, and being honest and transparent with data that shows us where there’s room for improvement.

Open data is also about self-sufficiency. We want to reduce barriers to access. We want anyone who wants our data to be able to use it openly and transparently, whether they’re going to start a business or create a community campaign in support of a social issue they care about.