Blog article: How the City is Winning the War Against Lead Contamination in Drinking Water
Our next student guest blog was written by Abbass Sleiman, a third-year undergraduate student at the University of Toronto in the Mathematical Applications in Economics and Finance Specialist program. Abbass has experience as a Teaching Assistant in mathematics and wrote this paper for STA302H1 (Methods of Data Analysis) at the University of Toronto with Professor Rohan Alexander.
Background
Lead in drinking water isn’t something most of us think about when we turn on the tap – it certainly hadn’t crossed my mind until I read a CBC News article from 2014. The article, aptly titled “High lead levels found in some Toronto drinking water”, reported on an analysis of 15,000 water samples that homeowners provided to the city between 2008 and 2014 through the Residential Lead Testing Program. It claimed that 13% of Torontonian households exceeded Health Canada’s standards for lead exposure – an alarmingly high proportion when you consider the potential health risks associated with lead exposure.
Lead exposure can have serious health consequences, especially for children. It can cause damage to the brain and nervous system, slowed growth and development, learning and behavior problems, and decreased IQ. For adults, long-term exposure can lead to cardiovascular issues, kidney damage, and reproductive problems. Because lead can’t be seen, smelled, or tasted, testing our water becomes all the more crucial. So, how did the City of Toronto address this issue? Back in 2011, Toronto’s City Council took action against this silent threat by approving a strategy to reduce lead in our drinking water. Fast forward to 2014, and the city began adding phosphate to the water treatment process, which forms a protective coating in the pipes and helps keep lead from leaching into the water we drink every day.
But did it work? To find out, I used the programming language R to analyze a dataset of 12,810 water samples from Toronto homes, collected between 2014 and 2024. My goal was to determine whether the phosphate treatment reduced dangerous lead levels in our water, how effective it was at doing so, and whether certain areas in Toronto are more at risk than others.
Understanding Lead Levels
To understand the impact of lead concentrations on our drinking water, we first need to understand how these levels are measured. Lead concentrations are generally measured in parts per billion (ppb): 1 ppb means one part lead for every billion parts water, which for drinking water is equivalent to 1 microgram of lead per liter.
Back in 2014, Health Canada’s safety threshold for lead in drinking water was 10 ppb, so the CBC News article’s claim that 13% of Toronto households were facing high lead levels essentially meant that 13% of households provided water samples with a lead concentration of at least 10 ppb. In 2019, however, Health Canada tightened the standard, lowering it to just 5 ppb. So, when we talk about ‘high lead concentrations’, we’ll specify whether we’re referencing the older threshold or the updated one.
The Data
The data used in this analysis comes from Open Data Toronto, under “Non Regulated Lead Sample”. Published by Toronto Water, the dataset contains results from Toronto’s Residential Lead Testing Program – the same source used in the CBC News article. It records lead concentrations for individual houses, based on water samples provided by the households themselves. The data is refreshed daily, and the particular dataset used was up-to-date as of January 22, 2024.
The raw dataset covers 12,810 water samples. For each sample it records the lead concentration in parts per million (ppm), where 1 ppm is equivalent to 1000 ppb, the date the sample was collected, and the household’s partial postal code (only the first three characters of the resident’s postal code, for privacy reasons).
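The unit conversion is simple enough to show directly. The original analysis was carried out in R; the snippet below is only a minimal Python sketch, and the function name `ppm_to_ppb` and the example values are illustrative assumptions rather than anything taken from the dataset.

```python
PPB_PER_PPM = 1000  # 1 ppm is equivalent to 1000 ppb


def ppm_to_ppb(lead_ppm: float) -> float:
    """Convert a lead concentration from parts per million to parts per billion."""
    return lead_ppm * PPB_PER_PPM


# Illustrative values only (not rows from the Toronto dataset).
for lead_ppm in (0.005, 0.010):
    print(f"{lead_ppm} ppm = {ppm_to_ppb(lead_ppm):g} ppb")
```

Under this conversion, the 5 ppb and 10 ppb limits correspond to 0.005 ppm and 0.010 ppm in the raw data's units.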
Cleaning the Data
Given that I was interested in the after-effects of the phosphate addition to the drinking water treatment process in 2014, all entries from 2014 were eliminated to ensure the analysis focused on water samples taken after the policy was put into effect. Additionally, entries with missing values were removed, and the column for lead concentrations was converted to ppb for consistency.
Extreme outliers were also excluded to avoid skewing the analysis. For this study, I defined an outlier as any lead concentration of 100 ppb or higher, that is, at least 20 times Health Canada’s standard of 5 ppb. A value that extreme is most plausibly the result of an error in the household’s sample collection process. This left us with a cleaned dataset of 9,302 samples ready for analysis. A sample of this cleaned data can be seen in Table 1 below, and all observations are visualized in the scatter plot in Figure 1.
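The cleaning rules described above can be sketched in a few lines. This is a hedged Python illustration, not the author's R code: the record layout, the function name `clean`, and the six example rows are all hypothetical.

```python
from datetime import date

OUTLIER_CUTOFF_PPB = 100.0  # values at or above this are treated as outliers

# Hypothetical records: (sample date, partial postal code, lead in ppm).
raw = [
    (date(2014, 6, 3), "M4C", 0.0021),   # dropped: sampled in 2014
    (date(2016, 3, 9), "M6H", None),     # dropped: missing concentration
    (date(2017, 8, 1), "M2L", 0.1500),   # dropped: 150 ppb is an outlier
    (date(2018, 5, 20), "M8V", 0.0004),  # kept
    (date(2021, 11, 2), None, 0.0062),   # dropped: missing postal code
    (date(2022, 1, 15), "M6G", 0.0062),  # kept
]


def clean(records):
    """Apply the article's three cleaning rules and convert ppm to ppb."""
    cleaned = []
    for sampled_on, postal, lead_ppm in records:
        # Rule 1: remove entries with missing values.
        if sampled_on is None or postal is None or lead_ppm is None:
            continue
        # Rule 2: drop 2014 (and anything earlier, if present).
        if sampled_on.year <= 2014:
            continue
        lead_ppb = lead_ppm * 1000  # 1 ppm = 1000 ppb
        # Rule 3: exclude extreme outliers (100 ppb or higher).
        if lead_ppb >= OUTLIER_CUTOFF_PPB:
            continue
        cleaned.append((sampled_on, postal, lead_ppb))
    return cleaned


cleaned = clean(raw)
print(f"{len(cleaned)} of {len(raw)} records kept")
```

The order of the checks does not change which rows survive; it only determines which rule is credited with dropping a row that fails more than one.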
Summary Statistics of the Data
Before diving further into the analysis, it was important to examine the structure of the dataset, particularly the number of observations per year, to ensure that any later conclusions would rest on a sufficient amount of data. Table 2 shows that we have far fewer data points for 2020, likely as a result of the COVID-19 pandemic, and only a single observation for 2024, meaning that any figure for that particular year carries very little information and should be taken with a grain of salt.
Then, to get a better sense of the data as a whole, both the mean and standard deviation of lead concentrations were calculated and are presented in Table 3. The mean lead concentration of all samples was approximately 1.04 ppb, well below both the old and new safety limits. However, the standard deviation of the lead concentrations, essentially a measure of how spread out the lead concentrations are from the mean, was a rather large 4.05 ppb, indicating considerable variability.
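A standard deviation several times larger than the mean is typical of data where most readings are near zero and a few are much higher. The Python sketch below illustrates this with made-up values (they are not from the dataset); it assumes the sample standard deviation (the n - 1 denominator, which is also what R's `sd()` uses).

```python
import statistics

# Illustrative lead concentrations in ppb: mostly low, with one high reading.
lead_ppb = [0.2, 0.3, 0.1, 0.5, 0.4, 12.0, 0.2, 0.3]

mean_ppb = statistics.mean(lead_ppb)
sd_ppb = statistics.stdev(lead_ppb)  # sample standard deviation (n - 1)

print(f"mean = {mean_ppb:.2f} ppb, sd = {sd_ppb:.2f} ppb")
```

A single high reading pulls the mean up only modestly but inflates the standard deviation considerably, which is why the mean alone says little about how many households exceed a limit.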
Examining the Portion of Households Exceeding the Lead Concentration Limit
I was mainly interested in whether the portion of households exceeding the lead concentration limit of 10 ppb had changed from the past portion of 13%. Additionally, it was important to examine the portion of households exceeding the newer limit of 5 ppb. To do so, I decided to look at the distribution of households across four lead concentration categories (<5 ppb, 5-10 ppb, 10-20 ppb, >20 ppb), all of which are shown in Table 4.
Using this information, we can see that the vast majority of water samples (98.73%) contained a lead concentration below the previous limit of 10 ppb, leaving only about 1.27% at or above it. That is a striking improvement on the original figures of 87% and 13%. Moreover, approximately 97.14% of water samples are below the new limit of 5 ppb. Overall, this provides fairly compelling evidence in favor of adding phosphate to the water treatment process.
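The four-category breakdown can be sketched as follows. This is a hedged Python illustration: the article does not say which category a reading of exactly 5, 10, or 20 ppb falls into, so the boundary handling here (each boundary value goes to the higher category) is an assumption, as are the function name `categorize` and the example values.

```python
from collections import Counter


def categorize(lead_ppb: float) -> str:
    """Assign a reading to one of the four categories.

    Assumption: boundary values (5, 10, 20) fall into the higher category.
    """
    if lead_ppb < 5:
        return "<5 ppb"
    if lead_ppb < 10:
        return "5-10 ppb"
    if lead_ppb < 20:
        return "10-20 ppb"
    return ">20 ppb"  # under the assumption above, this includes exactly 20


# Illustrative readings in ppb (not rows from the dataset).
lead_ppb_values = [0.3, 1.2, 4.9, 5.0, 7.5, 10.0, 22.0, 0.8, 0.1, 2.4]

counts = Counter(categorize(v) for v in lead_ppb_values)
shares = {label: 100 * n / len(lead_ppb_values) for label, n in counts.items()}

for label in ("<5 ppb", "5-10 ppb", "10-20 ppb", ">20 ppb"):
    print(f"{label:>9}: {shares.get(label, 0.0):.1f}%")
```

The share below a given limit is then the sum of the categories under it; for example, the share below 10 ppb is the `<5 ppb` share plus the `5-10 ppb` share.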
Investigating the Relationship Between Time and Lead Concentration
But how quickly did the lead concentrations improve? Was the improvement immediate or gradual? These are key questions to consider when evaluating the effectiveness of the phosphate treatment. To answer them, I created a scatter plot (Figure 2) showing the mean lead concentration for each year.
We can see a clear and consistent decline over time, with 2015 featuring the highest mean lead concentration of 1.53 ppb, and 2024 with the lowest at a mere 0.15 ppb (though the 2024 data point is based on only one observation and should be viewed cautiously). To gain a deeper insight, I examined how the proportion of households with water samples exceeding lead concentrations of 10 ppb and 5 ppb changed over time in Figures 3 and 4, respectively (note that 2024 was omitted given that there is only one observation).
Though these figures show a slight rise from 2019 to 2022, we can still see a fairly consistent overall decline over time in the proportion of households exceeding both the 10 ppb and 5 ppb limits. Moreover, the proportion of households with high lead concentrations in every year remains well below the 13% found in 2014, indicating that the addition of phosphate was indeed effective in improving water quality.
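The per-year quantities behind Figures 2 to 4 (the yearly mean and the share of samples above each limit) can be computed with a simple grouping step. The Python sketch below uses hypothetical `(year, ppb)` pairs and a hypothetical helper `yearly_summary`; it treats "exceeding" as strictly greater than the limit, which is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Illustrative (year, lead in ppb) pairs, not rows from the dataset.
samples = [
    (2015, 0.4), (2015, 11.0), (2015, 6.0),
    (2016, 0.3), (2016, 5.5),
    (2017, 0.2), (2017, 0.1),
]


def yearly_summary(samples, limits=(5.0, 10.0)):
    """Return per-year sample count, mean, and share of samples above each limit."""
    by_year = defaultdict(list)
    for year, lead_ppb in samples:
        by_year[year].append(lead_ppb)

    summary = {}
    for year in sorted(by_year):
        values = by_year[year]
        row = {"n": len(values), "mean_ppb": mean(values)}
        for limit in limits:
            # Assumption: "exceeding" means strictly greater than the limit.
            row[f"share_over_{limit:g}"] = sum(v > limit for v in values) / len(values)
        summary[year] = row
    return summary


summary = yearly_summary(samples)
for year, row in summary.items():
    print(year, row)
```

Reporting `n` alongside each year makes thin years, such as a year with a single observation, easy to spot before reading anything into their means.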
Exploring the Relationship Between Location and Lead Concentration
Lastly, I wanted to uncover whether certain locations were more prone to having contaminated water than others. Given that the dataset had dozens of unique partial postal codes, I grouped them by their first two characters (e.g. M2L and M2K would both fall under the “M2-” group) to create a more readable figure. Doing so allowed me to calculate the mean lead concentration for households in each group, showcased in Figure 5.
While we do see some variation in lead concentrations between locations, with the highest mean concentration being 1.27 ppb in the “M6-” group and the lowest being 0.29 ppb in the “M8-” group, the two differ by less than 1 ppb (0.98 ppb). This suggests that location has little effect on a household’s lead concentration, although group means alone cannot rule out localized pockets of high readings.
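The grouping by postal code prefix can be sketched like this. It is a minimal Python illustration with made-up readings; the helper name `mean_by_prefix` is hypothetical, and only the "first two characters plus a dash" labeling comes from the article.

```python
from collections import defaultdict
from statistics import mean

# Illustrative (partial postal code, lead in ppb) pairs, not rows from the dataset.
samples = [
    ("M2L", 0.4), ("M2K", 0.6),
    ("M6H", 1.5), ("M6G", 0.9),
    ("M8V", 0.3),
]


def mean_by_prefix(samples):
    """Group samples by the first two postal code characters and average each group."""
    groups = defaultdict(list)
    for postal, lead_ppb in samples:
        groups[postal[:2] + "-"].append(lead_ppb)
    return {prefix: mean(values) for prefix, values in sorted(groups.items())}


group_means = mean_by_prefix(samples)
spread = max(group_means.values()) - min(group_means.values())

for prefix, value in group_means.items():
    print(f"{prefix} mean = {value:.2f} ppb")
print(f"range of group means = {spread:.2f} ppb")
```

The same grouping could be reused to compare the share of samples above 5 ppb or 10 ppb per group, which would speak more directly to the likelihood of a high reading than the mean does.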
Summary
Consuming water contaminated with lead can cause serious health issues, including brain damage and slowed growth, especially in children. To tackle this problem, Toronto started adding phosphate to its drinking water treatment process in 2014 to reduce lead contamination. After analyzing water samples collected after 2014, it’s clear that mean lead concentrations have decreased, and fewer households are exceeding the safe lead exposure limit. There’s also little evidence to suggest that where the water samples were taken has a significant impact on lead levels.
Next Steps
Given that the data was based on samples collected by the residents of each household, future analysis could be improved by incorporating data gathered under more controlled conditions by qualified individuals.
Another improvement could be the use of time-series data tracking changes in lead concentrations over time at the same source. This could provide deeper, and possibly more accurate, insights into the effects of various water treatments on water quality and safety.