How to run the application
Accessing the web application online
Just click on the provided link to access the web application hosted on shinyapps.io from your browser. It is recommended to use a recent version of Google Chrome.
Hosting the web application locally
Clone the project repository. Download R and RStudio if you don't have them installed on your machine, then open app.R with RStudio. Install each R library used in the project by executing the command install.packages("library") in the RStudio console, replacing library with the library name. Run the application by clicking Run App at the top right of the main RStudio panel. Access it from a browser at the local machine address on port 6676 (http://127.0.0.1:6676/).
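The local setup steps above could also be scripted from the R console. The following is a minimal sketch, not the project's own tooling: missing_packages is a hypothetical helper, the required vector is a placeholder that should be replaced with the libraries the project actually uses, and the working directory is assumed to be the cloned repository containing app.R.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# Placeholder list; replace it with the libraries the project actually uses.
required <- c("shiny")

if (interactive()) {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) install.packages(to_install)
  # Serve the app in the current directory on the port mentioned above.
  shiny::runApp(".", port = 6676, launch.browser = TRUE)
}
```

Installing only the missing packages avoids re-downloading libraries that are already present on the machine.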
Getting started and main tab
The application starts in full screen. Click the menu icon at the top, next to the title, to open the sidebar, where you can select the State, the County and the year. The main panel on the right will then reload the data and redraw the graphs interactively.
The main tab is divided into two boxes. The one on the left shows AQI (Air Quality Index) levels. It consists of a pie chart showing the percentage of days (estimated when some data are missing) at each AQI level in the selected year for the selected County. Below the pie chart, a bar chart and a table both show the number of days in the year at each level.
The right box shows data on the detected pollutants. The first tab in this box shows a pie chart for each pollutant with the percentage of days on which that pollutant was the main pollutant. The second tab shows a bar chart with the number of days on which each pollutant was the main pollutant in that year. Again, the table at the bottom shows the same counts as the bar chart in tabular form.
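The percentages shown in the AQI pie chart can be illustrated with a small sketch. It assumes one plausible reading of the description above, namely that each percentage is computed relative to the days that actually have an AQI value. The day counts are made up and aqi_percentages is a hypothetical helper, not the app's own code.

```r
# Made-up day counts per AQI level for one county and year.
aqi_days <- c(
  Good = 200, Moderate = 120, `Unhealthy for Sensitive Groups` = 30,
  Unhealthy = 10, `Very Unhealthy` = 0, Hazardous = 0
)

# Percentage of days per level, relative to the days with a recorded AQI.
aqi_percentages <- function(days) {
  total <- sum(days)
  if (total == 0) {
    # No recorded days at all: nothing can be estimated.
    return(setNames(rep(NA_real_, length(days)), names(days)))
  }
  round(100 * days / total, 1)
}

pct <- aqi_percentages(aqi_days)
print(pct)
```

Dividing by the number of recorded days rather than by 365 keeps the slices summing to roughly 100% even when some days of the year have no measurement.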
From the sidebar you can also access other tabs with different functionalities and plots.
Second tab: County Trends
The second panel shows AQI and pollutants time series.
The left box contains various inputs for interacting with the plots. You can choose the color of the plot background, choose the county from an alphabetically ordered list of (County - State) pairs, and select the range of years you want to focus on.
By clicking on the settings button you can also change the grid and text colors in the plots to black, in case the background color is too bright. In addition, you can change the colors of the pollutants in the second plot.
The first tab contains a plot of AQI statistics over time. The second tab contains a time series plot of the percentage of days on which each pollutant was the main pollutant, along with a table of day counts.
The third tab is a map showing the location of the selected county, together with all the counties in the US, which are highlighted in white when you hover over them.
Third tab: Compare Counties
The third panel is for comparing different counties. You can choose 3 counties among all the counties by searching in the input lists. The small map shows the location of these counties. By clicking on the settings button you can change the AQI statistic or the pollutant that you want to focus your analysis on.
There are 3 subtabs in this panel. The first is a time series comparison of an AQI statistic. The second is a time series of the percentage of days on which a specific pollutant was the main pollutant.
The third one focuses on a specific year, which can be selected with the slider in the sidebar, and uses a bar chart to compare the percentages of all the pollutants for the 3 counties at the same time.
Data, libraries and implementation
United States Environmental Protection Agency
United States Counties shape in GeoJSON
The dataset has missing data for some years and counties. This is handled by warning the user with an alert message whenever they select a county and year with missing data.
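The missing-data check described above could look roughly like the following sketch. The table, its column names (State, County, Year) and the helpers has_data and missing_data_message are all assumptions made for illustration. The app itself shows the message as an alert, which is not reproduced here.

```r
# Made-up slice of an annual AQI table; the column names are assumptions.
aqi <- data.frame(
  State  = c("Illinois", "Illinois", "Illinois"),
  County = c("Cook", "Cook", "Lake"),
  Year   = c(1980, 1981, 1981),
  stringsAsFactors = FALSE
)

# TRUE if at least one row exists for the given state, county and year.
has_data <- function(df, state, county, year) {
  any(df$State == state & df$County == county & df$Year == year)
}

# Returns NULL when data exist, otherwise the text to show in the alert.
missing_data_message <- function(df, state, county, year) {
  if (has_data(df, state, county, year)) return(NULL)
  sprintf("No data available for %s, %s in %d.", county, state, as.integer(year))
}

print(missing_data_message(aqi, "Illinois", "Lake", 1980))
```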
Used R libraries
This is the list of R libraries used for this project:
Problems during the development
Missing data handling: A big problem was the order in which the reactive queue is executed. When I changed the state, all the reactive expressions depending on the state input were queued, and the highest-priority observeEvent fired first. This, in turn, changed the counties list and the current county, but all the other reactive expressions still ran with the county input from before the observeEvent changed it. This caused problems in signaling missing data for the selected county and year. I solved it by isolating the State input and the current Reactive Value in the reactive expression responsible for signaling the missing data error.
White margins in ggplot pie charts: I spent a lot of time trying to figure out how to eliminate or color the white margins in the ggplot2 pie chart plots. When you change the background to a color other than white, those margins become visible. My conclusion is that either it's a bug in the library or I just couldn't figure out a way to eliminate them.
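The isolation fix described above can be sketched as a small standalone Shiny app. This is an illustration under assumptions, not the project's code: the input ids (state, county, year), the lookup tables and the use of showNotification as the alert are all stand-ins. The key point is that the missing-data observer reads the state through isolate(), so it re-runs only when the county or the year changes.

```r
library(shiny)

# Made-up lookup tables standing in for the real dataset.
counties_by_state <- list(Illinois = c("Cook", "Lake"), California = c("Los Angeles"))
available <- data.frame(
  State  = c("Illinois", "Illinois", "California"),
  County = c("Cook", "Lake", "Los Angeles"),
  Year   = c(1980, 1990, 1980),
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  selectInput("state", "State", names(counties_by_state)),
  selectInput("county", "County", counties_by_state[[1]]),
  sliderInput("year", "Year", min = 1980, max = 1990, value = 1980, step = 10, sep = "")
)

server <- function(input, output, session) {
  # Fires first when the state changes and replaces the county list.
  observeEvent(input$state, {
    choices <- counties_by_state[[input$state]]
    updateSelectInput(session, "county", choices = choices, selected = choices[1])
  }, priority = 10)

  # Takes a dependency on county and year only. The state is read with
  # isolate(), so a state change alone does not re-run this check while
  # the county input still holds its old value.
  observe({
    county <- input$county
    year <- input$year
    state <- isolate(input$state)
    found <- any(available$State == state &
                 available$County == county &
                 available$Year == year)
    if (!found) {
      showNotification(sprintf("No data for %s, %s in %s.", county, state, year),
                       type = "warning")
    }
  })
}

if (interactive()) shinyApp(ui, server)
```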
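One possible explanation, offered here as an assumption and not verified against this app, is that pie charts built with coord_polar keep a fixed aspect ratio, so the plot does not fill the whole graphics device and the device's own white background shows around it. Under that assumption, matching the device background to the plot background might hide the margins. The data and the color below are made up.

```r
library(ggplot2)

bg <- "grey20"  # hypothetical background color chosen by the user
days <- data.frame(level = c("Good", "Moderate", "Unhealthy"), n = c(200, 120, 45))

# Pie chart with the plot and panel backgrounds filled with the chosen color.
p <- ggplot(days, aes(x = "", y = n, fill = level)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  theme(
    plot.background  = element_rect(fill = bg, colour = NA),
    panel.background = element_rect(fill = bg, colour = NA)
  )

# In a Shiny server the device background would also have to match, e.g.:
# output$pie <- renderPlot(p, bg = bg)
```

The commented renderPlot call relies on extra arguments being forwarded to the underlying graphics device. Whether this removes the margins in this particular app is untested.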
Counties with highest population comparison
An interesting comparison is between counties with a very high population. The county with the highest population in the United States is Los Angeles County in California. Since the city of Los Angeles is also known for its constant traffic and lack of public transportation, I expected it to have very bad air. The second is Cook County (Chicago area) and the third is Harris County (Houston area). This ranking is not surprising and follows the same order as the cities by population, except that New York City is missing, because the counties that make up New York City are very small.
What I expected was very bad air quality, which turned out to be true. What I did not expect was to see a decreasing trend. Apparently, all the big cities have been improving their air quality over the past decades. Los Angeles is clearly worse than the other two counties. The counties are also very far from each other, which suggests that the bad air quality is caused by the big cities themselves and not by the surrounding area.
Main pollutant over time
We can also compare the number of days as main pollutant for these 3 big counties. As you can see from the image below, there are some differences, but overall the counties are similar. In 1980 the main pollutants were Ozone, NO2, SO2 and CO. If we look at the same counties in 2018, the differences between them are similarly small. What stands out when comparing 1980 with 2018 is that the main pollutants have changed a lot. CO and SO2 are no longer the problem. The pollutants now on the rise are PM2.5 and Ozone.
Illinois and counties around Chicago
To better demonstrate that the bad air quality is caused by human activity, we can compare Cook County with two bordering counties, Lake and Will, two much emptier and greener counties in Illinois.
Both have much better air quality than Cook. Lake is slightly better than Will, probably because of its proximity to the lake.
New York City pollutants trend
Looking at the New York City counties (Manhattan, which is called New York County, the Bronx, Queens, ...), we can notice a difference with respect to the other big cities' counties. PM2.5, which together with Ozone is the new main pollutant in the air, has been decreasing in New York City over the past 10 years. This does not happen in Los Angeles, Cook and Harris.
Another thing to notice in the New York City counties is that in the early years Manhattan's air quality was much worse than that of the other two main counties. Around 1995 it aligned with the rest of the counties, suggesting that actions were taken to improve the air quality, maybe by moving some industries outside or limiting vehicle traffic within Manhattan.