
Just Breathe, Visualization and Visual Analytics

About


This is the first project for the CS424 Visualization and Visual Analytics class at UIC. It consists of various visualizations and interactive plots in a web application built with the Shiny library for R. The visualizations are based on the Air Quality dataset by County in the United States, focusing only on the yearly statistics.

Go to Github Repository

Access the Shiny application

How to run the application

Accessing the web application online

Just click on the provided link to access the web application hosted on shinyapps.io from your browser. It is recommended to use a recent version of Google Chrome.

Hosting the web application locally

Clone the project repository and open app.R with RStudio (download and install R and RStudio first if they are not already on your machine). Install all the R libraries used in the project by running install.packages("library") in the RStudio console, replacing library with each package name. Run the application by clicking Run App at the top right of the main RStudio panel, then open it in a browser at the local machine address on port 6676 (http://127.0.0.1:6676/).
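The library installation and launch steps above can also be scripted from the RStudio console. The sketch below is one possible way to do it, not a script shipped with the project: it assumes the working directory is the cloned repository folder containing app.R, the helper name missing_packages is hypothetical, and the install and launch steps only run in an interactive session, so sourcing the file from a script never blocks.

```r
# Libraries listed in the "Used R libraries" section of this page.
# Note: rgdal was retired from CRAN in 2023, so installing it may fail
# on recent R versions.
required_pkgs <- c(
  "shiny", "shinydashboard", "ggplot2", "scales", "shinythemes",
  "dashboardthemes", "ggthemes", "shinyalert", "leaflet", "rgdal",
  "geojson", "geojsonio", "colourpicker", "shinyWidgets"
)

# Hypothetical helper: return only the packages that are not installed yet.
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# Install what is missing, then start the app on port 6676.
# Assumption: the working directory is the repository folder with app.R.
if (interactive()) {
  to_install <- missing_packages(required_pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  shiny::runApp(".", port = 6676)
}
```

Passing port = 6676 explicitly keeps the address stable; without it, runApp() may pick a random port unless the app itself sets one.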

Functionalities

Getting started and main tab

The application starts in full screen. Click the menu icon at the top, next to the title, to open the sidebar, where you can select the State, the County and the year; the main panel on the right then reloads the data and redraws the graphs interactively.
The main tab is divided into two boxes. The left box shows AQI (Air Quality Index) levels: a pie chart with the percentage of days (sometimes estimated when data are missing) at each AQI level in the selected year for the selected county, followed by a bar chart and a table that both show the number of days at each level.
The right box shows data on the detected pollutants. Its first tab shows a pie chart for each pollutant with the percentage of days on which that pollutant was the main pollutant. The second tab shows a bar chart with the number of days each pollutant was the main pollutant in that year, and the table at the bottom shows the same counts in tabular form.
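To illustrate the AQI pie chart in the left box, its percentages can be derived from the yearly day counts per AQI level. The sketch below is not necessarily how app.R computes them: it assumes one county-year's counts arrive as a numeric vector ordered like the EPA AQI categories, and the function names aqi_shares and plot_aqi_pie are hypothetical.

```r
# EPA AQI categories, from best to worst.
aqi_levels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
                "Unhealthy", "Very Unhealthy", "Hazardous")

# Hypothetical helper: turn six day counts (one per AQI level, same order as
# aqi_levels) into a data frame with the percentage of days at each level.
aqi_shares <- function(days) {
  stopifnot(length(days) == length(aqi_levels), sum(days) > 0)
  data.frame(
    level   = factor(aqi_levels, levels = aqi_levels),
    days    = days,
    percent = 100 * days / sum(days)
  )
}

# A ggplot2 pie chart is a single stacked bar drawn in polar coordinates
# (ggplot2 is only needed when this function is called).
plot_aqi_pie <- function(shares) {
  ggplot2::ggplot(shares, ggplot2::aes(x = "", y = percent, fill = level)) +
    ggplot2::geom_col(width = 1) +
    ggplot2::coord_polar(theta = "y") +
    ggplot2::theme_void()
}

shares <- aqi_shares(c(200, 120, 30, 12, 3, 0))
round(shares$percent, 1)
```

The same shape of data frame also feeds the bar chart and the table, which only need the days column instead of percent.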

From the sidebar you can also access other tabs with different functionalities and plots.

Second tab: County Trends

The second panel shows AQI and pollutant time series. The left box contains various inputs for interacting with the plots: you can choose the plot background color, choose the county from an alphabetically ordered list of (County - State) pairs, and select the range of years you want to focus on.
By clicking the settings button you can also switch the grid and text colors in the plots to black, in case the background color is too bright. You can also change the pollutant colors used in the second plot.

The plot in the first tab shows AQI statistics over time. The second tab contains a time series plot of the percentage of days on which each pollutant was the main pollutant, and a table with the day counts.
The third tab is a map showing the location of the selected county together with all the counties in the US; each county is highlighted in white when you hover over it.
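The hover effect on the county map can be obtained with leaflet's highlightOptions(). The sketch below is a hedged example, not the app's actual code: it assumes the county shapes come from a GeoJSON file whose features have NAME and STATE properties (STATE as a two-digit FIPS code), assumes a geojsonio version that still supports what = "sp", and the function name county_map and its arguments are hypothetical.

```r
# Hypothetical helper: draw every US county, turn the hovered county white,
# and outline the selected county in red.
# Assumption: each GeoJSON feature has NAME and STATE (FIPS code) properties.
county_map <- function(geojson_path, county_name, state_code) {
  counties <- geojsonio::geojson_read(geojson_path, what = "sp")
  selected <- counties[counties$NAME == county_name &
                         counties$STATE == state_code, ]

  map <- leaflet::leaflet(counties)
  map <- leaflet::addTiles(map)
  # All counties: thin outlines, highlighted in white on hover.
  map <- leaflet::addPolygons(
    map,
    weight = 1, color = "#555555", fillOpacity = 0.2,
    label = ~NAME,
    highlightOptions = leaflet::highlightOptions(
      fillColor = "white", fillOpacity = 0.9, bringToFront = TRUE
    )
  )
  # Selected county: thick red outline, no fill.
  leaflet::addPolygons(map, data = selected,
                       color = "red", weight = 3, fill = FALSE)
}
```

In a Shiny app the returned map would typically be wrapped in leaflet::renderLeaflet() and shown with leaflet::leafletOutput().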

Third tab: Compare Counties

The third panel is about comparing different counties. You can choose 3 counties by searching in the input lists, and a small map shows their locations. By clicking the settings button you can change the AQI statistic or the pollutant you want to focus your analysis on.
There are 3 subtabs in this panel. The first is a time series comparison of an AQI statistic. The second is a time series comparison of the percentage of days on which a specific pollutant was the main pollutant.
The third focuses on a specific year, selected with the slider in the sidebar, and compares the percentages of all the pollutants for the 3 counties at the same time with a bar chart.
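The year comparison in the third subtab essentially reshapes one row per county into one row per county and pollutant, then draws dodged bars. The sketch below assumes column names in the style of the EPA annual AQI by county files (County, Year, Days with AQI, Days CO, ...) read with check.names = FALSE; the helper names pollutant_shares and plot_pollutant_comparison are hypothetical.

```r
# Assumed EPA-style columns: days on which each pollutant was the main one.
pollutant_cols <- c("Days CO", "Days NO2", "Days Ozone",
                    "Days SO2", "Days PM2.5", "Days PM10")

# Hypothetical helper: long data frame with one row per county and pollutant,
# giving the percentage of AQI days on which that pollutant was the main one.
# Returns NULL when no county has data for the requested year.
pollutant_shares <- function(df, year) {
  rows <- df[df$Year == year, , drop = FALSE]
  do.call(rbind, lapply(seq_len(nrow(rows)), function(i) {
    days <- unname(unlist(rows[i, pollutant_cols]))
    data.frame(
      county    = rows$County[i],
      pollutant = sub("^Days ", "", pollutant_cols),
      percent   = 100 * days / rows[["Days with AQI"]][i]
    )
  }))
}

# Grouped (dodged) bars: one group per pollutant, one bar per county.
plot_pollutant_comparison <- function(shares) {
  ggplot2::ggplot(shares,
                  ggplot2::aes(x = pollutant, y = percent, fill = county)) +
    ggplot2::geom_col(position = "dodge")
}
```

Keeping the data in long form lets ggplot2 map the county to the fill colour directly, which is what makes the side-by-side comparison possible with a single geom_col() layer.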


Data, libraries and implementation

Data

United States Environmental Protection Agency
United States Counties shape in GeoJSON

Missing data

The dataset has missing data for some specific years and counties. This is handled by warning the user with an alert message whenever such data are selected.
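A minimal sketch of such a check, assuming the yearly data sit in a data frame with State, County and Year columns (as in the EPA annual files). The names has_aqi_data and warn_if_missing are hypothetical; shinyalert::shinyalert() is used only to mirror the alert described above, and it is called only when data are actually missing.

```r
# Hypothetical helper: TRUE if the data frame has a row for this county-year.
has_aqi_data <- function(df, state, county, year) {
  any(df$State == state & df$County == county & df$Year == year,
      na.rm = TRUE)
}

# Hypothetical wrapper: show a warning pop-up when the selection has no data
# (the pop-up only works inside a running Shiny session). Returns invisibly
# whether data were found.
warn_if_missing <- function(df, state, county, year) {
  available <- has_aqi_data(df, state, county, year)
  if (!available) {
    shinyalert::shinyalert(
      title = "Missing data",
      text  = paste0("No data for ", county, ", ", state, " in ", year, "."),
      type  = "warning"
    )
  }
  invisible(available)
}
```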

Used R libraries

This is the list of R libraries used for this project:
  • shiny
  • shinydashboard
  • ggplot2
  • scales
  • shinythemes
  • dashboardthemes
  • ggthemes
  • shinyalert
  • leaflet
  • rgdal
  • geojson
  • geojsonio
  • colourpicker
  • shinyWidgets

Problems during the development

Missing data handling: a big problem was the execution order of the reactivity queue. When I changed the state, all the reactive expressions depending on the state input were queued. The highest-priority observeEvent fired first and changed the county list and the current county, but all the other reactive expressions still ran with the county input value from before the observeEvent changed it. This caused problems when signaling missing data for the selected county and year. I solved it by isolating the State input and the current Reactive Value in the reactive expression responsible for signaling the missing data error.
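The isolate() fix described above can be sketched roughly as follows. This is a simplified reconstruction under assumptions, not the project's code: it isolates only the state input, and the input IDs (state, county, year), the data frame aqi and the helpers counties_in and make_server are hypothetical. The high-priority observeEvent refreshes the county list, while the observer that signals missing data reads the state through isolate(), so a state change alone does not trigger the check with a stale county.

```r
# Hypothetical helper: sorted county names available for one state.
counties_in <- function(df, state) {
  sort(unique(df$County[df$State == state]))
}

# Server factory; 'aqi' is assumed to have State, County and Year columns,
# and the UI is assumed to define inputs "state", "county" and "year".
make_server <- function(aqi) {
  function(input, output, session) {
    # High priority: when the state changes, refresh the county list first.
    shiny::observeEvent(input$state, {
      choices <- counties_in(aqi, input$state)
      shiny::updateSelectInput(session, "county",
                               choices = choices, selected = choices[1])
    }, priority = 10)

    # Missing-data check. Right after a state change input$county still holds
    # the old county, so the state is read with isolate(): this observer only
    # re-runs when the county or the year changes.
    shiny::observe({
      county <- input$county
      year   <- input$year
      state  <- shiny::isolate(input$state)
      found  <- any(aqi$State == state & aqi$County == county &
                      aqi$Year == year, na.rm = TRUE)
      if (!is.null(county) && !found) {
        shinyalert::shinyalert(
          title = "Missing data",
          text  = paste0("No data for ", county, ", ", state, " in ", year, "."),
          type  = "warning"
        )
      }
    })
  }
}
```

Usage would look like shiny::shinyApp(ui, make_server(aqi)), where ui is whatever layout defines the three inputs.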

White margins in ggplot pie charts: I spent a lot of time trying to figure out how to remove or color the white margins around the ggplot2 pie charts. When you change the background to a color other than white, those margins become visible. My conclusion is that either it's a bug in the library or I'm too stupid to figure out a way to eliminate them.

Responsiveness for SAGE2: I would say half of the time was spent figuring out the best way to create a user interface that works well both on a normal display (e.g. Full HD, Retina) and on the SAGE2 display, with its 3.555 aspect ratio and huge horizontal resolution of about 11k pixels. I managed to scale most of the elements via JavaScript at runtime by detecting the screen size and acting on it, but a few small problems remain.
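Regarding the white pie-chart margins mentioned above, one thing that may help (untested against this app) is to paint every ggplot2 background element with the chosen colour and to request a transparent PNG background from renderPlot(), which passes extra arguments such as bg on to the png() device. The theme name pie_theme, the output ID aqi_pie and the input bg_colour are hypothetical.

```r
# Hypothetical theme: drop axes and paint every background with 'bg',
# with no border line and no outer margin, so no white frame is left.
pie_theme <- function(bg) {
  ggplot2::theme_void() +
    ggplot2::theme(
      plot.background   = ggplot2::element_rect(fill = bg, colour = NA),
      panel.background  = ggplot2::element_rect(fill = bg, colour = NA),
      legend.background = ggplot2::element_rect(fill = bg, colour = NA),
      plot.margin       = ggplot2::margin(0, 0, 0, 0)
    )
}

# Inside the server function (sketch; names are hypothetical):
# output$aqi_pie <- shiny::renderPlot({
#   my_pie_plot + pie_theme(input$bg_colour)
# }, bg = "transparent")
```

With a transparent device background, any area ggplot2 does not paint shows the page colour behind the plot instead of white.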

Interesting Insights

Comparison of the most populous counties

An interesting comparison is between the counties with the highest populations. The most populous county in the United States is the County of Los Angeles in California; the second is Cook County (Chicago area) and the third is Harris County (Houston area). This ranking is not surprising and follows the same order as the cities by population, except that New York City is missing, because the city is split into several counties that are each much smaller. Los Angeles is also known for its constant traffic and lack of public transportation, so I expected it to have very bad air quality, which proved to be true. What I did not expect was a decreasing trend: apparently, all the big cities have been improving their air quality over the past decades. Los Angeles is clearly worse than the other two counties. The three counties are also very far from each other, suggesting that the bad air quality is caused by the big cities themselves and not by the surrounding area.

Main pollutant over time

We can also compare the number of days each pollutant was the main pollutant in these 3 big counties. There are some differences, but overall they are similar. In 1980 the main pollutants were Ozone, NO2, SO2 and CO. In 2018 the same counties again show only small differences among themselves. Comparing 1980 with 2018, however, shows that the main pollutants have changed a lot: CO and SO2 are no longer the problem, and the rising pollutants are now PM2.5 and Ozone.

Illinois and counties around Chicago

To better demonstrate that the bad air quality is caused by human impact, we can compare Cook County with two bordering counties, Lake and Will, which are far less densely populated and greener counties in Illinois. Both have much better air quality than Cook. Lake is slightly better than Will, probably because of its proximity to Lake Michigan.

New York City pollutants trend

Looking at the New York City counties (Manhattan, which is New York County, the Bronx, Queens and the others), we can notice a difference with respect to the other big cities' counties. PM2.5, which together with Ozone is now the main pollutant in the air, has been decreasing in New York City over the past 10 years. This does not happen in Los Angeles, Cook and Harris.
Another thing to notice in New York City's counties is that in the early years Manhattan's air quality was much worse than that of the other two main counties. Around 1995 it aligned with the rest of the counties, suggesting that actions were taken to improve the air quality, perhaps moving some industries out or limiting vehicle traffic within Manhattan.

Video

Video presentation

Contact Us

Reach out for a new project or just say hello


Contact Info

Where I live

Bussero, MI
20060 Italy

Email Me At

mrk23 at hotmail dot it
