Project 3 - The Air That I Breathe


Visualization link

Sai Krishnan Thiruvarpu Neelakantan
Praveen Chandrasekaran
Varsha Jayaraman
Abdullah Aleem

Purpose of the project:
Will you need your snow boots tonight? Should you bring an umbrella? Accurate weather predictions are important for planning our day-to-day activities. Weather forecasting helps us to make more informed daily decisions, and may even help keep us out of danger.
When weather events happen, economic repercussions usually follow. Consumer behavior as well as supply and demand for a product or raw material can be affected. Energy demand soars during heatwaves; insurance claims rise after hailstorms; snow slows in-store shopping but can increase online sales; and grain prices spike during drought. Understanding how those impacts affect the U.S. economy has spurred a growing demand for analysis of real-time weather and climate data.
An increasing number of companies are faced with the need to adapt their business models based on weather volatility. Weather alone can cause the gross domestic product (GDP) to fluctuate 3 to 6 percent a year, or as much as $1.3 trillion, based on a National Weather Service analysis. Demand for value-added weather services is projected to grow by 10–15 percent a year.
Observations provide forecasters with real-time information that they can react to in order to issue life-saving weather warnings, make critical adjustments to aviation forecasts, and much more. The collection of domestic and international observation systems adds up to billions of observations of the Earth’s atmosphere measured each day.
As society becomes more sensitive to weather, the importance of instantaneous weather prediction for the protection of lives and property and continued economic growth increases. For example, the U.S. population that resides within 50 miles of the nation’s coastlines and is most threatened by hurricanes and flooding is growing rapidly. Such population growth in these and other high-risk areas significantly increases the need for improved weather predictions and warnings to minimize risks to life and property. Another consideration is that the new economic concept of “just-in-time manufacturing” uses computer-timed and -directed supply systems to eliminate the warehousing of parts and products at ports and factories. However, even minor weather disruptions of land, sea, and air-supply-system pathways caused by snow, ice, and high-wind weather systems can now have large, leveraged impacts on these production systems, whereas previously they had little effect.
Real-time weather readings can also aid fleets in accident reconstruction and crash-related insurance claims, expanding the potential return on investment (ROI) for such technology. Farmers need information to help them plan the planting and harvesting of their crops. Airlines need to know local weather conditions in order to schedule flights.
There is a wealth of environmental data used in product and service development as well, for instance:
Energy traders develop consumer-demand forecasts when weather is expected to impact a region.
Insurance companies apply forensic analysis to weather-related accidents and claims.
Transportation providers determine where to build facilities so that fog, snow, or other weather factors pose fewer challenges to logistics.
Retailers analyze how seasonal patterns can affect merchandising and operations.
Here’s another important thing we use the weather for, and it’s not something you’re probably thinking of. The weather is often the go-to ice-breaker when you’re meeting someone new and you’re not sure what you have in common. We all have the weather in common.

Controls for choosing parameters to visualize, such as Light, Temperature, Humidity, and various pollutants, are provided as checkboxes so the user can compare any subset of choices. The leaflet also provides options to view the map in Mountainous/Light/Street view. Further, the user can choose to view each data point in imperial or metric units.

This tab consists of visualizations using data retrieved with the AotClient R library. The Leaflet tab revolves around visualizations and comparisons of readings based on user selections from the leaflet.
When two nodes are selected on the leaflet, a table appears to its right showing the Maximum/Minimum/Average readings for the daily and weekly data in addition to the current values.
Similar data is pulled from DarkSky in the form of a table for each node, tabulating readings for Temperature, Light, and Humidity for the chosen nodes.
Bar graphs depict the Mean/Max/Min values, and line graphs use appropriate readings for both daily and weekly observations.

This tab consists of visualizations using data pulled from the OpenAQ API. Using this API, we can get the pollutant values for all the active AoT nodes. The nodes are plotted on the leaflet, and a checkbox allows selecting the pollutants. There is also a radio button to list the currently active nodes.

This tab consists of visualizations using data pulled from the Weatherbit API. Additional data for the AoT nodes is pulled from Weatherbit, such as Relative Humidity, Solar Radiation, Feels Like Temperature, Dew Point, Clouds, Visibility, Precipitation, and UV Index.
The nodes are marked on a leaflet and tabulated. Selections are reflected simultaneously on the leaflet and the corresponding record in the table.
Any of these parameters can be chosen to visualize as line plots or bar graphs (Max/Min/Mean) for daily and weekly stats.

This tab contains information about the coursework, who developed the project, the libraries used to visualize the data, and the data sources from which the data is downloaded. It also contains a description of the application and the purpose of the project.

Additional information about Heatmap:
We plot heat maps of the Chicago area for variables available from three APIs: AoT, DarkSky, and AQI (OpenAQ). The tool used for the heatmap is "tmap". tmap takes input in "shp" form, which has a geometry row describing the boundaries of each region. However, we had data for individual points (latitude and longitude) rather than pre-defined shapes for the Chicago region (e.g. neighbourhoods or streets), so some interpolation was required. We used Thiessen polygons (proximity interpolation) to turn our points into geometry form, with the geometry created using spatstat’s dirichlet function. The dirichlet function needs two inputs: the points (containing longitude, latitude, and the other parameters) as a SpatialPointsDataFrame, and the Chicago boundary as a SpatialPolygonsDataFrame. The Chicago boundary data was taken from the Chicago Data Portal, and the SpatialPointsDataFrame was created from the data available from the APIs.
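The application does this in R with spatstat’s dirichlet(), but the idea behind proximity interpolation is easy to sketch: every location inside a node’s Thiessen (Voronoi) polygon is closer to that node than to any other, so it inherits that node’s reading. A minimal Python sketch, using hypothetical node coordinates and values:

```python
import math

def proximity_interpolate(query, nodes):
    """Assign a query point the reading of its nearest node.

    This is exactly what a Thiessen-polygon map encodes: the polygon
    around each node is the set of locations nearest to it.

    query: (lat, lon) tuple
    nodes: list of (lat, lon, value) tuples
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = min(nodes, key=lambda n: dist(query, (n[0], n[1])))
    return nearest[2]

# Hypothetical node readings: (lat, lon, temperature)
nodes = [(41.88, -87.63, 12.0), (41.97, -87.68, 10.5), (41.79, -87.60, 13.2)]
print(proximity_interpolate((41.90, -87.64), nodes))  # nearest node's value: 12.0
```

The real implementation builds the polygons once and clips them to the Chicago boundary, rather than answering point queries one at a time, but the assignment rule is the same.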
Because we are interpolating, there is some uncertainty in areas far away from the measurement points. To avoid presenting false information, we have added the nodes to the heat map as well, allowing the user to see how far their region of interest is from the nearest point of measurement. Also, the DarkSky view works by taking the node locations, querying DarkSky for those points, and plotting the heatmap from them. This way the divisions of both heat maps look very similar, and the user can compare the data available from both APIs.
The algorithm also takes points outside the Chicago boundary into consideration when dividing up the region. However, if there are no points inside Chicago and every available point is well outside the boundary, it will show the data as missing. We had enough data points (~35) inside Chicago for AoT and DarkSky, but the AQI API did not have many points inside Chicago, so we also take the points outside Chicago into consideration.
A major challenge was reading and processing the data quickly for the daily and weekly heatmaps, where we have to fetch data for around 35 nodes. For DarkSky, load times were reasonable when fetching the current and last-day readings, but for the weekly data across all the nodes there was too much data to fetch in a reasonable time; we also had to make a separate call for each day of the past week due to API restrictions. We solved this by fetching the data in the background using a library called future, which prevents long wait times in other parts of the application. For the AoT data we used the time buckets provided by the API, which already include summary stats such as min, max, and mean. For the AQI API, the loading times were already reasonable.
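The app uses the R future package for this; the same pattern can be sketched in Python with the standard-library concurrent.futures, using a stubbed fetch function in place of the real DarkSky calls (the function name and payload here are invented for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_week_for_node(node_id):
    # Stand-in for the seven per-day DarkSky requests (the API allows
    # only one historical day per call, so a week means seven calls).
    time.sleep(0.01)  # simulate network latency
    return {"node": node_id, "days": 7}

# Kick off all weekly fetches concurrently; the UI can keep serving the
# current/last-day views while these complete in the background.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_week_for_node, n) for n in range(35)]
    results = [f.result() for f in futures]

print(len(results))  # 35
```

With 8 workers the 35 fetches overlap instead of running one after another, which is the same latency-hiding effect future gives the Shiny app.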
The heat map lets you compare data geographically throughout Chicago for the following sensors per API. AoT: SO2, H2S, O3, NO2, CO, PM2.5, and PM10, as well as temperature, light intensity, and humidity. DarkSky: temperature, humidity, wind speed, wind bearing, cloud cover, visibility, pressure, and ozone. AQI: PM25, PM10, SO2, NO2, O3, CO, and BC.
You have the option to select an API. Once you select an API, it will give you the options to select the available sensors. Then you can choose to view the current data, data for the last day, or data for the last week. The last-day and last-week data are summarized using the following stats: Min, Max, and Mean. The data has been divided using quantiles, which use the underlying distribution of the data to split the data points into equally sized classes. You can also select the type of background for the maps and disable the plotting of nodes.
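Quantile classification puts roughly the same number of observations in each legend bin, regardless of how skewed the readings are. A Python sketch of the break computation (tmap does this internally; the readings below are made up):

```python
def quantile_breaks(values, k=5):
    """Compute k-quantile class breaks so each bin holds roughly the
    same number of observations.

    Breaks are taken at the i/k-th order statistics for i = 1..k-1.
    """
    s = sorted(values)
    n = len(s)
    return [s[int(round(i * (n - 1) / k))] for i in range(1, k)]

readings = [3, 7, 8, 12, 15, 18, 21, 22, 30, 41]
print(quantile_breaks(readings, k=5))  # [8, 15, 18, 22]
```

Compare this with equal-interval breaks, where one outlier (here, 41) would stretch the top bins and leave most nodes in a single colour.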

Libraries Used:
spatstat, maptools, tmap, future, AotClient


About the data

The data used for the visualization of Chicago weather has been collected from various sources and processed to present the dashboard efficiently. All the data used in the application is pulled from various APIs; no external data files were used to support the application.
The AoT data was retrieved for the 'alive' nodes installed around the Chicago area using the AotClient R library. Live data was pulled through the library's ls.observations() function, and direct JSON calls were also used for some features. These measurements include calibrated temperature, humidity, and pressure data, as well as raw air quality data for various gases and particulate matter, for each of the alive nodes. The fields from the API data were further processed into a customized generic data frame which could be loaded once for the entire project in order to save time. Data has been pulled in three categories for each alive node: current data, last 24 hours, and last 7 days. These were retrieved using the time_bucket option of the Observations endpoint, which gives the average, min, and max values for the given timeline and intervals for the chosen node. This data was processed to fit the various plots used in the application.
Below is the link to the AoT API documentation:
Below is the link to the R documentation:
Below are some examples on how to use the timebucket option using JSON call:
Format: time_bucket={function}:{interval}
The avg in the call below can be replaced with max or min.
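As a small illustration of that format, here is a Python sketch that URL-encodes the option for a query string. Only the time_bucket={function}:{interval} shape comes from the docs; the interval value is a hypothetical example:

```python
from urllib.parse import urlencode

def time_bucket_param(func="avg", interval="1 hour"):
    """Format the time_bucket query option: time_bucket={function}:{interval}.
    urlencode percent-escapes the colon and space for use in a URL."""
    return urlencode({"time_bucket": f"{func}:{interval}"})

print(time_bucket_param("avg"))           # time_bucket=avg%3A1+hour
print(time_bucket_param("max", "1 day"))  # time_bucket=max%3A1+day
```

Swapping "avg" for "max" or "min" changes only the aggregation function, as noted above.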

DarkSky provides weather information when passed location coordinates (a latitude and longitude point). We take the location coordinates of the active AoT nodes and pass them to the DarkSky API to receive the weather details for each location. This data is pulled using the R library built for DarkSky, via get_current_forecast(). Out of the weather information received, we pull temperature, humidity, wind speed, wind bearing, cloud cover, visibility, pressure, ozone, and summary. The data is processed to calculate the minimum, maximum, and average values for the current, daily, and weekly time frames, and is presented in heatmap, line chart, bar chart, and table form for the user to analyze visually.
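Collapsing a window of readings into the Min/Max/Mean triple used by the tables and charts is simple; a Python sketch with made-up hourly temperatures (the app does this step in R):

```python
def summarize(readings):
    """Collapse a window of hourly readings into the Min/Max/Mean
    triple shown in the tables, bar charts, and heat maps."""
    return {
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

hourly_temps = [8.0, 9.5, 11.0, 14.0, 12.5, 10.0]  # hypothetical values
print(summarize(hourly_temps))
```

The daily summary runs this over the last 24 hourly readings; the weekly summary runs it over all seven days' worth.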
Below is the link to the DarkSky API documentation:
Below is the link to the R documentation:

OpenAQ: OpenAQ is used to retrieve pollutant data around the Chicago area. OpenAQ has around 25 nodes across the city. Each node collects different pollutant data and updates it every hour. The pollutant list includes PM25, PM10, SO2, NO2, O3, CO, and BC. The data is pulled using the R library built for OpenAQ, via the aq_latest() function. The data pulled is used to show the total number of OpenAQ nodes around Chicago and to distinguish the nodes that were active, collecting data, in the last 24 hours. We also show the pollutant values in the leaflet map as well as in daily and weekly formats along with the Average, Minimum, and Maximum values.
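The "active in the last 24 hours" check amounts to comparing each node's most recent reading timestamp against a cutoff. A Python sketch with synthetic records (the lastUpdated field name mirrors OpenAQ's latest-measurements response, but treat it as an assumption; the app reads it from aq_latest() in R):

```python
from datetime import datetime, timedelta, timezone

def active_nodes(latest, now, window_hours=24):
    """Return locations whose most recent reading falls inside the
    last `window_hours` before `now`."""
    cutoff = now - timedelta(hours=window_hours)
    return [n["location"] for n in latest if n["lastUpdated"] >= cutoff]

now = datetime(2019, 4, 30, tzinfo=timezone.utc)
latest = [  # hypothetical node records
    {"location": "CHI-A", "lastUpdated": datetime(2019, 4, 29, 22, tzinfo=timezone.utc)},
    {"location": "CHI-B", "lastUpdated": datetime(2019, 4, 25, tzinfo=timezone.utc)},
]
print(active_nodes(latest, now))  # ['CHI-A']
```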

WeatherBit likewise provides weather information for the locations in the Array of Things data. The information shown using WeatherBit provides add-on details that help explain the data shown in the other visualizations, enabling the user to better understand the environment and what causes the trends in those values. The data shown from WeatherBit includes Relative Humidity, Solar Radiation, Dew Point, Precipitation, and UV Index. These values are shown for every node in the form of a leaflet map, table, line chart, and bar chart. Since WeatherBit does not have a built-in R library, we used direct JSON calls in R to pull the data from this API.
We have used the Current Weather API and Historical Weather API (hourly) options in the WeatherBit.
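Since these are plain JSON calls, the request is just a URL with the node's coordinates and an API key. A Python sketch of assembling the Current Weather request (the host, path, and parameter names reflect the Weatherbit documentation as we recall it, so verify against the docs; the key and coordinates are placeholders):

```python
from urllib.parse import urlencode

def weatherbit_current_url(lat, lon, api_key):
    """Build a Weatherbit Current Weather request for one AoT node.
    The app issues the equivalent call from R and parses the JSON reply."""
    base = "https://api.weatherbit.io/v2.0/current"
    return base + "?" + urlencode({"lat": lat, "lon": lon, "key": api_key})

print(weatherbit_current_url(41.88, -87.63, "YOUR_KEY"))
```

The hourly Historical Weather endpoint works the same way, with additional start/end date parameters.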
Below is the link to the WeatherBit API documentation:

Source Code

Link to download the code
YouTube Video

Role of Team Members

Praveen Chandrasekaran
1) Worked on the plotting OpenAQ sites in the Chicago area on the map, allowing the user to filter based on any or all of the pollutants being monitored to see which sites are collecting which data, and which sites are currently alive.
2) Worked on the graduate-requirement visualization. The data was retrieved using the Weatherbit API.

Sai Krishnan Thiruvarpu Neelakantan
1) Worked on the AoT API to show the active sites for the Chicago area in the leaflet and table, and added weather-related information from DarkSky for each area in the table. The active sites were plotted based on one or a subset of the selected variables.
2) Worked on the UI template with all the controls required for visualization.

Abdullah Aleem
1) Worked on the heatmap for the Chicago area, where the user can choose to view any of the AoT or Dark Sky data for the current time, or the min, max, and average for the last 24 hours or the last 7 days.
2) Worked on the heatmap to integrate pollutants (PM25, PM10, SO2, NO2, O3, CO, and BC) from the OpenAQ API.

Varsha Jayaraman
1) Worked on the comparison of nodes by providing information through graphical plots (line chart and bar chart) for both the AoT and DarkSky data.
2) Worked on updating the current data (AoT, DarkSky, OpenAQ, Weatherbit) every minute.

Finally, everybody contributed and worked towards the development of the website information.

Insights from the data

1) Light intensity increases as we move away from downtown Chicago. This could perhaps be due to the infrastructure in downtown Chicago blocking light.
2) We can see that the temperature is higher on the west side of Chicago. A reason for that could be its distance from Lake Michigan.
3) Humidity is higher towards Lake Michigan, with peaks at the north and south coasts, and decreases as we move west.
4) There are no nodes inside Chicago for CO from the AQI API, so the algorithm does not interpolate from the single point outside Chicago and shows the data as missing.
5) Ozone increases as we go towards the south. It is particularly low near O'Hare and fairly constant throughout the rest of the city. Also note that the map uses points outside Chicago to produce a good heat map.
6) Here you can see the temperature for the last day and the past week, as well as stats summarized in bar graphs. The temperature was particularly low for the last couple of days (courtesy of a late April snow). From the bar graphs you can see the min, max, and mean for the day and for the week: the range for the day of recording is between 8 and 15, while the range for the week was between 0 and 20. You can also see the slight difference between the two nodes: the first is the northernmost node and the second the southernmost node of Chicago.
The points above correspond to the first six images, respectively.