Air pollution data (observations or model predictions) consist of realizations of multiple variables in time and space. How can we structure these data so that simple statistical analyses can be applied to them efficiently? In general, statistical analysis can be thought of as yielding two products:

- statistical description (providing a few metrics which represent a large quantity of values)
- statistical inference (which includes hypothesis testing to infer properties of a population, and predictions resulting from inferred characteristics of variables or relationships among them)

In this project, we will learn methods of exploratory data analysis using relational models of data. We will cover R, a language and environment for statistical computing.
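
As a rough illustration of what a relational layout can look like, the sketch below stores measurements in a single long-format table with one row per site, time, and variable, using base R only. The column names (`site`, `time`, `variable`, `value`), the site labels, and all numbers are hypothetical and made up for this example; they do not come from any real data set. The same table then feeds one descriptive summary and one inferential test.

```r
# Hypothetical long-format table: one row per site, time, and variable.
# All values are invented for illustration only.
obs <- data.frame(
  site     = rep(c("A", "B"), each = 6),
  time     = rep(as.Date("2020-01-01") + 0:2, times = 4),
  variable = rep(rep(c("O3", "NO2"), each = 3), times = 2),
  value    = c(31, 28, 35, 12, 15, 11, 40, 38, 44, 20, 22, 19)
)

# Statistical description: reduce many values to one mean
# per combination of site and variable.
means <- aggregate(value ~ site + variable, data = obs, FUN = mean)
print(means)

# Statistical inference: Welch two-sample t-test comparing
# O3 values between the two sites.
o3 <- subset(obs, variable == "O3")
welch <- t.test(value ~ site, data = o3)
print(welch$p.value)
```

Because every measurement sits in its own row, grouping, subsetting, and testing all operate on the same table without reshaping. With only three values per group, the test here shows the mechanics rather than a meaningful result.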