---
title: "Grammar of graphics exercises"
author: "Nicholas Horton (nhorton@amherst.edu)"
date: "August 31, 2017"
output:
  html_document:
    fig_height: 5
    fig_width: 7
  pdf_document:
    fig_height: 5
    fig_width: 7
  word_document:
    fig_height: 3
    fig_width: 5
---

```{r, setup, include=FALSE}
library(mdsr)   # Load additional packages here
```

## Introduction

These exercises are taken from the grammar of graphics chapter of **Modern Data Science with R**: http://mdsr-book.github.io. Other materials relevant for instructors (sample activities, overview video) for this chapter can be found there.

## Galton

Using the famous `Galton` data set from the `mosaicData` package:

1. Create a scatterplot of each person's `height` against their father's height
2. Separate your plot into facets by `sex`
3. Add regression lines to all of your facets

Hint: recall that you can find out more about the data set by running the command `?Galton`.

SOLUTION:

```{r}
library(mdsr)
glimpse(Galton)
# solution goes here
```

## Railtrails

Using the `RailTrail` data set from the `mosaicData` package:

1. Create a scatterplot of the number of crossings per day `volume` against the high temperature that day
2. Separate your plot into facets by `weekday`
3. Add regression lines to the two facets

SOLUTION:

```{r}
library(mdsr)
glimpse(RailTrail)
# solution goes here
```

## Hamilton and Angelica

Angelica Schuyler Church (https://en.wikipedia.org/wiki/Angelica_Schuyler_Church, 1756--1814) was the daughter of New York Governor Philip Schuyler and sister of Elizabeth Schuyler Hamilton. Angelica, New York was named after her. Generate a plot of the reported proportion of babies born with the name Angelica over time and interpret the figure.

SOLUTION:

```{r}
library(mdsr)
library(babynames)
glimpse(babynames)
# solution goes here
```

## Marriage

The following questions use the `Marriage` data set from the `mosaicData` package.

1. Create an informative and meaningful data graphic.
2. Identify each of the visual cues that you are using, and describe how they are related to each variable.
3. Create a data graphic with at least *five* variables (either quantitative or categorical). For the purposes of this exercise, do not worry about making your visualization meaningful---just try to encode five variables into one plot.

```{r}
library(mdsr)
glimpse(Marriage)
# solution goes here
```

## MLB teams

The `MLB_teams` data set in the `mdsr` package contains information about Major League Baseball teams in the past four seasons. There are several quantitative and a few categorical variables present. See how many variables you can illustrate on a single plot in R. The current record is 7. (Note: This is *not* good graphical practice---it is merely an exercise to help you understand how to use visual cues and aesthetics!)

```{r}
library(mdsr)
glimpse(MLB_teams)
# solution goes here
```

## Payroll

Use the `MLB_teams` data in the `mdsr` package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.

SOLUTION:

```{r}
library(mdsr)
# solution goes here
```

## Dead names

Use the `make_babynames_dist` function in the `mdsr` package to recreate the "Deadest Names" graphic from FiveThirtyEight (http://tinyurl.com/zcbcl9o).

SOLUTION:

```{r}
library(mdsr)
babynames_dist <- make_babynames_dist()
glimpse(babynames_dist)
```

## MacLeish

The `macleish` package contains weather data collected every ten minutes in 2015 from two weather stations in Whately, MA. Using `ggplot2`, create a data graphic that displays the average temperature over each 10-minute interval (`temperature`) as a function of time (`when`).

SOLUTION:

```{r}
library(mdsr)
library(macleish)
glimpse(whately_2015)
# solution goes here
```

## NASA weather

Using data from the `nasaweather` package, create a scatterplot between `wind` and `pressure` with color being used to distinguish the `type` of storm.

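
As a hint for the pattern this exercise asks for, here is a minimal sketch on stand-in data. It assumes the `mpg` data set that ships with `ggplot2` (not part of this exercise), with `displ`, `hwy`, and `drv` standing in for two quantitative variables and one categorical variable. Mapping the categorical variable to `color` inside `aes()` is one common way to distinguish groups; it is not necessarily the intended solution.

```{r}
library(ggplot2)

# Stand-in data: `mpg` ships with ggplot2 and is NOT the storms data.
# displ and hwy play the two quantitative roles; drv plays the categorical role.
p_color <- ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.6)  # partial transparency helps with overplotting

p_color
```

The same structure should carry over once the stand-in variable names are swapped for the ones named in the exercise.
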
SOLUTION:

```{r message=FALSE}
library(mdsr)
library(nasaweather)
glimpse(storms)
# solution goes here
```

## More weather

Using data from the `nasaweather` package, use the `geom_path` function to plot the path of each tropical storm in the `storms` data table. Use color to distinguish the storms from one another, and use faceting to plot each `year` in its own panel.

SOLUTION:

```{r}
library(mdsr)
library(nasaweather)
# solution goes here
```
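
The combination of `geom_path`, a color mapping, and one panel per year can be sketched on toy data. The `tracks` data frame below is hypothetical (made-up coordinates and column names `id`, `year`, `long`, `lat`), chosen only to show the mechanics; it is not the `storms` table and not a worked solution.

```{r}
library(ggplot2)

# Hypothetical toy tracks: three made-up paths across two years.
tracks <- data.frame(
  id   = rep(c("A", "B", "C"), each = 4),
  year = rep(c(2001, 2001, 2002), each = 4),
  long = c(-80, -78, -75, -71, -90, -88, -85, -84, -70, -68, -65, -60),
  lat  = c(20, 23, 27, 32, 18, 21, 25, 30, 15, 19, 24, 30)
)

# geom_path() connects points in the order the rows appear,
# so rows should already be in time order within each track.
# Mapping a discrete variable to color also groups the paths by that variable.
p_path <- ggplot(tracks, aes(x = long, y = lat, color = id)) +
  geom_path() +
  facet_wrap(~ year)  # one panel per year

p_path
```

Note that `geom_path` follows row order, whereas `geom_line` would connect points in order of the x variable, which is usually not what a trajectory needs.
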