When should you use ggplot2?

You should use ggplot to make most of your figures, because it:

I use ggplot2 to generate almost all my figures, with the exception of some spatial figures and a few specialized plots. My experience has been that ggplot2 often struggles with large raster files or shapefiles. That said, there are some great resources for visualizing spatial data using ggplot2 (e.g., ggmap and an example from Casey O’Hara).

Good resources

An example in 3 easy steps

We will use my cheatsheet as a reference to make a scatterplot. We will use a sample dataset from the package gcookbook.

# loading packages and data

#install.packages('gcookbook') # install if you don't have the package
library(ggplot2)     
library(gcookbook)   # source of example data
library(knitr)       # functions for knitting Rmd documents to html
library(RColorBrewer)

hw <- heightweight
kable(head(hw))
|sex | ageYear| ageMonth| heightIn| weightLb|
|:---|-------:|--------:|--------:|--------:|
|f   |   11.92|      143|     56.3|     85.0|
|f   |   12.92|      155|     62.3|    105.0|
|f   |   12.75|      153|     63.3|    108.0|
|f   |   13.42|      161|     59.0|     92.0|
|f   |   15.92|      191|     62.5|    112.5|
|f   |   14.25|      171|     62.5|    112.0|

Step 1: Set everything up

In this step, you will use the ggplot and aes functions to assign variables in your dataset to the x and y axes and, if desired, to other aesthetics such as size, color, and labels.

This does not do any plotting! It just tells ggplot how to assign the data.

ggplot(hw, aes(x = ageYear, y = weightLb))
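
To make this concrete, here is a minimal sketch showing that the call above creates a plot object with no layers until a geom is added. The hw_demo data frame is a stand-in built from the six rows printed earlier, so the example runs without gcookbook; the object names are hypothetical.

```r
library(ggplot2)

# Stand-in data: the six rows of heightweight printed earlier
hw_demo <- data.frame(
  ageYear  = c(11.92, 12.92, 12.75, 13.42, 15.92, 14.25),
  weightLb = c(85, 105, 108, 92, 112.5, 112)
)

# Step 1 only: data and aesthetic mappings, no geom yet
p <- ggplot(hw_demo, aes(x = ageYear, y = weightLb))
n_layers_before <- length(p$layers)        # no layers yet, so no data are drawn

# Adding a geom creates the first layer
p_points <- p + geom_point()
n_layers_after <- length(p_points$layers)
```

Printing p at this stage only gives an empty panel; the points appear once a geom layer exists.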

Step 2: Select the plot geom

This step tells ggplot what type of figure to make. In this case, we will use geom_point, which draws a scatterplot.

ggplot(hw, aes(x = ageYear, y = weightLb)) +
  geom_point()

Tada!! You now have a basic plot, but you will probably want to modify a few things.

Step 3: Fine-tune the plot

This section describes how to make some routine changes, such as:

  • changing point size, shape, transparency
  • changing axes labels
  • fitting a linear model
  • adding a reference line
  • changing the overall look of the plot using existing themes
  • displaying different groups of data

ggplot(hw, aes(x = ageYear, y = weightLb)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) + # change point size, shape, transparency
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!") +
  stat_smooth(method = lm) +  # default method is loess for small datasets; add se = FALSE to drop the 95% confidence band
  geom_hline(yintercept = 150, color = 'orange', linetype = 3, size = 1) + # in ggplot2 >= 3.4.0, use linewidth instead of size for lines
  theme_bw()

Changing the color to correspond to a third variable (sex, in this case):

ggplot(hw, aes(x = ageYear, y = weightLb, color = sex)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) + # change point size, shape, transparency
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!") +
  stat_smooth(method = lm, size = 1) + # one fitted line per sex; add se = FALSE to drop the 95% confidence band
  geom_hline(yintercept = 150, color = 'orange', linetype = 3, size = 1) +
  theme_bw()

Faceting creates separate plots for males and females. The arrangement of the plots is controlled in facet_grid using row_variable ~ column_variable (use a period to indicate no variable):

ggplot(hw, aes(x = ageYear, y = weightLb, color = sex)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) + # change point size, shape, transparency
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!") +
  stat_smooth(method = lm, se = FALSE) + # linear fit without the confidence band
  facet_grid(. ~ sex) + # one column per sex
  theme_bw()

Here is a variation that stacks the panels in rows (sex ~ .) and lets each panel use its own axis ranges (scales = 'free'):

ggplot(hw, aes(x = ageYear, y = weightLb, color = sex)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) + # change point size, shape, transparency
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!") +
  stat_smooth(method = lm, se = FALSE) + # linear fit without the confidence band
  facet_grid(sex ~ ., scales = 'free') + # one row per sex
  theme_bw()

Some common figures

histogram

ggplot(hw, aes(x = weightLb)) +
  geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Ways to modify the figure:

ggplot(hw, aes(x = weightLb)) +
  geom_histogram(fill="gray") +  ## use 'fill' (color only refers to the outline) 
  labs(y = "Number of people", x = "weight (lb)") +
  theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
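
The message above suggests choosing the bin width yourself. A minimal sketch of how that might look, using a stand-in data frame built from the six weights printed earlier (hw_demo and the 10 lb width are arbitrary choices for illustration):

```r
library(ggplot2)

# Stand-in data: the six weights printed in the table earlier
hw_demo <- data.frame(weightLb = c(85, 105, 108, 92, 112.5, 112))

# Set the bin width explicitly (10 lb here, an arbitrary choice)
p_hist <- ggplot(hw_demo, aes(x = weightLb)) +
  geom_histogram(binwidth = 10, fill = "gray", color = "black")

# layer_data() returns the computed bins, which is handy for checking the result
bins <- layer_data(p_hist, 1)
bin_widths  <- bins$xmax - bins$xmin
total_count <- sum(bins$count)
```

Setting binwidth also silences the message, because ggplot2 no longer has to guess.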

boxplot

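The box portion of this figure spans the interquartile range (IQR), from the 25th to the 75th percentile, so it contains the middle 50% of the data. The midline is the median. Each whisker extends to the most extreme observation within \(1.5 \times \text{IQR}\) of the box edge, and observations beyond the whiskers are drawn as individual points.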
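
The whisker rule can be worked through by hand. Below is a minimal base R sketch using made-up weights (the value 200 is invented purely to produce an outlier); it follows the rule that whiskers reach the most extreme observations within 1.5 times the interquartile range of the box edges.

```r
# Hypothetical weights; 200 is a made-up extreme value added to produce an outlier
w <- c(85, 92, 105, 108, 112, 112.5, 200)

q   <- quantile(w, c(0.25, 0.5, 0.75))   # box edges and median
iqr <- unname(q[3] - q[1])               # height of the box

# Limits beyond which observations count as outliers
lower_limit <- unname(q[1]) - 1.5 * iqr
upper_limit <- unname(q[3]) + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits
inside   <- w[w >= lower_limit & w <= upper_limit]
whiskers <- range(inside)
outliers <- w[w < lower_limit | w > upper_limit]
```

Note that the whiskers stop at actual data values, not at the limits themselves.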

In this case, I demonstrate how to assign the plot a name (bp) so we can add elements more easily.

bp <- ggplot(hw, aes(x = sex, y = weightLb)) +
  geom_boxplot(fill="gray") +
  labs(y = "Weight (lb)", x = "") +
  theme_bw()

bp

I like seeing the data used to create the boxplot, and this is easy to add as points:

bp + 
  geom_point()   # overlay the raw data (points will overlap)

There are a lot of overlapping points, which makes it difficult to discern the true density of points. I will use the geom_jitter function, which is similar to geom_point but randomly jitters the points.

bp +  
  geom_jitter()

This looks pretty good but is more scattered than I would like. The degree of scatter can be controlled:

bp +  
  geom_jitter(position = position_jitter(width = .05), alpha = 0.5)
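
One thing to be aware of is that jittering can move points vertically as well as horizontally unless the height is set. A small sketch, assuming you want the plotted weights to stay exact and the jitter to be repeatable; the data values here are made up and hw_demo is a hypothetical stand-in:

```r
library(ggplot2)

# Hypothetical stand-in data with both sexes (values are made up)
hw_demo <- data.frame(
  sex      = c("f", "f", "f", "m", "m", "m"),
  weightLb = c(85, 105, 108, 92, 112.5, 112)
)

p_jit <- ggplot(hw_demo, aes(x = sex, y = weightLb)) +
  geom_boxplot(fill = "gray") +
  # jitter horizontally only; seed makes the random offsets repeatable
  geom_point(position = position_jitter(width = 0.05, height = 0, seed = 1),
             alpha = 0.5)

jittered <- layer_data(p_jit, 2)   # data for the point layer
```

With height = 0 the y values in the point layer are the original weights; only the x positions are nudged.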

barplot

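geom_bar can create several barplot styles depending on its arguments. Here, stat = "identity" plots the values in the data as bar heights (rather than counting rows), and position = "dodge" places the bars side by side instead of stacking them.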

## create a dataset:
data <- expand.grid(pet = c('dog', 'cat', 'hamster'), gender=c('m', 'f'))
data$size <- c(45, 10, 1, 40, 8, 2)

ggplot(data, aes(x=pet, y=size, fill=gender)) +
  geom_bar(stat="identity")

ggplot(data, aes(x=pet, y=size, fill=gender)) +
  geom_bar(stat="identity", position="dodge")
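
Two related variations, sketched with the same toy dataset (renamed pets here to avoid clashing with anything else): geom_col() is shorthand for geom_bar(stat = "identity"), and position = "fill" rescales each stack to proportions.

```r
library(ggplot2)

# Same toy dataset as above
pets <- expand.grid(pet = c('dog', 'cat', 'hamster'), gender = c('m', 'f'))
pets$size <- c(45, 10, 1, 40, 8, 2)

# geom_col() draws the same stacked bars as geom_bar(stat = "identity")
p_bar <- ggplot(pets, aes(x = pet, y = size, fill = gender)) +
  geom_bar(stat = "identity")
p_col <- ggplot(pets, aes(x = pet, y = size, fill = gender)) +
  geom_col()

# position = "fill" rescales each stack so it sums to 1
p_fill <- ggplot(pets, aes(x = pet, y = size, fill = gender)) +
  geom_col(position = "fill")

bar_tops   <- layer_data(p_bar, 1)$ymax
stack_tops <- layer_data(p_col, 1)$ymax
fill_tops  <- layer_data(p_fill, 1)$ymax
```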

lineplot

ggplot(data, aes(x=pet, y=size, color=gender, group=gender)) +
  geom_line() +
  geom_point(size=5)

Extra information

Make your own themes

It is possible to make your own theme. This is useful when you want your plots to have a consistent appearance, and you don’t want to repeat a lot of code for each figure.

I created a theme that I often use for scatterplots in publications. One issue with the default ggplot2 figures is that the axis labels can appear very small when the figures are saved. I increased the size of the labels to make them more readable.

I keep my theme on GitHub so I can access it from anywhere.

Another theme idea: I like the general appearance of the figures at rvisualization.com. The background is minimalistic which puts the emphasis on the data. A good project would be creating a theme based on their code.

source('https://raw.githubusercontent.com/OHI-Science/ohiprep/master/src/R/scatterTheme.txt')

ggplot(hw, aes(x = ageYear, y = weightLb)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) + # change point size, shape, transparency
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!") +
  stat_smooth(method = lm) + 
  scatterTheme


## save the figure
ggsave('example.png', width = 6, height = 6)
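
The contents of scatterTheme are not shown here, so the following is only a sketch of the general idea rather than that theme: start from an existing theme and override a few elements. The name my_theme and the text sizes are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical theme: start from theme_bw() and enlarge the axis text
my_theme <- theme_bw() +
  theme(axis.title = element_text(size = 16),
        axis.text  = element_text(size = 14))

# Stand-in data: the six rows of heightweight printed earlier
hw_demo <- data.frame(
  ageYear  = c(11.92, 12.92, 12.75, 13.42, 15.92, 14.25),
  weightLb = c(85, 105, 108, 92, 112.5, 112)
)

# A stored theme is added to a plot like any other component
p_theme <- ggplot(hw_demo, aes(x = ageYear, y = weightLb)) +
  geom_point(size = 4, shape = 15, alpha = 0.3) +
  labs(x = 'Age (yr)', y = "Weight (lb)") +
  my_theme

built <- ggplot_build(p_theme)   # builds without drawing; print(p_theme) to display
```

Saving the theme definition in one file and sourcing it keeps every figure in a project consistent.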

Dealing with color

The ggplot default colors aren’t always the prettiest, and most of the time, you will want to change them.

One thing about ggplot2 that confused me for a while is that both color and fill are used to define color. For the most part, ‘color’ controls lines and the outlines of polygons (e.g., histogram bars, bar plots, point shapes 21-25), and ‘fill’ controls the interior. The remaining point shapes (0-20) are treated differently: they have no separate fill, so ‘color’ colors the entire point.

I recommend using established color palettes, such as those from RColorBrewer.

display.brewer.all()

To select a particular ColorBrewer palette:

myCols <- brewer.pal(11, "Spectral")  #choose the number of colors you want from the palette
myCols  
##  [1] "#9E0142" "#D53E4F" "#F46D43" "#FDAE61" "#FEE08B" "#FFFFBF" "#E6F598"
##  [8] "#ABDDA4" "#66C2A5" "#3288BD" "#5E4FA2"

This returns a character vector of hexadecimal color codes, one of the formats R accepts for specifying colors (named colors such as 'orange' also work).
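
If hexadecimal codes are unfamiliar, base R can translate them. A short sketch using the first Spectral color from the output above:

```r
hex <- "#9E0142"   # first color of the Spectral palette shown above

# Split the hex code into red, green, and blue components (0-255)
components <- col2rgb(hex)

# Going the other way: build a hex code from RGB components
rebuilt <- rgb(158, 1, 66, maxColorValue = 255)
```

Each pair of hex digits encodes one channel: 9E is red (158), 01 is green (1), and 42 is blue (66).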

There are many ways to assign colors in ggplot2 (so many that it can be rather confusing). I am only going to describe the methods I have found work best for me.

The first thing to consider is whether the variable you want to represent with color is discrete (e.g., categories, such as gender or eye color) or continuous (e.g., weight or height).

Discrete variable

If you have a discrete variable, the best bets are to use scale_color_brewer or scale_color_manual (or, alternatively ‘scale_fill_brewer’ or ‘scale_fill_manual’ if you are trying to color the inside of a polygon shape). scale_color_brewer provides a nice shortcut if you are going to use one of the Color Brewer palettes and do not care how the colors are assigned. scale_color_manual provides a lot of flexibility for assigning colors to particular categories.

sp <- ggplot(hw, aes(x = ageYear, y = weightLb, color=sex)) +
  geom_point(size = 4, shape = 19) + 
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!")  +
  theme_bw()

# using a ColorBrewer palette:
sp +
  scale_colour_brewer(palette = "Set1") 

In this case, I want females to be purple and males green:

sp +
  scale_color_manual(values = c("purple", "darkgreen"),
                     limits = c("f", "m"),
                     labels =c("females", "males"))
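
An alternative worth knowing is that scale_color_manual also accepts a named vector, which ties each color to a category by name instead of by position. A minimal sketch with made-up stand-in data (hw_demo is hypothetical):

```r
library(ggplot2)

# Hypothetical stand-in data (values are made up)
hw_demo <- data.frame(
  sex      = c("f", "f", "m", "m"),
  ageYear  = c(12, 12.5, 14, 15),
  weightLb = c(85, 105, 110, 120)
)

p_named <- ggplot(hw_demo, aes(x = ageYear, y = weightLb, color = sex)) +
  geom_point(size = 4, shape = 19) +
  # names tie each color to a category, regardless of the order given
  scale_color_manual(values = c(m = "darkgreen", f = "purple"))

# Colors actually assigned to each point
point_cols <- layer_data(p_named, 1)
```

This guards against colors silently swapping if the factor levels change order.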

Continuous variable: displayed as a discrete variable

Sometimes you have a continuous variable you want to display as categories. In the following example, instead of mapping color to sex, we map it to height. But, we want height to be displayed as categories (small, medium, large).

# figure out quantiles I want to use:
quantile(hw$heightIn)
##     0%    25%    50%    75%   100% 
## 50.500 58.725 61.500 64.300 72.000
# use cut function to make the breaks and labels:
sp <- ggplot(hw, aes(x = ageYear, y = weightLb, 
                     color=cut(heightIn, 
                               breaks = c(-Inf, 58.7, 64.3, Inf),
                               labels = c("small", "medium", "large")))) +
  geom_point(size = 4, shape = 19) + 
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!")  +
  theme_bw()

# Default:
sp 

# Some changes using scale_color_manual:

# Determine hex codes for ColorBrewer YlOrRd palette
brewer.pal(9, "YlOrRd")
## [1] "#FFFFCC" "#FFEDA0" "#FED976" "#FEB24C" "#FD8D3C" "#FC4E2A" "#E31A1C"
## [8] "#BD0026" "#800026"
sp +
  scale_color_manual(values = c('#FED976', '#FD8D3C', '#E31A1C'), # colors
                     limits =c('small', 'medium', 'large'),     # categories that map to colors
                     name = 'size',                             # legend title
                     labels = c('smallish', 'med', 'very large')) # legend category names
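
To see what cut does on its own, here is a small base R sketch using the six heights from the table printed earlier plus one made-up tall value (70), with the same breaks as above. By default the intervals are closed on the right, so a height of exactly 58.7 would fall in "small".

```r
# Six heights from the table printed earlier, plus one made-up tall value (70)
heights <- c(56.3, 62.3, 63.3, 59.0, 62.5, 62.5, 70)

# Convert the continuous heights into three ordered categories
size_class <- cut(heights,
                  breaks = c(-Inf, 58.7, 64.3, Inf),
                  labels = c("small", "medium", "large"))

# Count how many heights fall in each category
class_counts <- table(size_class)
```

The result is a factor, which is why ggplot2 treats the mapped color as discrete.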

Continuous variable

There are three general options I tend to use based on whether I want a 2 color gradient palette, 3 color diverging palette, or 4+ color gradient.

Here is the default:

sp <- ggplot(hw, aes(x = ageYear, y = weightLb, color=heightIn)) +
  geom_point(size = 4, shape = 19) + 
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!")  +
  theme_bw()

sp

A two color gradient palette:

sp +
  scale_color_gradient(low = 'yellow', high = 'red')

A three color diverging palette:

sp +
  scale_color_gradient2(low = 'yellow', mid = 'grey', high = 'red', midpoint = 65)
## Warning: Non Lab interpolation is deprecated

Multiple color scale:

sp +
  scale_color_gradientn(colours = rev(brewer.pal(11, "Spectral")))

Adding text to your figure

labeling points

If the text that you want to add corresponds to a variable in your data, you should use geom_text.

## add some labels to the hw data
hw$name <- rep(c("chad", "lee", "pierce", "niles"), length.out = nrow(hw))  # repeat the four names to fill every row

sp <- ggplot(hw, aes(x = ageYear, y = weightLb, color=heightIn)) +
  geom_point(size = 4, shape = 19) + 
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!")  +
  scale_color_gradientn(colours = rev(brewer.pal(11, "Spectral"))) +
  theme_bw()

# The general command:
sp +
  geom_text(aes(label=name))

This always takes a bit of work to get right:

sp +
  geom_text(aes(label = name), color = 'black', size = 3, vjust = 1.5) # vjust > 1 places the label below the point

It is usually better to only display a subset of the points. In some cases, you might be able to simply subset the data:

sp +
  geom_text(data = subset(hw, ageYear > 16), aes(x = ageYear, y = weightLb, label = name),
            color = 'black', size = 3, vjust = 1.5) # label only people older than 16

Sometimes it is necessary to make a variable with the names you want displayed:

hw$name2 <- NA
hw$name2[c(2,5,10,15)] <- "niles"
kable(head(hw))
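|sex | ageYear| ageMonth| heightIn| weightLb|name   |name2 |
|:---|-------:|--------:|--------:|--------:|:------|:-----|
|f   |   11.92|      143|     56.3|     85.0|chad   |NA    |
|f   |   12.92|      155|     62.3|    105.0|lee    |niles |
|f   |   12.75|      153|     63.3|    108.0|pierce |NA    |
|f   |   13.42|      161|     59.0|     92.0|niles  |NA    |
|f   |   15.92|      191|     62.5|    112.5|chad   |niles |
|f   |   14.25|      171|     62.5|    112.0|lee    |NA    |
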
sp <- ggplot(hw, aes(x = ageYear, y = weightLb, color=heightIn)) +
  geom_point(size = 4, shape = 19) + 
  labs(x = 'Age (yr)', y = "Weight (lb)", title = "Older people weigh more!")  +
  scale_color_gradientn(colours = rev(brewer.pal(11, "Spectral"))) +
  geom_text(aes(label = name2), color = 'black', size = 5, vjust = 1.5) +
  theme_bw()

sp 
## Warning: Removed 232 rows containing missing values (geom_text).
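
Another way to build such a label column is with ifelse, keeping the name where a condition holds and NA elsewhere. A minimal base R sketch with hypothetical stand-in rows (the names follow the example above; the ages and weights are made up):

```r
# Hypothetical stand-in rows (values are made up)
hw_demo <- data.frame(
  name     = c("chad", "lee", "pierce", "niles"),
  ageYear  = c(11.92, 16.50, 12.75, 17.10),
  weightLb = c(85, 120, 108, 135),
  stringsAsFactors = FALSE
)

# Keep the name only where the condition holds; everything else becomes NA
hw_demo$label <- ifelse(hw_demo$ageYear > 16, hw_demo$name, NA)
```

Rows with an NA label are simply not drawn by geom_text, which is what the warning above reports.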

Expressions

It is also possible to add text to a particular location on the plot using expressions. For example, we can add the \(R^2\) value to the plot.

# figure out what the R2 value is:
mod <- lm(weightLb ~ ageYear, data=hw)
summary(mod)
## 
## Call:
## lm(formula = weightLb ~ ageYear, data = hw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.751 -10.370  -1.970   7.751  49.397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -6.3356     9.2118  -0.688    0.492    
## ageYear       7.8513     0.6699  11.720   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.06 on 234 degrees of freedom
## Multiple R-squared:  0.3699, Adjusted R-squared:  0.3672 
## F-statistic: 137.4 on 1 and 234 DF,  p-value: < 2.2e-16
sp +
  annotate("text", x = 16, y = 60, label = 'R^2==0.37', parse = TRUE, fontface = 'bold') 
## Warning: Removed 232 rows containing missing values (geom_text).
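
Rather than typing the value by hand, the label can be built from the fitted model. A sketch of one way to do this, using a stand-in data frame made from the six rows printed earlier (so the value will differ from the 0.37 obtained with the full dataset); hw_demo, mod_demo, and r2_label are hypothetical names:

```r
library(ggplot2)

# Stand-in data: the six rows of heightweight printed earlier
hw_demo <- data.frame(
  ageYear  = c(11.92, 12.92, 12.75, 13.42, 15.92, 14.25),
  weightLb = c(85, 105, 108, 92, 112.5, 112)
)

# Extract R-squared from the model summary and format it as a plotmath string
mod_demo <- lm(weightLb ~ ageYear, data = hw_demo)
r2       <- summary(mod_demo)$r.squared
r2_label <- sprintf("R^2==%.2f", r2)

# parse = TRUE tells annotate to treat the label as an expression
p_r2 <- ggplot(hw_demo, aes(x = ageYear, y = weightLb)) +
  geom_point(size = 4) +
  annotate("text", x = 15, y = 90, label = r2_label, parse = TRUE)
```

This keeps the annotation in sync with the model if the data change.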

You can also use annotate to add things other than “text”, such as line segments, rectangles, arrows, etc.
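
As a rough illustration, here is a sketch that adds a shaded rectangle and an arrow with annotate. The coordinates are arbitrary choices, and hw_demo is a stand-in built from the six rows printed earlier.

```r
library(ggplot2)

# Stand-in data: the six rows of heightweight printed earlier
hw_demo <- data.frame(
  ageYear  = c(11.92, 12.92, 12.75, 13.42, 15.92, 14.25),
  weightLb = c(85, 105, 108, 92, 112.5, 112)
)

p_annot <- ggplot(hw_demo, aes(x = ageYear, y = weightLb)) +
  geom_point(size = 4) +
  # shaded rectangle highlighting a region (coordinates chosen arbitrarily)
  annotate("rect", xmin = 12.5, xmax = 13.5, ymin = 90, ymax = 110,
           alpha = 0.2, fill = "blue") +
  # line segment with an arrow head pointing toward the oldest person
  annotate("segment", x = 15, xend = 15.8, y = 100, yend = 111,
           arrow = arrow(length = unit(0.3, "cm")))

rect_layer    <- layer_data(p_annot, 2)   # the rectangle
segment_layer <- layer_data(p_annot, 3)   # the arrow
```

Because annotate takes fixed coordinates rather than a data frame, each call adds exactly one shape.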

Some general tips