PeterMac Data Science’s modified version of material by the University of Cambridge (Mark Dunning, Suraj Menon and Aiora Zabala. Original material by Robert Stojnić, Laurent Gatto, Rob Foy, John Davey, Dávid Molnár and Ian Roberts)
4. Plotting in R
Plot basics
- As we have heard, R has extensive graphical capabilities
- …but we need to start simple
- We will describe base graphics in R: the plots available with any standard R installation
- Other, more advanced alternatives include lattice and ggplot2
- See our intermediate R course for fancy graphics
- Plotting in R is a vast topic:
- We cannot cover everything
- You can tinker with plots to your heart's content
- Best to learn from examples; e.g. The R Graph Gallery
- You need to think about how best to visualise your data
- R cannot prevent you from creating a plotting disaster:
Making a Scatter Plot
- If given a single vector as an argument, the function plot() will make a scatter plot with the values of the vector on the y axis and their indices on the x axis
- e.g. it puts a point at:
- x = 1, y = 70.8
- x = 2, y = 67.9 etc…
- We are going to be using the patients data frame, read using the following command
patients <- read.delim("patient-info.txt")
Remember that $ can be used to access a particular column. The result is a vector, which is the most basic type of data used in plotting
- R tries to guess the most appropriate way to visualise the data, according to the type and dimensions of the object(s) provided
- Axis limits, labels, titles are inferred from the data
- We can modify these as we wish, by specifying arguments
- We can give two arguments to plot():
- This lets us visualise the relationship between two variables
- It will put the values from the first argument on the x axis, and the values from the second argument on the y axis
patients$Age
plot(patients$Age, patients$Weight)
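As a self-contained sketch of both calling styles, the following uses a small made-up data frame. The name patients_demo and its values are hypothetical and are not the course data; only the first two weights are taken from the example above.

```r
# Hypothetical stand-in for the patients data frame
patients_demo <- data.frame(
  Age    = c(45, 52, 38, 61, 29),
  Weight = c(70.8, 67.9, 75.3, 61.9, 72.4)
)

# $ extracts one column as a plain vector
weight <- patients_demo$Weight

# One vector: values go on the y axis, positions 1, 2, 3, ... on the x axis
plot(weight)
index <- seq_along(weight)   # the x positions that plot() uses here
plot(index, weight)          # same points, with x spelled out

# Two vectors: first argument on the x axis, second on the y axis
plot(patients_demo$Age, patients_demo$Weight)
```

Only the default axis labels differ between the first two calls, because R derives them from the expressions passed to plot().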
Making a barplot
- Other types of visualisation are available:
- These are often just special cases of using the plot() function
- One such function is barplot()
- It is more usual to display count data in a barplot
- e.g. the counts of a particular categorical variable
barplot(table(patients$Sex))
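A minimal sketch of counting a categorical variable before plotting it, using a made-up vector (sex_demo is hypothetical). table() is used here because it counts character vectors as well as factors.

```r
# Hypothetical categorical data
sex_demo <- c("Male", "Female", "Female", "Male", "Female")

# One count per category
counts <- table(sex_demo)
counts

# Bar heights are the counts; bar labels are the category names
barplot(counts, ylab = "Number of patients")
```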
Plotting a distribution: Histogram
- A histogram is a popular way of visualising a distribution of continuous data:
- You can change the width of bins
- The y-axis can show either frequency or density
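A short sketch of these two options, using simulated values rather than the course data (weight_demo is hypothetical). It relies on the breaks and freq arguments of hist().

```r
set.seed(1)                                   # reproducible made-up data
weight_demo <- rnorm(100, mean = 70, sd = 5)  # hypothetical weights

# Default: the y axis shows the number of values in each bin
hist(weight_demo)

# A single number for breaks is a suggested bin count (R may adjust it to
# get round bin limits); freq = FALSE switches the y axis to density
h <- hist(weight_demo, breaks = 10, freq = FALSE)

# On the density scale the bar areas add up to 1
total_area <- sum(h$density * diff(h$breaks))
```

hist() also returns the bin limits and heights invisibly, which is what h holds here.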
Plotting a distribution: Boxplot
- The boxplot is commonly used in statistics to visualise a distribution:
boxplot(patients$Weight ~ patients$Sex)
- We can include multiple factors
boxplot(patients$Weight ~ patients$Smokes + patients$Sex)
- Other alternatives to consider:
example(dotchart)
example(stripchart)
example(vioplot) # From the vioplot package; load it first with library(vioplot)
example(beeswarm) # From the beeswarm package; load it first with library(beeswarm)
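The formula-based boxplot calls above can also be written with a data argument, which avoids repeating the data frame name. This is only a sketch on a made-up data frame (patients_demo and its values are hypothetical).

```r
# Hypothetical stand-in for the patients data frame
patients_demo <- data.frame(
  Weight = c(70.8, 67.9, 75.3, 61.9, 72.4, 80.1, 66.0, 77.5),
  Sex    = c("Male", "Female", "Male", "Female",
             "Male", "Male", "Female", "Female"),
  Smokes = c("Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes")
)

# One box per sex
boxplot(Weight ~ Sex, data = patients_demo)

# One box per combination of smoking status and sex
b <- boxplot(Weight ~ Smokes + Sex, data = patients_demo)

# boxplot() invisibly returns the group labels and group sizes
b$names
b$n
```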
Exercise: Exercise 4a
- In the course folder you will find the file ozone.csv:
- Read these data into R using read.csv or read.delim, as described in the previous section
- You will need to choose which is appropriate for the file type
- What data types are present? Try to think of ways to create the following plots from the data
- A scatter plot of two variables, e.g. Solar Radiation against Ozone
- A histogram. e.g. Wind Speed
- Boxplot of a continuous variable against a categorical variable. e.g. Ozone level per month
Simple customisations
plot() comes with a large collection of arguments that can be set when we call the function:
- Recall that, unless specified, arguments have a default value
- We can choose to draw lines on the plot rather than points
- The rest of the plot remains the same
plot(patients$Weight, type = "l")
- We can also have both lines and points:
plot(patients$Weight, type = "b")
- Add an informative title to the plot using the main argument:
plot(patients$Age, patients$Weight,
main = "Relationship between Weight and Age")
- Axis labels are set with the xlab and ylab arguments:
plot(patients$Age, patients$Weight, ylab = "Weight")
- We can specify multiple arguments at once:
- Here ylim and xlim are used to specify axis limits
plot(patients$Age,patients$Weight,
ylab="Weight",
xlab="Age",
main="Relationship between Weight and Age",
xlim=c(10,70),
ylim=c(60,80))
Defining a colour
Changing the col argument to plot() changes the colour that the points are plotted in:
plot(patients$Age, patients$Weight, col = "red")
Plotting characters
- R can use a variety of plotting characters
- Each of which has a numeric code
plot(patients$Age, patients$Weight, pch = 16)
- Or you can specify a character:
plot(patients$Age, patients$Weight, pch = "X")
Size of points
Character expansion: the cex argument scales the points, e.g. to 3 times the default size
plot(patients$Age, patients$Weight, cex = 3)
or 20% of the original size
plot(patients$Age, patients$Weight, cex = 0.2)
Colours and characters as vectors
- Previously we have used a vector of length 1 as our value of colour and character
- We can use a vector of any length:
- the values will get recycled (re-used) so that each point gets assigned a value
- We can use a pre-defined colour palette (see later)
plot(patients$Age, patients$Weight,
col = c("red","blue"))
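To see what recycling does, the colour vector can be expanded by hand. This is an illustrative sketch on made-up values (age and weight here are hypothetical); rep_len() repeats a vector until it reaches the requested length.

```r
age    <- c(45, 52, 38, 61, 29)             # hypothetical values
weight <- c(70.8, 67.9, 75.3, 61.9, 72.4)   # hypothetical values

# What recycling does to a two-colour vector for five points
recycled <- rep_len(c("red", "blue"), length(age))
recycled

# These two calls colour the points identically
plot(age, weight, col = c("red", "blue"), pch = 16)
plot(age, weight, col = recycled, pch = 16)
```

The colours alternate by row order only, so they carry no meaning unless the colour vector is derived from the data.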
We can use a factor to determine the colour of each point
plot(patients$Age, patients$Weight, col = factor(patients$Sex))
palette(c("firebrick1", "dodgerblue"))
plot(patients$Age, patients$Weight, col = factor(patients$Sex))
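Colouring by a factor works because each level has an integer code, and that code picks a colour. The sketch below spells the lookup out on made-up data and adds a legend; the names age, weight, sex, my_cols and point_cols are hypothetical, and it indexes its own colour vector rather than changing the global palette.

```r
age    <- c(45, 52, 38, 61, 29)             # hypothetical values
weight <- c(70.8, 67.9, 75.3, 61.9, 72.4)   # hypothetical values
sex    <- factor(c("Male", "Female", "Female", "Male", "Female"))

levels(sex)       # levels are sorted alphabetically: "Female" "Male"
as.integer(sex)   # the integer code behind each value

# The first colour goes to the first level, the second to the second level
my_cols    <- c("firebrick1", "dodgerblue")
point_cols <- my_cols[as.integer(sex)]

plot(age, weight, col = point_cols, pch = 16)
legend("topleft", legend = levels(sex), col = my_cols, pch = 16)
```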
Other plots use the same arguments
- Other plotting functions use the same arguments as plot()
- Technical explanation: the arguments are ‘inherited’
We can change colour and size according to the data
plot(patients$Age, patients$Weight, col = factor(patients$Sex), cex = patients$Age / 10, pch = 18)
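As a sketch of how arguments such as col, main, xlab and ylab carry over to other base plotting functions, the following uses simulated data (weight_demo and sex_demo are hypothetical).

```r
set.seed(1)                                   # reproducible made-up data
weight_demo <- rnorm(50, mean = 70, sd = 5)   # hypothetical weights
sex_demo    <- rep(c("Female", "Male"), each = 25)

# Histogram with a fill colour, title and x axis label
hist(weight_demo, col = "steelblue",
     main = "Weight distribution", xlab = "Weight")

# Boxplot with one colour per group
boxplot(weight_demo ~ sex_demo, col = c("firebrick1", "dodgerblue"),
        main = "Weight by sex", xlab = "Sex", ylab = "Weight")

# Barplot of counts; barplot() returns the bar midpoints
bar_mid <- barplot(table(sex_demo), col = c("firebrick1", "dodgerblue"),
                   main = "Number of patients by sex")
```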
Exercise: Exercise 4b
- Can you re-create the following plots? Hints:
- See the breaks and freq arguments to hist (?hist) to create 20 bins and display density rather than frequency
- For the third plot, see the rainbow function (?rainbow)
- Don’t worry too much about getting the colours exactly correct
- The las argument changes the label orientation. See ?par.
- Look at the arguments to boxplot to see how to change the names printed under each box
More on colours
- The rainbow() function is used to create a vector of colours for the boxplot; in other words, a palette:
- Red, Orange, Yellow, Green, Blue, Indigo, Violet, etc.
- Other palette functions available:
heat.colors(), terrain.colors(), topo.colors(), cm.colors()
- More aesthetically pleasing palettes are provided by the RColorBrewer package:
- It can also list palettes that are suitable for readers with colour blindness
- You may need to install RColorBrewer with the following line of code
- Remember, you only need to do this once
install.packages("RColorBrewer")
library(RColorBrewer)
display.brewer.all()
display.brewer.all(colorblindFriendly = TRUE)
weather <- read.csv("ozone.csv")
boxplot(weather$Temp ~ weather$Month,col=brewer.pal(5,"Set1"))
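Whichever palette function is used, the result is simply a character vector of colours that can be passed to col. The sketch below uses simulated data in place of ozone.csv (temp_demo and month_demo are hypothetical) and only uses RColorBrewer if it happens to be installed.

```r
# Base R palette functions return one colour string per requested colour
rainbow_cols <- rainbow(7)
heat_cols    <- heat.colors(5)
rainbow_cols

set.seed(1)                                   # reproducible made-up data
temp_demo  <- rnorm(50, mean = 75, sd = 8)    # hypothetical temperatures
month_demo <- rep(5:9, each = 10)             # five hypothetical months

# One colour per box
boxplot(temp_demo ~ month_demo, col = heat_cols,
        xlab = "Month", ylab = "Temperature")

# Optional: the same plot with an RColorBrewer palette, if it is available
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  brewer_cols <- RColorBrewer::brewer.pal(5, "Set1")
  boxplot(temp_demo ~ month_demo, col = brewer_cols,
          xlab = "Month", ylab = "Temperature")
}
```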
