WEEK 2: GRAPHICS AND EXPLORATORY ANALYSIS

2.1 Introduction

A picture is worth a thousand words, and the same holds when presenting and interpreting data. Data analysis has been shifting toward more visual approaches to both the interpretation and the dissemination of numerical results. Part of the new data revolution consists of mixing ideas from statistical visualisation and visual design, and data visualisation is one of the most interesting areas of development in the field.

Good graphics do more than help researchers make their data easier for the general public to understand. They are also a useful way of understanding the data ourselves: a graph is often a more intuitive way to spot patterns than numerical results presented in tabular form.

Recent research has revealed that papers with good graphics are perceived as clearer and more interesting overall, and their authors as smarter (see this presentation).

The preparation for this session includes many great resources on visualising quantitative information. If you have not had time to go through them, I recommend that you take some time to do so.

As with other aspects of R, there are a number of core functions that can be used to produce graphics. For example, we’ve already used hist() and plot(). However, these offer limited possibilities for building graphs, and we get far more flexibility by exploring packages that are developed especially for graphing.

The package we will be using throughout this tutorial is ggplot2. The aim of ggplot2 is to implement the grammar of graphics. The ggplot2 package has excellent online documentation.

If you don’t already have the package installed, you will need to do so using the install.packages() function.
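One way to do this, as a minimal sketch, is to install the package only when it is missing. The repos value below is an assumption (any CRAN mirror will do); it is set explicitly so that the call also works in a non-interactive session.

```r
# Install ggplot2 only if it is not already available on this machine.
# The mirror URL is an assumption; replace it with your preferred CRAN mirror.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
```

You only need to install a package once per machine, so this check can safely be re-run.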

You will then need to load the package:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2

The grammar of graphics defines various components of the graphic. Some of the most important are:

- The data: To use ggplot2, the data has to be stored as a data frame.

- The geoms: They describe the objects that represent the data (e.g., points, lines, polygons).

- The aesthetics: They describe the visual characteristics that represent data (e.g., position, size, colour, shape, transparency).

- Facets: They describe how data is split into subsets and displayed as multiple small graphs.

- Stats: They describe statistical transformations that typically summarise data.
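To see how these components map onto code, here is a rough sketch in which each line corresponds to one of them. It assumes the mpg example data frame that ships with ggplot2, which is not the data used elsewhere in this tutorial; the object name components_plot is just for illustration.

```r
library(ggplot2)

# mpg is an example data frame bundled with ggplot2 (an assumption of this
# sketch, chosen only so the code runs without any external files).
components_plot <- ggplot(data = mpg,                         # data: a data frame
                          aes(x = displ, y = hwy)) +          # aesthetics: x and y position
  geom_point(aes(colour = class)) +                           # geom: points, coloured by class
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # stat: fitted straight line
  facet_wrap(~ drv)                                           # facets: one panel per drive type

components_plot  # printing the object draws the plot
```

Removing any one of the lines after the first still leaves a valid plot, which is a quick way to see what each component contributes.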

2.2 Anatomy of a plot

Essentially, the philosophy behind this is that all graphics are made up of layers. ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms (visual marks that represent data points), and a coordinate system.
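As a small illustration of building different graphs from the same components, the sketch below keeps the data fixed and swaps only the geom or the coordinate system. The toy data frame and all object names are hypothetical, invented for this example.

```r
library(ggplot2)

# Hypothetical toy data, made up for this illustration only
toy <- data.frame(group = c("a", "b", "c"), value = c(2, 5, 3))

# Data plus aesthetic mapping; nothing is drawn yet because there is no geom
base <- ggplot(data = toy, aes(x = group, y = value))

scatter <- base + geom_point()   # same data, drawn as points
bars    <- base + geom_col()     # same data, drawn as bars
flipped <- bars + coord_flip()   # same bars in a flipped coordinate system

scatter
bars
flipped
```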

Take this example (all taken from Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3-28.)

You have a table such as:

You then want to plot this. To do so, you want to create a plot that combines the following layers:

This will result in a final plot:

Let’s have a look at what this looks like for a graph.

Let’s have a look at some of the homework data again: the number of banning orders for different football clubs.

First, read the data in from wherever you saved it:

fbo <- read.csv("/Users/reka/Desktop/R-for-Criminologists/fbo-by-club-supported-cleaned.csv")

Now let’s revisit the question of how the number of banning orders differs between clubs in different leagues. As a first step, let’s just plot the number of banning orders for each club. Let’s build this plot:

ggplot(data = fbo, aes(x = Club.Supported, y = Banning.Orders)) +  # data and aesthetics
  geom_point() +                                                   # geometry: one point per club
  theme_bw()                                                       # theme: black-and-white background
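If you do not have the CSV file to hand, the same plot can be built step by step on a stand-in data frame. This is only a sketch: fbo_demo reuses the column names from the call above, but the club names and counts are made up, and the object names p0, p1 and p2 are hypothetical.

```r
library(ggplot2)

# Stand-in data with the same column names as fbo; the clubs and counts
# are invented for illustration and are not real banning-order figures.
fbo_demo <- data.frame(
  Club.Supported = c("Club A", "Club B", "Club C"),
  Banning.Orders = c(12, 30, 7)
)

p0 <- ggplot(data = fbo_demo, aes(x = Club.Supported, y = Banning.Orders))  # axes only, no marks
p1 <- p0 + geom_point()   # adds one layer: a point per club
p2 <- p1 + theme_bw()     # changes the look, but adds no new layer

length(p0$layers)  # number of layers before adding the geom
length(p2$layers)  # number of layers in the finished plot

p2  # printing the object draws the plot
```

Storing intermediate plots in objects like this makes it easy to check what each added line changes.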