Practical 2 What you need to be able to do in R before you start

Most people taking this module have used R a lot already, but it is possible you’re a bit rusty, or you’ve found this course on GitHub and have no R experience. This isn’t a problem, I will try and summarise what you need to be able to do to get these practicals running below. However, I’m not going to write a help guide to R here, if you can’t work out how to open it and get started etc. I strongly recommend the book Getting Started With R or there are lots of great tutorials online.

Throughout, R code will be in shaded boxes:

library(ape)

R output will be preceded by ## and important comments will be in quote blocks:

Note that many things in R can be done in multiple ways. You should choose the methods you feel most comfortable with, and do not panic if someone is doing the same analyses as you in a different way!

2.1 Installing R (and RStudio)

Install R from [https://cran.r-project.org]
You can install RStudio from [http://www.rstudio.com/products/rstudio/download/]. I’d recommend trying this out if you’re a beginner as it has a nicer interface.

2.2 Setting the working directory

To use the practicals you need to download all the files for each practical into a folder somewhere on your computer (I usually put mine on the Desktop). We will then tell R to look in this folder for all data etc. by setting the working directory to that folder.

To set the working directory you’ll need to know what the path of the folder is. The path is really easy to find in a Windows machine, just click on the address bar of the folder and the whole path will appear. For example on my Windows machine, the path is:

C:/Users/Natalie/Desktop/RAnalyses

It’s a bit trickier to find the path on a Mac, so use Google if you need help. On my Mac the path is:

~/Desktop/RAnalyses

Note that the tilde ~ is a shorthand for /Users/Natalie.

We can then set the working directory to your folder using setwd:

setwd("~/Desktop/RAnalyses")

Alternatively if using RStudio you use the menus to do this. Go to Session > Set Working Directory > Choose Directory.

Setting the working directory tells R which folder to look for data in (and which folder you’d like it to write results to). It saves a bit of typing when reading files into R. Now I can read in a file called mydata.csv as follows:

mydata <- read.csv("mydata.csv")

rather than having to specify the folder too:

mydata <- read.csv("~/Desktop/RAnalyses/mydata.csv")

Remember if you move the data files, or the folder itself, you’ll need to set the working directory again.

2.3 Using a script

Next, open a text editor. R has an inbuilt editor that works pretty well, but NotePad and TextEdit are fine too. However, I highly recommend using something that will highlight code for you. My personal favorite is Sublime Text 2, because you can also use it for any other kind of text editing like LaTeX, html etc. RStudio’s editor is also very nice.

You should type (or copy and paste) your code into the text editor, edit it until you think it’ll work, and then either paste it into R’s console window, or you can highlight the bit of code you want to run and press ctrl or cmd and enter or R (different computers seem to do this differently). This will automatically send it to the console.

Saving the script file lets you keep a record of the code you used, which can be a great time saver if you want to use it again, especially as you know this code will work!

You can cut and paste code from my handouts into your script. You don’t need to retype everything!

If you want to add comments to the file (i.e., notes to remind yourself what the code is doing), put a hash/pound sign (#) in front of the comment.

# Comments are ignored by R but remind you what the code is doing. 
# You need a # at the start of each line of a comment.
# Always make plenty of notes to help you remember what you did and why

2.4 Installing and loading extra packages in R

To run any specialised analysis in R, you need to download one or more additional packages from the basic R installation. For these problem sets you will need to install the following packages:

ape
geiger
picante
caper
BAMMtools

To install the package ape:

install.packages("ape")

Pick the closest mirror to you if asked.

You’ve installed the packages but they don’t automatically get loaded into your R session. Instead you need to tell R to load them every time you start a new R session and want to use functions from these packages. To load the package ape into your current R session:

library(ape)

You can think of install.packages like installing an app from the App Store on your smart phone - you only do this once - and library as being like pushing the app button on your phone - you do this every time you want to use the app.

2.5 Loading and viewing your data in R

R can read files in lots of formats, including comma-delimited and tab-delimited files. Excel (and many other applications) can output files in this format (it’s an option in the Save As dialog box under the File menu). Mostly I will give you .csv files during these practicals. As an example, here is how you would read in the tab-delimited text file called Primatedata.csv which we are going to use in the PGLS practical. Load these data as follows, assuming you have set your working directory (see step 2 above).

primatedata <- read.csv("Primatedata.csv")

read.csv reads in comma delimited files.

This is a good point to note that unless you tell R you want to do something, it won’t do it automatically. So here if you successfully entered the data, R won’t give you any indication that it worked. Instead you need to specifically ask R to look at the data.

We can look at the data by typing:

str(primatedata)

## 'data.frame':    77 obs. of  9 variables:
##  $ Order          : Factor w/ 1 level "Primates": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Family         : Factor w/ 15 levels "Aotidae","Atelidae",..: 2 2 2 14 3 3 3 4 4 4 ...
##  $ Binomial       : Factor w/ 77 levels "Alouatta palliata",..: 5 6 7 8 9 10 11 15 16 17 ...
##  $ AdultBodyMass_g: num  6692 7582 8697 958 558 ...
##  $ GestationLen_d : num  138 226 228 164 154 ...
##  $ HomeRange_km2  : num  2.28 0.73 1.36 0.02 0.32 0.02 0.00212 0.51 0.16 0.24 ...
##  $ MaxLongevity_m : num  336 328 454 304 215 ...
##  $ SocialGroupSize: num  14.5 42 20 2.95 6.85 ...
##  $ SocialStatus   : int  2 2 2 2 2 2 2 2 2 2 ...

Always look at your data before beginning any analysis to check it read in correctly.

str shows the structure of the data frame (this can be a really useful command when you have a big data file). It also tells you what kind of variables R thinks you have (characters, integers, numeric, factors etc.). Some R functions need the data to be certain kinds of variables so it’s useful to check this.

As you can see, the data contains the following variables: Order, Family, Binomial, AdultBodyMass_g, GestationLen_d, HomeRange_km2, MaxLongevity_m, and SocialGroupSize.

head(primatedata)

##      Order      Family           Binomial AdultBodyMass_g GestationLen_d
## 1 Primates    Atelidae   Ateles belzebuth         6692.42         138.20
## 2 Primates    Atelidae   Ateles geoffroyi         7582.40         226.37
## 3 Primates    Atelidae    Ateles paniscus         8697.25         228.18
## 4 Primates Pitheciidae  Callicebus moloch          958.13         164.00
## 5 Primates     Cebidae  Callimico goeldii          558.00         153.99
## 6 Primates     Cebidae Callithrix jacchus          290.21         144.00
##   HomeRange_km2 MaxLongevity_m SocialGroupSize SocialStatus
## 1          2.28          336.0           14.50            2
## 2          0.73          327.6           42.00            2
## 3          1.36          453.6           20.00            2
## 4          0.02          303.6            2.95            2
## 5          0.32          214.8            6.85            2
## 6          0.02          201.6            8.55            2

This gives you the first few rows of data along with the column headings.

names(primatedata)

## [1] "Order"           "Family"          "Binomial"        "AdultBodyMass_g"
## [5] "GestationLen_d"  "HomeRange_km2"   "MaxLongevity_m"  "SocialGroupSize"
## [9] "SocialStatus"

This gives you the names of the columns.

primatedata

This will print out all of the data!

This should be everything you need to know to get the practicals that follow working. Let me know if you have any problems (natalie.cooper@nhm.ac.uk).