R can be run from the command line or through a graphical user interface (GUI)
On this course, we use the RStudio GUI (www.rstudio.com)
To launch RStudio, find the RStudio icon and click it
The traditional way to enter R commands is via the Terminal, or using the console in RStudio (bottom-left)
However, for this course we will use a relatively new feature called R Notebooks.
An R Notebook mixes plain text with R code
The R code can be run from inside the document and the results are displayed directly underneath
Each chunk of R code looks something like this.
Each line of R can be executed by clicking on the line and pressing CTRL and ENTER
Or you can press the green triangle on the right-hand side to run everything in the chunk
Try this now!
print("Hello World")
The R notebook can be rendered into a format such as PDF or HTML so that it can be shared with your collaborators
On the course website you will see compiled versions of each session
Basic concepts in R - simple arithmetic
The command line can be used as a calculator and understands the usual arithmetic operators +, -, *, /
Try adding a few more calculations here
2 + 2
2 - 2
4 * 3
10 / 2
Note: The number in the square brackets is an indicator of the position in the output. In this case the output is a ‘vector’ of length 1 (i.e. a single number). More on vectors coming up…
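The position indicator becomes more useful with longer output. As a small sketch (the variable name longOutput is made up for this illustration, and the : shortcut is explained in the vectors section), try printing a vector with many elements. If the output wraps onto several lines, each line starts with the position of its first element.

```r
# A longer vector, so the printed output is likely to wrap over several lines
longOutput <- 100:130

# Each printed line begins with the index of its first element, e.g. [1]
longOutput

# The vector holds 31 values in total
length(longOutput)
```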
In the case of expressions involving multiple operations, R respects the BODMAS system to decide the order in which operations should be performed.
2 + 2 * 3
2 + (2 * 3)
(2 + 2) * 3
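The "O" in BODMAS (orders, i.e. powers) applies too. The following lines are a short illustration, not part of the original exercise: R evaluates ^ before multiplication, addition and even the minus sign in front of a number.

```r
# Exponentiation (^) is evaluated before multiplication and addition
2 + 2 * 3^2      # same as 2 + (2 * 9), which is 20

# ^ also binds more tightly than a leading minus sign
-2^2             # same as -(2^2), which is -4
(-2)^2           # brackets force the minus to apply first, giving 4
```

When in doubt, add brackets: they cost nothing and make the intended order explicit.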
R is capable of more complicated arithmetic, such as trigonometry and logarithms, like you would find on a fancy scientific calculator. Of course, R also has a plethora of statistical operations, as we will see.
pi
sin(pi/2)
cos(pi)
tan(2)
log(1)
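Note that log() in R is the natural logarithm (base e). The lines below are a brief sketch of how to work in other bases, using the base argument of log() and the built-in log10() and log2() functions.

```r
# log() is the natural logarithm, so it undoes exp()
log(exp(1))

# Other bases: use the base argument, or the log10() and log2() shortcuts
log(100, base = 10)
log10(1000)
log2(8)
```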
We can only go so far with performing simple calculations like this. Eventually we will need to store our results for later use. For this, we need to make use of variables.
Basic concepts in R - variables
A variable is a letter or word which takes (or contains) a value. We use the assignment operator: <-
x <- 10
x
myNumber <- 25
myNumber
We can perform arithmetic on variables:
sqrt(myNumber)
We can add variables together:
x + myNumber
We can change the value of an existing variable:
x <- 21
x
We can set one variable to equal the value of another variable:
x <- myNumber
x
We can modify the contents of a variable:
myNumber <- myNumber + sqrt(16)
myNumber
When we are feeling lazy we might give our variables short names (x, y, i, etc.), but it is better practice to give them meaningful names. There are some restrictions on variable names: they cannot start with a number or an underscore, and they cannot contain spaces or operator characters such as - or +. Letters, numbers, . and _ are all allowed. Avoid reusing the names of built-in R objects and functions such as c, T and mean.
Naming variables is a matter of taste. Some conventions exist, such as separating words with . or _, or using CamelCaps (as in myNumber). Whichever convention you decide on, stick with it!
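If you are unsure whether a name is allowed, R can tell you. The sketch below uses the base function make.names(), which converts any text into a syntactically valid name; a candidate that comes back unchanged was already valid. The candidate names themselves are invented for this illustration.

```r
# Candidate names; all of these are made up for illustration
candidates <- c("myNumber", "my.number", "my_number", "my-number", "2fast")

# make.names() converts each string into a syntactically valid name
fixed <- make.names(candidates)
fixed

# A candidate is already valid if make.names() leaves it unchanged
candidates == fixed
```

Here the last two candidates are altered, because one contains a minus sign and the other starts with a number.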
Basic concepts in R - functions
Functions in R perform operations on arguments (the input(s) to the function). We have already used:
sin(x)
This returns the sine of x
In this case the function has one argument: x.
Arguments are always contained in parentheses – curved brackets, () – separated by commas.
Arguments can be named or unnamed, but if they are unnamed they must be ordered (we will see later how to find the right order). The names of the arguments are determined by the author of the function and can be found in the help page for the function. When testing code, it is easier and safer to name the arguments.
seq is a function for generating a numeric sequence from and to particular numbers.
Type ?seq to get the help page for this function.
seq(from = 2, to = 20, by = 4)
seq(2, 20, 4)
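To illustrate why naming arguments is safer, here is a short sketch using only seq(). The variable names a and b are placeholders for this example.

```r
# Named arguments can be given in any order
a <- seq(from = 2, to = 20, by = 4)
b <- seq(by = 4, to = 20, from = 2)
identical(a, b)

# Unnamed arguments are matched by position, so order matters
seq(2, 20, 4)   # from = 2, to = 20, by = 4
seq(4, 20, 2)   # from = 4, to = 20, by = 2: a different sequence
```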
Arguments can have default values, meaning we do not need to specify values for these in order to run the function.
rnorm is a function that will generate a series of values from a normal distribution. In order to use the function, we need to tell R how many values we want
rnorm(n=10)
The normal distribution is defined by a mean (average) and standard deviation (spread). However, in the above example we didn’t tell R what mean and standard deviation we wanted. So how does R know what to do? All arguments to a function and their default values are listed in the help page
(N.B. a help page sometimes describes more than one function)
?rnorm
In this case, we see that the defaults for mean and standard deviation are 0 and 1. We can change the function to generate values from a distribution with a different mean and standard deviation using the mean and sd arguments. It is important that we get the spelling of these arguments exactly right, otherwise R will give an error message, or (worse?) do something unexpected.
rnorm(n = 10, mean = 2, sd = 3)
rnorm(10, 2, 3)
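Because rnorm() draws random values, the output changes on every run. The sketch below is one way to check that the mean and sd arguments behave as expected. It introduces set.seed(), which is not covered above, to make the draws repeatable; the seed value 42 and the deliberately misspelt argument men are arbitrary choices for illustration.

```r
# Fix the random seed so the same values are drawn on every run
# (the seed value 42 is arbitrary)
set.seed(42)
values <- rnorm(n = 1000, mean = 2, sd = 3)

# With many draws, the sample mean and sd should be close to 2 and 3
mean(values)
sd(values)

# A misspelt argument name is an error: rnorm() has no argument called 'men'
result <- tryCatch(rnorm(n = 10, men = 2),
                   error = function(e) conditionMessage(e))
result
```

The tryCatch() call is only there so that the error message is captured and printed rather than stopping the chunk.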
In the examples above, seq and rnorm both output a series of numbers. In R this is called a vector, and it is the most fundamental data type.
Basic concepts in R - vectors
The basic data structure in R is a vector – an ordered collection of values.
R treats even single values as 1-element vectors.
The function c combines its arguments into a vector:
x <- c(3,4,5,6)
x
The square brackets [] indicate the position within the vector (the index).
We can extract individual elements by using the [] notation:
x[1]
x[4]
We can even put a vector inside the square brackets (vector indexing):
Before executing this line of code, what do you think it will produce?
y <- c(2,3)
x[y]
There are a number of shortcuts to create a vector.
Instead of:
x <- c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
x
we can write:
x <- 3:12
x
or we can use the seq() function, which returns a vector:
x <- seq(2, 20, 4)
x
x <- seq(2, 20, length.out=5)
x
or we can use the rep() function:
y <- rep(3, 5)
y
y <- rep(1:3, 5)
y
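The rep() function has a few named arguments that control how the repetition is done. This is a brief sketch of the three most common ones (times, each and length.out), as listed on the ?rep help page.

```r
# times (the second argument) repeats the whole vector
rep(1:3, times = 2)        # 1 2 3 1 2 3

# each repeats every element before moving on to the next
rep(1:3, each = 2)         # 1 1 2 2 3 3

# length.out fixes the length of the result
rep(1:3, length.out = 7)   # 1 2 3 1 2 3 1
```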
We have seen some ways of extracting elements of a vector. We can use these shortcuts to make things easier (or more complex!)
x <- 3:12
# Extract elements from x:
x[3:7]
x[seq(2, 6, 2)]
x[rep(3, 2)]
We can add an element to a vector:
y <- c(x, 1)
y
We can glue vectors together:
z <- c(x, y)
z
We can “remove” element(s) from a vector:
NOTE: the vector x doesn’t get modified; we’re just displaying what the vector looks like without particular elements
x <- 3:12
x[-3]
x[-(5:7)]
x[-seq(2, 6, 2)]
x
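If we do want to drop elements permanently, we have to assign the result back to a variable. A minimal sketch, using a throwaway vector w so that x is left untouched:

```r
# w is a throwaway vector used only for this illustration
w <- 3:12

# Negative indexing returns a copy without the third element...
w[-3]
length(w)    # still 10

# ...so assign the result back to make the change permanent
w <- w[-3]
w
length(w)    # now 9
```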
Finally, we can modify the contents of a vector:
x[6] <- 4
x
x[3:5] <- 1
x
Remember!
Square brackets [ ] for indexing
Parentheses () for function arguments
Basic concepts in R - vector arithmetic
Standard arithmetic operations are applied to vectors element-wise
x <- 1:10
y <- x*2
y
z <- x^2
z
Adding two vectors:
y + z
If vectors are not the same length, the shorter one will be recycled:
x + 1:2
But be careful if the length of the longer vector isn’t a multiple of the length of the shorter one:
x + 1:3
Sometimes R will give a warning message. It has performed the calculation you asked it to, but the results may be unexpected. You need to check the output carefully to make sure it is what you really wanted.
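To see exactly what recycling does when the lengths don't divide evenly, we can build the recycled vector by hand. This is a sketch using rep() with length.out; the names v and recycled are chosen just for this illustration.

```r
# v is an illustrative vector of length 10
v <- 1:10

# Recycling stretches 1:3 to length 10, cutting the last cycle short
recycled <- rep(1:3, length.out = 10)
recycled

# Same values as v + 1:3, but with no warning because the lengths now match
v + recycled

# Compare with the recycled calculation, silencing its warning
identical(suppressWarnings(v + 1:3), v + recycled)
```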
Basic concepts in R - Character vectors and naming
All the vectors we have seen so far have contained numbers, but we can also store text (“strings”) in a vector
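As a minimal sketch of a character vector and of naming, the code below creates a vector of labels and attaches them to a numeric vector. The labels and values are invented for illustration and are not the course's data; the name gene.expression is reused only so that the later names() call has something to work on.

```r
# Illustrative data only: these labels and values are invented
gene.names <- c("geneA", "geneB", "geneC", "geneD")
gene.expression <- c(0.5, 3.2, 1.2, 2.1)

# A character vector is indexed exactly like a numeric one
gene.names[2]

# Attach the labels to the numeric vector
names(gene.expression) <- gene.names
gene.expression

# Named elements can be extracted by name as well as by position
gene.expression["geneB"]
```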
We can also use the names() function to get a vector of the names of an object:
names(gene.expression)
Exercise: Body-Mass Index
Let’s try some vector arithmetic. Here are the weights and heights of five individuals
| Person  | Weight (kg) | Height (cm) |
|---------|-------------|-------------|
| Jo      | 65.8        | 192         |
| Sam     | 67.9        | 179         |
| Charlie | 75.3        | 169         |
| Frankie | 61.9        | 175         |
| Alex    | 92.4        | 171         |
Create weight and height vectors to hold the data in each column using the c function. Create a person vector and use this vector to name the values in the other two vectors.
The body-mass index is given by the formula \(\mathrm{BMI} = \mathrm{Weight} / \mathrm{Height}^2\), where Weight is given in kilograms and Height is given in metres
Create a new vector to record this, called bmi.
Create a new vector bmi.sorted where the bmi values are put in increasing numeric order (HINT: look up the help on the sort function)
The interquartile range (IQR) of a vector is defined as the 75th percentile of the data minus the 25th percentile. Calculate the IQR for our bmi values
Check your answer using the IQR function
### YOUR ANSWER HERE (please) ###
Getting help
This is possibly the most important slide in the whole course!?!
To get help on any R function, type ? followed by the function name. For example:
?seq
This retrieves the syntax and arguments for the function. The help page shows the default order of arguments. It also tells you which package it belongs to.
There is typically a usage example, which you can test using the example function:
example(seq)
If you can’t remember the exact name, type ?? followed by your guess. R will return a list of possibilities:
??mean
The Packages tab in the lower-right panel of RStudio will help you locate the help pages for a particular package and its functions
Often there will be a user-guide or ‘vignette’ too
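As well as reading the help page, we can ask R directly about a function's arguments. The sketch below uses the base functions args() and formals() on rnorm to show the argument order and default values without opening the help viewer.

```r
# args() prints a function's arguments in their default order
args(rnorm)

# formals() returns the arguments as a list, including default values
formals(rnorm)$mean
formals(rnorm)$sd

# names() of that list gives the argument names in order
names(formals(rnorm))
```

This is handy for a quick reminder, but the help page remains the place to find out what each argument actually means.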
R packages
R comes ready loaded with various libraries of functions called packages. For example: the function sum() is in the base package and sd(), which calculates the standard deviation of a vector, is in the stats package
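To check which package a function comes from, one option is the find() function from the utils package, which is normally loaded by default. This is a small sketch; the numbers passed to sd() are arbitrary.

```r
# find() reports where an object lives on the search path
find("sum")
find("sd")

# The package::function form calls a function from a named package explicitly
stats::sd(c(2, 4, 4, 4, 5, 5, 7, 9))
```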
There are thousands of additional packages provided by third parties. They are hosted on servers on the web called repositories
The two repositories you will come across the most are: