R screenshot
New York Times, Jan 2009
https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
http://www.revolutionanalytics.com/companies-using-r
Sidney Harris - New York Times
New York Times, July 2011
According to recent editorials, the reproducibility crisis is still ongoing
Nature, May 2016
rstudio
To launch RStudio, find the RStudio icon and click it.
RStudio screenshot
print("Hello World")
2 + 2
2 - 2
4 * 3
10 / 2
Note: The number in the square brackets is an indicator of the position in the output. In this case the output is a ‘vector’ of length 1 (i.e. a single number). More on vectors coming up…
For expressions involving multiple operations, R follows the BODMAS convention (Brackets, Orders, Division and Multiplication, Addition and Subtraction) to decide the order in which the operations are performed.
2 + 2 * 3
2 + (2 * 3)
(2 + 2) * 3
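A few further cases may help when checking the order of operations. This is only a small sketch using base R operators; the exponent operator ^ (the "Orders" in BODMAS) does not appear in the examples above and is included here as an extra.

```r
# Multiplication is done before addition
2 + 2 * 3      # 8, the same as 2 + (2 * 3)

# Exponents are done before the minus sign is applied
-2^2           # -4, read as -(2^2)
(-2)^2         # 4, brackets force the negation first

# Division and multiplication have equal rank and run left to right
12 / 2 * 3     # 18, read as (12 / 2) * 3
```

When in doubt, adding brackets costs nothing and makes the intended order explicit.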
R is capable of more complicated arithmetic, such as trigonometry and logarithms, like you would find on a fancy scientific calculator. Of course, R also has a plethora of statistical operations, as we will see.
pi
sin(pi/2)
cos(pi)
tan(2)
log(1)
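Two details of these functions are easy to miss: the trigonometric functions work in radians, and log gives the natural logarithm unless told otherwise. The lines below are a short sketch of the related base R functions (log10, log2 and the base argument of log), which are not used elsewhere in this section.

```r
log(exp(1))        # natural logarithm of e
log10(1000)        # logarithm to base 10
log2(8)            # logarithm to base 2
log(8, base = 2)   # the same calculation, using the base argument

sin(pi / 2)        # the angle is given in radians, not degrees
sin(pi)            # a tiny number rather than exactly 0, due to floating-point rounding
```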
We can only go so far with performing simple calculations like this. Eventually we will need to store our results for later use. For this, we need to make use of variables.
We use the assignment operator <- to give a variable a value:
x <- 10
x
myNumber <- 25
myNumber
sqrt(myNumber)
x + myNumber
x <- 21
x
x <- myNumber
x
myNumber <- myNumber + sqrt(16)
myNumber
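One point worth checking for yourself is that assignment copies the value at that moment; it does not link the two variables. A minimal sketch, reusing the names from the lines above:

```r
myNumber <- 25
x <- myNumber                      # x takes the current value of myNumber
myNumber <- myNumber + sqrt(16)    # myNumber is updated to 29

x                                  # still 25: x did not follow the change
myNumber                           # 29
```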
When we are feeling lazy we might give our variables short names (x, y, i, etc.), but a better practice is to give them meaningful names. There are some restrictions on variable names: they cannot start with a number or an underscore, and they cannot contain spaces or operator characters such as -, + or *. Letters, digits, . and _ are all allowed. Naming variables the same as in-built functions or constants in R, such as c, T or mean, should also be avoided.
Naming variables is a matter of taste. Some conventions exist, such as separating words with . or _, or using CamelCaps. Whatever convention you decide on, stick with it!
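The sketch below illustrates these points with made-up variable names (gene.count, gene_count and geneCount are purely illustrative). Dots and underscores are accepted by R inside a name, and the base R function make.names shows how R would repair a name that is not valid.

```r
gene.count <- 12        # words separated by a dot
gene_count <- 12        # words separated by an underscore
geneCount  <- 12        # camel case

# make.names() turns an invalid name into a valid one
make.names("2genes")    # "X2genes": a name cannot start with a number
make.names("my-gene")   # "my.gene": a minus sign is not allowed in a name

# T is just a variable that holds TRUE, so it can be overwritten by accident
T <- 0
isTRUE(T)               # FALSE
rm(T)                   # remove our copy so that T means TRUE again
```

Writing TRUE and FALSE in full avoids the last problem, because those two words are reserved and cannot be reassigned.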
sin(x)
Arguments can be named or unnamed, but if they are unnamed they must be ordered (we will see later how to find the right order). The names of the arguments are determined by the author of the function and can be found in the help page for the function. When testing code, it is easier and safer to name the arguments.
seq is a function for generating a numeric sequence from and to particular numbers. Type ?seq to get the help page for this function.
seq(from = 2, to = 20, by = 4)
seq(2, 20, 4)
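To see why naming arguments is the safer habit, compare what happens when the order changes. This is a small sketch; the variable names (named, unnamed, shuffled, muddled) are only for illustration.

```r
named    <- seq(from = 2, to = 20, by = 4)   # every argument named
unnamed  <- seq(2, 20, 4)                    # matched by position: from, to, by
shuffled <- seq(by = 4, to = 20, from = 2)   # named, so the order does not matter
muddled  <- seq(4, 20, 2)                    # unnamed and reordered: a different sequence

identical(named, unnamed)     # TRUE
identical(named, shuffled)    # TRUE
identical(named, muddled)     # FALSE
```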
Arguments can have default values, meaning we do not need to specify values for these in order to run the function.
rnorm is a function that will generate a series of values from a normal distribution. In order to use the function, we need to tell R how many values we want:
rnorm(n = 10)
The normal distribution is defined by a mean (average) and standard deviation (spread). However, in the above example we didn’t tell R what mean and standard deviation we wanted. So how does R know what to do? All arguments to a function and their default values are listed in the help page (N.B. sometimes help pages can describe more than one function):
?rnorm
In this case, we see that the defaults for mean and standard deviation are 0 and 1. We can change the function to generate values from a distribution with a different mean and standard deviation using the mean and sd arguments. It is important that we get the spelling of these arguments exactly right, otherwise R will give an error message, or (worse?) do something unexpected.
rnorm(n = 10, mean = 2, sd = 3)
rnorm(10, 2, 3)
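Because rnorm returns different numbers on every call, it can be hard to check that the arguments did what you intended. The sketch below uses two base R tools that this section has not introduced: formals, to read the defaults without opening the help page, and set.seed, to make the random draws repeatable. The seed value 1 and the variable names are arbitrary choices for illustration.

```r
# The defaults are stored with the function itself
formals(rnorm)$mean    # 0
formals(rnorm)$sd      # 1

# Setting the same seed before each call gives the same "random" values
set.seed(1)
first <- rnorm(n = 5, mean = 2, sd = 3)
set.seed(1)
second <- rnorm(n = 5, mean = 2, sd = 3)
identical(first, second)    # TRUE

# With many values, the sample mean and sd come out close to those requested
set.seed(1)
big <- rnorm(n = 10000, mean = 2, sd = 3)
mean(big)
sd(big)
```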
In the examples above, seq and rnorm were both outputting a series of numbers, which is called a vector in R and is the most fundamental data type.
The function c combines its arguments into a vector:
x <- c(3, 4, 5, 6)
x
The square brackets [] indicate the position within the vector (the index). We can extract individual elements by using the [] notation:
x[1]
x[4]
y <- c(2,3)
x[y]
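A couple of edge cases are worth trying at this point. The sketch below uses the base R function length, which has not been introduced yet, to find the last element, and shows what R returns when the index points past the end of the vector.

```r
x <- c(3, 4, 5, 6)

length(x)        # 4: the number of elements in x
x[length(x)]     # 6: the last element
x[5]             # NA: there is no fifth element, and R does not raise an error
x[c(1, 1, 4)]    # 3 3 6: an index may be repeated
```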
x <- c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
x
x <- 3:12
x
Using the seq() function, which returns a vector:
x <- seq(2, 20, 4)
x
x <- seq(2, 20, length.out=5)
x
Using the rep() function:
y <- rep(3, 5)
y
y <- rep(1:3, 5)
y
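The second argument of rep can also be named, which makes the kind of repetition explicit. A short sketch of the times and each arguments, which are described on the rep help page:

```r
rep(1:3, times = 2)                  # 1 2 3 1 2 3: the whole vector is repeated
rep(1:3, each = 2)                   # 1 1 2 2 3 3: each element is repeated in turn
rep(c("a", "b"), times = c(2, 3))    # "a" "a" "b" "b" "b": one count per element
```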
x <- 3:12
# Extract elements from x:
x[3:7]
x[seq(2, 6, 2)]
x[rep(3, 2)]
y <- c(x, 1)
y
z <- c(x, y)
z
x <- 3:12
x[-3]
x[-(5:7)]
x[-seq(2, 6, 2)]
x
x[6] <- 4
x
x[3:5] <- 1
x
Remember!
x <- 1:10
y <- x*2
y
z <- x^2
z
y + z
x + 1:2
x + 1:3
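The last two lines rely on what R calls recycling: when two vectors differ in length, the shorter one is repeated until it matches the longer one. A minimal sketch of what happens in each case:

```r
x <- 1:10

x + 1:2    # 1:2 is used five times over: 2 4 4 6 6 8 8 10 10 12
x + 1:5    # 1:5 is used twice: 2 4 6 8 10 7 9 11 13 15

# 10 is not a multiple of 3, so R still recycles 1:3 but also gives a warning
x + 1:3    # 2 4 6 5 7 9 8 10 12 11
```

The warning is worth taking seriously, since a length mismatch like this is more often a mistake than a deliberate choice.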
gene.names <- c("Pax6", "Beta-actin", "FoxP2", "Hox9")
gene.names
The elements of a vector can be given names using the names() function, which can be useful to keep track of the meaning of our data:
gene.expression <- c(0, 3.2, 1.2, -2)
names(gene.expression) <- gene.names
gene.expression
We can also use the names() function to get a vector of the names of an object:
names(gene.expression)
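Once a vector has names, they can be used inside the square brackets in place of positions. A short sketch, rebuilding the same vector so that it runs on its own:

```r
gene.names <- c("Pax6", "Beta-actin", "FoxP2", "Hox9")
gene.expression <- c(0, 3.2, 1.2, -2)
names(gene.expression) <- gene.names

gene.expression["FoxP2"]              # one element, selected by name
gene.expression[c("Hox9", "Pax6")]    # several elements, in the order requested
unname(gene.expression["FoxP2"])      # the value on its own, without the name
```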
Person | Weight (kg) | Height (cm) |
---|---|---|
Jo | 65.8 | 192 |
Sam | 67.9 | 179 |
Charlie | 75.3 | 169 |
Frankie | 61.9 | 175 |
Alex | 92.4 | 171 |
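- Create weight and height vectors to hold the data in each column of the table, using the c function. Create a person vector and use this vector to name the values in the other two vectors.
- Calculate the body mass index of each person and store the results in a vector called bmi.
- Create a vector called bmi.sorted where the bmi values are put in increasing numeric order (HINT: look up the help on the sort function).
- Summarise the spread of the bmi values using the IQR function.
### YOUR ANSWER HERE (please) ###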
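To get help on a function, type ? followed by the function name. For example:
?seq
To see worked examples for a function, use the example function:
example(seq)
If you do not know the name of the function, type ?? followed by your guess. R will return a list of possibilities:
??mean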
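The same help system can be reached through ordinary function calls, which is handy inside scripts. The sketch below uses base R functions that this section does not otherwise mention (help, apropos and args); it is offered as a convenience rather than as part of the course material.

```r
help("seq")        # opens the same page as ?seq
apropos("^seq")    # names of available objects that start with "seq"
args(rnorm)        # a quick look at the argument names and their defaults
```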
sum() is in the base package and sd(), which calculates the standard deviation of a vector, is in the stats package.
CRAN packages can be installed using install.packages():
install.packages("name.of.my.package")
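Bioconductor packages are installed with the install() function from the BiocManager package, which has replaced the older biocLite() script:
install.packages("BiocManager")
BiocManager::install("PackageName")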
For a CRAN package such as ggplot2, use the install.packages() function to install it:
install.packages("ggplot2")
DESeq2 is a Bioconductor package (http://www.bioconductor.org) for the analysis of RNA-seq data:
install.packages("BiocManager")
BiocManager::install("DESeq2")
Use the library(...) function to load the newly installed features:
library(ggplot2) # loads ggplot functions
library(DESeq2) # loads DESeq functions
library() # lists all the packages you've got installed
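Installing a package that is already present wastes time, so scripts often check first. The following is a minimal sketch, assuming a CRAN package; install_if_missing is a hypothetical helper name made up for this example, built on the base R function requireNamespace.

```r
# Hypothetical helper (not part of R): install a CRAN package only if it is missing
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package is available after the check
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ok <- install_if_missing("stats")    # stats ships with R, so nothing is downloaded
ok
```

A call such as install_if_missing("ggplot2") would then install ggplot2 only on machines where it is not yet available; library(ggplot2) is still needed afterwards to load it.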