Course Introduction

Mark Dunning

28/03/2016

Welcome!

About us

Admin

About the Course

Further disclaimer

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” R.A. Fisher, 1938

If you haven’t designed your experiment properly, then all the bioinformatics we teach you won’t help. Consult your local statistician, preferably not the day before your grant is due!

Course Outline

Day 1

Day 2

Day 3

Day 4

Historical context

Cast your minds back a few years...

Plenty of success stories with microarrays

What did we learn from arrays?

Reproducibility is key

Two biostatisticians (later termed ‘forensic bioinformaticians’) from M.D. Anderson used R extensively during their re-analysis and investigation of a clinical prognostication paper from Duke. The subsequent scandal put reproducible research on the map.

Keith Baggerly’s talk from Cambridge in 2010 is highly recommended.

Advantages of R

The R programming language is now recognised beyond the academic community as an effective solution for data analysis and visualisation. Notable users of R include Facebook, Google, Microsoft (who recently invested in a commercial provider of R), and the New York Times.

Key features

Crash-course in R

Support for R

RStudio

R recap

R can do simple numerical calculations

2 + 2
## [1] 4
sqrt(25)
## [1] 5

Here, sqrt is a function and the number 25 was used as an argument to the function. Functions can have multiple arguments.
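As a small illustration of a function that takes more than one argument, the base R function round() accepts the value to round and a digits argument giving the number of decimal places. The value 3.14159 and the variable names are chosen here purely for illustration.

```r
# round() takes two arguments: the value and the number of decimal places
by_name <- round(3.14159, digits = 2)
by_name
## [1] 3.14

# Arguments can also be matched by position rather than by name
by_position <- round(3.14159, 2)
by_position
## [1] 3.14
```

Naming the argument (digits = 2) is usually easier to read than relying on its position.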

Variables

We can save the result of a computation as a variable using the assignment operator <-

x <- sqrt(25)
x + 5
## [1] 10
y <- x + 5
y
## [1] 10

Vectors

A vector can be used to combine multiple values. The resulting object is indexed (starting from 1) and particular values can be queried using the [] operator.

vec <- c(1,2,3,6)
vec[1]
## [1] 1
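
The [] operator is not limited to a single position. As a brief sketch using the same vec, it can also take a vector of positions, a range, or a logical condition:

```r
vec <- c(1, 2, 3, 6)

# Several positions at once, using a vector of indices
vec[c(1, 3)]
## [1] 1 3

# A range of positions
vec[2:4]
## [1] 2 3 6

# A logical condition keeps only the values that satisfy it
vec[vec > 2]
## [1] 3 6
```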

Vectors

Calculations can be performed on vectors

vec*2
## [1]  2  4  6 12
mean(vec)
## [1] 3
sum(vec)
## [1] 12
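
Arithmetic also works element by element between two vectors of the same length. The sketch below assumes a second vector, other, which is made up for illustration and is not part of the course data:

```r
vec <- c(1, 2, 3, 6)
other <- c(10, 20, 30, 40)  # hypothetical second vector

# Values in the same position are added together
vec + other
## [1] 11 22 33 46

# Other summary functions return a single value, like mean() and sum()
max(vec)
## [1] 6
length(vec)
## [1] 4
```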

Data frames

These can be used to represent familiar tabular (row and column) data

df <- data.frame(A = c(1,2,3,6), B = c(7,8,10,12))
df
##   A  B
## 1 1  7
## 2 2  8
## 3 3 10
## 4 6 12

Data frames

The columns do not need to share the same data type

df <- data.frame(A = c(1,2,3,6), 
                 B = month.name[c(7,8,10,12)])
df
##   A        B
## 1 1     July
## 2 2   August
## 3 3  October
## 4 6 December
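
To check which type each column holds, class() can be applied to the individual columns. In this sketch stringsAsFactors is set explicitly, because its default changed in R 4.0.0: older versions of R convert text columns to factors unless told otherwise, while newer versions keep them as character.

```r
# stringsAsFactors = FALSE keeps the month names as plain text in any R version
df <- data.frame(A = c(1, 2, 3, 6),
                 B = month.name[c(7, 8, 10, 12)],
                 stringsAsFactors = FALSE)

class(df$A)
## [1] "numeric"
class(df$B)
## [1] "character"
```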

Data frames

We can also subset data frames using [], but now we specify both a row index and a column index

df[1,2]
## [1] July
## Levels: August December July October
df[2,1]
## [1] 2

Data frames

df[1,]
##   A    B
## 1 1 July
df[,2]
## [1] July     August   October  December
## Levels: August December July October

Leave the row or column index blank to get all rows or all columns, respectively
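
Two further ways of subsetting are common in practice: selecting a column by name with $, and selecting rows with a logical condition. The sketch below rebuilds the same df, with stringsAsFactors = FALSE set as an assumption so that the output is the same in any version of R.

```r
df <- data.frame(A = c(1, 2, 3, 6),
                 B = month.name[c(7, 8, 10, 12)],
                 stringsAsFactors = FALSE)

# A single column by name
df$A
## [1] 1 2 3 6

# Rows where the condition on column A holds;
# the blank column index keeps all columns
df[df$A > 2, ]
##   A        B
## 3 3  October
## 4 6 December
```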

Plotting

All your favourite types of plot can be created in R
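
As a minimal sketch of three common plot types in base R, the code below uses simulated values; the names values and groups are made up for illustration and are not course data.

```r
# Simulated values, used purely for illustration
set.seed(1)
values <- rnorm(100)
groups <- rep(c("a", "b"), each = 50)

hist(values)                        # histogram of a numeric vector
boxplot(values ~ groups)            # boxplots of the values, split by group
plot(values[1:50], values[51:100])  # scatter plot of two numeric vectors
```

Each call draws a new plot, replacing the previous one in the plotting window.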

Plotting

The Bioconductor project

The Bioconductor project

Many of the packages are by well-respected authors and get lots of citations.

Downloading a package

Each package has its own landing page, e.g. http://bioconductor.org/packages/release/bioc/html/beadarray.html. Here you’ll find:

Introducing the practical