22 February 2016

Data organisation.

URL for the lesson files.

Follow the link below.

bit.ly/1oVvND1

Click "View raw" to download the zip file.

Extract the zip file

R data wrangling

Benefits

  • Needed for large datasets too big for Excel, e.g. sequencing data.
  • Can automate the quality control procedure.
  • Reproducible.
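
To make the "automate the quality control" point concrete, here is a minimal sketch of scripted checks. The data frame `toy`, its values, and the cut-off of 300 are all made up for illustration; only the column names echo the survey data used later.

```r
# Toy data standing in for a real survey table (all values are made up)
toy <- data.frame(
  Sex    = c("F", "M", "", "M", "F"),
  Weight = c(42, NA, 35, 500, 38),
  stringsAsFactors = FALSE
)

# One logical column per check; the cut-off of 300 is an arbitrary assumption
qc <- cbind(
  missing_sex    = toy$Sex == "",
  missing_weight = is.na(toy$Weight),
  extreme_weight = !is.na(toy$Weight) & toy$Weight > 300
)

# Number of failures per check
colSums(qc)

# Row numbers that fail at least one check
which(rowSums(qc) > 0)
```

Because the checks live in a script, the same checks can be re-run unchanged whenever the spreadsheet is updated.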

Downsides

  • Pre-formatting required
  • Excel is sometimes better for simple wrangling

You will learn more about these possibilities this afternoon!

Importing data into R.

  # read in the data file
  setwd("~/Biochem_R_training/data/")
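  # stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
  # so that summary() can count the levels of Sex
  dat <- read.csv("survey_data_spreadsheet_messy_fixed.csv", header = TRUE,
                  stringsAsFactors = TRUE)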
  # Summarise some columns
  summary(dat$Sex)
##     F  M 
##  6 48 38
  summary(dat$Weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    7.00   32.50   42.00   65.54  113.50  232.00      12
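
In the output above, the unlabelled Sex column (6 rows) corresponds to blank entries, and `NA's` counts the missing weights. The sketch below shows how blank levels and missing values might be inspected directly; it uses a small made-up data frame `toy` rather than the lesson file.

```r
# Toy data with one blank Sex entry and one missing Weight (values are made up)
toy <- data.frame(
  Sex    = factor(c("F", "M", "", "M")),
  Weight = c(40, NA, 35, 45)
)

table(toy$Sex)                    # the blank level appears as an unnamed column
sum(is.na(toy$Weight))            # number of missing weights
mean(toy$Weight)                  # NA, because one value is missing
mean(toy$Weight, na.rm = TRUE)    # mean of the recorded weights only
```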

Visualise your data

  # Let's see what plot() does with weight: it plots each value against its row index, which is not that useful.
  plot(dat$Weight)

Visualise your data

  # A histogram is what we want
  hist(dat$Weight, breaks=10)
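
Note that `breaks = 10` is only a suggestion: `hist()` picks "pretty" break points near that number. As a rough sketch, the bins R actually chose can be inspected by asking `hist()` not to draw. The vector `w` below is made up for illustration.

```r
# Made-up weights, including a missing value (hist() ignores non-finite values)
w <- c(7, 20, 33, 42, 45, NA, 110, 120, 232)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(w, breaks = 10, plot = FALSE)

h$breaks   # the bin edges R chose
h$counts   # how many values fall in each bin
```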

Plotting variables against each other

  # Let's see the weight by species.
  # You can clearly see that DS is much heavier on average.
  plot(dat$Weight ~ dat$Species)
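
When the right-hand side of the formula is a factor, `plot()` draws one box per level. To back up a visual impression with numbers, group means can be computed with `tapply()`. This is a minimal sketch on made-up data: the species code `DM` and all weights are hypothetical.

```r
# Toy data: "DS" appears in the lesson data, "DM" and all weights are made up
toy <- data.frame(
  Species = factor(c("DS", "DS", "DM", "DM", "DM")),
  Weight  = c(120, 130, 40, NA, 44)
)

# Mean weight per species, skipping missing values
means <- tapply(toy$Weight, toy$Species, mean, na.rm = TRUE)
means
```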

Plotting variables against each other

  # What about the relationship between plot and weight?
  plot(dat$Weight ~ dat$Plot)

Box and whisker

  # We can make R give us what we want: boxplot() draws one box per plot.
  boxplot(dat$Weight ~ dat$Plot)
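
The same plot can be written with the `data` argument, which avoids repeating `dat$` and makes it easy to add axis labels. A minimal sketch on a made-up data frame `toy` (values are hypothetical):

```r
# Toy data: three plots with two weights each (values are made up)
toy <- data.frame(
  Plot   = c(1, 1, 2, 2, 3, 3),
  Weight = c(40, 45, 110, 120, 35, 38)
)

# One box per plot; boxplot() also returns the statistics it drew
b <- boxplot(Weight ~ Plot, data = toy, xlab = "Plot", ylab = "Weight")

b$names   # the group labels on the x axis
b$n       # number of observations in each box
```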