Importing the data

The data from this experiment comprises nine paired tumor/normal colon tissues on Illumina HT12_v3 gene expression Beadchips. These data were generated to inform a comparison of technologies for microRNA profiling. However, we will only use the mRNA data here.

library(GEOquery)
library(limma)
url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE33126/GSE33126_series_matrix.txt.gz"
filenm <- "data/GSE33126_series_matrix.txt.gz"
if(!file.exists(filenm)) download.file(url, destfile=filenm)
gse <- getGEO(filename=filenm)

## File stored at:

## /tmp/RtmpyCVGTI/GPL6947.soft

## Warning in read.table(file = file, header = header, sep = sep, quote =
## quote, : not all columns named in 'colClasses' exist

gse

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 48803 features, 18 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM820516 GSM820517 ... GSM820533 (18 total)
##   varLabels: title geo_accession ... data_row_count (31 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: ILMN_1343291 ILMN_1343295 ... ILMN_2416019 (48803
##     total)
##   fvarLabels: ID nuID ... GB_ACC (30 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6947

head(exprs(gse))

##               GSM820516  GSM820517  GSM820518  GSM820519  GSM820520
## ILMN_1343291 66665.3800 69404.6700 64128.0700 68943.9700 67827.2200
## ILMN_1343295 22040.1100 13046.3400 38678.9600 16641.8900 33719.8900
## ILMN_1651199   226.6081   205.4483   217.2475   229.0451   226.3029
## ILMN_1651209   278.5710   253.7044   211.8002   278.0423   259.8059
## ILMN_1651210   195.4914   195.9835   175.3356   193.9065   229.5674
## ILMN_1651221   217.2764   206.0723   219.5992   205.0462   194.7481
##               GSM820521  GSM820522  GSM820523  GSM820524  GSM820525
## ILMN_1343291 71775.3000 62245.5900 69713.7000 69509.2700 68244.5900
## ILMN_1343295 18933.2900 26170.0400  9906.9130 17166.5200 12428.9500
## ILMN_1651199   203.8710   213.4431   210.4129   229.5394   212.7384
## ILMN_1651209   265.1900   321.2587   273.4458   253.6032   310.1582
## ILMN_1651210   164.0632   244.6696   190.9813   188.1039   199.3084
## ILMN_1651221   233.9729   233.2414   222.1413   244.1758   209.8247
##               GSM820526  GSM820527  GSM820528  GSM820529  GSM820530
## ILMN_1343291 65427.4700 68436.5200 57608.6700 69959.7700 69509.2700
## ILMN_1343295 25297.5700 17535.6000 19749.1400 17854.2900 43670.6800
## ILMN_1651199   226.1345   232.2437   208.7316   229.2948   214.4033
## ILMN_1651209   275.0126   274.9519   250.6420   255.8540   219.5752
## ILMN_1651210   220.6229   213.3975   194.7746   173.7073   185.3380
## ILMN_1651221   227.8108   255.9940   233.5495   194.4097   218.6604
##               GSM820531  GSM820532  GSM820533
## ILMN_1343291 70063.7700 69647.1700 70332.3400
## ILMN_1343295 22849.0800 23725.6600 28747.0100
## ILMN_1651199   216.6758   195.6539   252.1502
## ILMN_1651209   292.4965   253.3126   237.9844
## ILMN_1651210   174.6898   195.3534   191.9382
## ILMN_1651221   271.4561   222.7748   241.0521

Q Do the data look to be normalised?

exprs(gse) <- log2(exprs(gse))
boxplot(exprs(gse),outline=FALSE)

Inspect the clinical variables

Q What column has the information about whether is sample is a tumour or normal?

Q How many do we have in each group?

Q Create a design matrix for the experiment, on paper if you like

Simple t-tests

For completeness, we note that t-statistics and corresponding p-values can be generated quickly using the genefilter package.

pd <- pData(gse)
SampleGroup <- pd$source_name_ch1
library(genefilter)
destats <- rowttests(exprs(gse),fac=SampleGroup)
head(destats)

##               statistic          dm     p.value
## ILMN_1343291  2.6776390  0.08354028 0.016510459
## ILMN_1343295 -3.0922257 -0.66722098 0.006992817
## ILMN_1651199  0.5595122  0.02324258 0.583560512
## ILMN_1651209  1.0935577  0.07758592 0.290340309
## ILMN_1651210 -1.5758446 -0.10374687 0.134625663
## ILMN_1651221  0.2322365  0.01441968 0.819298727

Can we remember the drawbacks of this approach?

A useful function is the model.matrix, which understands the linear model syntax of R (i.e. using the tilde ~ symbol)

design <- model.matrix(~0+SampleGroup)
colnames(design) <- c("Normal","Tumour")

By-hand, we would do;

design <- matrix(nrow=18,ncol=2)
design[,1] <- c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
design[,2] <- c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0)
colnames(design) <- c("Normal","Tumour")

or a shortcut in this would use the rep function.

design[,1] <- rep(c(0,1),by=9)
design[,2] <- rep(c(1,0),by=9)

Prior to the differential expression analysis, we will filter the data so only the top 50% most-variable genes get analysed, which has been shown to increase our power to detect differential expression. We can do this using the varFilter function in the genefilter package.

library(genefilter)
gse.expFilt <- varFilter(gse)
gse.expFilt

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 24401 features, 18 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM820516 GSM820517 ... GSM820533 (18 total)
##   varLabels: title geo_accession ... data_row_count (31 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: ILMN_1343295 ILMN_1651228 ... ILMN_2416019 (24401
##     total)
##   fvarLabels: ID nuID ... GB_ACC (30 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6947

The lmFit funcion is used to fit the model to the data. The output has various components, most of which you probably don’t need to know about.

fit <- lmFit(exprs(gse.expFilt), design)
names(fit)

##  [1] "coefficients"     "rank"             "assign"          
##  [4] "qr"               "df.residual"      "sigma"           
##  [7] "cov.coefficients" "stdev.unscaled"   "pivot"           
## [10] "Amean"            "method"           "design"

dim(fit$coefficients)

## [1] 24401     2

head(fit$coefficients)

##                 Normal    Tumour
## ILMN_1343295 14.032346 14.699567
## ILMN_1651228 13.984486 14.329732
## ILMN_1651229  9.661445  9.528996
## ILMN_1651232  8.089572  8.115662
## ILMN_1651237  9.214117 10.153798
## ILMN_1651238  7.710813  7.545338

Q. What is the interpretation of the `coeff` item in the output

Now we define the contrast

contrasts <- makeContrasts(Tumour - Normal, levels=design)
fit2 <- contrasts.fit(fit, contrasts)
head(fit2$coeff)

##               Contrasts
##                Tumour - Normal
##   ILMN_1343295      0.66722098
##   ILMN_1651228      0.34524651
##   ILMN_1651229     -0.13244960
##   ILMN_1651232      0.02609049
##   ILMN_1651237      0.93968052
##   ILMN_1651238     -0.16547496

Finally, apply the empirical Bayes’ step

fit2 <- eBayes(fit2)
fit2

## An object of class "MArrayLM"
## $coefficients
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##   0.66722098   0.34524651  -0.13244960   0.02609049   0.93968052 
## 24396 more rows ...
## 
## $rank
## [1] 2
## 
## $assign
## NULL
## 
## $qr
## $qr
##          Normal     Tumour
## [1,] -3.0000000  0.0000000
## [2,]  0.3333333  3.0000000
## [3,]  0.0000000 -0.3333333
## [4,]  0.3333333  0.1111111
## [5,]  0.0000000 -0.3333333
## 13 more rows ...
## 
## $qraux
## [1] 1.000000 1.111111
## 
## $pivot
## [1] 1 2
## 
## $tol
## [1] 1e-07
## 
## $rank
## [1] 2
## 
## 
## $df.residual
## [1] 16 16 16 16 16
## 24396 more elements ...
## 
## $sigma
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##    0.4577251    0.2678995    0.4035902    0.1808880    0.7133376 
## 24396 more elements ...
## 
## $cov.coefficients
##                  Contrasts
## Contrasts         Tumour - Normal
##   Tumour - Normal       0.2222222
## 
## $stdev.unscaled
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##    0.4714045    0.4714045    0.4714045    0.4714045    0.4714045 
## 24396 more rows ...
## 
## $Amean
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##    14.365957    14.157109     9.595220     8.102617     9.683957 
## 24396 more elements ...
## 
## $method
## [1] "ls"
## 
## $design
##      Normal Tumour
## [1,]      0      1
## [2,]      1      0
## [3,]      0      1
## [4,]      1      0
## [5,]      0      1
## 13 more rows ...
## 
## $contrasts
##         Contrasts
## Levels   Tumour - Normal
##   Normal              -1
##   Tumour               1
## 
## $df.prior
## [1] 2.839956
## 
## $s2.prior
## [1] 0.077334
## 
## $var.prior
## [1] 11.67797
## 
## $proportion
## [1] 0.01
## 
## $s2.post
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##   0.18958760   0.07260886   0.14998900   0.03944557   0.44380322 
## 24396 more elements ...
## 
## $t
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##    3.2506558    2.7179456   -0.7254829    0.2786695    2.9922044 
## 24396 more rows ...
## 
## $df.total
## [1] 18.83996 18.83996 18.83996 18.83996 18.83996
## 24396 more elements ...
## 
## $p.value
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##  0.004242497  0.013718934  0.477071595  0.783534214  0.007539433 
## 24396 more rows ...
## 
## $lods
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##    -2.271990    -3.376114    -6.317279    -6.545394    -2.816065 
## 24396 more rows ...
## 
## $F
## [1] 10.56676330  7.38722816  0.52632548  0.07765668  8.95328702
## 24396 more elements ...
## 
## $F.p.value
## [1] 0.004242497 0.013718934 0.477071595 0.783534214 0.007539433
## 24396 more elements ...

We usually get our first look at the results by using the topTable command

topTable(fit2)

##                  logFC   AveExpr         t      P.Value    adj.P.Val
## ILMN_1704294  3.269052  9.833508  20.44807 2.544269e-14 6.208271e-10
## ILMN_1813704  4.749574 10.782232  15.96861 2.102340e-12 2.564960e-08
## ILMN_1724686  3.627208 10.814662  15.16574 5.197074e-12 4.227127e-08
## ILMN_1811598 -2.380786  9.150986 -13.41658 4.338822e-11 2.646790e-07
## ILMN_1763749 -5.039637 11.894505 -12.42644 1.599366e-10 7.805228e-07
## ILMN_1652431 -5.998422 11.580616 -11.94769 3.096877e-10 1.259448e-06
## ILMN_1659990  1.259224  8.898440  11.72716 4.228175e-10 1.473881e-06
## ILMN_1702858 -1.511602  9.233874 -11.12718 1.009704e-09 3.079722e-06
## ILMN_1663866  2.038634 14.040020  10.82858 1.577919e-09 4.278090e-06
## ILMN_1657435 -3.354608 10.528543 -10.63195 2.127825e-09 5.111229e-06
##                     B
## ILMN_1704294 21.16212
## ILMN_1813704 17.73053
## ILMN_1724686 16.97912
## ILMN_1811598 15.16284
## ILMN_1763749 14.01219
## ILMN_1652431 13.42051
## ILMN_1659990 13.13976
## ILMN_1702858 12.34868
## ILMN_1663866 11.93958
## ILMN_1657435 11.66437

The rows in this table are ordered according the B-statistic, or log-odds from the empirical Bayes’ analysis. You should be cautious when interpreting these values. Although a B-statistic of 0 corresponds to a 50/50 chance of a gene being DE, this is contingent on a certain percentage of genes in the experiment being DE. In experiments such as the one we are looked at here, there are a large number of DE genes. In this case genes with B-statistic < 0 could still be DE. Thus, you should use the B-statistic as a ranking criteria and not attempt to compare the values between experiments.

Including annotation

anno <- fData(gse.expFilt)
head(anno)[,1:5]

##                        ID               nuID      Species Source
## ILMN_1343295 ILMN_1343295 9fQSYRUddRf4Z6p6T4 Homo sapiens RefSeq
## ILMN_1651228 ILMN_1651228 fZRGweI51.AwJv7v0I Homo sapiens RefSeq
## ILMN_1651229 ILMN_1651229 9EIpq4KI64KL0R57lo Homo sapiens RefSeq
## ILMN_1651232 ILMN_1651232 cqKdrRDdYiRJTTd0aw Homo sapiens RefSeq
## ILMN_1651237 ILMN_1651237 BiXUgqHpd5URfh9LP0 Homo sapiens RefSeq
## ILMN_1651238 ILMN_1651238 WKQZZ4QIE_WAr06inU Homo sapiens RefSeq
##              Search_Key
## ILMN_1343295      GAPDH
## ILMN_1651228   ILMN_992
## ILMN_1651229 ILMN_10123
## ILMN_1651232 ILMN_34533
## ILMN_1651237 ILMN_18895
## ILMN_1651238 ILMN_18299

anno <- anno[,c("Symbol","Entrez_Gene_ID","Chromosome","Cytoband")]
fit2$genes <- anno
topTable(fit2)

##                Symbol Entrez_Gene_ID Chromosome       Cytoband     logFC
## ILMN_1704294     CDH3           1001         16       16q22.1c  3.269052
## ILMN_1813704 KIAA1199          57214         15       15q25.1b  4.749574
## ILMN_1724686    CLDN1           9076          3          3q28c  3.627208
## ILMN_1811598    ADH1B            125          4          4q23b -2.380786
## ILMN_1763749   GUCA2A           2980          1        1p34.2b -5.039637
## ILMN_1652431      CA1            759          8        8q21.2b -5.998422
## ILMN_1659990     HIG2          29923          7        7q32.1a  1.259224
## ILMN_1702858   ADHFE1         137872          8        8q13.1b -1.511602
## ILMN_1663866    TGFBI           7045          5 5q31.1f-q31.2a  2.038634
## ILMN_1657435     MT1M           4499         16         16q13b -3.354608
##                AveExpr         t      P.Value    adj.P.Val        B
## ILMN_1704294  9.833508  20.44807 2.544269e-14 6.208271e-10 21.16212
## ILMN_1813704 10.782232  15.96861 2.102340e-12 2.564960e-08 17.73053
## ILMN_1724686 10.814662  15.16574 5.197074e-12 4.227127e-08 16.97912
## ILMN_1811598  9.150986 -13.41658 4.338822e-11 2.646790e-07 15.16284
## ILMN_1763749 11.894505 -12.42644 1.599366e-10 7.805228e-07 14.01219
## ILMN_1652431 11.580616 -11.94769 3.096877e-10 1.259448e-06 13.42051
## ILMN_1659990  8.898440  11.72716 4.228175e-10 1.473881e-06 13.13976
## ILMN_1702858  9.233874 -11.12718 1.009704e-09 3.079722e-06 12.34868
## ILMN_1663866 14.040020  10.82858 1.577919e-09 4.278090e-06 11.93958
## ILMN_1657435 10.528543 -10.63195 2.127825e-09 5.111229e-06 11.66437

Diagnostic checks

At this point it is useful to take stock and do some diagnostic checks

dotchart(exprs(gse)["ILMN_1704294",])

boxplot(exprs(gse)["ILMN_1704294",]~SampleGroup)

To find how many genes are DE we can use the decideTests function, which also applies multiple-test correction.

Make a note of how many genes are DE, you will need this later

decideTests(fit2)

## TestResults matrix
## ILMN_1343295 ILMN_1651228 ILMN_1651229 ILMN_1651232 ILMN_1651237 
##            1            0            0            0            0 
## 24396 more rows ...

table(decideTests(fit2))

## 
##    -1     0     1 
##  1438 21057  1906

sum(abs(decideTests(fit2))==1)

## [1] 3344

The “Volcano Plot” function is a common way of visualising the results of a DE analysis. The $x$ axis shows the log-fold change and the $y$ axis is some measure of statistical significance, which in this case is the log-odds, or “B” statistic. A characteristic “volcano” shape should be seen.

volcanoplot(fit2)

The function also has options to highlight the postions of the top N genes from the test.

volcanoplot(fit2,highlight=10,names = fit2$genes$"Symbol")

Exporting the results

write.fit(fit2, file = "de-results.txt",adjust="BH")

Getting the results for a particular gene

The topTable function by default only prints the top 10 DE genes. However, by changing the arguments we can make it print the statistics for all the genes that we test.

testResults <- topTable(fit2, number=nrow(fit2))

This gives us the opportunity to look-up the results for particular genes of interest. Lets say we’re interested in “OCIAD2” (for no particular reason…)

testResults[which(testResults$Symbol == "OCIAD2"),]

##              Symbol Entrez_Gene_ID Chromosome Cytoband     logFC  AveExpr
## ILMN_1700306 OCIAD2         132299          4    4p12a 0.7480189 13.40736
## ILMN_1772286 OCIAD2         132299          4    4p12a 0.7877357 10.92200
##                     t      P.Value   adj.P.Val          B
## ILMN_1700306 4.190794 0.0005035995 0.009607765 -0.2227105
## ILMN_1772286 3.538375 0.0022183900 0.024230499 -1.6530080

If we have a list of genes, the %in% function can be used.

mylist <- c("LOC441066","ARF3","FMNL3","CSF1R","XLKD1","TTRAP","DMWD","SYNPO2L","PILRB","LAMP3")

testResults[which(testResults$Symbol %in% mylist),]

##                 Symbol Entrez_Gene_ID Chromosome  Cytoband       logFC
## ILMN_2210753     XLKD1          10894         11  11p15.4a -1.45372426
## ILMN_1785821     TTRAP          51567          6   6p22.2b -0.43214642
## ILMN_1768754     PILRB          29990          7   7q22.1c -0.37606793
## ILMN_1686623     CSF1R           1436          5   5q33.1c -0.39403423
## ILMN_2183216     TTRAP          51567          6   6p22.2b -0.34506377
## ILMN_1657955     FMNL3          91010         12 12q13.12c  0.10533704
## ILMN_1685534     PILRB          29990          7   7q22.1c  0.07702480
## ILMN_1690253   SYNPO2L          79933         10  10q22.2a  0.06322535
## ILMN_1807712     PILRB          29990          7   7q22.1c  0.08345081
## ILMN_1682938      ARF3            377         12 12q13.12a -0.07912146
## ILMN_1714352      DMWD           1762         19 19q13.32a  0.11488061
## ILMN_2170813     LAMP3          27074          3   3q27.1a  0.04895846
## ILMN_1714808 LOC441066         441066          5   5q21.1b  0.04097845
## ILMN_1723984     PILRB          29990          7   7q22.1c  0.04075150
## ILMN_2395214     FMNL3          91010         12 12q13.12c  0.02334380
## ILMN_2170814     LAMP3          27074          3   3q27.1a -0.02860983
##                AveExpr           t      P.Value    adj.P.Val         B
## ILMN_2210753  8.469606 -6.23603683 5.672890e-06 0.0006207362  4.147419
## ILMN_1785821  8.700032 -1.88536236 7.489913e-02 0.2298027154 -4.905780
## ILMN_1768754 11.081650 -1.35504089 1.914305e-01 0.4149090657 -5.680934
## ILMN_1686623 11.593908 -1.25905686 2.233870e-01 0.4549592337 -5.799590
## ILMN_2183216  9.003461 -1.21319562 2.400473e-01 0.4741293394 -5.853696
## ILMN_1657955  8.010168  1.02756048 3.171695e-01 0.5584683031 -6.054871
## ILMN_1685534  8.116290  0.76377698 4.544597e-01 0.6807561543 -6.288670
## ILMN_1690253  7.719182  0.74547643 4.651835e-01 0.6894119309 -6.302515
## ILMN_1807712  8.638967  0.55453559 5.857359e-01 0.7787153453 -6.427850
## ILMN_1682938 11.506627 -0.48960363 6.300665e-01 0.8077682634 -6.462371
## ILMN_1714352  9.153734  0.43380919 6.693539e-01 0.8312757025 -6.488688
## ILMN_2170813  8.521685  0.33382435 7.421991e-01 0.8744761426 -6.528028
## ILMN_1714808  8.273997  0.25762280 7.994910e-01 0.9070871664 -6.551204
## ILMN_1723984  7.887657  0.23765047 8.147175e-01 0.9141899453 -6.556298
## ILMN_2395214  8.693602  0.12395878 9.026609e-01 0.9591042109 -6.577500
## ILMN_2170814  9.633219 -0.07762576 9.389442e-01 0.9745572234 -6.582323

Further filtering and ordering

Q. Create a data frame containing genes with i) adjusted p-value < 0.05 ii) log-fold change > 1. How many genes does this contain?
Q. Now order this data frame according to chromosome

Improved analysis: Doing a paired analysis

Q. What would the design matrix look like to incorporate patient ID into the model
Q. Create this model using the model.matrix function

Q. Proceed to fit your model to the data
Q. How many genes are DE? How does this compare to the previous model?

## Your answer here ##

The estrogen dataset

Recall the estrogen dataset from yesterday

library(affy)
targetsFile <- "estrogen/estrogen.txt"
pd <- read.AnnotatedDataFrame(targetsFile,header=TRUE,sep="",row.names=1)
pData(pd)

##              estrogen time.h
## low10-1.cel    absent     10
## low10-2.cel    absent     10
## high10-1.cel  present     10
## high10-2.cel  present     10
## low48-1.cel    absent     48
## low48-2.cel    absent     48
## high48-1.cel  present     48
## high48-2.cel  present     48

ER <- pData(pd)$estrogen
Time <- factor(pData(pd)$time.h)
design <- model.matrix(~ER+Time)
design

##   (Intercept) ERpresent Time48
## 1           1         0      0
## 2           1         0      0
## 3           1         1      0
## 4           1         1      0
## 5           1         0      1
## 6           1         0      1
## 7           1         1      1
## 8           1         1      1
## attr(,"assign")
## [1] 0 1 2
## attr(,"contrasts")
## attr(,"contrasts")$ER
## [1] "contr.treatment"
## 
## attr(,"contrasts")$Time
## [1] "contr.treatment"

design2 <- model.matrix(~ER*Time)
design2

##   (Intercept) ERpresent Time48 ERpresent:Time48
## 1           1         0      0                0
## 2           1         0      0                0
## 3           1         1      0                0
## 4           1         1      0                0
## 5           1         0      1                0
## 6           1         0      1                0
## 7           1         1      1                1
## 8           1         1      1                1
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$ER
## [1] "contr.treatment"
## 
## attr(,"contrasts")$Time
## [1] "contr.treatment"

Reading and normalising the data

We read the data and then apply rma normalisation. What are the steps?

raw <-ReadAffy(celfile.path = "estrogen", filenames=rownames(pData(pd)),phenoData = pd)
raw

## Warning: replacing previous import 'AnnotationDbi::tail' by 'utils::tail'
## when loading 'hgu95av2cdf'

## Warning: replacing previous import 'AnnotationDbi::head' by 'utils::head'
## when loading 'hgu95av2cdf'

##

## AffyBatch object
## size of arrays=640x640 features (19 kb)
## cdf=HG_U95Av2 (12625 affyids)
## number of samples=8
## number of genes=12625
## annotation=hgu95av2
## notes=

eset <- rma(raw)

## Background correcting
## Normalizing
## Calculating Expression

eset

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 12625 features, 8 samples 
##   element names: exprs 
## protocolData
##   sampleNames: low10-1.cel low10-2.cel ... high48-2.cel (8 total)
##   varLabels: ScanDate
##   varMetadata: labelDescription
## phenoData
##   sampleNames: low10-1.cel low10-2.cel ... high48-2.cel (8 total)
##   varLabels: estrogen time.h
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: hgu95av2

Fitting the models

Fit the model with and without interaction.

library(limma)
fit1 <- lmFit(eset, design)
fit1 <- eBayes(fit1)
topTable(fit1, coef=2)

##              logFC   AveExpr        t      P.Value    adj.P.Val         B
## 910_at    3.484397  9.660238 23.45683 9.857528e-10 4.856645e-06 12.293257
## 41400_at  2.312275 10.041553 23.18878 1.097642e-09 4.856645e-06 12.214778
## 39755_at  1.722290 12.131839 23.06484 1.154054e-09 4.856645e-06 12.177991
## 40117_at  2.415755  9.676557 20.61439 3.292172e-09 1.039092e-05 11.379761
## 31798_at  3.198764 12.115778 18.59763 8.570374e-09 1.848386e-05 10.605992
## 1824_s_at 1.778002  9.238870 18.54821 8.784407e-09 1.848386e-05 10.585509
## 36134_at  2.466614  8.275734 17.51134 1.496103e-08 2.698329e-05 10.137159
## 1884_s_at 2.693047  9.034796 16.99701 1.970251e-08 3.084438e-05  9.900860
## 1854_at   2.924256  8.532099 16.79596 2.198807e-08 3.084438e-05  9.805830
## 673_at    1.476417  9.828894 15.24039 5.372461e-08 6.630068e-05  9.015793

fit2 <- lmFit(eset, design2)
fit2 <- eBayes(fit2)
topTable(fit2, coef=2)

##              logFC   AveExpr        t      P.Value    adj.P.Val        B
## 39642_at  2.939428  7.876515 23.71715 4.741579e-09 3.128295e-05 9.966810
## 910_at    3.113733  9.660238 23.59225 4.955715e-09 3.128295e-05 9.942522
## 31798_at  2.800195 12.115778 16.38509 1.025747e-07 3.511070e-04 7.977290
## 41400_at  2.381040 10.041553 16.22463 1.112418e-07 3.511070e-04 7.916921
## 40117_at  2.555282  9.676557 15.68070 1.472942e-07 3.576234e-04 7.705093
## 1854_at   2.507616  8.532099 15.15848 1.945518e-07 3.576234e-04 7.490766
## 39755_at  1.679331 12.131839 15.06365 2.048314e-07 3.576234e-04 7.450643
## 1824_s_at 1.914637  9.238870 14.87915 2.266129e-07 3.576234e-04 7.371475
## 1126_s_at 1.782825  6.879918 13.83040 4.119252e-07 5.778395e-04 6.892307
## 1536_at   2.662258  5.937222 13.26247 5.795111e-07 7.316327e-04 6.610486

Q. Which one do you prefer? Why?

## 
##     0 
## 12625

Differential Expression Tutorial

Mark Dunning; mark ‘dot’ dunning ‘at’ cruk.cam.ac.uk, Oscar Rueda; oscar ‘dot’ rueda ‘at’ cruk.cam.ac.uk

Last modified: 10 Feb 2017

Importing the data

Q Do the data look to be normalised?

Q What column has the information about whether is sample is a tumour or normal?

Q How many do we have in each group?

Q Create a design matrix for the experiment, on paper if you like

Simple t-tests

Q. What is the interpretation of the `coeff` item in the output

Including annotation

Diagnostic checks

Exporting the results

Getting the results for a particular gene

Further filtering and ordering

Improved analysis: Doing a paired analysis

The estrogen dataset

Reading and normalising the data

Fitting the models

Q. Which one do you prefer? Why?

Differential Expression Tutorial

Mark Dunning; mark ‘dot’ dunning ‘at’ cruk.cam.ac.uk, Oscar Rueda; oscar ‘dot’ rueda ‘at’ cruk.cam.ac.uk

Last modified: 10 Feb 2017

Importing the data

Q Do the data look to be normalised?

Q What column has the information about whether is sample is a tumour or normal?

Q How many do we have in each group?

Q Create a design matrix for the experiment, on paper if you like

Simple t-tests

Q. What is the interpretation of the coeff item in the output

Including annotation

Diagnostic checks

Exporting the results

Getting the results for a particular gene

Further filtering and ordering

Improved analysis: Doing a paired analysis

The estrogen dataset

Reading and normalising the data

Fitting the models

Q. Which one do you prefer? Why?

Q. What is the interpretation of the `coeff` item in the output