gene.expression.txt is a tab-delimited file, so we can use read.delim to import it. The head function is a convenient way to see the first six rows of the resulting data frame.
normalizedValues <- read.delim("gene.expression.txt")
head(normalizedValues)
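The row names of normalizedValues come from the file itself. This is a minimal sketch that assumes the file's header line has one fewer field than the data lines, which is the usual layout when the first column holds probe IDs. The two-probe, two-sample table below is made up for illustration. With that layout, read.delim uses the first column as the row names:

```r
# Hypothetical miniature of a tab-delimited expression file:
# the header names only the samples; each data line starts with a probe ID
toyText <- "S1\tS2\nprobeA\t1.2\t3.4\nprobeB\t0.5\t-0.7\n"

# read.delim passes `text` on to read.table. Because the header has one
# fewer field than the data lines, the first column becomes the row names
toyExpr <- read.delim(text = toyText)

rownames(toyExpr)  # "probeA" "probeB"
colnames(toyExpr)  # "S1" "S2"
```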
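The row and column names can be extracted with rownames and colnames. Each returns a character vector: here, the probe IDs and the sample names respectively.
rownames(normalizedValues)
colnames(normalizedValues)
The gene annotation and the patient metadata are read in the same way. Setting stringsAsFactors = FALSE keeps text columns as character vectors rather than factors.
geneAnnotation <- read.delim("gene.description.txt", stringsAsFactors = FALSE)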
head(geneAnnotation)
patientMetadata <- read.delim("cancer.patients.txt",stringsAsFactors = FALSE)
head(patientMetadata)
table(patientMetadata$er)
0 1
88 249
To get a feel for these data, we will look at how we can subset and order them. For example, we can test which patients are ER negative (a value of 0 in the er column):
patientMetadata$er == 0
We can do the comparison within the square brackets. Remember to include a comma after the condition to index the columns as well. Leaving the column index empty keeps all columns.
erNegPatients <- patientMetadata[patientMetadata$er == 0,]
head(erNegPatients)
or
View(erNegPatients)
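To see what the comma does, here is a sketch on a small hypothetical data frame. The column names mimic patientMetadata and the values are invented. The logical vector selects rows, and the empty slot after the comma keeps every column:

```r
# Hypothetical stand-in for patientMetadata
toyMeta <- data.frame(samplename = c("P1", "P2", "P3", "P4"),
                      er = c(1, 0, 1, 0),
                      age = c(45, 62, 38, 51),
                      stringsAsFactors = FALSE)

isNeg <- toyMeta$er == 0      # one TRUE/FALSE per row
toyNeg <- toyMeta[isNeg, ]    # rows where isNeg is TRUE, all columns

toyNeg$samplename             # "P2" "P4"
ncol(toyNeg)                  # 3: every column is kept
```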
Sorting is supported by the sort() function:
sort(erNegPatients$grade)
[1] 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[55] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
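The order() function returns a set of numeric indices which, when used to subset the vector, give an ordered version of it. The first index points to the smallest value, the second to the next smallest, and so on: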
myvec <- c(90,100,40,30,80,50,60,20,10,70)
myvec
[1] 90 100 40 30 80 50 60 20 10 70
order(myvec)
[1] 9 8 4 3 6 7 10 5 1 2
myvec[9]
[1] 10
myvec[8]
[1] 20
N.B. order will also work on character vectors, which are ordered alphabetically:
firstName <- c("Adam", "Eve", "John", "Mary", "Peter", "Paul", "Joanna", "Matthew", "David", "Sally")
order(firstName)
[1] 1 9 2 7 3 4 8 6 5 10
We can use the output of order() to subset our original vector:
myvec.ord <- myvec[order(myvec)]
myvec.ord
[1] 10 20 30 40 50 60 70 80 90 100
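A quick way to confirm the relationship between the two functions: indexing a vector by its own order() gives exactly what sort() returns, and both accept decreasing = TRUE. This sketch uses the same myvec as above:

```r
myvec <- c(90, 100, 40, 30, 80, 50, 60, 20, 10, 70)

# Indexing by order() reproduces sort()
identical(myvec[order(myvec)], sort(myvec))        # TRUE

# The same holds in the reverse direction
identical(myvec[order(myvec, decreasing = TRUE)],
          sort(myvec, decreasing = TRUE))          # TRUE

# The first index returned by order() points at the smallest value
order(myvec)[1]                                    # 9
```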
Implication: we can use order on a particular column of a data frame, and use the result to sort all the rows. Here we order the age column and use the result to re-order the rows in the data frame:
erNegPatientsByAge <- erNegPatients[order(erNegPatients$age),]
head(erNegPatientsByAge)
To make order go from largest to smallest, set decreasing = TRUE:
erNegPatientsByAge <- erNegPatients[order(erNegPatients$age, decreasing = TRUE),]
head(erNegPatientsByAge)
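order() also accepts several vectors. Ties in the first are broken by the second, and so on. This is the form used further down, where the metadata are re-ordered by ER status and then age. Here is a sketch on invented values:

```r
# Hypothetical patient table
toyMeta <- data.frame(samplename = c("P1", "P2", "P3", "P4", "P5"),
                      er  = c(1, 0, 1, 0, 1),
                      age = c(45, 62, 38, 51, 70),
                      stringsAsFactors = FALSE)

# Sort by ER status first, then by age within each ER group
byErAge <- toyMeta[order(toyMeta$er, toyMeta$age), ]
byErAge$samplename   # "P4" "P2" "P3" "P1" "P5"
```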
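The sorted erNegPatientsByAge data frame can be written to a tab-delimited file with write.table:
write.table(erNegPatientsByAge, file = "erNegativeSubjectsByAge.txt", sep = "\t")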
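By default, write.table quotes text fields and writes the row names as an extra first column with no header entry. As a result, read.delim picks them up as row names when the file is read back. Here is a minimal round-trip sketch on hypothetical data, using a temporary file instead of the real output file:

```r
# Hypothetical patient table, sorted by decreasing age
toyMeta <- data.frame(samplename = c("P1", "P2", "P3"),
                      age = c(52L, 38L, 45L),
                      stringsAsFactors = FALSE)
sorted <- toyMeta[order(toyMeta$age, decreasing = TRUE), ]

# Write to a temporary tab-delimited file (stand-in for the real output file)
outFile <- tempfile(fileext = ".txt")
write.table(sorted, file = outFile, sep = "\t")

# Read back: the header has one fewer field, so the row names are restored
back <- read.delim(outFile, stringsAsFactors = FALSE)
rownames(back)     # "1" "3" "2"
back$samplename    # "P1" "P3" "P2"
```

For a file without quotes or row names, write.table also accepts quote = FALSE and row.names = FALSE.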
### Your Answer Here ###
The subset function can be a bit easier to use, because there is no need for the $ operator to access columns:
chr8Genes <- subset(geneAnnotation, Chromosome == "chr8")
head(chr8Genes)
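One practical difference between subset and square-bracket indexing is how missing values are handled. subset treats an NA in the condition as FALSE, whereas [ ] returns a row of NAs for it. Here is a sketch on a made-up annotation table:

```r
# Hypothetical annotation with one missing chromosome
toyAnno <- data.frame(probe = c("p1", "p2", "p3", "p4"),
                      Chromosome = c("chr8", "chr1", NA, "chr8"),
                      stringsAsFactors = FALSE)

viaSubset  <- subset(toyAnno, Chromosome == "chr8")
viaBracket <- toyAnno[toyAnno$Chromosome == "chr8", ]
viaWhich   <- toyAnno[which(toyAnno$Chromosome == "chr8"), ]

nrow(viaSubset)   # 2
nrow(viaBracket)  # 3: the NA condition yields an all-NA row
nrow(viaWhich)    # 2: which() drops the NA
```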
ESR1 is known to be hugely different between ER-positive and ER-negative patients, so let's find it in our annotation. We could use == to do this, but there are some alternatives that are worth knowing about. match() and grep() are often used to find particular matches:
match("D", LETTERS)
[1] 4
grep("F", rep(LETTERS,2))
[1] 6 32
match("F", rep(LETTERS,2))
[1] 6
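The difference is that match stops at the first hit, whereas grep reports every position. Two more behaviours are worth knowing: match is vectorised over its first argument, and it returns NA, not an error, when there is no match. For exact (non-pattern) matching of all positions, which() with == gives the same answer as grep in this example:

```r
letters2 <- rep(LETTERS, 2)

match("F", letters2)              # 6: first occurrence only
which(letters2 == "F")            # 6 32: every exact match
grep("F", letters2)               # 6 32: every pattern match

# Vectorised lookup; values not found give NA
match(c("D", "Z", "?"), LETTERS)  # 4 26 NA
```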
grep can also do partial matching:
month.name
[1] "January" "February" "March" "April" "May" "June" "July" "August" "September"
[10] "October" "November" "December"
grep("ary",month.name)
[1] 1 2
grep("ber",month.name)
[1] 9 10 11 12
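grep treats its pattern as a regular expression, so partial matching can return more than you intended. Anchors restrict where the pattern may match: ^ for the start and $ for the end. value = TRUE returns the matching elements instead of their positions, and grepl returns a logical vector. The gene symbols below are hypothetical:

```r
grep("^J", month.name, value = TRUE)   # "January" "June" "July"
grepl("ary", month.name)[1:3]          # TRUE TRUE FALSE

# Hypothetical symbols: an unanchored search for "ESR1" also hits "ESR1X"
symbols <- c("ESR1", "ESR2", "GATA3", "ESR1X")
grep("ESR1", symbols)                  # 1 4
grep("^ESR1$", symbols)                # 1: exact whole-string match
```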
%in% returns a logical vector saying whether each element is contained in a (usually shorter) list:
month.name %in% c("May", "June")
[1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
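%in% is closely related to match: x %in% table gives the same result as !is.na(match(x, table)). Its logical result can also be used directly to subset. Here is a short sketch:

```r
wanted <- c("May", "June")

inViaOperator <- month.name %in% wanted
inViaMatch    <- !is.na(match(month.name, wanted))
identical(inViaOperator, inViaMatch)   # TRUE

# Use the logical vector to keep only the matching elements
month.name[month.name %in% wanted]     # "May" "June"
```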
Returning to ESR1, we use match to find its row in the gene annotation, and then extract its probe ID:
rowInd <- match("ESR1", geneAnnotation$HUGO.gene.symbol)
geneAnnotation[rowInd,]
myProbe <- geneAnnotation$probe[rowInd]
myProbe
[1] "NM_000125"
Now, find which row in our expression matrix is indexed by this ID
match(myProbe, rownames(normalizedValues))
[1] 384
normalizedValues[match(myProbe, rownames(normalizedValues)), 1:10]
myGeneExpression <- normalizedValues[match(myProbe,rownames(normalizedValues)),]
class(myGeneExpression)
[1] "data.frame"
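The lookup chain above runs from gene symbol to probe ID, then to an expression row, then to a numeric vector. It can be sketched on toy objects. The column names HUGO.gene.symbol and probe follow the tutorial, but the gene names, probe IDs and values are invented. Checking for NA after each match is a cheap guard, because indexing with NA silently returns a row of NAs instead of failing:

```r
# Hypothetical miniatures of geneAnnotation and normalizedValues
toyAnno <- data.frame(probe = c("NM_A", "NM_B"),
                      HUGO.gene.symbol = c("GENE1", "GENE2"),
                      stringsAsFactors = FALSE)
toyExpr <- data.frame(S1 = c(1.5, -0.2), S2 = c(2.1, 0.3), S3 = c(0.7, 0.9),
                      row.names = c("NM_A", "NM_B"))

rowInd <- match("GENE2", toyAnno$HUGO.gene.symbol)
stopifnot(!is.na(rowInd))                # gene must be in the annotation
probeId <- toyAnno$probe[rowInd]

exprRow <- match(probeId, rownames(toyExpr))
stopifnot(!is.na(exprRow))               # probe must be in the matrix
oneGene <- toyExpr[exprRow, ]

class(oneGene)        # "data.frame": still a one-row data frame
as.numeric(oneGene)   # -0.2 0.3 0.9
```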
We have expression values and want to visualise them against our categorical data. Because myGeneExpression is a one-row data frame, we use as.numeric to create a vector that we can plot. Several functions beginning with as. exist to convert between various data types.
boxplot(as.numeric(myGeneExpression) ~ patientMetadata$er)
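To test whether mean ESR1 expression differs between the two ER groups, we can use a t-test. If we were testing many genes, the p-values would need to be corrected for multiple testing, for example with p.adjust (see ?p.adjust).
t.test(as.numeric(myGeneExpression) ~ patientMetadata$er)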
Welch Two Sample t-test
data: as.numeric(myGeneExpression) by patientMetadata$er
t = -38.746, df = 205.88, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.246953 -1.126198
sample estimates:
mean in group 0 mean in group 1
-1.17388506 0.01269076
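The mention of p.adjust matters once the test is repeated across many genes. This is a minimal sketch that assumes a small made-up matrix with genes in rows and samples in columns (like normalizedValues) and a 0/1 ER vector. It runs one Welch t-test per row via apply, then applies the Benjamini-Hochberg adjustment. Benjamini-Hochberg is just one of the methods p.adjust offers:

```r
# Hypothetical expression matrix: 2 genes x 6 samples
toyExpr <- rbind(geneA = c(2.1, 2.3, 1.9, 0.1, 0.2, -0.1),
                 geneB = c(0.5, 0.4, 0.6, 0.5, 0.7, 0.3))
er <- c(1, 1, 1, 0, 0, 0)   # hypothetical ER status for each sample (column)

# One Welch two-sample t-test per gene; keep only the p-value
pvals <- apply(toyExpr, 1, function(x) t.test(x ~ er)$p.value)

# Adjust for having tested several genes at once
padj <- p.adjust(pvals, method = "BH")
round(cbind(pvals, padj), 4)
```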
Putting these steps together, the complete analysis is:
geneAnnotation <- read.delim("gene.description.txt", stringsAsFactors = FALSE)
patientMetadata <- read.delim("cancer.patients.txt",stringsAsFactors = FALSE)
normalizedValues <- read.delim("gene.expression.txt")
rowInd <- match("ESR1", geneAnnotation$HUGO.gene.symbol)
myProbe <- geneAnnotation$probe[rowInd]
myGeneExpression <- normalizedValues[match(myProbe,rownames(normalizedValues)),]
boxplot(as.numeric(myGeneExpression) ~ patientMetadata$er)
t.test(as.numeric(myGeneExpression) ~ patientMetadata$er)
Welch Two Sample t-test
data: as.numeric(myGeneExpression) by patientMetadata$er
t = -38.746, df = 205.88, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.246953 -1.126198
sample estimates:
mean in group 0 mean in group 1
-1.17388506 0.01269076
Repeat the same steps we performed for the gene ESR1, but for GATA3:
### Your Answer Here ###
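This example has been simplified by the fact that the columns in the expression matrix are in the same order as the rows of the patient metadata. This would normally be the case for data obtained from a public repository such as Gene Expression Omnibus, but it is worth checking. Compare the column names of the expression matrix with the sample names in the metadata: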
colnames(normalizedValues)
[1] "NKI_4" "NKI_6" "NKI_7" "NKI_8" "NKI_9" "NKI_11" "NKI_12" "NKI_13" "NKI_14" "NKI_17"
[11] "NKI_23" "NKI_24" "NKI_26" "NKI_27" "NKI_28" "NKI_29" "NKI_30" "NKI_31" "NKI_32" "NKI_34"
[21] "NKI_35" "NKI_36" "NKI_37" "NKI_38" "NKI_39" "NKI_40" "NKI_41" "NKI_42" "NKI_43" "NKI_44"
[31] "NKI_45" "NKI_48" "NKI_51" "NKI_56" "NKI_57" "NKI_58" "NKI_59" "NKI_60" "NKI_61" "NKI_62"
[41] "NKI_69" "NKI_70" "NKI_71" "NKI_72" "NKI_73" "NKI_75" "NKI_76" "NKI_78" "NKI_79" "NKI_80"
[51] "NKI_83" "NKI_84" "NKI_85" "NKI_86" "NKI_88" "NKI_89" "NKI_90" "NKI_91" "NKI_92" "NKI_93"
[61] "NKI_94" "NKI_95" "NKI_96" "NKI_97" "NKI_98" "NKI_99" "NKI_100" "NKI_102" "NKI_103" "NKI_104"
[71] "NKI_106" "NKI_107" "NKI_108" "NKI_109" "NKI_110" "NKI_111" "NKI_113" "NKI_114" "NKI_116" "NKI_117"
[81] "NKI_118" "NKI_119" "NKI_120" "NKI_122" "NKI_123" "NKI_124" "NKI_125" "NKI_126" "NKI_127" "NKI_128"
[91] "NKI_129" "NKI_130" "NKI_131" "NKI_132" "NKI_133" "NKI_134" "NKI_135" "NKI_136" "NKI_137" "NKI_138"
[101] "NKI_139" "NKI_140" "NKI_141" "NKI_142" "NKI_144" "NKI_145" "NKI_146" "NKI_147" "NKI_148" "NKI_149"
[111] "NKI_150" "NKI_151" "NKI_153" "NKI_154" "NKI_155" "NKI_156" "NKI_157" "NKI_158" "NKI_159" "NKI_160"
[121] "NKI_161" "NKI_162" "NKI_163" "NKI_164" "NKI_165" "NKI_166" "NKI_167" "NKI_169" "NKI_170" "NKI_172"
[131] "NKI_174" "NKI_175" "NKI_176" "NKI_177" "NKI_178" "NKI_179" "NKI_180" "NKI_181" "NKI_182" "NKI_183"
[141] "NKI_184" "NKI_185" "NKI_186" "NKI_187" "NKI_188" "NKI_189" "NKI_190" "NKI_191" "NKI_192" "NKI_193"
[151] "NKI_194" "NKI_195" "NKI_196" "NKI_197" "NKI_198" "NKI_199" "NKI_200" "NKI_201" "NKI_202" "NKI_203"
[161] "NKI_205" "NKI_207" "NKI_208" "NKI_209" "NKI_210" "NKI_212" "NKI_213" "NKI_214" "NKI_215" "NKI_217"
[171] "NKI_218" "NKI_219" "NKI_220" "NKI_221" "NKI_222" "NKI_224" "NKI_226" "NKI_227" "NKI_228" "NKI_229"
[181] "NKI_230" "NKI_231" "NKI_233" "NKI_235" "NKI_236" "NKI_237" "NKI_238" "NKI_239" "NKI_240" "NKI_241"
[191] "NKI_243" "NKI_245" "NKI_246" "NKI_247" "NKI_248" "NKI_249" "NKI_250" "NKI_251" "NKI_252" "NKI_254"
[201] "NKI_256" "NKI_257" "NKI_258" "NKI_259" "NKI_260" "NKI_261" "NKI_263" "NKI_264" "NKI_265" "NKI_266"
[211] "NKI_267" "NKI_268" "NKI_269" "NKI_270" "NKI_271" "NKI_272" "NKI_273" "NKI_274" "NKI_275" "NKI_276"
[221] "NKI_277" "NKI_278" "NKI_280" "NKI_281" "NKI_282" "NKI_283" "NKI_284" "NKI_285" "NKI_286" "NKI_287"
[231] "NKI_288" "NKI_290" "NKI_291" "NKI_292" "NKI_293" "NKI_294" "NKI_295" "NKI_296" "NKI_297" "NKI_298"
[241] "NKI_300" "NKI_301" "NKI_302" "NKI_303" "NKI_304" "NKI_305" "NKI_306" "NKI_307" "NKI_308" "NKI_309"
[251] "NKI_310" "NKI_311" "NKI_312" "NKI_313" "NKI_314" "NKI_315" "NKI_317" "NKI_318" "NKI_319" "NKI_320"
[261] "NKI_321" "NKI_322" "NKI_323" "NKI_324" "NKI_325" "NKI_326" "NKI_327" "NKI_328" "NKI_329" "NKI_330"
[271] "NKI_331" "NKI_332" "NKI_333" "NKI_334" "NKI_335" "NKI_336" "NKI_337" "NKI_338" "NKI_339" "NKI_340"
[281] "NKI_341" "NKI_342" "NKI_343" "NKI_344" "NKI_345" "NKI_346" "NKI_347" "NKI_348" "NKI_349" "NKI_350"
[291] "NKI_351" "NKI_352" "NKI_353" "NKI_354" "NKI_355" "NKI_356" "NKI_357" "NKI_358" "NKI_359" "NKI_360"
[301] "NKI_361" "NKI_362" "NKI_363" "NKI_364" "NKI_365" "NKI_366" "NKI_367" "NKI_368" "NKI_369" "NKI_370"
[311] "NKI_371" "NKI_373" "NKI_374" "NKI_375" "NKI_377" "NKI_378" "NKI_379" "NKI_380" "NKI_381" "NKI_383"
[321] "NKI_385" "NKI_387" "NKI_388" "NKI_389" "NKI_390" "NKI_391" "NKI_392" "NKI_393" "NKI_394" "NKI_395"
[331] "NKI_396" "NKI_397" "NKI_398" "NKI_401" "NKI_402" "NKI_403" "NKI_404"
patientMetadata$samplename
[1] "NKI_4" "NKI_6" "NKI_7" "NKI_8" "NKI_9" "NKI_11" "NKI_12" "NKI_13" "NKI_14" "NKI_17"
[11] "NKI_23" "NKI_24" "NKI_26" "NKI_27" "NKI_28" "NKI_29" "NKI_30" "NKI_31" "NKI_32" "NKI_34"
[21] "NKI_35" "NKI_36" "NKI_37" "NKI_38" "NKI_39" "NKI_40" "NKI_41" "NKI_42" "NKI_43" "NKI_44"
[31] "NKI_45" "NKI_48" "NKI_51" "NKI_56" "NKI_57" "NKI_58" "NKI_59" "NKI_60" "NKI_61" "NKI_62"
[41] "NKI_69" "NKI_70" "NKI_71" "NKI_72" "NKI_73" "NKI_75" "NKI_76" "NKI_78" "NKI_79" "NKI_80"
[51] "NKI_83" "NKI_84" "NKI_85" "NKI_86" "NKI_88" "NKI_89" "NKI_90" "NKI_91" "NKI_92" "NKI_93"
[61] "NKI_94" "NKI_95" "NKI_96" "NKI_97" "NKI_98" "NKI_99" "NKI_100" "NKI_102" "NKI_103" "NKI_104"
[71] "NKI_106" "NKI_107" "NKI_108" "NKI_109" "NKI_110" "NKI_111" "NKI_113" "NKI_114" "NKI_116" "NKI_117"
[81] "NKI_118" "NKI_119" "NKI_120" "NKI_122" "NKI_123" "NKI_124" "NKI_125" "NKI_126" "NKI_127" "NKI_128"
[91] "NKI_129" "NKI_130" "NKI_131" "NKI_132" "NKI_133" "NKI_134" "NKI_135" "NKI_136" "NKI_137" "NKI_138"
[101] "NKI_139" "NKI_140" "NKI_141" "NKI_142" "NKI_144" "NKI_145" "NKI_146" "NKI_147" "NKI_148" "NKI_149"
[111] "NKI_150" "NKI_151" "NKI_153" "NKI_154" "NKI_155" "NKI_156" "NKI_157" "NKI_158" "NKI_159" "NKI_160"
[121] "NKI_161" "NKI_162" "NKI_163" "NKI_164" "NKI_165" "NKI_166" "NKI_167" "NKI_169" "NKI_170" "NKI_172"
[131] "NKI_174" "NKI_175" "NKI_176" "NKI_177" "NKI_178" "NKI_179" "NKI_180" "NKI_181" "NKI_182" "NKI_183"
[141] "NKI_184" "NKI_185" "NKI_186" "NKI_187" "NKI_188" "NKI_189" "NKI_190" "NKI_191" "NKI_192" "NKI_193"
[151] "NKI_194" "NKI_195" "NKI_196" "NKI_197" "NKI_198" "NKI_199" "NKI_200" "NKI_201" "NKI_202" "NKI_203"
[161] "NKI_205" "NKI_207" "NKI_208" "NKI_209" "NKI_210" "NKI_212" "NKI_213" "NKI_214" "NKI_215" "NKI_217"
[171] "NKI_218" "NKI_219" "NKI_220" "NKI_221" "NKI_222" "NKI_224" "NKI_226" "NKI_227" "NKI_228" "NKI_229"
[181] "NKI_230" "NKI_231" "NKI_233" "NKI_235" "NKI_236" "NKI_237" "NKI_238" "NKI_239" "NKI_240" "NKI_241"
[191] "NKI_243" "NKI_245" "NKI_246" "NKI_247" "NKI_248" "NKI_249" "NKI_250" "NKI_251" "NKI_252" "NKI_254"
[201] "NKI_256" "NKI_257" "NKI_258" "NKI_259" "NKI_260" "NKI_261" "NKI_263" "NKI_264" "NKI_265" "NKI_266"
[211] "NKI_267" "NKI_268" "NKI_269" "NKI_270" "NKI_271" "NKI_272" "NKI_273" "NKI_274" "NKI_275" "NKI_276"
[221] "NKI_277" "NKI_278" "NKI_280" "NKI_281" "NKI_282" "NKI_283" "NKI_284" "NKI_285" "NKI_286" "NKI_287"
[231] "NKI_288" "NKI_290" "NKI_291" "NKI_292" "NKI_293" "NKI_294" "NKI_295" "NKI_296" "NKI_297" "NKI_298"
[241] "NKI_300" "NKI_301" "NKI_302" "NKI_303" "NKI_304" "NKI_305" "NKI_306" "NKI_307" "NKI_308" "NKI_309"
[251] "NKI_310" "NKI_311" "NKI_312" "NKI_313" "NKI_314" "NKI_315" "NKI_317" "NKI_318" "NKI_319" "NKI_320"
[261] "NKI_321" "NKI_322" "NKI_323" "NKI_324" "NKI_325" "NKI_326" "NKI_327" "NKI_328" "NKI_329" "NKI_330"
[271] "NKI_331" "NKI_332" "NKI_333" "NKI_334" "NKI_335" "NKI_336" "NKI_337" "NKI_338" "NKI_339" "NKI_340"
[281] "NKI_341" "NKI_342" "NKI_343" "NKI_344" "NKI_345" "NKI_346" "NKI_347" "NKI_348" "NKI_349" "NKI_350"
[291] "NKI_351" "NKI_352" "NKI_353" "NKI_354" "NKI_355" "NKI_356" "NKI_357" "NKI_358" "NKI_359" "NKI_360"
[301] "NKI_361" "NKI_362" "NKI_363" "NKI_364" "NKI_365" "NKI_366" "NKI_367" "NKI_368" "NKI_369" "NKI_370"
[311] "NKI_371" "NKI_373" "NKI_374" "NKI_375" "NKI_377" "NKI_378" "NKI_379" "NKI_380" "NKI_381" "NKI_383"
[321] "NKI_385" "NKI_387" "NKI_388" "NKI_389" "NKI_390" "NKI_391" "NKI_392" "NKI_393" "NKI_394" "NKI_395"
[331] "NKI_396" "NKI_397" "NKI_398" "NKI_401" "NKI_402" "NKI_403" "NKI_404"
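Rather than comparing these by eye, we can test them element-wise with ==, which gives one TRUE or FALSE per sample. The all function is a quick shortcut to check that every comparison is TRUE: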
colnames(normalizedValues) == patientMetadata$samplename
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[22] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[43] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[64] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[127] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[148] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[169] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[190] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[211] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[232] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[253] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[274] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[295] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[316] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[337] TRUE
all(colnames(normalizedValues) == patientMetadata$samplename)
[1] TRUE
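all() returns TRUE for an empty vector, so all(x == y) can pass when y is unexpectedly empty. A typo in a column name is one way this happens. identical() compares the two vectors as a whole and does not have this blind spot. Here is a sketch with made-up sample names:

```r
# Hypothetical metadata and expression column names
toyMeta <- data.frame(samplename = c("S1", "S2", "S3"),
                      stringsAsFactors = FALSE)
exprCols <- c("S1", "S2", "S3")

# Typo in the column name: `$` returns NULL for a column that does not exist
typo <- toyMeta$sampleName
all(exprCols == typo)                    # TRUE, because exprCols == NULL is logical(0)
identical(exprCols, typo)                # FALSE

identical(exprCols, toyMeta$samplename)  # TRUE
```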
Let’s say that our metadata have been re-ordered by ER status and age, rather than by patient ID:
patientMetadata <- patientMetadata[order(patientMetadata$er,patientMetadata$age),]
patientMetadata
rowInd <- match("ESR1", geneAnnotation$HUGO.gene.symbol)
myProbe <- geneAnnotation$probe[rowInd]
myGeneExpression <- normalizedValues[match(myProbe,rownames(normalizedValues)),]
boxplot(as.numeric(myGeneExpression) ~ patientMetadata$er)
t.test(as.numeric(myGeneExpression) ~ patientMetadata$er)
Welch Two Sample t-test
data: as.numeric(myGeneExpression) by patientMetadata$er
t = -1.7848, df = 133.53, p-value = 0.07656
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.29727383 0.01525417
sample estimates:
mean in group 0 mean in group 1
-0.3990460 -0.2580361
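The difference between the ER groups is no longer significant, even though the expression data have not changed. The reason is that the ER labels are now paired with the wrong samples. If we run the same check as before on the column names and patient IDs, we see that it fails: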
all(colnames(normalizedValues) == patientMetadata$samplename)
[1] FALSE
A solution is to use match again. Specifically, we want to know where each column in the expression matrix can be found in the patient metadata. The result is a vector, each item of which is an index for a particular row in the patient metadata:
match(colnames(normalizedValues),patientMetadata$samplename)
[1] 143 260 53 62 244 105 54 144 245 246 145 68 134 206 32 311 327 312 328 329 330 313 331 314 221 324 332
[28] 333 334 85 106 5 146 90 2 111 196 207 112 100 147 113 63 122 281 82 83 335 325 20 16 80 91 6
[55] 11 76 4 48 7 87 114 208 88 35 9 92 64 282 39 13 17 115 247 36 297 283 57 336 337 298 231
[82] 86 162 179 248 116 284 117 163 285 180 58 28 232 93 118 49 8 148 197 149 222 12 123 24 249 233 25
[109] 261 43 101 164 21 135 262 165 209 69 198 94 223 286 119 77 124 181 199 136 166 224 150 65 225 66 250
[136] 33 102 18 182 167 44 168 59 137 151 125 251 97 152 287 210 50 234 22 183 126 184 95 60 263 288 200
[163] 211 153 289 40 264 154 70 42 120 226 169 212 14 201 71 127 29 213 3 185 170 235 72 78 41 138 236
[190] 37 202 252 19 290 34 203 128 265 227 107 266 10 186 171 172 291 129 173 38 187 267 26 27 79 174 188
[217] 292 268 269 23 108 270 253 254 255 271 214 189 155 204 98 272 130 256 228 273 257 229 109 293 99 237 238
[244] 190 139 140 191 45 156 192 51 175 239 141 131 142 193 110 258 205 132 215 157 55 176 30 274 158 275 1
[271] 259 73 46 103 67 240 89 74 216 194 52 217 218 47 241 276 159 294 219 56 160 195 104 242 295 277 121
[298] 220 278 279 177 230 178 75 299 296 280 96 31 300 301 302 315 316 84 317 318 319 320 321 326 322 323 303
[325] 304 305 306 307 308 309 81 310 15 161 61 243 133
The vector we have just generated can then be used to re-order the rows in the patient metadata:
patientMetadata <- patientMetadata[match(colnames(normalizedValues),patientMetadata$samplename),]
patientMetadata
all(colnames(normalizedValues) == patientMetadata$samplename)
[1] TRUE
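The re-ordering step can be sketched on toy objects. The sketch also checks that every expression column was found in the metadata: match returns NA for a missing sample, and indexing with NA would otherwise produce a row of NAs. Sample names and values are invented:

```r
exprCols <- c("S1", "S2", "S3", "S4")   # column order of the expression matrix

# Hypothetical metadata, shuffled relative to the expression columns
toyMeta <- data.frame(samplename = c("S3", "S1", "S4", "S2"),
                      er = c(0, 1, 1, 0),
                      stringsAsFactors = FALSE)

idx <- match(exprCols, toyMeta$samplename)   # row holding each column's sample
idx                                          # 2 4 1 3
stopifnot(!anyNA(idx))                       # every sample must be present

toyMeta <- toyMeta[idx, ]
identical(toyMeta$samplename, exprCols)      # TRUE
toyMeta$er                                   # 1 0 0 1
```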
We can now proceed with the analysis and get the result we expect:
rowInd <- match("ESR1", geneAnnotation$HUGO.gene.symbol)
myProbe <- geneAnnotation$probe[rowInd]
myGeneExpression <- normalizedValues[match(myProbe,rownames(normalizedValues)),]
boxplot(as.numeric(myGeneExpression) ~ patientMetadata$er)
t.test(as.numeric(myGeneExpression) ~ patientMetadata$er)
Welch Two Sample t-test
data: as.numeric(myGeneExpression) by patientMetadata$er
t = -38.746, df = 205.88, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.246953 -1.126198
sample estimates:
mean in group 0 mean in group 1
-1.17388506 0.01269076