(Hint: How many breaks are there? Consider both reads that overlap the break and those that span the break.)
Advanced: Can you sketch how reads might overlap translocations?
(Hint: switch “Show soft-clipped bases” on under Tracks -> Preferences -> Alignments)
(Hint: Use the read.table function with the parameters: header, sep, stringsAsFactors, skip and comment.char. You will also need to reformat the chromosome co-ordinates e.g. change 1 to chr1.)
Generate a plot showing a summary of the number, type and location of SVs.
Generate a plot showing the size distributions of the different SV classes.
## sample chr startpos endpos nMajor nMinor
## 1 HCC1143 1 564621 6513908 2 2
## 2 HCC1143 1 6529823 8165619 2 1
## 3 HCC1143 1 8165719 8741438 2 2
## 4 HCC1143 1 8751268 9007556 4 1
## 5 HCC1143 1 9008739 20135822 2 1
## 6 HCC1143 1 20137714 23432346 2 2
(Hint: use the genomicTrackPlotRegion function)
(Hint: Firstly, convert the copy-number breakpoints (not segments!) to a GRanges object. Then convert the bedpe table to an InteractionSet object. You can use the countOverlaps function to test for concordance within a specified distance.)
Within 10,000bp (count and fraction)?
## [1] 186
## [1] 0.2447368
Within 100,000bp (count and fraction)?
## [1] 344
## [1] 0.4526316
You may think that this isn’t particularly good. Perhaps BRASS is missing some calls? Perhaps the copy-number is noisy? Can we improve this?
(Advanced: explore the events that don’t have matching breakpoints. Can you work out what might be going wrong?)
(Hint: use the View() function to look at the complete table os SVs. Use the documentation located here to help decipher the column descriptors.
Advanced: In the Day 4 directory you will also find SV calls from another caller, MANTA (somaticSV.bedpe). Can you compute the overlap between callers? Which is better?
What might the functional consequence of this rearrangement be?
Look at the GRASS annotations in the BRASS bedpe output for the cell line. Extract the events which involve two different genes. Which of these will NOT give rise to a viable fusion protein?
## svclass gene1 strand1.1 gene2 strand2.1
## 8 tandem-duplication RPRD2 1 CTSS -1
## 143 deletion GRIP1 -1 RAP1B 1
## 159 tandem-duplication C14orf159 1 SMEK1 -1
## 184 deletion MAST1 1 DNASE2 -1
## 191 inversion CYP4F22 1 CYP4F12 1
## 194 deletion DDA1 1 PLVAP -1
## 209 inversion HUNK 1 PCBP3 1
## 211 tandem-duplication PRDM15 -1 UMODL1 1
## 213 tandem-duplication PRDM15 -1 PCBP3 1
## 217 tandem-duplication KDM6A 1 UXT -1
## svclass gene1 strand1.1 region_number1 total_region_count1
## 16 deletion RBBP5 -1 10 14
## 36 tandem-duplication MTMR14 1 7 19
## 82 translocation HECW1 1 16 30
## gene2 strand2.1 region_number2 total_region_count2
## 16 RBM34 -1 6 11
## 36 FANCD2 1 32 44
## 82 HELZ -1 20 33
Advanced: What protein domains are included in these fusions?
(Hint: use the SKY karyotyping of this cell-line to assist in your search.)
(Hint: use the boundingBox function to get the genomic coordinates spanning the double-minute. Use the list of cancer genes found in the COSMIC.67 package to annotate these regions - data(cgc_67, package = “COSMIC.67”))
## 'select()' returned many:many mapping between keys and columns
## [1] "CREB3L1" "DDB2" "CLP1" "SDHAF2" "MEN1" "CCND1"
## [7] "NUMA1" "CDX2" "FLT3" "BRCA2" "LHFP" "FOXO1"
## [13] "LCP1" "RB1" "ERCC5" "ERCC4" "MYH11" "PALB2"
## [19] "IL21R" "FUS" "CYLD" "HERPUD1" "CDH11" "CBFB"
## [25] "HIST1H4I" "TRIM27" "POU5F1" "DAXX" "HMGA1" "FANCE"
## [31] "SRSF3" "PIM1" "TFEB" "CCND3" "HSP90AB1" "PRDM1"
## [37] "FOXO3" "ROS1" "GOPC" "STL" "MYB" "TNFAIP3"
## [43] "ECT2L" "EZR" "TCEA1" "PLAG1" "CHCHD7" "NCOA2"
## [49] "HEY1" "NBN" "RUNX1T1" "COX6C" "UBR5"
Advanced: Can you find evidence of chromothripsis in the SVs or copy-number profiles? Use the criteria on the lecture slides to assist