The biology underlying somatic structural variation

Lecture 1

Exercise 1: Understanding the relationship between SVs and short read sequencing

  1. How might the following double-stranded break repair mechanisms manifest in short read sequence data aligned to reference genome at the breakpoint?
    • Homologous recombination
    • Non-homologous end-joining
    • Microhomology mediated end-joining
  2. How would the following structural variants manifest in short read sequence data aligned to reference genome?
    • Deletion
    • Tandem duplication
    • Inversion

      (Hint: How many breaks are there? Consider both reads that overlap the break and those that span the break.)

Advanced: Can you sketch how reads might overlap translocations?


Methods for calling SVs

Lecture 2

Exercise 2 - Calling and visualising your SVs

  1. Run BRASS on the downsampled test tumour
    1. Click on the CGP Docker link on the desktop
    2. Navigate to the directory /datastore
    3. make a new directory called brass_downsampled
    4. run brass with no parameters (brass.pl) and look carefully at the required parameters
    5. build a commandline for running brass on the downsampled bam file and hit run
  2. Open the test tumour and normal genomes in IGV, navigate to the following locations and record the number of reads supporting each breakpoint.
    • chr1:150447295-150447299
    • chr4:92896530-92896534
    • chr6:24910138-24910139

      (Hint: switch “Show soft-clipped bases” on under Tracks -> Preferences -> Alignments)

  1. Load the deep whole-genome cell-line BRASS output (SV bedpe format) into R.

      (Hint: Use the read.table function with the parameters: header, sep, stringsAsFactors, skip and comment.char. You will also need to reformat the chromosome co-ordinates e.g. change 1 to chr1.)

  1. Generate a plot showing a summary of the number, type and location of SVs.

  2. Generate a plot showing the size distributions of the different SV classes.

  1. Using the circlize package generate a circos plot with SVs as links (use this documentation to assist you)


Improving the quality of SV calls

Lecture 3

Exercise 3: Filtering your SV calls

  1. Load the ASCAT copy-number calls into R (for convenience these are stored in HCC1143.Rdata). Find the data object which contains segmented copy numbers and output the first 10 lines.
##    sample chr startpos   endpos nMajor nMinor
## 1 HCC1143   1   564621  6513908      2      2
## 2 HCC1143   1  6529823  8165619      2      1
## 3 HCC1143   1  8165719  8741438      2      2
## 4 HCC1143   1  8751268  9007556      4      1
## 5 HCC1143   1  9008739 20135822      2      1
## 6 HCC1143   1 20137714 23432346      2      2
  1. Using these segments, add the copy-number calls to your circos plot.

      (Hint: use the genomicTrackPlotRegion function)

  1. How many copy-number breakpoints can be explained by SV breakpoints?

      (Hint: Firstly, convert the copy-number breakpoints (not segments!) to a GRanges object. Then convert the bedpe table to an InteractionSet object. You can use the countOverlaps function to test for concordance within a specified distance.)

Within 10,000bp (count and fraction)?

## [1] 186
## [1] 0.2447368

Within 100,000bp (count and fraction)?

## [1] 344
## [1] 0.4526316

You may think that this isn’t particularly good. Perhaps BRASS is missing some calls? Perhaps the copy-number is noisy? Can we improve this?

(Advanced: explore the events that don’t have matching breakpoints. Can you work out what might be going wrong?)

  1. Look at the metadata provided by brass and determine which variables might be useful for filtering.

      (Hint: use the View() function to look at the complete table os SVs. Use the documentation located here to help decipher the column descriptors.

  1. Decide on a filtering strategy and generate a final SV callset.

Advanced: In the Day 4 directory you will also find SV calls from another caller, MANTA (somaticSV.bedpe). Can you compute the overlap between callers? Which is better?


The functional consequences of SVs

Lecture 4

Exercise 4: Annotating your SVs

  1. Manually (using pen and paper) reconstruct the following set of breakpoints into a single rearrangement event:
    • 2:47676199 + 2:50629634 +
    • 2:49437116 - 2:47678469 -
    • 2:50632712 - 2:49431472 +
  2. What might the functional consequence of this rearrangement be?

  3. Look at the GRASS annotations in the BRASS bedpe output for the cell line. Extract the events which involve two different genes. Which of these will NOT give rise to a viable fusion protein?

##                svclass     gene1 strand1.1   gene2 strand2.1
## 8   tandem-duplication     RPRD2         1    CTSS        -1
## 143           deletion     GRIP1        -1   RAP1B         1
## 159 tandem-duplication C14orf159         1   SMEK1        -1
## 184           deletion     MAST1         1  DNASE2        -1
## 191          inversion   CYP4F22         1 CYP4F12         1
## 194           deletion      DDA1         1   PLVAP        -1
## 209          inversion      HUNK         1   PCBP3         1
## 211 tandem-duplication    PRDM15        -1  UMODL1         1
## 213 tandem-duplication    PRDM15        -1   PCBP3         1
## 217 tandem-duplication     KDM6A         1     UXT        -1
  1. Which are most likely to give rise to a viable fusion product with at least 30% of whole exons from both genes?
##               svclass  gene1 strand1.1 region_number1 total_region_count1
## 16           deletion  RBBP5        -1             10                  14
## 36 tandem-duplication MTMR14         1              7                  19
## 82      translocation  HECW1         1             16                  30
##     gene2 strand2.1 region_number2 total_region_count2
## 16  RBM34        -1              6                  11
## 36 FANCD2         1             32                  44
## 82   HELZ        -1             20                  33

Advanced: What protein domains are included in these fusions?


Complex rearrangements

Lecture 5

Exercise 5: Characterising and annotating the HCC1143 double minute chromosome

  1. This sample contains a double minute chromosome. Which chromosomes are likely to make up this event? Plot a circos plot of the SVs explaining the double minute.

      (Hint: use the SKY karyotyping of this cell-line to assist in your search.)

  1. What are the genes contained in this double-minute chromosome?

      (Hint: use the boundingBox function to get the genomic coordinates spanning the double-minute. Use the list of cancer genes found in the COSMIC.67 package to annotate these regions - data(cgc_67, package = “COSMIC.67”))

## 'select()' returned many:many mapping between keys and columns
##  [1] "CREB3L1"  "DDB2"     "CLP1"     "SDHAF2"   "MEN1"     "CCND1"   
##  [7] "NUMA1"    "CDX2"     "FLT3"     "BRCA2"    "LHFP"     "FOXO1"   
## [13] "LCP1"     "RB1"      "ERCC5"    "ERCC4"    "MYH11"    "PALB2"   
## [19] "IL21R"    "FUS"      "CYLD"     "HERPUD1"  "CDH11"    "CBFB"    
## [25] "HIST1H4I" "TRIM27"   "POU5F1"   "DAXX"     "HMGA1"    "FANCE"   
## [31] "SRSF3"    "PIM1"     "TFEB"     "CCND3"    "HSP90AB1" "PRDM1"   
## [37] "FOXO3"    "ROS1"     "GOPC"     "STL"      "MYB"      "TNFAIP3" 
## [43] "ECT2L"    "EZR"      "TCEA1"    "PLAG1"    "CHCHD7"   "NCOA2"   
## [49] "HEY1"     "NBN"      "RUNX1T1"  "COX6C"    "UBR5"

Advanced: Can you find evidence of chromothripsis in the SVs or copy-number profiles? Use the criteria on the lecture slides to assist