Introduction to single-cell RNA-seq

Abel Vertesy

Disclaimer

This course material was heavily based on multiple sources:

History

Bulk RNA-seq

  • Replaced microarrays in the late 2000s and has been widely used since
  • Measures the average expression level of thousands of genes across all cells in a sample
  • Useful for:
    • Comparing tissues
    • Comparing diseased and healthy samples
  • Insufficient for studying
    • heterogeneous systems
      • early development
      • complex tissues (brain)
    • Stochastic gene expression

scRNA-seq

  • A new technology, first publication by Tang 2009
  • Widespread from ~2014: new protocols and lower sequencing costs
  • Measures the distribution of expression levels for each gene
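
The difference between an average and a distribution can be illustrated with a minimal sketch. The expression values below are invented for illustration and do not come from any dataset: two cell populations give the same bulk-style average, although only one of them expresses the gene in every cell.

```python
from statistics import mean

# Hypothetical transcript counts of one gene in eight cells per population.
# Population A: every cell expresses the gene at a moderate level.
uniform_cells = [10, 9, 11, 10, 10, 9, 11, 10]
# Population B: half of the cells are "off", the other half strongly "on".
bimodal_cells = [0, 0, 0, 0, 20, 20, 19, 21]

def bulk_measurement(cells):
    """Bulk RNA-seq only sees the average over all cells in the sample."""
    return mean(cells)

def fraction_expressing(cells, threshold=1):
    """Single-cell data also shows how many cells express the gene at all."""
    return sum(count >= threshold for count in cells) / len(cells)

for name, cells in [("uniform", uniform_cells), ("bimodal", bimodal_cells)]:
    print(name, bulk_measurement(cells), fraction_expressing(cells))
```

Both populations average 10 transcripts per cell, but only half of the bimodal cells express the gene.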

scRNA-seq evolution

hist-3148232

Concepts

scRNA-seq makes it possible to study new biological questions

  • Cell type identification
  • Heterogeneity of response
  • Stochasticity of gene expression
  • Reconstructing continuous processes from a single experiment
  • Holistic view, hypothesis-free science
    • Datasets range from \(10^2\) to \(10^6\) cells and increase in size every year
    • Not testing a concrete hypothesis
  • Hypothesis-driven and data-driven science
    • Heard about it? Could you explain?

Hypothesis-driven and data-driven science

Figure: slide from “Personalized medicine in transplantation” by Maarten Naesens, presented at Université Libre de Bruxelles

Experimental Strategies

Basic Workflow

RNA-Seq_workflow-5.pdf

from Wikipedia

Cell capture

Kolodziejczyk.2017

Manual isolation by mouth pipette

mouthpipet

Laser capture microdissection (LCM) into Eppendorf tubes

LCM

FACS sorting into microwell plates

384wp

Isolation by FACS machine

  • Fluorescence-activated cell sorting (FACS)
  • You can select a subset of cells:
    • By size (e.g. blood cells)
    • By fluorescent reporter (e.g. a Dazl-GFP mouse line to select germ cells)

FACS

FACS concept

FACS.scheme

Isolation by microfluidics & preparation inside droplets

Drop-seq (Macosko 2015)

Picowell

picowell

Picowells allow simple sedimentation instead of FACS

  • The wells are too small to be loaded by FACS
  • The volume per well is so low that even at 10% loading you waste very little reagent
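
The arithmetic behind sparse loading can be sketched as follows. This assumes that "10% loading" means an average of 0.1 cells per well and that cells settle into wells independently (Poisson statistics). Both are assumptions made for illustration, not statements from the slides.

```python
import math

def poisson_pmf(k, mean_cells_per_well):
    """Probability that a well receives exactly k cells under Poisson loading."""
    return mean_cells_per_well ** k * math.exp(-mean_cells_per_well) / math.factorial(k)

def loading_summary(mean_cells_per_well):
    """Fractions of empty, single-cell and multi-cell wells."""
    empty = poisson_pmf(0, mean_cells_per_well)
    single = poisson_pmf(1, mean_cells_per_well)
    multiple = 1.0 - empty - single
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of occupied wells that hold more than one cell
        "multiplet_share_of_occupied": multiple / occupied,
    }

summary = loading_summary(0.1)  # assumed meaning of "10% loading"
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions about 90% of the wells stay empty and roughly 5% of the occupied wells hold more than one cell, which is why sparse loading is attractive when the reagent volume per well is negligible.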

Nanowell.sorting

Platforms & scaling

nprot.2017.149-F1

Source: Svensson 2018

RNA amplification strategies

  • Full length
  • 3’ (aka: 3 prime)
  • 5’ (aka: 5 prime)

RNA structure

mRNA.editing

Expressed sequence tag counting (EST)

Different coverage along the transcript (full length vs. 3’)

Coverage

Bhargava 2014

Full length

  • Advantage?
    • Isoform specificity
    • Higher sensitivity
  • SMART-seq2 (Picelli 2013)

3’

  • Sufficient for transcript counting
  • Advantage?
    • A lot cheaper
  • Can still detect alternative 3’ polyadenylation
    • One kind of isoform
  • CEL-seq (Hashimshony 2012)
  • Drop-seq (Macosko 2015)
  • Almost all novel methods

5’

Experimental Procedures in detail

Steps in single-cell mRNA sequencing

RNA-Seq_workflow-5.pdf

Problems

  • RNA is unstable ↔ RNase enzymes are very robust
    • Convert it to cDNA immediately
  • Very low RNA input material → Huge amplification
  • Huge amplification → Amplification bias (“PCR artefact”)

Reverse transcriptase

A reverse transcriptase (RT) is an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription. Reverse transcriptases are used by:

Wikipedia, edited

Reverse transcription creates uniquely tagged cDNA molecules

Library.Velten 2015.1

Velten 2015

Primer design & labelling strategy

| Sequence element | Function |
|---|---|
| poly-T stretch | Selects poly-A RNA / depletes ribosomal RNAs |
| UMI (Unique Molecular Identifier) | Labels each mRNA molecule before amplification: allows correcting for amplification bias at the end |
| CBC (Cell Barcode) | Labels each mRNA by cell of origin: samples can be pooled / multiplexed after this step → less labor |
| Illumina adapter | Two of these are required for sequencing |
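
How these elements are read back from a sequenced primer can be sketched in a few lines. The layout below (6 nt UMI, then 8 nt CBC, then the poly-T stretch) and all lengths are hypothetical. Real protocols differ in order and length, so these constants are placeholders, not a description of any specific method.

```python
UMI_LENGTH = 6    # hypothetical length, protocol dependent
CBC_LENGTH = 8    # hypothetical length, protocol dependent
MIN_POLY_T = 10   # hypothetical minimum number of Ts expected after the barcodes

def split_barcode_read(read):
    """Split a barcode read into (UMI, CBC, poly-T found?) for the assumed layout."""
    umi = read[:UMI_LENGTH]
    cbc = read[UMI_LENGTH:UMI_LENGTH + CBC_LENGTH]
    tail = read[UMI_LENGTH + CBC_LENGTH:]
    has_poly_t = tail.startswith("T" * MIN_POLY_T)
    return umi, cbc, has_poly_t

# Made-up example read: UMI + CBC + poly-T stretch
example_read = "ACGTAC" + "GGATCCAA" + "T" * 24
umi, cbc, has_poly_t = split_barcode_read(example_read)
print(umi, cbc, has_poly_t)
```

Because the cell barcode travels with every read, the cell of origin can be recovered after pooling.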

Unique Molecular Identifiers (UMIs)

Kivioja 2012

UMI

You can do better than plain read counting. Reads that

  • map to the same gene,
  • carry the same CBC, and
  • carry the same UMI can be collapsed into a single molecule count.
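
The collapsing rule can be written as a minimal sketch. The input format (one `(cell barcode, gene, UMI)` tuple per mapped read) and the example values are assumptions for illustration. The sketch also ignores sequencing errors inside the UMI, which real pipelines usually have to handle.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count unique UMIs per (cell barcode, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per mapped read.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}

# Made-up reads: the first three are amplification copies of one molecule.
reads = [
    ("CELL1", "Dazl", "AAAA"),
    ("CELL1", "Dazl", "AAAA"),
    ("CELL1", "Dazl", "AAAA"),
    ("CELL1", "Dazl", "CCGT"),
    ("CELL2", "Dazl", "AAAA"),  # same UMI, different cell: a separate molecule
]
molecule_counts = count_molecules(reads)
print(molecule_counts)
```

In this example CELL1 contributes four reads but only two molecules, so the read count would overstate its expression by a factor of two.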

In-vitro transcription or PCR amplifies your

Library.Velten 2015.2

Velten 2015

Linear amplification by IVT is less sensitive to fluctuations in input material

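| PCR (exponential) | IVT (linear) |
|---|---|
| product = cDNA × \(2^n\) after n ideal cycles | aRNA = n × cDNA for n transcripts per template |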

Linear amplification

Kolodziejczyk 2015

  • Since the invention of UMIs, linear amplification mainly saves sequencing costs
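
One way to see why exponential amplification distorts more than linear amplification is to compound a small difference in efficiency. The sketch below assumes an idealized model: PCR multiplies the pool by (1 + efficiency) per cycle, and IVT produces a fixed number of transcripts per template. All numbers are hypothetical.

```python
def pcr_yield(molecules, cycles, efficiency):
    """Exponential amplification: each cycle multiplies the pool by (1 + efficiency)."""
    return molecules * (1 + efficiency) ** cycles

def ivt_yield(molecules, copies_per_template):
    """Linear amplification: each template is transcribed a fixed number of times."""
    return molecules * copies_per_template

# Two transcripts with the same starting amount (10 molecules);
# the second one amplifies 10% less efficiently per step.
pcr_ratio = pcr_yield(10, cycles=20, efficiency=0.9) / pcr_yield(10, cycles=20, efficiency=1.0)
ivt_ratio = ivt_yield(10, copies_per_template=900) / ivt_yield(10, copies_per_template=1000)

print(f"PCR: second transcript ends at {pcr_ratio:.2f} of the first")
print(f"IVT: second transcript ends at {ivt_ratio:.2f} of the first")
```

In this toy model the same relative handicap leaves the IVT product at 0.90 of the reference, while 20 PCR cycles compound it to about 0.36.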

Information in paired-end reads

Library.Velten 2015.3

Velten 2015

Overview

Library.Velten 2015

Velten 2015