For this workflow example we begin at this step 3.5. Load rse.

Previous steps are described for reference.

Approximate run time (8 core macbook pro): 6 minutes

1. Data

P56 forebrain mouse scATAC-seq from Preissl et al.
GEO GSE100033
5 primary cell types (8 sub types)
Combinatorial barcode technology
2088 cells (post QC) (50 cells in our toy example)
original fastq files contain all 3034 cells combined

2. Bioinformatics pipeline

input: fastq files
output: bam files and peaks bed file

install the bioinformatics pipeline from github, where yourPathToDestinRepo is your path to the local cloned repository

cd yourPathToDestinRepo
git clone https://github.com/urrutiag/destin.git

2.1. Set up parameters, paths, and directories

The bioinformatics pipeline is displayed as an example, and not meant to be run as is. Running it requires that the user:

sets all parameters appropriately
ensures that all bioinformatics dependencies are included in the system search PATH
ensures directories are selected appropriately

2.1.1. Designate inputs and parameters

select outDir as the directory to read input and print all output. In this example we use “yourPathToDestinRepo/practice/BuenrostroMouse”, where yourPathToDestinRepo is the path to your local cloned repository
(optionally) if you want to download files using sratoolkit, include SRRFile path
sampleName is used to name files and metadata
srcDir is the path to our bioinformatic processing scripts (located locally at “yourPathToDestinRepo/src”)
model is selected as hg19 or mm10
cellData is the path to a sample description table with each row corresponding to a cell

outDir=yourPathToDestinRepo/practice/PreisslP56
SRRFile=$outDir/SRR_Acc_ListPreisslP56.txt
sampleName=PreisslP56
srcDir=yourPathToDestinRepo/src
model=mm10
cellData=$outDir/cellData1Preissl.txt

2.1.2. Create directories

This is the output tree for the bioinformatics processing pipeline. This should not be customized. The only selectable parameter is the outDir, defined in the last step

mkdir -p $outDir
mkdir -p $outDir/temp
mkdir -p $outDir/bam
mkdir -p $outDir/peaks
mkdir -p $outDir/fastq
mkdir -p $outDir/fastqCombined

2.1.3. Set model specific parameters for mm10

Here we select the path to model-specific files, these must be set specific to your system

genomeBowtie is the path to the indexed genome for bowtie2
genomeMacs2 is the model for MACS2 peak caller: either mm (mouse) or hs (human)
blacklistFile is the path to file containing peaks to exclude: either ENCFF547MET.bed.gz (mouse) or ENCFF001TDO.bed.gz (human). We have included these files in our github repository (located locally at pseudoname “yourPathToDestinRepo/data/blacklist”).

export genomeBowtie=/proj/seq/data/MM10_UCSC/Sequence/Bowtie2Index/genome
export genomeMacs2=mm;
export blacklistFile=yourPathToDestinRepo/data/blacklist/ENCFF547MET.bed.gz

2.1.4. Ensure that dependencies are in system search PATH

In order for the bioninformatics pipeline to run, several dependencies must be included in the system search PATH:

sratoolkit (optional if you wish to dowload fastq files from SRA)
cutadapt
bowtie2
samtools
macs/2016-02-15 (latest version)
bedtools

additionally, we set “picardPath”, the system specific path to picard, manually

below is an example when using a high performance cluster, but this will need to be customized to your system.

module add sratoolkit
module add cutadapt
module add bowtie2
module add samtools
export picardPath=~/applications/picard/build/libs
module add macs/2016-02-15
module add bedtools

Alternatively, manually add your system specific path to the executable to the system search PATH, as below

export PATH=/nas/longleaf/apps/bowtie2/2.3.4.1/bin:$PATH

To ensure you have set up the path to bowtie2 correctly, type “which bowtie2” and the result should be your system specific path to the executable

which bowtie2
/nas/longleaf/apps/bowtie2/2.3.4.1/bin/bowtie2

2.2. (Optional) Download fastq files for toy example

This is an optional step, if you wish to download fastq files from the SRA repository. The first line reads the SRRFile contents into an array. The next line downloads those files. –split-files tells sratoolkit that the read are paired and we desire 2 files per cell/sample. This pipeline assumes paired reads.

SRRNames=( $( cat $SRRFile ) )
fastq-dump "${SRRNames[@]}" --split-files -O $outDir/fastqCombined --gzip

2.2.1. (Technology specific step) Split the combined fastq into individual fastq files by cell

input: 1 pair of fastq paired read files for entire experiment output: individual paired fastq files separate by cell

This is necessary becasue with combinatorial barcode technology, the fastq contains all cells. Preissl et al. provide demultiplexed fastq files, meaning that the barcode combination is written in the sample name of each fastq entry. Below is a call to our python script used to split the fastq to individual cells. It assumes the fastq is demultiplexed.

for inputFile in $outDir/fastqCombined/$sampleName.R1.fastq.gz  $outDir/fastqCombined/$sampleName.R2.fastq.gz
do
sbatch $homeDir/src/splitCells.py \
--inputFile $inputFile \
--outputDir $outDir \
--sampleName $sampleName.$read \
--cellIDFile $cellIDFile 
done

2.3. Align reads

input: paired-read fastq files, one pair of fastq files per cell
output: indexed bam files, one bam file per cell

cut adapters
align
sam to bam
sort
Add read group and index
mark duplicates
remove mitochondrial, unmapped and chr Y
adjust for Tn5 insertion
alignment quality >= 30
index

We first cut Illumina adaptors using cutadapt with –minimum length set to 20. parameters w, x, y, z refer to cutadapt inputs. In this experiment Illumina adaptors were used, but the sequences can be adapted to a different experiment. See https://cutadapt.readthedocs.io/en/stable/guide.html for further info.

-w refers to cutadapt input ADAPTER_FWD “-A” -x refers to cutadapt input ADAPTER_REV “-a” -y refers to cutadapt input second ADAPTER_FWD “-G” -z refers to cutadapt input second ADAPTER_REV “-g”

Reads are aligned to respective genome using bowtie2 with setting X2000 to ensure paired reads are within 2kb of one another. Samtools is used to covert to bam format. Picard tools is then used to perform a series of tasks: SortSam to sort; AddOrReplaceReadGroups to add read groups and index; MarkDuplicates to mark and remove duplicates; and BuildBamIndex to index reads. Samtools is used to remove mitochondrial, unmapped, and chromosome Y reads.

Next, due to Tn5 insertion, the starting position of forward reads is adjusted +4, and the end position of reverse reads is adjusted -5. This is performed using a custom awk command. Only aligned reads with map quality over 30 are retained. Finally, aligned reads are indexed by picard SortSam and indexed by samtools.

for cellID in "${cellIDs[@]}"
do
  $srcDir/fastqToBam.sh \
    -c $cellID  \
    -s $sampleName \
    -o $outDir \
    -g $genomeBowtie \
    -w CTGTCTCTTATACACA \
    -x CTGTCTCTTATACACA \
    -y GATGTGTATAAGAGACAG \
    -z GATGTGTATAAGAGACAG
done

2.4. Call Peaks

input: indexed bam file, one file per cell
output: filtered narrow_peaks.bed

Overview

call peaks
filter blacklist

Peaks are called a single time, inlcuding all cell bam files as input. This is essentially calling peaks on a psuedobulk tissue incorporating all cell types.

Peaks are called by MACS2 using parameters: –nomodel -p 0.01. Thus, only peaks with p-value below 0.01 are retained. Peaks are filtered using an ENCODE annotated blacklist file mainly consisting of low-mappability regions and repeat regions: “ENCFF547MET.bed” for mm10 and “wgEncodeDacMapabilityConsensusExcludable.bed” for hg19.

$srcDir/callPeaks.sh \
-s $sampleName \
-o $outDir \
-l $blacklistFile \
-q $genomeMacs2

3. Destin

Destin incorporates a set of existing and novel practices regarding normalization, dimension reduction, and clustering. Several existing and novel techniques are cast as tuning parameters which are optimized by an unsupervised model-based likelihood.

Specifically, Destin incorporates biological weights for chromatin accessible regions based on reference annotations and bulk regulomic profiles. These weights, the number of clusters, and the number of principle components are cast as tuning parameters. Destin fits the full binary chromatin accessibility data using the multinomial distribution, providing an objective function in an unsupervised clustering context.

20 cells (out of 192) in this Destin toy example

startTime = Sys.time()

3.1. Load Destin R package

Please see the README.md for instructions to install the github repository. You can install the R package from the local copy. yourPathToDestinRepo is your path to the local cloned repository

yourPathToDestinRepo = "~/Documents/gitRepos/destin"

install.packages(file.path(yourPathToDestinRepo,"package"), repos = NULL, type = "source")

load the package

library(destin, quietly = T)

3.2. Set parameters

sampleName should be the same as used during the pipeline, used to identify input files
outDir is the home directory for input files, yourPathToDestinRepo is your path to the local cloned repository
cellDataFile is the path to a sample description table with each row corresponding to a cell
model is selected as hg19 or mm10

sampleName = "PreisslP56"
outDir = file.path(yourPathToDestinRepo, "practice/PreisslP56")
model = "mm10"

3.3. Create RSE from bam and peaks

inputs: indexed bam files (one bam per cell), peaks file
output: ranged summarized experiment containing:

colData which describes the cells
rowRanges which describes the peaks
assay which is the binary chromatin accessibility matrix

bamDir = file.path(outDir, "bam")
peaksDir = file.path(outDir, "peaks")
bedFile = file.path(peaksDir, 
                    paste0(sampleName, "_peaks.blacklist_removed.narrowPeak"))

extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")
bedData = import(bedFile, format = "BED",
                 extraCols = extraCols_narrowPeak)

bamFiles = dir( bamDir, pattern = "final.bam" )
bamFiles = bamFiles[!grepl("bai", bamFiles)]

rse = createRSE(bamDir, bamFiles, bedData)

3.4. Annotate regions

Annotate the ranges with TSS distance and DHS frequency

TSS distance:

Distance to transcription start site is annotated using R function annotatePeakInBatch from the ChIPpeakAnno package. Data set ’TSS.human.GRCh37’ is used for model hg19 and ’TSS.mouse.GRCm38’ is used for model mm10. A promoter is defined as distance >=2kb and <= 1kb. Otherwise the chromatin accessible region is defined as distal regulatory element.

DHS frequency:

To generate the reference DNase hypersensitivity frequency map, we used the ENCODE database (Consortium et al., 2012).

For human reference, we used search terms: “bed broadpeaks”, “Homo sapiens”, “DNase-seq”, and “cell line”. This resulted in 138 experiments representing 99 cell lines, all from either the John Stamatoyannopoulos lab or Gregory Crawford lab.

For mouse reference, we used search terms: “bed broadpeaks”, “Mus musculus”, “DNase-seq”, and “tissue”. This resulted in 61 experiments representing 27 tissue types. All experiments were from the John Stamatoyannopoulos lab.

For processing we created a template representing the entire genome in 500bp bins using R functions tile and GRanges from package GenomicRanges. This was followed by GenomicRanges function overlapsAny to assign the files to the template. Overlaps were averaged across all files of same type (cell line or tissue), so that no type would be overrepresented. Finally, overlaps were averaged across all types.

rse = annotateRSE(rse, model)

3.5. Load rse

For this workflow example we begin at this step

yourPathToDestinRepo = "~/Documents/gitRepos/destin"
# install.packages(file.path(yourPathToDestinRepo,"package"), repos = NULL, type = "source")
library(destin, quietly = T)
sampleName = "PreisslP56"
outDir = file.path(yourPathToDestinRepo, "practice/PreisslP56")
model = "mm10"
load(file.path(outDir, "peaks", "PreisslP56Annotated.Rdata"))

here is a description of the ranged summarized experiment

column data is a description of the cells

colData(rse)

## DataFrame with 2088 rows and 6 columns
##                                cellID
##                           <character>
## 1    AGCGATAGAACCAGGTCGAATTCCCAGGACGT
## 2    AGCGATAGAACCAGGTGAAGTATGTATAGCCT
## 3    AGCGATAGAACCAGGTGGATACTATATAGCCT
## 4    AGCGATAGAACCAGGTTTACGACCCAGGACGT
## 5    AGCGATAGAAGAGGCACCTAGAGTTATAGCCT
## ...                               ...
## 2084 TCCGGAGATTCCATCCCGAATTCCATAGAGGC
## 2085 TCTCGCGCCGCTTGATTAAGATCCTATAGCCT
## 2086 TCTCGCGCCTCTGGTAGTAAGGAGCAGGACGT
## 2087 TCTCGCGCCTTCAATGTGCCTTACTATAGCCT
## 2088 TCTCGCGCGAATTACGTGCCTTACCAGGACGT
##                                            fileName   cellNum
##                                            <factor> <integer>
## 1    AGCGATAGAACCAGGTCGAATTCCCAGGACGT.p56.final.bam         8
## 2    AGCGATAGAACCAGGTGAAGTATGTATAGCCT.p56.final.bam         5
## 3    AGCGATAGAACCAGGTGGATACTATATAGCCT.p56.final.bam         3
## 4    AGCGATAGAACCAGGTTTACGACCCAGGACGT.p56.final.bam         5
## 5    AGCGATAGAAGAGGCACCTAGAGTTATAGCCT.p56.final.bam         7
## ...                                             ...       ...
## 2084 TCCGGAGATTCCATCCCGAATTCCATAGAGGC.p56.final.bam         9
## 2085 TCTCGCGCCGCTTGATTAAGATCCTATAGCCT.p56.final.bam         7
## 2086 TCTCGCGCCTCTGGTAGTAAGGAGCAGGACGT.p56.final.bam         1
## 2087 TCTCGCGCCTTCAATGTGCCTTACTATAGCCT.p56.final.bam         7
## 2088 TCTCGCGCGAATTACGTGCCTTACCAGGACGT.p56.final.bam         1
##                                       full         rep   cell_type
##                                <character> <character> <character>
## 1    rep2_AGCGATAGAACCAGGTCGAATTCCCAGGACGT        rep2         EX3
## 2    rep2_AGCGATAGAACCAGGTGAAGTATGTATAGCCT        rep2          OC
## 3    rep2_AGCGATAGAACCAGGTGGATACTATATAGCCT        rep2          MG
## 4    rep2_AGCGATAGAACCAGGTTTACGACCCAGGACGT        rep2          OC
## 5    rep1_AGCGATAGAAGAGGCACCTAGAGTTATAGCCT        rep1         EX2
## ...                                    ...         ...         ...
## 2084 rep2_TCCGGAGATTCCATCCCGAATTCCATAGAGGC        rep2         IN1
## 2085 rep2_TCTCGCGCCGCTTGATTAAGATCCTATAGCCT        rep2         EX2
## 2086 rep1_TCTCGCGCCTCTGGTAGTAAGGAGCAGGACGT        rep1         EX1
## 2087 rep2_TCTCGCGCCTTCAATGTGCCTTACTATAGCCT        rep2         EX2
## 2088 rep2_TCTCGCGCGAATTACGTGCCTTACCAGGACGT        rep2         EX1

rowRanges is a description of the peaks

rowRanges(rse)

## GRanges object with 132506 ranges and 9 metadata columns:
##            seqnames              ranges strand |     score signalValue
##               <Rle>           <IRanges>  <Rle> | <numeric>   <numeric>
##        [1]     chr1     3094846-3095468      * |        54      3.7804
##        [2]     chr1     3119613-3120663      * |       173     7.21925
##        [3]     chr1     3121290-3121630      * |        87     4.63918
##        [4]     chr1     3204803-3205050      * |        40     3.15033
##        [5]     chr1     3210307-3210604      * |        52     3.65854
##        ...      ...                 ...    ... .       ...         ...
##   [132502]     chrX 169983892-169984238      * |        81     3.61446
##   [132503]     chrX 169986766-169987309      * |       288     8.70445
##   [132504]     chrX 169987370-169987586      * |        50     3.19149
##   [132505]     chrX 170004288-170004509      * |        47     3.46536
##   [132506]     chrX 170674358-170674733      * |        70     4.41046
##               pValue    qValue      peak            DHSsum distanceToTSS
##            <numeric> <numeric> <integer>         <numeric>     <numeric>
##        [1]   5.49855   3.43169       192  4.16724245253657         -7171
##        [2]  17.36313  14.67793       827  6.78943142913731         17596
##        [3]   8.71732   6.43025       143  8.41274509803922         19273
##        [4]   4.03729   2.11312       152 0.867724867724868        102786
##        [5]   5.29288   3.25973       163 0.463429816370993        108290
##        ...       ...       ...       ...               ...           ...
##   [132502]   8.10175   5.84506       190  14.6639391109979          1340
##   [132503]  28.84528  25.81656       320  20.0946669071669           173
##   [132504]    5.0883   3.06845        82  19.2388828763829          -431
##   [132505]    4.7497   2.75158        75  4.91579591432533         -5372
##   [132506]   7.09503   4.90308       189  12.1866041366041          1713
##                       feature         region
##                   <character>    <character>
##        [1] ENSMUSG00000064842 distal element
##        [2] ENSMUSG00000064842 distal element
##        [3] ENSMUSG00000064842 distal element
##        [4] ENSMUSG00000064842 distal element
##        [5] ENSMUSG00000064842 distal element
##        ...                ...            ...
##   [132502] ENSMUSG00000087263 distal element
##   [132503] ENSMUSG00000086695       promoter
##   [132504] ENSMUSG00000086695       promoter
##   [132505] ENSMUSG00000095562 distal element
##   [132506] ENSMUSG00000093806 distal element
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

assay is the binary chromatin accessibility matrix in sparse matrix format

rows are peaks
columns are cells

assay(rse)[1:20,1:20]

## 20 x 20 sparse Matrix of class "dgCMatrix"
##                                              
##  [1,] . . . . . . . . . . . . . . . . . . . .
##  [2,] . . . . 1 . . . . . . . . . . . . . . .
##  [3,] 1 . . . . . . . . . . . . . . . . . . .
##  [4,] . . . . . . . . . . . . . . . . . . . .
##  [5,] . . . . . . . . . . . . . . . . . . . .
##  [6,] . . . . . . . . . . . . . . . . . . . .
##  [7,] . . . . . . . . . . . . . . . . . . . .
##  [8,] . . . . . . . . . . . . . . . . . . . .
##  [9,] . . . . 1 . . . . . . . . . . . . . 1 .
## [10,] . . . . . . . . . . . . . . . . . . . 1
## [11,] 1 . . . . . . . . . . . . . . . . . . .
## [12,] . . . . . . . . . . . . . . . . . . . .
## [13,] . . . . 1 . . . . . . . . . . . . . . .
## [14,] . . . . . . . . . . . . 1 . . . . . . .
## [15,] 1 . . . . . . . . . . . . . . . . . . .
## [16,] . . . . . . . . . . . . . . . . . . . .
## [17,] . . . . . . . . . . . . . . . . . . . .
## [18,] . . . 1 . . . . . . . . . . . . . . . .
## [19,] . . . . . . . . . . . . . . . . . . . .
## [20,] . . . . . . . . . . . . . . . . . . . .

3.6. QC

Destin begins with quality control, retaining chromatin regions accessible in at least 5 cells, and retaining cells with total number of chromatin accessible regions within 3 standard deviations from median (robustly calculated using median absolute deviations). These default parameters can be modified if desired.

rse = doQC(rse, regionSumCutoff = 5, cellSumCutoffSDs = 3)

3.7. Determine number of clusters

Destin uses an unsupervised model-based likelihood (logLikeElbow) to estimate number of clusters. Like WCSSE, the likelihood is non-decreasing with increasing number of clusters. We fit the model log-likelihood as a linear spline function of number of clusters with a single knot. The knot placement which results in the best fit (lowest sum of squared errors) is selected as number of clusters (n_k).

Also shown for display purposes are results using 4 other methods to estimate number of clusters:

wcsseElbow: They fit WCSSE as a linear spline function of k with a single knot. The knot placement which results in the best fit (lowest sum of squared errors) is selected as n_k

silhouette: n_k is selected as the number of clusters that maximizes average silhouette width across all cells

distortion: accounts for the decrease in WCSSE by comparing to a baseline model incorporating a weight factor based on number of dimensions

GapStat: The gap (Tibshirani et al., 2001) statistic accounts for the tendency to overestimate n_k by using a permutation approach and fitting a null distribution to the WCSSE which naturally decreases as n_k increases. Then the empirical WCSSE is compared to the null distribution with the gap statistic as output. To select the number of clusters the “first SE” method is used which determines the first nk where the gap statistic is within 1 standard deviation of the following gap statistic.

Note: cluster::clusGap will produce warnings when computing the gap statistic. It also runs extremely slow. The gap statistic is for display purposes only and not used by Destin, so this is not a need for concern.

We search a space of 1 to 20 number of clusters (user-modifiable). Here we find 5 clusters.

set.seed(10)
nClustersMax = 20
clusterEst = estimateNClusters(rse, allMethods = F)

## [1] "computing elbow methods"

plotNClusters(clusterEst)

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

Precomputed results for all 5 metrics are loaded below. Estimated number of clusters shown in blue by method.

load(system.file("results", "clusterEstPreisslP56.Rdata", package = "destin"))
plotNClusters(clusterEst)

## Warning: Removed 1 rows containing missing values (geom_point).

3.8. Clustering cells

For each combination of TSS distance weights, DHS frequency weights, and number of PCs, cell type clustering is performed via k-means. The likelihood is calculated according to the multinomial model, and the clustering result producing the highest multinomial likelihood is selected as the final clustering result.

Set nCores to an appropriate value based on your system.

#set nCores to an appropriate value based on your system
nCores = 7 
nClusters = clusterEst$nClustersList$logLikeElbow
results = destinGrid (rse, nClusters = nClusters,
                      nCores = nCores)
colData(rse)$cluster = results$cluster$cluster
results$summary

##   nPCs TSSWeight1 TSSWeight2 DHSWeight1 DHSWeight2 nPeaksActual   logLike
## 1   10          1          2          1          3       130973 -21165323

Best results were found using 10 PCs, TSS weights (1,2) and DHS weights (1,3).

We label the clusters based on cell type labels provided by Preissl et al.
cluster 1: inhibitory neurons 1 cluster 2: microglia and astrocytes (which we refer to as neuroglia) cluster 3: excitatory neurons cluster 4: oligodendrocytes cluster 5: inhibitory neuron 2

We can visualize the clustering results using t-SNE, using default parameters. Note that we do not expect tSNE to match up precisely with our clustering results as we do not use tSNE to cluster. tSNE is highly variable, which can be verified by setting the seed. It is for broad visual purposes only.

Annotation by cell type is optional, as cell type is unknown in practice.

set.seed(1)
plotCluster(clusterResults = results, type = "t-SNE",
                clusterLabels = c("IN1", "NG", "EX", "OC", "IN2"))

For completeness, we show that raising the number of clusters resolves microglia and astrocyte cell types. Downstream analysis retains 5 clusters, however, as we assume we did not know the true number of cell types.

PCrange = 10
TSSWeightsList = list(c(1,2))
DHSWeightsList = list(c(1,3))
nClusters = 6
clusterResults6 = destinGrid (rse,  nClusters = nClusters,
                      PCrange = PCrange,
                      TSSWeightsList = TSSWeightsList,
                      DHSWeightsList = DHSWeightsList)

set.seed(1)
plotCluster(clusterResults6, type = "t-SNE",
                clusterLabels = c("OC", "IN1", "IN2", "EX", "AC", "MG"))

Next we reassign cells by maximizing each cell’s post-classification multinomial likelihood. The input to the multinomial model is the chromatin accessibility count matrix and the cell cluster assignments. The output is a cell by cluster likelihood matrix. In this example there is no change in clustering assignment

countMat = assay(rse)
cluster = results$cluster$cluster
logLikeMatrix = getLogLike(countMat, cluster, reassign = T)
clusterReassigned = apply(logLikeMatrix, 1, which.max)

3.9. Differential Accessibility

Destin performs differential chromatin accessibility testing via the Fisher test, generating a chromatin accessibility profile for each cluster. We compare each cell type to each other in a 1 vs. all others comparison, treating all chromatin accessible regions independently.

The output includes a genetic feature, Bonferroni corrected p-value and log2 fold change for each chromatin accessible region. We show the first two clusters for the first 1K peaks here.

#set nCores to an appropriate value based on your system
nCores = 7
rse1000 = rse[1:1000,]
rse1000 = getDiffAccess(rse1000, clusterMap = results$cluster, nCores = nCores)

Results of differential Accessibility: clusters 1 and 2

rowRanges(rse1000)[order(rowRanges(rse1000)$pValsCor_1, decreasing = F)][1:10, c("feature", "pValsCor_1", "log2FC_1")]

## GRanges object with 10 ranges and 3 metadata columns:
##        seqnames            ranges strand |            feature
##           <Rle>         <IRanges>  <Rle> |        <character>
##    [1]     chr1 15307453-15307975      * | ENSMUSG00000092083
##    [2]     chr1 23956610-23957021      * | ENSMUSG00000026155
##    [3]     chr1   6800135-6800854      * | ENSMUSG00000051285
##    [4]     chr1   3654966-3655506      * | ENSMUSG00000051951
##    [5]     chr1 21014415-21014955      * | ENSMUSG00000041809
##    [6]     chr1 23948318-23948688      * | ENSMUSG00000026155
##    [7]     chr1 12990988-12992968      * | ENSMUSG00000025938
##    [8]     chr1   8496345-8496726      * | ENSMUSG00000086235
##    [9]     chr1 25150356-25150859      * | ENSMUSG00000080849
##   [10]     chr1   5208010-5208629      * | ENSMUSG00000033793
##                  pValsCor_1         log2FC_1
##                   <numeric>        <numeric>
##    [1] 1.54561773128571e-12 4.25683306666548
##    [2] 5.40756260927268e-12 4.08690806522316
##    [3] 2.87105906231028e-08  3.7649799703358
##    [4] 2.88890517173067e-07 3.78290187833306
##    [5] 1.02340787617687e-05 3.05354946827673
##    [6] 1.02340787617687e-05 3.05354946827673
##    [7] 1.27698100943337e-05 1.05877662994028
##    [8] 1.47727668205138e-05 4.31341659503185
##    [9] 3.62415593056725e-05 2.54884653947104
##   [10] 0.000294925200934799 2.47547335314082
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

rowRanges(rse1000)[order(rowRanges(rse1000)$pValsCor_2, decreasing = F)][1:10, c("feature", "pValsCor_2", "log2FC_2")]

## GRanges object with 10 ranges and 3 metadata columns:
##        seqnames            ranges strand |            feature
##           <Rle>         <IRanges>  <Rle> |        <character>
##    [1]     chr1 25203162-25203603      * | ENSMUSG00000080849
##    [2]     chr1 11044199-11044604      * | ENSMUSG00000048960
##    [3]     chr1 16224379-16224750      * | ENSMUSG00000089982
##    [4]     chr1 22763412-22763977      * | ENSMUSG00000097642
##    [5]     chr1   5753353-5753686      * | ENSMUSG00000093015
##    [6]     chr1 16120514-16120920      * | ENSMUSG00000025921
##    [7]     chr1 21563350-21563736      * | ENSMUSG00000082935
##    [8]     chr1 21491860-21492191      * | ENSMUSG00000082935
##    [9]     chr1   5866300-5866514      * | ENSMUSG00000033774
##   [10]     chr1 18416133-18416730      * | ENSMUSG00000067773
##                  pValsCor_2         log2FC_2
##                   <numeric>        <numeric>
##    [1] 4.76454361137892e-14 3.72446191370926
##    [2] 3.19342257487455e-09 4.65407258581786
##    [3] 1.03978931552725e-08 4.15657292634704
##    [4] 1.31244787651507e-07 2.77960346790172
##    [5] 1.32485145339735e-06 3.82399758726017
##    [6] 8.96146343355408e-06 3.72446191370926
##    [7] 1.15994879501643e-05 4.69846670517631
##    [8] 0.000102426056843116 4.11350420445516
##    [9] 0.000290261486979411 4.23903508653902
##   [10] 0.000554347162737379 2.82399758726017
##   -------
##   seqinfo: 20 sequences from an unspecified genome; no seqlengths

Time to run

TotalTime = Sys.time() - startTime
TotalTime

## Time difference of 6.210266 mins

Destin Preissl P56 forebrain