Running the Cancer Genome Project pipeline

The Sanger Institute has made its cancer genome analysis pipeline available as a Docker container.

It uses some of the tools that we have introduced in this course, plus some we haven’t mentioned.

A configuration file controls the pipeline. Here we download an example file:-

cd /home/participant/Course_Materials/
wget https://raw.githubusercontent.com/cancerit/cgpbox/master/examples/run.params

The example configuration file contains steps to download some example data. The example dataset used is a “downsampled” version of the HCC1143 cell-line that we have been following in the course.
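Before running anything, it can help to look at which settings the example file actually defines. The following is a minimal sketch, assuming that comment lines in the configuration file start with `#`; `show_settings` is an illustrative helper name, not part of the pipeline.

```shell
# Sketch only: show_settings is an illustrative helper, not part of the pipeline.
# Assumes comment lines in the configuration file start with '#'.
show_settings() {
  # Drop comment lines and blank lines, print everything else.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Only inspect the file if it has been downloaded to the current directory.
if [ -f run.params ]; then
  show_settings run.params
fi
```

The lines that remain are the ones you would need to change when pointing the pipeline at different data.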

We need to create a directory where the output results will be written. The directory must be writable by other users, so that the pipeline, which may run as a different user inside the container, can save its results there.

mkdir cgp-run
chmod 777 cgp-run
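To confirm that the permissions took effect, you can print the mode of the directory. This is a small sketch that assumes GNU `stat` (as found on most Linux systems); the `-c` option is not available in the BSD/macOS version.

```shell
# Create the output directory (no error if it already exists) and open up its permissions.
mkdir -p cgp-run
chmod 777 cgp-run

# Print the octal permission bits and the name; requires GNU stat.
stat -c '%a %n' cgp-run
```

The mode shown should end in 777, meaning that the owner, the group and all other users can read, write and enter the directory.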

The command to run the pipeline is as follows:-

docker run --rm -v /home/participant/Course_Materials/cgp-run:/datastore -v /home/participant/Course_Materials/run.params:/datastore/run.params quay.io/wtsicgp/cgp_in_a_box

docker run :- the command used to run a Docker container

--rm :- removes the container automatically once it has finished

-v /home/participant/Course_Materials/cgp-run:/datastore :- makes the directory we created visible inside the container as /datastore

-v /home/participant/Course_Materials/run.params:/datastore/run.params :- makes the configuration file visible inside the container

quay.io/wtsicgp/cgp_in_a_box :- the name of the container image we want to run
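Because both -v paths share the same base directory, a typo in either one is easy to miss. The sketch below assembles the command from a single variable and prints it without running it, so the paths can be checked first. `BASE_DIR`, `IMAGE` and `cgp_command` are illustrative names, not part of the pipeline.

```shell
# Sketch only: BASE_DIR, IMAGE and cgp_command are illustrative names.
BASE_DIR="${BASE_DIR:-/home/participant/Course_Materials}"
IMAGE="quay.io/wtsicgp/cgp_in_a_box"

# Print (do not run) the docker command for a given base directory.
cgp_command() {
  printf 'docker run --rm -v %s/cgp-run:/datastore -v %s/run.params:/datastore/run.params %s\n' \
    "$1" "$1" "$IMAGE"
}

cgp_command "$BASE_DIR"
```

Once the printed command looks right, it can be copied and pasted into the terminal to start the pipeline.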

Running the pipeline on your own data

  1. If you have a Mac or Windows machine, download the Docker Toolbox for your particular OS.
  2. Download the latest version of the pipeline:

    docker pull quay.io/wtsicgp/cgp_in_a_box
  3. Copy the files you want to analyse to a single directory.

  4. Create a configuration file, run.params.template, to define the location of your files and their corresponding file names.
  5. Replace YOUR_DIRECTORY with the path to that directory in the following command, and press Enter:

    docker run --rm -v YOUR_DIRECTORY:/datastore -v YOUR_DIRECTORY/run.params.template:/datastore/run.params quay.io/wtsicgp/cgp_in_a_box
  6. Be prepared to wait…
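Since the pipeline takes a long time, it is worth checking that the directory and configuration file are in place before starting it. The following is a minimal sketch of such a check; `check_inputs` is an illustrative helper name, and YOUR_DIRECTORY is the same placeholder used in the steps above.

```shell
# Sketch only: check_inputs is an illustrative helper, not part of the pipeline.
# Returns 0 when the directory and its run.params.template both exist.
check_inputs() {
  if [ ! -d "$1" ]; then
    echo "Directory not found: $1" >&2
    return 1
  fi
  if [ ! -f "$1/run.params.template" ]; then
    echo "Configuration file not found: $1/run.params.template" >&2
    return 1
  fi
  echo "Inputs look OK in $1"
}

# Replace YOUR_DIRECTORY with the path to your own directory.
check_inputs "YOUR_DIRECTORY" || echo "Fix the problems above before running the pipeline."
```

This only checks that the files exist; it does not validate the contents of the configuration file.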

Running the CRUK docker container

We have packaged some of the command-line tools used in the course, plus some extra ones that you might find useful, as a Docker container:-

docker pull markdunning/summer-school2016

To connect the Docker container to a directory on your hard drive, use the -v argument to specify that directory. It will be mapped to the directory /home/participant/Course_Materials in the Docker container.

N.B. The command below is the one that has been run each time you clicked the “CRUK Docker” link on the Desktop during the course.

docker run -ti -v YOUR_DIRECTORY:/home/participant/Course_Materials markdunning/summer-school2016 /bin/bash
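Docker expects the host side of a -v mapping to be an absolute path; a name that is not an absolute path is treated as a volume name rather than a directory on your hard drive. The sketch below shows one way to turn a directory into an absolute path and prints the resulting command without running it. `abs_dir` is an illustrative helper name.

```shell
# Sketch only: abs_dir is an illustrative helper name.
# Print the absolute path of an existing directory.
abs_dir() {
  (cd "$1" && pwd)
}

# Example: map the current directory into the container (command printed, not run).
echo "docker run -ti -v $(abs_dir .):/home/participant/Course_Materials markdunning/summer-school2016 /bin/bash"
```

The printed command can then be pasted into the terminal, or `$(abs_dir .)` can be replaced with the path of any other directory you want to work in.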