my ($self) = @_;
return [
## First analysis: PECAN
{ -logic_name => 'pecan',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
-parameters => {
# The cmd parameter is required by the SystemCmd module. It defines the command line to be run.
# Note that some values are written between #hashes#. These will be substituted with the corresponding input values.
'cmd' => 'java -cp /soft/pecan_v0.8/pecan_v0.8.jar bp.pecan.Pecan -E "#tree_string#" -F #input_files# -G #msa_file#',
},
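# For illustration, with the first input_id below the substituted command would be:
#   java -cp /soft/pecan_v0.8/pecan_v0.8.jar bp.pecan.Pecan -E "(((HUMAN,(MOUSE,RAT)),COW),OPOSSUM);" -F human.fa mouse.fa rat.fa cow.fa opossum.fa -G pecan_no_chicken.mfa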
-hive_capacity => 200, # max. number of parallel jobs
-input_ids => [
# Each input_id is a new job for this analysis. Here we define the tree_string, the input_files
# and the msa_file for three different jobs.
{
'tree_string' => '(((HUMAN,(MOUSE,RAT)),COW),OPOSSUM);',
'input_files' => 'human.fa mouse.fa rat.fa cow.fa opossum.fa',
'msa_file' => "pecan_no_chicken.mfa",
},
{
'tree_string' => '((((HUMAN,MOUSE),COW),OPOSSUM),CHICKEN);',
'input_files' => 'human.fa mouse.fa cow.fa opossum.fa chicken.fa',
'msa_file' => "pecan_no_rat.mfa",
},
{
'tree_string' => '(((HUMAN,COW),OPOSSUM),CHICKEN);',
'input_files' => 'human.fa cow.fa opossum.fa chicken.fa',
'msa_file' => "pecan_no_rodents.mfa",
},
],
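# Each of the three hashes above seeds one 'pecan' job, so three alignments (one without
# chicken, one without rat, one without either rodent) can run in parallel, subject to
# the -hive_capacity limit above.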
-flow_into => {
# Dataflow rule: once a 'pecan' job is done, it will create a new 'gerp_col' job.
# On branch 1 the new job inherits the input_id of its parent, so the 'tree_string',
# 'input_files' and 'msa_file' values are all passed on to the new 'gerp_col' job
# (only 'msa_file' is actually required by it).
1 => [ 'gerp_col' ],
},
},
## Second analysis: GERP_COL
{ -logic_name => 'gerp_col',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
-parameters => {
# In this case, #msa_file# comes from the parent 'pecan' job.
'cmd' => 'gerpcol -t tree.nw -f #msa_file# -a -e HUMAN',
},
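# For illustration, a 'gerp_col' job flowed from the first 'pecan' job above would run:
#   gerpcol -t tree.nw -f pecan_no_chicken.mfa -a -e HUMAN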
-hive_capacity => 20, # max. number of parallel jobs
-flow_into => {
# Dataflow rule, branch 1. The new job inherits the input_id of its parent, i.e. the
# 'tree_string', 'input_files' and 'msa_file' values are used to create a new
# 'gerp_elem' job (only 'msa_file' is actually required by it).
1 => [ 'gerp_elem' ],
},
},
## Third analysis: GERP_ELEM
{ -logic_name => 'gerp_elem',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
-parameters => {
# In this case, #msa_file# comes from the parent 'gerp_col' job, which in turn inherited it from its own parent 'pecan' job.
'cmd' => 'gerpelem -f #msa_file#.rates -c chr13 -s 32878016 -x .bed',
},
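# For illustration, a 'gerp_elem' job descending from the first 'pecan' job above would run:
#   gerpelem -f pecan_no_chicken.mfa.rates -c chr13 -s 32878016 -x .bed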
-hive_capacity => 200, # max. number of parallel jobs
},
];
}
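# A minimal sketch (not part of the original code) of the PipeConfig module that would wrap
# the pipeline_analyses() method above. The package name and the database URL are
# hypothetical placeholders; exact script options may vary between eHive versions.
#
#   package PecanGerp_conf;
#
#   use strict;
#   use warnings;
#   use base ('Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf');
#
#   sub pipeline_analyses {
#       # ... the method shown above ...
#   }
#
#   1;
#
# The pipeline would then typically be initialised and run with the standard eHive scripts:
#
#   init_pipeline.pl PecanGerp_conf -pipeline_url mysql://user:pass@host/pecan_gerp_db
#   beekeeper.pl -url mysql://user:pass@host/pecan_gerp_db -loop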