Bionode Documentation

This documentation is autogenerated and merges the documentation of each bionode module.

Usually each module section starts with some usage examples and is then subdivided by the methods the module provides.

bionode-fasta (see source)

Streamable FASTA parser.

doi: ? author: Bruno Vieira email: mail@bmpvieira.com license: MIT

Usage

This module can be used in Node.js as described further below, or as a command line tool. Examples:

$ npm install -g bionode-fasta

# bionode-fasta [options] [input file] [output file]
$ bionode-fasta input.fasta.gz output.json

# You can also use fasta files compressed with gzip
# If no output is provided, the result will be printed to stdout
# Options: -p, --path: Includes the path of the original file as a property of the output objects

Fasta

Returns a Writable Stream that parses a Buffer of FASTA content into a Buffer of JSON:

var fs = require('fs')
var fasta = require('bionode-fasta')

fs.createReadStream('./input.fasta')
.pipe(fasta())
.pipe(process.stdout)

=> { "id": "contig1",
     "seq": "AGTCATGACTGACGTACGCATG" }
=> { "id": "contig2",
     "seq": "ATGTACGTACTGCATGC" }
=> [...]
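
To make the shape of the output objects concrete, here is a minimal in-memory sketch of how FASTA text maps to `id`/`seq` records. The helper `parseFasta` is hypothetical and is not part of bionode-fasta; the real module works on streams and may treat headers differently.

```javascript
// Hypothetical helper, not the bionode-fasta implementation:
// parses a whole FASTA string held in memory.
function parseFasta(text) {
  var records = []
  var current = null
  text.split(/\r?\n/).forEach(function (line) {
    line = line.trim()
    if (line === '') return
    if (line[0] === '>') {
      // A header line starts a new record
      current = { id: line.slice(1), seq: '' }
      records.push(current)
    } else if (current) {
      // Sequence lines are concatenated onto the current record
      current.seq += line
    }
  })
  return records
}

var example = '>contig1\nAGTCATGACTGA\nCGTACGCATG\n>contig2\nATGTACGTACTGCATGC\n'
console.log(JSON.stringify(parseFasta(example)))
```

Reading the whole file into a string like this is only reasonable for small inputs, which is why the module itself is stream-based.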

It can also parse files whose names are streamed to it as Strings:

var split = require('split')

fs.createReadStream('./fasta-list.txt')
.pipe(split())
.pipe(fasta({filenameMode: true}))
.pipe(process.stdout)

When filenames are streamed as in the previous example, or passed directly to the parser Stream, they can be added to the output Objects:

fasta({includePath: true}, './input.fasta')
.pipe(process.stdout)

=> { "id": "contig1",
     "seq": "AGTCATGACTGACGTACGCATG",
     "path": "./input.fasta" }

The parser can also emit its output as Objects instead of Buffers:

fasta({objectMode: true}, './input.fasta')
.on('data', console.log)

Shortcut version of previous example

fasta.obj('./input.fasta').on('data', console.log)

Callback style can also be used, although it might not be the best choice for large files:

fasta.obj('./input.fasta', function(data) {
  console.log(data)
})

bionode-ncbi (see source)

Node.js module for working with the NCBI API (aka e-utils) using Streams.

doi: 10.5281/zenodo.10610 author: Bruno Vieira email: mail@bmpvieira.com license: MIT

Usage

This module can be used in Node.js as described further below, or as a command line tool. Examples:

$ npm install -g bionode-ncbi

# bionode-ncbi [command] [arguments] --limit (-l) --throughput (-t)
$ bionode-ncbi search taxonomy solenopsis
$ bionode-ncbi search sra human --limit 500 # only return 500 items
$ bionode-ncbi search sra human --throughput 250 # fetch 250 items per API request
$ bionode-ncbi download assembly solenopsis invicta
$ bionode-ncbi urls sra solenopsis invicta
$ bionode-ncbi link assembly bioproject 244018
$ bionode-ncbi search gds solenopsis | dat import --json

Search

Takes an NCBI database string and an optional search term and returns a stream of the objects found:

ncbi.search('sra', 'solenopsis').on('data', console.log)
=> { uid: '280116',
     expxml: {"Summary":{"Title":"Single Solenopsis invicta male","Platform":{"_":"ILLUMINA", [...],
     runs: {"Run":[{"acc":"SRR620577","total_spots":"23699662","total_bases":"4787331724", [...],
     extlinks: '    ',
     createdate: '2013/02/07',
     updatedate: '2012/11/28' }
=> { uid: '280243',
     expxml: {"Summary":{"Title":"Illumina small-insert paired end","Platform":{"_":"ILLUMINA", [...],
     runs: {"Run":[{"acc":"SRR621118","total_spots":"343209818","total_bases":"34320981800", [...],
     extlinks: '    ',
     createdate: '2013/02/07',
     updatedate: '2012/11/28' }
=> [...]

Arguments can be passed as an object instead:

ncbi.search({ db: 'sra', term: 'solenopsis' })
.on('data', console.log)

Advanced options can be passed using the previous syntax:

var options = {
  db: 'assembly', // database to search
  term: 'human',  // optional term for search
  limit: 500,     // optional limit of NCBI results
  throughput: 100 // optional number of items per request
}

ncbi.search(options).on('data', console.log)

The search term can also be passed with write:

var search = ncbi.search('sra').on('data', console.log)
search.write('solenopsis')

Or piped, for example, from a file:

var fs = require('fs')
var split = require('split')

// `search` is the stream created in the previous example
fs.createReadStream('searchTerms.txt')
.pipe(split())
.pipe(search)

Link

Takes a string for the source NCBI database and another for the destination database, and returns a stream of objects with the unique IDs linked to the given unique ID from the source database.

ncbi.link('taxonomy', 'sra', 443821)
=> { "srcDB":"taxonomy",
     "destDB":"sra",
     "srcUID":"443821",
     "destUID":"677548" }
=> { "srcDB":"taxonomy",
     "destDB":"sra",
     "srcUID":"443821",
     "destUID":"677547" }
=> [...]

Also works with write and pipe, like Search.

Property link (Plink)

Similar to Link, but takes the srcID from a property of the Streamed object and attaches the result to a property with the name of the destination DB.
```
ncbi.search('genome', 'arthropoda')
.pipe(ncbi.expand('tax'))
.pipe(ncbi.plink('tax', 'sra'))
```

Download

Takes an NCBI database string and an optional search term and downloads the dataset/sequence files. Currently only supports the sra and assembly databases. Also accepts the keyword gff for annotations. Returns a stream that emits download progress and ends with the download path. The name of the folder where the file is saved corresponds to the UID from NCBI.
```
ncbi.download('assembly', 'solenopsis invicta')
.on('data', console.log)
.on('end', function(path) {
  console.log('File saved at ' + path)
})
=> Downloading 244018/unplaced.scaf.fa.gz 0.94 % of 106 MB at 0.48 MB/s
=> Downloading 244018/unplaced.scaf.fa.gz 100.00 % of 106 MB at 0.49 MB/s
=> File saved at 244018/unplaced.scaf.fa.gz
```

URLs

Takes an NCBI database string and an optional search term and returns a stream of dataset/sequence file URLs. Currently only supports the sra and assembly databases. Also accepts the keyword gff for annotations. The value of the uid property corresponds to the UID from NCBI.
```
ncbi.urls('assembly', 'solenopsis invicta')
.on('data', console.log)
=> {"url":"http://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/invertebrates/Solenopsis_invicta/Si_gnG/Primary_Assembly/unplaced_scaffolds/FASTA/unplaced.scaf.fa.gz",
    "uid":"244018/"}
```

Expand

Takes a property (e.g., biosample) and an optional destination property (e.g., sample) and looks for a field named property+id (e.g., biosampleid) in the Streamed object. It then does an ncbi.search for that id and saves the result under Streamed object.property.
```
ncbi.search('genome', 'arthropoda').pipe(ncbi.expand('assembly'))
```

Taxonomy doesn’t work just with ID number

Fetch

Allows retrieval of records from NCBI databases. Takes the database name, and a search term, and returns the records from the database that match the search term. There are optional advanced parameters that allow you to define how many records to retrieve and extra options for genes. These parameters should be passed as an object.

For example, it can return a subset of a genetic sequence of a requested species.
```
 ncbi.fetch('sra', 'solenopsis_invicta')
 => {"EXPERIMENT_PACKAGE_SET":
       {"EXPERIMENT_PACKAGE":
         [{"EXPERIMENT":
           [{"$":{"xmlns":"","alias":"Me","accession":"SRX757228",
           ...
```
With advanced parameters for sequence databases (all are optional):
```
 var opts = {
   db: 'nucest',
   term: 'guillardia_theta',
   strand: 1,
   complexity: 4
 }
 ncbi.fetch(opts)
 => { id: 'gi|557436392|gb|HE992975.1|HE992975 HE992975 Guillardia theta CCMP 327 Guillardia theta cDNA clone sg-p_014_h06, mRNA sequence',
     seq: 'GAAGGCGATTCCAATGGTGCGAGCGAGGCAGCGAACAGACGCAGCGGGGAGAG...
    }
 => [...]
```
For some databases there are multiple return types. A default one is chosen automatically, but it can be overridden with the rettype option.

The NCBI website provides a list of the databases supported by efetch here: http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.T._entrez_unique_identifiers_ui/?report=objectonly

Default rettypes if user doesn’t provide any

bionode-seq (see source)

Module for DNA, RNA and protein sequences manipulation.

doi: ? author: Bruno Vieira email: mail@bmpvieira.com license: MIT

Usage

See the methods below.

Check sequence type

Takes a sequence string and checks if it’s DNA, RNA or protein. It follows IUPAC notation, which allows ambiguous bases; a sequence that contains ambiguity codes is labelled as an ambiguous nucleotide sequence rather than an amino acid sequence.

seq.checkType("ATGACCCTGAGAAGAGCACCG");
=> "dna"
seq.checkType("AUGACCCUGAAGGUGAAUGAA");
=> "rna"
seq.checkType("MAYKSGKRPTFFEVFKAHCSDS");
=> "protein"
seq.checkType("AMTGACCCTGAGAAGAGCACCG");
=> "ambiguousDna"
seq.checkType("AMUGACCCUGAAGGUGAAUGAA");
=> "ambiguousRna"
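
One way to picture this classification is a cascade of alphabet checks, from the strictest alphabet to the loosest. The sketch below is an assumption about how such a check could work, not the bionode-seq source; the function name `guessType` and the regular expressions are hypothetical.

```javascript
// Hypothetical sketch, not bionode-seq's implementation.
// Alphabets are tested from strictest to loosest; the first match wins.
function guessType(sequence) {
  if (/^[ACGT]+$/i.test(sequence)) return 'dna'
  if (/^[ACGU]+$/i.test(sequence)) return 'rna'
  // IUPAC nucleotide ambiguity codes: R Y S W K M B D H V N
  if (/^[ACGTRYSWKMBDHVN]+$/i.test(sequence)) return 'ambiguousDna'
  if (/^[ACGURYSWKMBDHVN]+$/i.test(sequence)) return 'ambiguousRna'
  // Anything else is assumed to be an amino acid sequence
  return 'protein'
}

console.log(guessType('AMTGACCCTGAGAAGAGCACCG')) // "ambiguousDna"
console.log(guessType('MAYKSGKRPTFFEVFKAHCSDS')) // "protein"
```

With this ordering, a short peptide made only of letters that are also IUPAC nucleotide codes would be reported as an ambiguous nucleotide sequence, which is the trade-off the description above mentions.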

Takes a sequence type argument and returns a function to complement bases.

Reverse sequence

Takes a sequence string and returns the reverse sequence.

seq.reverse("ATGACCCTGAAGGTGAA");
=> "AAGTGGAAGTCCCAGTA"

(Reverse) complement sequence

Takes a sequence string and optional boolean for reverse, and returns its complement.

seq.complement("ATGACCCTGAAGGTGAA");
=> "TACTGGGACTTCCACTT"
seq.complement("ATGACCCTGAAGGTGAA", true);
=> "TTCACCTTCAGGGTCAT"
//Alias
seq.reverseComplement("ATGACCCTGAAGGTGAA");
=> "TTCACCTTCAGGGTCAT"
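
The relationship between the complement and the reverse complement can be sketched in a few lines. This is an illustrative stand-in that handles unambiguous DNA only; `complementDna` and `PAIRS` are hypothetical names, and the library itself also covers RNA and ambiguity codes.

```javascript
// Hypothetical sketch for plain DNA; not the bionode-seq implementation.
var PAIRS = { A: 'T', T: 'A', C: 'G', G: 'C', a: 't', t: 'a', c: 'g', g: 'c' }

function complementDna(sequence, reverse) {
  var bases = sequence.split('').map(function (base) {
    return PAIRS[base] || base // leave unknown symbols untouched
  })
  // The reverse complement is the complement read backwards
  if (reverse) bases.reverse()
  return bases.join('')
}

console.log(complementDna('ATGACCCTGAAGGTGAA'))       // "TACTGGGACTTCCACTT"
console.log(complementDna('ATGACCCTGAAGGTGAA', true)) // "TTCACCTTCAGGGTCAT"
```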

Takes a sequence string and returns the reverse complement (syntax sugar).

Transcribe base

Takes a base character and returns the transcript base.

seq.getTranscribedBase("A");
=> "U"
seq.getTranscribedBase("T");
=> "A"
seq.getTranscribedBase("t");
=> "a"
seq.getTranscribedBase("C");
=> "G"

Get codon amino acid

Takes an RNA codon and returns the translated amino acid.

seq.getTranslatedAA("AUG");
=> "M"
seq.getTranslatedAA("GCU");
=> "A"
seq.getTranslatedAA("CUU");
=> "L"
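
A codon lookup like this is commonly built on the standard genetic code laid out in U, C, A, G order, so that a codon's position in a 64-character string can be computed from its three bases. The sketch below assumes the standard code and uses hypothetical names (`codonToAminoAcid`, `BASES`, `AMINO_ACIDS`); it is not taken from bionode-seq.

```javascript
// Hypothetical sketch using the standard genetic code; not bionode-seq's table.
// Amino acids are listed for codons UUU, UUC, UUA, UUG, UCU, ... GGG ('*' marks a stop).
var BASES = 'UCAG'
var AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

function codonToAminoAcid(codon) {
  var rna = codon.toUpperCase().replace(/T/g, 'U') // accept DNA codons too
  var i = BASES.indexOf(rna[0])
  var j = BASES.indexOf(rna[1])
  var k = BASES.indexOf(rna[2])
  if (rna.length !== 3 || i < 0 || j < 0 || k < 0) return undefined
  return AMINO_ACIDS[16 * i + 4 * j + k]
}

console.log(codonToAminoAcid('AUG')) // "M"
console.log(codonToAminoAcid('GCU')) // "A"
console.log(codonToAminoAcid('CUU')) // "L"
```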

Remove introns

Takes a sequence and an array of exon ranges and removes the introns, i.e. everything outside those ranges.

seq.removeIntrons("ATGACCCTGAAGGTGAATGACAG", [[1, 8]]);
=> "TGACCCT"
seq.removeIntrons("ATGACCCTGAAGGTGAATGACAG", [[2, 9], [12, 20]]);
=> "GACCCTGGTGAATGA"
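
The two results above are consistent with reading each range as a zero-based `[start, end)` pair, end exclusive, and concatenating the selected pieces. The sketch below works under that assumption; `keepExons` is a hypothetical name and not the library function.

```javascript
// Hypothetical sketch, assuming zero-based [start, end) exon ranges.
function keepExons(sequence, exonRanges) {
  return exonRanges.map(function (range) {
    return sequence.substring(range[0], range[1])
  }).join('')
}

console.log(keepExons('ATGACCCTGAAGGTGAATGACAG', [[1, 8]]))            // "TGACCCT"
console.log(keepExons('ATGACCCTGAAGGTGAATGACAG', [[2, 9], [12, 20]])) // "GACCCTGGTGAATGA"
```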

Transcribe sequence

Takes a sequence string and returns the transcribed sequence (dna <-> rna). If an array of exons is given, the introns will be removed from the sequence.
```
seq.transcribe("ATGACCCTGAAGGTGAA");
=> "AUGACCCUGAAGGUGAA"
seq.transcribe("AUGACCCUGAAGGUGAA"); //reverse
=> "ATGACCCTGAAGGTGAA"
```

Translate sequence

Takes a DNA or RNA sequence and translates it to protein. If an array of exons is given, the introns will be removed from the sequence.

seq.translate("ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC"); //dna
=> "MTLKVNDRKPN"
seq.translate("AUGACCCUGAAGGUGAAUGACAGGAAGCCCAAC"); //rna
=> "MTLKVNDRKPN"
seq.translate("ATGACCCTGAAGGTGAATGACAGGAAGCC", [[3, 21]]);
=> "LKVND"

Reverse exons

Takes an array of exons and the length of the reference and returns inverted coordinates.

seq.reverseExons([[2,8]], 20);
=> [ [ 12, 18 ] ]
seq.reverseExons([[10,45], [65,105]], 180);
=> [ [ 135, 170 ], [ 75, 115 ] ]
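
The coordinates in both examples match a simple mirroring rule: a range `[start, end]` on a reference of length `n` becomes `[n - end, n - start]`, with the exons kept in their original order. A minimal sketch of that rule follows; `mirrorExons` is a hypothetical name.

```javascript
// Hypothetical sketch of the mirroring rule; not the bionode-seq source.
function mirrorExons(exonRanges, referenceLength) {
  return exonRanges.map(function (range) {
    return [referenceLength - range[1], referenceLength - range[0]]
  })
}

console.log(mirrorExons([[2, 8]], 20))                // [ [ 12, 18 ] ]
console.log(mirrorExons([[10, 45], [65, 105]], 180)) // [ [ 135, 170 ], [ 75, 115 ] ]
```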

Find non-canonical splice sites

Takes a sequence and exon ranges and returns an array of non-canonical splice sites.

seq.findNonCanonicalSplices("GGCGGCGGCGGTGAGGTGGACCTGCGCGAATACGTGGTCGCCCTGT", [[0, 10], [20, 30]]);
=> [ 20 ]
seq.findNonCanonicalSplices("GGCGGCGGCGGTGAGGTGAGCCTGCGCGAATACGTGGTCGCCCTGT", [[0, 10], [20, 30]]);
=> []
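
A common definition of a canonical intron is one that begins with GT and ends with AG. The sketch below checks each intron between consecutive exons against that rule for DNA input. Which coordinate gets reported for a failing site is an assumption chosen to agree with the two examples above; `nonCanonicalSplices` is a hypothetical name, not the library function.

```javascript
// Hypothetical sketch of a GT...AG check for DNA; not the bionode-seq source.
function nonCanonicalSplices(sequence, exonRanges) {
  var sites = []
  for (var i = 0; i < exonRanges.length - 1; i++) {
    var intronStart = exonRanges[i][1]   // end of this exon
    var intronEnd = exonRanges[i + 1][0] // start of the next exon
    var intron = sequence.substring(intronStart, intronEnd).toUpperCase()
    // Assumption: report the donor site at the intron start
    if (intron.slice(0, 2) !== 'GT') sites.push(intronStart)
    // Assumption: report the acceptor site at the next exon start
    if (intron.slice(-2) !== 'AG') sites.push(intronEnd)
  }
  return sites
}

var exons = [[0, 10], [20, 30]]
console.log(nonCanonicalSplices('GGCGGCGGCGGTGAGGTGGACCTGCGCGAATACGTGGTCGCCCTGT', exons)) // [ 20 ]
console.log(nonCanonicalSplices('GGCGGCGGCGGTGAGGTGAGCCTGCGCGAATACGTGGTCGCCCTGT', exons)) // []
```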

Check canonical translation start site

Takes a sequence and returns boolean for canonical translation start site.

seq.checkCanonicalTranslationStartSite("ATGACCCTGAAGGT");
=> true
seq.checkCanonicalTranslationStartSite("AATGACCCTGAAGGT");
=> false

Get reading frames

Takes a sequence and returns an array with the six possible Reading Frames (+1, +2, +3, -1, -2, -3).

seq.getReadingFrames("ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC");
=> [ 'ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC',
     'TGACCCTGAAGGTGAATGACAGGAAGCCCAAC',
     'GACCCTGAAGGTGAATGACAGGAAGCCCAAC',
     'GTTGGGCTTCCTGTCATTCACCTTCAGGGTCAT',
     'TTGGGCTTCCTGTCATTCACCTTCAGGGTCAT',
     'TGGGCTTCCTGTCATTCACCTTCAGGGTCAT' ]
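
The six frames in the output above are the sequence shifted by 0, 1 and 2 bases, followed by its reverse complement shifted the same way. A minimal sketch for unambiguous DNA, using the hypothetical name `sixFrames`:

```javascript
// Hypothetical sketch for plain uppercase DNA; not the bionode-seq source.
function sixFrames(sequence) {
  var pairs = { A: 'T', T: 'A', C: 'G', G: 'C' }
  var reverseComplement = sequence.split('').reverse().map(function (base) {
    return pairs[base] || base
  }).join('')
  var offsets = [0, 1, 2]
  // Frames +1, +2, +3 come from the sequence itself
  var forward = offsets.map(function (offset) {
    return sequence.substring(offset)
  })
  // Frames -1, -2, -3 come from the reverse complement
  var backward = offsets.map(function (offset) {
    return reverseComplement.substring(offset)
  })
  return forward.concat(backward)
}

console.log(sixFrames('ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC'))
```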

Get open reading frames

Takes a Reading Frame sequence and returns an array of Open Reading Frames.

seq.getOpenReadingFrames("ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC");
=> [ 'ATGACCCTGAAGGTGAATGACAGGAAGCCCAAC' ]
seq.getOpenReadingFrames("AUGACCCUGAAGGUGAAUGACAGGAAGCCCAAC");
=> [ 'AUGACCCUGAAGGUGAAUGACAGGAAGCCCAAC' ]
seq.getOpenReadingFrames("ATGAGAAGCCCAACATGAGGACTGA");
=> [ 'ATGAGAAGCCCAACATGA', 'GGACTGA' ]
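
The outputs above are consistent with walking the frame codon by codon and cutting after every stop codon, keeping the stop codon at the end of each piece and any trailing remainder as a final piece. The sketch below follows that reading; `splitAtStops` and `STOP_CODONS` are hypothetical names and the library may handle edge cases differently.

```javascript
// Hypothetical sketch; not the bionode-seq source.
var STOP_CODONS = ['TAA', 'TAG', 'TGA', 'UAA', 'UAG', 'UGA']

function splitAtStops(frame) {
  var orfs = []
  var current = ''
  for (var i = 0; i < frame.length; i += 3) {
    var codon = frame.substring(i, i + 3)
    current += codon
    if (STOP_CODONS.indexOf(codon.toUpperCase()) !== -1) {
      // A stop codon closes the current piece
      orfs.push(current)
      current = ''
    }
  }
  // Keep whatever follows the last stop codon, including a partial codon
  if (current !== '') orfs.push(current)
  return orfs
}

console.log(splitAtStops('ATGAGAAGCCCAACATGAGGACTGA')) // [ 'ATGAGAAGCCCAACATGA', 'GGACTGA' ]
```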

Get all open reading frames

Takes a sequence and returns all Open Reading Frames in the six Reading Frames.

seq.getAllOpenReadingFrames("ATGACCCTGAAGGTGAATGACA");
=> [ [ 'ATGACCCTGAAGGTGAATGACA' ],
     [ 'TGA', 'CCCTGA', 'AGGTGA', 'ATGACA' ],
     [ 'GACCCTGAAGGTGAATGA', 'CA' ],
     [ 'TGTCATTCACCTTCAGGGTCAT' ],
     [ 'GTCATTCACCTTCAGGGTCAT' ],
     [ 'TCATTCACCTTCAGGGTCAT' ] ]

Find longest open reading frame

Takes a sequence and returns the longest ORF from all six reading frames and the corresponding frame symbol (+1, +2, +3, -1, -2, -3). If a frame symbol is specified, only that frame is searched for the longest ORF. When sorting ORFs, a tie is broken in favour of the one that starts with the start codon (Methionine). If there is still a tie, one is returned at random.
```
seq.findLongestOpenReadingFrame("ATGACCCTGAAGGTGAATGACA");
=> [ 'ATGACCCTGAAGGTGAATGACA', '+1' ]
seq.findLongestOpenReadingFrame("ATGACCCTGAAGGTGAATGACA", "-1");
=> "TGTCATTCACCTTCAGGGTCAT"
```
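
The tie-breaking rule described above can be sketched as a sort by length with a secondary preference for ORFs that begin with ATG/AUG. The names `pickLongest` and `startsWithStartCodon` are hypothetical. Unlike the behaviour described, this sketch does not pick randomly among the remaining ties; it returns whichever the sort places first.

```javascript
// Hypothetical sketch of the selection rule; not the bionode-seq source.
function startsWithStartCodon(orf) {
  return /^A[TU]G/i.test(orf)
}

function pickLongest(orfs) {
  return orfs.slice().sort(function (a, b) {
    // Longer ORFs come first
    if (b.length !== a.length) return b.length - a.length
    // Tie on length: prefer the ORF that begins with ATG/AUG
    return Number(startsWithStartCodon(b)) - Number(startsWithStartCodon(a))
  })[0]
}

console.log(pickLongest(['TGA', 'CCCTGA', 'AGGTGA', 'ATGACA'])) // "ATGACA"
```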

Get longest ORFs for all six possible reading frames

Get longest ORF

Helper that sorts by length, giving priority to ones that start with ATG/AUG

Helper that takes an array and returns longest Reading Frame
