

Boost.Genetics 0.1

Andy Thomason

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)


Table of Contents

Overview
Containers and Datatypes
File Formats
File Mapping
Examples
  DNA string
  Reverse complement
  Inexact searches (or distance searches)
Rationale
History
Frequently Asked Questions (FAQ)
Credits and Acknowledgements
Navigation
Trac Tickets
Document Conventions
Documentation Version Info
Boost.Genetics C++ Reference
  Header <boost/genetics/augmented_string.hpp>
  Header <boost/genetics/dna_string.hpp>
  Header <boost/genetics/fasta.hpp>
  Header <boost/genetics/two_stage_index.hpp>
  Header <boost/genetics/utils.hpp>

Boost.Genetics, a library proposed for the Boost collection, provides containers and algorithms for working with genetic sequences, which are composed mostly of the letters (or bases):

A G C T
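Because sequences are drawn mostly from these four letters, each base fits in two bits, so 32 bases pack into one 64-bit word and a 3.2-billion-base genome needs roughly 800 MB instead of 3.2 GB. The sketch below only illustrates that idea: the A=0, C=1, G=2, T=3 code and the helper names `base_to_code`, `code_to_base`, `pack` and `base_at` are hypothetical and are not taken from `<boost/genetics/dna_string.hpp>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical 2-bit code: A=0, C=1, G=2, T=3.
std::uint64_t base_to_code(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("expected A, C, G or T");
    }
}

char code_to_base(std::uint64_t code) {
    assert(code < 4);  // only four codes exist
    return "ACGT"[code];
}

// Pack 32 bases into each 64-bit word; base i occupies bits 2*(i%32)
// and 2*(i%32)+1 of word i/32.
std::vector<std::uint64_t> pack(const std::string &bases) {
    std::vector<std::uint64_t> words((bases.size() + 31) / 32, 0);
    for (std::size_t i = 0; i != bases.size(); ++i) {
        words[i / 32] |= base_to_code(bases[i]) << (2 * (i % 32));
    }
    return words;
}

// Read base i back out of the packed words.
char base_at(const std::vector<std::uint64_t> &words, std::size_t i) {
    return code_to_base((words[i / 32] >> (2 * (i % 32))) & 3u);
}
```

With this code order the complement of a base is simply `3 - code` (A and T, C and G), a convenient property for reverse-complement operations. Characters outside the four bases, such as N, do not fit the code and are rejected here with an exception.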

Boost.Genetics provides methods for searching genomes. The human genome, for example, has around 3.2 billion such bases, and because a sequence may occur on either strand and so be read in either direction, around 6.4 billion bases must effectively be searched. Searches are usually inexact. For example, we may wish to search for

G A T A C A

but allow one error, so that

G A A A C A or

T A T A C A

would also match. Inexact matching is essential because genomes vary from one another by only a few bases, and the data read from sequencing machines may itself contain errors.
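The one-error search described above can be sketched as a naive scan that counts substitutions (the Hamming distance) at every offset of the text. This is an illustration under simple assumptions, not the library's algorithm: only substitutions are counted (no insertions or deletions), every position is scanned in O(n·m) time, which is far too slow for a whole genome, and the names `mismatches` and `find_inexact` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count substitutions between pattern and text[pos, pos + pattern.size()).
// Stops once the count exceeds max_errors, so any result above
// max_errors just means "too many".
std::size_t mismatches(const std::string &text, std::size_t pos,
                       const std::string &pattern, std::size_t max_errors) {
    assert(pos + pattern.size() <= text.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i != pattern.size() && errors <= max_errors; ++i) {
        if (text[pos + i] != pattern[i]) {
            ++errors;
        }
    }
    return errors;
}

// Naive inexact search: every offset where pattern matches text with at
// most max_errors substitutions. Runs in O(n * m) time.
std::vector<std::size_t> find_inexact(const std::string &text,
                                      const std::string &pattern,
                                      std::size_t max_errors) {
    std::vector<std::size_t> hits;
    for (std::size_t pos = 0; pos + pattern.size() <= text.size(); ++pos) {
        if (mismatches(text, pos, pattern, max_errors) <= max_errors) {
            hits.push_back(pos);
        }
    }
    return hits;
}
```

For the text `CCGAAACACCTATACAGG`, `find_inexact(text, "GATACA", 1)` reports offsets 2 (`GAAACA`) and 10 (`TATACA`), while an exact search (`max_errors` of 0) finds nothing.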

Some special characters may also occur in sequences, most notably N, which represents an unknown base. N is used in reads (the data from a sequencing machine) to mark low-quality values and in the genome to represent unknown or variable regions.
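A two-bit base code has no room for N, so a container using one must record unknown bases some other way. The sketch below assumes two hypothetical helpers, neither part of the documented library: `unknown_fraction`, which a read filter might use to reject reads dominated by low-quality N calls, and `find_n_runs`, which lists the stretches of N that mark unknown regions of a reference as (start, length) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Fraction of a read's bases that are 'N' (unknown). A read filter could
// reject reads where this exceeds some threshold of its choosing.
double unknown_fraction(const std::string &read) {
    assert(!read.empty());
    std::size_t unknown = 0;
    for (char base : read) {
        if (base == 'N') {
            ++unknown;
        }
    }
    return static_cast<double>(unknown) / static_cast<double>(read.size());
}

// (start, length) of each maximal run of 'N' in a reference sequence.
std::vector<std::pair<std::size_t, std::size_t>>
find_n_runs(const std::string &seq) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < seq.size()) {
        if (seq[i] != 'N') {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < seq.size() && seq[i] == 'N') {
            ++i;
        }
        runs.emplace_back(start, i - start);
    }
    return runs;
}
```

Storing runs rather than one flag per base keeps the cost proportional to the number of unknown regions, which suits references where N appears in long stretches.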

The initial release of Boost.Genetics will focus on alignment: the process of matching reads from a sequencing machine against a reference genome such as the one distributed by ENSEMBL. Alignment must be done on both strands of the DNA, and may involve discovering gaps called "introns" between the coding regions ("exons") and coping with errors to find the closest match.
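Because the two strands of DNA are complementary and run in opposite directions, the sequence on the opposite strand is the reverse complement: reverse the string and swap A with T and C with G. One simple way to cover both strands, assumed here for illustration, is to search the forward reference for both the read and its reverse complement. The names `complement`, `reverse_complement` and `occurs_on_either_strand` are hypothetical, and the exact `std::string::find` stands in for the inexact search a real aligner would use.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>

// Complement of one base: A<->T, C<->G; an unknown base N stays N.
char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'N': return 'N';
        default: throw std::invalid_argument("unexpected base");
    }
}

// Sequence of the opposite strand: reverse, then complement each base.
std::string reverse_complement(const std::string &seq) {
    std::string result(seq.rbegin(), seq.rend());
    std::transform(result.begin(), result.end(), result.begin(), complement);
    return result;
}

// Exact two-strand search: a read matches if it, or its reverse
// complement, occurs in the forward strand of the reference.
bool occurs_on_either_strand(const std::string &reference,
                             const std::string &read) {
    assert(!read.empty());
    return reference.find(read) != std::string::npos ||
           reference.find(reverse_complement(read)) != std::string::npos;
}
```

For example, `reverse_complement("GATACA")` yields `TGTATC`, so the read `GATACA` is found on the reverse strand of a reference containing `TGTATC` even though it never appears on the forward strand.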


Last revised: June 28, 2015 at 12:19:41 GMT

