1st Edition

Biological Sequence Analysis Using the SeqAn C++ Library

By Andreas Gogol-Döring, Knut Reinert
Copyright 2010
    330 Pages 47 B/W Illustrations
    by CRC Press

    An Easy-to-Use Research Tool for Algorithm Testing and Development

    Before the SeqAn project, implementations of sequence analysis algorithms were scarce, even for standard tasks. The algorithmic components that researchers needed were either unavailable or hard to access inside monolithic third-party software products. Addressing these concerns, the developers of SeqAn created a comprehensive, easy-to-use, open source C++ library of efficient algorithms and data structures for the analysis of biological sequences. Written by the founders of this project, Biological Sequence Analysis Using the SeqAn C++ Library covers the SeqAn library, its documentation, and the supporting infrastructure.

    The first part of the book describes the general library design. It introduces biological sequence analysis problems, discusses the benefit of using software libraries, summarizes the design principles and goals of SeqAn, details the main programming techniques used in SeqAn, and demonstrates the application of these techniques in various examples. Focusing on the components provided by SeqAn, the second part explores basic functionality, sequence data structures, alignments, pattern and motif searching, string indices, and graphs. The last part illustrates applications of SeqAn to genome alignment, consensus sequence in assembly projects, suffix array construction, and more.

    This handy book describes a user-friendly library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn enables not only the implementation of new algorithms, but also the sound analysis and comparison of existing algorithms.

    Visit the SeqAn project website for more information.

    THE SEQAN PROJECT

    Background

    Sequences in Bioinformatics

    Sequence Analysis

    Software Libraries

    SeqAn

    Design of SeqAn

    Contents of SeqAn

    Testing

    Documentation

    Distribution

    Library Design

    Design Overview

    Design Goals

    Programming Techniques

    The C++ Programming Language

    Generic Programming

    Template Subclassing

    Global Function Interfaces

    Metafunctions

    Further Techniques

    The Design in Examples

    Example 1: Value Counting

    Example 2: Locality-Sensitive Hashing

    LIBRARY CONTENTS

    Basics

    Containers and Values

    Memory Allocation

    Move Operations

    Alphabets

    Iterators

    Conversions

    Sequences

    Strings

    Overflow Strategies

    String Specializations

    Sequence Adaptors

    Iterating Sequences

    Sequence Modifiers

    Segments

    Comparators

    String Sets

    Sequence Conversion

    File Input/Output

    Alignments

    Gaps Data Structures

    Alignment Data Structures

    Alignment Scoring

    Alignment Problems Overview

    Global Alignments

    Chaining

    Pattern Matching

    Exact Searching

    Exact Searching of Multiple Needles

    Approximate Searching

    Other Pattern Matching Problems

    Motif Finding

    Local Alignments

    Seed-Based Motif Search

    Multiple Sequence Motifs

    Indices

    Working with Indices

    q-Gram Indices

    Suffix Arrays

    Enhanced Suffix Arrays

    Graphs

    Automata

    Alignment Graphs

    APPLICATIONS

    Aligning Sequences with LAGAN

    The LAGAN Algorithm

    Implementation of LAGAN

    Results

    Multiple Alignment with Segments

    The Algorithm

    Implementation

    Results

    Basic Statistical Indices for SeqAn

    Statistical Indices and Biological Sequence Analysis

    Mathematical Outline

    SeqAn Algorithms and Data Types

    Implementation Outline

    A BWT-Based Suffix Array Construction

    Introduction to BWTWalk

    The Main Idea of BWTWalk

    Saving Space

    SeqAn Implementation of BWTWalkFast

    Containers with and without Fast Random Access

    In-Place Version

    Experiments

    Conclusion

    Bibliography

    Index

    Biography

    Co-founder of the SeqAn project, Andreas Gogol-Döring works at the Max Delbrück Center for Molecular Medicine in Berlin, Germany. He was previously a research associate in the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.

    Co-founder of the SeqAn project, Knut Reinert is a professor and head of the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.