Biological Sequence Analysis Using the SeqAn C++ Library

Purchasing Options

Hardback
$96.95
ISBN 9781420076233
Cat# C7623
 

Features

  • Offers a collection of functional, well-designed algorithmic components
  • Illustrates the advantage of using a software library
  • Covers the design principles and programming techniques of SeqAn
  • Presents the components supplied by SeqAn, such as sequence data structures, pattern and motif searching, string indices, and graphs
  • Shows how SeqAn can be used to solve biological sequence problems, including genome alignment, consensus sequence computation, and suffix array construction

Summary

An Easy-to-Use Research Tool for Algorithm Testing and Development

Before the SeqAn project, implementations for sequence analysis were scarce, even for standard tasks. The algorithmic components that researchers needed were either unavailable or hard to access inside monolithic third-party software products. To address these concerns, the developers of SeqAn created a comprehensive, easy-to-use, open-source C++ library of efficient algorithms and data structures for the analysis of biological sequences. Written by the founders of this project, Biological Sequence Analysis Using the SeqAn C++ Library covers the SeqAn library, its documentation, and the supporting infrastructure.

The first part of the book describes the general library design. It introduces biological sequence analysis problems, discusses the benefit of using software libraries, summarizes the design principles and goals of SeqAn, details the main programming techniques used in SeqAn, and demonstrates the application of these techniques in various examples. Focusing on the components provided by SeqAn, the second part explores basic functionality, sequence data structures, alignments, pattern and motif searching, string indices, and graphs. The last part illustrates applications of SeqAn to genome alignment, consensus sequence computation in assembly projects, suffix array construction, and more.

This handy book describes a user-friendly library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn enables not only the implementation of new algorithms, but also the sound analysis and comparison of existing algorithms.

Visit SeqAn for more information.

Table of Contents

THE SEQAN PROJECT

Background

Sequences in Bioinformatics

Sequence Analysis

Software Libraries

SeqAn

Design of SeqAn

Contents of SeqAn

Testing

Documentation

Distribution

Library Design

Design Overview

Design Goals

Programming Techniques

The C++ Programming Language

Generic Programming

Template Subclassing

Global Function Interfaces

Metafunctions

Further Techniques

The Design in Examples

Example 1: Value Counting

Example 2: Locality-Sensitive Hashing

LIBRARY CONTENTS

Basics

Containers and Values

Memory Allocation

Move Operations

Alphabets

Iterators

Conversions

Sequences

Strings

Overflow Strategies

String Specializations

Sequence Adaptors

Iterating Sequences

Sequence Modifiers

Segments

Comparators

String Sets

Sequence Conversion

File Input/Output

Alignments

Gaps Data Structures

Alignment Data Structures

Alignment Scoring

Alignment Problems Overview

Global Alignments

Chaining

Pattern Matching

Exact Searching

Exact Searching of Multiple Needles

Approximate Searching

Other Pattern Matching Problems

Motif Finding

Local Alignments

Seed-Based Motif Search

Multiple Sequence Motifs

Indices

Working with Indices

q-Gram Indices

Suffix Arrays

Enhanced Suffix Arrays

Graphs

Automata

Alignment Graphs

APPLICATIONS

Aligning Sequences with LAGAN

The LAGAN Algorithm

Implementation of LAGAN

Results

Multiple Alignment with Segments

The Algorithm

Implementation

Results

Basic Statistical Indices for SeqAn

Statistical Indices and Biological Sequence Analysis

Mathematical Outline

SeqAn Algorithms and Data Types

Implementation Outline

A BWT-Based Suffix Array Construction

Introduction to BWTWalk

The Main Idea of BWTWalk

Saving Space

SeqAn Implementation of BWTWalkFast

Containers with and without Fast Random Access

In-Place Version

Experiments

Conclusion

Bibliography

Index

Author Bio(s)

Co-founder of the SeqAn project, Andreas Gogol-Döring works at the Max Delbrück Center for Molecular Medicine in Berlin, Germany. He was previously a research associate in the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.

Co-founder of the SeqAn project, Knut Reinert is a professor and head of the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.
