1st Edition

Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R

By Gabriel Valiente Copyright 2009
    368 Pages 89 B/W Illustrations
    by Chapman & Hall

    368 Pages 89 B/W Illustrations
    by Chapman & Hall

    Emphasizing the search for patterns within and between biological sequences, trees, and graphs, Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R shows how combinatorial pattern matching algorithms can solve computational biology problems that arise in the analysis of genomic, transcriptomic, proteomic, metabolomic, and interactomic data. It implements the algorithms in Perl and R, two widely used scripting languages in computational biology.

    The book provides a well-rounded explanation of traditional issues as well as an up-to-date account of more recent developments, such as graph similarity and search. It is organized around the specific algorithmic problems that arise when dealing with structures that are commonly found in computational biology, including biological sequences, trees, and graphs. For each of these structures, the author makes a clear distinction between problems that arise in the analysis of one structure and in the comparative analysis of two or more structures. He also presents phylogenetic trees and networks as examples of trees and graphs in computational biology.

    This book supplies a comprehensive view of the whole field of combinatorial pattern matching from a computational biology perspective. Along with thorough discussions of each biological problem, it includes detailed algorithmic solutions in pseudo-code, full Perl and R implementation, and pointers to other software, such as those on CPAN and CRAN.

    Introduction

    Combinatorial Pattern Matching

    Computational Biology

    A Motivating Example: Gene Prediction

    SEQUENCE PATTERN MATCHING

    Sequences

    Sequences in Mathematics

    Sequences in Computer Science

    Sequences in Computational Biology

    Simple Pattern Matching in Sequences

    Finding Words in Sequences

    General Pattern Matching in Sequences

    Finding Subsequences

    Finding Common Subsequences

    Comparing Sequences

    TREE PATTERN MATCHING

    Trees

    Trees in Mathematics

    Trees in Computer Science

    Trees in Computational Biology

    Simple Pattern Matching in Trees

    Finding Paths in Unrooted Trees

    Finding Paths in Rooted Trees

    General Pattern Matching in Trees

    Finding Subtrees

    Finding Common Subtrees

    Comparing Trees

    GRAPH PATTERN MATCHING

    Graphs

    Graphs in Mathematics

    Graphs in Computer Science

    Graphs in Computational Biology

    Simple Pattern Matching in Graphs

    Finding Paths in Graphs

    Finding Trees in Graphs

    General Pattern Matching in Graphs

    Finding Subgraphs

    Finding Common Subgraphs

    Comparing Graphs

    Appendix A: Elements of Perl

    Perl Scripts

    Overview of Perl

    Perl Quick Reference Card

    Appendix B: Elements of R

    R Scripts

    Overview of R

    R Quick Reference Card

    References

    Index

    Bibliographic Notes appear at the end of each chapter.

    Biography

    Gabriel Valiente is Associate Professor in the Department of Software at the Technical University of Catalonia. His research interests include computational biology, bioinformatics, exact and approximate matching in graphs and patterns, and graph transformation.

    I like the hands-on approach this book offers, and the very pedagogical structure it follows … . The book also has tons of examples, thoughtfully chosen and beautifully laid out … the book is very well-written and accessible, undoubtedly written by an author who takes great care in preparing his manuscripts and teaching about an area he enjoys working on.
    —Anthony Labarre, SIGACT News, July 2012

    This text provides a solid foundation to the field. It will work as a practical handbook for pattern matching applications in computational biology. 
    —Michael Goldberg, Computing Reviews, February 2010

    … the book makes a clear distinction between problems that emerge in the analysis of the structure and in the comparative analysis of two or more structures. … Well-known computational biology tools that allow searching nucleotide and protein databases for local sequence alignment are based on CPM algorithms only. The techniques presented in this book go beyond that. … detailed algorithm solutions in pseudocode, full Perl and R implementation, and pointers to software and implementation are presented. This … is what makes Valiente’s effort unique. …
    —Ernesto D’Avanzo, Computing Reviews, February 2010

    … It is a well-sorted collection of pattern matching algorithms that are used to work with problems in computational biology. … You can find all of the sources on the author’s website, which come in handy when you actually want to use them, since you do not have to retype them. And there is an introduction to Perl as well as to R, showing how to decode DNA/RNA-triplets to amino acids and giving some basic overview over standard functions. … I certainly recommend this as an introduction and reference to some algorithms of pattern matching in computational biology. You actually learn how algorithms over the most important data types are designed in a straightforward, logical way. …
    —Jannik Pewny, IACR Book Reviews, 2009

    …after a few minutes of random browsing, I was left with a feeling of total appreciation of the book, admiration for Prof. Gabriel Valiente, and a realization that this book will be part of my fundamental library for me and my group from the moment it is published. There are so many good things to say that I do not know where to start. The organization is straightforward with major sections that extend from simple sequences to trees to graphs. … This parallel structure makes it easy to apply lessons used on the simplest object (sequences) to objects of medium (trees) and significant (graphs) difficulty. …a wonderful way to learn leveraging … The Perl is beautifully clear and the examples have already taught me how to improve my own code.
    —Michael Levitt, Professor and Chair, Department of Structural Biology, Stanford University, California, USA

    …Balancing a careful mixture of formal methods, programming, and examples, Gabriel Valiente has managed to harmoniously bridge languages and contents into a self-contained source of lasting influence. It is not difficult to predict that this book will be studied indifferently by the specialist of biology and computer science, helping each to walk a few steps toward the other. It will entice new generations of scholars to engage in its beautiful subject.
    —From the Foreword, Alberto Apostolico, Professor, College of Computing, Georgia Tech, Atlanta, USA

    Unlocks the power for R for Perl programmers, and vice versa. Reveals R to be a powerful and accessible tool for bioinformatics. The title is a mouthful, but the use of both R and Perl for bioinformatics is revealing.
    —Steven Skiena, Professor, Department of Computer Science, Stony Brook University, New York, USA