The computational methods of bioinformatics are being used more and more to process the large volume of current biological data. Promoting an understanding of the underlying biology that produces this data, Pattern Discovery in Bioinformatics: Theory and Algorithms provides the tools to study regularities in biological data.
Taking a systematic approach to pattern discovery, the book supplies sound mathematical definitions and efficient algorithms to explain vital information about biological data. It explores various data patterns, including strings, clusters, permutations, topology, partial orders, and boolean expressions. Each of these classes captures a different form of regularity in the data, providing possible answers to a wide range of questions. The book also reviews basic statistics, including probability, information theory, and the central limit theorem.
This self-contained book provides a solid foundation in computational methods, enabling the solution of difficult biological questions.
INTRODUCTION
Ubiquity of Patterns
Motivations Form Biology
The Need for Rigor
Who Is a Reader of This Book?
THE FUNDAMENTALS
BASIC ALGORITHMICS
Introduction
Graphs
Tree Problem 1: (Minimum Spanning Tree)
Tree Problem 2: (Steiner Tree)
Tree Problem 3: (Minimum Mutation Labeling)
Storing and Retrieving Elements
Asymptotic Functions
Recurrence Equations
NP-Complete Class of Problems
BASIC STATISTICS
Introduction
Basic Probability
The Bare Truth about Inferential Statistics
Summary
WHAT ARE PATTERNS?
Introduction
Common Thread
Pattern Duality
Irredundant Patterns
Constrained Patterns
When Is a Pattern Specification Non-Trivial?
Classes of Patterns
PATTERNS ON LINEAR STRINGS
MODELING THE STREAM OF LIFE
Introduction
Modeling a Biopolymer
Bernoulli Scheme
Markov Chain
Hidden Markov Model (HMM)
Comparison of the Schemes
Conclusion
STRING PATTERN SPECIFICATIONS
Introduction
Notation
Solid Patterns
Rigid Patterns
Extensible Patterns
Generalizations
ALGORITHMS AND PATTERN STATISTICS
Introduction
Discovery Algorithm
Pattern Statistics
Rigid Patterns
Extensible Patterns
Measure of Surprise
Applications
MOTIF LEARNING
Introduction: Local Multiple Alignment
Probabilistic Model: Motif Profile
The Learning Problem
Importance Measure
Algorithms to Learn a Motif Profile
An Expectation Maximization Framework
A Gibbs Sampling Strategy
Interpreting the Motif Profile in Terms of p
THE SUBTLE MOTIF
Introduction: Consensus Motif
Combinatorial Model: Subtle Motif
Distance between Motifs
Statistics of Subtle Motifs
Performance Score
Enumeration Schemes
A Combinatorial Algorithm
A Probabilistic Algorithm
A Modular Solution
Conclusion
PATTERNS ON META-DATA
PERMUTATION PATTERNS
Introduction
Notation
How Many Permutation Patterns?
Maximality
Parikh Mapping-Based Algorithm
Intervals
Intervals to PQ Trees
Applications
Conclusion
PERMUTATION PATTERN PROBABILITIES
Introduction
Unstructured Permutations
Structured Permutations
TOPOLOGICAL MOTIFS
Introduction
What Are Topological Motifs?
The Topological Motif
Compact Topological Motifs
The Discovery Method
Related Classical Problems
Applications
Conclusion
SET-THEORETIC ALGORITHMIC TOOLS
Introduction
Some Basic Properties of Finite Sets
Partial Order Graph G(S,E) of Sets
Boolean Closure of Sets
Consecutive (Linear) Arrangement of Set Members
Maximal Set Intersection Problem (maxSIP)
Minimal Set Intersection Problem (minSIP)
Multi-Sets
Adapting the Enumeration Scheme
EXPRESSION AND PARTIAL ORDER MOTIFS
Introduction
Extracting (monotone CNF) Boolean Expressions
Extracting Partial Orders
Statistics of Partial Orders
Redescriptions
Application: Partial Order of Expressions
Summary
REFERENCES
INDEX
Exercises appear at the end of every chapter.
"Pattern Discovery in Bioinformatics provides an important message to all bioinformatics students and researchers. In contrast to some books that present computational biology as a combination of a BLAST user manual and storytelling, Laxmi Parida's new book makes the crucial point that most bioinformatics success stories require algorithmic and statistical ingenuity. It is a comprehensive and excellent book on the subject and I recommend it to everyone who has an interest in bioinformatics."
-Pavel Pevzner, Ronald R. Taylor Professor of Computer Science, University of California, San Diego, USA
"Laxmi Parida has written a comprehensive and very readable treatment of the discovery and analysis of patterns. Not just another textbook on string theory or combinatorics, nor an arcane reference book on the design of algorithms, nor a source book of sequence analysis software, but a coherent treatment of the kinds of patterns that computational biologists discover at the genomic level, in nucleic acid sequence, in protein structure, in microarray data, and in other linear orders and more general formal structures. Eclectically drawing on the whole gamut of discrete mathematical domains, Parida has built on her own numerous contributions during a distinguished career in putting together her unique take on this rapidly growing field. She insists throughout on mathematical rigour but makes the material eminently approachable by careful development from first principles, clear motivation and exposition, and a wealth of interesting and illustrative examples and exercises ranging from easy to challenging. Any enterprising student or neophyte in this field will gain a clear advantage by working through this book and adopting the wide-ranging but precise and sophisticated attitude to the many new problems that are continually emerging as biological data become more diverse and complex, as well as numerous. And established researchers seeking to broaden their perspective will enjoy the leisurely and insightful presentation of both new and familiar aspects of currently relevant research techniques."
-David Sankoff, University of Ottawa, Canada
"Biology has been transformed from a data poor to a data rich field, with massive accumulation of disparate types of data, for example huge databases of sequences (DNA, RNA, or protein). This data allows important biological insights to be made, partly by finding patterns and motifs that are conserved across many individuals or species; there is now a huge biological literature reporting on such conserved patterns and motifs that have been found in biological datasets. In contrast to the area of pattern matching, the patterns and motifs are generally not known ahead of time, but must be identified or discovered from the data; this task is often very subtle and difficult because the patterns and motifs may be short, may be highly degenerate (containing wildcards and variable length elements), may be ordered differently in different genomes, and are generally hidden in that they make up a small fraction of the data. For particular biological applications, even the definition of a relevant pattern may be difficult to state clearly, or may be unresolved. This has lead to a wonderful opportunity to develop a rigorous mathematical and algorithmic theory of patterns and pattern discovery in biological data. Laxmi Parida has been a leader in this effort, making significant contributions to both deep theory and to practical methods in biological pattern discovery. Her book explores a large subset of that theory, motivated by biological sequence and network data. I believe this book will be viewed as a seminal contribution and hope it draws many new people into the area of rigorous pattern discovery in bioinformatics."
-Dan Gusfield, University of California, Davis, USA