Pattern Discovery in Bioinformatics

Pattern Discovery in Bioinformatics: Theory & Algorithms

Series:
Published:
Author(s):
Free Standard Shipping

Purchasing Options

Hardback
$99.95
Add to cart
ISBN 9781584885498
Cat# C5491
eBook
ISBN 9781420010732
Cat# CE5491
 

Features

  • Covers a broad range of topics from the fundamentals of pattern discovery algorithms to basic statistics to state-of-the-art implementation techniques
  • Clarifies how computational methods work and what problems they solve
  • Introduces basic algorithmic concepts, such as trees, before exploring them in detail
  • Focuses on models of biological sequences, including DNA, RNA, and protein sequences
  • Includes illustrative proofs to justify concise factual statements as well as exercises at the end of each chapter for self-study or classroom teaching
  • Summary

    The computational methods of bioinformatics are being used more and more to process the large volume of current biological data. Promoting an understanding of the underlying biology that produces this data, Pattern Discovery in Bioinformatics: Theory and Algorithms provides the tools to study regularities in biological data.

    Taking a systematic approach to pattern discovery, the book supplies sound mathematical definitions and efficient algorithms to explain vital information about biological data. It explores various data patterns, including strings, clusters, permutations, topology, partial orders, and boolean expressions. Each of these classes captures a different form of regularity in the data, providing possible answers to a wide range of questions. The book also reviews basic statistics, including probability, information theory, and the central limit theorem.

    This self-contained book provides a solid foundation in computational methods, enabling the solution of difficult biological questions.

    Table of Contents

    INTRODUCTION
    Ubiquity of Patterns
    Motivations Form Biology
    The Need for Rigor
    Who Is a Reader of This Book?


    THE FUNDAMENTALS
    BASIC ALGORITHMICS
    Introduction
    Graphs
    Tree Problem 1: (Minimum Spanning Tree)
    Tree Problem 2: (Steiner Tree)
    Tree Problem 3: (Minimum Mutation Labeling)
    Storing and Retrieving Elements
    Asymptotic Functions
    Recurrence Equations
    NP-Complete Class of Problems

    BASIC STATISTICS
    Introduction
    Basic Probability
    The Bare Truth about Inferential Statistics
    Summary

    WHAT ARE PATTERNS?
    Introduction
    Common Thread
    Pattern Duality
    Irredundant Patterns
    Constrained Patterns
    When Is a Pattern Specification Non-Trivial?
    Classes of Patterns


    PATTERNS ON LINEAR STRINGS
    MODELING THE STREAM OF LIFE
    Introduction
    Modeling a Biopolymer
    Bernoulli Scheme
    Markov Chain
    Hidden Markov Model (HMM)
    Comparison of the Schemes
    Conclusion

    STRING PATTERN SPECIFICATIONS
    Introduction
    Notation
    Solid Patterns
    Rigid Patterns
    Extensible Patterns
    Generalizations

    ALGORITHMS AND PATTERN STATISTICS
    Introduction
    Discovery Algorithm
    Pattern Statistics
    Rigid Patterns
    Extensible Patterns
    Measure of Surprise
    Applications

    MOTIF LEARNING
    Introduction: Local Multiple Alignment
    Probabilistic Model: Motif Profile
    The Learning Problem
    Importance Measure
    Algorithms to Learn a Motif Profile
    An Expectation Maximization Framework
    A Gibbs Sampling Strategy
    Interpreting the Motif Profile in Terms of p

    THE SUBTLE MOTIF
    Introduction: Consensus Motif
    Combinatorial Model: Subtle Motif
    Distance between Motifs
    Statistics of Subtle Motifs
    Performance Score
    Enumeration Schemes
    A Combinatorial Algorithm
    A Probabilistic Algorithm
    A Modular Solution
    Conclusion


    PATTERNS ON META-DATA
    PERMUTATION PATTERNS
    Introduction
    Notation
    How Many Permutation Patterns?
    Maximality
    Parikh Mapping-Based Algorithm
    Intervals
    Intervals to PQ Trees
    Applications
    Conclusion

    PERMUTATION PATTERN PROBABILITIES
    Introduction
    Unstructured Permutations
    Structured Permutations

    TOPOLOGICAL MOTIFS
    Introduction
    What Are Topological Motifs?
    The Topological Motif
    Compact Topological Motifs
    The Discovery Method
    Related Classical Problems
    Applications
    Conclusion

    SET-THEORETIC ALGORITHMIC TOOLS
    Introduction
    Some Basic Properties of Finite Sets
    Partial Order Graph G(S,E) of Sets
    Boolean Closure of Sets
    Consecutive (Linear) Arrangement of Set Members
    Maximal Set Intersection Problem (maxSIP)
    Minimal Set Intersection Problem (minSIP)
    Multi-Sets
    Adapting the Enumeration Scheme

    EXPRESSION AND PARTIAL ORDER MOTIFS
    Introduction
    Extracting (monotone CNF) Boolean Expressions
    Extracting Partial Orders
    Statistics of Partial Orders
    Redescriptions
    Application: Partial Order of Expressions
    Summary

    REFERENCES

    INDEX

    Exercises appear at the end of every chapter.

    Editorial Reviews

    "Pattern Discovery in Bioinformatics provides an important message to all bioinformatics students and researchers. In contrast to some books that present computational biology as a combination of a BLAST user manual and storytelling, Laxmi Parida's new book makes the crucial point that most bioinformatics success stories require algorithmic and statistical ingenuity. It is a comprehensive and excellent book on the subject and I recommend it to everyone who has an interest in bioinformatics."
    -Pavel Pevzner, Ronald R. Taylor Professor of Computer Science, University of California, San Diego, USA

    "Laxmi Parida has written a comprehensive and very readable treatment of the discovery and analysis of patterns. Not just another textbook on string theory or combinatorics, nor an arcane reference book on the design of algorithms, nor a source book of sequence analysis software, but a coherent treatment of the kinds of patterns that computational biologists discover at the genomic level, in nucleic acid sequence, in protein structure, in microarray data, and in other linear orders and more general formal structures. Eclectically drawing on the whole gamut of discrete mathematical domains, Parida has built on her own numerous contributions during a distinguished career in putting together her unique take on this rapidly growing field. She insists throughout on mathematical rigour but makes the material eminently approachable by careful development from first principles, clear motivation and exposition, and a wealth of interesting and illustrative examples and exercises ranging from easy to challenging. Any enterprising student or neophyte in this field will gain a clear advantage by working through this book and adopting the wide-ranging but precise and sophisticated attitude to the many new problems that are continually emerging as biological data become more diverse and complex, as well as numerous. And established researchers seeking to broaden their perspective will enjoy the leisurely and insightful presentation of both new and familiar aspects of currently relevant research techniques."
    -David Sankoff, University of Ottawa, Canada

    "Biology has been transformed from a data poor to a data rich field, with massive accumulation of disparate types of data, for example huge databases of sequences (DNA, RNA, or protein). This data allows important biological insights to be made, partly by finding patterns and motifs that are conserved across many individuals or species; there is now a huge biological literature reporting on such conserved patterns and motifs that have been found in biological datasets. In contrast to the area of pattern matching, the patterns and motifs are generally not known ahead of time, but must be identified or discovered from the data; this task is often very subtle and difficult because the patterns and motifs may be short, may be highly degenerate (containing wildcards and variable length elements), may be ordered differently in different genomes, and are generally hidden in that they make up a small fraction of the data. For particular biological applications, even the definition of a relevant pattern may be difficult to state clearly, or may be unresolved. This has lead to a wonderful opportunity to develop a rigorous mathematical and algorithmic theory of patterns and pattern discovery in biological data. Laxmi Parida has been a leader in this effort, making significant contributions to both deep theory and to practical methods in biological pattern discovery. Her book explores a large subset of that theory, motivated by biological sequence and network data. I believe this book will be viewed as a seminal contribution and hope it draws many new people into the area of rigorous pattern discovery in bioinformatics."
    -Dan Gusfield, University of California, Davis, USA

    Textbooks
    Other CRC Press Sites
    Featured Authors
    STAY CONNECTED
    Facebook Page for CRC Press Twitter Page for CRC Press You Tube Channel for CRC Press LinkedIn Page for CRC Press Google Plus Page for CRC Press
    Sign Up for Email Alerts
    © 2013 Taylor & Francis Group, LLC. All Rights Reserved. Privacy Policy | Cookie Use | Shipping Policy | Contact Us