Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition

Purchasing Options

Hardback
ISBN 9781439809754
Cat# K10487


eBook (VitalSource)
ISBN 9781439896778
Cat# KE15458


Features

  • Presents an in-depth treatment of the statistical and data analysis aspects used in microarrays and bioinformatics
  • Provides the option of learning R in parallel with learning about data analysis
  • Covers background material for those with a limited mathematical, genetic, or molecular biology foundation
  • Includes R code on a CD-ROM

Summary

Richly illustrated in color, Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition provides a clear and rigorous description of powerful analysis techniques and algorithms for mining and interpreting biological information. Omitting tedious details, heavy formalisms, and cryptic notations, the text takes a hands-on, example-based approach that teaches students the basics of R and microarray technology as well as how to choose and apply the proper data analysis tool to specific problems.

New to the Second Edition
Completely updated and double the size of its predecessor, this timely second edition replaces the commercial software with the open-source R and Bioconductor environments. Fourteen new chapters cover such topics as the basic mechanisms of the cell, reliability and reproducibility issues in DNA microarrays, basic statistics and linear models in R, experiment design, multiple comparisons, quality control, data pre-processing and normalization, Gene Ontology analysis, pathway analysis, and machine learning techniques. Methods are illustrated with toy examples and real data, and the R code for all routines is available on an accompanying CD-ROM.

With all the necessary prerequisites included, this best-selling book guides students from very basic notions to advanced analysis techniques in R and Bioconductor. The first half of the text presents an overview of microarrays and the statistical elements that form the building blocks of any data analysis. The second half introduces the techniques most commonly used in the analysis of microarray data.

Table of Contents

Introduction
Bioinformatics — An Emerging Discipline

The Cell and Its Basic Mechanisms
The Cell
The Building Blocks of Genomic Information
Expression of Genetic Information
The Need for High-Throughput Methods

Microarrays
Microarrays — Tools for Gene Expression Analysis
Fabrication of Microarrays
Applications of Microarrays
Challenges in Using Microarrays in Gene Expression Studies
Sources of Variability

Reliability and Reproducibility Issues in DNA Microarray Measurements
Introduction
What Is Expected from Microarrays?
Basic Considerations of Microarray Measurements
Sensitivity
Accuracy
Reproducibility
Cross Platform Consistency
Sources of Inaccuracy and Inconsistencies in Microarray Measurements
The MicroArray Quality Control (MAQC) Project

Image Processing
Introduction
Basic Elements of Digital Imaging
Microarray Image Processing
Image Processing of cDNA Microarrays
Image Processing of Affymetrix Arrays

Introduction to R
Introduction to R
The Basic Concepts
Data Structures and Functions
Other Capabilities
The R Environment
Installing Bioconductor
Graphics
Control Structures in R
Programming in R vs C/C++/Java

Bioconductor: Principles and Illustrations
Overview
The Portal
Some Explorations and Analyses

Elements of Statistics
Introduction
Some Basic Concepts
Elementary Statistics
Degrees of Freedom
Probabilities
Bayes’ Theorem
Testing for (or Predicting) a Disease

Probability Distributions
Probability Distributions
Central Limit Theorem
Are Replicates Useful?

Basic Statistics in R
Introduction
Descriptive Statistics in R
Probabilities and Distributions in R
Central Limit Theorem

Statistical Hypothesis Testing
Introduction
The Framework
Hypothesis Testing and Significance
"I Do Not Believe God Does Not Exist"
An Algorithm for Hypothesis Testing
Errors in Hypothesis Testing

Classical Approaches to Data Analysis
Introduction
Tests Involving a Single Sample
Tests Involving Two Samples

Analysis of Variance (ANOVA)
Introduction
One-Way ANOVA
Two-Way ANOVA
Quality Control

Linear Models in R
Introduction and Model Formulation
Fitting Linear Models in R
Extracting Information from a Fitted Model: Testing Hypotheses and Making Predictions
Some Limitations of the Linear Models
Dealing with Multiple Predictors and Interactions in the Linear Models, and Interpreting Model Coefficients

Experiment Design
The Concept of Experiment Design
Comparing Varieties
Improving the Production Process
Principles of Experimental Design
Guidelines for Experimental Design
A Short Synthesis of Statistical Experiment Designs
Some Microarray Specific Experiment Designs

Multiple Comparisons
Introduction
The Problem of Multiple Comparisons
A More Precise Argument
Corrections for Multiple Comparisons
Corrections for Multiple Comparisons in R

Analysis and Visualization Tools
Introduction
Box Plots
Gene Pies
Scatter Plots
Volcano Plots
Histograms
Time Series
Time Series Plots in R
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)

Cluster Analysis
Introduction
Distance Metric
Clustering Algorithms
Partitioning around Medoids (PAM)
Biclustering
Clustering in R

Quality Control
Introduction
Quality Control for Affymetrix Data
Quality Control of Illumina Data

Data Pre-Processing and Normalization
Introduction
General Pre-Processing Techniques
Normalization Issues Specific to cDNA Data
Normalization Issues Specific to Affymetrix Data
Other Approaches to the Normalization of Affymetrix Data
Useful Pre-Processing and Normalization Sequences
Normalization Procedures in R
Batch Pre-Processing
Normalization Functions and Procedures for Illumina Data

Methods for Selecting Differentially Regulated Genes
Introduction
Criteria
Fold Change
Unusual Ratio
Hypothesis Testing, Corrections for Multiple Comparisons, and Resampling
ANOVA
Noise Sampling
Model-Based Maximum Likelihood Estimation Methods
Affymetrix Comparison Calls
Significance Analysis of Microarrays (SAM)
A Moderated t-Statistic
Other Methods
Reproducibility
Selecting Differentially Expressed (DE) Genes in R

The Gene Ontology (GO)
Introduction
The Need for an Ontology
What Is the Gene Ontology (GO)?
What Does GO Contain?
Access to GO
Other Related Resources

Functional Analysis and Biological Interpretation of Microarray Data
Over-Representation Analysis (ORA)
Onto-Express
Functional Class Scoring
The Gene Set Enrichment Analysis (GSEA)

Uses, Misuses, and Abuses in GO Profiling
Introduction
"Known Unknowns"
Which Way Is Up?
Negative Annotations
Common Mistakes in Functional Profiling
Using a Custom Level of Abstraction through the GO Hierarchy
Correlation between GO Terms
GO Slims and Subsets

A Comparison of Several Tools for Ontological Analysis
Introduction
Existing Tools for Ontological Analysis
Comparison of Existing Functional Profiling Tools
Drawbacks and Limitations of the Current Approach

Focused Microarrays — Comparison and Selection
Introduction
Criteria for Array Selection
Onto-Compare
Some Comparisons

ID Mapping Issues
Introduction
Name Space Issues in Annotation Databases
A Comparison of Some ID Mapping Tools

Pathway Analysis
Terms and Problem Definition
Over-Representation and Functional Class Scoring Approaches in Pathway Analysis
An Approach for the Analysis of Metabolic Pathways
An Impact Analysis of Signaling Pathways
Variations on the Impact Analysis Theme
Pathway Guide
Kinetic Models vs. Impact Analysis
Conclusions
Data Sets and Software Availability

Machine Learning Techniques
Introduction
Main Concepts and Definitions
Supervised Learning
Practicalities Using R

The Road Ahead
What Next?

References

A Summary appears at the end of each chapter.

Author Bio(s)

Sorin Draghici is the Robert J. Sokol MD Endowed Chair in Systems Biology in the Department of Obstetrics and Gynecology, professor in the Department of Clinical and Translational Science and Department of Computer Science, and head of the Intelligent Systems and Bioinformatics Laboratory at Wayne State University. He is also the chief of the Bioinformatics and Data Analysis Section in the Perinatology Research Branch of the National Institute of Child Health and Human Development. A senior member of IEEE, Dr. Draghici is an editor of IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal of Biomedicine and Biotechnology, and International Journal of Functional Informatics and Personalized Medicine. He earned a Ph.D. in computer science from the University of St. Andrews.

Editorial Reviews

Praise for the First Edition
The book by Draghici is an excellent choice to be used as a textbook for a graduate-level bioinformatics course. This well-written book with two accompanying CD-ROMs will create much-needed enthusiasm among statisticians.
Journal of Statistical Computation and Simulation, Vol. 74

I really like Draghici's book. As the author explains in the Preface, the book is intended to serve both the statistician who knows very little about DNA microarrays and the biologist who has no expertise in data analysis. The author lays out a study plan for the statistician that excludes 5 of the 17 chapters (4-8). These chapters present the basics of statistical distributions, estimation, hypothesis testing, ANOVA, and experimental design. What that leaves for the statistician is the three-chapter primer on microarrays and image processing, plus all of the data analysis tools specific to the microarray situation. … it includes two CDs with trial versions of several specialised software packages. Anyone who uses microarray data should certainly own a copy.
Technometrics, Vol. 47, No. 1, February 2005