1st Edition

Big Data in Omics and Imaging Association Analysis

By Momiao Xiong Copyright 2018
    700 Pages 60 Color & 3 B/W Illustrations
    by Chapman & Hall


    Big Data in Omics and Imaging: Association Analysis addresses recent developments in association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in that it presents both hypothesis testing and data mining approaches to holistically dissecting the genetic structure of complex traits and to designing efficient strategies for precision medicine. The general frameworks for association analysis and machine learning developed in the text can be applied to genomic, epigenomic and imaging data.



    FEATURES



    Bridges the gap between traditional statistical methods and computational tools for small-scale genetic and epigenetic data analysis and modern advanced statistical methods for big data



    Provides tools for high-dimensional data reduction



    Discusses search algorithms for model and variable selection, including randomization algorithms, proximal methods and matrix subset selection



    Provides real-world examples and case studies



    Will have an accompanying website with R code




    The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift in genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high-dimensional analysis, from multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis. Topics covered include advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principles, and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, and disease risk and drug response prediction.






    Mathematical Foundation

    Sparsity-Inducing Norms, Dual Norms and Fenchel Conjugate

    Subdifferential

    Definition of Subgradient

    Subgradients of differentiable functions

    Calculus of subgradients

    Proximal Methods

    Introduction

    Basics of Proximal Methods

    Properties of the Proximal Operator

    Proximal Algorithms

    Computing the Proximal Operator

    Matrix Calculus

    Derivative of a Function with Respect to a Vector

    Derivative of a Function with Respect to a Matrix

    Derivative of a Matrix with Respect to a Scalar

    Derivative of a Matrix with Respect to a Matrix or a Vector

    Derivative of a Vector Function of a Vector

    Chain Rules

    Widely Used Formulae

    Functional Principal Component Analysis (FPCA)

    Principal Component Analysis (PCA)

    Basic Mathematical Tools for Functional Principal Component Analysis

    Unsmoothed Functional Principal Component Analysis

    Smoothed Principal Component Analysis

    Computations for the Principal Component Function and the Principal Component Score

    Canonical Correlation Analysis

    Exercises

    Appendix

    Linkage Disequilibrium

    Concepts of Linkage Disequilibrium

    Measures of Two-locus Linkage Disequilibrium

    Linkage Disequilibrium Coefficient D

    Normalized Measure of Linkage Disequilibrium

    Correlation Coefficient r

    Composite Measure of Linkage Disequilibrium

    The Relationship Between the Measure of LD and Physical Distance

    Haplotype Reconstruction

    Clark’s Algorithm

    EM Algorithm

    Bayesian and Coalescence-based Methods

    Multi-locus Measures of Linkage Disequilibrium

    Mutual Information Measure of LD

    Multi-Information and Multi-locus Measure of LD

    Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or Between Two Haplotype Blocks

    Interaction Information

    Conditional Interaction Information

    Normalized Multi-Information

    Distribution of Estimated Mutual Information, Multi-information and Interaction Information

    Canonical Correlation Analysis Measure for LD between Two Genomic Regions

    Association Measure between Two Genomic Regions Based on CCA

    Relationship between Canonical Correlation and Joint Information

    Software Package

    Bibliographical Notes

    Appendices

    Exercises

    Association Studies for Qualitative Traits

    Population-based Association Analysis for Common Variants

    Introduction

    The Hardy-Weinberg Equilibrium

    Genetic Models

    Odds Ratio

    Single Marker Association Analysis

    Multi-marker Association Analysis

    Population-based Multivariate Association Analysis for Next-generation Sequencing

    Multivariate Group Tests

    Score Tests and Logistic Regression

    Application of Score Tests for Association of Rare Variants

    Variance-component Score Statistics and Logistic Mixed Effects Models

    Population-based Functional Association Analysis for Next-generation Sequencing

    Introduction

    Functional Principal Component Analysis for Association Test

    Smoothed Functional Principal Component Analysis for Association Test

    Software Package

    Appendices

    Exercises


    Association Studies for Quantitative Traits

    Fixed Effect Model for a Single Trait

    Introduction

    Genetic Effects

    Linear Regression for a Quantitative Trait

    Multiple Linear Regression for a Quantitative Trait

    Gene-based Quantitative Trait Analysis

    Functional Linear Model for a Quantitative Trait

    Canonical Correlation Analysis for Gene-based Quantitative Trait Analysis

    Kernel Approach to Gene-based Quantitative Trait Analysis

    Kernel and RKHS

    Covariance Operator and Dependence Measure

    Simulations and Real Data Analysis

    Power Evaluation

    Application to Real Data Examples

    Software Package

    Appendices

    Exercises


    Multiple Phenotype Association Studies

    Pleiotropic Additive and Dominance Effects

    Multivariate Marginal Regression

    Models

    Estimation of Genetic Effects

    Test Statistics

    Linear Models for Multiple Phenotypes and Multiple Markers

    Multivariate Multiple Linear Regression Models

    Multivariate Functional Linear Models for Gene-based Genetic Analysis of Multiple Phenotypes

    Canonical Correlation Analysis for Gene-based Genetic Pleiotropic Analysis

    Multivariate Canonical Correlation Analysis (CCA)

    Kernel CCA

    Functional CCA

    Quadratically Regularized Functional CCA

    Dependence Measure and Association Tests of Multiple Traits

    Principal Component for Phenotype Dimension Reduction

    Principal Component Analysis

    Kernel Principal Component Analysis

    Quadratically Regularized PCA or Kernel PCA

    Other Statistics for Pleiotropic Genetics Analysis

    Sum of Squared Score Test

    Unified Score-based Association Test (USAT)

    Combining Marginal Tests

    FPCA-based Kernel Measure Test of Independence

    Connection between Statistics

    Simulations and Real Data Analysis

    Type I Error Rate and Power Evaluation

    Application to Real Data Example

    Software Package

    Appendices

    Exercises


    Family-based Association Analysis

    Genetic Similarity and Kinship Coefficients

    Kinship Coefficients

    Identity Coefficients

    Relation between Identity Coefficients and Kinship Coefficients

    Estimation of Genetic Relations from the Data

    Genetic Covariance between Relatives

    Assumptions and Genetic Models

    Analysis for Genetic Covariance between Relatives

    Mixed Linear Model for a Single Trait

    Genetic Random Effect

    Mixed Linear Model for Quantitative Trait Association Analysis

    Estimating Variance Components

    Hypothesis Test in Mixed Linear Models

    Mixed Linear Models for Quantitative Trait Analysis with Sequencing Data

    Mixed Functional Linear Models for Sequence-based Quantitative Trait Analysis

    Mixed Functional Linear Models (Type 1)

    Mixed Functional Linear Models (Type 2: Functional Variance Component Models)

    Multivariate Mixed Linear Model for Multiple Traits

    Multivariate Mixed Linear Model

    Maximum Likelihood Estimate of Variance Components

    REML Estimate of Variance Components

    Heritability

    Heritability Estimation for a Single Trait

    Heritability Estimation for Multiple Traits

    Family-based Association Analysis for Qualitative Trait

    The Generalized T² Test with Families and Additional Population Structures

    Collapsing Method

    CMC with Families

    The Functional Principal Component Analysis and Smooth Functional Principal Component Analysis with Families

    Software Package

    Exercises

    Interaction Analysis

    Measures of Gene-gene and Gene-environment Interaction for Qualitative Trait

    Binary Measure of Gene-gene and Gene-environment Interaction

    Disequilibrium Measure of Gene-gene and Gene-environment Interaction

    Information Measure of Gene-gene and Gene-environment Interaction

    Measure of Interaction between Gene and Continuous Environment

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Common Variants

    Relative Risk and Odds-ratio-based Statistics for Testing Interaction between Gene and Discrete Environment

    Disequilibrium-based Statistics for Testing Gene-gene Interaction

    Information-based Statistics for Testing Gene-Gene Interaction

    Haplotype-Odds Ratio and Tests for Gene-Gene Interaction

    Multiplicative Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

    Information Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

    Real Example

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Next-generation Sequencing Data

    Multiple Logistic Regression Model for Gene-Gene Interaction Analysis

    Functional Logistic Regression Model for Gene-Gene Interaction Analysis

    Statistics for Testing Interaction between Two Genomic Regions

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Quantitative Traits

    Genetic Models for Epistasis Effects of Quantitative Traits

    Regression Model for Interaction Analysis with Quantitative Traits

    Functional Regression Model for Interaction Analysis with a Quantitative Trait

    Functional Regression Model for Interaction Analysis with Multiple Quantitative Traits

    Multivariate and Functional Canonical Correlation as a Unified Framework for Testing Gene-Gene and Gene-Environment Interaction for both Qualitative and Quantitative Traits

    Data Structure of CCA for Interaction Analysis

    CCA and Functional CCA

    Kernel CCA

    Software Package

    Appendices

    Exercises


    Machine Learning, Low Rank Models and Their Application to Disease Risk Prediction and Precision Medicine

    Logistic Regression

    Two-class Logistic Regression

    Multiclass Logistic Regression

    Parameter Estimation

    Test Statistics

    Network Penalized Two-class Logistic Regression

    Network Penalized Multiclass Logistic Regression

    Fisher’s Linear Discriminant Analysis

    Fisher’s Linear Discriminant Analysis for Two Classes

    Multi-class Fisher’s Linear Discriminant Analysis

    Connections between Linear Discriminant Analysis, Optimal Scoring and Canonical Correlation Analysis (CCA)

    Support Vector Machine

    Introduction

    Linear Support Vector Machines

    Nonlinear SVM

    Penalized SVMs

    Low Rank Approximation

    Quadratically Regularized PCA

    Generalized Regularization

    Generalized Canonical Correlation Analysis (CCA)

    Quadratically Regularized Canonical Correlation Analysis

    Sparse Canonical Correlation Analysis

    Sparse Canonical Correlation Analysis via a Penalized Matrix Decomposition

    Inverse Regression (IR) and Sufficient Dimension Reduction

    Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)

    Sparse SDR

    Software Package

    Appendices

    Exercises



    Biography

    Momiao Xiong is a professor in the Department of Biostatistics, University of Texas School of Public Health, and a regular member of the Genetics & Epigenetics (G&E) Graduate Program at The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences.

    "This is a fantastic book intensively focusing on the mathematical underpinnings of modern genome-wide association studies (GWAS). It serves well for senior graduate students in applied mathematics, computer science, and statistics who are interested in building a solid mathematical understanding of GWAS. Backgrounds of advanced mathematics and genetics are expected. It can also be used as a handbook for professionals to quickly check mathematical contexts of GWAS approaches and tools. This book is especially helpful for the latest generation of statistical geneticists who are pursuing academic career paths."
    ~Journal of the American Statistical Association, Jing Su (Wake Forest School of Medicine)