1st Edition

Big Data in Omics and Imaging, Two Volume Set

Edited By Momiao Xiong
    1404 Pages
    by Chapman & Hall

    FEATURES

    Bridges the gap between the traditional statistical methods and computational tools for small genetic and epigenetic data analysis and the modern advanced statistical methods for big data

    Provides tools for high dimensional data reduction

    Discusses search algorithms for model and variable selection, including randomization algorithms, proximal methods, and matrix subset selection

    Provides real-world examples and case studies

    Will have an accompanying website with R code

    Provides a natural extension and companion volume to Big Data in Omics and Imaging: Association Analysis, but can be read independently.

    Introduces causal inference theory to genomic, epigenomic and imaging data analysis

    Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies

    Bridges the gap between traditional association analysis and modern causation analysis

    Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks

    Presents statistical methods and computational algorithms for searching causal paths from genetic variant to disease

    Develops causal machine learning methods that integrate causal inference and machine learning

    Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks

    The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift in genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high-dimensional data, from multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis. Topics covered include advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principles, and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, and disease risk and drug response prediction.

    Mathematical Foundation

    Sparsity-Inducing Norms, Dual Norms and Fenchel Conjugate

    Subdifferential

    Definition of Subgradient

    Subgradients of Differentiable Functions

    Calculus of Subgradients

    Proximal Methods

    Introduction

    Basics of Proximal Methods

    Properties of the Proximal Operator

    Proximal Algorithms

    Computing the Proximal Operator

    Matrix Calculus

    Derivative of a Function with Respect to a Vector

    Derivative of a Function with Respect to a Matrix

    Derivative of a Matrix with Respect to a Scalar

    Derivative of a Matrix with Respect to a Matrix or a Vector

    Derivative of a Vector Function of a Vector

    Chain Rules

    Widely Used Formulae

    Functional Principal Component Analysis (FPCA)

    Principal Component Analysis (PCA)

    Basic Mathematical Tools for Functional Principal Component Analysis

    Unsmoothed Functional Principal Component Analysis

    Smoothed Principal Component Analysis

    Computations for the Principal Component Function and the Principal Component Score

    Canonical Correlation Analysis

    Linkage Disequilibrium

    Concepts of Linkage Disequilibrium

    Measures of Two-locus Linkage Disequilibrium

    Linkage Disequilibrium Coefficient D

    Normalized Measure of Linkage Disequilibrium

    Correlation Coefficient r

    Composite Measure of Linkage Disequilibrium

    The Relationship Between the Measure of LD and Physical Distance

    Haplotype Reconstruction

    Clark’s Algorithm

    EM Algorithm

    Bayesian and Coalescence-based Methods

    Multi-locus Measures of Linkage Disequilibrium

    Mutual Information Measure of LD

    Multi-Information and Multi-locus Measure of LD

    Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or Between Two Haplotype Blocks

    Interaction Information

    Conditional Interaction Information

    Normalized Multi-Information

    Distribution of Estimated Mutual Information, Multi-information and Interaction Information

    Canonical Correlation Analysis Measure for LD between Two Genomic Regions

    Association Measure between Two Genomic Regions Based on CCA

    Relationship between Canonical Correlation and Joint Information

    Software Package

    Association Studies for Qualitative Traits

    Population-based Association Analysis for Common Variants

    Introduction

    The Hardy-Weinberg Equilibrium

    Genetic Models

    Odds Ratio

    Single Marker Association Analysis

    Multi-marker Association Analysis

    Population-based Multivariate Association Analysis for Next-generation Sequencing

    Multivariate Group Tests

    Score Tests and Logistic Regression

    Application of Score Tests for Association of Rare Variants

    Variance-component Score Statistics and Logistic Mixed Effects Models

    Population-based Functional Association Analysis for Next-generation Sequencing

    Introduction

    Functional Principal Component Analysis for Association Test

    Smoothed Functional Principal Component Analysis for Association Test

    Software Package

    Association Studies for Quantitative Traits

    Fixed Effect Model for a Single Trait

    Introduction

    Genetic Effects

    Linear Regression for a Quantitative Trait

    Multiple Linear Regression for a Quantitative Trait

    Gene-based Quantitative Trait Analysis

    Functional Linear Model for a Quantitative Trait

    Canonical Correlation Analysis for Gene-based Quantitative Trait Analysis

    Kernel Approach to Gene-based Quantitative Trait Analysis

    Kernel and RKHS

    Covariance Operator and Dependence Measure

    Simulations and Real Data Analysis

    Power Evaluation

    Application to Real Data Examples

    Software Package

    Multiple Phenotype Association Studies

    Pleiotropic Additive and Dominance Effects

    Multivariate Marginal Regression

    Models

    Estimation of Genetic Effects

    Test Statistics

    Linear Models for Multiple Phenotypes and Multiple Markers

    Multivariate Multiple Linear Regression Models

    Multivariate Functional Linear Models for Gene-based Genetic Analysis of Multiple Phenotypes

    Canonical Correlation Analysis for Gene-based Genetic Pleiotropic Analysis

    Multivariate Canonical Correlation Analysis (CCA)

    Kernel CCA

    Functional CCA

    Quadratically Regularized Functional CCA

    Dependence Measure and Association Tests of Multiple Traits

    Principal Component for Phenotype Dimension Reduction

    Principal Component Analysis

    Kernel Principal Component Analysis

    Quadratically Regularized PCA or Kernel PCA

    Other Statistics for Pleiotropic Genetics Analysis

    Sum of Squared Score Test

    Unified Score-based Association Test (USAT)

    Combining Marginal Tests

    FPCA-based Kernel Measure Test of Independence

    Connection between Statistics

    Simulations and Real Data Analysis

    Type I Error Rate and Power Evaluation

    Application to Real Data Example

    Software Package

    Family-based Association Analysis

    Genetic Similarity and Kinship Coefficients

    Kinship Coefficients

    Identity Coefficients

    Relation between Identity Coefficients and Kinship Coefficient

    Estimation of Genetic Relations from the Data

    Genetic Covariance between Relatives

    Assumptions and Genetic Models

    Analysis for Genetic Covariance between Relatives

    Mixed Linear Model for a Single Trait

    Genetic Random Effect

    Mixed Linear Model for Quantitative Trait Association Analysis

    Estimating Variance Components

    Hypothesis Test in Mixed Linear Models

    Mixed Linear Models for Quantitative Trait Analysis with Sequencing Data

    Mixed Functional Linear Models for Sequence-based Quantitative Trait Analysis

    Mixed Functional Linear Models (Type 1)

    Mixed Functional Linear Models (Type 2: Functional Variance Component Models)

    Multivariate Mixed Linear Model for Multiple Traits

    Multivariate Mixed Linear Model

    Maximum Likelihood Estimate of Variance Components

    REML Estimate of Variance Components

    Heritability

    Heritability Estimation for a Single Trait

    Heritability Estimation for Multiple Traits

    Family-based Association Analysis for Qualitative Trait

    The Generalized T Test with Families and Additional Population Structures

    Collapsing Method

    CMC with Families

    The Functional Principal Component Analysis and Smooth Functional Principal Component Analysis with Families

    Software Package

    Interaction Analysis

    Measures of Gene-gene and Gene-environment Interaction for Qualitative Trait

    Binary Measure of Gene-gene and Gene-environment Interaction

    Disequilibrium Measure of Gene-gene and Gene-environment Interaction

    Information Measure of Gene-gene and Gene-environment Interaction

    Measure of Interaction between Gene and Continuous Environment

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Common Variants

    Relative Risk and Odds-ratio-based Statistics for Testing Interaction between Gene and Discrete Environment

    Disequilibrium-based Statistics for Testing Gene-gene Interaction

    Information-based Statistics for Testing Gene-Gene Interaction

    Haplotype-Odds Ratio and Tests for Gene-Gene Interaction

    Multiplicative Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

    Information Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

    Real Example

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Next-generation Sequencing Data

    Multiple Logistic Regression Model for Gene-Gene Interaction Analysis

    Functional Logistic Regression Model for Gene-Gene Interaction Analysis

    Statistics for Testing Interaction between Two Genomic Regions

    Statistics for Testing Gene-gene and Gene-Environment Interaction for Quantitative Traits

    Genetic Models for Epistasis Effects of Quantitative Traits

    Regression Model for Interaction Analysis with Quantitative Traits

    Functional Regression Model for Interaction Analysis with a Quantitative Trait

    Functional Regression Model for Interaction Analysis with Multiple Quantitative Traits

    Multivariate and Functional Canonical Correlation as a Unified Framework for Testing Gene-Gene and Gene-Environment Interaction for both Qualitative and Quantitative Traits

    Data Structure of CCA for Interaction Analysis

    CCA and Functional CCA

    Kernel CCA

    Software Package

    Machine Learning, Low Rank Models and Their Application to Disease Risk Prediction and Precision Medicine

    Logistic Regression

    Two Class Logistic Regression

    Multiclass Logistic Regression

    Parameter Estimation

    Test Statistics

    Network Penalized Two-class Logistic Regression

    Network Penalized Multiclass Logistic Regression

    Fisher’s Linear Discriminant Analysis

    Fisher’s Linear Discriminant Analysis for Two Classes

    Multi-class Fisher’s Linear Discriminant Analysis

    Connections between Linear Discriminant Analysis, Optimal Scoring and Canonical Correlation Analysis (CCA)

    Support Vector Machine

    Introduction

    Linear Support Vector Machines

    Nonlinear SVM

    Penalized SVMs

    Low Rank Approximation

    Quadratically Regularized PCA

    Generalized Regularization

    Generalized Canonical Correlation Analysis (CCA)

    Quadratically Regularized Canonical Correlation Analysis

    Sparse Canonical Correlation Analysis

    Sparse Canonical Correlation Analysis via a Penalized Matrix Decomposition

    Inverse Regression (IR) and Sufficient Dimension Reduction

    Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)

    Sparse SDR

    Software Package

    Genotype-Phenotype Network Analysis

    Undirected Graphs for Genotype Network

    Gaussian Graphical Model

    Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model

    Coordinate Descent Algorithm and Graphical Lasso

    Multiple Graphical Models

    Directed Graphs and Structural Equation Models for Networks

    Directed Acyclic Graphs

    Linear Structural Equation Models

    Estimation Methods

    Sparse Linear Structural Equations

    Penalized Maximum Likelihood Estimation

    Penalized Two Stage Least Square Estimation

    Penalized Three Stage Least Square Estimation

    Functional Structural Equation Models for Genotype-Phenotype Networks

    Functional Structural Equation Models

    Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models

    Causal Calculus

    Effect Decomposition and Estimation

    Graphical Tools for Causal Inference in Linear SEMs

    Identification and Single-door Criterion

    Instrumental Variables

    Total Effects and Backdoor Criterion

    Counterfactuals and Linear SEMs

    Simulations and Real Data Analysis

    Simulations for Model Evaluation

    Application to Real Data Examples 

    Causal Analysis and Network Biology

    Bayesian Networks as a General Framework for Causal Inference

    Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks

    Structural Equations and Score Metrics for Continuous Causal Networks

    Multivariate SEMs for Generating Node Core Metrics

    Mixed SEMs for Pedigree-based Causal Inference

    Bayesian Networks with Discrete and Continuous Variables

    Two-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

    Multiple Network Penalized Functional Logistic Regression Models for NGS Data

    Multi-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

    Other Statistical Models for Quantifying Node Score Function

    Integer Programming for Causal Structure Learning

    Introduction

    Integer Linear Programming Formulation of DAG Learning

    Cutting Plane for Integer Linear Programming

    Branch and Cut Algorithm for Integer Linear Programming

    Sink Finding Primal Heuristic Algorithm

    Simulations and Real Data Analysis

    Simulations

    Real Data Analysis

    Smoothing Spline Regression for a Single Variable

    Smoothing Spline Regression for Multiple Variables

    Wearable Computing and Genetic Analysis of Function-valued Traits

    Classification of Wearable Biosensor Data

    Introduction

    Functional Data Analysis for Classification of Time Course Wearable Biosensor Data

    Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data

    Deep Learning for Physiological Time Series Data Analysis

    Association Studies of Function-Valued Traits

    Introduction

    Functional Linear Models with both Functional Response and Predictors for Association Analysis of Function-valued Traits

    Test Statistics

    Null Distribution of Test Statistics

    Power

    Real Data Analysis

    Association Analysis of Multiple Function-valued Traits

    Gene-gene Interaction Analysis of Function-Valued Traits

    Introduction

    Functional Regression Models

    Estimation of Interaction Effect Function

    Test Statistics

    Simulations

    Real Data Analysis

    Networks

    Multilayer Feedforward Pass

    Backpropagation Pass

    Convolutional Layer

    RNA-seq Data Analysis

    Normalization Methods on RNA-seq Data Analysis

    Gene Expression

    RNA Sequencing Expression Profiling

    Methods for Normalization

    Differential Expression Analysis for RNA-Seq Data

    Distribution-based Approach to Differential Expression Analysis

    Functional Expansion Approach to Differential Expression Analysis of RNA-Seq Data

    Differential Analysis of Allele Specific Expressions with RNA-Seq Data

    eQTL and eQTL Epistasis Analysis with RNA-Seq Data

    Matrix Factorization

    Quadratically Regularized Matrix Factorization and Canonical Correlation Analysis

    QRFCCA for eQTL and eQTL Epistasis Analysis of RNA-Seq Data

    Real Data Analysis

    Gene Co-expression Network and Gene Regulatory Networks

    Co-expression Network Construction with RNA-Seq Data by CCA and FCCA

    Graphical Gaussian Models

    Real Data Applications

    Directed Graph and Gene Regulatory Networks

    Hierarchical Bayesian Networks for Whole Genome Regulatory Networks

    Linear Regulatory Networks

    Nonlinear Regulatory Networks

    Dynamic Bayesian Network and Longitudinal Expression Data Analysis

    Single Cell RNA-Seq Data Analysis, Gene Expression Deconvolution and Genetic Screening

    Cell Type Identification

    Gene Expression Deconvolution and Cell Type-Specific Expression

    Normalization

    Variational Methods for the Expectation-Maximization (EM) Algorithm

    Variational Methods for Bayesian Learning

    Methylation Data Analysis

    DNA Methylation Analysis

    Epigenome-wide Association Studies (EWAS)

    Single-Locus Test

    Set-based Methods

    Epigenome-wide Causal Studies

    Introduction

    Additive Functional Model for EWCS

    Genome-wide DNA Methylation Quantitative Trait Locus (mQTL) Analysis

    Causal Networks for Genetic-Methylation Analysis

    Structural Equation Models with Scalar Endogenous Variables and Functional Exogenous Variables

    Functional Structural Equation Models with Functional Endogenous Variables and Scalar Exogenous Variables (FSEMS)

    Functional Structural Equation Models with both Functional Endogenous Variables and Exogenous Variables (FSEMF)

    Imaging and Genomics

    Introduction

    Image Segmentation

    Unsupervised Learning Methods for Image Segmentation

    Supervised Deep Learning Methods for Image Segmentation

    Two- or Three-Dimensional Functional Principal Component Analysis for Image Data Reduction

    Formulation

    Integral Equation and Eigenfunctions

    Association Analysis of Imaging-Genomic Data

    Multivariate Functional Regression Models for Imaging-Genomic Data Analysis

    Multivariate Functional Regression Models for Longitudinal Imaging-Genetics Analysis

    Quadratically Regularized Functional Canonical Correlation Analysis for Gene-Gene Interaction Detection in Imaging-Genetic Studies

    Causal Analysis of Imaging-Genomic Data

    Sparse SEMs for Joint Causal Analysis of Structural Imaging and Genomic Data

    Sparse Functional Structural Equation Models for Phenotype and Genotype Networks

    Conditional Gaussian Graphical Models (CGGMs) for Structural Imaging and Genomic Data Analysis

    Time Series SEMs for Integrated Causal Analysis of fMRI and Genomic Data

    Models

    Reduced Form Equations

    Single Equation and Generalized Least Square Estimator

    Sparse SEMs and Alternating Direction Method of Multipliers

    Causal Machine Learning

    From Association Analysis to Integrated Causal Inference

    Genome-wide Causal Studies

    Mathematical Formulation of Causal Analysis

    Basic Causal Assumptions

    Linear Additive SEMs with non-Gaussian Noise

    Information Geometry Approach

    Causal Inference on Discrete Data

    Multivariate Causal Inference and Causal Networks

    Markov Condition, Markov Equivalence, Faithfulness and Minimality

    Multilevel Causal Networks for Integrative Omics and Imaging Data Analysis

    Causal Inference with Confounders

    Causal Sufficiency

    Instrumental Variables

    Biography

    Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston, where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.