480 Pages 41 Color & 56 B/W Illustrations
    by Chapman & Hall

    Handbook of Big Data provides a state-of-the-art overview of the analysis of large-scale datasets. Featuring contributions from well-known experts in statistics and computer science, this handbook presents a carefully curated collection of techniques from both industry and academia. Thus, the text instills a working understanding of key statistical and computing ideas that can be readily applied in research and practice.

    Offering balanced coverage of methodology, theory, and applications, this handbook:

    • Describes modern, scalable approaches for analyzing increasingly large datasets
    • Defines the underlying concepts of the available analytical tools and techniques
    • Details intercommunity advances in computational statistics and machine learning

    Handbook of Big Data also identifies areas in need of further development, encouraging greater communication and collaboration between researchers in big data sub-specialties such as genomics, computational biology, and finance.

    GENERAL PERSPECTIVES ON BIG DATA

    The Advent of Data Science: Some Considerations on the Unreasonable Effectiveness of Data
    Richard Starmans

    Big n versus Big p in Big Data
    Norman Matloff

    DATA-CENTRIC, EXPLORATORY METHODS

    Divide and Recombine: Approach for Detailed Analysis and Visualization of Large Complex Data
    Ryan Hafen

    Integrate Big Data for Better Operation, Control, and Protection of Power Systems
    Guang Lin

    Interactive Visual Analysis of Big Data
    Carlos Scheidegger

    A Visualization Tool for Mining Large Correlation Tables: The Association Navigator
    Andreas Buja, Abba M. Krieger, and Edward I. George

    EFFICIENT ALGORITHMS

    High-Dimensional Computational Geometry
    Alexandr Andoni

    IRLBA: Fast Partial SVD Method
    James Baglama

    Structural Properties Underlying High-Quality Randomized Numerical Linear Algebra Algorithms
    Michael W. Mahoney and Petros Drineas

    Something for (Almost) Nothing: New Advances in Sublinear-Time Algorithms
    Ronitt Rubinfeld and Eric Blais

    GRAPH APPROACHES

    Networks
    Elizabeth L. Ogburn and Alexander Volfovsky

    Mining Large Graphs
    David F. Gleich and Michael W. Mahoney

    MODEL FITTING AND REGULARIZATION

    Estimator and Model Selection Using Cross-Validation
    Iván Díaz

    Stochastic Gradient Methods for Principled Estimation with Large Datasets
    Panos Toulis and Edoardo M. Airoldi

    Learning Structured Distributions
    Ilias Diakonikolas

    Penalized Estimation in Complex Models
    Jacob Bien and Daniela Witten

    High-Dimensional Regression and Inference
    Lukas Meier

    ENSEMBLE METHODS

    Divide and Recombine: Subsemble, Exploiting the Power of Cross-Validation
    Stephanie Sapp and Erin LeDell

    Scalable Super Learning
    Erin LeDell

    CAUSAL INFERENCE

    Tutorial for Causal Inference
    Laura Balzer, Maya Petersen, and Mark van der Laan

    A Review of Some Recent Advances in Causal Inference
    Marloes H. Maathuis and Preetam Nandy

    TARGETED LEARNING

    Targeted Learning for Variable Importance
    Sherri Rose

    Online Estimation of the Average Treatment Effect
    Sam Lendle

    Mining with Inference: Data-Adaptive Target Parameters
    Alan Hubbard and Mark van der Laan

    Biography

    Peter Bühlmann is a professor of statistics at ETH Zürich, Switzerland, fellow of the Institute of Mathematical Statistics, elected member of the International Statistical Institute, and co-author of the book titled Statistics for High-Dimensional Data: Methods, Theory and Applications. He was named a 2014 Thomson Reuters Highly Cited Researcher in mathematics, has served on various editorial boards and as editor of the Annals of Statistics, and has delivered numerous presentations, including a Medallion Lecture at the 2009 Joint Statistical Meetings, a read paper to the Royal Statistical Society in 2010, the 14th Bahadur Memorial Lectures at the University of Chicago, Illinois, USA, and other named lectures.

    Petros Drineas is an associate professor in the Computer Science Department at Rensselaer Polytechnic Institute, Troy, New York, USA. He is the recipient of an Outstanding Early Research Award from Rensselaer Polytechnic Institute, an NSF CAREER award, and two fellowships from the European Molecular Biology Organization. He has served as a visiting professor at the US Sandia National Laboratories; visiting fellow at the Institute for Pure and Applied Mathematics, University of California, Los Angeles; long-term visitor at the Simons Institute for the Theory of Computing, University of California, Berkeley; program director in two divisions at the US National Science Foundation; and worked for industrial labs. He is a co-organizer of the series of workshops on Algorithms for Modern Massive Datasets and his research has been featured in numerous popular press articles.

    Michael Kane is a member of the research faculty at Yale University, New Haven, Connecticut, USA. He is a winner of the American Statistical Association’s Chambers Statistical Software Award for The Bigmemory Project, a set of software libraries that allow the R programming environment to accommodate large datasets for statistical analysis. He is a grantee on the Defense Advanced Research Projects Agency’s XDATA project, part of the White House’s Big Data Initiative, and on the Gates Foundation’s Round 11 Grand Challenges Exploration. He has collaborated with companies including AT&T Labs Research, Paradigm4, Sybase (an SAP company), and Oracle.

    Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace professor of biostatistics and statistics at the University of California, Berkeley, USA. He is the inventor of targeted maximum likelihood estimation, a general semiparametric efficient estimation method that incorporates the state of the art in machine learning through the ensemble method super learning. He is the recipient of the 2005 COPPS Presidents’ and Snedecor Awards, the 2005 van Dantzig Award, and the 2004 Spiegelman Award. He is also the founding editor of the International Journal of Biostatistics and the Journal of Causal Inference, and the co-author of more than 250 publications and various books.

    "The book contains a nice mix of philosophical musings, survey articles and cutting-edge research. It was designed as ‘a useful resource for seasoned practitioners and enthusiastic neophytes alike’ . . . Enthusiastic neophytes are still left with plenty to get their teeth into. In summary, I am happy to recommend the book to those seeking to broaden their understanding of the underpinning methodologies for analysing Big Data." ~ Richard J. Samworth, University of Cambridge, UK

    ". . . Handbook of Big Data is the first compilation on this emerging subject in our field and is therefore highly recommended to all statisticians and computer scientists."
    ~ The International Biometric Society

    "The book strikes a great balance between the breadth and depth of recent research-active topics. It is an excellent reference book to keep for both academic researchers and industrial practitioners. It is also a good reference book for whoever teaches in the area of big data analysis."
    ~ Journal of the American Statistical Association