1st Edition

Multivariate Kernel Smoothing and Its Applications

By José E. Chacón, Tarn Duong. Copyright 2018
    248 Pages
    by Chapman & Hall

    Kernel smoothing has evolved greatly since its inception and has become an essential methodology in the data science toolkit of the 21st century. Its widespread adoption is due to its fundamental role in multivariate exploratory data analysis, as well as the crucial role it plays in composite solutions to complex data challenges.

    Multivariate Kernel Smoothing and Its Applications offers a comprehensive overview of both aspects. It begins with a thorough exposition of the approaches to the two basic goals of estimating probability density functions and their derivatives. The focus then turns to the applications of these approaches to more complex data analysis goals, many with a geometric/topological flavour, such as level set estimation, clustering (unsupervised learning), principal curves, and feature significance. Other topics, which are not direct applications of density (derivative) estimation but share many commonalities with the previous settings, include classification (supervised learning), nearest neighbour estimation, and deconvolution for data observed with error.
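    As a rough illustration of the two basic goals named above, the following Python/NumPy sketch (not taken from the book) estimates a multivariate density and its gradient with an unconstrained bandwidth matrix H. It assumes a Gaussian kernel, so that the estimate is f_hat(x) = (1/n) * sum_i K_H(x - X_i) with K_H the N(0, H) density, and it assumes the simple normal-scale rule H = (4 / (n(d + 2)))^(2/(d + 4)) * S, with S the sample covariance, as the bandwidth choice. The function names are hypothetical.

```python
import numpy as np


def normal_scale_bandwidth(X):
    """Assumed normal-scale bandwidth matrix for a Gaussian kernel:
    H = (4 / (n (d + 2)))^(2 / (d + 4)) * S, S = sample covariance."""
    n, d = X.shape
    S = np.cov(X, rowvar=False)
    return (4.0 / (n * (d + 2))) ** (2.0 / (d + 4)) * S


def _kernel_weights(x, X, H):
    """Return (x - X_i) differences and Gaussian kernel values K_H(x - X_i)."""
    x = np.atleast_2d(x)                      # (m, d) evaluation points
    d = X.shape[1]
    Hinv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    diff = x[:, None, :] - X[None, :, :]      # (m, n, d)
    # Squared Mahalanobis distance (x - X_i)^T H^{-1} (x - X_i), shape (m, n)
    q = np.einsum("mnd,de,mne->mn", diff, Hinv, diff)
    return diff, Hinv, norm * np.exp(-0.5 * q)


def kde(x, X, H):
    """Density estimate f_hat(x) = (1/n) sum_i K_H(x - X_i)."""
    _, _, w = _kernel_weights(x, X, H)
    return w.mean(axis=1)                     # (m,)


def kde_gradient(x, X, H):
    """Gradient estimate: (1/n) sum_i -H^{-1} (x - X_i) K_H(x - X_i)."""
    diff, Hinv, w = _kernel_weights(x, X, H)
    n = X.shape[0]
    return -np.einsum("de,mne,mn->md", Hinv, diff, w) / n   # (m, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))         # simulated bivariate sample
    H = normal_scale_bandwidth(X)
    print(kde(np.zeros(2), X, H))
    print(kde_gradient(np.array([1.0, 0.5]), X, H))
```

    For a sample X of shape (n, d), kde(x, X, normal_scale_bandwidth(X)) evaluates the estimate at each row of x. The normal-scale rule is used here only as a quick reference choice; in practice one of the data-driven bandwidth matrix selectors would replace it.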



    For a data scientist, each chapter contains illustrative open data examples that are analysed by the most appropriate kernel smoothing method. The emphasis is always placed on an intuitive understanding of the data provided by the accompanying statistical visualisations. For readers who wish to investigate the underlying statistical reasoning in more detail, a graduated exposition of a unified theoretical framework is provided. Algorithms for efficient software implementation are also discussed.



    José E. Chacón is an associate professor at the Department of Mathematics of the Universidad de Extremadura in Spain.
    Tarn Duong is a Senior Data Scientist at a start-up that provides short-distance carpooling services in France.

    Both authors have made important contributions to kernel smoothing research over the last couple of decades.

    Introduction. Density estimation. Density derivative estimation. Statistical topics related to density derivative estimation. Kernel smoothing in other selected settings.

    "I am very impressed with this book. It addresses issues that are not discussed in any detail in any other book on density estimation. Furthermore, it is very well-written and contains a wealth of interesting examples. In fact, this is probably one of the best books I have seen on density estimation. Some topics in this book that are not covered in detail in any other book include: multivariate bandwidth matrices, details of the asymptotic MSE for general bandwidth matrices, derivative estimation, level sets, density clustering and significance testing for modal regions. This makes the book unique. The authors have written the book in such a way that it can be used by two different types of readers: data analysts who are not interested in the mathematical details, and students/researchers who do want the details. The 'how to read this monograph' is very useful."
    ~Larry Wasserman, Carnegie Mellon University

     "This book provides a comprehensive overview of the fundamental issues and the numerous extensions of multivariate kernel density estimation. There are three core aspects that are discussed. Firstly, the method of kernel density estimation is thoroughly described in the multivariate setting. Secondly, the problem of selecting a bandwidth matrix is discussed, with a comparison of numerous alternatives. Thirdly, the performance and asymptotic properties of the estimators and bandwidth selections are comprehensively reviewed: there is an abundance of information on the (asymptotic) mean (integrated) squared error of various combinations of estimators and bandwidths.

    Having examined the above fundamentals, the authors discuss numerous extensions of multivariate kernel density estimation. These include density derivative estimation, level set estimation, density-based clustering, density ridge estimation, feature significance, density difference estimation, and classification. For all of these methods, there is a strong focus on asy