Clustering: A Data Recovery Approach, Second Edition

Boris Mirkin

October 17, 2012 by Chapman and Hall/CRC
Reference - 374 Pages - 47 B/W Illustrations
ISBN 9781439838419 - CAT# K11742
Series: Chapman & Hall/CRC Computer Science & Data Analysis




  • Provides detailed coverage of selected clustering techniques, notably K-Means, divisive clustering, network clustering, spectral clustering, additive clustering, and consensus clustering
  • Emphasizes the application of the methods through detailed case studies
  • Offers MATLAB® code for the examples on the book’s website
  • Illustrates methods by computation on specially selected small real-world datasets
  • Gives extensive advice on the computational interpretation of clusters, including the use of hierarchical ontologies


Clustering is often considered more of an art than a science, and books on clustering have been dominated by learning through example, with techniques chosen almost by trial and error. Even the two most popular and most closely related clustering methods, K-Means for partitioning and Ward's method for hierarchical clustering, have lacked the theoretical underpinning needed to establish a firm relationship between them and to provide relevant interpretation aids. Other approaches, such as spectral clustering or consensus clustering, are considered entirely unrelated to each other or to the two methods mentioned above.
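One well-known link between the two methods can be checked directly: at each step Ward's method merges the pair of clusters whose union increases the K-Means criterion (the within-cluster sum of squared distances to centroids) the least, and that increase has the closed form n_a·n_b/(n_a + n_b)·‖c_a − c_b‖². The pure-Python sketch below is not the book's MATLAB code; it assumes squared Euclidean distance, and the function names and the tiny toy dataset are hypothetical, chosen only to illustrate the identity.

```python
# Hypothetical sketch: Ward's merge cost equals the increase in the
# K-Means criterion W (within-cluster sum of squared Euclidean distances).

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def within_ss(points):
    """Sum of squared distances from each point to its cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_distance(a, b):
    """Ward's merge cost: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2."""
    na, nb = len(a), len(b)
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq

# Toy clusters (illustrative only).
a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
b = [(4.0, 4.0), (5.0, 5.0)]

# Increase in W caused by merging a and b into one cluster.
increase = within_ss(a + b) - within_ss(a) - within_ss(b)
print(round(increase, 6), round(ward_distance(a, b), 6))
```

On this data both quantities equal 125/3 (about 41.666667), so minimizing Ward's merge cost at each step is the same as greedily minimizing the growth of the K-Means criterion.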

Clustering: A Data Recovery Approach, Second Edition presents a unified modeling approach for the most popular clustering methods: K-Means and hierarchical techniques, especially divisive clustering. It significantly expands coverage of the mathematics of data recovery and includes a new chapter on more recent, popular network clustering approaches (spectral, modularity and uniform, additive, and consensus clustering), treated within the same data recovery approach. Another new chapter covers cluster validation and interpretation, including recent developments in ontology-driven interpretation of clusters. Altogether, the new material added a hundred pages to the book, even though fragments unrelated to the main topics were removed.
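The data recovery view of K-Means rests on a decomposition of the data scatter: for data centred at the grand mean, the total sum of squares T splits into a part B "explained" by the cluster centroids and the unexplained residual W, which is exactly the K-Means criterion. Since T is fixed by the data, minimizing W is equivalent to maximizing B. The following is a minimal pure-Python sketch under stated assumptions (squared Euclidean distance, no feature standardization); the function name and the toy data are hypothetical and do not come from the book or its MATLAB code.

```python
# Hypothetical sketch: data scatter T = B (explained) + W (K-Means criterion).

def data_scatter_decomposition(points, labels):
    """Return (T, B, W) for a partition given by labels."""
    n = len(points)
    grand = tuple(sum(col) / n for col in zip(*points))
    # Centre the data so that the grand mean is the origin.
    y = [tuple(x - g for x, g in zip(p, grand)) for p in points]
    # Total data scatter: sum of squared entries of the centred data.
    T = sum(v * v for p in y for v in p)

    # Group the centred points by cluster label.
    clusters = {}
    for p, k in zip(y, labels):
        clusters.setdefault(k, []).append(p)

    B = 0.0  # between-cluster (explained) part: sum of N_k * ||c_k||^2
    W = 0.0  # within-cluster residual: the K-Means criterion
    for members in clusters.values():
        nk = len(members)
        c = tuple(sum(col) / nk for col in zip(*members))
        B += nk * sum(v * v for v in c)
        W += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in members)
    return T, B, W

# Toy data with a two-cluster partition (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1]
T, B, W = data_scatter_decomposition(points, labels)
print(round(T, 6), round(B, 6), round(W, 6))
```

Here T = 44, B = 125/3 and W = 7/3, so the two parts add up to the total scatter; the ratio B/T can be read as the share of the data scatter recovered by the partition.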

Illustrated using a set of small real-world datasets and more than a hundred examples, the book is oriented towards students, practitioners, and theoreticians of cluster analysis. Covering topics that are beyond the scope of most texts, the author’s explanations of data recovery methods, theory-based advice, pre- and post-processing issues and his clear, practical instructions for real-world data mining make this book ideally suited for teaching, self-study, and professional reference.