Clustering for Data Mining: A Data Recovery Approach


Purchasing Options

Hardback
$109.95
ISBN 9781584885344
Cat# C5343
 

Features

  • Introduces classical clustering methods extended, via the data recovery approach, to modern data mining tasks
  • Describes the theory that leads to these methods and relevant interpretation aids, fills gaps in the established theory, and corrects common misconceptions
  • Treats the two most popular methods, K-Means and Ward clustering, offering the first theoretically motivated instructions for automating all steps of data mining with clustering
  • Offers an up-to-date description of current data mining issues, such as feature selection and cluster validation
  • Presents a wealth of computational examples covering all stages of clustering
Summary

    Often considered more an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost by trial and error. Even the most popular clustering methods--K-Means for partitioning the data set and Ward's method for hierarchical clustering--have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids.

    Rather than the traditional set of ad hoc techniques, Clustering for Data Mining: A Data Recovery Approach presents a theory that not only closes gaps in K-Means and Ward methods, but also extends them into areas of current interest, such as clustering mixed scale data and incomplete clustering. The author suggests original methods for both cluster finding and cluster description, addresses related topics such as principal component analysis, contingency measures, and data visualization, and includes nearly 60 computational examples covering all stages of clustering, from data pre-processing to cluster validation and results interpretation.

    The author's unique attention to data recovery methods, theory-based advice, pre- and post-processing issues that are beyond the scope of most texts, and clear, practical instructions for real-world data mining make this book ideally suited for virtually all purposes: for teaching, for self-study, and for professional reference.

    Table of Contents

    INTRODUCTION: HISTORICAL REMARKS

    WHAT IS CLUSTERING
    Exemplary Problems
    Bird's Eye View

    WHAT IS DATA
    Feature Characteristics
    Bivariate Analysis
    Feature Space and Data Scatter
    Preprocessing and Standardizing Mixed Data

    K-MEANS CLUSTERING
    Conventional K-Means
    Initialization of K-Means
    Intelligent K-Means
    Interpretation Aids
    Overall Assessment

    WARD HIERARCHICAL CLUSTERING
    Agglomeration: Ward Algorithm
    Divisive Clustering with Ward Criterion
    Conceptual Clustering
    Extensions of Ward Clustering
    Overall Assessment

    DATA RECOVERY MODELS
    Statistics Modeling as Data Recovery
    Data Recovery Model for K-Means
    Data Recovery Models for Ward Criterion
    Extensions to Other Data Types
    One-by-One Clustering
    Overall Assessment

    DIFFERENT CLUSTERING APPROACHES
    Extensions of K-Means Clustering
    Graph-Theoretic Approaches
    Conceptual Description of Clusters
    Overall Assessment

    GENERAL ISSUES
    Feature Selection and Extraction
    Data Pre-Processing and Standardization
    Similarity on Subsets and Partitions
    Dealing with Missing Data
    Validity and Reliability
    Overall Assessment

    CONCLUSION: Data Recovery Approach in Clustering

    BIBLIOGRAPHY

    Each chapter also contains a section of Base Words

    Editorial Reviews

    "The particular decomposition studied in this book is the decomposition of the total sum of squares matrix into between and within cluster components, and the book develops this decomposition, and its associated diagnostics, further than I have seen them developed for cluster analysis before. Overall, the book presents an unusual, perhaps even rather idiosyncratic approach to cluster analysis, from the perspective of someone who is clearly an enthusiast for the insights these tools can bring to understanding data."
    -D.J. Hand, Short Book Reviews of the ISI
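
The decomposition the reviewer mentions can be illustrated with a minimal sketch. It is not taken from the book: the function name, the sample points, and the cluster labels are all hypothetical, and it uses the scalar sums of squares (the traces of the matrices the review refers to) rather than the full matrices. For any partition, the total sum of squares about the grand mean should equal the within-cluster part plus the between-cluster part.

```python
def scatter_decomposition(points, labels):
    """Return (total, within, between) sums of squares for a partition.

    Hypothetical helper for illustration only; not the book's code.
    """
    n = len(points)
    dim = len(points[0])

    # Grand mean of all points, one coordinate at a time.
    grand = [sum(p[j] for p in points) / n for j in range(dim)]

    # Group points by their cluster label.
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    total = sum(sq_dist(p, grand) for p in points)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[j] for p in members) / size for j in range(dim)]
        # Spread of the members around their own centroid.
        within += sum(sq_dist(p, centroid) for p in members)
        # Spread of the centroid around the grand mean, weighted by size.
        between += size * sq_dist(centroid, grand)
    return total, within, between


# Made-up data: five 2-D points assigned to two clusters.
points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0)]
labels = [0, 0, 1, 1, 0]
total, within, between = scatter_decomposition(points, labels)
print(total, within, between)
```

The identity holds for any assignment of labels, so a partition with a smaller within-cluster part necessarily has a larger between-cluster part; this is the quantity that both K-Means and the Ward criterion work with.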
