1st Edition

Spectral Feature Selection for Data Mining

By Zheng Alan Zhao, Huan Liu
Copyright 2012
    224 Pages 53 B/W Illustrations
    by Chapman & Hall

    220 Pages 53 B/W Illustrations
    by Chapman & Hall

    Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection.

    The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its theoretical foundations, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. The authors also cover feature selection and feature extraction, including basic concepts, popular existing algorithms, and applications.

    A timely introduction to spectral feature selection, this book illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection.

    Data of High Dimensionality and Challenges
    Dimensionality Reduction Techniques
    Feature Selection for Data Mining
    Spectral Feature Selection
    Organization of the Book

    Univariate Formulations for Spectral Feature Selection
    Modeling Target Concept via Similarity Matrix
    The Laplacian Matrix of a Graph
    Evaluating Features on the Graph
    An Extension for Feature Ranking Functions
    Spectral Feature Selection via Ranking
    Robustness Analysis for SPEC
    Discussions

    Multivariate Formulations
    The Similarity Preserving Nature of SPEC
    A Sparse Multi-Output Regression Formulation
    Solving the L2,1-Regularized Regression Problem
    Efficient Multivariate Spectral Feature Selection
    A Formulation Based on Matrix Comparison
    Feature Selection with Proposed Formulations

    Connections to Existing Algorithms
    Connections to Existing Feature Selection Algorithms
    Connections to Other Learning Models
    An Experimental Study of the Algorithms
    Discussions

    Large-Scale Spectral Feature Selection
    Data Partitioning for Parallel Processing
    MPI for Distributed Parallel Computing
    Parallel Spectral Feature Selection
    Computing the Similarity Matrix in Parallel
    Parallelization of the Univariate Formulations
    Parallel MRSF
    Parallel MCSF
    Discussions

    Multi-Source Spectral Feature Selection
    Categorization of Different Types of Knowledge
    A Framework Based on Combining Similarity Matrices
    A Framework Based on Rank Aggregation
    Experimental Results
    Discussions

    References

    Index

    Biography

    Zheng Zhao is a research statistician at the SAS Institute, Inc. His recent research focuses on designing and developing novel analytic approaches for handling large-scale data of extremely high dimensionality. Dr. Zhao is the author of PROC HPREDUCE, which is a SAS High Performance Analytics procedure for large-scale parallel variable selection. He was co-chair of the 2010 PAKDD Workshop on Feature Selection in Data Mining. He earned a Ph.D. in computer science and engineering from Arizona State University.

    Huan Liu is a professor of computer science and engineering at Arizona State University. Dr. Liu serves on journal editorial boards and conference program committees and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction. He earned a Ph.D. in computer science from the University of Southern California. With a focus on data mining, machine learning, social computing, and artificial intelligence, his research investigates problems in real-world applications with high-dimensional data of disparate forms, such as social media, group interaction and modeling, data preprocessing, and text/web mining.