Text Mining: Classification, Clustering, and Applications

Hardback
ISBN 9781420059403
Cat# C5940

eBook (VitalSource)
ISBN 9781420059458
Cat# CE5940

Features

  • Provides broad coverage of many leading methods and applications
  • Contains contributions from the world’s foremost leaders in text mining
  • Goes beyond traditional aspects of text mining by including both well-tested methods and new research at the forefront of the field
  • Presents numerous real-world examples

Summary

The Definitive Resource on Text Mining Theory and Applications from Foremost Researchers in the Field

Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It examines methods to automatically cluster and classify text documents and applies these methods in a variety of areas, including adaptive information filtering, information distillation, and text search.

The book begins with chapters on the classification of documents into predefined categories. It presents state-of-the-art algorithms and their use in practice. The next chapters describe novel methods for clustering documents into groups that are not predefined. These methods seek to automatically determine topical structures that may exist in a document corpus. The book concludes by discussing various text mining applications that have significant implications for future research and industrial use.

There is no doubt that text mining will continue to play a critical role in the development of future information systems, and advances in research will be instrumental to their success. This book captures the technical depth and immense practical potential of text mining, guiding readers to a sound appreciation of this burgeoning field.

Table of Contents

Analysis of Text Patterns Using Kernel Methods
Marco Turchi, Alessia Mammone, and Nello Cristianini

Introduction

General Overview on Kernel Methods

Kernels for Text

Example

Conclusion and Further Reading

Detection of Bias in Media Outlets with Statistical Learning Methods
Blaz Fortuna, Carolina Galleguillos, and Nello Cristianini

Introduction

Overview of the Experiments

Data Collection and Preparation

News Outlet Identification

Topic-Wise Comparison of Term Bias

News Outlets Map

Related Work

Conclusion

Appendix A: Support Vector Machines

Appendix B: Bag of Words and Vector Space Models

Appendix C: Kernel Canonical Correlation Analysis

Appendix D: Multidimensional Scaling

Collective Classification for Text Classification
Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor

Introduction

Collective Classification: Notation and Problem Definition

Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers

Approximate Inference Algorithms for Approaches Based on Global Formulations

Learning the Classifiers

Experimental Comparison

Related Work

Conclusion

Topic Models
David M. Blei and John D. Lafferty

Introduction

Latent Dirichlet Allocation (LDA)

Posterior Inference for LDA

Dynamic Topic Models and Correlated Topic Models

Discussion

Nonnegative Matrix and Tensor Factorization for Discussion Tracking
Brett W. Bader, Michael W. Berry, and Amy N. Langville

Introduction

Notation

Tensor Decompositions and Algorithms

Enron Subset

Observations and Results

Visualizing Results of the NMF Clustering

Future Work

Text Clustering with Mixture of von Mises–Fisher Distributions
Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, and Suvrit Sra

Introduction

Related Work

Preliminaries

EM on a Mixture of vMFs (moVMF)

Handling High-Dimensional Text Datasets

Algorithms

Experimental Results

Discussion

Conclusions and Future Work

Constrained Partitional Clustering of Text Data: An Overview
Sugato Basu and Ian Davidson

Introduction

Uses of Constraints

Text Clustering

Partitional Clustering with Constraints

Learning Distance Function with Constraints

Satisfying Constraints and Learning Distance Functions

Experiments

Conclusions

Adaptive Information Filtering
Yi Zhang

Introduction

Standard Evaluation Measures

Standard Retrieval Models and Filtering Approaches

Collaborative Adaptive Filtering

Novelty and Redundancy Detection

Other Adaptive Filtering Topics

Utility-Based Information Distillation
Yiming Yang and Abhimanyu Lad

Introduction

A Sample Task

Technical Cores

Evaluation Methodology

Data

Experiments and Results

Concluding Remarks

Text Search Enhanced with Types and Entities
Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani

Entity-Aware Search Architecture

Understanding the Question

Scoring Potential Answer Snippets

Indexing and Query Processing

Conclusion

Index

Editor Bios

Ashok N. Srivastava is the Principal Investigator of the Integrated Vehicle Health Management research project in the NASA Aeronautics Research Mission Directorate. Dr. Srivastava also leads the Intelligent Data Understanding group at NASA Ames Research Center.

Mehran Sahami is an Associate Professor and Associate Chair for Education in the computer science department at Stanford University.

Editorial Reviews

… a very good overview of some state-of-the-art capabilities. … In summary, the book provides several algorithms for text mining classification, clustering, and applications, including both mathematical background and experimental observations. For readers interested in specific areas, there are several useful references. Researchers can use this book to learn more about today's field of text mining.
Computing Reviews, March 2010

… Not long ago people were expressing concern about the deluge of information with which we were being faced. Tools such as those described in this book present one way in which we might cope with this deluge. The separate contributions are well written, and there does seem to be a consistency which can only have arisen from sound editorial work … . This would be a perfect volume to give a new Ph.D. student about to start work on statistical and data mining methods of text analysis, and perhaps casting about for a particular area of methodology on which to focus, or for a particular application area to address. It provides a first-class overview of the scope of an area which can only grow in importance in the coming years.
—David J. Hand, International Statistical Review, 2010

This book is a worthy contribution to the field of text mining. By focusing on classification (rather than exhaustively covering extraction, summarization, and other tasks), it achieves the right balance of coherence and comprehensiveness. It collects papers by the leading authors in the field, who employ and explain a variety of techniques—kernel methods, link analysis, latent Dirichlet allocation, non-negative matrix factorization, and others. Together the papers bring unity and clarity to a disjointed and sometimes perplexing field and serve as the perfect introduction for an advanced student.
—Peter Norvig, Director of Research, Google, Inc., Mountain View, California, USA

This is a state-of-the-art, outstanding collection of overviews on text mining by a group of leading researchers in the field. The book meets an imminent need for an up-to-date overview of this exciting, dynamic research frontier and may serve as an excellent textbook on text mining for graduate students and researchers in the field as well.
—Jiawei Han, University of Illinois at Urbana-Champaign, USA