Semisupervised Learning for Computational Linguistics

Purchasing Options

Hardback
$94.95
ISBN 9781584885597
Cat# C5599
 

Features

  • Offers applications in information extraction, parsing, and word senses, such as WordNet
  • Provides background material in machine learning that includes the areas of classification and clustering
  • Covers a variety of methods, including co-boosting, transductive SVMs, McLachlan's algorithm, and the EM algorithm
  • Examines in detail the concept of label propagation in a graph
  • Discusses spectral methods, including the definition of harmonics, the eigenvectors of matrices and graphs, spectral clustering, and the connection to label propagation
  • Introduces the necessary mathematics in a just-in-time manner

Summary

    The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.

    The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.

    Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.

    Table of Contents

    INTRODUCTION
    A brief history
    Semisupervised learning
    Organization and assumptions

    SELF-TRAINING AND CO-TRAINING
    Classification
    Self-training
    Co-training

    APPLICATIONS OF SELF-TRAINING AND CO-TRAINING
    Part-of-speech tagging
    Information extraction
    Parsing
    Word senses

    CLASSIFICATION
    Two simple classifiers
    Abstract setting
    Evaluating detectors and classifiers that abstain
    Binary classifiers and ECOC

    MATHEMATICS FOR BOUNDARY-ORIENTED METHODS
    Linear separators
    The gradient
    Constrained optimization

    BOUNDARY-ORIENTED METHODS
    The perceptron
    Game self-teaching
    Boosting
    Support vector machines (SVMs)
    Null-category noise model

    CLUSTERING
    Cluster and label
    Clustering concepts
    Hierarchical clustering
    Self-training revisited
    Graph mincut
    Label propagation
    Bibliographic notes

    GENERATIVE MODELS
    Gaussian mixtures
    The EM algorithm

    AGREEMENT CONSTRAINTS
    Co-training
    Agreement-based self-teaching
    Random fields
    Bibliographic notes

    PROPAGATION METHODS
    Label propagation
    Random walks
    Harmonic functions
    Fluids
    Computing the solution
    Graph mincuts revisited
    Bibliographic notes

    MATHEMATICS FOR SPECTRAL METHODS
    Some basic concepts
    Eigenvalues and eigenvectors
    Eigenvalues and the scaling effects of a matrix
    Bibliographic notes

    SPECTRAL METHODS
    Simple harmonic motion
    Spectra of matrices and graphs
    Spectral clustering
    Spectral methods for semisupervised learning
    Bibliographic notes

    BIBLIOGRAPHY
    INDEX

    Editorial Reviews

    "…I would have loved to have had this book when I started working as a computational linguist … The book is well laid out, enjoyable to read, and the formulae aesthetically presented … The book does a very amicable job of being self-contained given the number of subjects and size of the book. I would recommend this book to mathematicians, statisticians, and libraries alike."
    CHOICE, February 2009

    "However when it works, it works well, and whereas the book provides great breadth, but little depth, it will be a useful springboard for the beginning student."
    Chris J.C. Burges, Microsoft Research, in Journal of the American Statistical Association, June 2009, Vol. 104, No. 486

     
