1st Edition

Knowledge Discovery from Data Streams

By João Gama, Copyright 2010
    258 Pages 62 B/W Illustrations
    by Chapman & Hall

    Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

    The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges that data mining will face in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

    This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they are also valid for other areas of machine learning and data mining.

    Knowledge Discovery from Data Streams
    Introduction
    An Illustrative Example
    A World in Movement
    Data Mining and Data Streams

    Introduction to Data Streams
    Data Stream Models
    Basic Streaming Methods
    Illustrative Applications

    Change Detection
    Introduction
    Tracking Drifting Concepts
    Monitoring the Learning Process
    Final Remarks

    Maintaining Histograms from Data Streams
    Introduction
    Histograms from Data Streams
    The Partition Incremental Discretization (PiD) Algorithm
    Applications to Data Mining

    Evaluating Streaming Algorithms
    Introduction
    Learning from Data Streams
    Evaluation Issues
    Lessons Learned and Open Issues

    Clustering from Data Streams
    Introduction
    Clustering Examples
    Clustering Variables

    Frequent Pattern Mining
    Introduction to Frequent Itemset Mining
    Heavy Hitters
    Mining Frequent Itemsets from Data Streams
    Sequence Pattern Mining

    Decision Trees from Data Streams
    Introduction
    The Very Fast Decision Tree Algorithm
    Extensions to the Basic Algorithm
    OLIN: Info-Fuzzy Algorithms

    Novelty Detection in Data Streams
    Introduction
    Learning and Novelty
    Novelty Detection as a One-Class Classification Problem
    Learning New Concepts
    The Online Novelty and Drift Detection Algorithm

    Ensembles of Classifiers
    Introduction
    Linear Combination of Ensembles
    Sampling from a Training Set
    Ensembles of Trees
    Adapting to Drift Using Ensembles of Classifiers
    Mining Skewed Data Streams with Ensembles

    Time Series Data Streams
    Introduction to Time Series Analysis
    Time Series Prediction
    Similarity between Time Series
    Symbolic Approximation (SAX)

    Ubiquitous Data Mining
    Introduction to Ubiquitous Data Mining
    Distributed Data Stream Monitoring
    Distributed Clustering
    Algorithm Granularity

    Final Comments
    The Next Generation of Knowledge Discovery
    Where We Want to Go

    Appendix: Resources

    Bibliography

    Index

    Notes appear at the end of each chapter.

    Biography

    João Gama is an associate professor and senior researcher in the Laboratory of Artificial Intelligence and Decision Support (LIAAD) at the University of Porto in Portugal.

    … this book is the first authored text (that is, not an edited collection) about the area … The book covers a lot of ground in just 200 pages, including discussion of relatively advanced methods such as wavelets, bagging, boosting, dynamic time warping, and symbolic representation of time series. There is also, I was pleased to see, a chapter on evaluating streaming algorithms … . Evaluation, in general, deserves more attention than it generally receives, so I was delighted to see the focus on it here. … a good introduction to an area of data analysis which is going to be very important indeed.
    —David J. Hand, International Statistical Review, 2012

    Gama is one of the leading investigators in the hottest research topic in machine learning and data mining: data streams. … This book is the first book to didactically cover in a clear, comprehensive and mathematically rigorous way the main machine learning related aspects of this relevant research field. … an up-to-date, broad and useful source of reference for all those interested in knowledge acquisition by learning techniques.
    —From the Foreword by André Ponce de Leon Ferreira de Carvalho, University of São Paulo, Brazil