2nd Edition

Data Mining: A Tutorial-Based Primer, Second Edition

By Richard J. Roiger
Copyright 2017
    530 Pages 295 B/W Illustrations
    by Chapman & Hall

    Data Mining: A Tutorial-Based Primer, Second Edition provides a comprehensive introduction to data mining with a focus on model building and testing, as well as on interpreting and validating results. The text guides students to understand how data mining can be employed to solve real problems and to recognize whether a data mining solution is a feasible alternative for a specific problem. Fundamental data mining strategies, techniques, and evaluation methods are presented and implemented with the help of two well-known software tools.

    Several new topics have been added to the second edition, including an introduction to Big Data and data analytics, ROC curves, Pareto lift charts, methods for handling large-sized, streaming, and imbalanced data, support vector machines, and extended coverage of textual data mining. The second edition contains tutorials for attribute selection, dealing with imbalanced data, outlier analysis, time series analysis, mining textual data, and more.

    The text provides in-depth coverage of RapidMiner Studio and Weka’s Explorer interface. Both software tools are used for stepping students through the tutorials depicting the knowledge discovery process. This allows the reader maximum flexibility for their hands-on data mining experience.

    Data Mining Fundamentals

    Data Mining: A First View
    DATA SCIENCE, ANALYTICS, MINING, AND KNOWLEDGE DISCOVERY IN DATABASES 
    WHAT CAN COMPUTERS LEARN? 
    IS DATA MINING APPROPRIATE FOR MY PROBLEM? 
    DATA MINING OR KNOWLEDGE ENGINEERING? 
    A NEAREST NEIGHBOR APPROACH
    DATA MINING, BIG DATA, AND CLOUD COMPUTING
    DATA MINING ETHICS
    INTRINSIC VALUE AND CUSTOMER CHURN
    CHAPTER SUMMARY 
    KEY TERMS

    Data Mining: A Closer Look
    DATA MINING STRATEGIES
    SUPERVISED DATA MINING TECHNIQUES
    ASSOCIATION RULES
    CLUSTERING TECHNIQUES
    EVALUATING PERFORMANCE
    CHAPTER SUMMARY
    KEY TERMS

    Basic Data Mining Techniques
    CHAPTER OBJECTIVES
    DECISION TREES
    A BASIC COVERING RULE ALGORITHM
    GENERATING ASSOCIATION RULES
    THE K-MEANS ALGORITHM
    GENETIC LEARNING
    CHOOSING A DATA MINING TECHNIQUE
    CHAPTER SUMMARY
    KEY TERMS

    Tools for Knowledge Discovery

    Weka—An Environment for Knowledge Discovery
    GETTING STARTED WITH WEKA
    BUILDING DECISION TREES
    GENERATING PRODUCTION RULES WITH PART
    ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
    ASSOCIATION RULES
    COST/BENEFIT ANALYSIS
    UNSUPERVISED CLUSTERING WITH THE K-MEANS ALGORITHM
    CHAPTER SUMMARY

    Knowledge Discovery with RapidMiner
    GETTING STARTED WITH RAPIDMINER
    BUILDING DECISION TREES
    GENERATING RULES
    ASSOCIATION RULE LEARNING
    UNSUPERVISED CLUSTERING WITH K-MEANS
    ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
    CHAPTER SUMMARY

    The Knowledge Discovery Process
    A PROCESS MODEL FOR KNOWLEDGE DISCOVERY
    GOAL IDENTIFICATION
    CREATING A TARGET DATA SET
    DATA PREPROCESSING
    DATA TRANSFORMATION
    DATA MINING
    INTERPRETATION AND EVALUATION
    TAKING ACTION
    THE CRISP-DM PROCESS MODEL
    CHAPTER SUMMARY
    KEY TERMS

    Formal Evaluation Techniques
    WHAT SHOULD BE EVALUATED?
    TOOLS FOR EVALUATION
    COMPUTING TEST SET CONFIDENCE INTERVALS
    COMPARING SUPERVISED LEARNER MODELS
    UNSUPERVISED EVALUATION TECHNIQUES
    EVALUATING SUPERVISED MODELS WITH NUMERIC OUTPUT
    COMPARING MODELS WITH RAPIDMINER
    ATTRIBUTE EVALUATION FOR MIXED DATA TYPES
    PARETO LIFT CHARTS
    CHAPTER SUMMARY
    KEY TERMS

    Building Neural Networks

    Neural Networks
    FEED-FORWARD NEURAL NETWORKS
    NEURAL NETWORK TRAINING: A CONCEPTUAL VIEW
    NEURAL NETWORK EXPLANATION
    GENERAL CONSIDERATIONS
    NEURAL NETWORK TRAINING: A DETAILED VIEW
    CHAPTER SUMMARY
    KEY TERMS

    Building Neural Networks with Weka
    DATA SETS FOR BACKPROPAGATION LEARNING
    MODELING THE EXCLUSIVE-OR FUNCTION: NUMERIC OUTPUT
    MODELING THE EXCLUSIVE-OR FUNCTION: CATEGORICAL OUTPUT
    MINING SATELLITE IMAGE DATA
    UNSUPERVISED NEURAL NET CLUSTERING 
    CHAPTER SUMMARY
    KEY TERMS

    Building Neural Networks with RapidMiner
    MODELING THE EXCLUSIVE-OR FUNCTION
    MINING SATELLITE IMAGE DATA
    PREDICTING CUSTOMER CHURN
    RAPIDMINER’S SELF-ORGANIZING MAP OPERATOR
    CHAPTER SUMMARY

    Advanced Data Mining Techniques

    Supervised Statistical Techniques
    BAYES CLASSIFIER
    SUPPORT VECTOR MACHINES
    LINEAR REGRESSION ANALYSIS
    REGRESSION TREES
    LOGISTIC REGRESSION
    CHAPTER SUMMARY
    KEY TERMS

    Unsupervised Clustering Techniques
    AGGLOMERATIVE CLUSTERING
    CONCEPTUAL CLUSTERING
    EXPECTATION MAXIMIZATION
    GENETIC ALGORITHMS AND UNSUPERVISED CLUSTERING
    CHAPTER SUMMARY
    KEY TERMS

    Specialized Techniques
    TIME-SERIES ANALYSIS
    MINING THE WEB
    MINING TEXTUAL DATA
    TECHNIQUES FOR LARGE-SIZED, IMBALANCED, AND STREAMING DATA
    ENSEMBLE TECHNIQUES FOR IMPROVING PERFORMANCE
    CHAPTER SUMMARY
    KEY TERMS

    The Data Warehouse
    OPERATIONAL DATABASES
    DATA WAREHOUSE DESIGN
    ONLINE ANALYTICAL PROCESSING
    EXCEL PIVOT TABLES FOR DATA ANALYTICS
    CHAPTER SUMMARY
    KEY TERMS

    Biography

    Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer & Information Science Department for 27 years. Dr. Roiger’s Ph.D. degree is in Computer & Information Sciences from the University of Minnesota. Dr. Roiger continues to serve as a part-time faculty member teaching courses in data mining, artificial intelligence, and research methods. Richard enjoys interacting with his grandchildren, traveling, writing, and pursuing his musical talents.

    "Dr. Roiger does an excellent job of describing in step by step detail formulae involved in various data mining algorithms, along with illustrations. In addition, his tutorials in Weka software provide excellent grounding for students in comprehending the underpinnings of Machine Learning as applied to Data Mining. The inclusion of RapidMiner software tutorials and examples in the book is also a definite plus since it is one of the most popular Data Mining software platforms in use today."
    --Robert Hughes, Golden Gate University, San Francisco, CA, USA