2nd Edition

Statistical Data Mining Using SAS Applications

By George Fernandez Copyright 2011
    478 Pages 151 B/W Illustrations
    by CRC Press

    Statistical Data Mining Using SAS Applications, Second Edition, describes statistical data mining concepts and demonstrates the features of user-friendly data mining SAS tools. Integrating the statistical and graphical analysis tools available in SAS systems, the book provides complete statistical data mining solutions without writing SAS program code or using the point-and-click approach. Each chapter emphasizes step-by-step instructions for using SAS macros and interpreting the results. Compiled data mining SAS macro files are available for download on the author’s website. By following the step-by-step instructions and downloading the SAS macros, analysts can perform complete data mining analyses quickly and effectively.

    New to the Second Edition—General Features

    • Access to SAS macros directly from the desktop
    • Compatible with SAS version 9, SAS Enterprise Guide, and SAS Learning Edition
    • Reorganization of all help files to an appendix
    • Ability to create publication quality graphics
    • Macro-call error check

    New Features in These SAS-Specific Macro Applications

    • Converting PC data files to SAS data (EXLSAS2 macro)
    • Randomly splitting data (RANSPLIT2)
    • Frequency analysis (FREQ2)
    • Univariate analysis (UNIVAR2)
    • PCA and factor analysis (FACTOR2)
    • Multiple linear regressions (REGDIAG2)
    • Logistic regression (LOGIST2)
    • CHAID analysis (CHAID2)

    Requiring no experience with SAS programming, this resource supplies instructions and tools for quickly performing exploratory statistical methods, regression analysis, logistic regression, multivariate methods, and classification analysis. It presents an accessible, SAS macro-oriented approach while offering comprehensive data mining solutions.

    Data Mining: A Gentle Introduction
    Introduction
    Data Mining: Why It Is Successful in the IT World
    Benefits of Data Mining
    Data Mining: Users
    Data Mining: Tools
    Data Mining: Steps
    Problems in the Data Mining Process
    SAS Software: The Leader in Data Mining
    Introduction of User-Friendly SAS Macros for Statistical Data Mining

    Preparing Data for Data Mining
    Introduction
    Data Requirements in Data Mining
    Ideal Structures of Data for Data Mining
    Understanding the Measurement Scale of Variables
    Entire Database or Representative Sample
    Sampling for Data Mining
    User-Friendly SAS Applications Used in Data Preparation

    Exploratory Data Analysis
    Introduction
    Exploring Continuous Variables
    Data Exploration: Categorical Variable
    SAS Macro Applications Used in Data Exploration

    Unsupervised Learning Methods
    Introduction
    Applications of Unsupervised Learning Methods
    Principal Component Analysis (PCA)
    Exploratory Factor Analysis (EFA)
    Disjoint Cluster Analysis (DCA)
    Biplot Display of PCA, EFA, and DCA Results
    PCA and EFA Using SAS Macro FACTOR2
    Disjoint Cluster Analysis Using SAS Macro DISJCLS2

    Supervised Learning Methods: Prediction
    Introduction
    Applications of Supervised Predictive Methods
    Multiple Linear Regression Modeling
    Binary Logistic Regression Modeling
    Ordinal Logistic Regression
    Survey Logistic Regression
    Multiple Linear Regression Using SAS Macro REGDIAG2
    Lift Chart Using SAS Macro LIFT2
    Scoring New Regression Data Using the SAS Macro RSCORE2
    Logistic Regression Using SAS Macro LOGIST2
    Scoring New Logistic Regression Data Using the SAS Macro LSCORE2
    Case Study 1: Modeling Multiple Linear Regressions
    Case Study 2: If-Then Analysis and Lift Charts
    Case Study 3: Modeling Multiple Linear Regression with Categorical Variables
    Case Study 4: Modeling Binary Logistic Regression
    Case Study 5: Modeling Binary Multiple Logistic Regression
    Case Study 6: Modeling Ordinal Multiple Logistic Regression

    Supervised Learning Methods: Classification
    Introduction
    Discriminant Analysis
    Stepwise Discriminant Analysis
    Canonical Discriminant Analysis
    Discriminant Function Analysis
    Applications of Discriminant Analysis
    Classification Tree Based on CHAID
    Applications of CHAID
    Discriminant Analysis Using SAS Macro DISCRIM2
    Decision Tree Using SAS Macro CHAID2
    Case Study 1: Canonical Discriminant Analysis and Parametric Discriminant Function Analysis
    Case Study 2: Nonparametric Discriminant Function Analysis
    Case Study 3: Classification Tree Using CHAID

    Advanced Analytics and Other SAS Data Mining Resources
    Introduction
    Artificial Neural Network Methods
    Market Basket Analysis
    SAS Software: The Leader in Data Mining

    Appendix I: Instruction for Using the SAS Macros
    Appendix II: Data Mining SAS Macro Help Files
    Appendix III: Instruction for Using the SAS Macros with Enterprise Guide Code Window

    Index

    A Summary and References appear at the end of each chapter.

    Biography

    George Fernandez is a professor of applied statistical methods and the director of the Center for Research Design and Analysis at the University of Nevada in Reno.

    Its key features include the provision of case studies throughout the sections, downloadable macros and instructions on how to run them. … The step-by-step instructions and the graphical representations of data make it particularly useful to those wishing to communicate complex and technical data to a largely non-specialist audience.
    —Kassim S. Mwitondi, Journal of Applied Statistics, 2012

    If I had to recommend a good introduction to data mining, I would choose this one.
    — J. A. Pardo, Complutense University of Madrid, Madrid, Spain, in Statistical Papers, 2012

    Like the first edition of the book, this new edition provides a high-level introduction to some important concepts and algorithms in data mining. … the author presents broad statistical data mining solutions without writing SAS program codes. One of the nicest features of this book is that it gives access to SAS macros directly from the desktop and offers to create publication quality graphs. … this new edition provides a simple and straightforward introduction to data mining, along with a number of detailed, worked case studies.
    Technometrics, February 2011

    Praise for the First Edition:
    The macros integrate nicely with SAS’s output delivery system … this is a book that could serve as an easy-to-read introduction to some classical statistical techniques that are used in data mining, and, with the associated macros, provide an opportunity to see those techniques in action.
    Journal of the American Statistical Association, June 2004, Vol. 99, No. 466

    Use of these data mining SAS macros facilitated reliable conversion, examination, and analysis of the data, and selection of best statistical models despite the great size of the data sets. …
    —Christopher Ross, US Bureau of Land Management

    An excellent treatment of data mining using SAS applications is provided in this book. … This book would be suitable for students (as a textbook), data analysts, and experienced SAS programmers. No SAS programming experience, however, is required to benefit from the book.
    Computing Reviews, June 2003

    … the book provides a welcome contrast to treatments of data mining that focus on only the most novel aspects of the subject. Dr. Fernandez is quite right in pointing out that a lot of data mining can be carried out by standard statistical methods in familiar packages. The book also has a healthy emphasis on the use of cross validation (a hallmark of data mining). This and other concepts are well illustrated with numerous examples. Finally, the book demonstrates that the fancy (and expensive) user interfaces sported by many data mining work benches are not essential to the data mining enterprise and might even be counterproductive.
    Computational Statistics, 2005