- Contains all of the key elements required for statistical modeling and predictive analytics
- Covers a wide range of important but difficult-to-find topics
- Gives a step-by-step mathematical derivation of each technique, from the underlying assumptions to final conclusion
- Discusses the practical aspects of modeling and predicting, with many examples from consumer behavior modeling and more
- Provides software and examples at www.DataMinerXL.com

Drawing on the authors’ two decades of experience in applied modeling and data mining, **Foundations of Predictive Analytics** presents the fundamental background required for analyzing data and building models for many practical applications, such as consumer behavior modeling and risk and marketing analytics. It also discusses a variety of practical topics that are frequently missing from similar texts.

The book begins with the statistical and linear algebra/matrix foundations of modeling methods, from distributions to cumulant and copula functions to the Cornish–Fisher expansion and other useful but hard-to-find statistical techniques. It then describes common and unusual linear methods as well as popular nonlinear modeling approaches, including additive models, trees, support vector machines, fuzzy systems, clustering, naïve Bayes, and neural nets. The authors go on to cover methodologies used in time series and forecasting, such as ARIMA, GARCH, and survival analysis. They also present a range of optimization techniques and explore several special topics, such as Dempster–Shafer theory.

An in-depth collection of the most important fundamental material on predictive analytics, this self-contained book provides the necessary information for understanding various techniques for exploratory data analysis and modeling. It explains the algorithmic details behind each technique (including underlying assumptions and mathematical formulations) and shows how to prepare and encode data, select variables, use model goodness measures, normalize odds, and perform reject inference.

*Web Resource*

The book’s website at www.DataMinerXL.com offers the DataMinerXL software for building predictive models. The site also includes more examples and information on modeling.

**Introduction**

What Is a Model?

What Is a Statistical Model?

The Modeling Process

Modeling Pitfalls

Characteristics of Good Modelers

The Future of Predictive Analytics

**Properties of Statistical Distributions**

Fundamental Distributions

Central Limit Theorem

Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data

Estimate of the Standard Deviation of the Sample Mean

(Pseudo) Random Number Generators

Transformation of a Distribution Function

Distribution of a Function of Random Variables

Moment Generating Function

Cumulant Generating Function

Characteristic Function

Chebyshev’s Inequality

Markov’s Inequality

Gram–Charlier Series

Edgeworth Expansion

Cornish–Fisher Expansion

Copula Functions

**Important Matrix Relationships**

Pseudo Matrix Inversion

A Lemma of Matrix Inversion

Identity for a Matrix Determinant

Inversion of Partitioned Matrix

Determinant of Partitioned Matrix

Matrix Sweep and Partial Correlation

Singular Value Decomposition (SVD)

Diagonalization of a Matrix

Spectral Decomposition of a Positive Semi-Definite Matrix

Normalization in Vector Space

Conjugate Decomposition of a Symmetric Definite Matrix

Cholesky Decomposition

Cauchy–Schwarz Inequality

Relationship of Correlation among Three Variables

**Linear Modeling and Regression**

Properties of Maximum Likelihood Estimators

Linear Regression

Fisher’s Linear Discriminant Analysis

Principal Component Regression (PCR)

Factor Analysis

Partial Least Squares Regression (PLSR)

Generalized Linear Model (GLM)

Logistic Regression: Binary

Logistic Regression: Multiple Nominal

Logistic Regression: Proportional Multiple Ordinal

Fisher Scoring Method for Logistic Regression

Tobit Model: A Censored Regression Model

**Nonlinear Modeling**

Naive Bayesian Classifier

Neural Network

Segmentation and Tree Models

Additive Models

Support Vector Machine (SVM)

Fuzzy Logic System

Clustering

**Time Series Analysis**

Fundamentals of Forecasting

ARIMA Models

Survival Data Analysis

Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1)

**Data Preparation and Variable Selection**

Data Quality and Exploration

Variable Scaling and Transformation

How to Bin Variables

Interpolation in 1-D and 2-D

Weight of Evidence (WOE) Transformation

Variable Selection Overview

Missing Data Imputation

Step-Wise Selection Methods

Mutual Information, KL Distance

Detection of Multicollinearity

**Model Goodness Measures**

Training, Testing, Validation

Continuous Dependent Variable

Binary Dependent Variable (2-Group Classification)

Population Stability Index Using Relative Entropy

**Optimization Methods**

Lagrange Multiplier

Gradient Descent Method

Newton–Raphson Method

Conjugate Gradient Method

Quasi-Newton Method

Genetic Algorithms (GA)

Simulated Annealing

Linear Programming

Nonlinear Programming (NLP)

Nonlinear Equations

Expectation-Maximization (EM) Algorithm

Optimal Design of Experiment

**Miscellaneous Topics**

Multidimensional Scaling

Simulation

Odds Normalization and Score Transformation

Reject Inference

Dempster–Shafer Theory of Evidence

**Appendix A**

**Appendix B: DataMinerXL — Microsoft Excel Add-in for Building Predictive Models**

**Bibliography**

**Index**

**James Wu** is a Fixed Income Quant with extensive expertise in a wide variety of applied analytical solutions in consumer behavior modeling and financial engineering. He previously worked at ID Analytics, Morgan Stanley, JPMorgan Chase, Los Alamos Computational Group, and CASA. He earned a PhD from the University of Idaho.

**Stephen Coggeshall** is the Chief Technology Officer of ID Analytics. He previously worked at Los Alamos Computational Group, Morgan Stanley, HNC Software, CASA, and Los Alamos National Laboratory. During a career of more than 20 years, Dr. Coggeshall has helped teams of scientists develop practical solutions to difficult business problems using advanced analytics. He earned a PhD from the University of Illinois and was named 2008 Technology Executive of the Year by the *San Diego Business Journal*.