1st Edition

Understanding Statistics and Statistical Myths: How to Become a Profound Learner

By Kicab Castaneda-Mendez. Copyright 2016.
    585 Pages, 37 B/W Illustrations
    by CRC Press

    Addressing 30 statistical myths in the areas of data, estimation, measurement system analysis, capability, hypothesis testing, statistical inference, and control charts, this book explains how to understand statistics rather than merely how to do statistics. Every statistical myth listed in this book has been stated in course materials used by the author’s clients or employers, or by experts who have trained thousands of students.

    Each myth is an unconditional statement that, when taken literally and at face value, is false. All are false under some conditions, and a few are not true under any condition. This book explores the conditions under which the statements fail, to help you understand why they are not universally true.

    In the book, six characters discuss various topics taught in a fictional course intended to teach students how to apply statistics to improve processes. Readers follow along and learn as the students apply the material to a project on which they serve as team members.

    Each discussion is like a Platonic dialogue. The purpose of a Platonic dialogue is to analyze a concept, statement, hypothesis, or theory through questions, applications, examples, and counterexamples, to see if it is true, when it is true, and why it is true when it is true. The dialogues help readers understand why certain statements are not true under all conditions, as well as when the myths contradict one another.

    Myth 1: Two Types of Data—Attribute/Discrete and Measurement/Continuous
    Background
    Measurement Requires Scale
    Gauges or Instruments vs. No Gauges
    Discrete, Categorical, Attribute versus Continuous, Variable: Degree of Information
    Creating Continuous Measures by Changing the "Thing" Measured
    Discrete versus Continuous: Half Test
    Nominal, Ordinal, Interval, Ratio
    Measurement to Compare
    Scale Type versus Data Type
    Scale Taxonomy
    Purpose of Data Classification

    Myth 2: Proportions and Percentages Are Discrete Data
    Background
    Denominator for Proportions and Percentages
    Probabilities
    Classification of Proportions, Percentages, and Probabilities

    Myth 3: s = √[Σ(Xᵢ − x̄)²/(n − 1)] Is the Correct Formula for the Sample Standard Deviation
    Background
    Correctness of Estimations
    Estimators and Estimates
    Properties of Estimators

    Myth 4: Sample Standard Deviation √[Σ(Xᵢ − x̄)²/(n − 1)] Is Unbiased
    Background
    Degrees of Freedom
    t Distribution
    Definition of Bias
    Removing Bias and Control Charts
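
    The tension between Myths 3 and 4 is that the n − 1 formula yields an unbiased estimator of the variance σ², yet its square root is a biased (low) estimator of σ itself. A minimal simulation sketch of that distinction, not taken from the book and assuming NumPy:

```python
import numpy as np
from math import gamma, sqrt

# Simulate many small samples from a normal population with known sigma.
rng = np.random.default_rng(0)
sigma, n, trials = 1.0, 5, 200_000
samples = rng.normal(0.0, sigma, size=(trials, n))

s2 = samples.var(axis=1, ddof=1)   # Sigma(Xi - xbar)^2 / (n - 1)
s = np.sqrt(s2)

print(np.mean(s2))  # ~1.000: s^2 is unbiased for sigma^2
print(np.mean(s))   # ~0.940: s is biased low for sigma (E[s] = c4 * sigma)

# For normal data, the unbiasing constant c4 is a ratio of gamma functions.
c4 = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
print(c4)           # ~0.9400 for n = 5, matching the simulated mean of s
```

    The constant c₄ printed at the end is the standard correction factor used in the control chart literature, which is presumably why the topic resurfaces under "Removing Bias and Control Charts."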

    Myth 5: Variances Can Be Added but Not Standard Deviations
    Background
    Sums of Squares and Square Roots: Pythagorean Theorem
    Functions and Operators
    Random Variables
    Independence of Factors
    Other Properties
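
    A quick numerical check of Myth 5's subject (my illustration, assuming NumPy): for independent random variables, variances add, so standard deviations combine like the legs of a right triangle rather than by simple addition, the Pythagorean connection named above.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 3.0, 1_000_000)     # sd 3
y = rng.normal(0, 4.0, 1_000_000)     # sd 4, independent of x

print(np.var(x + y))          # ~25 = 9 + 16: variances add
print(np.std(x + y))          # ~5 = sqrt(3^2 + 4^2): Pythagorean combination
print(np.std(x) + np.std(y))  # 7: naively adding sds overstates the spread
```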

    Myth 6: Parts and Operators for an MSA Do Not Have to Be Randomly Selected
    Background
    Types of Analyses of Variance
    Making Measurement System Look Better than It Is: Selecting Parts to Cover the Range of Process Variation
    Selecting Both Good and Bad Parts

    Myth 7: % Study (% Contribution, Number of Distinct Categories) Is the Best Criterion for Evaluating a Measurement System for Process Improvement
    Background
    % Contribution versus % Study
    P/T Ratio versus % Study
    Distinguishing between Good and Bad Parts
    Distinguishing Parts That Are Different

    Myth 8: Only Sigma Can Compare Different Processes and Metrics
    Background
    Sigma and Specifications
    Sigma as a Percentage

    Myth 9: Capability Is Not Percent/Proportion of Good Units
    Background
    Capability Indices: Frequency Meeting Specifications
    Capability: Actual versus Potential
    Capability Indices
    Process Capability Time-Dependent
    Meaning of Capability: Short-Cut Calculations
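
    The indices listed here have standard textbook forms, Cp = (USL − LSL)/6σ and Cpk = min(USL − μ, μ − LSL)/3σ. A minimal sketch (my numbers, SciPy assumed) linking them to the proportion meeting specifications under a normal model, which is the connection Myth 9 turns on:

```python
from scipy.stats import norm

mu, sigma = 10.2, 1.0   # hypothetical process mean and standard deviation
lsl, usl = 7.0, 13.0    # hypothetical specification limits

cp = (usl - lsl) / (6 * sigma)              # potential capability
cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # actual, accounts for centering

# Under a normal model, the in-spec proportion follows from mu and sigma.
in_spec = norm.cdf(usl, mu, sigma) - norm.cdf(lsl, mu, sigma)
print(cp, cpk, round(in_spec, 4))   # 1.0, 0.933, ~0.9968
```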

    Myth 10: p = Probability of Making an Error
    Background
    Only Two Types of Errors
    Definition of an Error about Deciding What Is True
    Calculation of p and Evidence for a Hypothesis
    Probability of Making an Error for a Particular Case
    Probability of Data Given Ho versus Probability of Ho Given Data
    Non-probabilistic Decisions

    Myth 11: Need More Data for Discrete Data than Continuous Data Analysis
    Background
    Discrete Examples When n = 1
    Factors That Determine Sample Size
    Relevancy of Data

    Myth 12: Nonparametric Tests Are Less Powerful than Parametric Tests
    Background
    Distribution Free versus Nonparametric
    Comparing Power for the Same Conditions
    Different Formulas for Testing the Same Hypotheses
    Assumptions of Tests
    Comparing Power for the Same Characteristic
    Converting Quantitative Data to Qualitative Data

    Myth 13: Sample Size of 30 Is Acceptable (for Statistical Significance)
    Background
    A Rationale for n = 30
    Contradictory Rules of Thumb
    Uses of Data
    Sample Size as a Function of Alpha, Beta, Delta, and Sigma
    Sample Size for Practical Use
    Sample Size and Statistical Significance
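
    The sub-topic "Sample Size as a Function of Alpha, Beta, Delta, and Sigma" refers to the standard two-sided formula for a one-sample z test, n = ((z_(α/2) + z_β)σ/δ)². A hedged sketch (SciPy assumed, function name mine) showing why no single number such as 30 can be right in general:

```python
from math import ceil
from scipy.stats import norm

def sample_size(alpha: float, beta: float, delta: float, sigma: float) -> int:
    """n for a two-sided one-sample z test to detect a mean shift of delta."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# The required n swings widely with the effect size delta:
print(sample_size(0.05, 0.20, delta=1.0, sigma=1.0))  # 8
print(sample_size(0.05, 0.20, delta=0.5, sigma=1.0))  # 32
print(sample_size(0.05, 0.20, delta=0.2, sigma=1.0))  # 197
```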

    Myth 14: Can Only Fail to Reject Ho, Can Never Accept Ho
    Background
    Proving Theories: Sufficient versus Necessary
    Prove versus Accept versus Fail to Reject: Actions
    Innocent versus Guilty: Problems with Example
    Two-Choice Testing
    Significance Testing and Confidence Intervals
    Hypothesis Testing and Power
    Null Hypothesis of ≥ or ≤
    Practical Cases
    Which Hypothesis Has the Equal Sign?
    Bayesian Statistics: Probability of Hypothesis

    Myth 15: Control Limits Are ±3 Standard Deviations from the Center Line
    Background
    Standard Error versus Standard Deviation
    Within- versus between-Subgroup Variation: How Control Charts Work
    I Chart of Individuals
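
    Myth 15's distinction "Standard Error versus Standard Deviation" can be made concrete: on an X̄ chart the ±3 applies to the standard error of the subgroup mean, σ/√n, not to the standard deviation of individual values. A minimal sketch (my illustration, NumPy assumed, σ treated as known):

```python
import numpy as np

rng = np.random.default_rng(2)
n_subgroups, n = 50, 4
data = rng.normal(10.0, 2.0, size=(n_subgroups, n))  # true sigma = 2

xbar = data.mean(axis=1)
grand_mean = xbar.mean()
sigma = 2.0  # assumed known here; real charts estimate it from
             # within-subgroup variation (e.g., R-bar/d2)

# Xbar-chart limits use the standard error sigma/sqrt(n), not sigma itself.
ucl = grand_mean + 3 * sigma / np.sqrt(n)
lcl = grand_mean - 3 * sigma / np.sqrt(n)
print(lcl, ucl)  # roughly 7 and 13: half as wide as +/-3 sigma for individuals
```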

    Myth 16: Control Chart Limits Are Empirical Limits
    Background
    Definition of Empirical
    Empirical Limits versus Limits Justified Empirically
    Shewhart’s Evidence of Limits Being Empirical
    Wheeler’s Empirical Rule
    Empirical Justification for a Purpose

    Myth 17: Control Chart Limits Are Not Probability Limits
    Background
    Association of Probabilities and Control Chart Limits
    Can Control Limits Be Probability Limits?
    False Alarm Rates for All Special Cause Patterns
    Wheeler Uses Probability Limits
    Other Uses of Probability Limits

    Myth 18: ±3 Sigma Limits Are the Most Economical Control Chart Limits
    Background
    Evidence for 3–Standard Error Limits Being Economically Best
    Evidence against 3–Standard Error Limits Being the Best Economically
    Counterexamples: Simple Cost Model
    Other Out-of-Control Rules
    Assignable Causes Shewhart Didn’t Find but Exist
    Small Changes Are Not Critical to Detect versus Taguchi’s Loss Function
    Importance of Subgroup Size and Frequency on Economic Value of Control Chart Limits
    Purpose to Detect Lack of Control—3–Standard Error Limits Misplaced

    Myth 19: Statistical Inferences Are Inductive Inferences
    Background
    Reasoning: Validity and Soundness
    Induction versus Deduction
    Four Cases of Inductive Inferences
    Statistical Inferences: Probability Distributions
    Inferences about Population Parameters
    Deductive Statistical Inferences: Hypothesis Testing
    Deductive Statistical Inferences: Estimation
    Real-World Cases of Statistical Inferences

    Myth 20: There Is One Universe or Population If Data Are Homogeneous
    Background
    Definition of Homogeneous
    Is Displaying Stability Required for Universes to Exist?
    Are There Always Multiple Universes If Data Display Instability?
    Is There Only One Universe If Data Appropriately Plotted Display Stability?
    Control Chart Framework: Valid and Invalid Conclusions

    Myth 21: Control Charts Are Analytic Studies
    Background
    Enumerative versus Analytic: Distinguishing Characteristics
    Enumerative Problem, Study, and Solution
    Analytic Problem, Study, and Solution
    Procedures for Enumerative and Analytic Studies
    Are Control Charts Enumerative or Analytic Studies?
    Cause–Effect Relationship
    An Analytic Study Answers "When?"

    Myth 22: Control Charts Are Not Tests of Hypotheses
    Background
    Definition and Structure of Hypothesis Test
    Control Chart as a General Hypothesis Test
    Statistical Hypothesis Testing: Alpha and p
    Analysis of Means
    Shewhart’s View on Control Charts as Tests of Hypotheses
    Deming’s Argument: No Definable, Finite, Static Population
    Woodall’s Two Phases of Control Chart Use
    Finite, Static Universe
    Control Charts as Nonparametric Tests of Hypotheses
    Utility of Viewing Control Charts as Statistical Hypothesis Tests
    Is the Process in Control? versus What Is the Probability the Process Changed?
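
    The sub-topic "Statistical Hypothesis Testing: Alpha and p" invites a standard back-of-envelope link (my calculation, not the book's): if a point outside ±3 standard errors signals "out of control," the implied per-point false-alarm rate under a normal, in-control model is about 0.27%.

```python
from scipy.stats import norm

# P(|Z| > 3) under a normal, in-control model: the implied per-point alpha
alpha = 2 * (1 - norm.cdf(3))
print(alpha)  # ~0.0027
```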

    Myth 23: Process Needs to Be Stable to Calculate Process Capability
    Background
    Stability and Capability: Dependent or Independent?
    Actual Performance and Potential Capability versus Stability
    Process Capability: Reliability of Estimates
    Control Charts Are Fallible
    Capable: 100% or Less than 100% Meeting Specifications
    Process Capability: "Best" Performance versus Sustainability
    Cp versus P/T
    Random Sampling
    Response Surface Studies

    Myth 24: Specifications Don’t Belong on Control Charts
    Background
    Run Charts
    Charts of Individual Values
    Confusion Having Both Control and Specification Limits on Charts
    Stability, Performance, and Capability
    Specifications on Averages and Variation

    Myth 25: Identify and Eliminate Assignable Causes of Variation
    Background
    Assignable Causes versus Process Change
    Is Increase in Process Variation Always Bad?
    Good Assignable Causes

    Myth 26: Process Needs to Be Stable before You Can Improve It
    Background
    History of Improvement before the 1920s
    Control Chart Fallibility
    Stabilizing a Process and Improving It
    Stability Required versus Four States of a Process
    Shewhart’s Counterexample

    Myth 27: Stability (Homogeneity) Is Required to Establish a Baseline
    Background
    Purpose of Baseline
    Just-Do-It Projects
    Natural Processes
    Processes Whose Output We Want to Be "Out of Control"
    Meaning of "Meaningless"
    Daily Comparisons
    "True" Process Average: Process, Outputs, Characteristics, and Measures
    Ways to Compare
    Universe or Population and Descriptive Statistics
    Random Sampling
    When Is Homogeneity/Stability Not Required or Unimportant?

    Myth 28: A Process Must Be Stable to Be Predictable
    Background
    Types of Predictions: Interpolation and Extrapolation
    Interpolation: Stability versus Instability
    Conditional Predictions
    Extrapolation: Stability versus Instability
    Fallibility of Control Chart Stability
    Control Charts in Daily Life
    Statistical versus Causal Control

    Myth 29: Adjusting a Process Based on a Single Defect Is Tampering, Causing Increased Process Variation
    Background
    Definition of Tampering
    Zero versus One versus Multiple Defects to Define Tampering
    Role of Theory and Understanding When Adjusting
    Defects Arise from Special Causes: Anomalies
    Control Limits versus Specification Limits
    Actions for Common Cause Signals versus Special Cause Signals
    Is Reducing Common Cause Variation Always Good?
    Fundamental Change versus Tampering
    Funnel Exercise: Counterexample

    Myth 30: No Assumptions Required When the Data Speak for Themselves
    Background
    Simpson’s Paradox
    Math and Descriptive Statistics: Adding versus Aggregating
    Inferences versus Facts: Conditions for Paradoxes
    Assumptions for Modeling
    Assumptions for Causal Inferences
    Assumptions for Inferences from Reasons
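
    Simpson's paradox, listed under Myth 30, is easy to reproduce with the classic kidney-stone treatment counts often quoted for it (not from the book): a treatment can win within every subgroup yet lose in the aggregate, because proportions do not simply add when groups are pooled.

```python
# Successes / trials for treatments A and B in two severity strata.
strata = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

for name, arms in strata.items():
    rate_a = arms["A"][0] / arms["A"][1]
    rate_b = arms["B"][0] / arms["B"][1]
    print(name, round(rate_a, 3), round(rate_b, 3))  # A beats B in each stratum

# Aggregating flips the comparison: pooling changes the mix of cases each
# treatment faced, so the overall ratio is not an average of the stratum ratios.
a_succ = sum(v["A"][0] for v in strata.values())
a_tot = sum(v["A"][1] for v in strata.values())
b_succ = sum(v["B"][0] for v in strata.values())
b_tot = sum(v["B"][1] for v in strata.values())
print(round(a_succ / a_tot, 3), round(b_succ / b_tot, 3))  # B beats A overall
```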

    Epilogue

    References

    Index

    Biography

    Kicab Castaneda-Mendez, founder of Process Excellence Consultants, Chapel Hill, NC, provides consulting and training on operational excellence using Lean Six Sigma methodologies, the balanced scorecard, and the Baldrige framework.