- Helps readers improve their understanding of the role of regression models in the medical field
- Illustrates each technique with a concrete example, enabling readers to better appreciate the properties and theory of the methods
- Uses Stata to demonstrate the practical use of the models
- Discusses how and when regression models can fail
- Describes the basic principles behind statistical computations, with more mathematical details given in the appendices
- Offers the data sets, solutions to all exercises, and a short introduction to Stata on the author’s website

*Figure slides available with qualifying course adoption*

While regression models have become standard tools in medical research, understanding how to properly apply the models and interpret the results is often challenging for beginners. **Regression Models as a Tool in Medical Research** presents the fundamental concepts and important aspects of regression models most commonly used in medical research, including the classical regression model for continuous outcomes, the logistic regression model for binary outcomes, and the Cox proportional hazards model for survival data. The text emphasizes adequate use, correct interpretation of results, appropriate presentation of results, and avoidance of potential pitfalls.

After reviewing popular models and basic methods, the book focuses on advanced topics and techniques. It considers the comparison of regression coefficients, the selection of covariates, the modeling of nonlinear and nonadditive effects, and the analysis of clustered and longitudinal data, highlighting the impact of selection mechanisms, measurement error, and incomplete covariate data. The text then covers the use of regression models to construct risk scores and predictors. It also gives an overview of more specific regression models and their applications as well as alternatives to regression modeling. The mathematical details underlying the estimation and inference techniques are provided in the appendices.

**THE BASICS**

**Why Use Regression Models?**

Why using simple regression models?

Why using multiple regression models?

Some basic notation

**An Introductory Example**

A single line model

Fitting a single line model

Taking uncertainty into account

A two lines model

How to perform these steps with Stata

Exercise

Exercise

Exercise

**The Classical Multiple Regression Model**

**Adjusted Effects**

Adjusting for confounding

Adjusting for imbalances

Exercise

**Inference for the Classical Multiple Regression Model**

The traditional and the modern way of inference

How to perform the modern way of inference with Stata

How valid and good are least squares estimates?

A note on the use and interpretation of p-values in regression analyses

**Logistic Regression**

The definition of the logistic regression model

Analyzing a dose response experiment by logistic regression

How to fit a dose response model with Stata

Estimating odds ratios and adjusted odds ratios using logistic regression

How to compute (adjusted) odds ratios using logistic regression in Stata

Exercise

More on logit scale and odds scale

**Inference for the Logistic Regression Model**

The maximum likelihood principle

Properties of the ML estimates for logistic regression

Inference for a single regression parameter

How to perform Wald tests and likelihood ratio tests in Stata

**Categorical Covariates**

Incorporating categorical covariates in a regression model

Some technicalities in using categorical covariates

Testing the effect of a categorical covariate

The handling of categorical covariates in Stata

Presenting results of a regression analysis involving categorical covariates in a table

Exercise

Exercise

**Handling Ordered Categories: A First Lesson in Regression Modeling Strategies**

**The Cox Proportional Hazards Model**

Modeling the risk of dying

Modeling the risk of dying in continuous time

Using the Cox proportional hazards model to quantify the difference in survival between groups

How to fit a Cox proportional hazards model with Stata

Exercise

**Common Pitfalls in Using Regression Models**

Association vs. causation

Difference between subjects vs. difference within subjects

Real world models vs. statistical models

Relevance vs. significance

Exercise

**ADVANCED TOPICS AND TECHNIQUES**

**Some Useful Technicalities**

Illustrating models by using model based predictions

How to work with predictions in Stata

Residuals and the standard deviation of the error term

Working with residuals and the RMSE in Stata

Linear and nonlinear functions of regression parameters

Transformations of regression parameters

Centering of covariate values

Exercise

**Comparing Regression Coefficients**

Comparing regression coefficients among continuous covariates

Comparing regression coefficients among binary covariates

Measuring the impact of changing covariate values

Translating regression coefficients

How to compare regression coefficients in Stata

Exercise

**Power and Sample Size**

The power of a regression analysis

Determinants of power in regression models with a single covariate

Determinants of power in regression models with several covariates

Power and sample size calculations when a sample from the covariate distribution is given

Power and sample size calculations given a sample from the covariate distribution with Stata

The choice of the values of the regression parameters in a simulation study

Simulating a covariate distribution

Simulating a covariate distribution with Stata

Choosing the parameters to simulate a covariate distribution

Necessary sample sizes to justify asymptotic methods

Exercise

**The Selection of the Sample**

Selection in dependence on the covariates

Selection in dependence on the outcome

Sampling in dependence on covariate values

**The Selection of Covariates**

Fitting regression models with correlated covariates

The "Adjustment vs. power" dilemma

The "Adjustment makes effects small" dilemma

Adjusting for mediators

Adjusting for confounding - A useful academic game

Adjusting for correlated confounders

Including predictive covariates

Automatic variable selection

How to choose relevant sets of covariates

Preparing the selection of covariates: Analyzing the association among covariates

Preparing the selection of covariates: Univariate analyses?

Exercise

Preprocessing of the covariate space

How to preprocess the covariate space with Stata

Exercise

What is a confounder?

**Modeling Nonlinear Effects**

Quadratic regression

Polynomial regression

Splines

Fractional Polynomials

Gain in power by modeling nonlinear effects?

Demonstrating the effect of a covariate

Demonstrating a nonlinear effect

Describing the shape of a nonlinear effect

Detecting nonlinearity by analysis of residuals

Judging of nonlinearity may require adjustment

How to model nonlinear effects in Stata

The impact of ignoring nonlinearity

Modeling the nonlinear effect of confounders

Nonlinear models

Exercise

**Transformation of Covariates**

Transformations to obtain a linear relationship

Transformation of skewed covariates

To categorize or not to categorize

**Effect Modification and Interactions**

Modeling effect modification

Adjusted effect modifications

Interactions

Modeling effect modifications in several covariates

The effect of a covariate in the presence of interactions

Interactions as deviations from additivity

Scales and interactions

Ceiling effects and interactions

Hunting for interactions

How to analyze effect modification and interactions with Stata

Exercise

**Applying Regression Models to Clustered Data**

Why clustered data can invalidate inference

Robust standard errors

Improving the efficiency

Within and between cluster effects

Some unusual but useful usages of robust standard errors in clustered data

How to take clustering into account in Stata

**Applying Regression Models to Longitudinal Data**

Analyzing time trends in the outcome

Analyzing time trends in the effect of covariates

Analyzing the effect of covariates

Analyzing individual variation in time trends

Analyzing summary measures

Analyzing the effect of change

How to perform regression modeling of longitudinal data in Stata

Exercise

**The Impact of Measurement Error**

The impact of systematic and random measurement error

The impact of misclassification

The impact of measurement error in confounders

The impact of differential misclassification and measurement error

Studying the measurement error

Exercise

**The Impact of Incomplete Covariate Data**

Missing value mechanisms

Properties of a complete case analysis

Bias due to using ad hoc methods

Advanced techniques to handle incomplete covariate data

Handling of partially defined covariates

**RISK SCORES AND PREDICTORS**

**Risk Scores**

What is a risk score?

Judging the usefulness of a risk score

The precision of risk score values

The overall precision of a risk score

Using Stata’s predict command to compute risk scores

Categorization of risk scores

Exercise

**Construction of Predictors**

From risk scores to predictors

Predictions and prediction intervals for a continuous outcome

Predictions for a binary outcome

Construction of predictions for time to event data

How to construct predictions with Stata

The overall precision of a predictor

**Evaluating the Predictive Performance**

The predictive performance of an existing predictor

How to assess the predictive performance of an existing predictor in Stata

Estimating the predictive performance of a new predictor

How to assess the predictive performance via cross validation in Stata

Exercise

**Outlook: Construction of Parsimonious Predictors**

**MISCELLANEOUS**

**Alternatives to Regression Modeling**

Stratification

Measures of association: Correlation coefficients

Measures of association: The odds ratio

Propensity scores

Classification and regression trees

**Specific Regression Models**

Probit regression for binary outcomes

Generalized linear models

Regression models for count data

Regression models for ordinal outcome data

Quantile regression and robust regression

ANOVA and regression

**Specific Usages of Regression Models**

Logistic regression for the analysis of case control studies

Logistic regression for the analysis of matched case control studies

Adjusting for baseline values in randomized clinical trials

Assessing predictive factors

Incorporating time varying covariates in a Cox model

Time dependent effects in a Cox model

Using the Cox model in the presence of competing risks

Using the Cox model to analyze multi state models

**What Is a Good Model?**

Does the model fit the data?

How good are predictions?

Explained variation

Goodness of fit

Model stability

The usefulness of a model

**Final Remarks on the Role of Prespecified Models and Model Development**

**MATHEMATICAL DETAILS**

Computing regression parameters in the classical multiple regression model

Estimation of the standard error

Construction of confidence intervals and p-values

**Mathematics behind the Logistic Regression Model**

The least squares principle as a maximum likelihood principle

Maximizing the likelihood of a logistic regression model

Estimating the standard error of the ML estimates

Testing composite hypotheses

**The Modern Way of Inference**

Robust estimation of standard errors

Robust estimation of standard errors in the presence of clustering

**Mathematics for Risk Scores and Predictors**

Computing individual survival probabilities after fitting a Cox model

Standard errors for risk scores

The delta rule

**Bibliography**

**Index**

**Werner Vach** is a professor of medical informatics and clinical epidemiology at the University of Freiburg. Dr. Vach has co-authored more than 150 publications in medical journals. His research encompasses biostatistics methodology in the areas of incomplete covariate data, prognostic studies, diagnostic studies, and agreement studies.

"The book can be a very helpful contribution especially for researchers in medical sciences when performing their statistical analyses and trying to interpret the results obtained. … This book provides plenty of practical knowledge about these basic models and also some of their extensions that is often not easy to find from statistical textbooks or from software manuals. The basic methods are well explained and illustrated by numerous practical examples, mainly using simulated datasets."

—Tapio Nummi, *International Statistical Review* (2013), 81