Analyzing Baseball Data with R

1st Edition

Max Marchi, Jim Albert

Chapman and Hall/CRC
Published October 29, 2013
Reference - 334 Pages - 50 B/W Illustrations
ISBN 9781466570221 - CAT# K16473
Series: Chapman & Hall/CRC The R Series




  • Presents an accessible introduction to R using baseball applications
  • Provides detailed instructions for obtaining and managing publicly available baseball datasets
  • Dedicates two chapters to R’s powerful graphical capabilities
  • Gives newcomers to the subject step-by-step guidance on converting baseball events to run values (and ultimately wins)
  • Uses a fielding evaluation example to show how R packages can extend basic R capabilities, such as summarizing data quickly, matching by string similarity, improving graphical display, and reading different file formats
  • Includes further reading and exercises at the end of each chapter
  • Offers R code and datasets on a supporting website

Visit the authors’ blog for advice on using R in sabermetrics research and for keeping up to date with new developments.


With its flexible capabilities and open-source platform, R has become a major tool for analyzing detailed, high-quality baseball data. Analyzing Baseball Data with R provides an introduction to R for sabermetricians, baseball enthusiasts, and students interested in exploring the rich sources of baseball data. It equips readers with the necessary skills and software tools to perform all of the analysis steps, from gathering the datasets and entering them in a convenient format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to R’s data structures and its exploratory and data management capabilities. They also cover the traditional graphics functions in the base package and introduce more sophisticated graphical displays available through the lattice and ggplot2 packages. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, run expectancy, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and fielding measures. Each chapter contains exercises that encourage readers to perform their own analyses using R. All of the datasets and R code used in the text are available online.

This book helps readers answer questions about baseball teams, players, and strategy using large, publicly available datasets. It offers detailed instructions on downloading the datasets and putting them into formats that simplify data exploration and analysis. Through the book’s various examples, readers will learn about modern sabermetrics and be able to conduct their own baseball analyses.
