Praise for previous editions:
"Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way… Advanced undergraduate students in mathematics, statistics, and similar fields as well as students just beginning their graduate studies would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking, ‘I wish I’d read this book at the start of my studies, when I was first learning R!’…This book could be used as the main text for a class on reproducible research …" (The American Statistician)
Reproducible Research with R and RStudio, Third Edition brings together the skills and tools needed for doing and presenting computational research. Using straightforward examples, the book takes you through an entire reproducible research workflow. This practical workflow enables you to gather and analyze data as well as dynamically present results in print and on the web. Supplementary materials and examples are available on the author’s website.
New to the Third Edition
- Updated package recommendations, examples, and URLs, and removed technologies that are no longer in regular use.
- More advanced R Markdown (and less LaTeX) in discussions of markup languages and examples.
- Stronger focus on reproducible working directory tools.
- Updated discussion of cloud storage services and persistent reproducible material citation.
- Added discussion of Jupyter notebooks and reproducible practices in industry.
- Examples of data manipulation with Tidyverse tibbles (in addition to standard data frames) and pivot_longer() and pivot_wider() functions for pivoting data.
Features
- Incorporates the most important advances that have been developed since the previous editions were published
- Describes a complete reproducible research workflow, from data gathering to the presentation of results
- Shows how to automatically generate tables and figures using R
- Includes instructions on formatting a presentation document via markup languages
- Discusses cloud storage and versioning services, particularly GitHub
- Explains how to use Unix-like shell programs for working with large research projects
I Getting Started
1 Introducing Reproducible Research
What Is Reproducible Research?
Why Should Research Be Reproducible?
For science
For you
Who Should Read This Book?
Academic researchers
Students
Instructors
Editors
Private sector researchers
The Tools of Reproducible Research
Why Use R, knitr/R Markdown, and RStudio for Reproducible Research?
Installing the main software
Installing markup languages
GNU Make
Other Tools
Book Overview
How to read this book
Reproduce this book
Contents overview
2 Getting Started with Reproducible Research
The Big Picture: A Workflow for Reproducible Research
Reproducible theory
Practical Tips for Reproducible Research
Document everything!
Everything is a (text) file
All files should be human readable
Explicitly tie your files together
Have a plan to organize, store, and make your files available
3 Getting Started with R, RStudio, and knitr/R Markdown
Using R: The Basics
Objects
Functions
The workspace & history
R history
Global R options
Installing new packages and loading functions
Using RStudio
Using knitr and R Markdown: The basics
What knitr does
What rmarkdown does
File extensions
Code chunks
Global chunk options
knitr package options
Hooks
knitr, R Markdown, & RStudio
knitr & R
R Markdown and R
4 Getting Started with File Management
File Paths & Naming Conventions
Root directories
Sub-directories & parent directories
Working directories
Absolute vs relative paths
Spaces in directory & file names
Organizing Your Research Project
Organizing Research with RStudio Projects
R File Manipulation Functions
Unix-like Shell Commands for File Management
File Navigation in RStudio
II Data Gathering and Storage
5 Storing, Collaborating, Accessing Files, and Versioning
Saving Data in Reproducible Formats
Storing Your Files in the Cloud: Dropbox
Storage
Accessing data
Collaboration
Version control
Storing Your Files in the Cloud: GitHub
Setting up GitHub: Basic
Version control with Git
Remote storage on GitHub
Accessing on GitHub
Summing up the GitHub workflow
RStudio & GitHub
Setting up Git/GitHub with Projects
Using Git in RStudio Projects
6 Gathering Data with R
Organize Your Data Gathering: Makefiles
R Make-like files
GNU Make
Importing Locally Stored Data Sets
Importing Data Sets from the Internet
Data from non-secure (http) URLs
Data from secure (https) URLs
Compressed data stored online
Data APIs & feeds
Advanced Automatic Data Gathering: Web Scraping
7 Preparing Data for Analysis
Cleaning Data for Merging
Get a handle on your data
Reshaping data
Renaming variables
Ordering data
Subsetting data
Recoding string/numeric variables
Creating new variables from old
Changing variable types
Merging Data Sets
Binding
Merging data frames
Duplicate columns
8 Statistical Modeling and knitr/R Markdown
Incorporating Analyses into the Markup
Full code chunks
Showing code & results inline
Dynamically including non-R code in code chunks
Dynamically Including Modular Analysis Files
Source from a local file
Source from a URL
Reproducibly Random: set.seed()
Computationally Intensive Analyses
9 Showing Results with Tables
Basic knitr Syntax for Tables
Table Basics
Tables in LaTeX
Tables in Markdown/HTML
Creating Tables from Supported Class R Objects
kable for Markdown and LaTeX
xtable for LaTeX and HTML
Fitting Large Tables in LaTeX
xtable with non-supported class objects
Creating variable description documents with xtable
10 Showing Results with Figures
Including Non-knitted Graphics
Including graphics in LaTeX
Including graphics in Markdown/HTML
Non-knitted graphics with knitr/rmarkdown
Basic knitr/rmarkdown Figure Options
Chunk options
Global options
Knitting R’s Default Graphics
Including ggplot Graphics
Showing regression results with caterpillar plots
JavaScript Graphs with googleVis
Basic googleVis figures
Including googleVis in knitted documents
JavaScript Graphs with htmlwidgets-based packages
11 Presenting with LaTeX
The Basics
Getting started with LaTeX editors
Basic LaTeX command syntax
The LaTeX preamble & body
Headings
Paragraphs & spacing
Horizontal lines
Text formatting
Math
Lists
Footnotes
Cross-references
Bibliographies with BibTeX
The bib file
Including citations in LaTeX documents
Generating a BibTeX file of R package citations
Presentations with LaTeX Beamer
Beamer basics
knitr with LaTeX slideshows
12 Presenting in a Variety of Formats with R Markdown
The Basics
Getting started with Markdown editors
Preamble and document structure
Headings
Horizontal lines
Paragraphs and new lines
Italics and bold
Links
Lists
Math with MathJax
Further Customizability with rmarkdown
CSS style files and Markdown
Slideshows with Markdown, R Markdown, and HTML
HTML Slideshows with rmarkdown
LaTeX Beamer Slideshows with rmarkdown
Slideshows with Markdown and RStudio’s R Presentations
Publishing HTML Documents Created with R Markdown
Further information on R Markdown
13 Conclusion
Citing Reproducible Research
Licensing Your Reproducible Research
Sharing Your Code in Packages
Project Development: Public or Private?
Is it Possible to Completely Future-Proof Your Research?
Biography
Christopher Gandrud is Head of Economics and Experimentation at Zalando SE, where he leads teams of social data scientists and software engineers building large-scale automated decision-making systems. He was previously a research fellow at the Institute for Quantitative Social Science, Harvard University, developing statistical software for the social and physical sciences. He has published many articles in peer-reviewed journals, including the Journal of Common Market Studies, Review of International Political Economy, Political Science Research and Methods, Journal of Statistical Software, and International Political Science Review. He earned a PhD in quantitative political science from the London School of Economics.
I recommend this book for students studying statistical sciences, individuals beginning their research career, and advanced researchers looking to up their reproducibility game. I am thrilled to have this resource for my own lab and intend on having my students follow the recommendations within closely.
- Lucy D’Agostino McGowan, Biometrics, 2020, Volume 76, Issue 4
In summary, I found this book to be a very good introduction to R and reproducible research, one that I can certainly recommend.
- Anikó Lovik, International Society for Clinical Biostatistics, June 2021, Number 71