1st Edition

Data Management Using Stata: A Practical Handbook

By Michael N. Mitchell
Copyright 2010
    387 Pages
    by Stata Press

    Using simple language and illustrative examples, this book comprehensively covers data management tasks that bridge the gap between raw data and statistical analysis. Rather than focus on clusters of commands, the author takes a modular approach that enables readers to quickly identify and implement the necessary task without having to access background information first. Each section in the chapters presents a self-contained lesson that illustrates a particular data management task via examples, such as creating data variables and automating error checking. The text also discusses common pitfalls and how to avoid them and provides strategic data management advice. Ideal for both beginning statisticians and experienced users, this handy book helps readers solve problems and learn comprehensive data management skills.

    Introduction
    Using this book
    Overview of this book
    Listing observations in this book

    Reading and Writing Datasets
    Introduction
    Reading Stata datasets
    Saving Stata datasets
    Reading comma-separated and tab-separated files
    Reading space-separated files
    Reading fixed-column files
    Reading fixed-column files with multiple lines of raw data per observation
    Reading SAS XPORT files
    Common errors reading files
    Entering data directly into the Stata Data Editor
    Saving comma-separated and tab-separated files
    Saving space-separated files
    Saving SAS XPORT files

    Data Cleaning
    Introduction
    Double data entry
    Checking individual variables
    Checking categorical by categorical variables
    Checking categorical by continuous variables
    Checking continuous by continuous variables
    Correcting errors in data
    Identifying duplicates
    Final thoughts on data cleaning

    Labeling Datasets
    Introduction
    Describing datasets
    Labeling variables
    Labeling values
    Labeling utilities
    Labeling variables and values in different languages
    Adding comments to your dataset using notes
    Formatting the display of variables
    Changing the order of variables in a dataset

    Creating Variables
    Introduction
    Creating and changing variables
    Numeric expressions and functions
    String expressions and functions
    Recoding
    Coding missing values
    Dummy variables
    Date variables
    Date-and-time variables
    Computations across variables
    Computations across observations
    More examples using the egen command
    Converting string variables to numeric variables
    Converting numeric variables to string variables
    Renaming and ordering variables

    Combining Datasets
    Introduction
    Appending: Appending datasets
    Appending: Problems
    Merging: One-to-one match-merging
    Merging: One-to-many match-merging
    Merging: Merging multiple datasets
    Merging: Update merges
    Merging: Additional options when merging datasets
    Merging: Problems merging datasets
    Joining datasets
    Crossing datasets

    Processing Observations across Subgroups
    Introduction
    Obtaining separate results for subgroups
    Computing values separately by subgroups
    Computing values within subgroups: Subscripting observations
    Computing values within subgroups: Computations across observations
    Computing values within subgroups: Running sums
    Computing values within subgroups: More examples
    Comparing the by and tsset commands

    Changing the Shape of Your Data
    Introduction
    Wide and long datasets
    Introduction to reshaping long to wide
    Reshaping long to wide: Problems
    Introduction to reshaping wide to long
    Reshaping wide to long: Problems
    Multilevel datasets
    Collapsing datasets

    Programming for Data Management
    Introduction
    Tips on long-term goals in data management
    Executing do-files and making log files
    Automating data checking
    Combining do-files
    Introducing Stata macros
    Manipulating Stata macros
    Repeating commands by looping over variables
    Repeating commands by looping over numbers
    Repeating commands by looping over anything
    Accessing results saved from Stata commands
    Saving results of estimation commands as data
    Writing Stata programs

    Additional Resources
    Online resources for this book
    Finding and installing additional programs
    More online resources

    Appendix: Common elements

    Index

    Biography

    Michael N. Mitchell is a senior statistician in health services research. For 12 years, he worked in the Statistical Consulting Group of the UCLA Academic Technology Services.

    The author uses a "learning by example" approach in the book. Overall this works well …
    —Morteza Marzjarani, The American Statistician, November 2011