1st Edition

Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby

By Jules J. Berman
Copyright 2010
    413 Pages
    by Chapman & Hall

    416 Pages
    by Chapman & Hall

    Too often, healthcare workers are led to believe that medical informatics is a complex field that can only be mastered by teams of professional programmers. This is simply not the case. With just a few dozen simple algorithms, easily implemented with open source programming languages, you can fully utilize the medical information contained in clinical and research datasets. The common computational tasks of medical informatics are accessible to anyone willing to learn the basics.

    Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby demonstrates that biomedical professionals with fundamental programming knowledge can master any kind of data collection. Providing you with access to data, nomenclatures, and programming scripts and languages that are all free and publicly available, this book:

    • Describes the structure of data sources used, with instructions for downloading
    • Includes a clearly written explanation of each algorithm
    • Offers equivalent scripts in Perl, Python, and Ruby, for each algorithm
    • Shows how to write short, quickly learned scripts, using a minimal selection of commands
    • Teaches basic informatics methods for retrieving, organizing, merging, and analyzing data sources
    • Provides case studies that detail the kinds of questions that biomedical scientists can ask and answer with public data and an open source programming language

    Requiring no more than a working knowledge of Perl, Python, or Ruby, Methods in Medical Informatics will have you writing powerful programs in just a few minutes. Within its chapters, you will find descriptions of the basic methods and implementations needed to complete many of the projects you will encounter in your biomedical career.

    PART I FUNDAMENTAL ALGORITHMS AND METHODS OF MEDICAL INFORMATICS
    Chapter 1 Parsing and Transforming Text Files
    Peeking into Large Files
    Paging through Large Text Files
    Extracting Lines that Match a Regular Expression
    Changing Every File in a Subdirectory
    Counting the Words in a File
    Making a Word List with Occurrence Tally
    Using Printf Formatting Style
    Chapter 2 Utility Scripts
    Random Numbers
    Converting Non-ASCII to Base64 ASCII
    Creating a Universally Unique Identifier
    Splitting Text into Sentences
    One-Way Hash on a Name
    One-Way Hash on a File
    A Prime Number Generator
    Chapter 3 Viewing and Modifying Images
    Viewing a JPEG Image
    Converting between Image Formats
    Batch Conversions
    Drawing a Graph from List Data
    Drawing an Image Mashup
    Chapter 4 Indexing Text
    ZIPF Distribution of a Text File
    Preparing a Concordance
    Extracting Phrases
    Preparing an Index
    Comparing Texts Using Similarity Scores

    PART II MEDICAL DATA RESOURCES
    Chapter 5 The National Library of Medicine’s Medical Subject Headings (MeSH)
    Determining the Hierarchical Lineage for MeSH Terms
    Creating a MeSH Database
    Reading the MeSH Database
    Creating an SQLite Database for MeSH
    Reading the SQLite MeSH Database
    Chapter 6 The International Classification of Diseases
    Creating the ICD Dictionary
    Building the ICD-O (Oncology) Dictionary
    Chapter 7 SEER: The Cancer Surveillance, Epidemiology, and End Results Program
    Parsing the SEER Data Files
    Finding the Occurrences of All Cancers in the SEER Data Files
    Finding the Age Distributions of the Cancers in the SEER Data Files
    Chapter 8 OMIM: The Online Mendelian Inheritance in Man
    Collecting the OMIM Entry Terms
    Finding Inherited Cancer Conditions
    Chapter 9 PubMed
    Building a Large Text Corpus of Biomedical Information
    Creating a List of Doublets from a PubMed Corpus
    Downloading Gene Synonyms from PubMed
    Downloading Protein Synonyms from PubMed
    Chapter 10 Taxonomy
    Finding a Taxonomic Hierarchy
    Finding the Restricted Classes of Human Infectious Pathogens
    Chapter 11 Developmental Lineage Classification and Taxonomy of Neoplasms
    Building the Doublet Hash
    Scanning the Literature for Candidate Terms
    Adding Terms to the Neoplasm Classification
    Determining the Lineage of Every Neoplasm Concept
    Chapter 12 U.S. Census Files
    Total Population of the United States
    Stratified Distribution for the U.S. Census
    Adjusting for Age
    Chapter 13 Centers for Disease Control and Prevention Mortality Files
    Death Certificate Data
    Obtaining the CDC Data Files
    How Death Certificates Are Represented in Data Records
    Ranking, by Number of Occurrences, Every Condition in the CDC Mortality Files

    PART III PRIMARY TASKS OF MEDICAL INFORMATICS
    Chapter 14 Autocoding
    A Neoplasm Autocoder
    Recoding
    Chapter 15 Text Scrubber for Deidentifying Confidential Text
    Chapter 16 Web Pages and CGI Scripts
    Grabbing Web Pages
    CGI Script for Searching the Neoplasm Classification
    Chapter 17 Image Annotation
    Inserting a Header Comment
    Extracting the Header Comment in a JPEG Image File
    Inserting IPTC Annotations
    Extracting Comment, EXIF, and IPTC Annotations
    Dealing with DICOM
    Finding DICOM Images
    DICOM-to-JPEG Conversion
    Chapter 18 Describing Data with Data, Using XML
    Parsing XML
    Resource Description Framework (RDF)
    Dublin Core Metadata
    Insert an RDF Document into an Image File
    Insert an Image File into an RDF Document
    RDF Schema
    Visualizing an RDF Schema with GraphViz
    Obtaining GraphViz
    Converting a Data Structure to GraphViz

    PART IV MEDICAL DISCOVERY
    Chapter 19 Case Study: Emphysema Rates
    Chapter 20 Case Study: Cancer Occurrence Rates
    Chapter 21 Case Study: Germ Cell Tumor Rates across Ethnicities
    Chapter 22 Case Study: Ranking the Death-Certifying Process, by State
    Chapter 23 Case Study: Data Mashups for Epidemics
    Tally of Coccidioidomycosis Cases by State
    Creating the Map Mashup
    Chapter 24 Case Study: Sickle Cell Rates
    Chapter 25 Case Study: Site-Specific Tumor Biology
    Anatomic Origins of Mesotheliomas
    Mesothelioma Records in the SEER Data Sets
    Graphic Representation
    Chapter 26 Case Study: Bimodal Tumors
    Chapter 27 Case Study: The Age of Occurrence of Precancers
    Epilogue for Healthcare Professionals and Medical Scientists
    Learn One or More Open Source Programming Languages
    Don’t Agonize Over Which Language You Should Choose
    Learn Algorithms
    Unless You Are a Professional Programmer, Relax and Enjoy Being a Newbie
    Do Not Delegate Simple Programming Tasks to Others
    Break Complex Tasks into Simple Methods and Algorithms
    Write Fast Scripts
    Concentrate on the Questions, Not the Answers

    Appendix
    How to Acquire Ruby
    How to Acquire Perl
    How to Acquire Python
    How to Acquire RMagick
    How to Acquire SQLite
    How to Acquire the Public Data Files Used in This Book
    Other Publicly Available Files, Data Sets, and Utilities

    Biography

    Jules Berman, Ph.D., M.D., received two bachelor of science degrees (mathematics and earth sciences) from MIT, a Ph.D. in pathology from Temple University, and an M.D. from the University of Miami School of Medicine. His postdoctoral research was conducted at the National Cancer Institute. His medical residency in pathology was completed at the George Washington University School of Medicine. He became board certified in anatomic pathology and in cytopathology, and served as the chief of Anatomic Pathology, Surgical Pathology and Cytopathology at the Veterans Administration (VA) Medical Center in Baltimore, Maryland.

    While at the Baltimore VA, Dr. Berman held appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he became the program director for pathology informatics in the Cancer Diagnosis Program at the U.S. National Cancer Institute. In 2006, he became president of the Association for Pathology Informatics. Over the course of his career, he has written, as first author, more than 100 publications, including five books in the field of medical informatics. Today, Dr. Berman is a full-time freelance writer.

    As subspecialty board certification in clinical informatics has finally become a reality, Jules Berman’s Methods in Medical Informatics could not be more timely. This well-written and informative text combines Dr. Berman’s expertise in programming with his vast knowledge of publicly available data sets and everyday healthcare programming needs to result in a book which … should become a staple in health informatics education programs as well as a standard addition to the personal libraries of informaticists.
    —Alexis B. Carter, Journal of Pathology Informatics, October 2011

    This book provides an introduction to processing clinical and population health data using rigorous methods and widely available, low cost, but very capable tools. The inclusion of the three leading dynamic programming languages broadens the appeal … bridges the gap from programming instruction to dealing with specialized medical data, making it possible to teach a relevant programming course in a biomedical environment. I would have loved to have a copy of this when I was teaching introductory programming for medical informatics.
    —Professor James H. Harrison, Jr., Director of Clinical Informatics, University of Virginia

    … presents students and professionals in the healthcare field (who have some working knowledge of the open-source programming languages Perl, Python, or Ruby) with instruction for applying basic informatics algorithms to medical data sets. He [the author] provides algorithm scripts for each of the languages, along with step-by-step explanations of the algorithms used for retrieving, organizing, merging, and analyzing such data sources as the National Cancer Institute’s Surveillance Epidemiology and End Results project, the National Library of Medicine’s PubMed service, the mortality records of the US Centers for Disease Control and Prevention, the US Census, and the Online Mendelian Inheritance in Man data set on inherited conditions.
    SciTech Book News, February 2011