Guide to the De-Identification of Personal Health Information

Purchasing Options

Hardback
$69.95
ISBN 9781466579064
Cat# K16841
eBook (VitalSource)
$48.97 (list price $69.95)
ISBN 9781466579088
Cat# KE21882

Features

  • Presents a risk-based methodology for the de-identification of health information
  • Provides a detailed case for why de-identification is necessary and when it should be applied in using and disclosing personal health information
  • Situates and contextualizes our risk-based methodology, and gives a general overview of its steps
  • Explains how to measure re-identification risk and how to apply transformations to data to reduce that risk
  • Focuses on transformations that we know have worked on health information, rather than covering all possible approaches that have been published or proposed

Summary

Offering compelling practical and legal reasons why de-identification should be one of the main approaches to protecting patients’ privacy, the Guide to the De-Identification of Personal Health Information outlines a proven, risk-based methodology for the de-identification of sensitive health information. It situates and contextualizes this risk-based methodology and provides a general overview of its steps.

The book supplies a detailed case for why de-identification is important, as well as best practices to help you pinpoint when it is necessary to apply de-identification in the disclosure of personal health information. It also:

  • Outlines practical methods for de-identification
  • Describes how to measure re-identification risk
  • Explains how to reduce the risk of re-identification
  • Includes proofs and supporting reference material
  • Focuses only on transformations proven to work on health information—rather than covering all possible approaches, whether they work in practice or not

Ranked the top systems and software engineering scholar worldwide by The Journal of Systems and Software in 2003 and 2004, Dr. El Emam is one of only a handful of individuals worldwide qualified to de-identify personal health information for secondary use under the HIPAA Privacy Rule Statistical Standard. In this book, Dr. El Emam explains how we can make health data more accessible while protecting patients’ privacy and complying with current regulations.

Table of Contents

Introduction
Primary and Secondary Purposes
The Spectrum of Risk for Data Access
Managing Risk
What Is De-identification?
Learning Something New
The Status Quo
Safe Harbor-Compliant Data Can Have a High Risk of Re-identification
     The Adversary Knows Who Is in the Data
     The Data Set Is Not a Random Sample from the U.S. Population
     Other Fields Can Be Used for Re-identification
Moving Forward beyond Safe Harbor
Why We Wrote This Book
References

THE CASE FOR DE-IDENTIFYING PERSONAL HEALTH INFORMATION

Permitted Disclosures, Consent, and De-identification of PHI
Common Data Flows
The Need for De-identification

Permitted Uses and Disclosures of Health Information
Uses of Health Information by an Agent
Disclosing Identifiable Data When Permitted
References

The Impact of Consent
Differences between Consenters and Non-Consenters in Clinical Trials
The Impact of Consent on Observational Studies
Impact on Recruitment
Impact on Bias
Impact on Cost
Impact on Time
References

Data Breach Notifications
Benefits and Costs of Breach Notification
Cost of Data Breach Notifications to Custodian
Data Breach Trends
The Value of Health Data
     Financial Information in the Health Records
     Financial Value of Health Records
     Medical Identity Theft
Monetizing Health Records through Extortion
References

Peeping and Snooping
Examples of Peeping
Information and Privacy Commissioners Orders
     Ontario
          HO-002
          HO-010
          HR06-53
          HI-050013-1
     Alberta
          Investigation Report H2011-IR-004
          IPC Investigation (Report Not Available)
     Saskatchewan
          H-2010-001
References

Unplanned but Legitimate Uses and Disclosures
Unplanned Uses by Governments
Data Sharing for Research Purposes
Open Government
Open Data for Research
Unplanned Uses and Disclosures by Commercial Players
Competitions
References

Public Perception and Privacy Protective Behaviors
References

Alternative Methods for Data Access
Remote Access
On-Site Access
Remote Execution
Remote Queries
Secure Computation
Summary
References

UNDERSTANDING DISCLOSURE RISKS

Scope, Terminology, and Definitions
Perspective on De-identification
Original Data and DFs
Unit of Analysis
Types of Data
     Relational Data
     Transactional Data
     Sequential Data
     Trajectory Data
     Graph Data
The Notion of an Adversary
Types of Variables
     Directly Identifying Variables
     Indirectly Identifying Variables (Quasi-identifiers)
     Sensitive Variables
     Other Variables
Equivalence Classes
Aggregate Tables
References

Frequently Asked Questions about De-identification
Can We Have Zero Risk?
Will All DFs Be Re-identified in the Future?
Is a Data Set Identifiable If a Person Can Find His or Her Record?
Can De-identified Data Be Linked to Other Data Sets?
Doesn’t Differential Privacy Already Provide the Answer?

A Methodology for Managing Re-identification Risk
Re-identification Risk versus Re-identification Probability
Re-identification Risk for Public Files
Managing Re-identification Risk
References

Definitions of Identifiability
Definitions
Common Framework for Assessing Identifiability
References

Data Masking Methods
Suppression
Randomization
Irreversible Coding
Reversible Coding
Reversible Coding, HIPAA, and the Common Rule
Other Techniques That Do Not Work Well
     Constraining Names
     Adding Noise
     Character Scrambling
     Character Masking
     Truncation
     Encoding
Summary
References

Theoretical Re-identification Attacks
Background Knowledge of the Adversary
Re-identification Attacks
     Example of a Linking Attack on Relational Data
     Example of a Linking Attack on Transaction Data
     Example of a Linking Attack on Sequential Data
     Example of a Linking Attack on Trajectory Data
     Example of a Linking Attack Based on Semantic Information
References

MEASURING RE-IDENTIFICATION RISK

Measuring the Probability of Re-identification
Simple and Derived Metrics
Simple Risk Metrics: Prosecutor and Journalist Risk
Measuring Prosecutor Risk
Measuring Journalist Risk
Applying the Derived Metrics and Decision Rules
     Relationship among Metrics
References

Measures of Uniqueness
Uniqueness under Prosecutor Risk
Uniqueness under Journalist Risk
Summary
References

Modeling the Threat
Characterizing the Adversaries
Attempting a Re-identification Attack
Plausible Adversaries
An Internal Adversary
An External Adversary
What Are the Quasi-identifiers?
Sources of Data
Correlated and Inferred Variables
References

Choosing Metric Thresholds
Choosing the α Threshold
Choosing the τ and λ Thresholds
Choosing the Threshold for Marketer Risk
Choosing among Thresholds
Thresholds and Incorrect Re-identification
References

PRACTICAL METHODS FOR DE-IDENTIFICATION

De-identification Methods
Generalization
     Principles
     Optimal Lattice Anonymization (OLA)
Tagging
Records to Suppress
Suppression Methods
     Overview
     Fast Local Cell Suppression
Available Tools
Case Study: De-identification of the BORN Registry
     General Parameters
     Attack T1
     Attack T2
     Attack T3
     Summary of Risk Assessment and De-identification
References

Practical Tips
Disclosed Files Should Be Samples
Disclosing Multiple Samples
Creating Cohorts
     Cohort Defined on Quasi-identifiers Only
     Cohort Defined on a Non-Quasi-identifier
     Cohort Defined on Non-Quasi-identifiers and Quasi-identifiers
Impact of Data Quality
Publicizing Re-identification Risk Assessment
Adversary Power
Levels of Adversary Background Knowledge
De-identification in the Context of a Data Warehouse
References

END MATTER

An Analysis of Historical Breach Notification Trends
Methods
     Definitions
     Breach Lists
Original Data Sources
     Sponsors of Lists
     Data Quality
Estimating the Number of Disclosed Breaches
Data Collection
     Interrater Agreement
Results
Discussion
     Summary of Main Results
     Post Hoc Analysis
References

Methods of Attack for Maximum Journalist Risk
Method of Attack 1
Method of Attack 2
Method of Attack 3

How Many Friends Do We Have?
References

Cell Size Precedents
References

The Invasion of Privacy Construct
Dimensions
Sensitivity of the Data
Potential Injury to Consumers
Appropriateness of Consent

General Information on Mitigating Controls
Introduction
Origins of the MCI
Subject of Assessment: Data Requestor versus Data Recipient
Applicability of the MCI
Structure of the MCI
     Scoring
     Which Practices to Rate
Third-Party versus Self-Assessment
Scoring the MCI
Interpreting the MCI Questions
General Justifications for Time Intervals
Practical Requirements
Remediation
Controlling Access, Disclosure, Retention, and Disposition of Personal Data
Safeguarding Personal Data
Ensuring Accountability and Transparency in the Management of Personal Data

Assessing Motives and Capacity
Dimensions
     Motives to Re-identify the Data
     Capacity to Re-identify the Data

Invasion of Privacy
Sensitivity of the Data
Potential Injury to Patients
Appropriateness of Consent

Index

Author Bio(s)

Dr. El Emam holds the Canada Research Chair in Electronic Health Information at the University of Ottawa and is an Associate Professor in the Faculty of Medicine at the university. In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by The Journal of Systems and Software based on his research on measurement and quality evaluation and improvement. He is a senior scientist at the Children’s Hospital of Eastern Ontario Research Institute and leads the multi-disciplinary Electronic Health Information Laboratory (EHIL) team.

Dr. El Emam is one of only a handful of individuals worldwide known to be qualified to de-identify personal health information for secondary use under the HIPAA Privacy Rule Statistical Standard. He is also a world-renowned expert in health information privacy and the head of the Electronic Health Information Laboratory (www.ehealthinformation.ca), which conducts cutting-edge research in this area. He has been de-identifying data since 2004, has a large following, and speaks extensively on this topic.

He has edited two books and written one, and has contributed chapters to a number of others.

Editorial Reviews

By arguing persuasively for the use of de-identification as a privacy-enhancing tool, and setting out a practical methodology for the use of de-identification techniques and re-identification risk measurement tools, this book provides a valuable and much needed resource for all data custodians who use or disclose personal health information for secondary purposes. Doubly enabling, privacy-enhancing tools like these, that embrace privacy by design, will ensure the continued availability of personal health information for valuable secondary purposes that benefit us all.
Dr. Ann Cavoukian, Information and Privacy Commissioner, Ontario, Canada

© 2014 Taylor & Francis Group, LLC. All Rights Reserved.