Access to high-quality data is essential to knowledge-based decision making, but data in its raw form often contains sensitive information about individuals. The methods and tools of privacy-preserving data publishing address this problem by enabling the publication of useful information while protecting data privacy. Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques presents state-of-the-art information sharing and data integration methods that take into account privacy and data mining requirements.
The first part of the book discusses the fundamentals of the field. In the second part, the authors present anonymization methods for preserving information utility for specific data mining tasks. The third part examines the privacy issues, privacy models, and anonymization methods for realistic and challenging data publishing scenarios. While the first three parts focus on anonymizing relational data, the last part studies the privacy threats, privacy models, and anonymization methods for complex data, including transaction, trajectory, social network, and textual data.
This book explores not only privacy and information utility issues but also efficiency and scalability challenges. In many chapters, the authors highlight efficient and scalable methods and provide an analytical discussion to compare the strengths and weaknesses of different solutions.
Data Collection and Data Publishing
What Is Privacy-Preserving Data Publishing?
Related Research Areas
Attack Models and Privacy Models
Record Linkage Model
Attribute Linkage Model
Table Linkage Model
Modeling Adversary’s Background Knowledge
Generalization and Suppression
Anatomization and Permutation
General Purpose Metrics
Special Purpose Metrics
Algorithms for the Record Linkage Model
Algorithms for the Attribute Linkage Model
Algorithms for the Table Linkage Model
Algorithms for the Probabilistic Attack
Attacks on Anonymous Data
ANONYMIZATION FOR DATA MINING
Anonymization for Classification Analysis
Anonymization Problems for Red Cross BTS
High-Dimensional Top-Down Specialization (HDTDS)
Summary and Lesson Learned
Anonymization for Cluster Analysis
Anonymization Framework for Cluster Analysis
Dimensionality Reduction-Based Transformation
EXTENDED DATA PUBLISHING SCENARIOS
Multiple Views Publishing
Checking Violations of k-Anonymity on Multiple Views
Checking Violations with Marginals
Anonymizing Sequential Releases with New Attributes
Monotonicity of Privacy
Anonymization Algorithm for Sequential Releases
Anonymizing Incrementally Updated Data Records
Continuous Data Publishing
Dynamic Data Republishing
Collaborative Anonymization for Vertically Partitioned Data
Privacy-Preserving Data Mashup
Summary and Lesson Learned
Collaborative Anonymization for Horizontally Partitioned Data
Overview of the Solution
ANONYMIZING COMPLEX DATA
Anonymizing Transaction Data
Band Matrix Method
Anonymizing Query Logs
Anonymizing Trajectory Data
Other Spatio-Temporal Anonymization Methods
Anonymizing Social Networks
General Privacy-Preserving Strategies
Anonymization Methods for Social Networks
Sanitizing Textual Data
Health Information DE-identification (HIDE)
Other Privacy-Preserving Techniques and Future Trends
Interactive Query Model
Privacy Threats Caused by Data Mining Results
Privacy-Preserving Distributed Data Mining
Benjamin C. M. Fung is an assistant professor in the Concordia Institute for Information Systems Engineering at Concordia University in Montreal, Quebec. Dr. Fung is also a research scientist and the treasurer of the National Cyber-Forensics and Training Alliance Canada (NCFTA Canada).
Ke Wang is a professor in the School of Computing Science at Simon Fraser University in Burnaby, British Columbia.
Ada Wai-Chee Fu is an associate professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong.
Philip S. Yu is a professor in the Department of Computer Science and the Wexler Chair in Information and Technology at the University of Illinois at Chicago.