Privacy Preserving Serial Data Publishing By Role Composition Yingyi Bu 1, Ada Wai-Chee Fu 1, Raymond Chi-Wing Wong 2, Lei Chen 2, Jiuyong Li 3 The Chinese.

Presentation transcript:

Privacy Preserving Serial Data Publishing By Role Composition
Yingyi Bu (1), Ada Wai-Chee Fu (1), Raymond Chi-Wing Wong (2), Lei Chen (2), Jiuyong Li (3)
(1) The Chinese University of Hong Kong
(2) The Hong Kong University of Science and Technology
(3) University of South Australia
Prepared and presented by Raymond Chi-Wing Wong

Outline
1. Sequential Releases
2. Existing Privacy Models
   - m-invariance
   - Privacy breaches
3. Our Proposed Privacy Model
   - l-scarcity
4. Experiments
5. Conclusion

1. Sequential Releases
Time = 1: the hospital releases its medical data set to the public.

Medical Data (hospital)
  Name     PID  Disease
  Raymond  p1   Flu
  Peter    p2   HIV
  Mary     p3   Fever
  Alice    p4   HIV
  Bob      p5   Flu
  John     p6   Fever

The Published Data table derived from this data satisfies some privacy requirements (e.g., m-invariance).

1. Sequential Releases (continued)
Between releases, the hospital's medical data changes through insertions, deletions and updates. At Time = 2 and again at Time = 3, the hospital releases a new Published Data table to the public, and each published table satisfies some privacy requirements (e.g., m-invariance).

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

2. Existing Privacy Models
1. Byun et al., "Secure Anonymization for Incremental Datasets", Secure Data Management
   - Considers insertions only
   - Does not consider deletions and updates
2. Fung et al., "Anonymity for Continuous Data Publishing", EDBT
   - Considers insertions only
   - Does not consider deletions and updates
3. Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007
   - Considers insertions and deletions only
   - Does not consider updates

Our proposed privacy model (l-scarcity):
   - Considers insertions, deletions and updates together
   - Updates cannot simply be regarded as "a deletion and then an insertion" when privacy is considered.

2. Existing Privacy Models
Sensitive diseases fall into two kinds:
- Transient diseases (curable), e.g., flu, fever. If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table.
- Permanent diseases (incurable), e.g., HIV. If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table in which s/he appears.

We are the first to study these two kinds of sensitive values.
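The permanent/transient distinction above can be sketched as a consistency check over successive snapshots of the underlying records. This is only an illustration: the function name `permanent_violations`, the `PERMANENT` set, and the sample snapshots are hypothetical and not taken from the slides, which give only flu and fever as transient examples and HIV as a permanent example.

```python
# Hypothetical classification, based only on the examples named on the slide.
PERMANENT = {"HIV"}

def permanent_violations(snapshots):
    """Return (pid, time) pairs where an individual who held a permanent
    value earlier appears at `time` with a different value.

    snapshots: list of dicts mapping PID -> disease, ordered by time.
    Individuals absent from a snapshot (deleted) are simply skipped.
    """
    held = {}          # pid -> permanent value already observed
    violations = []
    for t, snap in enumerate(snapshots, start=1):
        for pid, disease in snap.items():
            if pid in held and disease != held[pid]:
                violations.append((pid, t))
            elif disease in PERMANENT:
                held[pid] = disease
    return violations

t1 = {"p1": "Flu", "p2": "HIV", "p3": "Fever"}
t2 = {"p1": "Fever", "p2": "HIV"}   # p3 deleted, p1 updated (transient value)
t3 = {"p1": "Fever", "p2": "Flu"}   # p2 drops a permanent value: inconsistent

print(permanent_violations([t1, t2]))      # []
print(permanent_violations([t1, t2, t3]))  # [('p2', 3)]
```

A transient update such as p1 moving from Flu to Fever passes the check, while any change away from a permanent value is flagged.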

2. Existing Privacy Models (comparison)
The three existing models (Byun et al., Secure Data Management; Fung et al., EDBT; Xiao et al., SIGMOD 2007):
   - Do not consider insertions, deletions and updates together
   - Do not consider transient/permanent values

Our proposed privacy model (l-scarcity):
   - Considers insertions, deletions and updates together
   - Also considers transient/permanent values

Contributions: We consider a more realistic setting of sequential releases.
   - Insertions, deletions and updates
   - Transient/permanent values
We cannot simply adapt the existing privacy models to this realistic setting.

2. Existing Privacy Models (problem statements)
Problem (m-invariance; Xiao et al., SIGMOD 2007): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.

Problem (l-scarcity; our proposed model, which considers insertions, deletions and updates together): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l.

Voter Registration List (public)
  Columns: Name, PID, Age, Zip Code
  Individuals: Raymond (p1), Peter (p2), Mary (p3), Alice (p4), Bob (p5), John (p6), ..., David (p|RL|)

Medical Data (hospital)
  Name     PID  Disease
  Raymond  p1   Flu
  Peter    p2   HIV
  Mary     p3   Fever
  Alice    p4   HIV
  Bob      p5   Flu
  John     p6   Fever

Medical Data + Some Useful Attributes
  The hospital extends the medical data with each patient's Age and Zip Code (columns: Name, PID, Age, Zip Code, Disease). Before the data set is released to the public, the Name and PID columns are removed, and Age and Zip Code are generalized:

  Age      Zip Code   Disease
  [21,23]  [12k,17k]  Flu
  [21,23]  [12k,17k]  HIV
  [21,23]  [12k,17k]  Fever
  [20,26]  [18k,29k]  HIV
  [20,26]  [18k,29k]  Flu
  [20,26]  [18k,29k]  Fever

3-diversity: Each individual is linked to "HIV" with probability at most 1/3 in THIS PUBLISHED TABLE.
- 3-diversity only focuses on ONE-TIME publishing.
- 3-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 3-diversity.
Idea: Each individual is linked to "HIV" with probability at most 1/3 with respect to MULTIPLE PUBLISHED TABLES.
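The 1/3 bound for the generalized table above can be checked mechanically. The sketch below assumes the frequency-based reading of l-diversity used on the slide (within each group of identical generalized values, no disease accounts for more than 1/l of the tuples); the function names `max_link_probability` and `satisfies_l_diversity` are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def max_link_probability(rows):
    """rows: list of (age_range, zip_range, disease) tuples.
    Returns the largest share that any single disease has within any
    group of rows sharing the same generalized Age and Zip Code."""
    groups = defaultdict(list)
    for age, zip_code, disease in rows:
        groups[(age, zip_code)].append(disease)
    worst = Fraction(0)
    for diseases in groups.values():
        top_count = Counter(diseases).most_common(1)[0][1]
        worst = max(worst, Fraction(top_count, len(diseases)))
    return worst

def satisfies_l_diversity(rows, l):
    """Frequency-based check (an assumption about the variant intended)."""
    return max_link_probability(rows) <= Fraction(1, l)

# The generalized table published at Time = 1 in the slides.
published = [
    ("[21,23]", "[12k,17k]", "Flu"),
    ("[21,23]", "[12k,17k]", "HIV"),
    ("[21,23]", "[12k,17k]", "Fever"),
    ("[20,26]", "[18k,29k]", "HIV"),
    ("[20,26]", "[18k,29k]", "Flu"),
    ("[20,26]", "[18k,29k]", "Fever"),
]

print(max_link_probability(published))      # 1/3
print(satisfies_l_diversity(published, 3))  # True
```

The check looks at one table in isolation, which is exactly the limitation the slide points out: it says nothing about what an adversary learns by combining several releases.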

3-invariance, Time = 1
The table released to the public at Time = 1, with the individuals that fall into each group:

  Age      Zip Code   Disease   Individuals in group
  [21,23]  [12k,17k]  Flu       p1, p2, p3
  [21,23]  [12k,17k]  HIV       p1, p2, p3
  [21,23]  [12k,17k]  Fever     p1, p2, p3
  [20,26]  [18k,29k]  HIV       p4, p5, p6
  [20,26]  [18k,29k]  Flu       p4, p5, p6
  [20,26]  [18k,29k]  Fever     p4, p5, p6

The signature of an individual is the set of diseases in his/her group:

  PID  Signature
  p1   {Flu, HIV, Fever}
  p2   {Flu, HIV, Fever}
  p3   {Flu, HIV, Fever}
  p4   {Flu, HIV, Fever}
  p5   {Flu, HIV, Fever}
  p6   {Flu, HIV, Fever}

3-invariance, Time = 2
The medical data has been updated (Mary now has Flu and Bob now has Fever):

  Name     PID  Disease
  Raymond  p1   Flu
  Peter    p2   HIV
  Mary     p3   Flu
  Alice    p4   HIV
  Bob      p5   Fever
  John     p6   Fever

The table released to the public at Time = 2, with the individuals that fall into each group:

  Age      Zip Code   Disease   Individuals in group
  [20,22]  [12k,29k]  HIV       p2, p3, p6
  [20,22]  [12k,29k]  Flu       p2, p3, p6
  [20,22]  [12k,29k]  Fever     p2, p3, p6
  [23,26]  [16k,25k]  Flu       p1, p4, p5
  [23,26]  [16k,25k]  HIV       p1, p4, p5
  [23,26]  [16k,25k]  Fever     p1, p4, p5

Every individual p1, ..., p6 has the signature {Flu, HIV, Fever} at Time = 1 and again at Time = 2.

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature.
Idea of 3-invariance: Each individual is linked to the SAME signature in each published table.
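The signature idea can be sketched in code using the Time = 1 and Time = 2 releases from the slides. This is a minimal sketch, assuming the usual reading of m-invariance (each group holds at least m tuples with pairwise distinct sensitive values, and every individual keeps the same signature across the releases in which s/he appears); the names `signatures` and `is_m_invariant` are hypothetical, and the groupings are passed as sets of PIDs purely for illustration.

```python
def signatures(groups, truth):
    """groups: list of sets of PIDs published in the same group.
    truth: dict PID -> disease at that time.
    Returns dict PID -> frozenset of diseases in that PID's group."""
    sig = {}
    for group in groups:
        values = frozenset(truth[pid] for pid in group)
        for pid in group:
            sig[pid] = values
    return sig

def is_m_invariant(releases, m):
    """releases: list of (groups, truth) pairs ordered by time."""
    seen = {}  # PID -> signature from the first release containing the PID
    for groups, truth in releases:
        for group in groups:
            values = [truth[pid] for pid in group]
            # Each group needs at least m tuples, all with distinct values.
            if len(values) < m or len(set(values)) != len(values):
                return False
        for pid, sig in signatures(groups, truth).items():
            # The signature must match the one recorded earlier, if any.
            if seen.setdefault(pid, sig) != sig:
                return False
    return True

truth1 = {"p1": "Flu", "p2": "HIV", "p3": "Fever",
          "p4": "HIV", "p5": "Flu", "p6": "Fever"}
groups1 = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]

truth2 = {"p1": "Flu", "p2": "HIV", "p3": "Flu",
          "p4": "HIV", "p5": "Fever", "p6": "Fever"}
groups2 = [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}]

print(is_m_invariant([(groups1, truth1), (groups2, truth2)], 3))  # True
# Reusing the Time = 1 grouping at Time = 2 puts two Flu tuples together.
print(is_m_invariant([(groups1, truth1), (groups1, truth2)], 3))  # False
```

The second call shows why the grouping has to change at Time = 2: after the updates to Mary and Bob, the old groups no longer contain three distinct diseases each.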

NamePIDAgeZip Code Raymondp1p Peterp2p Maryp3p Alicep4p Bobp5p Johnp6p … …… Davidp |RL| Public Hospital Voter Registration List NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever Medical Data NamePIDAgeZip CodeDisease Raymondp1p Flu Peterp2p HIV Maryp3p Flu Alicep4p HIV Bobp5p Fever Johnp6p Fever Medical Data + Some Useful Attributes AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Release the data set to public 3-invarianceTime = 3 Time = 1 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever Time = 2 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 PIDSignature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6

NamePIDAgeZip Code Raymondp1p Peterp2p Maryp3p Alicep4p Bobp5p Johnp6p … …… Davidp |RL| Public Hospital Voter Registration List NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever Medical Data NamePIDAgeZip CodeDisease Raymondp1p Flu Peterp2p HIV Maryp3p Flu Alicep4p HIV Bobp5p Fever Johnp6p Fever Medical Data + Some Useful Attributes AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Release the data set to public 3-invarianceTime = 3 Time = 1 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever Time = 2 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 PIDSignature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever Medical Data + Some Useful Attributes This table satisfies 3-invariance. This is because each individual is linked to the SAME signature. p2p2 p3p3 p5p5 p1p1 p4p4 p6p6

NamePIDAgeZip Code Raymondp1p Peterp2p Maryp3p Alicep4p Bobp5p Johnp6p … …… Davidp |RL| Public Hospital Voter Registration List NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever Medical Data NamePIDAgeZip CodeDisease Raymondp1p Flu Peterp2p HIV Maryp3p Flu Alicep4p HIV Bobp5p Fever Johnp6p Fever Medical Data + Some Useful Attributes AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Release the data set to public 3-invarianceTime = 3 Time = 1 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever Time = 2 PI D Signature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 PIDSignature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever Time = 3

3-invariance: p1, p2, p3, p4, p5, p6 each have signature {Flu, HIV, Fever} at Time = 1, 2 and 3.
Published tables, with the individuals in each group:
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever

Knowledge 1: the published tables (3-invariance; every signature is {Flu, HIV, Fever})
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever
Knowledge 2: I know all voter registration lists (Name, PID, Age, Zip Code for Raymond p1, Peter p2, Mary p3, Alice p4, Bob p5, John p6, …, David p|RL|).
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.
I can deduce that p1 and p6 cannot be linked to HIV.

Knowledge 1: the published tables
Time = 1: groups {p1, p2, p3} and {p4, p5, p6}
Time = 2: groups {p2, p3, p6} and {p1, p4, p5}
Time = 3: groups {p2, p3, p5} and {p1, p4, p6}
Each group publishes the diseases Flu, HIV, Fever.
Knowledge 2: I know all voter registration lists.
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.
I can deduce that p1 and p6 cannot be linked to HIV.
Proof by contradiction. Suppose p1 is linked to HIV.
Time = 1: the group {p1, p2, p3} holds one HIV, so p2 and p3 are not linked to HIV.
Time = 2: the group {p2, p3, p6} holds one HIV, so p6 must be linked to HIV. With only two HIVs, p4 and p5 are not linked to HIV.
Time = 3: the group {p2, p3, p5} holds one HIV, but none of p2, p3 and p5 is linked to HIV.
Contradiction! p1 CANNOT be linked to HIV.

Knowledge 1: the published tables
Time = 1: groups {p1, p2, p3} and {p4, p5, p6}
Time = 2: groups {p2, p3, p6} and {p1, p4, p5}
Time = 3: groups {p2, p3, p5} and {p1, p4, p6}
Each group publishes the diseases Flu, HIV, Fever.
Knowledge 2: I know all voter registration lists.
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.
I can deduce that p1 and p6 cannot be linked to HIV.
Proof by contradiction. Suppose p6 is linked to HIV.
Time = 1: the group {p4, p5, p6} holds one HIV, so p4 and p5 are not linked to HIV.
Time = 2: the group {p2, p3, p6} holds one HIV, so p2 and p3 are not linked to HIV. The group {p1, p4, p5} holds one HIV, so p1 must be linked to HIV.
Time = 3: the group {p2, p3, p5} holds one HIV, but none of p2, p3 and p5 is linked to HIV.
Contradiction! p6 CANNOT be linked to HIV.

Knowledge 1: the published tables (3-invariance)
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever
Knowledge 2: I know all voter registration lists.
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.
I can deduce that p1 and p6 cannot be linked to HIV.
The Time = 3 group {p1, p4, p6} holds one HIV, so I can deduce that p4 MUST be linked to HIV.
Privacy breaches! Why?
Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following: the probability that an individual is linked to a sensitive value wrt all published tables at any time <= t is at most 1/m.
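The adversary's deduction can be replayed by brute force. The sketch below is illustrative only and is not from the slides: the function name `consistent_worlds` is hypothetical, and it assumes exactly what Knowledge 1 to 4 state (groups reconstructed from the voter list, HIV permanent, two HIV-holders, one HIV per published group).

```python
from itertools import combinations

# Groups the adversary reconstructs for each release (3-invariance tables).
releases = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}],  # Time = 3
]
people = sorted(set().union(*releases[0]))


def consistent_worlds(releases, people, n_holders=2):
    """Return every set of HIV-holders that puts exactly one HIV in each group.

    Assumption: HIV is permanent, so the same holders apply to every release.
    """
    worlds = []
    for holders in combinations(people, n_holders):
        holder_set = set(holders)
        if all(len(holder_set & group) == 1
               for groups in releases for group in groups):
            worlds.append(holder_set)
    return worlds


worlds = consistent_worlds(releases, people)
# Individuals who are HIV-holders in every consistent world are exposed.
forced = set.intersection(*worlds)
print("consistent worlds:", [sorted(w) for w in worlds])
print("forced HIV-holders:", sorted(forced))
```

Under these assumptions only two assignments survive, and p4 appears in both, which matches the breach on the slide.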

Knowledge 1: the published tables
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever
I can deduce that p4 MUST be linked to HIV. Privacy breaches! Why?
Original Medical Data (Time = 1)
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever
p2 is an HIV-holder. p1 is an HIV-decoy. p3 is an HIV-decoy.
HIV-decoys (i.e., p1 and p3) are used to reduce the strong linkage between p2 and HIV.
p4 is an HIV-holder. p5 is an HIV-decoy. p6 is an HIV-decoy.
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2                     p1                    p3
p4                     p6                    p5

Knowledge 1: the published tables
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever
I can deduce that p4 MUST be linked to HIV. Privacy breaches! Why?
Original Medical Data (Time = 1): Raymond p1 Flu; Peter p2 HIV; Mary p3 Fever; Alice p4 HIV; Bob p5 Flu; John p6 Fever
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2                     p1                    p3
p4                     p6                    p5
p1 and p6 are in the same cohort. Besides, they are in the same group of the published table at Time = 3.
Idea: This kind of grouping can lead to privacy breaches. We can protect privacy by avoiding this kind of grouping.
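A small check makes the "same cohort, same group" condition concrete. This is a sketch under stated assumptions, not the authors' code: `cohort_of` and `same_cohort_pairs` are hypothetical names, and the cohort numbers are read off the cohort table on the slide.

```python
# Cohort membership as shown on the slide (1 = HIV-holders, 2 and 3 = HIV-decoys).
cohort_of = {"p2": 1, "p4": 1, "p1": 2, "p6": 2, "p3": 3, "p5": 3}


def same_cohort_pairs(group, cohort_of):
    """Return the pairs of group members that come from the same cohort."""
    members = sorted(group)
    return [
        (a, b)
        for i, a in enumerate(members)
        for b in members[i + 1:]
        if cohort_of[a] == cohort_of[b]
    ]


# Time = 3 groups published under 3-invariance.
time3_groups = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]
for group in time3_groups:
    print(sorted(group), "->", same_cohort_pairs(group, cohort_of))
```

A group passes the check when the returned list is empty, i.e. when it takes at most one member from each cohort.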

Knowledge 1: the published tables
Time = 1
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
Time = 2
  Group {p2, p3, p6}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p5}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, Fever
Time = 3 under 3-invariance
  Group {p2, p3, p5}: Age [21,25], Zip Code [12k,16k]; Diseases: HIV, Flu, Fever
  Group {p1, p4, p6}: Age [20,26], Zip Code [16k,29k]; Diseases: Flu, HIV, Fever
Time = 3 under 3-scarcity
  Group {p1, p2, p5}: Age [22,25], Zip Code [15k,17k]; Diseases: HIV, Flu, Fever
  Group {p3, p4, p6}: Age [20,26], Zip Code [12k,29k]; Diseases: Flu, HIV, Fever
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2                     p1                    p3
p4                     p6                    p5
With the 3-scarcity table at Time = 3, the probability that an individual is linked to a sensitive value wrt these three tables is at most 1/3.
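The 1/3 bound can be checked for HIV on this example. The sketch below is an illustration, not the paper's definition of linkage probability: it assumes the adversary treats every consistent assignment of the two HIV-holders as equally likely, and `hiv_link_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

# Groups per release, using the 3-scarcity table at Time = 3.
releases = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}],  # Time = 3 (3-scarcity)
]


def hiv_link_probability(releases, n_holders=2):
    """Fraction of consistent worlds in which each individual holds HIV.

    Assumptions: HIV is permanent, each group contains exactly one HIV,
    and all consistent worlds are equally likely.
    """
    people = sorted(set().union(*releases[0]))
    worlds = [
        set(holders)
        for holders in combinations(people, n_holders)
        if all(len(set(holders) & group) == 1
               for groups in releases for group in groups)
    ]
    return {
        person: Fraction(sum(person in world for world in worlds), len(worlds))
        for person in people
    }


probs = hiv_link_probability(releases)
for person, prob in probs.items():
    print(person, prob)
```

Three worlds remain consistent here ({p1, p6}, {p2, p4}, {p3, p5}), so no individual is linked to HIV with probability above 1/3.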

3. Algorithm
We propose an algorithm which follows this principle: whenever we form one group, choose one member from each cohort.

3. Guarantee
Theorem: Our proposed algorithm can generate a table which satisfies the following: the probability that an individual is linked to a sensitive value wrt all published tables at any time <= t is at most 1/l (i.e., l-scarcity).

4. Experiments
Real Data Set (CADRMP): mps/medeff/databasdon/index_e.html
Real hospital database
Patient Information (Voter Registration List): 40,478 tuples
Medical Record: 105,420 tuples
Each patient can be linked to multiple diseases

4. Experiments
Studies:
Privacy breaches of an existing model (m-invariance)
Performance of our proposed algorithm

4.1 Privacy Breaches of an existing model (m-invariance)
Breach Rate: the proportion of tuples with privacy breaches

4.2 Performance of our proposed algorithm
Measurements: Computation Cost; Relative Average Error
Variations: Parameter l (used in l-scarcity); No. of published tables

4.2 Performance of our proposed algorithm

5. Conclusion
Sequential Releases: QID values can be updated; sensitive values can be updated
Sensitive Values: Permanent; Transient
Identify the insufficiency of existing models
Algorithm
Experiments

Q&A

Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
CI(p2)                 CI(p1)                CI(p3)
CI(p4)                 CI(p6)                CI(p5)
p6 is replaced with a container CI(p6) where the QID attributes of this container (Age, Zip Code) cover p6's QID attributes.
e.g., p6's QID attributes: (Age, Zip Code) = (20, 29000); CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
Since the container has broader ranges of QID values, this container covers some ADDITIONAL individuals (HIV-buddies) which are neither HIV-holders nor HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table) (but we can find these individuals in the Voter Registration List).
e.g., Container CI(p6), (Age, Zip Code) = ([20,26], [29k,33k]):
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
e.g., p4 (HIV-holder) is absent in this current table. If other HIV-decoys are still present, the adversary can figure out that p4 is an HIV-holder.
Case 1: HIV-decoy. Case 2: HIV-holder.
Idea: We switch the role of l-1 HIV-decoys from PRESENT individuals to ABSENT individuals (HIV-buddies).
Since one HIV-holder and l-1 HIV-decoys become ABSENT together, the adversary cannot figure out who is the REAL HIV-holder.

3. Algorithm
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2, CI(p2)             p1, CI(p1)            p3, CI(p3)
p4, CI(p4)             p6, CI(p6)            p5, CI(p5)
We have just discussed how to update the role of each individual (i.e., decoy/holder) according to different scenarios when there is new medical raw data.
Algorithm:
For the first medical raw table:
  Use some existing privacy algorithm (e.g., l-diversity) to generate a temporary table T'
  Find HIV-holders and HIV-decoys from T'
  Construct the cohorts according to HIV-holders/decoys
  Form containers for each HIV-holder/decoy
  Generate a published table according to the cohorts
Whenever there is new medical raw data:
  Update the role of individuals according to different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts
To generate a published table according to the cohorts:
  Repeat
    pick one container from each cohort
    form one group by generalizing all these containers
  Until Cohort 1 is empty
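The Repeat/Until loop above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `form_groups` is a hypothetical name, the sketch assumes all cohorts have the same size, it always takes the first remaining container (the slides do not say how the pick is made), and it leaves out the generalization of QID values.

```python
def form_groups(cohorts):
    """Form groups by taking one container from each cohort per round.

    `cohorts` is a list of lists; cohorts[0] plays the role of Cohort 1.
    Assumption: every cohort has the same number of containers.
    """
    sizes = {len(cohort) for cohort in cohorts}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes every cohort has the same size")

    remaining = [list(cohort) for cohort in cohorts]  # copy; keep input intact
    groups = []
    while remaining[0]:  # Until Cohort 1 is empty
        # Pick one container from each cohort (here: simply the first one).
        groups.append([cohort.pop(0) for cohort in remaining])
    return groups


cohorts = [
    ["CI(p2)", "CI(p4)"],  # Cohort 1: HIV-holders
    ["CI(p1)", "CI(p6)"],  # Cohort 2: HIV-decoys
    ["CI(p3)", "CI(p5)"],  # Cohort 3: HIV-decoys
]
groups = form_groups(cohorts)
print(groups)
```

With the cohorts from the slide, the first-come pick happens to reproduce the Time = 1 grouping {p1, p2, p3} and {p4, p5, p6}.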

Algorithm: Time = 1
Medical Data
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever
…        …    …
Published Data
  Group {p1, p2, p3}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, Fever
  Group {p4, p5, p6}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever
  …
We can make use of some "existing" approaches to generate this table, which satisfies 3-diversity.
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2                     p1                    p3
p4                     p6                    p5

Algorithm: Time = 1
Medical Data: Raymond p1 Flu; Peter p2 HIV; Mary p3 Fever; Alice p4 HIV; Bob p5 Flu; John p6 Fever; …
Published Data
  Group {p1, p2, p3, …}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, …
  Group {p4, p5, p6, …}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, Fever, …
  …
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2, CI(p2)             p1, CI(p1)            p3, CI(p3)
p4, CI(p4)             p6, CI(p6)            p5, CI(p5)
We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8), and then find generalized QID values which cover the QID values of these individuals and p2.
The same is done for the other containers. The "…" entries in the published data are the additional individuals in CI(p1), CI(p2), CI(p3), CI(p4), CI(p5) and CI(p6) which are present.

Algorithm: Time = 1, Time = 2, Time = 3
Cohort 1 (HIV-holder)  Cohort 2 (HIV-decoy)  Cohort 3 (HIV-decoy)
p2, CI(p2)             p1, CI(p1)            p3, CI(p3)
p4, CI(p4)             p6, CI(p6)            p5, CI(p5)
Medical Data
Name     PID  Time = 1  Time = 2  Time = 3
Raymond  p1   Flu       Flu       Flu
Peter    p2   HIV       HIV       HIV
Mary     p3   Fever     Flu       Flu
Alice    p4   HIV       HIV       HIV
Bob      p5   Flu       Fever     Fever
John     p6   Fever     Fever     Fever
…        …    …         …         …
For each new medical raw table:
1. Update the role of each individual (i.e., decoy/holder) according to different scenarios
2. Pin some individuals if necessary
Published Data, Time = 1
  Group {p1, p2, p3, …}: Age [21,23], Zip Code [12k,17k]; Diseases: Flu, HIV, …
  Group {p4, p5, p6, …}: Age [20,26], Zip Code [18k,29k]; Diseases: HIV, Flu, …
Published Data, Time = 2
  Group {p2, p3, p6, …}: Age [20,22], Zip Code [12k,29k]; Diseases: HIV, Flu, …
  Group {p1, p4, p5, …}: Age [23,26], Zip Code [16k,25k]; Diseases: Flu, HIV, …
Published Data, Time = 3
  Group {p1, p2, p5, …}: Age [22,25], Zip Code [15k,17k]; Diseases: HIV, Flu, …
  Group {p3, p4, p6, …}: Age [20,26], Zip Code [12k,29k]; Diseases: Flu, HIV, …

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy; p6 is shown inside its container CI(p6)]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

p6's QID attributes: (Age, Zip Code) = (20, 29000)
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6's QID attributes,
e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals (HIV-buddies) who are neither HIV-holders nor HIV-decoys:
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6: (Age, Zip Code) = (20, 29000)
2. p7: (Age, Zip Code) = (25, 33000) (present)
3. p8: (Age, Zip Code) = (26, 30000) (absent)

e.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy. The HIV-buddy p7 takes over as HIV-decoy.
From the adversary's point of view, it is impossible to tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.

This idea is valid only when there EXISTS another individual for replacement. What if there is none?
e.g., p7 suffers from HIV in some later table, so p7 loses its functionality as an HIV-decoy, and we cannot find any other HIV-buddy for replacement.
Then we pin p7. That is, the original HIV value of p7 is modified/suppressed to a transient value (e.g., Flu). Once pinned, p7 acts as an HIV-decoy until it disappears.
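The replace-or-pin decision described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `update_decoy_role` and `published_disease`, the rule that a usable replacement must be present and not suffering from HIV, and p7's disease value "Fever" are all hypothetical.

```python
def update_decoy_role(decoy, buddies, raw_table):
    """Decide what to do when `decoy` contracts HIV (Scenario 1).

    raw_table maps each PID present in the current medical table to its disease.
    Returns ("replace", pid) if an HIV-buddy can take over, else ("pin", decoy).
    """
    for buddy in buddies:
        # Assumption: a usable replacement is present and does not suffer from HIV.
        if buddy in raw_table and raw_table[buddy] != "HIV":
            return ("replace", buddy)
    return ("pin", decoy)


def published_disease(pid, raw_table, pinned, transient="Flu"):
    """Disease value released for `pid`; a pinned individual shows a transient value."""
    return transient if pid in pinned else raw_table[pid]


# Current table: p6 (HIV-decoy) contracts HIV; buddy p7 is present, p8 is absent.
table_now = {"p6": "HIV", "p7": "Fever"}  # "Fever" is a made-up value for p7
step1 = update_decoy_role("p6", ["p7", "p8"], table_now)

# Later table: p7 (now the decoy) contracts HIV and no usable buddy is left.
table_later = {"p6": "HIV", "p7": "HIV"}
step2 = update_decoy_role("p7", ["p8"], table_later)
pinned = {step2[1]} if step2[0] == "pin" else set()
```

In this sketch `step1` selects p7 as the replacement, while `step2` falls back to pinning p7, whose published value then becomes the transient value instead of HIV.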

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy]

Medical Data
Name     PID  Disease (Time = 1)  Disease (Time = 2)  Disease (Time = 3)
Raymond  p1   Flu                 Flu                 Flu
Peter    p2   HIV                 HIV                 HIV
Mary     p3   Fever               Flu                 Flu
Alice    p4   HIV                 HIV                 HIV
Bob      p5   Flu                 Fever               Fever
John     p6   Fever               Fever               Fever

Published Data (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
(group members: p1, p2, p3)
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever
(group members: p4, p5, p6)

Published Data (Time = 2)
Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
(group members: p2, p3, p6)
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever
(group members: p1, p4, p5)

Published Data (Time = 3)
Age      Zip Code   Disease
[22,25]  [15k,17k]  HIV
[22,25]  [15k,17k]  Flu
[22,25]  [15k,17k]  Fever
(group members: p1, p2, p5)
[20,26]  [12k,29k]  Flu
[20,26]  [12k,29k]  HIV
[20,26]  [12k,29k]  Fever
(group members: p3, p4, p6)

So far we have shown only a simple case of anonymization. This case assumes that an individual who does not suffer from HIV will not suffer from HIV in any later published table.

How should we anonymize when these individuals may develop a new permanent disease (e.g., HIV)?

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

e.g., p6 was originally used as an HIV-decoy. Now it changes its role from HIV-decoy to HIV-holder, so it can no longer protect the other HIV-holders (in Cohort 1).

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

[Figure and tables: cohorts, medical data and published data for Time = 1, 2 and 3, as on the preceding slide]

The simple case shown so far also assumes that an individual who is present in an earlier published table is present in all later published tables.

How should we anonymize when some individuals are absent from a later published table?

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

e.g., p6 was originally used as an HIV-decoy. Now it disappears from this published table, so it can no longer protect the other HIV-holders (in Cohort 1).

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

[Figure and tables: cohorts, medical data and published data for Time = 1, 2 and 3, as on the preceding slides]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

There are other scenarios, e.g., some individuals who are absent in some earlier published tables are present in this table.
In this talk, we focus on Scenario 1 and Scenario 2. The other scenarios are covered in the paper.

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

p6 has the QID attributes (Age, Zip Code) = (20, 29000).

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy; p6 is shown inside its container CI(p6)]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

p6's QID attributes: (Age, Zip Code) = (20, 29000)
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6's QID attributes,
e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys:
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent
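As a rough illustration of the container idea, the following Python sketch checks which individuals a container covers and whether the two buddy conditions hold. The class and variable names are hypothetical, the presence flags follow the example above, and p9 is a made-up individual added only to show someone outside the container.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    zip_code: int


@dataclass(frozen=True)
class Container:
    age: Tuple[int, int]       # inclusive Age range
    zip_code: Tuple[int, int]  # inclusive Zip Code range

    def covers(self, p: Person) -> bool:
        """True if the person's QID values fall inside both ranges."""
        return (self.age[0] <= p.age <= self.age[1]
                and self.zip_code[0] <= p.zip_code <= self.zip_code[1])


ci_p6 = Container(age=(20, 26), zip_code=(29000, 33000))

# Voter registration list: everyone the adversary could link to the container.
voter_list = [
    Person("p6", 20, 29000),
    Person("p7", 25, 33000),
    Person("p8", 26, 30000),
    Person("p9", 40, 12000),  # hypothetical individual outside the container
]
roles = {"p6": "HIV-decoy"}   # covered individuals with no role are HIV-buddies
present = {"p6", "p7"}        # PIDs that appear in the current medical table

covered = [p.pid for p in voter_list if ci_p6.covers(p)]
buddies = [pid for pid in covered if pid not in roles]

# The two conditions from the slide: one buddy present, one buddy absent.
conditions_hold = (any(b in present for b in buddies)
                   and any(b not in present for b in buddies))
```

Here `covered` contains p6, p7 and p8, the buddies are p7 and p8, and both conditions hold because p7 is present while p8 appears only in the voter registration list.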

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy; p6 is shown inside its container CI(p6)]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k]), which covers p6's QID attributes (20, 29000) and the ADDITIONAL individuals p7 (25, 33000; present) and p8 (26, 30000; absent), who are neither HIV-holders nor HIV-decoys.

e.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy. The HIV-buddy p7 takes over as HIV-decoy.
From the adversary's point of view, it is impossible to tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.

Note that for Scenario 1 we update the roles of some individuals (i.e., decoy/holder/buddy) whenever a new medical raw table arrives (e.g., Time = 3).

[Figure: individuals p1 to p6 in Cohort 1, Cohort 2 and Cohort 3, each marked as an HIV-holder or an HIV-decoy; p6 is shown inside its container CI(p6)]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k]), which covers p6's QID attributes (20, 29000) and the ADDITIONAL individuals p7 (25, 33000) and p8 (26, 30000), who are neither HIV-holders nor HIV-decoys.

Case 1: HIV-decoy
e.g., p6 (HIV-decoy) is absent in the current table, so p6 loses its functionality as an HIV-decoy.
From the adversary's point of view, it is impossible to tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.

Case 2: HIV-holder

Note that for Scenario 2 we update the roles of some individuals (i.e., decoy/holder/buddy) whenever a new medical raw table arrives (e.g., Time = 3).

[Figure: Cohort 1, Cohort 2 and Cohort 3, with individuals p1 to p6 shown inside their containers CI(p1) to CI(p6), each marked as an HIV-holder or an HIV-decoy]

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

3. Algorithm

[Figure: Cohort 1, Cohort 2 and Cohort 3, with individuals p1 to p6 shown inside their containers CI(p1) to CI(p6), each marked as an HIV-holder or an HIV-decoy]

We have just discussed how to update the role of each individual (i.e., decoy/holder) according to the different scenarios when a new medical raw table arrives.

Algorithm:
For the first medical raw table:
    Construct the cohorts with some method
    Generate a published table according to the cohorts
Whenever there is a new medical raw table:
    Update the roles of individuals according to the different scenarios
    Generate some containers (if necessary)
    Generate a published table according to the cohorts

Generating a published table according to the cohorts:
    Repeat
        pick one container from each cohort
        form one group by generalizing all these containers
    Until Cohort 1 is empty
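The Repeat ... Until loop can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes that all cohorts have the same size, that containers are picked in list order (the slides do not say how a container is chosen), and that generalizing means taking the smallest Age and Zip Code ranges that cover the picked containers. The cohort assignment is read off the example groups, and the QID values for p1 to p5 are made up; only CI(p6) uses the ranges from the slides.

```python
def generalize(containers):
    """Smallest (Age, Zip Code) ranges that cover every container in the group."""
    return {
        "members": [c["pid"] for c in containers],
        "age": (min(c["age"][0] for c in containers),
                max(c["age"][1] for c in containers)),
        "zip": (min(c["zip"][0] for c in containers),
                max(c["zip"][1] for c in containers)),
    }


def form_groups(cohorts):
    """Repeat: pick one container from each cohort and generalize them into a group."""
    remaining = [list(cohort) for cohort in cohorts]  # leave the caller's lists untouched
    groups = []
    while remaining[0]:  # until Cohort 1 is empty
        picked = [cohort.pop(0) for cohort in remaining]
        groups.append(generalize(picked))
    return groups


def point(pid, age, zip_code):
    """Container for an individual whose QID values are kept exact."""
    return {"pid": pid, "age": (age, age), "zip": (zip_code, zip_code)}


ci_p6 = {"pid": "p6", "age": (20, 26), "zip": (29000, 33000)}  # ranges from the slides

cohorts = [
    [point("p2", 22, 15000), point("p4", 24, 20000)],  # Cohort 1 (hypothetical QIDs)
    [point("p1", 21, 12000), ci_p6],                   # Cohort 2
    [point("p3", 23, 17000), point("p5", 26, 18000)],  # Cohort 3 (hypothetical QIDs)
]
groups = form_groups(cohorts)
```

With these made-up values the loop produces two groups, one containing p2, p1 and p3 and one containing p4, p6 and p5. Each published row in a group would carry the group's generalized Age and Zip Code ranges.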

3. Multiple Diseases

So far we have considered only the case where each individual is linked to one disease.
The approach can be extended to handle individuals who are linked to multiple diseases.