1
Privacy Preserving Serial Data Publishing By Role Composition
Yingyi Bu (1), Ada Wai-Chee Fu (1), Raymond Chi-Wing Wong (2), Lei Chen (2), Jiuyong Li (3)
(1) The Chinese University of Hong Kong
(2) The Hong Kong University of Science and Technology
(3) University of South Australia
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
2
Outline
1. Sequential Releases
2. Existing Privacy Models
   - m-invariance
   - Privacy breaches
3. Our Proposed Privacy Model
   - l-scarcity
4. Experiments
5. Conclusion
3
1. Sequential Releases

Time = 1: the hospital releases the data set to the public.

Medical Data (Hospital)
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Published Data (Public): the slide shows the same six records. This table satisfies some privacy requirements (e.g., m-invariance).
4
1. Sequential Releases

Time = 1: Hospital (Medical Data) -> Public (Published Data)
Time = 2: Hospital (Medical Data, after insertions, deletions and updates) -> Public (Published Data). Release the data set to public.

Every table on this slide lists the same six records: Raymond (p1) Flu, Peter (p2) HIV, Mary (p3) Fever, Alice (p4) HIV, Bob (p5) Flu, John (p6) Fever.

The published table satisfies some privacy requirements (e.g., m-invariance).
5
1. Sequential Releases

Time = 1: Hospital (Medical Data) -> Public (Published Data)
Time = 2: Hospital (Medical Data) -> Public (Published Data)
Time = 3: Hospital (Medical Data, after insertions, deletions and updates) -> Public (Published Data)

Every table on this slide lists the same six records: Raymond (p1) Flu, Peter (p2) HIV, Mary (p3) Fever, Alice (p4) HIV, Bob (p5) Flu, John (p6) Fever.

The published table satisfies some privacy requirements (e.g., m-invariance).
6
1. Sequential Releases

Time = 1: Hospital (Medical Data) -> Public (Published Data)
Time = 2: Hospital (Medical Data) -> Public (Published Data)
Time = 3: Hospital (Medical Data) -> Public (Published Data)

Every table on this slide lists the same six records: Raymond (p1) Flu, Peter (p2) HIV, Mary (p3) Fever, Alice (p4) HIV, Bob (p5) Flu, John (p6) Fever.

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
7
2. Existing Privacy Models

1. Byun et al., "Secure Anonymization for Incremental Datasets", Secure Data Management, 2006
   - Considers insertions only
   - Does not consider deletions and updates
2. Fung et al., "Anonymity for Continuous Data Publishing", EDBT, 2008
   - Considers insertions only
   - Does not consider deletions and updates
3. Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007
   - Considers insertions and deletions only
   - Does not consider updates

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together

Updates cannot simply be regarded as "a deletion and then an insertion" when privacy is considered.
8
2. Existing Privacy Models

Sensitive diseases fall into two kinds:
- Transient diseases (curable), e.g., flu, fever. If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table.
- Permanent diseases (incurable), e.g., HIV. If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table that s/he appears in.

We are the first to study these two kinds of sensitive values.

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
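The transient/permanent distinction can be written as a small consistency check. This is a minimal sketch rather than anything from the slides: the `PERMANENT` set and the function name `respects_permanence` are hypothetical, and an individual's history is assumed to be a list of sensitive values, one per release in which that individual appears.

```python
# Hypothetical classification: the slides name HIV as permanent
# and flu, fever as transient.
PERMANENT = {"HIV"}


def respects_permanence(history, permanent=PERMANENT):
    """Return True if, once a permanent value appears, every later entry equals it.

    `history` lists one individual's sensitive value in each release
    that the individual appears in, in time order.
    """
    locked = None  # the permanent value seen so far, if any
    for value in history:
        if locked is not None and value != locked:
            return False
        if value in permanent:
            locked = value
    return True
```

For example, `respects_permanence(["Flu", "Fever", "Flu"])` holds because transient values may change freely, while `respects_permanence(["HIV", "Flu"])` does not.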
9
2. Existing Privacy Models

1. Byun et al., "Secure Anonymization for Incremental Datasets", Secure Data Management, 2006
   - Considers insertions only
   - Does not consider deletions and updates
2. Fung et al., "Anonymity for Continuous Data Publishing", EDBT, 2008
   - Considers insertions only
   - Does not consider deletions and updates
3. Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007
   - Considers insertions and deletions only
   - Does not consider updates

These existing models do not consider transient/permanent values.

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together
- Also considers transient/permanent values

Contributions: We consider a more realistic setting of sequential releases.
- Insertions, deletions and updates
- Transient/permanent values
We cannot simply adapt these existing privacy models to this realistic setting.
10
2. Existing Privacy Models

1. Byun et al., "Secure Anonymization for Incremental Datasets", Secure Data Management, 2006
2. Fung et al., "Anonymity for Continuous Data Publishing", EDBT, 2008
3. Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together

Problem (l-scarcity): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l.
11
Voter Registration List (Public)
Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   25000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Medical Data (Hospital)
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Medical Data + Some Useful Attributes
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Fever
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Flu
John     p6   20   29000     Fever

Release the data set to public.
12
The Voter Registration List (Public), Medical Data (Hospital) and Medical Data + Some Useful Attributes tables are the same as on the previous slide, covering Raymond (p1) through John (p6).

Release the data set to public, without the Name and PID columns:

Medical Data + Some Useful Attributes (published)
Age  Zip Code  Disease
23   16355     Flu
22   15500     HIV
21   12900     Fever
26   18310     HIV
25   25000     Flu
20   29000     Fever
13
The Voter Registration List (Public), Medical Data (Hospital) and Medical Data + Some Useful Attributes tables are the same as on the previous slides, covering Raymond (p1) through John (p6).

Release the data set to public after generalization:

Medical Data + Some Useful Attributes (published)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

3-diversity: Each individual is linked to "HIV" with probability at most 1/3 in THIS PUBLISHED TABLE.
- 3-diversity only focuses on ONE-TIME publishing.
- 3-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 3-diversity.

Idea: Each individual is linked to "HIV" with probability at most 1/3 with respect to MULTIPLE PUBLISHED TABLES.
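The 3-diversity claim for the generalized table can be checked mechanically under the simple frequency reading used on the slide: within each generalized group, no disease accounts for more than 1/3 of the rows. This is a minimal sketch. The function names `max_link_probability` and `is_l_diverse` and the `(age range, zip range, disease)` row layout are illustrative assumptions, and other definitions of l-diversity exist that this check does not cover.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Rows of the generalized table on the slide: (age range, zip range, disease).
published = [
    ("[21,23]", "[12k,17k]", "Flu"),
    ("[21,23]", "[12k,17k]", "HIV"),
    ("[21,23]", "[12k,17k]", "Fever"),
    ("[20,26]", "[18k,29k]", "HIV"),
    ("[20,26]", "[18k,29k]", "Flu"),
    ("[20,26]", "[18k,29k]", "Fever"),
]


def max_link_probability(rows):
    """Largest fraction of any single disease within any (age, zip) group."""
    groups = defaultdict(Counter)
    for age, zip_code, disease in rows:
        groups[(age, zip_code)][disease] += 1
    return max(
        Fraction(count, sum(counter.values()))
        for counter in groups.values()
        for count in counter.values()
    )


def is_l_diverse(rows, l):
    """True if every disease has frequency at most 1/l in every group."""
    return max_link_probability(rows) <= Fraction(1, l)
```

With the table above, `is_l_diverse(published, 3)` holds because each group contains three rows with three different diseases.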
14
3-invariance, Time = 1

The Voter Registration List (Public), Medical Data (Hospital) and Medical Data + Some Useful Attributes tables are the same as on the previous slides, covering Raymond (p1) through John (p6).

Release the data set to public:

Published table (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever
15
3-invariance, Time = 1

The Voter Registration List (Public), Medical Data (Hospital) and Medical Data + Some Useful Attributes tables are the same as on the previous slides, covering Raymond (p1) through John (p6).

Release the data set to public:

Published table (Time = 1)
Age      Zip Code   Disease  PID (not published)
[21,23]  [12k,17k]  Flu      p1
[21,23]  [12k,17k]  HIV      p2
[21,23]  [12k,17k]  Fever    p3
[20,26]  [18k,29k]  HIV      p4
[20,26]  [18k,29k]  Flu      p5
[20,26]  [18k,29k]  Fever    p6

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}
22
3-invariance, Time = 2

The Voter Registration List (Public) has the same rows as at Time = 1.

Medical Data + Some Useful Attributes (Hospital, Time = 2)
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Fever
John     p6   20   29000     Fever

Release the data set to public.

Published table (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Signature at Time = 1, to be kept at Time = 2: {Flu, HIV, Fever} for each of p1 to p6.
23
3-invariance, Time = 2

The Voter Registration List (Public) has the same rows as at Time = 1. Medical data at Time = 2: Raymond (p1) Flu, Peter (p2) HIV, Mary (p3) Flu, Alice (p4) HIV, Bob (p5) Fever, John (p6) Fever.

Published table (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Published table (Time = 2)
Age      Zip Code   Disease  PID (not published)
[20,22]  [12k,29k]  HIV      p2
[20,22]  [12k,29k]  Flu      p3
[20,22]  [12k,29k]  Fever    p6
[23,26]  [16k,25k]  Flu      p1
[23,26]  [16k,25k]  HIV      p4
[23,26]  [16k,25k]  Fever    p5

Signature at Time = 1 and at Time = 2: {Flu, HIV, Fever} for each of p1 to p6.

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature.

Idea of 3-invariance: Each individual is linked to the SAME signature in each published table.
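The idea of 3-invariance stated above (the same signature for each individual in every published table) can be sketched as a check over the group structure of the releases. This is a simplified reading, not the full definition from the m-invariance paper: `check_invariance` is a hypothetical helper, and each release is assumed to be a list of `(members, diseases)` pairs, one per generalized group.

```python
def check_invariance(releases, m):
    """Check the signature idea on a list of releases.

    Each release is a list of (members, diseases) pairs, one per group.
    Returns True if every group has at least m rows with pairwise distinct
    diseases and every individual keeps the same signature in all releases.
    """
    seen = {}  # PID -> signature from the first release containing that PID
    for release in releases:
        for members, diseases in release:
            signature = frozenset(diseases)
            if len(diseases) < m or len(signature) != len(diseases):
                return False
            for pid in members:
                # setdefault stores the first signature and returns it afterwards
                if seen.setdefault(pid, signature) != signature:
                    return False
    return True


# Group structure of the two published tables on the slide.
t1 = [({"p1", "p2", "p3"}, ["Flu", "HIV", "Fever"]),
      ({"p4", "p5", "p6"}, ["HIV", "Flu", "Fever"])]
t2 = [({"p2", "p3", "p6"}, ["HIV", "Flu", "Fever"]),
      ({"p1", "p4", "p5"}, ["Flu", "HIV", "Fever"])]
```

Under these assumptions `check_invariance([t1, t2], 3)` holds: the groups are regrouped between Time = 1 and Time = 2, but every individual stays in a group whose disease set is {Flu, HIV, Fever}.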
29
3-invariance, Time = 3

Voter Registration List (Public, Time = 3)
Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   15000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Medical Data + Some Useful Attributes (Hospital, Time = 3)
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   15000     Fever
John     p6   20   29000     Fever

Release the data set to public.

Published table (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Published table (Time = 2)
Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever

Signature at Time = 1 and Time = 2, to be kept at Time = 3: {Flu, HIV, Fever} for each of p1 to p6.
30
3-invariance, Time = 3

The Voter Registration List (Public), Medical Data (Hospital) and the published tables for Time = 1 and Time = 2 are the same as on the previous slide. At Time = 3, Bob (p5) has Zip Code 15000.

Published table (Time = 3)
Age      Zip Code   Disease  PID (not published)
[21,25]  [12k,16k]  HIV      p2
[21,25]  [12k,16k]  Flu      p3
[21,25]  [12k,16k]  Fever    p5
[20,26]  [16k,29k]  Flu      p1
[20,26]  [16k,29k]  HIV      p4
[20,26]  [16k,29k]  Fever    p6

Signature at Time = 1, Time = 2 and Time = 3: {Flu, HIV, Fever} for each of p1 to p6.

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature.
36
3-invariance: the three published tables

Published table (Time = 1)
Age      Zip Code   Disease  PID (not published)
[21,23]  [12k,17k]  Flu      p1
[21,23]  [12k,17k]  HIV      p2
[21,23]  [12k,17k]  Fever    p3
[20,26]  [18k,29k]  HIV      p4
[20,26]  [18k,29k]  Flu      p5
[20,26]  [18k,29k]  Fever    p6

Published table (Time = 2)
Age      Zip Code   Disease  PID (not published)
[20,22]  [12k,29k]  HIV      p2
[20,22]  [12k,29k]  Flu      p3
[20,22]  [12k,29k]  Fever    p6
[23,26]  [16k,25k]  Flu      p1
[23,26]  [16k,25k]  HIV      p4
[23,26]  [16k,25k]  Fever    p5

Published table (Time = 3)
Age      Zip Code   Disease  PID (not published)
[21,25]  [12k,16k]  HIV      p2
[21,25]  [12k,16k]  Flu      p3
[21,25]  [12k,16k]  Fever    p5
[20,26]  [16k,29k]  Flu      p1
[20,26]  [16k,29k]  HIV      p4
[20,26]  [16k,29k]  Fever    p6

Signature at Time = 1, Time = 2 and Time = 3: {Flu, HIV, Fever} for each of p1 to p6.
37
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 PIDSignature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 3-invariance I know all voter registration lists Knowledge 2 Knowledge 1 NamePIDAgeZip Code Raymondp1p1 2316355 Peterp2p2 2215500 Maryp3p3 2112900 Alicep4p4 2618310 Bobp5p5 2525000 Johnp6p6 2029000 … …… Davidp |RL| 3131000
38
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 PIDSignature p1p1 {Flu, HIV, Fever} p2p2 p3p3 p4p4 p5p5 p6p6 3-invariance I know all voter registration lists Knowledge 2 Knowledge 1 I know that HIV is a permanent sensitive value. Knowledge 3 I can deduce that p 1 and p 6 cannot be linked to HIV. There are TWO HIVs in the published table. Knowledge 4
39
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 I know all voter registration lists Knowledge 2 Knowledge 1 I know that HIV is a permanent sensitive value. Knowledge 3 I can deduce that p 1 and p 6 cannot be linked to HIV. PIDHIV? p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 Proof by contradiction. Suppose p 1 is linked to HIV. Yes No There are TWO HIVs in the published table. Knowledge 4
42
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 I know all voter registration lists Knowledge 2 Knowledge 1 I know that HIV is a permanent sensitive value. Knowledge 3 I can deduce that p 1 and p 6 cannot be linked to HIV. PIDHIV? p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 Proof by contradiction. Suppose p 1 is linked to HIV. Yes No There are TWO HIVs in the published table. Knowledge 4 Contradiction! p 1 CANNOT be linked to HIV.
43
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 I know all voter registration lists Knowledge 2 Knowledge 1 I know that HIV is a permanent sensitive value. Knowledge 3 I can deduce that p 1 and p 6 cannot be linked to HIV. PIDHIV? p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 Proof by contradiction. Suppose p 6 is linked to HIV. Yes No There are TWO HIVs in the published table. Knowledge 4
46
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 I know all voter registration lists Knowledge 2 Knowledge 1 I know that HIV is a permanent sensitive value. Knowledge 3 I can deduce that p 1 and p 6 cannot be linked to HIV. PIDHIV? p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 Proof by contradiction. Suppose p 6 is linked to HIV. There are TWO HIVs in the published table. Knowledge 4 Contradiction! p 6 CANNOT be linked to HIV. Yes No
47
Knowledge 1 (published tables, 3-invariance)

Time = 1
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
  (group members: p1, p2, p3)
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever
  (group members: p4, p5, p6)

Time = 2
Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
  (group members: p2, p3, p6)
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever
  (group members: p1, p4, p5)

Time = 3
Age      Zip Code   Disease
[21,25]  [12k,16k]  HIV
[21,25]  [12k,16k]  Flu
[21,25]  [12k,16k]  Fever
  (group members: p2, p3, p5)
[20,26]  [16k,29k]  Flu
[20,26]  [16k,29k]  HIV
[20,26]  [16k,29k]  Fever
  (group members: p1, p4, p6)

Knowledge 2: I know all voter registration lists.
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.

I can deduce that p1 and p6 cannot be linked to HIV.
I can deduce that p4 MUST be linked to HIV. (The group {p1, p4, p6} at Time = 3 contains one HIV tuple, and p1 and p6 are ruled out.)
Privacy breaches! Why?

Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all tables published at any time <= t is at most 1/m.
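The adversary's deduction can be replayed by brute force. The sketch below is an illustration and not code from the talk. It assumes exactly two HIV-holders who stay the same across the three releases (Knowledge 3 and 4) and that every published group contains exactly one of them. The names `RELEASES` and `consistent_worlds` are hypothetical.

```python
from itertools import combinations

PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]

# Group memberships of the three 3-invariant releases (Time = 1, 2, 3).
RELEASES = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],
    [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}],
]

def consistent_worlds(releases, num_holders=2):
    """Return every set of HIV-holders that puts exactly one holder in each group."""
    worlds = []
    for holders in combinations(PEOPLE, num_holders):
        chosen = set(holders)
        if all(len(group & chosen) == 1
               for groups in releases for group in groups):
            worlds.append(holders)
    return worlds

worlds = consistent_worlds(RELEASES)
# Anyone who is a holder in every consistent world is exposed.
exposed = set(PEOPLE).intersection(*map(set, worlds))
print(worlds)   # the surviving holder pairs
print(exposed)  # individuals linked to HIV with certainty
```

Only the pairs (p2, p4) and (p3, p4) survive, so p4 is a holder in every consistent world, while p1 and p6 appear in none. This matches the deduction on the slide.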
48
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 I can deduce that p 4 MUST be linked to HIV. Privacy breaches! Why? NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever Original Medical Data Time = 1 p 2 is an HIV-holder. p 1 is an HIV-decoy. p 3 is an HIV-decoy. HIV-decoys (i.e., p 1 and p 3 ) are used to reduce the strong linkage between p 2 and HIV.
49
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 I can deduce that p 4 MUST be linked to HIV. Privacy breaches! Why? NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever Original Medical Data Time = 1 p 2 is an HIV-holder. p 1 is an HIV-decoy. p 3 is an HIV-decoy. Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy
50
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 I can deduce that p 4 MUST be linked to HIV. Privacy breaches! Why? NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever Original Medical Data Time = 1 p 4 is an HIV-holder. Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 p 5 is an HIV-decoy. p 6 is an HIV-decoy.
51
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 I can deduce that p 4 MUST be linked to HIV. Privacy breaches! Why? NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever Original Medical Data Time = 1 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5
54
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 I can deduce that p 4 MUST be linked to HIV. Privacy breaches! Why? NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever Original Medical Data Time = 1 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 p 1 and p 6 are in the same cohort. Besides, they are in the same group of the published table at time = 3 Idea: This kind of grouping can lead to privacy breaches. We can protect privacy by avoiding this kind of grouping.
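The "same cohort, same group" condition can be checked mechanically. The following is a minimal sketch, assuming the cohort layout shown in the diagram (Cohort 1: p2, p4; Cohort 2: p1, p6; Cohort 3: p3, p5). The helper `cohort_clashes` is a hypothetical name, not part of the talk.

```python
from collections import Counter

# Cohort membership read off the diagram: Cohort 1 holds the HIV-holders,
# Cohorts 2 and 3 hold their HIV-decoys.
COHORT_OF = {"p2": 1, "p4": 1, "p1": 2, "p6": 2, "p3": 3, "p5": 3}

def cohort_clashes(group):
    """Return the cohorts that contribute more than one member to a group."""
    counts = Counter(COHORT_OF[p] for p in group)
    return sorted(c for c, n in counts.items() if n > 1)

# Grouping published at Time = 3 under 3-invariance.
m_invariant_time3 = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]
clashes = [cohort_clashes(g) for g in m_invariant_time3]
print(clashes)
```

The second group is flagged for Cohort 2 (p1 and p6), which is the case named on the slide. Under the assumed layout the first group is flagged as well, because p3 and p5 both belong to Cohort 3.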
55
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 AgeZip CodeDisease [21,25][12k,16k]HIV [21,25][12k,16k]Flu [21,25][12k,16k]Fever [20,26][16k,29k]Flu [20,26][16k,29k]HIV [20,26][16k,29k]Fever p2p2 p3p3 p5p5 p1p1 p4p4 p6p6 Time = 3 Knowledge 1 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu [22,25][15k,17k]Fever [20,26][12k,29k]Flu [20,26][12k,29k]HIV [20,26][12k,29k]Fever p1p1 p2p2 p5p5 p3p3 p4p4 p6p6 Time = 3 3-invariance 3-scarcity
56
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 Knowledge 1 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu [22,25][15k,17k]Fever [20,26][12k,29k]Flu [20,26][12k,29k]HIV [20,26][12k,29k]Fever p1p1 p2p2 p5p5 p3p3 p4p4 p6p6 Time = 3 3-scarcity
57
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 Knowledge 1 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu [22,25][15k,17k]Fever [20,26][12k,29k]Flu [20,26][12k,29k]HIV [20,26][12k,29k]Fever p1p1 p2p2 p5p5 p3p3 p4p4 p6p6 Time = 3 3-scarcity Probability that an individual is linked to a sensitive value wrt these three tables is at most 1/3.
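The 1/3 bound can be checked by enumerating the possible worlds for these three tables. This is a sketch under stated assumptions: there are exactly two HIV-holders, they are the same in all releases, each group contains exactly one of them, and the adversary treats every consistent world as equally likely. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]

# Time = 1 and Time = 2 as before; Time = 3 uses the 3-scarcity grouping.
RELEASES = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],
    [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}],
]

# A world is consistent if every group in every release has exactly one holder.
worlds = [
    set(holders)
    for holders in combinations(PEOPLE, 2)
    if all(len(group & set(holders)) == 1
           for groups in RELEASES for group in groups)
]

# Fraction of consistent worlds in which each individual holds HIV.
linkage = {p: Fraction(sum(p in w for w in worlds), len(worlds)) for p in PEOPLE}
print(worlds)
print(linkage)
```

Three worlds remain ({p1, p6}, {p2, p4}, {p3, p5}), and each individual is a holder in exactly one of them, so no linkage probability exceeds 1/3.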
58
3. Algorithm We propose an algorithm that follows one principle: whenever we form a group, choose one member from each cohort.
59
3. Guarantee Theorem: Our proposed algorithm generates a table that satisfies l-scarcity, i.e., the probability that an individual is linked to a sensitive value with respect to all tables published at any time <= t is at most 1/l.
60
4. Experiments
Real Data Set (CADRMP): http://www.hc-sc.gc.ca/dhp-mps/medeff/databasdon/index_e.html
Real hospital database
Patient Information (Voter Registration List): 40,478 tuples
Medical Record: 105,420 tuples
Each patient can be linked to multiple diseases
61
4. Experiments
Studies:
  Privacy breaches of an existing model (m-invariance)
  Performance of our proposed algorithm
62
4.1 Privacy Breaches of an existing model
Breach Rate: the proportion of tuples with privacy breaches
m-invariance
63
4.2 Performance of our proposed algorithm
Measurements:
  Computation Cost
  Relative Average Error
Variations:
  Parameter l (used in l-scarcity)
  No. of published tables
64
4.2 Performance of our proposed algorithm
65
5. Conclusion
Sequential Releases
  QID values can be updated
  Sensitive values can be updated
Sensitive Values
  Permanent
  Transient
Identify the insufficiency of existing models
Algorithm
Experiments
66
Q&A
67
4.2 Performance of our proposed algorithm
68
[Diagram: Cohort 1 (HIV-holders) and Cohorts 2 and 3 (HIV-decoys), holding the containers CI(p1), CI(p2), CI(p3), CI(p4), CI(p5), CI(p6)]
Scenario 1: Some individuals who did not suffer from HIV in earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who were present in earlier published tables are absent from this table.
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6's QID attributes.
  p6's QID attributes: (Age, Zip Code) = (20, 29000)
  CI(p6): e.g., (Age, Zip Code) = ([20,26], [29k,33k])
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals (HIV-buddies) who are neither HIV-holders nor HIV-decoys.
  1. At least one of these individuals is present (in the medical table).
  2. At least one of these individuals is absent (from the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6), (Age, Zip Code) = ([20,26], [29k,33k])
  1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy
  2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy
  3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy
e.g., p4 (an HIV-holder) is absent from the current table. If the other HIV-decoys are still present, the adversary can figure out that p4 is an HIV-holder.
  Case 1: HIV-decoy
  Case 2: HIV-holder
Idea: We switch the role of l-1 HIV-decoys from PRESENT individuals to ABSENT individuals. Since one HIV-holder and l-1 HIV-decoys become ABSENT together, the adversary cannot figure out who is the REAL HIV-holder.
69
3. Algorithm
[Diagram: Cohort 1 (HIV-holders: p2, p4), Cohort 2 (HIV-decoys: p1, p6), Cohort 3 (HIV-decoys: p3, p5), with containers CI(p1), CI(p2), CI(p3), CI(p4), CI(p5), CI(p6)]
We have just discussed how to update the role of each individual (i.e., decoy/holder) under the different scenarios when new raw medical data arrives.
Algorithm:
For the first raw medical table:
  Use an existing privacy algorithm (e.g., l-diversity) to generate a temporary table T'
  Find HIV-holders and HIV-decoys from T'
  Construct the cohorts according to the HIV-holders/decoys
  Form a container for each HIV-holder/decoy
  Generate a published table according to the cohorts
Whenever new raw medical data arrives:
  Update the role of individuals according to the different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts
Generating a published table according to the cohorts:
  Repeat
    pick one container from each cohort
    form one group by generalizing all these containers
  Until Cohort 1 is empty
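The Repeat/Until loop can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: every cohort holds the same number of containers, the "pick" step simply takes the first remaining container (the slides do not specify a selection rule), and each container is a degenerate range built from the Time = 3 voter registration values. `Container`, `generalize` and `form_groups` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Container:
    owner: str
    age: tuple       # (low, high)
    zip_code: tuple  # (low, high)

def generalize(containers):
    """Smallest (Age, Zip Code) ranges that cover all given containers."""
    ages = [bound for c in containers for bound in c.age]
    zips = [bound for c in containers for bound in c.zip_code]
    return (min(ages), max(ages)), (min(zips), max(zips))

def form_groups(cohorts):
    """Pick one container per cohort until Cohort 1 (cohorts[0]) is empty."""
    remaining = [list(cohort) for cohort in cohorts]
    groups = []
    while remaining[0]:
        picked = [cohort.pop(0) for cohort in remaining]
        groups.append(([c.owner for c in picked], generalize(picked)))
    return groups

def point(owner, age, zip_code):
    # Degenerate container covering a single individual (an assumption).
    return Container(owner, (age, age), (zip_code, zip_code))

cohorts = [
    [point("p2", 22, 15500), point("p4", 26, 18310)],  # Cohort 1: HIV-holders
    [point("p1", 23, 16355), point("p6", 20, 29000)],  # Cohort 2: HIV-decoys
    [point("p5", 25, 15000), point("p3", 21, 12900)],  # Cohort 3: HIV-decoys
]
groups = form_groups(cohorts)
for members, ranges in groups:
    print(members, ranges)
```

With this ordering the loop yields the groups {p2, p1, p5} and {p4, p6, p3}, the same membership as the 3-scarcity table at Time = 3, and each group draws exactly one member from each cohort.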
70
Algorithm (Time = 1)

Medical Data
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever
…        …    …

Published Data
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
  (group members: p1, p2, p3)
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever
  (group members: p4, p5, p6)
…        …          …

Cohort 1 (HIV-holder): p2, p4
Cohort 2 (HIV-decoy): p1, p6
Cohort 3 (HIV-decoy): p3, p5

We can use some "existing" approach to generate this table, which satisfies 3-diversity.
71
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … Time = 1 Medical Data Published Data p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 We create the container of p 2. That is, finding some present individuals (e.g., p 7 ) and some absent individuals (e.g., p 8 ). We can find a generalized QID values which cover the QID values of these individuals and p 2. CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … Some additional individuals in CI(p 1 ) which are present
72
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … Time = 1 Medical Data Published Data p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 We create the container of p 2. That is, finding some present individuals (e.g., p 7 ) and some absent individuals (e.g., p 8 ). We can find a generalized QID values which cover the QID values of these individuals and p 2. CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … Some additional individuals in CI(p 2 ) which are present …
73
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … Time = 1 Medical Data Published Data p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 We create the container of p 2. That is, finding some present individuals (e.g., p 7 ) and some absent individuals (e.g., p 8 ). We can find a generalized QID values which cover the QID values of these individuals and p 2. CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … Some additional individuals in CI(p 3 ) which are present … …
74
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV ……… [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … Time = 1 Medical Data Published Data p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 We create the container of p 2. That is, finding some present individuals (e.g., p 7 ) and some absent individuals (e.g., p 8 ). We can find a generalized QID values which cover the QID values of these individuals and p 2. CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … Some additional individuals in CI(p 4 ), CI(p 5 ) and CI(p 6 ) which are present … … … … …
75
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV ……… [20,26][18k,29k]HIV [20,26][18k,29k]Flu ……… ……… AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu ……… [23,26][16k,25k]Flu [23,26][16k,25k]HIV ……… ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever …… … Time = 1Time = 2 Medical Data Published Data Medical Data p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … … … … … … Published Data 1. Update the role of each individual (i.e., decoy/holder) according to different scenarios 2. Pin some individuals if necessary p2p2 p3p3 p6p6 … … … p1p1 p4p4 p5p5 … … …
76
Cohort 1Cohort 2 Cohort 3 HIV-holder HIV-decoy p2p2 p1p1 p3p3 p4p4 p6p6 p5p5 CI(p 6 ) CI(p 1 ) CI(p 5 ) CI(p 3 ) CI(p 4 ) CI(p 2 ) Algorithm AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV ……… [20,26][18k,29k]HIV [20,26][18k,29k]Flu ……… ……… AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu ……… [23,26][16k,25k]Flu [23,26][16k,25k]HIV ……… ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever …… … NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever …… … Time = 1Time = 2 Medical Data Published Data Medical Data p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 … … … … … … Published Data 1. Update the role of each individual (i.e., decoy/holder) according to different scenarios 2. Pin some individuals if necessary p2p2 p3p3 p6p6 … … … p1p1 p4p4 p5p5 … … … AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu ……… [20,26][12k,29k]Flu [20,26][12k,29k]HIV ……… ……… NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever …… … Time = 3 Published Data p1p1 p2p2 p5p5 … … … p3p3 p4p4 p6p6 … … …
77
[Diagram: Cohort 1 (HIV-holders: p2, p4), Cohort 2 (HIV-decoys: p1, p6), Cohort 3 (HIV-decoys: p3, p5)]
Scenario 1: Some individuals who did not suffer from HIV in earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who were present in earlier published tables are absent from this table.
Idea: We find another individual to take over the original role (i.e., HIV-decoy).
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6's QID attributes.
  p6's QID attributes: (Age, Zip Code) = (20, 29000)
  CI(p6): e.g., (Age, Zip Code) = ([20,26], [29k,33k])
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals (HIV-buddies) who are neither HIV-holders nor HIV-decoys.
  1. At least one of these individuals is present (in the medical table).
  2. At least one of these individuals is absent (from the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6), (Age, Zip Code) = ([20,26], [29k,33k])
  1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy
  2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
  3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent
e.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy and an HIV-buddy (p7) takes over the role. The adversary cannot tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.
This idea is valid when there EXISTS another individual for replacement. If not, then?
e.g., p7 suffers from HIV in some later table, so p7 loses its functionality as an HIV-decoy, and we cannot find another HIV-buddy for replacement. Then we pin p7. That is, the original HIV value of p7 is modified/suppressed to a transient value (e.g., Flu). Once pinned, p7 acts as an HIV-decoy until it disappears.
78
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu [22,25][15k,17k]Fever [20,26][12k,29k]Flu [20,26][12k,29k]HIV [20,26][12k,29k]Fever p1p1 p2p2 p5p5 p3p3 p4p4 p6p6 Time = 3 We just show a simple case for anonymization. In this case, Scenario 1: If the individual does not suffer from HIV, s/he will not suffer from HIV in the later published tables. NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever Time = 1Time = 2 Time = 3 How should we anonymize when these individuals may develop a new permanent disease? HIV p 6 originally is used as an HIV-decoy. Now, it changes its role from an HIV- decoy to an HIV-holder. It loses its functionality to protect other HIV-holders (in Cohort 1). Idea: We find another individual to replace its original role (i.e., an HIV-decoy). Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
79
AgeZip CodeDisease [21,23][12k,17k]Flu [21,23][12k,17k]HIV [21,23][12k,17k]Fever [20,26][18k,29k]HIV [20,26][18k,29k]Flu [20,26][18k,29k]Fever Time = 1 p1p1 p2p2 p3p3 p4p4 p5p5 p6p6 AgeZip CodeDisease [20,22][12k,29k]HIV [20,22][12k,29k]Flu [20,22][12k,29k]Fever [23,26][16k,25k]Flu [23,26][16k,25k]HIV [23,26][16k,25k]Fever p2p2 p3p3 p6p6 p1p1 p4p4 p5p5 Time = 2 Cohort 1 p2p2 Cohort 2Cohort 3 p1p1 p3p3 HIV-holder HIV-decoy p4p4 p6p6 p5p5 AgeZip CodeDisease [22,25][15k,17k]HIV [22,25][15k,17k]Flu [22,25][15k,17k]Fever [20,26][12k,29k]Flu [20,26][12k,29k]HIV [20,26][12k,29k]Fever p1p1 p2p2 p5p5 p3p3 p4p4 p6p6 Time = 3 We just show a simple case for anonymization. In this case, Scenario 2: If an individual is present in an earlier published table, s/he is also present in all later published tables. NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Fever Alicep4p4 HIV Bobp5p5 Flu Johnp6p6 Fever NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever NamePIDDisease Raymondp1p1 Flu Peterp2p2 HIV Maryp3p3 Flu Alicep4p4 HIV Bobp5p5 Fever Johnp6p6 Fever Time = 1Time = 2 Time = 3 How should we anonymize when some individuals are absent in a later published table. p 6 originally is used as an HIV-decoy. Now, it disappears in this published table. It loses its functionality to protect other HIV-holders (in Cohort 1). Idea: We find another individual to replace its original role (i.e., an HIV-decoy). Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table. Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
80
(The published tables for Time = 1, 2 and 3, the cohort diagram, and the medical raw data are repeated from the previous slide.)

Idea: We find another individual to take over its original role (i.e., an HIV-decoy).
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
There are other scenarios, e.g., some individuals who are absent from some earlier published tables are present in this table.
In this talk, we focus on Scenario 1 and Scenario 2. The other scenarios are covered in the paper.
81
Cohorts (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
p2         p1         p3
p4         p6         p5

p6 has the QID attributes (Age, Zip Code) = (20, 29000).
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
Idea: We find another individual to take over its original role (i.e., an HIV-decoy).
82
Cohorts (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
p2         p1         p3
p4         p6         p5

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
Idea: We find another individual to take over its original role (i.e., an HIV-decoy).

p6 is replaced with a container CI(p6) whose QID ranges (Age, Zip Code) cover p6's QID attributes.
p6's QID attributes: (Age, Zip Code) = (20, 29000)
e.g., CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

Since the container has broader ranges of QID values, it also covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys. These individuals are called HIV-buddies.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (from the medical table), but can be found in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6: (Age, Zip Code) = (20, 29000)
2. p7: (Age, Zip Code) = (25, 33000)
3. p8: (Age, Zip Code) = (26, 30000)
Figure labels: HIV-decoy, HIV-buddy; present, absent.
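The covering condition on this slide can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the `Container` class and its `covers` method are hypothetical names introduced here for illustration, and ranges are assumed to be inclusive. The QID values are the ones given on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Container:
    """Hypothetical container: inclusive QID ranges for Age and Zip Code."""
    age: Tuple[int, int]
    zip_code: Tuple[int, int]

    def covers(self, age: int, zip_code: int) -> bool:
        """Return True if the point (age, zip_code) lies inside both ranges."""
        return (self.age[0] <= age <= self.age[1]
                and self.zip_code[0] <= zip_code <= self.zip_code[1])


# QID values taken from the slide's example.
individuals = {"p6": (20, 29000), "p7": (25, 33000), "p8": (26, 30000)}
ci_p6 = Container(age=(20, 26), zip_code=(29000, 33000))

# Everyone whose QID falls inside CI(p6), not just p6 itself.
covered = [pid for pid, (age, z) in individuals.items() if ci_p6.covers(age, z)]
print(covered)  # ['p6', 'p7', 'p8']
```

Widening the ranges is what brings p7 and p8 into the container alongside p6.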
83
Cohorts (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
p2         p1         p3
p4         p6         p5

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
Idea: We find another individual to take over its original role (i.e., an HIV-decoy).

p6 is replaced with a container CI(p6) whose QID ranges (Age, Zip Code) cover p6's QID attributes.
Since the container has broader ranges of QID values, it also covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys (HIV-buddies).
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (from the medical table), but can be found in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6: (Age, Zip Code) = (20, 29000)
2. p7: (Age, Zip Code) = (25, 33000)
3. p8: (Age, Zip Code) = (26, 30000)
Figure labels: HIV-decoy, HIV-buddy; present, absent.

e.g., p6 suffers from HIV in the current table, so p6 can no longer act as an HIV-decoy. An HIV-buddy in the container takes over the HIV-decoy role.
From the adversary's point of view, it is impossible to tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.
Note that, for Scenario 1, we update the roles of some individuals (i.e., decoy/holder/buddy) whenever new medical raw data arrives (e.g., at Time = 3).
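The role update for Scenario 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure from the paper: the function name `replace_decoy`, the role strings, and the tie-breaking rule (pick the first present buddy in sorted order) are all hypothetical, and p7 is assumed to be the buddy that is present in the medical table while p8 is assumed absent.

```python
from typing import Dict, Set


def replace_decoy(roles: Dict[str, str], present: Set[str],
                  departing: str, new_role: str) -> Dict[str, str]:
    """Give `departing` a new role and promote one present buddy to decoy.

    Returns a new role mapping; the input mapping is left unchanged.
    """
    if roles.get(departing) != "decoy":
        raise ValueError(f"{departing} is not currently a decoy")
    # Only buddies that appear in the current medical table can take over.
    candidates = sorted(p for p, r in roles.items()
                        if r == "buddy" and p in present)
    if not candidates:
        raise LookupError("no present buddy available in this container")
    updated = dict(roles)
    updated[departing] = new_role
    updated[candidates[0]] = "decoy"  # assumed tie-break: first in sorted order
    return updated


# Roles inside CI(p6) before the update (presence of p7 is an assumption).
roles = {"p6": "decoy", "p7": "buddy", "p8": "buddy"}
present = {"p6", "p7"}

# Scenario 1: p6 now suffers from HIV, so it becomes an HIV-holder.
new_roles = replace_decoy(roles, present, departing="p6", new_role="holder")
print(new_roles)  # {'p6': 'holder', 'p7': 'decoy', 'p8': 'buddy'}
```

Because the published container looks the same before and after the swap, an observer of the published tables has no direct way to see which member currently holds the decoy role.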
84
Cohorts (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
p2         p1         p3
p4         p6         p5

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
Idea: We find another individual to take over its original role (i.e., an HIV-decoy).

p6 is replaced with a container CI(p6) whose QID ranges (Age, Zip Code) cover p6's QID attributes.
Since the container has broader ranges of QID values, it also covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys (HIV-buddies).
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (from the medical table), but can be found in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6: (Age, Zip Code) = (20, 29000)
2. p7: (Age, Zip Code) = (25, 33000)
3. p8: (Age, Zip Code) = (26, 30000)
Figure labels: HIV-decoy, HIV-buddy; present, absent.

Case 1: HIV-decoy. Case 2: HIV-holder.
e.g., p6 (an HIV-decoy) is absent from the current table, so p6 can no longer act as an HIV-decoy. An HIV-buddy in the container takes over the HIV-decoy role.
From the adversary's point of view, it is impossible to tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.
Note that, for Scenario 2, we update the roles of some individuals (i.e., decoy/holder/buddy) whenever new medical raw data arrives (e.g., at Time = 3).
85
Cohorts, with each individual shown with its container (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
CI(p2)     CI(p1)     CI(p3)
CI(p4)     CI(p6)     CI(p5)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent from this table.
Idea: We find another individual to take over its original role (i.e., an HIV-decoy).
86
3. Algorithm

Cohorts, with each individual shown with its container (p2 and p4 are HIV-holders; the others are HIV-decoys)
Cohort 1   Cohort 2   Cohort 3
CI(p2)     CI(p1)     CI(p3)
CI(p4)     CI(p6)     CI(p5)

We have just discussed how to update the role of each individual (i.e., decoy/holder) under the different scenarios whenever new medical raw data arrives.

Algorithm:
For the first medical raw table:
    Construct the cohorts with some method.
    Generate a published table according to the cohorts.
Whenever there is a new medical raw table:
    Update the roles of individuals according to the different scenarios.
    Generate some containers (if necessary).
    Generate a published table according to the cohorts.

Generating a published table according to the cohorts:
Repeat
    Pick one container from each cohort.
    Form one group by generalizing all these containers.
Until Cohort 1 is empty
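The repeat/until loop that turns cohorts into published groups can be sketched as below. This is a hedged illustration, not the authors' code: the names `generalize` and `form_groups` are hypothetical, containers are picked in list order (the slide does not say how to pick), all cohorts are assumed to hold the same number of containers, and every QID range except CI(p6) is invented for the example.

```python
from typing import List, Tuple

Range = Tuple[int, int]
Container = Tuple[str, Range, Range]  # (label, age range, zip code range)


def generalize(group: List[Container]) -> Tuple[Range, Range]:
    """Smallest Age and Zip Code ranges that cover every container in the group."""
    age = (min(c[1][0] for c in group), max(c[1][1] for c in group))
    zip_code = (min(c[2][0] for c in group), max(c[2][1] for c in group))
    return age, zip_code


def form_groups(cohorts: List[List[Container]]):
    """Repeat: pick one container per cohort and generalize, until Cohort 1 is empty."""
    remaining = [list(c) for c in cohorts]  # copy so the input is not modified
    groups = []
    while remaining[0]:
        # Assumption: take the first remaining container of each cohort.
        group = [cohort.pop(0) for cohort in remaining]
        groups.append(([c[0] for c in group], generalize(group)))
    return groups


# Hypothetical container ranges (only CI(p6) comes from the slides).
cohorts = [
    [("CI(p2)", (21, 23), (12000, 14000)), ("CI(p4)", (24, 26), (18000, 20000))],
    [("CI(p1)", (22, 23), (15000, 17000)), ("CI(p6)", (20, 26), (29000, 33000))],
    [("CI(p3)", (21, 22), (13000, 16000)), ("CI(p5)", (23, 25), (21000, 25000))],
]

groups = form_groups(cohorts)
for members, (age, zip_code) in groups:
    print(members, age, zip_code)
```

Each resulting group contains exactly one container from Cohort 1, so every published group has one HIV-holder slot surrounded by decoy slots from the other cohorts.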
87
3. Multiple Diseases
So far we have considered only the case where each individual is linked to one disease.
The approach can be extended to handle the case where each individual is linked to multiple diseases.