Published by Wyatt Burbridge. Modified over 10 years ago.
1
Simulatability
“The enemy knows the system”, Claude Shannon
CompSci 590.03
Instructor: Ashwin Machanavajjhala
Lecture 6 : 590.03 Fall 12
2
Announcements
Please meet with me at least 2 times before you finalize your project (deadline Sep 28).
3
Recap – L-Diversity
The link between identity and attribute value is the sensitive information.
  – “Does Bob have Cancer? Heart disease? Flu?”
  – “Does Umeko have Cancer? Heart disease? Flu?”
Adversary knows ≤ L-2 negation statements.
  – “Umeko does not have Heart Disease.”
  – Data Publisher may not know exact adversarial knowledge
Privacy is breached when identity can be linked to attribute value with high probability:
  Pr[ “Bob has Cancer” | published table, adv. knowledge] > t
4
Recap – 3-Diverse Table

Zip    Age   Nat.  Disease
1306*  <=40  *     Heart
1306*  <=40  *     Flu
1306*  <=40  *     Cancer
1306*  <=40  *     Cancer
1485*  >40   *     Cancer
1485*  >40   *     Heart
1485*  >40   *     Flu
1485*  >40   *     Flu
1305*  <=40  *     Heart
1305*  <=40  *     Flu
1305*  <=40  *     Cancer
1305*  <=40  *     Cancer

L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
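The principle above can be checked mechanically. The following is a minimal sketch, assuming the simplest ("distinct") reading of L-diversity: it only counts distinct sensitive values per Q-ID group and does not test the "roughly equal proportions" part. The function name is_distinct_l_diverse is hypothetical.

```python
from collections import defaultdict

def is_distinct_l_diverse(rows, l):
    """rows: iterable of (qid, sensitive_value) pairs.

    Returns True if every group of rows sharing a Q-ID has at least
    l distinct sensitive values. Proportions are NOT checked here.
    """
    groups = defaultdict(set)
    for qid, sensitive in rows:
        groups[qid].add(sensitive)
    return all(len(values) >= l for values in groups.values())

# The 3-diverse table from the slide: Q-ID = (Zip, Age, Nat.).
table = (
    [(("1306*", "<=40", "*"), d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
    + [(("1485*", ">40", "*"), d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [(("1305*", "<=40", "*"), d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
)

print(is_distinct_l_diverse(table, 3))
print(is_distinct_l_diverse(table, 4))
```

Each group holds Heart, Flu and Cancer, so the table passes for L = 3 and fails for L = 4.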
5
Outline
Simulatable Auditing
Minimality Attack in anonymization
Simulatable algorithms for anonymization
6
Query Auditing
Database has numeric values (say salaries of employees).
Database either truthfully answers a question or denies answering.
MIN, MAX, SUM queries over subsets of the database.
Question: When to allow/deny queries?
[Figure: Researcher sends a Query to the Database; “Safe to publish?” Yes / No]
7
Why should we deny queries?

Name  1st year PhD  Gender  Sensitive value
Ben   Y             M       1
Bha   N             M       1
Ios   Y             M       1
Jan   N             M       2
Jian  Y             M       2
Jie   N             M       1
Joe   N             M       2
Moh   N             M       1
Son   N             F       1
Xi    Y             F       3
Yao   N             M       2

Q1: Ben’s sensitive value? – DENY
Q2: Max sensitive value of males? – ANSWER: 2
Q3: Max sensitive value of 1st year PhD students? – ANSWER: 3
But Q3 + Q2 => Xi = 3
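The inference Q3 + Q2 => Xi = 3 can be reproduced from the answers alone. Below is a minimal sketch, assuming the attacker knows who is in each query set (gender and PhD year are public) but not the sensitive values. The helper pinned_by_max is a hypothetical name: it derives an upper bound per person from every MAX answer and reports anyone who is the only member of a query set able to attain that query's answer.

```python
def pinned_by_max(queries):
    """queries: list of (members, answer) for answered MAX queries.

    Returns {name: value} for every person whose value is forced.
    """
    upper = {}
    for members, answer in queries:
        for name in members:
            upper[name] = min(upper.get(name, answer), answer)
    pinned = {}
    for members, answer in queries:
        # Someone in the set must equal the maximum; who still can?
        attaining = [name for name in members if upper[name] == answer]
        if len(attaining) == 1:
            pinned[attaining[0]] = answer
    return pinned

# Public attributes from the slide's table: (name, 1st year PhD, gender).
people = [
    ("Ben", "Y", "M"), ("Bha", "N", "M"), ("Ios", "Y", "M"), ("Jan", "N", "M"),
    ("Jian", "Y", "M"), ("Jie", "N", "M"), ("Joe", "N", "M"), ("Moh", "N", "M"),
    ("Son", "N", "F"), ("Xi", "Y", "F"), ("Yao", "N", "M"),
]
males = {name for name, _, gender in people if gender == "M"}
first_years = {name for name, first_year, _ in people if first_year == "Y"}

leaked = pinned_by_max([(males, 2), (first_years, 3)])
print(leaked)
```

Every male is capped at 2 by Q2, so the only first-year student who can account for the answer 3 is Xi.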
8
Value-Based Auditing
Let a_1, a_2, …, a_k be the answers to previous queries Q_1, Q_2, …, Q_k.
Let a_{k+1} be the answer to Q_{k+1}.
a_i = f(c_{i1} x_1, c_{i2} x_2, …, c_{in} x_n), i = 1 … k+1
c_{im} = 1 if Q_i depends on x_m (and 0 otherwise)
Check if any x_j has a unique solution.
9
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
[Figure: data values x_1 … x_5]
10
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
-∞ ≤ x_1 … x_5 ≤ 10
11
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
-∞ ≤ x_1 … x_4 ≤ 8 => x_5 = 10
12
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Denial means some value can be compromised!
13
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
What could max(x_1, x_2, x_3, x_4) be?
14
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
From first answer, max(x_1, x_2, x_3, x_4) ≤ 10
15
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
If max(x_1, x_2, x_3, x_4) = 10, then no privacy breach.
16
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Hence, max(x_1, x_2, x_3, x_4) < 10 => x_5 = 10!
17
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Hence, max(x_1, x_2, x_3, x_4) < 10 => x_5 = 10!
Denials leak information.
Attack occurred since the privacy analysis did not assume that the attacker knows the algorithm.
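The leak can be replayed in a few lines. The sketch below is an illustration under stated assumptions, not the auditor from any particular system: NaiveAuditor and pinned are hypothetical names, values are reals with duplicates allowed, and indices are 0-based (so x_5 is index 4). The auditor looks at the true answer before deciding, and the attacker, who knows this rule, tests which hypothetical answers would have caused a denial.

```python
def pinned(queries):
    """Values forced by answered MAX queries: {index: value}."""
    upper = {}
    for members, answer in queries:
        for j in members:
            upper[j] = min(upper.get(j, answer), answer)
    forced = {}
    for members, answer in queries:
        attaining = [j for j in members if upper[j] == answer]
        if len(attaining) == 1:
            forced[attaining[0]] = answer
    return forced

class NaiveAuditor:
    """Value-based auditor: decides AFTER computing the true answer."""

    def __init__(self, data):
        self.data = data
        self.history = []

    def ask(self, members):
        answer = max(self.data[j] for j in members)
        if pinned(self.history + [(members, answer)]):
            return None  # DENY: answering would reveal some value
        self.history.append((members, answer))
        return answer

auditor = NaiveAuditor([8, 3, 5, 2, 10])  # secret data, x_5 = 10
q1, q2 = (0, 1, 2, 3, 4), (0, 1, 2, 3)
first = auditor.ask(q1)    # answered with 10
second = auditor.ask(q2)   # denied (true answer is 8)

# Attacker's reasoning, using only `first` and the fact of the denial:
denied_if_equal = bool(pinned([(q1, first), (q2, first)]))
leak_if_smaller = pinned([(q1, first), (q2, first - 1)])
print(first, second, denied_if_equal, leak_if_smaller)
```

An answer equal to 10 would not have been denied, so the denial itself tells the attacker the hidden answer is below 10, which forces x_5 = 10 (index 4).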
18
Simulatable Auditing [Kenthapadi et al PODS ‘05]
An auditor is simulatable if the decision to deny a query Q_k is made based on information already available to the attacker.
  – Can use queries Q_1, Q_2, …, Q_k and answers a_1, a_2, …, a_{k-1}
  – Cannot use a_k or the actual data to make the decision.
Denials provably do not leak information
  – Because the attacker could equivalently determine whether the query would be denied.
  – Attacker can mimic or simulate the auditor.
19
Simulatable Auditing Algorithm
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Before computing answer: DENY
  Ans > 10 => not possible
  Ans = 10 => -∞ ≤ x_1 … x_4 ≤ 10   SAFE
  Ans < 10 => x_5 = 10   UNSAFE
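The case analysis above (Ans > 10, Ans = 10, Ans < 10) can be automated. What follows is a minimal sketch in the spirit of the slide, not the exact algorithm of the cited paper: it assumes MAX queries over reals with duplicates allowed, and it probes one representative answer at, between, below and above the past answers. All function names are hypothetical. The decision never touches the data, so the attacker could run it too.

```python
def upper_bounds(queries):
    upper = {}
    for members, answer in queries:
        for j in members:
            upper[j] = min(upper.get(j, answer), answer)
    return upper

def consistent(queries):
    """Some dataset exists that produces all these MAX answers."""
    upper = upper_bounds(queries)
    return all(any(upper[j] == a for j in m) for m, a in queries)

def pinned(queries):
    upper = upper_bounds(queries)
    forced = {}
    for members, answer in queries:
        attaining = [j for j in members if upper[j] == answer]
        if len(attaining) == 1:
            forced[attaining[0]] = answer
    return forced

def candidate_answers(history):
    """One representative per region defined by the past answers."""
    past = sorted({a for _, a in history})
    if not past:
        return [0.0]
    points = [past[0] - 1.0]
    for lo, hi in zip(past, past[1:]):
        points += [lo, (lo + hi) / 2.0]
    return points + [past[-1], past[-1] + 1.0]

def simulatable_decision(history, members):
    """Uses past queries/answers and the new query set only, never the data."""
    for a in candidate_answers(history):
        trial = history + [(members, a)]
        if consistent(trial) and pinned(trial):
            return "DENY"   # some possible answer would reveal a value
    return "ANSWER"

history = [((0, 1, 2, 3, 4), 10)]
print(simulatable_decision([], (0, 1, 2, 3, 4)))
print(simulatable_decision(history, (0, 1, 2, 3)))
print(simulatable_decision(history, (0, 1, 2, 3, 4)))
```

The second query is denied whatever the data are, because a possible answer below 10 would be unsafe; the denial therefore carries no information about the true answer.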
20
Summary of Simulatable Auditing
In some (many!) cases, the decision to deny an answer must be based on the past queries that were answered.
Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query.
21
Outline
Simulatable Auditing
Minimality Attack in anonymization
Simulatable algorithms for anonymization
22
Minimality attack on Generalization algorithms
Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility.
  – Find a minimally generalized table in the lattice that satisfies privacy, and maximizes utility.
But … attacker also knows this algorithm!
23
Example Minimality attack [Wong et al VLDB07]
Dataset with one quasi-identifier and 2 values q1, q2.
q1, q2 generalize to Q.
Sensitive attribute: Cancer – yes/no
We want to ensure P[Cancer = yes] ≤ ½.
  – OK to know if an individual does not have Cancer.
Published Table:

QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
24
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”) and possible input datasets with 3 occurrences of q1:

Output          Input           Input
QID  Cancer     QID  Cancer     QID  Cancer
Q    Yes        q1   Yes        q1   Yes
Q    Yes        q1   Yes        q1   No
Q    No         q1   No         q1   No
Q    No         q2   No         q2   Yes
q2   No         q2   No         q2   No
q2   No         q2   No         q2   No
25
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”); possible input dataset: 3 occurrences of q1

Output          Alternative output
QID  Cancer     QID  Cancer
Q    Yes        q1   Yes
Q    Yes        Q    No
Q    No         Q
Q    No         q2   Yes
q2   No         q2   No
q2   No         q2   No

This is a better generalization!
26
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”) and possible input datasets with 1 occurrence of q1:

Output          Input           Input
QID  Cancer     QID  Cancer     QID  Cancer
Q    Yes        q2   Yes        q2   Yes
Q    Yes        q1   Yes        q2   Yes
Q    No         q2   No         q1   No
Q    No         q2   No         q2   No
q2   No         q2   No         q2   No
q2   No         q2   No         q2   No
27
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”); possible input dataset: 3 occurrences of q1

Output          Alternative output
QID  Cancer     QID  Cancer
Q    Yes        q2   Yes
Q    Yes        Q    No
Q    No         Q
Q    No         q2   Yes
q2   No         q2   No
q2   No         q2   No

This is a better generalization!
28
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”); possible input dataset: 3 occurrences of q1

Output          Alternative output
QID  Cancer     QID  Cancer
Q    Yes        q2   Yes
Q    Yes        Q    No
Q    No         Q
Q    No         q2   Yes
q2   No         q2   No
q2   No         q2   No

There must be exactly two tuples with q1
29
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”) and possible input datasets with 2 occurrences of q1:

Output          Input           Input           Input
QID  Cancer     QID  Cancer     QID  Cancer     QID  Cancer
Q    Yes        q1   Yes        q2   Yes        q1   Yes
Q    Yes        q1   Yes        q2   Yes        q2   Yes
Q    No         q2   No         q1   No         q1   No
Q    No         q2   No         q1   No         q2   No
q2   No         q2   No         q2   No         q2   No
q2   No         q2   No         q2   No         q2   No

Already satisfies privacy
30
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”) and possible input datasets with 2 occurrences of q1:

Output          Input           Input
QID  Cancer     QID  Cancer     QID  Cancer
Q    Yes        q1   Yes        q2   Yes
Q    Yes        q1   Yes        q2   Yes
Q    No         q2   No         q1   No
Q    No         q2   No         q1   No
q2   No         q2   No         q2   No
q2   No         q2   No         q2   No

Learning Cancer=NO is OK. Hence, this is private.
31
Which input datasets could have led to the published table?
Output dataset ({q1,q2} → Q, “2-diverse”) and possible input dataset with 2 occurrences of q1:

Output          Input
QID  Cancer     QID  Cancer
Q    Yes        q1   Yes
Q    Yes        q1   Yes
Q    No         q2   No
Q    No         q2   No
q2   No         q2   No
q2   No         q2   No

This is the ONLY input that results in the output!
P[Cancer = yes | q1] = 1
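The claim that only one input leads to the published table can be checked by brute force. The sketch below rests on several assumptions that are not in the slides: a table is summarized by counts (q1 Yes, q1 No, q2 Yes, q2 No); the publisher moves as few tuples as possible into the generalized group Q (tuple count stands in for the utility measure); and a group is private when at most half of its tuples have Cancer = Yes. The names ok and minimal_outputs are hypothetical.

```python
from itertools import product

def ok(yes, no):
    """Group is private if at most half of it has Cancer = Yes."""
    return 2 * yes <= yes + no

def minimal_outputs(q1y, q1n, q2y, q2n):
    """All outputs that generalize the fewest tuples to Q.

    An output is ((Q yes, Q no), (q1 yes, q1 no), (q2 yes, q2 no)).
    """
    best_cost, outputs = None, set()
    for a, b, c, d in product(range(q1y + 1), range(q1n + 1),
                              range(q2y + 1), range(q2n + 1)):
        groups = ((a + c, b + d), (q1y - a, q1n - b), (q2y - c, q2n - d))
        if not all(ok(y, n) for y, n in groups):
            continue
        cost = a + b + c + d  # number of tuples generalized
        if best_cost is None or cost < best_cost:
            best_cost, outputs = cost, {groups}
        elif cost == best_cost:
            outputs.add(groups)
    return outputs

# Published table: Q has 2 Yes / 2 No, no q1 rows, q2 has 2 No.
published = ((2, 2), (0, 0), (0, 2))

# The attacker sees 2 Yes and 4 No overall and tries every split.
consistent_inputs = []
for q1y in range(3):
    for q1n in range(5):
        candidate = (q1y, q1n, 2 - q1y, 4 - q1n)
        if published in minimal_outputs(*candidate):
            consistent_inputs.append(candidate)

print(consistent_inputs)
```

Under these assumptions the only surviving input has both q1 tuples with Cancer = Yes, which is the slide's conclusion P[Cancer = yes | q1] = 1.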
32
Outline
Simulatable Auditing
Minimality Attack in anonymization
Transparent Anonymization: Simulatable algorithms for anonymization
33
Transparent Anonymization
Assume that the adversary knows the algorithm that is being used.
  O: Output table
  I(O, A): Input tables that result in O due to algorithm A
  I: All possible input tables
34
Transparent Anonymization
Privacy must be guaranteed with respect to I(O, A).
  – Probability must be computed assuming I(O, A) is the actual set of all possible input tables.
What is an efficient algorithm for Transparent Anonymization?
  – For L-diversity?
35
Ace Algorithm [Xiao et al TODS’10]
Step 1: Assign
  Based only on the sensitive values, construct (in a randomized fashion) an intermediate L-diverse generalization.
Step 2: Split
  Based only on the quasi-identifier values (and without looking at sensitive values), deterministically refine the intermediate solution to maximize utility.
36
Step 1: Assign
[Figure: Input Table]
37
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
  – Remove α tuples each from the β (≥L) most frequent sensitive values
38
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
  – Remove α tuples each from the β (≥L) most frequent sensitive values
  – 1st iteration: β=2, α=2
39
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
  – Remove α tuples each from the β (≥L) most frequent sensitive values
  – 2nd iteration: β=2, α=1
40
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
  – Remove α tuples each from the β (≥L) most frequent sensitive values
  – 3rd iteration: β=2, α=1
41
Intermediate Generalization

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000
Don   32   35000
Fred  60   63000
Hera  60   63000
Cate  32   35000

Disease: Dyspepsia, Flu, Bronchitis, Gastritis, Diabetes, Gastritis
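The iterative removal in the Assign step can be sketched as follows. This is a simplified variant under explicit assumptions, not the published Ace procedure: it fixes β = L and α = 1 in every iteration (the slides vary α and β per iteration), it raises an error instead of handling leftover tuples, and the disease assigned to each person below is hypothetical, since the slide lists the diseases but not who has which. Quasi-identifiers are never consulted.

```python
import random
from collections import defaultdict

def assign(records, l, rng):
    """records: list of (name, sensitive_value).

    Returns buckets, each holding one tuple from each of l distinct
    sensitive values. Only sensitive values drive the grouping.
    """
    pools = defaultdict(list)
    for name, value in records:
        pools[value].append(name)
    buckets = []
    while pools:
        # The l most frequent remaining values (ties broken by name).
        top = sorted(pools, key=lambda v: (-len(pools[v]), v))[:l]
        if len(top) < l:
            raise ValueError("leftover tuples cannot form an l-diverse bucket")
        bucket = []
        for value in top:
            # Randomized choice of which tuple with this value is taken.
            name = pools[value].pop(rng.randrange(len(pools[value])))
            bucket.append((name, value))
            if not pools[value]:
                del pools[value]
        buckets.append(bucket)
    return buckets

# Hypothetical disease assignment for the eight people on the slide.
records = [
    ("Ann", "dyspepsia"), ("Bob", "flu"), ("Gill", "flu"), ("Ed", "dyspepsia"),
    ("Don", "bronchitis"), ("Fred", "gastritis"),
    ("Hera", "diabetes"), ("Cate", "gastritis"),
]
buckets = assign(records, 2, random.Random(0))
for bucket in buckets:
    print(bucket)
```

With α fixed at 1 the sketch produces four buckets of two tuples rather than the slide's three buckets; the diversity property per bucket is the same.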
42
Step 2: Split
If a bucket B contains α>1 tuples of each sensitive value, split it into two buckets, B_a and B_b, s.t.
  – Pick 1 ≤ α_a < α tuples from each sensitive value in bucket B, and put them in bucket B_a. The remaining tuples go to B_b.
  – The division (B_a, B_b) is optimal in terms of utility.

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000
Don   32   35000
Fred  60   63000
Hera  60   63000
Cate  32   35000
43
Why does the Ace algorithm satisfy Transparent L-Diversity?
Privacy must be guaranteed with respect to I(O, A).
  – Probability must be computed assuming I(O, A) is the actual set of all possible input tables.
  O: Output table
  I(O, A): Input tables that result in O due to algorithm A
  I: All possible input tables
44
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
Consider an intermediate output Int.
Suppose there is some input table T such that Assign(T) = Int.
Any other table T’, where the sensitive values of 2 individuals in the same group are swapped, also leads to the same intermediate output Int.
45
Ace algorithm analysis
Both tables result in the same intermediate output.
46
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
Consider an intermediate output Int.
Suppose there is some input table T such that Assign(T) = Int.
Any other table T’, where the sensitive values of 2 individuals in the same group are swapped, also leads to the same intermediate output.
The set of input tables I(Int, A) contains all possible assignments of diseases to individuals within each group of Int.
47
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
The set of tables I(Int, A) contains all possible assignments of diseases to individuals in each group of Int.
P[Ann has dyspepsia | I(Int, A) and Int] = 1/2

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000

Disease: Dyspepsia, Flu
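The value 1/2 follows from counting assignments. A small sketch, assuming the bucket {Ann, Bob, Gill, Ed} holds two dyspepsia and two flu tuples (the α = 2, β = 2 first iteration) and that every assignment of these diseases to the four people is equally likely:

```python
from fractions import Fraction
from itertools import permutations

people = ("Ann", "Bob", "Gill", "Ed")
diseases = ("dyspepsia", "dyspepsia", "flu", "flu")

# Distinct ways to hand the bucket's diseases to its members.
assignments = set(permutations(diseases))
ann = people.index("Ann")
favourable = [a for a in assignments if a[ann] == "dyspepsia"]

prob = Fraction(len(favourable), len(assignments))
print(len(assignments), prob)
```

There are 6 distinct assignments and Ann has dyspepsia in 3 of them.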
48
Ace algorithm analysis
Lemma 2: The split phase also satisfies transparent L-diversity.
Proof (sketch):
I(Int, Assign) contains all tables where an individual is assigned to an arbitrary sensitive value within the same group in Int.
Suppose some input table T ∈ I(Int, Assign) results in the final output O after Split.
49
Ace algorithm analysis
Split does not depend on the sensitive values.
[Figure: the bucket {Ann, Gill, Bob, Ed} with {dyspepsia, flu} results in buckets {Ann, Bob} and {Gill, Ed}, each with {dyspepsia, flu}; the bucket {Bob, Ed, Ann, Gill} with {dyspepsia, flu} results in buckets {Bob, Ann} and {Ed, Gill}, each with {dyspepsia, flu}]
50
Ace algorithm analysis
If T ∈ I(Int, Assign), and it results in O after split,
then T’ ∈ I(Int, Assign), and it results in O after split.
[Figure: Table T and Table T’]
51
Ace algorithm analysis
Lemma 2: The split phase also satisfies transparent L-diversity.
Proof (sketch):
Let T’ be generated by “swapping diseases” in some bucket.
If T ∈ I(Int, Assign), and it results in O after split,
then T’ ∈ I(Int, Assign), and it results in O after split.
For any individual it is equally likely that the sensitive value is one of ≥L choices.
Therefore, P[individual has disease | I(O, Ace)] ≤ 1/L
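The final bound can be spelled out by counting. A worked derivation, assuming a bucket with β ≥ L distinct sensitive values and α tuples of each, and assuming every assignment of the bucket's sensitive values to its αβ individuals is equally likely (the symbols N and N_s are introduced here for illustration only):

```latex
% N   : number of distinct assignments in the bucket
% N_s : number of those in which a fixed individual receives value s
\begin{align*}
N   &= \frac{(\alpha\beta)!}{(\alpha!)^{\beta}}, \\
N_s &= \frac{(\alpha\beta-1)!}{(\alpha-1)!\,(\alpha!)^{\beta-1}}, \\
\Pr[\text{individual has } s]
    &= \frac{N_s}{N}
     = \frac{\alpha!}{(\alpha-1)!}\cdot\frac{(\alpha\beta-1)!}{(\alpha\beta)!}
     = \frac{\alpha}{\alpha\beta}
     = \frac{1}{\beta}
     \;\le\; \frac{1}{L}.
\end{align*}
```

Equality holds when the bucket has exactly L distinct sensitive values, which is why the bound is not strict.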
52
Summary
Many systems assume privacy/security is guaranteed by assuming the adversary does not know the algorithm.
  – This is bad …
Simulatable algorithms avoid this problem
  – Ideally choices made by the algorithm should be simulatable by the adversary.
Anonymization algorithms are also susceptible to adversaries who know the algorithm or the objective function.
Transparent anonymization limits the inference an attacker (who knows the algorithm) can make about sensitive values.
53
Next Class
Composition of privacy
Differential Privacy
54
References
A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, “L-Diversity: Privacy beyond k-anonymity”, ICDE 2006
K. Kenthapadi, N. Mishra, K. Nissim, “Simulatable Auditing”, PODS 2005
R. Wong, A. Fu, K. Wang, J. Pei, “Minimality attack in privacy preserving data publishing”, VLDB 2007
X. Xiao, Y. Tao, N. Koudas, “Transparent Anonymization: Thwarting adversaries who know the algorithm”, TODS 2010