1
Foundations of Privacy, Lecture 5
Lecturer: Moni Naor
2
Desirable Properties of a Sanitization Mechanism
Composability
– applying the sanitization several times yields a graceful degradation of privacy
– will see: t releases, each ε-DP, are t·ε-DP
– next class: roughly (√t·ε + tε², δ)-DP
Robustness to side information
– no need to specify exactly what the adversary knows
– the adversary may know everything except one row
Differential privacy satisfies both…
3
Differential Privacy
Protect individual participants. [Dwork, McSherry, Nissim & Smith 2006]
(Figure: two neighboring databases D1 and D2 are fed to the same curator/sanitizer M.)
4
Differential Privacy
Adjacency: D+I and D-I (the database with and without individual I).
Protect individual participants: the probability of every bad event (or any event) increases only by a small multiplicative factor when I enter the DB. May as well participate in the DB…
ε-differentially private sanitizer M: for all DBs D, all individuals I and all events T,
e^(-ε) ≤ Pr[M(D+I) ∈ T] / Pr[M(D-I) ∈ T] ≤ e^ε ≈ 1+ε.
Handles auxiliary input.
5
Differential Privacy
Sanitizer M gives ε-differential privacy if for all adjacent D1 and D2 (differing in one user) and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A].
Participation in the data set poses no additional risk.
(Figure: the ratio Pr[response] is bounded, even on bad responses Z.)
6
Example of Differential Privacy
X is a set of (name, tag ∈ {0,1}) tuples.
One query: # of participants with tag=1.
Sanitizer: output the # of 1's + noise, where the noise is drawn from the Laplace distribution with parameter 1/ε.
Then Pr[noise = k-1] ≈ e^ε · Pr[noise = k].
(Figure: the Laplace density and its shift by 1.)
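A minimal Python sketch of this sanitizer (the database and names below are toy placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(records, eps):
    """Release the number of records with tag = 1, plus Lap(1/eps) noise.

    Changing one record changes the true count by at most 1, so adding
    Laplace noise with scale 1/eps makes the release eps-differentially private.
    """
    true_count = sum(tag for _, tag in records)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / eps)

# Toy database of (name, tag) tuples.
db = [("alice", 1), ("bob", 0), ("carol", 1), ("dave", 0), ("eve", 1)]
print(noisy_count(db, eps=0.5))
```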
7
(ε, δ)-Differential Privacy
Sanitizer M gives (ε, δ)-differential privacy if for all adjacent D1 and D2 and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A] + δ.
In this course: δ is negligible.
(Figure: the Pr[response] ratio is bounded, except on bad responses Z of total probability at most δ.)
8
Example: NO Differential Privacy
U is a set of (name, tag ∈ {0,1}) tuples.
One counting query: # of participants with tag=1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released.
Pr[A(D+Me) ∈ T] ≥ 1/n, while Pr[A(D-Me) ∈ T] = 0, so no factor e^ε bounds the ratio.
Not ε-differentially private for any ε! It is, however, (0, 1/n)-differentially private.
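A small simulation of this failure, assuming the mechanism releases one uniformly random tag value (a hypothetical toy version of "release a few random tags"):

```python
import random

def release_random_tag(tags):
    """Non-private 'sanitizer': publish one tag value sampled at random."""
    return random.choice(tags)

n = 1000
db_without_me = [0] * (n - 1)            # everyone else has tag 0
db_with_me = db_without_me + [1]         # I am the only participant with tag 1

trials = 100_000
# Bad event T: a tag with value 1 appears in the release.
hits_with_me = sum(release_random_tag(db_with_me) == 1 for _ in range(trials))
hits_without_me = sum(release_random_tag(db_without_me) == 1 for _ in range(trials))

print(f"Pr[T | D+Me] ≈ {hits_with_me / trials:.4f}  (about 1/n = {1/n})")
print(f"Pr[T | D-Me] = {hits_without_me / trials}   (exactly 0)")
```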
9
Counting Queries
Q is a set of predicates q: U → {0,1}.
Database x of size n: n individuals, each contributing a single point in the universe U.
Query q: how many participants satisfy q? (Sometimes we talk about the fraction instead.)
Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: some error is anyway inherent in statistical analysis.
10
Bound on Achievable Privacy
Want to get bounds on:
– the accuracy α: the responses from the mechanism to all queries are within α except with small probability
– the number of queries t for which we can receive accurate answers
– the privacy parameter ε for which ε-differential privacy, or (ε, δ)-differential privacy, is achievable
11
Blatant Non-Privacy
Mechanism M is blatantly non-private if there is an adversary A that, on any database D of size n, can select queries and use the responses M(D) to reconstruct D' such that ||D - D'||_1 ∈ o(n), i.e. D' agrees with D in all but o(n) of the entries.
Claim: blatant non-privacy implies that M is not (ε, δ)-DP for any constant ε and δ.
12
Sanitization Can't be Too Accurate
Usual counting queries. Query: q ⊆ [n]; true answer: Σ_{i∈q} d_i; response = answer + noise.
Blatant non-privacy: the adversary guesses 99% of the bits.
Theorem: if all responses are within o(n) of the true answer, then the algorithm is blatantly non-private.
But: this requires an exponential number of queries.
13
Proof: Exponential Adversary
Focus on the column containing the super-private bit: "the database" is a vector d ∈ {0,1}^n.
Assume all answers are within error bound E.
Will show that E cannot be o(n).
14
Proof: Exponential Adversary for Blatant Non-Privacy
Estimate the # of 1's in all possible sets: for every S ⊆ [n], obtain M(S), the answer on S, with |M(S) - Σ_{i∈S} d_i| ≤ E.
Weed out "distant" DBs: for each candidate database c ∈ {0,1}^n, if for any S ⊆ [n] we have |Σ_{i∈S} c_i - M(S)| > E, rule out c.
If c is not ruled out, halt and output c.
Claim: the real database d won't be ruled out.
15
Proof: Exponential Adversary
Assume: for all S ⊆ [n], |M(S) - Σ_{i∈S} d_i| ≤ E.
Claim: any c that has not been ruled out has Hamming distance(c, d) ≤ 4E.
Let S0 = {i : c_i = 0} and S1 = {i : c_i = 1}. Since c was not ruled out, |M(S0) - Σ_{i∈S0} c_i| ≤ E and |M(S1) - Σ_{i∈S1} c_i| ≤ E. Combined with the assumption on M, the number of entries where c and d disagree is at most 2E within S0 and at most 2E within S1, so at most 4E = o(n) in total.
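A minimal brute-force sketch of this exponential adversary (toy sizes only; the subset-answer oracle and the ±1 perturbation below are hypothetical stand-ins for the mechanism M):

```python
import random
from itertools import chain, combinations, product

def all_subsets(n):
    """All subsets of {0, ..., n-1}, as tuples of indices."""
    return chain.from_iterable(combinations(range(n), r) for r in range(n + 1))

def exponential_adversary(answer, n, error_bound):
    """Reconstruct a database close to d from perturbed subset-sum answers.

    `answer(S)` is assumed to be within `error_bound` of sum_{i in S} d_i.
    Returns the first candidate c that no subset query rules out; by the
    argument above it is within Hamming distance 4*error_bound of d.
    """
    subsets = list(all_subsets(n))
    responses = {S: answer(S) for S in subsets}          # 2^n queries
    for c in product([0, 1], repeat=n):                  # 2^n candidates
        if all(abs(sum(c[i] for i in S) - responses[S]) <= error_bound
               for S in subsets):
            return list(c)

# Toy usage: n = 8, every answer perturbed by at most 1.
d = [random.randint(0, 1) for _ in range(8)]
noisy = lambda S: sum(d[i] for i in S) + random.choice([-1, 0, 1])
c = exponential_adversary(noisy, 8, 1)
print("Hamming distance:", sum(a != b for a, b in zip(d, c)))
```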
16
Impossibility of Exponential Queries
The result means that we cannot sanitize the data and publish a data structure from which, for all queries, the answer can be deduced correctly to within o(n).
(Figure: a sanitizer publishes a data structure; answers to query 1, query 2, … are deduced from it.)
On the other hand: we will see that we can get accuracy that scales with log|Q|.
17
What can we do efficiently?
So far we allowed the adversary "too" much power: an exponential number of queries and exponential computation. On the other hand, we assumed a lack of wild errors in the responses.
Theorem: for any sanitization algorithm, if all responses are within o(√n) of the true answer, then it is blatantly non-private, even against a polynomial-time adversary making O(n log² n) random queries.
18
The Model
As before: the database d is a bit string of length n.
Counting queries:
– a query is a subset q ⊆ {1, …, n}
– the (exact) answer is a_q = Σ_{i∈q} d_i
E-perturbation: for an answer, release a_q ± E.
19
What If We Had Exact Answers?
Consider a mechanism with 0-perturbation: we receive the exact answer a_q = Σ_{i∈q} d_i.
Then with n linearly independent queries (over the reals) we could reconstruct d precisely: obtain n linear equations a_q = Σ_{i∈q} c_i and solve uniquely (see the sketch below).
When we have E-perturbations we instead get, for each query, the inequality a_q - E ≤ Σ_{i∈q} c_i ≤ a_q + E.
Idea: use linear programming. A solution must exist: d itself.
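A minimal numpy sketch of the exact-answer case (toy size; the random 0/1 queries are a hypothetical choice that is linearly independent with high probability):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
d = rng.integers(0, 2, size=n)                 # hidden database (toy data)

# Draw random 0/1 subset queries until the n queries are linearly independent over the reals.
Q = rng.integers(0, 2, size=(n, n))
while np.linalg.matrix_rank(Q) < n:
    Q = rng.integers(0, 2, size=(n, n))

answers = Q @ d                                # exact answers a_q = sum_{i in q} d_i
c = np.linalg.solve(Q.astype(float), answers.astype(float))
print(np.allclose(np.round(c), d))             # True: exact reconstruction
```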
20
Privacy requires Ω(√n) perturbation
Consider a database with o(√n) perturbation. The adversary makes t = n log² n random queries q_j, getting noisy answers a_j.
Privacy-violating algorithm: construct a database c = {c_i}_{1 ≤ i ≤ n} by solving the linear program
0 ≤ c_i ≤ 1 for 1 ≤ i ≤ n
a_j - E ≤ Σ_{i∈q_j} c_i ≤ a_j + E for 1 ≤ j ≤ t
and round the solution: if c_i > 1/2 set it to 1, otherwise to 0.
A solution must exist: d itself. For every query q_j, its answer according to c is at most 2E far from its (real) answer in d.
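A minimal scipy sketch of this LP-based attack (toy parameters; the random queries and the ±E integer noise below are hypothetical stand-ins for the mechanism's responses):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 64
t = n * int(np.log2(n)) ** 2                         # t = n log^2 n queries
E = 2                                                # perturbation, o(sqrt(n)) in spirit

d = rng.integers(0, 2, size=n)                       # hidden database (toy data)
Q = rng.integers(0, 2, size=(t, n))                  # random subset queries
noisy = Q @ d + rng.integers(-E, E + 1, size=t)      # answers perturbed by at most E

# Feasibility LP: 0 <= c_i <= 1 and |sum_{i in q_j} c_i - a_j| <= E for every query.
A_ub = np.vstack([Q, -Q])
b_ub = np.concatenate([noisy + E, -(noisy - E)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs")

reconstruction = (res.x > 0.5).astype(int)           # round each coordinate
print("fraction of entries recovered:", (reconstruction == d).mean())
```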
21
Bad solutions to the LP do not survive
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2E far from the answer in d: |Σ_{i∈q} c_i - Σ_{i∈q} d_i| > 2E.
Idea: show that for a database c that is far away from d, a random query disqualifies c with some constant probability γ.
Want to use the union bound: all far-away solutions are disqualified w.p. at least 1 - n^n (1 - γ)^t = 1 - neg(n).
How do we limit the solution space? Round each value to the closest multiple of 1/n.
22
Privacy requires Ω(√n) perturbation
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2E far from the answer in d.
Lemma: if c is far away from d, then a random query disqualifies c with some constant probability.
If Pr_{i∈[n]}[|d_i - c_i| ≥ 1/3] > β, then there is a γ > 0 such that Pr_{q⊆[n]}[|Σ_{i∈q}(c_i - d_i)| ≥ 2E + 1] > γ.
The proof uses Azuma's inequality.
23
Privacy requires Ω(√n) perturbation
Can discretize all potential databases c ∈ [0,1]^n: round each entry c_i to the closest fraction with denominator n, so |c_i - w_i/n| ≤ 1/n.
The response on any q then changes by at most 1.
If we disqualify all "discrete" databases then we also effectively eliminate all c ∈ [0,1]^n.
There are n^n "discrete" databases.
24
Privacy requires Ω(√n) perturbation
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2E far from the answer in d (count the number of entries far from d).
Claim: if c is far away from d, then a random query disqualifies c with some constant probability γ.
Therefore t = n log² n queries leave only a negligible probability for each far-away reconstruction.
Union bound (applicable thanks to the discretization): all far-away suggestions are disqualified w.p. at least 1 - n^n (1 - γ)^t = 1 - neg(n).
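To see why t = n log² n queries suffice, a short calculation with the constant disqualification probability γ:

```latex
n^{n}\,(1-\gamma)^{t}
  \;\le\; e^{\,n\ln n}\; e^{-\gamma t}
  \;=\; e^{\,n\ln n \,-\, \gamma\, n\log^{2} n}
  \;=\; e^{-\Omega(n\log^{2} n)},
```

which is negligible in n, since n ln n = o(n log² n).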
25
Review and Conclusion
When the perturbation is o(√n), choosing Õ(n) random queries gives enough information to efficiently reconstruct an o(n)-close database.
The database is reconstructed using linear programming, in polynomial time.
In short: databases sanitized with o(√n) perturbation are blatantly non-private, and reconstructable in poly(n) time.
26
Composition
Suppose we are going to apply a DP mechanism t times, perhaps on different databases. Want to argue that the result is differentially private.
A value b ∈ {0,1} is chosen. In each of the t rounds the adversary A picks two adjacent databases D_i^0 and D_i^1 and receives the result z_i of an ε-DP mechanism M_i on D_i^b.
Want to argue that A's view has nearly the same distribution for both values of b, where A's view is (z_1, z_2, …, z_t) plus the randomness used.
27
Differential Privacy: Composition
Handles auxiliary information; composes naturally.
If A_1(D) is ε_1-diffP and, for all z_1, A_2(D, z_1) is ε_2-diffP, then A_2(D, A_1(D)) is (ε_1+ε_2)-diffP.
Proof: for all adjacent D, D' and all (z_1, z_2), write
P[z_1] = Pr_{z~A_1(D)}[z = z_1],  P'[z_1] = Pr_{z~A_1(D')}[z = z_1],
P[z_2] = Pr_{z~A_2(D, z_1)}[z = z_2],  P'[z_2] = Pr_{z~A_2(D', z_1)}[z = z_2].
Then
e^(-ε_1) ≤ P[z_1] / P'[z_1] ≤ e^(ε_1) and e^(-ε_2) ≤ P[z_2] / P'[z_2] ≤ e^(ε_2),
so e^(-(ε_1+ε_2)) ≤ P[(z_1, z_2)] / P'[(z_1, z_2)] ≤ e^(ε_1+ε_2).
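A small sketch of how this composition rule is used in practice, assuming the Laplace counting mechanism from earlier (the database, predicates, and budget split below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def laplace_count(db, predicate, eps):
    """eps-DP count of records satisfying `predicate` (sensitivity 1)."""
    true_count = sum(predicate(row) for row in db)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / eps)

db = [{"age": a, "tag": t}
      for a, t in zip(rng.integers(18, 90, 500), rng.integers(0, 2, 500))]

# Two releases about the same database: by the composition theorem the
# overall mechanism is (eps1 + eps2)-DP, here 0.5 + 0.5 = 1.0.
eps1, eps2 = 0.5, 0.5
r1 = laplace_count(db, lambda row: row["tag"] == 1, eps1)
r2 = laplace_count(db, lambda row: row["age"] >= 65, eps2)
print(f"noisy counts: {r1:.1f}, {r2:.1f}; total budget eps = {eps1 + eps2}")
```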
28
Differential Privacy: Composition
If all mechanisms M_i are ε-DP, then for any view, the probabilities that A gets that view when b=0 and when b=1 are within a factor e^(tε) of each other.
Therefore results for a single query translate to results on several queries.
29
Answering a single counting query
U is a set of (name, tag ∈ {0,1}) tuples.
One counting query: # of participants with tag=1.
Sanitizer A: output the # of 1's + noise.
Differentially private, if we choose the noise properly: choose the noise from the Laplace distribution.
30
Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b) e^(-|y|/b).
Standard deviation: O(b).
Take b = 1/ε; then Pr[Y = y] ∝ e^(-ε|y|).
(Figure: the Laplace density, centered at 0.)
31
Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y = y] ∝ e^(-ε|y|).
Release: q(D) + Lap(1/ε).
For adjacent D, D': |q(D) - q(D')| ≤ 1.
For every output a: e^(-ε) ≤ Pr_D[a] / Pr_D'[a] ≤ e^ε.
32
Laplacian Noise: ε-Privacy
Theorem: the Laplace mechanism with parameter b = 1/ε is ε-differentially private.
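The one-line calculation behind the theorem, using the density above and the sensitivity bound |q(D) - q(D')| ≤ 1:

```latex
\frac{\Pr[M(D)=a]}{\Pr[M(D')=a]}
  = \frac{\tfrac{\varepsilon}{2}\, e^{-\varepsilon |a - q(D)|}}
         {\tfrac{\varepsilon}{2}\, e^{-\varepsilon |a - q(D')|}}
  = e^{\varepsilon\,\bigl(|a - q(D')| - |a - q(D)|\bigr)}
  \;\le\; e^{\varepsilon\,|q(D) - q(D')|}
  \;\le\; e^{\varepsilon},
```

by the triangle inequality; symmetrically the ratio is at least e^(-ε).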
33
Laplacian Noise: Õ(1/ε) Error
Take b = 1/ε, so Pr[Y = y] ∝ e^(-ε|y|).
Concentration of the Laplace distribution: Pr_{y~Y}[|y| > k·(1/ε)] = O(e^(-k)).
Setting k = O(log n): the expected error is 1/ε, and w.h.p. the error is Õ(1/ε).
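The tail bound follows by integrating the density:

```latex
\Pr\!\left[\,|Y| > \tfrac{k}{\varepsilon}\,\right]
  \;=\; 2\int_{k/\varepsilon}^{\infty} \tfrac{\varepsilon}{2}\, e^{-\varepsilon y}\, dy
  \;=\; e^{-k},
```

so taking k = O(log n) keeps the error at O((log n)/ε) = Õ(1/ε) except with probability 1/poly(n).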