1
Foundations of Privacy Lecture 11 Lecturer: Moni Naor
2
Recap of recent lecture
Continually changing data
–Counters
–How to combine expert advice
–Multi-counter and the list update problem
Pan privacy
General transformation to continual output
3
The Dynamic Privacy Zoo
Differentially private outputs
Privacy under continual observation
Pan privacy
User-level privacy
Continual pan privacy
Petting
Sketch vs. stream
4
Sanitization Can't be Too Accurate
Usual counting queries
–Query: $q \subseteq [n]$
–Answer: $\sum_{i \in q} d_i$; Response = Answer + noise
Blatant non-privacy: the adversary correctly guesses 99% of the bits
Theorem: If all responses are within $o(n)$ of the true answer, then the algorithm is blatantly non-private.
But: the attack requires an exponential number of queries.
5
Proof: Exponential Adversary
Focus on the column containing the super-private bit
“The database” $d$: 0 1 1 1 1 0 0
Assume all answers are within error bound $E$.
Will show that $E$ cannot be $o(n)$
6
Proof: Exponential Adversary
Estimate the number of 1's in all possible sets
–$\forall S \subseteq [n]$: $|K(S) - \sum_{i \in S} d_i| \le E$, where $K(S)$ is the response on $S$ and $\sum_{i \in S} d_i$ is the real answer on $S$
Weed out “distant” DBs
–For each possible candidate database $c$: if for any $S \subseteq [n]$: $|\sum_{i \in S} c_i - K(S)| > E$, then rule out $c$.
–If $c$ is not ruled out, halt and output $c$
Claim: the real database $d$ won't be ruled out
7
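Proof: Exponential Adversary
Suppose: $\forall S \subseteq [n]$: $|K(S) - \sum_{i \in S} d_i| \le E$
Let $S_0 = \{i : d_i = 0\}$ and $S_1 = \{i : d_i = 1\}$
$|K(S_0) - \sum_{i \in S_0} c_i| \le E$ ($c$ not ruled out)
$|K(S_1) - \sum_{i \in S_1} c_i| \le E$ ($c$ not ruled out)
Claim: a candidate $c$ that has not been ruled out differs from $d$ in at most $2E$ positions within each of $S_0$ and $S_1$, so Hamming distance$(c, d) \le 4E$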
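The exponential adversary can be tried out on a tiny database. The following is a minimal sketch, not the lecture's own code: the curator `noisy_answer`, the helper names, and the parameters `n`, `err` and `seed` are all assumptions made for illustration, and `err` plays the role of the error bound $E$.

```python
import itertools
import random


def exponential_attack(noisy_answer, n, err):
    """Return the first candidate consistent with the responses on every subset."""
    subsets = [s for r in range(n + 1) for s in itertools.combinations(range(n), r)]
    for cand in itertools.product((0, 1), repeat=n):
        # Rule the candidate out if any subset sum is more than err from K(S).
        if all(abs(sum(cand[i] for i in s) - noisy_answer(s)) <= err for s in subsets):
            return cand
    return None


def demo(n=8, err=1, seed=0):
    rng = random.Random(seed)
    d = tuple(rng.randint(0, 1) for _ in range(n))
    cache = {}

    def noisy_answer(s):
        # Hypothetical curator: true subset sum plus integer noise in [-err, err],
        # fixed per subset so that repeated queries get the same response.
        if s not in cache:
            cache[s] = sum(d[i] for i in s) + rng.randint(-err, err)
        return cache[s]

    c = exponential_attack(noisy_answer, n, err)
    hamming = sum(ci != di for ci, di in zip(c, d))
    return d, c, hamming
```

Since $d$ itself is never ruled out, the search always returns some candidate, and by the claim above its Hamming distance from $d$ is at most `4 * err`. The loop over all $2^n$ candidates and all $2^n$ subsets is what makes this adversary exponential.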
8
Contradiction?
We have seen algorithms that answer each query with accuracy $o(n)$
–$O(\sqrt{n})$ and $O(n^{2/3})$
Why is there no contradiction with the current results?
9
What can we do efficiently?
We allowed “too” much power to the adversary
–Number of queries
–Computation
On the other hand: lack of wild errors in the responses
Theorem: For any sanitization algorithm: if all responses are within $o(\sqrt{n})$ of the true answer, then it is blatantly non-private even against a polynomial-time adversary making $O(n \log^2 n)$ random queries.
Show the adversary
10
The model
As before: the database $d$ is a bit string of length $n$.
Users query for subset sums:
–A query is a subset $q \subseteq \{1, \ldots, n\}$
–The (exact) answer is $a_q = \sum_{i \in q} d_i$
$E$-perturbation
–for an answer: $a_q \pm E$
11
Privacy requires Ω(√n) perturbation
Consider a database with $o(\sqrt{n})$ perturbation
The adversary makes $t = n \log^2 n$ random queries $q_j$, getting noisy answers $a_j$
Privacy-violating algorithm: construct a database $c = \{c_i\}_{1 \le i \le n}$ by solving the linear program:
–$0 \le c_i \le 1$ for $1 \le i \le n$
–$a_j - E \le \sum_{i \in q_j} c_i \le a_j + E$ for $1 \le j \le t$
Round the solution:
–if $c_i > 1/2$ set it to 1, and to 0 otherwise
A solution must exist: $d$ itself
For every query $q_j$: its answer according to $c$ is at most $2E$ far from its (real) answer in $d$.
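The $2E$ bound in the last statement is the triangle inequality applied to the linear program's constraints. As a short worked step, writing $E$ for the perturbation bound and using that both $c$ (a feasible solution) and $d$ (the real database) are within $E$ of the noisy answer $a_j$:

```latex
\[
\Bigl|\sum_{i \in q_j} c_i - \sum_{i \in q_j} d_i\Bigr|
\;\le\; \Bigl|\sum_{i \in q_j} c_i - a_j\Bigr| + \Bigl|a_j - \sum_{i \in q_j} d_i\Bigr|
\;\le\; E + E \;=\; 2E .
\]
```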
12
Bad solutions to LP do not survive
A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ far from its real answer in $d$.
Idea: show that for a database $c$ that is far away from $d$, a random query disqualifies $c$ with some constant probability $\delta$
Want to use the union bound: all far-away solutions are disqualified w.p. at least $1 - n^n (1 - \delta)^t = 1 - \mathrm{neg}(n)$
How do we limit the solution space? Round each value to the closest multiple of $1/n$
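To see why the failure term in the union bound is negligible, here is a short worked estimate. It assumes the disqualification probability is a constant $\delta > 0$, that $t = n \log^2 n$, and it uses the standard inequality $1 - \delta \le e^{-\delta}$:

```latex
\[
n^n (1 - \delta)^t
\;\le\; e^{\,n \ln n}\, e^{-\delta t}
\;=\; \exp\!\bigl(n \ln n - \delta\, n \log^2 n\bigr)
\;\xrightarrow[n \to \infty]{}\; 0 ,
\]
```

because $\delta\, n \log^2 n$ eventually dominates $n \ln n$ for any fixed $\delta$, so the bound decays faster than any inverse polynomial in $n$.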
13
Privacy requires Ω(√n) perturbation
A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ far from its real answer in $d$.
Claim: a random query disqualifies a database $c$ that is far away from $d$ with some constant probability $\delta$
Therefore: $t = n \log^2 n$ queries leave a negligible probability for each far reconstruction.
Union bound: all far-away suggestions are disqualified w.p. at least $1 - n^n (1 - \delta)^t = 1 - \mathrm{neg}(n)$
Can apply the union bound by discretization
Count the number of entries far from $d$
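The claim that a random query disqualifies a far-away candidate with constant probability can be checked empirically. This is only a rough simulation under assumed parameters: the function name, the number of differing positions `far`, the bound `err` (standing in for $E$), and the choice that $c_i - d_i$ is $+1$ on half of the differing positions and $-1$ on the rest are all hypothetical.

```python
import random


def disqualify_rate(far=400, err=2, trials=2000, seed=1):
    """Estimate how often a random subset query disqualifies a far candidate."""
    rng = random.Random(seed)
    # Hypothetical far candidate: differs from d on `far` positions, with
    # c_i - d_i = +1 on half of them and -1 on the other half.
    diffs = [1] * (far // 2) + [-1] * (far - far // 2)
    hits = 0
    for _ in range(trials):
        # A uniformly random query keeps each index independently with prob. 1/2;
        # only positions where c and d differ change the gap between the answers.
        gap = sum(x for x in diffs if rng.random() < 0.5)
        if abs(gap) > 2 * err + 1:
            hits += 1
    return hits / trials
```

The gap behaves like a sum of `far` independent terms, so it fluctuates on the order of $\sqrt{\text{far}}$. As long as `err` is small compared with that, a constant fraction of the random queries disqualify the candidate, which is the behaviour the union bound needs.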
14
Review and Conclusion
When the perturbation is $o(\sqrt{n})$, choosing $\tilde{O}(n)$ random queries gives enough information to efficiently reconstruct an $o(n)$-close db.
The database is reconstructed using linear programming, in polynomial time.
$o(\sqrt{n})$-perturbation databases are blatantly non-private: they are reconstructable in poly($n$) time.
15
Ω(√n) lower bound revisited
An attack on an $o(\sqrt{n})$-perturbation database with substantially better performance
Previous attack uses $n \log^2 n$ queries and runs in $n^5 \log^4 n$ time (LP)
New attack: issues $n$ queries and runs in $O(n \log n)$ time
New attack is deterministic
–Fixed set of queries for each size
–Not necessarily an advantage: must ask certain queries
16
The Fourier Attack
Treat the database $d$ as a function $\mathbb{Z}_2^{\log n} \to \mathbb{Z}_2$ (the vector defines a function)
Query specific subset sums, from which the Fourier coefficients of the function can be calculated
–One for each Fourier coefficient
Round the reconstructed function's values to bits
When the sums have $o(\sqrt{n})$ error, so do the coefficients
–the reconstruction can be shown to have $o(n)$ error.
The Fourier transform can be computed in time $O(n \log n)$
Key point: linearity of the Fourier transform implies that a small error in the coefficients also means a small error in the function
17
Fourier Transform
The characters of $\mathbb{Z}_2^k$: homomorphisms into $\{-1, 1\}$
There are $2^k$ characters, one for each $a = (a_1, a_2, \ldots, a_k) \in \mathbb{Z}_2^k$: $\chi_a(x) = (-1)^{\sum_{i=1}^{k} a_i x_i}$
For a function $f: \mathbb{Z}_2^{\log n} \to \mathbb{R}$, the Fourier coefficients are $\hat{f}(a) = \sum_x \chi_a(x) f(x)$
We have: $f(x) = \frac{1}{2^k} \sum_a \chi_a(x) \hat{f}(a)$
$H$ is the $2^k \times 2^k$ Hadamard matrix with $H_{a,b} = \chi_a(b)$
$H H = 2^k I$, $\hat{f} = H f$, $f = \frac{1}{2^k} H \hat{f}$
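The matrix identities on this slide can be verified directly for a small $k$. The sketch below is illustrative only; the helper names and the sample vector `f` are assumptions, and it uses plain lists rather than any particular linear algebra library.

```python
def hadamard(k):
    """2^k x 2^k matrix with H[a][b] = (-1)^<a,b>, i.e. the character chi_a(b)."""
    size = 1 << k
    return [[-1 if bin(a & b).count("1") % 2 else 1 for b in range(size)]
            for a in range(size)]


def mat_vec(m, v):
    """Plain matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def check():
    k = 3
    size = 1 << k
    h = hadamard(k)
    # H * H should equal 2^k * I.
    hh = [[sum(h[a][x] * h[x][b] for x in range(size)) for b in range(size)]
          for a in range(size)]
    identity_ok = all(hh[a][b] == (size if a == b else 0)
                      for a in range(size) for b in range(size))
    f = [1, 0, 1, 1, 0, 0, 1, 0]                     # an arbitrary 8-bit database
    f_hat = mat_vec(h, f)                            # f_hat = H f
    f_back = [v // size for v in mat_vec(h, f_hat)]  # f = (1/2^k) H f_hat
    # Parseval with this normalization: sum |f|^2 = (1/2^k) sum |f_hat|^2.
    parseval_ok = size * sum(v * v for v in f) == sum(v * v for v in f_hat)
    return identity_ok, f_back == f, parseval_ok
```

All three checks should come out true for any integer vector `f` of length $2^k$, since they only restate $HH = 2^k I$.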
18
Parseval's Identity
Relates the absolute values of $f$ to the absolute values of the Fourier coefficients of $f$
$\sum_{x \in \mathbb{Z}_2^k} |f(x)|^2 = \frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |\hat{f}(a)|^2$
19
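Evaluating Fourier Coefficients with Counting queries
Let $\sigma_0 = \sum_x f(x)$
For $a = (a_1, a_2, \ldots, a_k)$ let $S_a = \{x \mid \langle a, x \rangle = 0 \bmod 2\}$; $|S_a| = 2^{k-1}$ for $a \ne 0$
$\hat{f}(a) = 2 \sum_{x \in S_a} f(x) - \sigma_0$
An approximation of the counting query on $S_a$ yields an approximation of $\hat{f}(a)$ with a related error term
$f = \frac{1}{2^k} H \hat{f} \;\Rightarrow\; \frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$
$e = (e_1, e_2, \ldots, e_n)$: error vector of the Fourier coefficients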
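The formula for $\hat{f}(a)$ in terms of one counting query follows by splitting the sum according to the sign of the character. A short derivation, writing $\chi_a(x) = (-1)^{\langle a, x \rangle}$, $S_a$ for the set where $\langle a, x \rangle$ is even, and $\sigma_0$ (a name chosen here) for the sum of all entries:

```latex
\[
\hat{f}(a) \;=\; \sum_{x} \chi_a(x) f(x)
\;=\; \sum_{x \in S_a} f(x) \;-\; \sum_{x \notin S_a} f(x)
\;=\; 2 \sum_{x \in S_a} f(x) \;-\; \sigma_0 ,
\qquad \sigma_0 = \sum_{x} f(x).
\]
```

So an error of at most $E$ in the answers on $S_a$ and on the whole database translates into an error of at most $3E$ in $\hat{f}(a)$.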
20
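$f = \frac{1}{2^k} H \hat{f} \;\Rightarrow\; \frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$, where $e = (e_1, e_2, \ldots, e_n)$ is the error vector of the Fourier coefficients
If $\frac{1}{2^k} H e$ has $\Omega(n)$ entries which are $\ge 1/2$ in absolute value
Then by Parseval's identity, $\sum_{x \in \mathbb{Z}_2^k} |f(x)|^2 = \frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |\hat{f}(a)|^2$: $\frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |e_a|^2$ is $\Omega(n)$
Hence: at least one $|e_a|$ is $\Omega(\sqrt{n})$
Contradicting the assumption on accuracy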
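The whole Fourier attack fits in a few lines. The following is a hedged sketch rather than the lecture's implementation: the curator `noisy`, the function names, and the parameters `k`, `err` and `seed` are assumptions, and the subset sums are computed naively in $O(n^2)$ time on the curator's side (only the adversary's transform is $O(n \log n)$).

```python
import random


def fwht(v):
    """Fast Walsh-Hadamard transform: returns H v in O(n log n) time."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for j in range(start, start + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v


def fourier_attack(noisy_count, n):
    """Reconstruct an n-bit database (n a power of two) from n subset-sum queries."""
    # One query per a: S_a = {x : <a, x> = 0 mod 2}; S_0 is the whole database.
    answers = [noisy_count([x for x in range(n) if bin(a & x).count("1") % 2 == 0])
               for a in range(n)]
    sigma0 = answers[0]
    coeffs = [2 * ans - sigma0 for ans in answers]  # f_hat(a) = 2*sum_{S_a} f - sigma_0
    recon = [val / n for val in fwht(coeffs)]       # f = (1/n) H f_hat
    return [1 if val > 0.5 else 0 for val in recon]


def demo(k=8, err=1.0, seed=0):
    rng = random.Random(seed)
    n = 1 << k
    d = [rng.randint(0, 1) for _ in range(n)]

    def noisy(subset):
        # Hypothetical curator: exact count plus noise of magnitude at most err.
        return sum(d[x] for x in subset) + rng.uniform(-err, err)

    guess = fourier_attack(noisy, n)
    return sum(g != b for g, b in zip(guess, d))   # number of wrongly guessed bits
```

Under these assumptions each coefficient is off by at most `3 * err`, so by Parseval's identity the squared reconstruction errors sum to at most `9 * err**2`, and at most `36 * err**2` bits can be rounded the wrong way regardless of `n`.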
21
Changing the Model: weighted counting
Previous attacks: assume all queries are within some small perturbation
New model:
–To up to a $\frac{1}{2} - \varepsilon$ fraction of the queries, unbounded noise is added
–To the rest, “small” bounded noise is added
Stronger query model: subset sums are weighted with weights $0, \ldots, p-1$ for some prime $p = \Omega(1/\varepsilon^2 + \ldots)$
Cannot “hide” single bits: all the weight might be there
Want some randomness of queries, otherwise repetition
22
Interpolation attack
Treat the database as a linear form of $n$ variables over $\mathbb{Z}_p$
Treat a query $q = (q_1, \ldots, q_n)$ as the evaluation of the form at a point: $f(q_1, \ldots, q_n) = \sum_{i=1}^{n} d_i q_i \bmod p$
–An answer to the query $q = ((p-1)/2, 0, \ldots, 0)$ that is within $(p-1)/4$ error tells us the first db bit
–Similarly for all other bits
No point in asking the query directly: these useful queries might have unbounded noise
Need to deduce an (approximate) answer to $q$ from other queries
By dropping info
23
Interpolation attack - implementation
Want to evaluate a specific query $q$ with small error
Pick a random degree-2 curve that passes through $q$ and issue queries for the $p$ points on the curve
Key issue: points on the curve are pairwise independent
Therefore: for sufficiently many queries, with high probability interpolation gives a correct (up to small noise) answer for $q$
Can exhaustively try all degree-2 polynomials
Similar to Reed-Muller decoding
24
Interpolation attack …
Interpolation is implemented by searching all $p^3$ degree-2 polynomials for one which is $\delta$-close to the query answers on more than half of the entries
–the restriction of a linear form to a deg-2 curve is a deg-2 polynomial
Any two such polynomials must be $2\delta$-close, due to low degree
Hence the accuracy of the reconstructed answer is $2\delta$.
For $(p-1)/4 > 2\delta$: can figure out any specific database bit with high probability
25
Interpolation Attack: evaluating a query accurately
DB: $f(q_1, \ldots, q_n) = \sum_{i=1}^{n} d_i q_i$ ($\mathbb{Z}_p^n \to \mathbb{Z}_p$)
Pick a curve: for two random points $u_1, u_2 \in \mathbb{Z}_p^n$: $c(t) = q + u_1 t + u_2 t^2$ ($\mathbb{Z}_p \to \mathbb{Z}_p^n$)
Restriction of $f$ to $c$: $f|_c(t) = f(c(t))$; this is a degree-2 polynomial ($\mathbb{Z}_p \to \mathbb{Z}_p$)
Query all $p$ points of $c$ to get evaluations of $f|_c$
–answers are inaccurate
Interpolate to find $f|_c$ up to a small error
Evaluate $f|_c(0) = f(q)$ accurately
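The steps on this slide can be sketched end to end for a toy database. Everything concrete here is an assumption made for illustration: the function names, the prime `p = 23`, the small-noise bound `delta = 1`, and a curator that answers arbitrarily on exactly `corrupted = 6` of the 23 curve points (including the target query at `t = 0`). This is a smaller corrupted fraction than the model allows, chosen so that the exhaustive search is guaranteed to land within `2 * delta` of $f(q)$.

```python
import random


def circ_dist(x, p):
    """Distance from x to 0 on the cycle Z_p."""
    x %= p
    return min(x, p - x)


def recover_bit(db, index, p=23, delta=1, corrupted=6, seed=0):
    """Recover db[index] from p noisy weighted subset-sum queries on a random curve."""
    rng = random.Random(seed)
    n = len(db)
    q = [0] * n
    q[index] = (p - 1) // 2          # target query: weight (p-1)/2 on a single bit
    u1 = [rng.randrange(p) for _ in range(n)]
    u2 = [rng.randrange(p) for _ in range(n)]

    def exact_answer(t):
        # f(c(t)) for the curve c(t) = q + u1*t + u2*t^2 over Z_p.
        point = [(q[i] + u1[i] * t + u2[i] * t * t) % p for i in range(n)]
        return sum(d * w for d, w in zip(db, point)) % p

    # Hypothetical curator: arbitrary answers on the corrupted points (t = 0
    # included), noise of magnitude at most delta on all other points.
    bad = {0} | set(rng.sample(range(1, p), corrupted - 1))
    answers = [rng.randrange(p) if t in bad
               else (exact_answer(t) + rng.randint(-delta, delta)) % p
               for t in range(p)]

    # Exhaustive search over all p^3 polynomials A + B*t + C*t^2 for the one
    # that is delta-close to the largest number of answers.
    best, best_score = 0, -1
    for A in range(p):
        for B in range(p):
            for C in range(p):
                score = sum(circ_dist(answers[t] - (A + B * t + C * t * t), p) <= delta
                            for t in range(p))
                if score > best_score:
                    best, best_score = A, score

    # best approximates f(q), which is 0 or (p-1)/2; decode to the nearer value.
    return 0 if circ_dist(best, p) < circ_dist(best - (p - 1) // 2, p) else 1
```

For example, `[recover_bit(db, i, seed=i) for i in range(len(db))]` rebuilds a small database bit by bit. The search costs about $p^4$ elementary steps per bit, matching the exhaustive strategy described above, and it never uses the corrupted answer to the target query itself.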
26
Interpolation attack - performance
Time for finding any specific bit: $O(p^4) = O(\varepsilon^{-8})$
Independent of db size $n$? (querying time? $|q| = \Theta(n)$)
–Can be used with very large databases if the interesting part is small
Time to construct the whole db with small error: $O(n)$ with $pn$ queries (or $O(n^2)$)
27
Summary
$\Omega(\sqrt{n})$ perturbation lower bound revisited: simple and efficient attack
When queries allow sufficiently large weights, an adversary can:
–Handle unbounded noise on a large portion of the queries
–Find out private data in time independent of the size of the DB