Kunal Talwar, MSR SVC [Dwork, McSherry, Talwar, STOC 2007]
Compressed Sensing: If x ∈ R^N is k-sparse, take M ≈ C·k·log(N/k) random Gaussian measurements; then ℓ1 minimization recovers x. For what k does this make sense (i.e., M < N)? How small can C be?
Privacy motivation · Coding setting · Results · Proof sketch
Database of information about individuals, e.g., medical history, census data, customer info. Need to guarantee confidentiality of individual entries. Want to make deductions about the database and learn large-scale trends, e.g., learn that drug V increases the likelihood of heart disease, without leaking info about individual patients. (Diagram: Curator answering queries from the Analyst.)
Simple model (easily justifiable). Database: an n-bit binary vector x. Query: a vector a. True answer: the dot product a·x. Response: a·x + e = true answer + noise. Blatant non-privacy: the attacker learns n − o(n) bits of x. Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private, even against a polynomial-time adversary asking O(n log² n) random questions.
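A minimal sketch of what such a reconstruction attack can look like in code, assuming 0/1 dot-product queries, a fixed noise bound E on every answer, and recovery via LP feasibility followed by rounding; the sizes, the noise bound, and the solver choice are illustrative, not the exact parameters of the theorem.

```python
# Illustrative sketch of a reconstruction attack against bounded-noise answers.
# Assumptions (not from the talk): 0/1 queries, a fixed noise bound E,
# recovery by an LP feasibility problem plus rounding.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 100                              # database size (small so the LP is fast)
m = 8 * n                            # number of random queries (stand-in for O(n log^2 n))
E = max(1, int(0.25 * np.sqrt(n)))   # noise bound: every answer is within E of the truth

x = rng.integers(0, 2, n)                  # secret n-bit database
A = rng.integers(0, 2, (m, n))             # random 0/1 queries
r = A @ x + rng.integers(-E, E + 1, m)     # noisy responses

# Feasibility LP: find x' in [0,1]^n with |A x' - r| <= E for every query.
A_ub = np.vstack([A, -A])
b_ub = np.concatenate([r + E, -(r - E)])
res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")

x_hat = np.round(res.x).astype(int)        # round the fractional LP solution to bits
print("bits recovered:", int(np.sum(x_hat == x)), "out of", n)
```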
Privacy has a price: there is no safe way to avoid increasing the noise as the number of queries increases. This applies to the non-interactive setting as well: any non-interactive solution permitting answers that are "too accurate" to "too many" questions is vulnerable to the DiNi attack. This work: what if most responses have small error, but some can be arbitrarily off?
Real vector x ∈ R^n; matrix A ∈ R^{m×n} with i.i.d. Gaussian entries. Transmit the codeword Ax ∈ R^m. The channel corrupts the message: receive y = Ax + e. The decoder must reconstruct x, assuming e has small support (at most αm entries of e are non-zero). (Diagram: Encoder → Channel → Decoder.)
min support(e′) such that y = Ax′ + e′, over x′ ∈ R^n — solving this would give the original message x.
min |e′|_1 such that y = Ax′ + e′, over x′ ∈ R^n — this is a linear program, solvable in poly time.
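A small sketch of the ℓ1 decoder as an explicit linear program, with illustrative sizes and an error rate chosen well inside the regime where recovery is expected; the variable split [x′; t] is the standard LP reformulation of minimizing |e′|_1.

```python
# Sketch of LP decoding for the Gaussian channel (illustrative sizes).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 50, 200                 # message length and codeword length (m = 4n)
alpha = 0.05                   # fraction of corrupted coordinates

A = rng.standard_normal((m, n))               # i.i.d. Gaussian encoding matrix
x = rng.standard_normal(n)                    # message
e = np.zeros(m)
bad = rng.choice(m, int(alpha * m), replace=False)
e[bad] = 10.0 * rng.standard_normal(len(bad)) # wild errors on alpha*m coordinates
y = A @ x + e                                 # received word

# min |e'|_1 s.t. y = A x' + e'  becomes  min sum(t) s.t. -t <= y - A x' <= t
I = np.eye(m)
A_ub = np.block([[ A, -I],
                 [-A, -I]])
b_ub = np.concatenate([y, -y])
c = np.concatenate([np.zeros(n), np.ones(m)])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

x_hat = res.x[:n]
print("recovery error |x_hat - x|_2 =", np.linalg.norm(x_hat - x))
```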
Theorem [Donoho / Candès-Rudelson-Tao-Vershynin]: For an error rate α < 1/2000, LP decoding succeeds in recovering x (for m = 4n). This talk: how large an error rate α can LP decoding tolerate?
Let α* be a fixed constant (≈ 0.239; defined in the proof sketch). Theorem 1: For any α < α*, there exists c such that if A has i.i.d. Gaussian entries and m = cn rows, and if, for k = αm, the error e is within ℓ1 distance δ of some vector e_k of support at most k (i.e., |e − e_k|_1 ≤ δ), then LP decoding reconstructs x′ where |x′ − x|_2 is O(δ/√n). Theorem 2: For any α > α*, LP decoding can be made to fail, even if m grows arbitrarily.
In the privacy setting: suppose that, for some α < α*, the curator answers a (1 − α) fraction of the questions within error o(√n) and answers an α fraction of the questions arbitrarily. Then the curator is blatantly non-private. Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1. The attack works in the non-interactive setting as well, and also leads to error-correcting codes over finite alphabets.
Theorem 1 (compressed-sensing form): For any α < α*, there exists c such that if B has i.i.d. Gaussian entries and M = (1 − c)N rows, then for k = αM and any vector x ∈ R^N, given Bx, LP decoding reconstructs x′ whose ℓ2 distance from x is bounded in terms of the ℓ1 distance from x to its best k-sparse approximation (as in Theorem 1).
With α* as above, Theorem 1 (the case δ = 0): For any α < α*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most αm, then LP decoding reconstructs x exactly. Proof sketch…
LP decoding is scale and translation invariant. Thus, without loss of generality, the transmitted message is x = 0, so the received word is y = A·0 + e = e. If the LP reconstructs some z ≠ 0 instead, rescale so that |z|_2 = 1; call such a z bad for A. (Diagram: codewords Ax, Ax′ and the received word y.)
Proof plan: Any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm). A net argument extends this to all of R^n: Pr[∃ bad z] ≤ exp(−c′m). Thus, with high probability, A is such that LP decoding never fails.
z bad: |Az − e|_1 < |A·0 − e|_1, i.e., |Az − e|_1 < |e|_1. Let e have support T. Without loss of generality, e|_T = Az|_T. Thus z bad ⇒ |Az|_{T^c} < |Az|_T ⇒ |Az|_T > ½ |Az|_1.
A has i.i.d. Gaussian entries ⇒ each entry of Az is an i.i.d. Gaussian (since |z|_2 = 1, each entry is N(0,1)). Let W = Az; its entries W_1, …, W_m are i.i.d. Gaussians. z bad ⇒ Σ_{i∈T} |W_i| > ½ Σ_i |W_i|. Recall |T| ≤ αm. Define S_α(W) to be the sum of the magnitudes of the top α fraction of the entries of W. Thus z bad ⇒ S_α(W) > ½ S_1(W): few Gaussians carrying a lot of the mass!
Let us look at E[S_α]. Let w* be such that the expected |W|-mass above w* is half the total, i.e., E[|W|·1{|W| ≥ w*}] = ½·E[|W|]. Let α* = Pr[|W| ≥ w*]. Then E[S_{α*}] = ½·E[S_1]. Moreover, for any α < α*, E[S_α] ≤ (½ − ε)·E[S_1] for some ε > 0. (Plot: E[S_α] as a function of α, crossing ½·E[S_1] at α*.)
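A short numerical check of the threshold, assuming w* is defined by the equation above; the closed form w* = √(2 ln 2) follows from E[|W|·1{|W| ≥ w}] = 2·φ(w) for a standard Gaussian W, and the last line evaluates the compressed-sensing constant quoted later in the talk.

```python
# Numerically recover the threshold alpha* for standard Gaussian W,
# assuming w* satisfies E[|W| * 1{|W| >= w*}] = 1/2 * E[|W|]
# (which makes E[S_{alpha*}] = 1/2 * E[S_1]).
import numpy as np
from scipy.stats import norm

# E[|W| * 1{|W| >= w}] = 2 * pdf(w) and E[|W|] = sqrt(2/pi),
# so 2*pdf(w*) = 1/2 * sqrt(2/pi) gives w* = sqrt(2 ln 2).
w_star = np.sqrt(2 * np.log(2))
alpha_star = 2 * (1 - norm.cdf(w_star))        # Pr[|W| >= w*]

print("w*     ≈", round(w_star, 4))            # ≈ 1.1774
print("alpha* ≈", round(alpha_star, 4))        # ≈ 0.239
# Constant from the compressed-sensing slide, roughly 2:
print("C bound ≈", 1 / (alpha_star * np.log2(1 / alpha_star)))
```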
S_α depends on many independent Gaussians. The Gaussian isoperimetric inequality implies that, with high probability, S_α(W) is close to E[S_α]; S_1 is similarly concentrated. Thus Pr[z is bad] ≤ exp(−cm).
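A quick Monte Carlo sketch (illustrative sizes, not from the talk) of this concentration: for α somewhat below α*, the ratio S_α(W)/S_1(W) stays well below ½ across many draws of W.

```python
# Monte Carlo check that S_alpha(W)/S_1(W) concentrates below 1/2
# when alpha is below the threshold alpha* ≈ 0.239 (illustrative sizes).
import numpy as np

rng = np.random.default_rng(2)
m = 4000
alpha = 0.15                     # comfortably below alpha*
k = int(alpha * m)

ratios = []
for _ in range(200):
    W = np.abs(rng.standard_normal(m))
    top = np.sort(W)[-k:]        # the alpha fraction of entries with largest magnitude
    ratios.append(top.sum() / W.sum())   # S_alpha(W) / S_1(W)

ratios = np.array(ratios)
print("mean ratio:", np.round(ratios.mean(), 3),
      " max ratio:", np.round(ratios.max(), 3))
# Both stay well below 0.5, so any fixed z is bad only with tiny probability.
```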
1) Any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm). 2) Union bound over a dense net of the unit ball in R^n (the net has size exp(c′n)): Pr[∃ bad z in the net] ≤ exp(−c″m). 3) A continuity-type argument shows that no z at all is bad.
Beyond the threshold, for α > α*: now E[S_α] > (½ + ε)·E[S_1]. A similar measure-concentration argument shows that any fixed z is bad with high probability. Thus LP decoding fails w.h.p. beyond α*. (The Donoho/CRTV experiments used a random error model.)
Compressed Sensing: If x ∈ R^N is k-sparse, take M ≈ C·k·log(N/k) random Gaussian measurements; then ℓ1 minimization recovers x. For what k does this make sense (i.e., M < N)? k < α*·N ≈ 0.239·N. How small can C be? C > (α*·log_2(1/α*))^{-1} ≈ 2.02.
A tight threshold for Gaussian LP decoding. To preserve privacy: lots of error in lots of answers. Similar results hold for ±1 queries. Inefficient attacks can go much further: correct a (½ − ε) fraction of wild errors, or correct a (1 − ε) fraction of wild errors in the list-decoding sense. Efficient versions of these attacks? Dwork-Yekhanin: (½ − ε) using AG codes.
Formally: Database: a vector d ∈ D^N. Mechanism: M : D^N → R. Evaluating M(d) should not reveal specific info about individual tuples in d. (Diagram: Curator ↔ Analyst.)
When ε is small: for d, d′ ∈ D^N differing on one input, and any S ⊆ R, Pr[M(d) ∈ S] lies within (1 ± ε) × Pr[M(d′) ∈ S], with probabilities taken over the coins flipped by the curator. This holds independent of other sources of data, other databases, or even knowledge of every other input in d. "Anything, good or bad, is essentially equally likely to occur, whether I join the database or not." The guarantee generalizes to groups of respondents (although, if the group is large, outcomes should differ).
Dalenius' goal, "anything that can be learned about a respondent, given access to the statistical database, can be learned without access," is provably unachievable: Sam the smoker tries to buy medical insurance; the statistical DB teaches that smoking causes cancer; Sam is harmed by high premiums for medical insurance; yet Sam need not even be in the database! Differential privacy guarantees that the risk to Sam will not noticeably increase if he enters the DB. (DBs have intrinsic social value.)
No perceptible risk is incurred by joining the data set: anything the adversary can do to Sam, it could do even if his data were not in the data set. (Plot: Pr[r] over possible outcomes r, with the bad r's marked.)
Suppose the analyst is interested in a counting query f(d) = Σ_i P[d_i] for some predicate P. Example: P[d_i] = 1 iff d_i smokes and has cancer. The curator adds noise: scaled symmetric noise ~ Lap(s) with s = 1/ε. (Plot: the Laplace density p(x) ∝ exp(−|x|/s).)
Same counting query, now compared on a neighboring database d′ where the true answer changes by at most 1: the response density shifts from p(x) ∝ exp(−|x|/s) to p(x) ∝ exp(−|x−1|/s). (Plot: the two Laplace densities overlaid.)
For a general query f : D^N → R^k, let ∆f = max_{d,d′: |d−d′|=1} |f(d) − f(d′)|_1. Example: f = histogram has sensitivity ∆f = 1. The curator adds symmetric multidimensional noise ~ Lap(s)^k with s = ∆f/ε. Theorem: This gives ε-differential privacy.
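A minimal sketch of the Laplace mechanism just described; the database, predicate, and ε are made-up illustrations, and `laplace_mechanism` is a hypothetical helper name.

```python
# Minimal sketch of the Laplace mechanism: add Lap(Delta_f / eps) noise
# to each coordinate of f(d). Data and queries below are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def laplace_mechanism(true_answer, sensitivity, eps):
    """Return f(d) plus Lap(sensitivity/eps) noise on each coordinate."""
    scale = sensitivity / eps
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(true_answer))
    return true_answer + noise

# Counting query: how many respondents smoke AND have cancer (sensitivity 1).
d = rng.integers(0, 2, size=(1000, 2))          # columns: smokes, has_cancer
true_count = np.sum(d[:, 0] & d[:, 1])
print("true:", true_count, " noisy:", laplace_mechanism(true_count, 1.0, eps=0.5))

# Histogram query: counts per category (L1 sensitivity 1, as on the slide).
categories = rng.integers(0, 5, size=1000)
hist = np.bincount(categories, minlength=5)
print("noisy histogram:", laplace_mechanism(hist, 1.0, eps=0.5))
```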
This allows fairly accurate reporting of insensitive functions. When asking, e.g., independent counting questions, the noise grows linearly in the number of questions. Lots of algorithms/analyses can be written or rephrased so as to use a sequence of insensitive questions to the database: means/variances/covariances, the EM algorithm for k-means, PCA, a set of low-dimensional marginals.
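As one illustration (a sketch, not from the talk) of rephrasing an analysis as insensitive queries: a mean can be estimated from a noisy sum and a noisy count, assuming each value lies in [0, 1] so both queries have sensitivity 1, and splitting the privacy budget between them.

```python
# Sketch: a mean from two insensitive queries (noisy sum, noisy count),
# assuming each value lies in [0, 1] so both queries have sensitivity 1.
import numpy as np

rng = np.random.default_rng(4)
values = rng.uniform(0, 1, size=500)   # illustrative data, one value per respondent
eps = 0.5                              # total privacy budget, split across two queries

noisy_sum = values.sum() + rng.laplace(scale=1.0 / (eps / 2))
noisy_count = len(values) + rng.laplace(scale=1.0 / (eps / 2))
print("true mean:", round(float(values.mean()), 3),
      " private estimate:", round(noisy_sum / max(noisy_count, 1.0), 3))
```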