1 Foundations of Privacy Lecture 3 Lecturer: Moni Naor

2 Recap of last week's lecture
The Simulation Paradigm for defining and proving security of cryptographic protocols
The basic impossibility of disclosure prevention:
–cannot hope to obtain results that hold against all possible auxiliary information
Differential Privacy:
–for all adjacent databases, the output distributions are very close
Extractors and Fuzzy Extractors

3 Desirable properties of a sanitization mechanism
Composability
–Applying the sanitization several times yields graceful degradation
–q releases, each ε-DP, are qε-DP
Robustness to side information
–No need to specify exactly what the adversary knows
Differential Privacy satisfies both…

4 Differential Privacy [Dwork, McSherry, Nissim and Smith]
Adjacency: D+Me and D−Me.
Protect individual participants: the probability of every bad event (or any event) increases only by a small multiplicative factor when I enter the DB. May as well participate in the DB…
ε-differentially private sanitizer A: for all DBs D, all Me, and all events T,
e^{−ε} ≤ Pr_A[A(D+Me) ∈ T] / Pr_A[A(D−Me) ∈ T] ≤ e^ε ≈ 1+ε
Handles auxiliary input.

5 Differential Privacy
[Figure: two response distributions with bounded ratio; bad responses marked]
A gives ε-differential privacy if for all neighboring D_1 and D_2, and all T ⊆ range(A):
Pr[A(D_1) ∈ T] ≤ e^ε · Pr[A(D_2) ∈ T]
Neutralizes all linkage attacks. Composes unconditionally and automatically: Σ_i ε_i.

6 Differential Privacy: Important Properties
Handles auxiliary information.
Composes naturally: if A_1(D) is ε_1-DP and, for all z_1, A_2(D, z_1) is ε_2-DP, then A_2(D, A_1(D)) is (ε_1+ε_2)-DP.
Proof: for all adjacent D, D' and all (z_1, z_2):
e^{−ε_1} ≤ P[z_1] / P'[z_1] ≤ e^{ε_1}
e^{−ε_2} ≤ P[z_2] / P'[z_2] ≤ e^{ε_2}
so e^{−(ε_1+ε_2)} ≤ P[(z_1,z_2)] / P'[(z_1,z_2)] ≤ e^{ε_1+ε_2},
where P[z_1] = Pr_{z~A_1(D)}[z=z_1], P'[z_1] = Pr_{z~A_1(D')}[z=z_1], P[z_2] = Pr_{z~A_2(D,z_1)}[z=z_2], P'[z_2] = Pr_{z~A_2(D',z_1)}[z=z_2].
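A minimal sketch of how this composition property is used in practice (the interface below is hypothetical, not from the lecture): mechanisms are applied one after another, each possibly depending on earlier outputs, and the privacy cost simply adds up.

```python
from typing import Callable, List, Tuple

def run_composed(mechanisms: List[Tuple[Callable, float]], D) -> Tuple[list, float]:
    """Sequential composition sketch: each mechanism is eps_i-differentially private
    and may depend adaptively on outputs released so far; the combined release is
    (sum of eps_i)-differentially private."""
    released, total_eps = [], 0.0
    for mech, eps in mechanisms:
        released.append(mech(D, released))  # adaptive: may use earlier outputs
        total_eps += eps
    return released, total_eps
```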

7 Example: NO Differential Privacy
U is a set of (name, tag ∈ {0,1}) tuples.
One counting query: # of participants with tag = 1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released.
Pr_A[A(D+Me) ∈ T] ≥ 1/n, while Pr_A[A(D−Me) ∈ T] = 0.
Not differentially private for any ε: the ratio Pr_A[A(D+Me) ∈ T] / Pr_A[A(D−Me) ∈ T] cannot be bounded by e^ε.

8 Size of ε
How small can ε be? It cannot be negligible. Why? A hybrid argument: let D, D' be totally unrelated databases, for which the utility should be very different. Consider a sequence D_0 = D, D_1, D_2, …, D_n = D', where D_i and D_{i+1} are adjacent databases. For every output set T,
Prob[T|D] ≥ Prob[T|D'] · e^{−εn},
so if ε were negligible the output distributions on D and D' would be essentially identical and there would be no utility.
How large can it be? Think of a small constant.

9 Answering a single counting query
U is a set of (name, tag ∈ {0,1}) tuples.
One counting query: # of participants with tag = 1.
Sanitizer A: output (# of 1's) + noise.
Differentially private, if the noise is chosen properly: choose the noise from the Laplace distribution.

10 Laplacian Noise
[Figure: Laplace density]
The Laplace distribution Y = Lap(b) has density function Pr[Y=y] = (1/2b)·e^{−|y|/b}.
Standard deviation: O(b).
Take b = 1/ε and get Pr[Y=y] ∝ e^{−ε|y|}.
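A quick numerical illustration of these facts (numpy, not part of the slides): sample Lap(b) with b = 1/ε and check the scale of the noise.

```python
import numpy as np

eps = 0.1
b = 1.0 / eps                        # Lap(b) has density (1/2b) * exp(-|y|/b)
samples = np.random.laplace(loc=0.0, scale=b, size=100_000)
print(samples.std())                 # standard deviation is sqrt(2)*b, i.e. O(b)
print(np.mean(np.abs(samples)))      # expected |Y| equals b = 1/eps
```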

11 Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y=y] ∝ e^{−ε|y|}.
Release: q(D) + Lap(1/ε).
For adjacent D, D': |q(D) − q(D')| ≤ 1.
For any output a: e^{−ε} ≤ Pr_D[a] / Pr_{D'}[a] ≤ e^ε.
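A minimal sketch of this mechanism (the data layout and function names are illustrative): a counting query changes by at most 1 between adjacent databases, so adding Lap(1/ε) noise to the true count gives ε-differential privacy.

```python
import numpy as np

def dp_count(tags, eps):
    """Release #(tag == 1) plus Laplace noise of scale 1/eps (sensitivity-1 query)."""
    true_count = sum(1 for t in tags if t == 1)
    return true_count + np.random.laplace(scale=1.0 / eps)

# Adjacent databases differing only in my record.
D_plus_me  = [1, 0, 1, 1, 0, 1]
D_minus_me = [1, 0, 1, 1, 0]
print(dp_count(D_plus_me, eps=0.5), dp_count(D_minus_me, eps=0.5))
```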

12 Laplacian Noise: Õ(1/ε) Error
Take b = 1/ε, so Pr[Y=y] ∝ e^{−ε|y|}.
Pr_{y~Y}[|y| > k·(1/ε)] = O(e^{−k}).
Expected error is 1/ε; w.h.p. the error is Õ(1/ε).

13 Randomized Response
Randomized Response Technique [Warner 1965]
–Method for polling on stigmatizing questions
–Idea: lie with a known probability
Specific answers are deniable, yet aggregate results are still valid.
The data is never stored “in the plain” (a “trust no-one” model). Popular in the DB literature.
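A sketch of one common variant of randomized response (the slide does not fix the coin probabilities, so the 1/2-1/2 choice below is an assumption): each respondent reports the true bit with probability 1/2 and a uniformly random bit otherwise, and the aggregate is debiased afterwards.

```python
import random

def randomized_response(x):
    """With prob 1/2 report the true bit, otherwise report a uniform random bit.
    Each reported bit is ln(3)-differentially private for the respondent."""
    return x if random.random() < 0.5 else random.randint(0, 1)

def estimate_fraction(responses):
    """Debias: E[response] = 1/4 + p/2 where p is the true fraction of 1s."""
    avg = sum(responses) / len(responses)
    return 2.0 * (avg - 0.25)

true_bits = [1] * 300 + [0] * 700
responses = [randomized_response(x) for x in true_bits]
print(estimate_fraction(responses))   # close to 0.3 for large samples
```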

14 Randomized Response with Laplacian Noise
Initial idea: each user i, on input x_i ∈ {0,1}, adds to x_i independent Laplace noise of magnitude 1/ε.
Privacy: each increment is protected by Laplace noise, so the report is differentially private whether x_i is 0 or 1.
Accuracy: the noise cancels out; the error is Õ(√T), where T is the total number of users. Is that too high?
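A sketch of this initial idea (interface assumed, not from the slides): each user perturbs their own bit with Lap(1/ε) before reporting, the server just sums the reports, and the independent noise terms accumulate to error on the order of √T.

```python
import numpy as np

def local_laplace_reports(bits, eps):
    """Each user adds independent Lap(1/eps) noise to their own bit, locally."""
    bits = np.asarray(bits, dtype=float)
    return bits + np.random.laplace(scale=1.0 / eps, size=bits.shape)

eps, T = 0.5, 10_000
bits = np.random.binomial(1, 0.3, size=T)
noisy_sum = local_laplace_reports(bits, eps).sum()
print(bits.sum(), noisy_sum)          # error is roughly sqrt(T)/eps, i.e. Õ(sqrt(T))
```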

15 Scaling Noise to Sensitivity
Global sensitivity of a query q: U^n → R: GS_q = max_{adjacent D,D'} |q(D) − q(D')|.
For a counting query q (range [0,n]): GS_q = 1.
The previous argument generalizes: for any query q: U^n → R, release q(D) + Lap(GS_q/ε).
This is ε-private, with error Õ(GS_q/ε).

16 Scaling Noise to Sensitivity: Many Dimensions
Global sensitivity of a query q: U^n → R^d: GS_q = max_{adjacent D,D'} ||q(D) − q(D')||_1.
The previous argument generalizes: for any query q: U^n → R^d, release q(D) + (Y_1, Y_2, …, Y_d)
–each Y_i independent Lap(GS_q/ε)
ε-private, error Õ(GS_q/ε).
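A sketch of the general mechanism covering both this slide and the previous one (the caller supplies the L1 global sensitivity, since computing it automatically is query-specific):

```python
import numpy as np

def laplace_mechanism(q_value, gs_l1, eps):
    """Release q(D) + (Y_1, ..., Y_d), each Y_i ~ Lap(GS_q / eps) independently.
    gs_l1 is the L1 global sensitivity of q over adjacent databases."""
    q_value = np.atleast_1d(np.asarray(q_value, dtype=float))
    return q_value + np.random.laplace(scale=gs_l1 / eps, size=q_value.shape)

# Usage: a single counting query (d = 1, GS = 1).
print(laplace_mechanism(42, gs_l1=1.0, eps=0.5))
```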

17 Example: Histograms
Say x_1, x_2, ..., x_n are in domain U. Partition U into d disjoint bins.
q(x_1, x_2, ..., x_n) = (n_1, n_2, ..., n_d), where n_j = #{i : x_i in the j-th bin}.
GS_q = 2: changing one record moves at most two counts by one each.
Sufficient to add Lap(2/ε) noise to each count.
Problem: the output might not look like a histogram (negative or non-integer counts).
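A sketch of this histogram release (the bin boundaries are illustrative): Lap(2/ε) noise per bin, matching the sensitivity argument above. As the slide notes, the output can have negative or fractional counts.

```python
import numpy as np

def dp_histogram(data, bins, eps):
    """Per-bin counts plus independent Lap(2/eps) noise on each count."""
    counts, _ = np.histogram(data, bins=bins)
    return counts + np.random.laplace(scale=2.0 / eps, size=counts.shape)

data = np.random.uniform(0, 1, size=1000)
print(dp_histogram(data, bins=np.linspace(0, 1, 11), eps=0.5))
```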

18 Covariance Matrix
Suppose each person's data is a real vector; the database is a matrix X with rows r_1, r_2, ..., r_n.
The covariance matrix of X is (roughly) the matrix X^T X.
Entries measure correlation between attributes.
First step of many analyses, e.g. PCA.
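A small non-private illustration of the computation this slide refers to (my own example, not from the slides): with one row per person, the uncentered covariance matrix is roughly X^T X / n; a DP release would perturb these entries according to their sensitivity.

```python
import numpy as np

X = np.random.rand(100, 5)        # 100 people, 5 real-valued attributes each
cov = (X.T @ X) / X.shape[0]      # uncentered covariance / second-moment matrix
print(cov.shape)                  # entry (i, j): correlation of attributes i and j
```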

19 Distance to a Property
Suppose P = the set of “good” databases (e.g. well-clustered databases).
Distance to P = # of points in x that must be changed to put x in P.
This always has GS = 1.
Example: distance to a data set with a “good clustering”.
[Figure: a point x and the set P]

20 K-Means
An iterative clustering algorithm, always keeping k centers.

21 Median
Median of x_1, x_2, ..., x_n ∈ [0,1].
X = 0,…,0,0,1,…,1 and X' = 0,…,0,1,1,…,1, each with (n−1)/2 leading zeros and (n−1)/2 trailing ones, differing only in the middle entry.
median(X) = 0, median(X') = 1, so GS_median = 1.
Noise magnitude: 1. Too much noise!
But for “most” neighboring databases X, X', |median(X) − median(X')| is small.
Can we add less noise on “good” instances?

22 Global Sensitivity vs. Local Sensitivity
Global sensitivity is a worst case over inputs.
Local sensitivity of query q at point D: LS_q(D) = max_{D' adjacent to D} |q(D) − q(D')|.
Reminder: GS_q = max_D LS_q(D).
Goal: add less noise when local sensitivity is lower.
Problem: the amount of noise itself can leak information.

23 Local Sensitivity of the Median
For sorted X = x_1, x_2, ..., x_n (n odd) with median x_m, m = (n+1)/2:
LS_median(X) = max(x_m − x_{m−1}, x_{m+1} − x_m).
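A sketch of this formula (assuming n odd and the input sorted, as in the slides):

```python
import numpy as np

def local_sensitivity_median(x):
    """LS_median(X) = max(x_m - x_{m-1}, x_{m+1} - x_m) for sorted x_1..x_n, n odd."""
    x = np.sort(np.asarray(x, dtype=float))
    m = (len(x) - 1) // 2                      # 0-based index of the median
    return max(x[m] - x[m - 1], x[m + 1] - x[m])

print(local_sensitivity_median([0, 0, 0, 1, 1]))          # median 0, LS = 1
print(local_sensitivity_median([0, 0.4, 0.5, 0.6, 1]))    # LS = 0.1
```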

24 Sensitivity of the Local Sensitivity of the Median
Median of x_1, x_2, ..., x_n ∈ [0,1].
X = 0,…,0,0,0,0,1,…,1 and X' = 0,…,0,0,0,1,1,…,1, each with (n−3)/2 leading zeros and (n−3)/2 trailing ones; they are adjacent, yet LS(X) = 0 and LS(X') = 1.
The noise magnitude must itself be an insensitive function!

25 Smooth Upper Bound
Compute a “smoothed” version of local sensitivity: design a sensitivity function S(X).
S(X) is an ε-smooth upper bound on LS_f(X) if:
–for all X: S(X) ≥ LS_f(X)
–for all neighbors X, X': S(X) ≤ e^ε · S(X')
Theorem: if A(X) = f(X) + noise(S(X)/ε), then A is 2ε-differentially private.

26 Smooth Sensitivity
S*_f(X) = max_Y { LS_f(Y) · e^{−ε·dist(X,Y)} }
Claim: S*_f is an ε-smooth upper bound on LS_f; moreover, if S(X) is any ε-smooth upper bound on LS_f(X), then S(X) ≥ S*_f(X), so S*_f is the smallest one.
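A sketch of how S*_f could be computed for the median on [0,1], grouping candidate databases Y by their distance k from X; the index arithmetic follows the closed form of Nissim, Raskhodnikova and Smith as I recall it and should be treated as an assumption to verify rather than a reference implementation.

```python
import numpy as np

def smooth_sensitivity_median(x, eps):
    """S*_f(X) = max_Y { LS_f(Y) * exp(-eps * dist(X, Y)) } for the median.
    For each Hamming distance k, the worst local sensitivity is obtained by
    moving k entries near the median; values outside the data are clamped to
    the domain endpoints 0 and 1."""
    x = np.sort(np.clip(np.asarray(x, dtype=float), 0.0, 1.0))
    n = len(x)
    m = (n - 1) // 2                            # 0-based median index, n odd

    def val(i):                                 # clamp to the domain outside [0, n-1]
        return 0.0 if i < 0 else (1.0 if i >= n else x[i])

    best = 0.0
    for k in range(n + 1):
        ls_at_k = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
        best = max(best, np.exp(-eps * k) * ls_at_k)
    return best

print(smooth_sensitivity_median([0.1, 0.2, 0.5, 0.8, 0.9], eps=0.5))
```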

27 The Exponential Mechanism [McSherry, Talwar]
A general mechanism that yields differential privacy
May yield utility/approximation
Is defined (and evaluated) by considering all possible answers
–the definition does not yield an efficient way of evaluating it
Applications: approximate truthfulness of auctions, collusion resistance, compatibility

28 Example of the Exponential Mechanism
Data: x_i = website visited by student i today
Range: Y = {website names}
For each name y, let q(y; X) = #{i : x_i = y}
Goal: output the most frequently visited site
Procedure: given X, output website y with probability proportional to e^{ε·q(y,X)}
Popular sites are exponentially more likely than rare ones; a website's score doesn't change too quickly (sensitivity 1).
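A sketch of this procedure (the candidate list and data are illustrative; note that the textbook exponential mechanism scales the exponent by ε/(2·Δq), while the slide's simpler e^{ε·q} form is kept here):

```python
import numpy as np
from collections import Counter

def exponential_mechanism(data, candidates, eps):
    """Pick y with probability proportional to exp(eps * q(y, X)),
    where q(y, X) = #{i : x_i = y}, as on the slide."""
    counts = Counter(data)
    scores = np.array([counts.get(y, 0) for y in candidates], dtype=float)
    weights = np.exp(eps * (scores - scores.max()))   # shift for numerical stability
    return np.random.choice(candidates, p=weights / weights.sum())

visits = ["a.com"] * 50 + ["b.com"] * 30 + ["c.com"] * 5
print(exponential_mechanism(visits, ["a.com", "b.com", "c.com"], eps=0.1))
```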

29 Projects
Report on a paper
Apply a notion studied to some known domain
Check the state of privacy in some setting
Privacy in GWAS
Privacy in crowdsourcing
Privacy-preserving Wordle
Unique identification bounds
How much differential privacy guarantees cost in estimation
Contextual privacy

30 Planned Topics
Privacy of Data Analysis
Differential Privacy
–Definition and properties
–Statistical databases
–Dynamic data
Privacy of learning algorithms
Privacy of genomic data
Interaction with cryptography: SFE, voting
Entropic security
Data structures
Everlasting security
Privacy-enhancing technologies
–Mix nets

31 Course Information
Foundations of Privacy - Spring 2010
Instructor: Moni Naor
When: Mondays, 11:00--13:00 (2 points)
Where: Ziskind 1
Course web page: www.wisdom.weizmann.ac.il/~naor/COURSE/foundations_of_privacy.html
Prerequisites: familiarity with algorithms, data structures, probability theory, and linear algebra, at an undergraduate level; a basic course in computability is assumed.
Requirements:
–Participation in class discussion (best: read the papers ahead of time)
–Homework: there will be several homework assignments; they should be turned in on time (usually two weeks after they are given)!
–Class project and presentation
–Exam: none planned
Office: Ziskind 248
Phone: 3701
E-mail: moni.naor@

