Foundations of Privacy Lecture 3 Lecturer: Moni Naor.


Recap of last week's lecture
The simulation paradigm for defining and proving security of cryptographic protocols
The basic impossibility of disclosure prevention:
– we cannot hope for guarantees that hold against all possible auxiliary information
Differential privacy:
– for all adjacent databases, the output distributions are very close
Extractors and fuzzy extractors

Desirable Properties of a Sanitization Mechanism
Composability
– applying the sanitization several times yields a graceful degradation
– q releases, each ε-DP, are qε-DP
Robustness to side information
– no need to specify exactly what the adversary knows
Differential privacy satisfies both…

Differential Privacy [Dwork, McSherry, Nissim and Smith]
Adjacency: D+Me and D−Me
Protect individual participants: the probability of every bad event (or of any event) increases only by a small multiplicative factor when I enter the DB, so I may as well participate in the DB…
ε-differentially private sanitizer A: for all DBs D, all Me, and all events T,
e^{−ε} ≤ Pr_A[A(D+Me) ∈ T] / Pr_A[A(D−Me) ∈ T] ≤ e^ε ≈ 1+ε
Handles auxiliary input.

Differential Privacy
A gives ε-differential privacy if for all neighboring D1 and D2, and all T ⊆ range(A):
Pr[A(D1) ∈ T] ≤ e^ε · Pr[A(D2) ∈ T]
(Slide figure: the ratio of Pr[response] under D1 and D2 is bounded, even for bad responses.)
Neutralizes all linkage attacks.
Composes unconditionally and automatically: the ε's add up, Σ_i ε_i.

Differential Privacy: Important Properties
Handles auxiliary information.
Composes naturally: if A1(D) is ε1-diffP and, for every z1, A2(D, z1) is ε2-diffP, then A2(D, A1(D)) is (ε1+ε2)-diffP.
Proof: for all adjacent D, D' and all (z1, z2):
e^{−ε1} ≤ P[z1] / P'[z1] ≤ e^{ε1}
e^{−ε2} ≤ P[z2] / P'[z2] ≤ e^{ε2}
so e^{−(ε1+ε2)} ≤ P[(z1,z2)] / P'[(z1,z2)] ≤ e^{ε1+ε2}
where P[z1] = Pr_{z∼A1(D)}[z=z1], P'[z1] = Pr_{z∼A1(D')}[z=z1], P[z2] = Pr_{z∼A2(D,z1)}[z=z2], P'[z2] = Pr_{z∼A2(D',z1)}[z=z2].
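To make sequential composition concrete, here is a minimal Python sketch (the name PrivacyBudget and the helper laplace_release are illustrative, not part of the lecture): each εᵢ-DP release on the same database adds εᵢ to the total, so a simple accountant just sums the epsilons.

```python
import numpy as np

class PrivacyBudget:
    """Tracks the total epsilon spent across sequential releases.

    Sequential composition: if release i is eps_i-DP, then the whole
    interaction is (sum_i eps_i)-DP.
    """
    def __init__(self):
        self.spent = 0.0

    def laplace_release(self, query_value, sensitivity, eps, rng):
        self.spent += eps                                    # account for this release
        return query_value + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(0)
tags = np.array([0, 1, 1, 0, 1])                             # toy database of 0/1 tags
budget = PrivacyBudget()
a1 = budget.laplace_release(tags.sum(), sensitivity=1, eps=0.5, rng=rng)   # 0.5-DP release
a2 = budget.laplace_release(tags.sum(), sensitivity=1, eps=0.3, rng=rng)   # 0.3-DP release
print(a1, a2, "total epsilon:", budget.spent)                # together 0.8-DP
```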

Example: NO Differential Privacy
U = set of (name, tag ∈ {0,1}) tuples
One counting query: # of participants with tag = 1
Sanitizer A: choose and release a few random tags
Bad event T: only my tag is 1, and my tag is released
Pr_A[A(D+Me) ∈ T] ≥ 1/n, while Pr_A[A(D−Me) ∈ T] = 0
Not differentially private for any ε: no finite ε gives e^{−ε} ≤ Pr_A[A(D+Me) ∈ T] / Pr_A[A(D−Me) ∈ T] ≤ e^ε.
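A quick Monte Carlo check of this counterexample (the function below is made up for illustration and, for simplicity, publishes a single randomly chosen tag): the probability of the bad event is about 1/n with me in the database and exactly 0 without me, so no finite ε can bound the ratio.

```python
import random

def release_random_tag(tags):
    """'Sanitizer' that publishes one randomly chosen tag -- NOT differentially private."""
    return random.choice(tags)

n = 10
others = [0] * (n - 1)            # everyone else has tag 0; mine is the only 1
trials = 100_000

with_me = sum(release_random_tag(others + [1]) == 1 for _ in range(trials)) / trials
without_me = sum(release_random_tag(others) == 1 for _ in range(trials)) / trials
print(with_me, without_me)        # roughly 1/n versus exactly 0: the ratio is unbounded
```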

Size of ε
How small can ε be? It cannot be negligible. Why? A hybrid argument:
take D, D' to be totally unrelated databases, so the utility on them should be very different. Consider the sequence D0 = D, D1, D2, …, Dn = D', where Di and Di+1 are adjacent databases. For every output set T, Prob[T|D] ≥ Prob[T|D'] · e^{−εn}, so a negligible ε would force the output distributions on D and D' to be essentially the same.
How large can ε be? Think of a small constant.

Answering a Single Counting Query
U = set of (name, tag ∈ {0,1}) tuples
One counting query: # of participants with tag = 1
Sanitizer A: output (# of 1's) + noise
Differentially private, if the noise is chosen properly: choose the noise from the Laplace distribution.

Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y=y] = (1/2b) · e^{−|y|/b}
Standard deviation: O(b)
Take b = 1/ε; then Pr[Y=y] ∝ e^{−ε|y|}

Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y=y] ∝ e^{−ε|y|}
Release: q(D) + Lap(1/ε)
For adjacent D, D': |q(D) − q(D')| ≤ 1
For every output a: e^{−ε} ≤ Pr_D[a] / Pr_{D'}[a] ≤ e^ε

Laplacian Noise: Õ(1/ε) Error
Take b = 1/ε, so Pr[Y=y] ∝ e^{−ε|y|}
Pr_{y∼Y}[|y| > k·(1/ε)] = O(e^{−k})
Expected error is 1/ε; w.h.p. the error is Õ(1/ε)
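A minimal sketch of the Laplace mechanism for a single counting query, as described on the last few slides (only numpy is assumed):

```python
import numpy as np

def laplace_counting_query(tags, eps, rng):
    """Release #{i : tag_i = 1} + Lap(1/eps).

    Adjacent databases change the count by at most 1 (sensitivity 1),
    so Lap(1/eps) noise gives eps-differential privacy, and the error
    is about 1/eps in expectation.
    """
    return int(np.sum(tags)) + rng.laplace(scale=1.0 / eps)

rng = np.random.default_rng(1)
tags = rng.integers(0, 2, size=1000)
print(tags.sum(), laplace_counting_query(tags, eps=0.1, rng=rng))   # noise magnitude around 10
```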

Randomized Response
Randomized Response Technique [Warner 1965]
– a method for polling on stigmatizing questions
– idea: lie with a known probability
Specific answers are deniable, yet the aggregate results are still valid, and the data is never stored "in the plain": each respondent adds the noise before reporting ("trust no-one"). Popular in the DB literature.

Randomized Response with Laplacian Noise
Initial idea: each user i, on input x_i ∈ {0,1}, adds to x_i independent Laplace noise of magnitude 1/ε.
Privacy: each increment is protected by Laplace noise, so the report is differentially private whether x_i is 0 or 1.
Accuracy: the noise partly cancels out; the error is Õ(√T), where T is the total number of users. Is that too high?
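A sketch of this "trust no-one" variant (names below are illustrative): every one of the T users perturbs their own bit with Lap(1/ε) before reporting, and the analyst sums the noisy reports. Each report is ε-DP on its own, but the T independent noise terms only cancel up to about √T/ε.

```python
import numpy as np

def local_laplace_reports(bits, eps, rng):
    """Each user adds independent Lap(1/eps) noise to their own bit before sending it."""
    return bits + rng.laplace(scale=1.0 / eps, size=len(bits))

rng = np.random.default_rng(2)
T = 10_000
bits = rng.integers(0, 2, size=T).astype(float)
noisy_sum = local_laplace_reports(bits, eps=0.5, rng=rng).sum()
print(abs(noisy_sum - bits.sum()), (T ** 0.5) / 0.5)   # error is on the order of sqrt(T)/eps
```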

Scaling Noise to Sensitivity
Global sensitivity of a query q: U^n → R: GS_q = max over adjacent D, D' of |q(D) − q(D')|
For a counting query q (with range [0,n]): GS_q = 1
The previous argument generalizes: for any query q: U^n → R, release q(D) + Lap(GS_q/ε)
ε-private, with error Õ(GS_q/ε)

Scaling Noise to Sensitivity: Many Dimensions
Global sensitivity of a query q: U^n → R^d: GS_q = max over adjacent D, D' of ||q(D) − q(D')||_1
The previous argument generalizes: for any query q: U^n → R^d, release q(D) + (Y_1, Y_2, …, Y_d)
– each Y_i independent Lap(GS_q/ε)
ε-private, with error Õ(GS_q/ε) per coordinate

Example: Histograms
Say x_1, x_2, ..., x_n lie in domain U; partition U into d disjoint bins
q(x_1, x_2, ..., x_n) = (n_1, n_2, ..., n_d), where n_j = #{i : x_i in j-th bin}
GS_q = 2 (changing one record moves one unit between two bins)
Sufficient to add Lap(2/ε) noise to each count
Problem: the output might not look like a histogram (counts can be negative or non-integral)
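A sketch of the histogram release (the bin grid below is an arbitrary example): the count vector has L1 sensitivity 2, so Lap(2/ε) noise per bin suffices; the noisy counts can come out negative or non-integral, which is exactly the "might not look like a histogram" problem.

```python
import numpy as np

def private_histogram(data, bin_edges, eps, rng):
    """Release per-bin counts with independent Lap(2/eps) noise on each bin."""
    counts, _ = np.histogram(data, bins=bin_edges)
    return counts + rng.laplace(scale=2.0 / eps, size=len(counts))

rng = np.random.default_rng(3)
data = rng.uniform(0, 1, size=500)
print(private_histogram(data, bin_edges=np.linspace(0, 1, 11), eps=1.0, rng=rng))
```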

Covariance Matrix
Suppose each person's data is a real vector (r_1, r_2, ..., r_n); the database is a matrix X
The covariance matrix of X is (roughly) the matrix whose entries measure the correlation between attributes
First step of many analyses, e.g. PCA

Distance to a DB with a Property
Suppose P = set of "good" databases, e.g. well-clustered databases
Distance to P = # of points in x that must be changed to put x into P
This query always has GS = 1
Example: distance to a data set with a "good clustering"

K-Means
An iterative clustering algorithm, always keeping k centers

Median
Median of x_1, x_2, ..., x_n ∈ [0,1]
X = 0,…,0, 0, 1,…,1 and X' = 0,…,0, 1, 1,…,1, each with (n−1)/2 zeros and (n−1)/2 ones around the middle entry: median(X) = 0, median(X') = 1
So GS_median = 1, and the noise magnitude would be 1. Too much noise!
But for "most" neighboring databases X, X', |median(X) − median(X')| is small. Can we add less noise on "good" instances?

Global Sensitivity vs. Local Sensitivity
Global sensitivity is a worst case over inputs
Local sensitivity of query q at point D: LS_q(D) = max over D' adjacent to D of |q(D) − q(D')|
Reminder: GS_q = max_D LS_q(D)
Goal: add less noise when the local sensitivity is lower
Problem: the amount of noise can itself leak information

Local Sensitivity of the Median
For sorted X = x_1, x_2, ..., x_{m−1}, x_m, x_{m+1}, ..., x_n with median x_m:
LS_median(X) = max(x_m − x_{m−1}, x_{m+1} − x_m)
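A small sketch of this formula (index conventions are my own: data sorted, n odd, 0-based median index m = n // 2):

```python
def local_sensitivity_median(xs):
    """LS_median(X) = max(x_m - x_{m-1}, x_{m+1} - x_m) for sorted xs in [0,1]."""
    xs = sorted(xs)
    m = len(xs) // 2                     # median index, assuming n is odd
    return max(xs[m] - xs[m - 1], xs[m + 1] - xs[m])

print(local_sensitivity_median([0, 0, 0, 0, 0, 1, 1]))   # 0: the median and its neighbors are all 0
print(local_sensitivity_median([0, 0, 0, 1, 1, 1, 1]))   # 1: the median is 1, the entry below it is 0
```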

Sensitivity of the Local Sensitivity of the Median
Median of x_1, x_2, ..., x_n ∈ [0,1]
X = 0,…,0, 0, 0, 0, 1,…,1 and X' = 0,…,0, 0, 0, 1, 1,…,1, each with (n−3)/2 zeros and ones at the ends: LS(X) = 0 but LS(X') = 1
So the noise magnitude must itself be an insensitive function!

Smooth Upper Bound
Compute a "smoothed" version of the local sensitivity: design a sensitivity function S(X)
S(X) is an ε-smooth upper bound on LS_f(X) if:
– for all X: S(X) ≥ LS_f(X)
– for all neighbors X, X': S(X) ≤ e^ε · S(X')
Theorem: if A(X) = f(X) + noise(S(X)/ε), then A is 2ε-differentially private.

Smooth Sensitivity
S*_f(X) = max over all Y of { LS_f(Y) · e^{−ε·dist(X,Y)} }
Claim: if S(X) is an ε-smooth upper bound on LS_f(X), then S(X) ≥ S*_f(X) for every X; S*_f is itself ε-smooth, i.e. it is the smallest such bound.
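Smooth sensitivity can be checked by brute force on a tiny discrete domain. The sketch below (all names hypothetical; exponential in n and purely illustrative) enumerates every database Y over a small value grid, computes LS_f(Y) over its neighbors, and returns max_Y LS_f(Y)·e^{−ε·dist(X,Y)} with dist the Hamming distance:

```python
import itertools, math

def local_sensitivity(f, db, universe):
    """max over neighbors of db (change one entry) of |f(db) - f(neighbor)|."""
    base = f(db)
    best = 0.0
    for i in range(len(db)):
        for v in universe:
            neighbor = list(db)
            neighbor[i] = v
            best = max(best, abs(base - f(tuple(neighbor))))
    return best

def smooth_sensitivity(f, x, universe, eps):
    """S*_f(x) = max over all databases Y of LS_f(Y) * exp(-eps * dist(x, Y))."""
    best = 0.0
    for y in itertools.product(universe, repeat=len(x)):
        dist = sum(a != b for a, b in zip(x, y))          # Hamming distance to x
        best = max(best, local_sensitivity(f, y, universe) * math.exp(-eps * dist))
    return best

median = lambda db: sorted(db)[len(db) // 2]
print(smooth_sensitivity(median, x=(0.0, 0.0, 0.0, 1.0, 1.0),
                         universe=(0.0, 0.5, 1.0), eps=0.5))
```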

The Exponential Mechanism [McSherry and Talwar]
A general mechanism that yields differential privacy
May yield utility/approximation
Is defined (and evaluated) by considering all possible answers; the definition does not by itself yield an efficient way of evaluating it
Applications: approximate truthfulness of auctions, collusion resistance, compatibility

Example of the Exponential Mechanism
Data: x_i = website visited by student i today
Range: Y = {website names}
For each name y, let q(y; X) = #{i : x_i = y}
Goal: output the most frequently visited site
Procedure: given X, output website y with probability proportional to e^{ε·q(y;X)}
Popular sites are exponentially more likely than rare ones; the website scores do not change too quickly (the score has sensitivity 1).
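A sketch of this example in Python (function and variable names are mine): since q(y;X) is a count with sensitivity 1, sampling y with probability proportional to exp(ε·q(y;X)/2) is ε-DP; the slide's exp(ε·q(y;X)) corresponds to folding that factor of 2 into ε.

```python
import numpy as np
from collections import Counter

def exponential_mechanism_mode(visits, eps, rng):
    """Sample a website with probability proportional to exp(eps * count / 2)."""
    counts = Counter(visits)
    sites = list(counts)
    scores = np.array([counts[s] for s in sites], dtype=float)
    weights = np.exp(eps * (scores - scores.max()) / 2)    # shift scores for numerical stability
    return rng.choice(sites, p=weights / weights.sum())

rng = np.random.default_rng(4)
visits = ["wiki"] * 60 + ["mail"] * 30 + ["news"] * 10
print(exponential_mechanism_mode(visits, eps=1.0, rng=rng))   # "wiki" with overwhelming probability
```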

Projects
Report on a paper
Apply a notion studied in class to some known domain
Check the state of privacy in some setting
Possible topics: privacy in GWAS, privacy in crowdsourcing, privacy-preserving Wordle, unique identification bounds, how much worse differential privacy guarantees are in estimation, contextual privacy

Planned Topics
Privacy of data analysis
Differential privacy
– definition and properties
– statistical databases
– dynamic data
Privacy of learning algorithms
Privacy of genomic data
Interaction with cryptography: SFE, voting
Entropic security
Data structures
Everlasting security
Privacy enhancing technologies
– mix nets

Course Information
Foundations of Privacy, Spring 2010
Instructor: Moni Naor
When: Mondays, 11:00–13:00 (2 points)
Where: Ziskind 1
Course web page:
Prerequisites: familiarity with algorithms, data structures, probability theory, and linear algebra at an undergraduate level; a basic course in computability is assumed.
Requirements:
– Participation in class discussion (best: read the papers ahead of time)
– Homework: there will be several homework assignments; they should be turned in on time (usually two weeks after they are given)!
– Class project and presentation
– Exam: none planned
Office: Ziskind 248
Phone: