Privacy-preserving Release of Statistics: Differential Privacy

Privacy-preserving Release of Statistics: Differential Privacy. 18734: Foundations of Privacy. Anupam Datta, CMU, Fall 2015.

Privacy-Preserving Statistics: Non-Interactive Setting. A database D = (x1, …, xn), such as census data, health data, or network data, is maintained by a trusted curator. The curator sanitizes it (by adding noise, sampling, generalizing, or suppressing records) and releases a sanitized database D' to the analyst. Goals: accurate statistics (low noise) and preservation of individual privacy (what does that mean?).

Privacy-Preserving Statistics: Interactive Setting. The database D = (x1, …, xn), again maintained by a trusted curator (census data, health data, network data, …), is never released. Instead, the analyst submits a query f, e.g. "# individuals with salary > $30K", and the curator returns f(D) + noise. Goals: accurate statistics (low noise) and preservation of individual privacy (what does that mean?).
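As an illustration of the interactive setting (not code from the lecture; the class and parameter names are hypothetical, and the noise here is only a placeholder), the curator can be thought of as a thin query interface around the private records:

import random

class TrustedCurator:
    """Holds the private records and answers queries with noisy results (illustrative sketch)."""

    def __init__(self, records, noise_scale):
        self.records = records          # private database D = (x1, ..., xn)
        self.noise_scale = noise_scale  # how much noise to add; to be chosen by the mechanism

    def answer(self, query):
        """Evaluate query(D) and perturb the result before release."""
        true_value = query(self.records)
        # Placeholder noise; the Laplace mechanism below calibrates the noise
        # to the query's sensitivity instead of using an ad hoc scale.
        noise = random.uniform(-self.noise_scale, self.noise_scale)
        return true_value + noise

# Example query: number of individuals with salary > $30K.
salaries = [28_000, 45_000, 31_000, 90_000, 25_000]
curator = TrustedCurator(salaries, noise_scale=1.0)
noisy_count = curator.answer(lambda db: sum(1 for s in db if s > 30_000))

How much noise, and of what kind, is exactly the question the later slides answer with the Laplace mechanism.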

Classical Intuition for Privacy. "If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place." [Dalenius 1977] Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database. This is similar to the semantic security of encryption.

Impossibility Result [Dwork, Naor 2006]. Result: for any reasonable notion of "breach", if the sanitized database contains any information about the original database, then some adversary with auxiliary information breaks this definition. Example: suppose Terry Gross is two inches shorter than the average Lithuanian woman, and the DB allows computing the average height of Lithuanian women. Then the DB breaks Terry Gross's privacy according to this definition, even if her record is not in the database!

Very Informal Proof Sketch. Suppose DB is uniformly random and a "breach" means predicting a predicate g(DB). The adversary's background knowledge is the pair r, H(r ; San(DB)) ⊕ g(DB), where H is a suitable hash function and r = H(DB). By itself, this auxiliary information does not leak anything about DB; together with San(DB), it reveals g(DB).

Differential Privacy: Idea [Dwork, McSherry, Nissim, Smith 2006]. If there is already some risk of revealing a secret of C by combining auxiliary information with something learned from the DB, then that risk is still there, but it is not increased by C's participation in the database. C is no worse off because her record is included in the computation. The released statistic is about the same if any individual's record is removed from the database.

An Information Flow Idea. Changing the input database in a specific way changes the output statistic by only a small amount.

Not Absolute Confidentiality. Differential privacy does not guarantee that Terry Gross's height won't be learned by the adversary.

Differential Privacy: Definition. A randomized sanitization function κ has ε-differential privacy if for all data sets D1 and D2 differing in at most one element and all subsets S of the range of κ, Pr[κ(D1) ∈ S] ≤ e^ε · Pr[κ(D2) ∈ S]. For example, the answer to the query "# individuals with salary > $30K" lies in the range [100, 110] with approximately the same probability whether it is computed on D1 or on D2.
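To get a feel for the parameter ε, it helps to plug in a small value; the numbers below are an illustrative calculation, not from the slides.

% Interpreting epsilon = 0.1 (illustrative):
\[
  e^{\epsilon} = e^{0.1} \approx 1.105,
  \qquad\text{so}\qquad
  \Pr[\kappa(D_1) \in S] \le 1.105 \cdot \Pr[\kappa(D_2) \in S].
\]
% Adding or removing one record changes the probability of any
% observable outcome by at most roughly 10.5 percent.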

Achieving Differential Privacy: Interactive Setting. The user asks the curator of database D = (x1, …, xn) "tell me f(D)" and receives f(D) + noise. How much noise, and of what type, should be added?

Example: Noise Addition (slide: Adam Smith).

Global Sensitivity (slide: Adam Smith).
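The body of this slide is an image in the transcript; as a reference point, the standard definition of global sensitivity for a real-valued function f (the one the exercise below relies on) is:

\[
  GS_f \;=\; \max_{D_1, D_2 \text{ differing in at most one element}} \bigl| f(D_1) - f(D_2) \bigr|,
\]
% i.e., the largest change in f that adding, removing, or changing a single
% individual's record can cause.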

Exercise. Function f: # individuals with salary > $30K. Global sensitivity of f = ? Answer: 1, since adding or removing one individual changes the count by at most 1.

Background on Probability Theory (see Oct 11, 2013 recitation)

Continuous Probability Distributions. Probability density function (PDF), fX. Example distributions: normal (Gaussian), exponential, Laplace.

Laplace Distribution. PDF: f(x | μ, b) = (1/2b) exp(−|x − μ| / b). Mean = μ. Variance = 2b². Source: Wikipedia.

Laplace Distribution. Change of notation from the previous slide: x → y, μ → 0, b → λ, so the noise density becomes h(y) = (1/2λ) exp(−|y| / λ).
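As a quick aside (not from the slides), the distribution in this parameterization can be sampled directly with NumPy, which is convenient for checking the mean and variance stated above:

import numpy as np

rng = np.random.default_rng(0)
lam = 1.0  # scale parameter lambda (b in the Wikipedia notation, with mu = 0)

# Draw 100,000 samples of Laplace noise centered at 0 with scale lam.
samples = rng.laplace(loc=0.0, scale=lam, size=100_000)

print(samples.mean())  # close to 0 (the mean)
print(samples.var())   # close to 2 * lam**2 (the variance 2*b^2)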

Achieving Differential Privacy

Laplace Mechanism (slide: Adam Smith).
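The body of this slide is an image; the mechanism it refers to is the standard Laplace mechanism: release f(D) plus Laplace noise of scale λ = GS_f / ε. A minimal Python sketch follows (the function name and example data are my own, for illustration):

import numpy as np

def laplace_mechanism(true_answer, global_sensitivity, epsilon, rng=None):
    """Return true_answer + Laplace noise with scale GS_f / epsilon.

    Sketch of the standard Laplace mechanism: calibrating the noise scale
    to the global sensitivity yields epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    scale = global_sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# Example: counting query "# individuals with salary > $30K".
# A counting query has global sensitivity 1 (see the exercise above).
salaries = [28_000, 45_000, 31_000, 90_000, 25_000]
true_count = sum(1 for s in salaries if s > 30_000)
noisy_count = laplace_mechanism(true_count, global_sensitivity=1, epsilon=0.1)

Note the accuracy trade-off: with ε = 0.1 and sensitivity 1, the noise has scale 10 and standard deviation √2 · 10 ≈ 14, which swamps a count over a handful of people but barely affects a count over millions.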

Laplace Mechanism: Proof Idea. Compare the output densities Pr[A(x) = t] and Pr[A(x') = t] at every point t and bound their ratio.

Laplace Mechanism: More Details.
\[
\Pr[A(x) \in S] = \int_{t \in S} p(A(x) = t)\, dt,
\qquad
p(A(x) = t) = p(L = t - f(x)) = h(t - f(x)).
\]
Bounding the ratio of densities pointwise:
\[
\frac{h(t - f(x))}{h(t - f(x'))}
= \frac{\exp(-|t - f(x)|/\lambda)}{\exp(-|t - f(x')|/\lambda)}
\le \exp\!\left(\frac{|f(x) - f(x')|}{\lambda}\right)
\le \exp\!\left(\frac{GS_f}{\lambda}\right),
\]
and therefore
\[
\frac{\Pr[A(x) \in S]}{\Pr[A(x') \in S]}
= \frac{\int_{t \in S} h(t - f(x))\, dt}{\int_{t \in S} h(t - f(x'))\, dt}
\le \exp\!\left(\frac{GS_f}{\lambda}\right).
\]
For \(\lambda = GS_f / \epsilon\), we have \(\epsilon\)-differential privacy.

Example: Noise Addition (slide: Adam Smith).

Using Global Sensitivity. Many natural functions have low global sensitivity: histograms, covariance matrices, strongly convex optimization problems.

Composition Theorem. If A1 is ε1-differentially private, A2 is ε2-differentially private, and they use independent random coins, then the pair ⟨A1, A2⟩ is (ε1 + ε2)-differentially private. Repeated querying degrades privacy, but the degradation is quantifiable.
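A quick sanity check of where the sum of the ε's comes from (my own sketch, for the special case of a product event S1 × S2 and independent coins):

\[
\Pr[(A_1, A_2)(D) \in S_1 \times S_2]
  = \Pr[A_1(D) \in S_1]\,\Pr[A_2(D) \in S_2]
  \le e^{\epsilon_1}\Pr[A_1(D') \in S_1]\; e^{\epsilon_2}\Pr[A_2(D') \in S_2]
  = e^{\epsilon_1 + \epsilon_2}\,\Pr[(A_1, A_2)(D') \in S_1 \times S_2].
\]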

Applications. Netflix data set [McSherry, Mironov 2009; MSR]: the accuracy of differentially private recommendations (with respect to one movie rating) is comparable to the baseline set by Netflix. Network trace data sets [McSherry, Mahajan 2010; MSR].

Challenge: High Sensitivity. The basic approach adds noise proportional to global sensitivity to preserve ε-differential privacy, which can mean a great deal of noise when sensitivity is high. Improvements: smooth sensitivity [Nissim, Raskhodnikova, Smith 2007; BGU-PSU] and restricted sensitivity [Blocki, Blum, Datta, Sheffet 2013; CMU].

Challenge: Identifying an Individual's Information. Information about an individual may not reside only in their own record. Example: in a social network, information about node A also appears in a node B influenced by A, for instance because A may have caused a link between B and C.

Differential Privacy: Summary. An approach to releasing privacy-preserving statistics with a rigorous privacy guarantee. Significant activity in the theoretical CS community. Several applications to real data sets: recommendation systems, network trace data, and others. Some challenges: high sensitivity, identifying an individual's information, repeated querying.