Slide 1: CS 380S Differential Privacy
Vitaly Shmatikov (most slides from Adam Smith, Penn State)

Slide 2: Reading Assignment
- Dwork, "Differential Privacy" (invited talk at ICALP 2006)

Slide 3: Basic Setting
[Diagram: a database DB = (x_1, ..., x_n) is held by a sanitizer San, which uses random coins. Users (government, researchers, marketers, ...) send query 1, ..., query T and receive answer 1, ..., answer T.]

Slide 4: Examples of Sanitization Methods
- Input perturbation: add random noise to the database, then release it (sketch below)
- Summary statistics: means, variances, marginal totals, regression coefficients
- Output perturbation: release summary statistics with noise added
- Interactive versions of the above: an auditor decides which queries are OK and what type of noise to add
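To make the first bullet concrete, here is a minimal sketch of input perturbation via randomized response on a database of bits; the flip probability, the helper names, and the debiasing step are my own illustrative assumptions, not something the slide specifies.

```python
import random

def randomized_response(db_bits, p=0.25):
    """Input perturbation: flip each bit independently with probability p,
    then release the entire perturbed database."""
    return [b ^ (random.random() < p) for b in db_bits]

def debiased_mean(noisy_bits, p=0.25):
    """Analyst-side correction: E[noisy mean] = (1 - p)*m + p*(1 - m),
    so solve for the true mean m of the original bits."""
    noisy_mean = sum(noisy_bits) / len(noisy_bits)
    return (noisy_mean - p) / (1 - 2 * p)

db = [random.random() < 0.3 for _ in range(10_000)]  # toy database of bits
released = randomized_response(db)
print(sum(db) / len(db), debiased_mean(released))    # true mean vs. estimate
```

Because the noise distribution is public, the analyst can still recover aggregate statistics; how much noise is needed for a formal guarantee is exactly what the later slides quantify.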

Slide 5: Strawman Definition
- Assume x_1, ..., x_n are drawn i.i.d. from an unknown distribution
- Candidate definition: sanitization is safe if it only reveals the distribution
- Implied approach: learn the distribution, then release a description of it or re-sample points from it
- This definition is tautological! The estimate of the distribution depends on the data... why is releasing it safe?

Slide 6: Blending into a Crowd
- Intuition: "I am safe in a group of k or more"; k varies (3... 6... 100... 10,000?)
- Many variations on the theme: the adversary wants a predicate g such that 0 < #{i | g(x_i) = true} < k (sketch below)
  - Is this frequency in the DB, or frequency in the underlying population?
- Why? Privacy is "protection from being brought to the attention of others" [Gavison]
  - A rare property helps re-identify someone
  - Implicit assumption: information about a large group is public (e.g., liver problems are more prevalent among diabetics)
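As a quick, hedged illustration of the condition 0 < #{i | g(x_i) = true} < k, the following sketch checks whether a predicate singles out a small crowd; the record format, the predicate, and k are invented for the example.

```python
def isolates(db, g, k):
    """True if predicate g matches at least one record but fewer than k,
    i.e., it could bring a small group to the attention of others."""
    matches = sum(1 for x in db if g(x))
    return 0 < matches < k

# Hypothetical records: (age, has_liver_problem, is_diabetic)
db = [(34, False, False), (61, True, True), (45, False, True), (29, False, False)]
print(isolates(db, lambda x: x[1], k=5))  # a rare property -> True (risky)
```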

Slide 7: Clustering-Based Definitions
- Given sanitization S, look at all databases consistent with S
- Safe if no predicate is true for all consistent databases
- k-anonymity: partition D into bins; safe if each bin is either empty or contains at least k elements
- Cell bound methods: release marginal sums; the marginals determine only ranges for the individual cells (bounds sketch below)

  Released marginals (hair color × eye color):
             brown eyes   blue eyes   Σ
    blond                             12
    brown                             18
    Σ        14           16

  Cell bounds implied by the marginals:
             brown eyes   blue eyes   Σ
    blond    [0,12]       [0,12]      12
    brown    [0,14]       [0,16]      18
    Σ        14           16
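A short sketch of the cell bound idea: given only the released marginal sums, compute per-cell bounds. The slide's table uses the simple bounds [0, min(row, column)]; the sketch also prints the standard Fréchet lower bound max(0, row + column − N), which is not shown on the slide.

```python
def cell_bounds(row_totals, col_totals):
    """Given the marginal sums of a contingency table, return per-cell
    naive bounds [0, min(r, c)] and Frechet bounds [max(0, r + c - N), min(r, c)]."""
    n = sum(row_totals)
    assert n == sum(col_totals), "marginals must be consistent"
    naive, frechet = {}, {}
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            naive[(i, j)] = (0, min(r, c))
            frechet[(i, j)] = (max(0, r + c - n), min(r, c))
    return naive, frechet

# Marginals from the slide: rows = (blond, brown hair), cols = (brown, blue eyes)
naive, frechet = cell_bounds([12, 18], [14, 16])
print(naive[(0, 0)], frechet[(1, 0)])  # blond/brown-eyed cell, brown/brown-eyed cell
```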

Slide 8: Issues with Clustering
- Purely syntactic definition of privacy
- What adversary does this apply to?
  - Does not consider adversaries with side information
  - Does not consider probability
  - Does not consider the adversary's algorithm for making decisions (inference)

Slide 9: "Bayesian" Adversaries
- Adversary outputs a point z ∈ D
- Score = 1/f_z if f_z > 0, and 0 otherwise, where f_z is the number of points in D matching z
- Sanitization is safe if E(score) ≤ ε
- Procedure (sketch below):
  - Assume you know the adversary's prior distribution over databases
  - Given a candidate output, update the prior conditioned on that output (via Bayes' rule)
  - If max_z E(score | output) < ε, then it is safe to release the output
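To make the scoring rule concrete, here is a small sketch that computes max_z E[score | output] against a made-up posterior over toy databases and compares it to ε; every number in it is invented for illustration.

```python
def expected_score(posterior, z):
    """posterior: list of (probability, database) pairs after the Bayes update.
    The score for guessing z is 1/f_z if z occurs f_z > 0 times, else 0."""
    total = 0.0
    for prob, db in posterior:
        f_z = db.count(z)
        if f_z > 0:
            total += prob / f_z
    return total

def safe_to_release(posterior, candidate_points, eps):
    return max(expected_score(posterior, z) for z in candidate_points) < eps

# Toy posterior over two equally likely databases of ages
posterior = [(0.5, [30, 30, 45]), (0.5, [30, 45, 45])]
print(safe_to_release(posterior, candidate_points=[30, 45], eps=0.5))  # False: 0.75 > 0.5
```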

Slide 10: Issues with "Bayesian" Privacy
- Restricts the type of predicates the adversary can choose
- Must know the prior distribution
  - Can one scheme work for many distributions?
  - The sanitizer works harder than the adversary
- Conditional probabilities don't consider previous iterations (remember simulatable auditing?)

Slide 11: Classical Intuition for Privacy
- "If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place." [Dalenius 1977]
  - Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database
- Similar to semantic security of encryption
  - Anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext

Slide 12: Problems with the Classical Intuition
- Popular interpretation: prior and posterior views about an individual shouldn't change "too much"
  - What if my (incorrect) prior is that every UTCS graduate student has three arms?
- How much is "too much"?
  - Can't achieve cryptographically small levels of disclosure and keep the data useful
  - An adversarial user is supposed to learn unpredictable things about the database

Slide 13: Impossibility Result [Dwork]
- Privacy: for some definition of "privacy breach," for every distribution on databases and every adversary A, there exists an A' (with no access to San's output) such that Pr(A(San(DB)) = breach) − Pr(A'() = breach) ≤ ε
  - For any reasonable notion of "breach," if San(DB) contains information about DB, then some adversary breaks this definition
- Example:
  - Vitaly knows that Alex Benn is 2 inches taller than the average Russian
  - The DB allows computing the average height of a Russian
  - This DB breaks Alex's privacy according to this definition... even if his record is not in the database!

Slide 14: (Very Informal) Proof Sketch
- Suppose DB is uniformly random, so the mutual information I(DB; San(DB)) > 0
- "Breach" means predicting a predicate g(DB)
- Adversary knows r and H(r; San(DB)) ⊕ g(DB), where H is a suitable hash function and r = H(DB)
- By itself, this side information does not leak anything about DB (why?)
- Together with San(DB), it reveals g(DB) (why?)

Slide 15: Differential Privacy (1)
[Diagram: the database DB = (x_1, ..., x_n) is fed to the sanitizer San (with random coins); adversary A sends query 1, ..., query T and receives answer 1, ..., answer T.]
- Example with the Russians and Alex Benn: the adversary learns Alex's height even if he is not in the database
- Intuition: "Whatever is learned would be learned regardless of whether or not Alex participates"
  - Dual: whatever is already known, the situation won't get worse

Slide 16: Differential Privacy (2)
[Diagram: the same setting, but entry x_i of DB is replaced by 0 before sanitization.]
- Define n+1 games:
  - Game 0: the adversary interacts with San(DB)
  - Game i (i = 1, ..., n): the adversary interacts with San(DB_{-i}), where DB_{-i} = (x_1, ..., x_{i-1}, 0, x_{i+1}, ..., x_n)
- Given a transcript S and a prior p(·) on DB, define the n+1 posterior distributions p_0(·|S), p_1(·|S), ..., p_n(·|S)

Slide 17: Differential Privacy (3)
[Diagram: same setting as slide 16.]
Definition: San is safe if for all prior distributions p(·) on DB, all transcripts S, and all i = 1, ..., n:
  StatDiff( p_0(·|S), p_i(·|S) ) ≤ ε
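To see what this definition asks for, here is a toy sketch that computes the Game-0 and Game-i posteriors by Bayes' rule for an invented mechanism (a sum query with uniform {-1, 0, +1} noise) over a uniform prior on 3-bit databases, and then takes their statistical difference; the mechanism and all parameters are my own assumptions, chosen only to make the computation explicit.

```python
from itertools import product

def stat_diff(p, q):
    """Statistical (total variation) difference between two distributions."""
    return sum(abs(p[x] - q[x]) for x in p) / 2

def likelihood(transcript, db):
    """Toy mechanism: San(db) = sum(db) + noise, noise uniform on {-1, 0, +1}."""
    return 1 / 3 if abs(transcript - sum(db)) <= 1 else 0.0

def posterior(prior, transcript, i=None):
    """Bayes posterior over databases given transcript S.
    i=None is Game 0; otherwise San saw the database with entry i zeroed (Game i)."""
    weights = {}
    for db in prior:
        seen = db if i is None else db[:i] + (0,) + db[i + 1:]
        weights[db] = prior[db] * likelihood(transcript, seen)
    total = sum(weights.values())
    return {db: w / total for db, w in weights.items()}

n = 3
prior = {db: 1 / 2 ** n for db in product([0, 1], repeat=n)}  # uniform prior
S = 2                                                         # an observed transcript
p0 = posterior(prior, S)
print(max(stat_diff(p0, posterior(prior, S, i)) for i in range(n)))
```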

Slide 18: Indistinguishability
[Diagram: DB = (x_1, x_2, x_3, ..., x_n) and DB' = (x_1, x_2, y_3, ..., x_n) differ in one row. Each is run through San (answering query 1, ..., query T), producing transcript S and transcript S' respectively. The distance between the distributions of S and S' is at most ε.]

Slide 19: Which Distance to Use?
- Problem: ε must be large
  - Any two databases induce transcripts at distance ≤ nε
  - To get utility, we need ε > 1/n
- Statistical difference 1/n is not meaningful!
- Example: release a random point in the database, San(x_1, ..., x_n) = (j, x_j) for a random j (sketch below)
  - For every i, changing x_i induces statistical difference 1/n
  - But some x_i is revealed with probability 1
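The following sketch makes the example concrete: for San(x_1, ..., x_n) = (j, x_j) with j uniform, it computes the exact statistical difference between the output distributions on two neighboring databases (it comes out to 1/n), even though every run of San publishes one true record. The databases are invented toy data.

```python
from fractions import Fraction
from collections import Counter

def output_distribution(db):
    """Distribution of San(db) = (j, db[j]) for j chosen uniformly at random."""
    n = len(db)
    dist = Counter()
    for j, x in enumerate(db):
        dist[(j, x)] += Fraction(1, n)
    return dist

def statistical_difference(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in support) / 2

db  = [0, 1, 1, 0, 1]
db2 = [0, 1, 0, 0, 1]  # neighboring database: row 2 changed
p, q = output_distribution(db), output_distribution(db2)
print(statistical_difference(p, q))  # 1/5, yet each run reveals one record in the clear
```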

Slide 20: Formalizing Indistinguishability
Definition: San is ε-indistinguishable if for all adversaries A, for all DB and DB' that differ in one row, and for all sets of transcripts S:
  p( San(DB) ∈ S ) ≤ (1 ± ε) · p( San(DB') ∈ S )
Equivalently, for every transcript S:
  p( San(DB) = S ) / p( San(DB') = S ) ∈ 1 ± ε
[Diagram: adversary A runs the same query/answer interaction against DB and against DB', obtaining transcript S and transcript S'; can it tell which is which?]
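To see the per-transcript ratio condition at work, here is a sketch using single-bit randomized response; the mechanism and the choice of keep probability are my own, picked so that the worst-case ratio lands exactly at 1 + ε.

```python
def worst_case_ratio(p, q):
    """Largest ratio p(s) / q(s) over transcripts s in the common support."""
    return max(p[s] / q[s] for s in p if s in q)

def randomized_response_dist(bit, keep_prob):
    """Output distribution of single-bit randomized response on one record."""
    return {bit: keep_prob, 1 - bit: 1 - keep_prob}

eps = 0.5
keep = (1 + eps) / (2 + eps)           # chosen so the ratio is exactly 1 + eps
p = randomized_response_dist(1, keep)  # "databases" differing in their single row
q = randomized_response_dist(0, keep)
ratio = worst_case_ratio(p, q)
print(ratio, ratio <= 1 + eps + 1e-12)  # ~1.5, satisfies the (1 ± eps) condition
```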

Slide 21: Indistinguishability ⇒ Differential Privacy
Recall the definition: San is safe if for all prior distributions p(·) on DB, all transcripts S, and all i = 1, ..., n, StatDiff( p_0(·|S), p_i(·|S) ) ≤ ε.
For every S and DB, indistinguishability implies that the likelihoods p(S | DB) and p(S | DB_{-i}) are within a (1 ± ε) factor of each other, so by Bayes' rule the posteriors p_0(·|S) and p_i(·|S) are pointwise within roughly the same factor. This implies StatDiff( p_0(·|S), p_i(·|S) ) ≤ ε.

Slide 22: Differential Privacy in Output Perturbation
- Intuition: f(x) can be released accurately when f is insensitive to individual entries x_1, ..., x_n
- Global sensitivity: GS_f = max over neighboring x, x' of ||f(x) − f(x')||_1 (the Lipschitz constant of f)
  - Example: GS_average = 1/n for databases of bits
- Theorem: releasing f(x) + Lap(GS_f / ε) is ε-indistinguishable (sketch below)
  - The noise is drawn from the Laplace distribution
[Diagram: the user asks the database (x_1, ..., x_n) "Tell me f(x)" and receives f(x) + noise.]
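A minimal sketch of the theorem's mechanism: add Laplace noise with scale GS_f/ε to the true answer. The average-of-bits example (GS = 1/n) comes from the slide; numpy and the function names are my assumptions.

```python
import numpy as np

def laplace_mechanism(true_answer, global_sensitivity, eps, rng):
    """Release f(x) + Lap(GS_f / eps); eps-indistinguishable by the slide's theorem."""
    return true_answer + rng.laplace(loc=0.0, scale=global_sensitivity / eps)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=1000)  # database of bits, as in the slide's example
gs_average = 1.0 / len(bits)          # GS_average = 1/n
noisy_avg = laplace_mechanism(bits.mean(), gs_average, eps=0.1, rng=rng)
print(bits.mean(), noisy_avg)
```

Note that the noise scale grows as ε shrinks (more privacy) and shrinks as n grows, which is why insensitive statistics over large databases can be released accurately.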

Slide 23: Sensitivity with Laplace Noise
[Figure only; no text in the transcript.]

Slide 24: Differential Privacy: Summary
- San gives ε-differential privacy if for all values of DB and Me and all transcripts t:
  Pr[ San(DB − Me) = t ] / Pr[ San(DB + Me) = t ] ≤ e^ε ≈ 1 + ε (for small ε)
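A rough numerical illustration of this ratio for a Laplace-noised count query on DB with and without "Me" (one extra record); the densities are compared analytically on a grid of transcripts, and the counts and ε are invented for the example.

```python
import math

def laplace_density(t, center, scale):
    return math.exp(-abs(t - center) / scale) / (2 * scale)

eps = 0.1
count_without_me = 420  # f(DB - Me): hypothetical count query answer
count_with_me = 421     # f(DB + Me): adding my record changes the count by 1
scale = 1.0 / eps       # GS_count = 1, so Lap(1/eps) noise

worst = max(
    laplace_density(t, count_without_me, scale) / laplace_density(t, count_with_me, scale)
    for t in range(400, 441)  # transcripts t on a coarse grid
)
print(worst, math.exp(eps))   # the worst-case ratio never exceeds e^eps ≈ 1 + eps
```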

Slide 25: Intuition
- No perceptible risk is incurred by joining the DB
- Anything the adversary can do to me, it could do without me (without my data)
[Figure: plot of Pr[response] with the "bad responses" marked X.]