Lecture 16: Privacy of Correlated Data & Relaxations of Differential Privacy
CompSci, Fall 2012
Instructor: Ashwin Machanavajjhala
Announcements
Starting next class, we will discuss applications of privacy to real-world problems.
– Next class: a real-world application of differential privacy in the US Census
No class on Wednesday, Oct 31.
Guest lecture: Prof. Jerry Reiter (Duke Statistics), Friday, Nov 2: "Privacy in the U.S. Census and Synthetic Data Generation"
Outline
Recap: Pufferfish Privacy Framework [Kifer-M PODS '12]
Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13]
– Induced Neighbor Privacy
Relaxing differential privacy for utility
– Crowd-Blending Privacy [Gehrke et al CRYPTO '12]
– E-privacy [M et al VLDB '09]
Recap: No Free Lunch Theorem [Kifer-Machanavajjhala SIGMOD '11] [Dwork-Naor JPC '10]
It is not possible to guarantee any utility in addition to privacy without making assumptions about
– the data generating distribution, or
– the background knowledge available to an adversary.
Correlations & Differential Privacy
When an adversary knows that individuals in a table are correlated, then (s)he can learn sensitive information about individuals even from the output of a differentially private mechanism.
Example 1: Contingency tables with pre-released exact counts
Example 2: Social networks
Example 1: Contingency table with pre-released exact marginal counts
[Figure: a contingency table whose marginal counts are released exactly, while each cell count is released with Lap(1/ε) noise.]
Because the exact marginals constrain the cells, the k noisy answers act as k independent Lap(1/ε)-noisy estimates of the same hidden cell count (true value 8 in the figure). Averaging them gives an estimate with mean 8 and variance 2/(kε²), so the adversary can reconstruct the table with high precision for large k.
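The averaging argument can be simulated in a few lines. The sketch below is illustrative only: it assumes, as in the figure, that the true count is 8 and that the pre-released marginals make the k noisy releases independent Lap(1/ε) estimates of that single count, then checks that the variance of their mean is about 2/(kε²).

```python
import numpy as np

# Minimal simulation of the reconstruction attack sketched above (assumed setup):
# exact marginals force k noisy cell counts to act as k independent Lap(1/eps)-noisy
# estimates of the same hidden count (true value 8 on the slide).
rng = np.random.default_rng(0)
eps, k, true_count = 0.1, 1000, 8

noisy_answers = true_count + rng.laplace(scale=1.0 / eps, size=k)
estimate = noisy_answers.mean()

print(f"estimate = {estimate:.2f}")                      # close to 8 for large k
print(f"empirical variance of the mean ~ {noisy_answers.var() / k:.3f}")
print(f"theoretical variance 2/(k*eps^2) = {2 / (k * eps**2):.3f}")
```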
Reason for Privacy Breach
[Figure: the space of all possible tables. A differentially private mechanism cannot distinguish pairs of tables that differ in one tuple, but many of these tables do not satisfy the adversary's background knowledge.]
Reason for Privacy Breach
[Figure: within the space of all possible tables, the background knowledge leaves only a few tables consistent with it; the adversary can distinguish between every pair of these tables based on the output.]
Recap: Why do we need domain-specific privacy?
For handling correlations
– Pre-released marginals & social networks [Kifer-M SIGMOD '11]
Utility-driven applications
– For some applications, existing privacy definitions do not provide sufficient utility [M et al PVLDB '11]
Personalized privacy & aggregate secrets [Kifer-M PODS '12]
Question: How do we design principled privacy definitions customized to such scenarios?
Recap: Pufferfish Framework
Recap: Pufferfish Semantics
– What is being kept secret?
– Who are the adversaries?
– How is information disclosure bounded?
Recap: Sensitive Information
Secrets: S is a set of potentially sensitive statements
– "individual j's record is in the data, and j has cancer"
– "individual j's record is not in the data"
Discriminative pairs: Spairs is a subset of S×S consisting of mutually exclusive pairs of secrets.
– ("Bob is in the table", "Bob is not in the table")
– ("Bob has cancer", "Bob has diabetes")
Recap: Adversaries
An adversary can be completely characterized by his/her prior information about the data.
– We do not assume computational limits.
Data evolution scenarios: the set of all probability distributions that could have generated the data.
– No assumptions: all probability distributions over data instances are possible.
– I.I.D.: the set of all f such that P(data = {r_1, r_2, …, r_k}) = f(r_1) × f(r_2) × … × f(r_k)
Recap: Pufferfish Framework
Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if for every
– w ∈ Range(M),
– (s_i, s_j) ∈ Spairs, and
– θ ∈ D such that P(s_i | θ) ≠ 0 and P(s_j | θ) ≠ 0:
P(M(data) = w | s_i, θ) ≤ e^ε P(M(data) = w | s_j, θ)
Recap: Pufferfish Semantic Guarantee
For every output w, the posterior odds of s_i vs s_j are within a factor e^ε of the prior odds:
e^{-ε} ≤ [ P(s_i | M(data) = w, θ) / P(s_j | M(data) = w, θ) ] / [ P(s_i | θ) / P(s_j | θ) ] ≤ e^ε
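A toy numeric illustration of this guarantee (not from the slides; all numbers are assumed): for a randomized-response style mechanism over a single bit with ε = ln 3, the posterior odds of the two secrets never move by more than a factor e^ε relative to the prior odds.

```python
import numpy as np

# Secrets: s_i = "bit is 1", s_j = "bit is 0". The mechanism reports the bit
# truthfully with probability 0.75, which gives eps = ln 3 (assumed example).
eps = np.log(3.0)
p_out1 = {1: 0.75, 0: 0.25}            # P(M = 1 | true bit)

prior_si, prior_sj = 0.3, 0.7          # adversary's prior over the two secrets
prior_odds = prior_si / prior_sj

for w in (0, 1):
    like_si = p_out1[1] if w == 1 else 1 - p_out1[1]   # P(M = w | s_i)
    like_sj = p_out1[0] if w == 1 else 1 - p_out1[0]   # P(M = w | s_j)
    posterior_odds = prior_odds * (like_si / like_sj)  # Bayes' rule
    ratio = posterior_odds / prior_odds
    print(f"output {w}: posterior/prior odds = {ratio:.2f} (bound e^eps = {np.exp(eps):.2f})")
```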
Recap: Pufferfish & Differential Privacy
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– For all θ = [f_1, f_2, …, f_k, π_1, π_2, …, π_k]:
– P[Data = D | θ] = Π_{r_j not in D} (1 − π_j) × Π_{r_j in D} π_j f_j(r_j)
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ} as defined above.
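Since this instantiation is equivalent to differential privacy, the bound can be checked directly on a standard mechanism. A minimal numeric sketch, assuming a count query with sensitivity 1 answered with Lap(1/ε) noise and illustrative counts 10 vs 9 on two neighboring tables:

```python
import numpy as np

def laplace_density(x, loc, scale):
    return np.exp(-np.abs(x - loc) / scale) / (2 * scale)

eps = 0.5
count_with_bob, count_without_bob = 10, 9      # neighboring databases (assumed example)
outputs = np.linspace(0, 20, 201)

# Output densities on the two neighbors should differ by at most a factor e^eps.
ratios = (laplace_density(outputs, count_with_bob, 1 / eps)
          / laplace_density(outputs, count_without_bob, 1 / eps))
assert np.all(ratios <= np.exp(eps) + 1e-9)
print(f"max density ratio = {ratios.max():.3f}, e^eps = {np.exp(eps):.3f}")
```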
Outline
Recap: Pufferfish Privacy Framework [Kifer-M PODS '12]
Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13]
– Current research
Relaxing differential privacy for utility
– Crowd-Blending Privacy [Gehrke et al CRYPTO '12]
– E-privacy [M et al VLDB '09]
Reason for Privacy Breach
[Figure: within the space of all possible tables, the background knowledge leaves only a few tables consistent with it; the adversary can distinguish between every pair of these tables based on the output.]
Induced Neighbor Privacy
Differential privacy: neighboring tables differ in one value … but one or both of the neighbors may not satisfy the constraints.
[Figure: D and D' differ in one record (one move: add/delete a record), but D' does not satisfy the row and column sums.]
Induced Neighbor Privacy
Induced Neighbors (Q) [Kifer-M '11 & Pan]
Pick an individual j and consider two tables D_a, D_b that differ in j's record:
– D_a(j) = a, and D_b(j) = b
D_a and D_b are induced neighbors if they are minimally different:
– D_a and D_b satisfy the constraints in Q
– Let M = {m_1, m_2, …, m_k} be the smallest set of moves that changes D_a into D_b
– There is no table D_c that satisfies the constraints and can be constructed from D_a using a proper subset of the moves in M
(A brute-force check of this definition is sketched after Example 3 below.)
Example 1
Table A = {(a1,b1), (a2,b2), (a3,b3)}; the public constraints are its row and column sums (each value of A and each value of B occurs exactly once).
Is Table B = {(a1,b2), (a2,b2), (a3,b3)} an induced neighbor of Table A given the row and column sums?
Answer: No. Table B does not satisfy the row and column sums (b2 occurs twice and b1 not at all).
Example 2
Table A = {(a1,b1), (a2,b2), (a3,b3), (a1,b1), (a2,b2), (a3,b3)}; the public constraints are its row and column sums (each value of A and each value of B occurs exactly twice).
Is Table B = {(a1,b2), (a2,b3), (a3,b1), (a1,b2), (a2,b3), (a3,b1)} an induced neighbor of Table A given the row and column sums?
Answer: No. Table C = {(a1,b2), (a2,b3), (a3,b1), (a1,b1), (a2,b2), (a3,b3)} also satisfies the constraints and can be generated from Table A using a proper subset of the moves from A to B.
Example 3
Table C = {(a1,b2), (a2,b3), (a3,b1), (a1,b1), (a2,b2), (a3,b3)} and Table A are induced neighbors: both satisfy the row and column sums, and no proper subset of the moves from A to C yields another table that satisfies them.
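The brute-force check referenced in the definition above can be sketched as follows. This is illustrative only: the tables and marginal constraints are reconstructed from Examples 2 and 3, and the "proper subset of moves" reading of minimality is an assumption about the definition.

```python
from itertools import combinations
from collections import Counter

def marginals(table):
    """Row (A-value) and column (B-value) counts of a multiset of (a, b) records."""
    rows, cols = Counter(), Counter()
    for a, b in table:
        rows[a] += 1
        cols[b] += 1
    return rows, cols

def moves(d_a, d_b):
    """Deletions and additions that turn multiset d_a into multiset d_b."""
    ca, cb = Counter(d_a), Counter(d_b)
    return list((ca - cb).elements()), list((cb - ca).elements())

def satisfies(table, constraints):
    return marginals(table) == constraints

def induced_neighbors(d_a, d_b, constraints):
    """Both tables must satisfy the constraints, and no proper, nonempty subset of
    the moves applied to d_a may already yield a constraint-satisfying table."""
    if not (satisfies(d_a, constraints) and satisfies(d_b, constraints)):
        return False
    dels, adds = moves(d_a, d_b)
    for i in range(len(dels) + 1):
        for j in range(len(adds) + 1):
            if (i, j) in ((0, 0), (len(dels), len(adds))):
                continue  # skip the empty move set and the full move set
            for sub_d in combinations(dels, i):
                for sub_a in combinations(adds, j):
                    d_c = (Counter(d_a) - Counter(sub_d)) + Counter(sub_a)
                    if satisfies(list(d_c.elements()), constraints):
                        return False  # a smaller move set already reaches a valid table
    return True

# Tables from Examples 2 and 3 (reconstructed):
A = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")] * 2
B = [("a1", "b2"), ("a2", "b3"), ("a3", "b1")] * 2
C = [("a1", "b2"), ("a2", "b3"), ("a3", "b1"),
     ("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
Q = marginals(A)  # publicly known row and column sums

print(induced_neighbors(A, B, Q))  # False: Table C is reachable with a proper subset of the moves
print(induced_neighbors(A, C, Q))  # True
```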
Induced Neighbor Privacy
For every pair of induced neighbors D_1 and D_2, and for every output O, the adversary should not be able to distinguish between D_1 and D_2 based on O:
| log( Pr[A(D_1) = O] / Pr[A(D_2) = O] ) | ≤ ε
Induced Neighbors Privacy and Pufferfish
Given a set of count constraints Q:
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– For all θ = [f_1, f_2, …, f_k, π_1, π_2, …, π_k]:
– P[Data = D | θ] ∝ Π_{r_j not in D} (1 − π_j) × Π_{r_j in D} π_j f_j(r_j), if D satisfies Q
– P[Data = D | θ] = 0, if D does not satisfy Q
Conjecture: A mechanism M satisfies induced neighbors privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ}.
Laplace Mechanism for Induced Neighbors Privacy
Theorem: If the induced sensitivity of the query is S_in(q), then adding Lap(λ) noise with λ = S_in(q)/ε guarantees ε-induced neighbors privacy.
S_in(q): the smallest number such that for every pair of induced neighbors d, d': || q(d) − q(d') ||_1 ≤ S_in(q)
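A minimal sketch of this mechanism. The induced sensitivity is taken as an input, since computing it is the hard part discussed on the next slides; the example table and the value S_in(q_all) = 6 are illustrative.

```python
import numpy as np

def induced_laplace_mechanism(true_answer, induced_sensitivity, eps,
                              rng=np.random.default_rng()):
    """Add Lap(S_in(q)/eps) noise to every coordinate of the query answer."""
    scale = induced_sensitivity / eps
    answer = np.asarray(true_answer, dtype=float)
    return answer + rng.laplace(scale=scale, size=answer.shape)

# Releasing all 9 cell counts of the 3x3 table from the examples, with S_in(q_all) = 6:
cells = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 2]])
print(induced_laplace_mechanism(cells, induced_sensitivity=6, eps=1.0))
```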
Induced Sensitivity
(The data is a 3×3 contingency table with all row and column sums equal to 2; the marginals are released exactly, the cell counts are hidden.)
q_{a1,b1}: the number of records with A = a1 and B = b1. Sensitivity = ?
q_{b1}: the number of records with B = b1. Sensitivity = ?
q_all: all the counts in the contingency table. Sensitivity = ?
Induced Sensitivity
q_{a1,b1}: the number of records with A = a1 and B = b1. Sensitivity = 1
q_{b1}: the number of records with B = b1. Sensitivity = 0 (the column sums are fixed by the constraints)
q_all: all the counts in the contingency table. Sensitivity = 6
Induced Sensitivity
What is the induced sensitivity if all counts in the contingency table are released? Sensitivity ≥ 6.
[Figure: Table A − Table C = Diff, a 3×3 table of +1/−1 entries with six nonzero cells.]
Induced Sensitivity
The Diff between two induced neighbors represents the moves:
– + means an addition and − means a deletion.
– A +1 in a cell must be offset by a −1 in the same row and another −1 in the same column (every nonzero cell has degree 2).
– Hence, if we draw an edge between every +1 and −1 in the same row or column, we get a graph that is a collection of cycles.
Induced Sensitivity Lecture 16: Fall =- Table ATable CDiff Simple cycle can have at most min(2r, 2c) nodes where r = number of rows c = number of columns
Induced Sensitivity Lecture 16: Fall =- Table ATable DDiff =- Table ATable EDiff (NOT an induced neighbor of A) (is an induced neighbor of A)
Computing Induced Sensitivity
2D case: q_all outputs all the counts in a 2-D contingency table; the constraints are the row and column sums. The induced sensitivity of q_all is min(2r, 2c).
General case: Deciding whether S_in(q) > 0 is NP-hard.
Conjecture: Computing S_in(q) is hard (and complete) for the second level of the polynomial hierarchy.
Summary
Correlations in the data can allow adversaries to learn sensitive information even from a differentially private release.
Induced Neighbors Privacy helps limit this disclosure when the correlations are due to constraints that are publicly known about the data.
Algorithms for differential privacy can be used to ensure induced neighbors privacy by using the appropriate (induced) sensitivity.
Open Questions
Induced neighbors privacy for general count constraints
– Are there ways to approximate the induced sensitivity?
Answering queries using noisy data + exact knowledge
Privacy of social networks
– Adversaries may use social network evolution models to infer sensitive information about edges in a network [Kifer-M SIGMOD '11]
– Can correlations in a social network be described generatively?
Outline
Recap: Pufferfish Privacy Framework [Kifer-M PODS '12]
Defining Privacy for Correlated Data [Kifer-M PODS '12 & Ding-M '13]
– Current research
Relaxing differential privacy for utility
– Crowd-Blending Privacy [Gehrke et al CRYPTO '12]
– E-privacy [M et al VLDB '09]
Recap: Pufferfish & Differential Privacy
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– For all θ = [f_1, f_2, …, f_k, π_1, π_2, …, π_k]:
– P[Data = D | θ] = Π_{r_j not in D} (1 − π_j) × Π_{r_j in D} π_j f_j(r_j)
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ}.
Note: the adversary may know an arbitrary distribution about each individual.
Need for Relaxed Notions of Privacy
In certain applications, differentially private mechanisms do not provide sufficient utility.
How do we define privacy while guarding against restricted classes of attackers?
– The definitions must still resist known attacks: previous relaxations were susceptible to composition, minimality, and other attacks.
Approaches to Relaxing Privacy
Computationally bounded adversaries [Groce et al TCC '11]
Allowing certain disclosures [Gehrke et al CRYPTO '12]
Considering "realistic" adversaries with bounded prior knowledge [M et al VLDB '09]
Restricting the Adversary's Computational Power
Consider attackers who can only execute a polynomial-time Turing machine (e.g., only use algorithms in P) [Groce et al TCC '11]:
"… for queries with output in R^d (for a constant d) and a natural class of utilities, any computationally private mechanism can be converted to a statistically private mechanism that is roughly as efficient and achieves almost the same utility …"
Crowd-Blending Privacy [Gehrke et al CRYPTO '12]
Definition: Individuals t and t' in a database D are ε-indistinguishable with respect to mechanism M if, for all outputs w,
P[M(D) = w] ≤ e^ε P[M(D_{t,t'}) = w]
where D_{t,t'} is the database obtained by replacing t with t'.
Blending in a crowd: An individual t in D ε-blends in a crowd of k people with respect to mechanism M if t is ε-indistinguishable from at least k−1 other individuals in the data.
Crowd-Blending Privacy
[Figure: a database D in which a highlighted individual 0-blends in a crowd (the individuals whose records the mechanism treats identically).]
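As an illustration of blending (assumed, not taken from the slides): for a deterministic mechanism such as the count-suppression histogram on the "Mechanisms" slide below, t 0-blends with t' exactly when swapping t's value for t's leaves the output unchanged. A small sketch that counts the crowd for one record, on made-up data:

```python
from collections import Counter

def suppressed_histogram(records, k):
    """Release value counts, dropping every bin with count < k."""
    counts = Counter(records)
    return {v: c for v, c in counts.items() if c >= k}

def crowd_size(records, idx, k):
    """Number of individuals (including records[idx] itself) from whom records[idx]
    is 0-indistinguishable under the suppressed histogram: swapping in their value
    leaves the output unchanged."""
    base = suppressed_histogram(records, k)
    crowd = 0
    for t_prime in records:
        swapped = list(records)
        swapped[idx] = t_prime
        if suppressed_histogram(swapped, k) == base:
            crowd += 1
    return crowd

ages = [25, 25, 25, 25, 31, 31, 47]
print(crowd_size(ages, idx=0, k=3))  # 4: the first 25-year-old blends with the other 25s
```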
Crowd-Blending Privacy
[Figure: the same database D, with the released counts perturbed by Laplace noise (e.g., N + Lap(2/ε), 8 + Lap(2/ε)); every tuple ε-blends in a crowd of size N.]
Crowd-Blending Privacy
Definition: A mechanism M is (k, ε)-crowd-blending private if for every database D and every individual t in D,
– either t ε-blends in a crowd of size k,
– or, for all w, P(M(D) = w) ≤ e^ε P(M(D − {t}) = w).
Mechanisms
Release a histogram, suppressing all counts less than k
– Satisfies (k, 0)-crowd-blending privacy
Release a histogram, adding Laplace noise to counts less than k
– Satisfies (k, ε)-crowd-blending privacy
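Sketches of both mechanisms. The data is made up, and the Lap(1/ε) noise scale in the second mechanism is an assumption, since the slide does not fix it.

```python
import numpy as np
from collections import Counter

def histogram_suppress(records, k):
    """Release the histogram, dropping every bin with count < k
    ((k, 0)-crowd-blending private per the slide)."""
    counts = Counter(records)
    return {v: c for v, c in counts.items() if c >= k}

def histogram_noise_small(records, k, eps, rng=np.random.default_rng()):
    """Release the histogram, adding Laplace noise only to bins with count < k
    ((k, eps)-crowd-blending private per the slide; Lap(1/eps) scale assumed)."""
    counts = Counter(records)
    return {v: (c if c >= k else c + rng.laplace(scale=1.0 / eps))
            for v, c in counts.items()}

ages = [25, 25, 25, 25, 31, 31, 47]
print(histogram_suppress(ages, k=3))               # {25: 4}
print(histogram_noise_small(ages, k=3, eps=0.5))   # 25 kept exact; 31 and 47 noisy
```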
Weaker than Differential Privacy
The adversary can infer a sensitive property of an individual, but the property will be shared by at least k other people.
– It looks like a property of the population rather than of the individual.
The definition does not satisfy composability.
Sampling + Crowd-Blending => Differential Privacy
Let M_p be a mechanism that:
– constructs a sample S by picking each record in the data independently with probability p, and
– executes mechanism M on S.
Theorem: If M is (k, ε)-crowd-blending private (for k > 1), then M_p satisfies (ε', δ)-differential privacy, where ε' and δ depend on p, ε, and k (see [Gehrke et al CRYPTO '12] for the exact bound).
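A sketch of M_p, reusing the count-suppression histogram as the crowd-blending private mechanism; the sampling rate and data are illustrative.

```python
import numpy as np
from collections import Counter

def pre_sample_then_suppress(records, p, k, rng=np.random.default_rng()):
    """M_p: keep each record independently with probability p, then run the
    (k, 0)-crowd-blending private count-suppression histogram on the sample."""
    sample = [r for r in records if rng.random() < p]
    counts = Counter(sample)
    return {v: c for v, c in counts.items() if c >= k}

print(pre_sample_then_suppress([25] * 10 + [31] * 2 + [47], p=0.5, k=3))
```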
Open Questions
What other mechanisms satisfy crowd-blending privacy?
Given a privacy budget, can we answer a workload of queries with minimum error using the sampling + crowd-blending approach?
Sampling + k-anonymity => differential privacy
– What mechanisms other than sampling give sufficient privacy?
How big should k be?
– k is the boundary between individual-specific and population-level properties.
Next Class
E-privacy
– A relaxation of differential privacy that limits the adversaries considered.
Application of privacy technology to the US Census
References
[Groce et al TCC '11] A. Groce, J. Katz, A. Yerukhimovich, "Limits of Computational Differential Privacy in the Client/Server Setting", TCC 2011
[Gehrke et al CRYPTO '12] J. Gehrke, M. Hay, E. Liu, R. Pass, "Crowd-Blending Privacy", CRYPTO 2012
[M et al VLDB '09] A. Machanavajjhala, J. Gehrke, M. Götz, "Data Publishing against Realistic Adversaries", PVLDB 2(1), 2009
[Kifer-M SIGMOD '11] D. Kifer, A. Machanavajjhala, "No Free Lunch in Data Privacy", SIGMOD 2011
[Kifer-M PODS '12] D. Kifer, A. Machanavajjhala, "A Rigorous and Customizable Framework for Privacy", PODS 2012
[Ding-M '12] B. Ding, A. Machanavajjhala, "Induced Neighbors Privacy" (work in progress), 2012