Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies
Ashwin Machanavajjhala, Duke University (cs.duke.edu)
Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke)
Census, 8/15/2013

Overview of the talk
There is an inherent trade-off between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from individuals. Differential privacy has revolutionized how we reason about privacy
– A nice tuning knob ε for trading off privacy and utility

Overview of the talk
However, differential privacy only captures a small part of the privacy-utility trade-off space
– No Free Lunch Theorem
– Differentially private mechanisms may not ensure sufficient utility
– Differentially private mechanisms may not ensure sufficient privacy

Overview of the talk
I will present a new privacy framework that allows data publishers to more effectively trade off privacy for utility
– Better control over what to keep secret and who the adversaries are
– Can ensure more utility than differential privacy in many cases
– Can ensure privacy where differential privacy fails

Outline
– Background: Differential privacy
– No Free Lunch [Kifer-M SIGMOD '11]: No `one privacy notion to rule them all'
– Pufferfish Privacy Framework [Kifer-M PODS '12]: Navigating the space of privacy definitions
– Blowfish: Practical privacy using policies [ongoing work]

Data Privacy Problem
[Figure: individuals 1, …, N contribute records r_1, …, r_N to a server that stores them in a database D]
Utility: the server supports statistical analyses over D
Privacy: no breach about any individual

Data Privacy in the real world

Application            | Data Collector | Third Party (adversary)   | Private Information    | Function (utility)
Medical                | Hospital       | Epidemiologist            | Disease                | Correlation between disease and geography
Genome analysis        | Hospital       | Statistician / Researcher | Genome                 | Correlation between genome and disease
Advertising            | Google/FB/Y!   | Advertiser                | Clicks / Browsing      | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook       | Another user              | Friend links / profile | Recommend other users or ads based on the social network

Many definitions & several attacks
Definitions:
– K-Anonymity [Sweeney et al., IJUFKS '02]
– L-diversity [Machanavajjhala et al., TKDD '07]
– T-closeness [Li et al., ICDE '07]
– E-Privacy [Machanavajjhala et al., VLDB '09]
– Differential Privacy [Dwork et al., ICALP '06]
Attacks: linkage attack, background knowledge attack, minimality/reconstruction attack, de Finetti attack, composition attack

Differential Privacy
For every pair of inputs D_1, D_2 that differ in one value, and for every output O, an adversary should not be able to distinguish between D_1 and D_2 based on O:
| log( Pr[A(D_1) = O] / Pr[A(D_2) = O] ) | ≤ ε    (ε > 0)

Algorithms
– No deterministic algorithm guarantees differential privacy.
– Random sampling does not guarantee differential privacy.
– Randomized response satisfies differential privacy.
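To make the last point concrete, here is a minimal sketch of binary randomized response (my own illustration; the function name and interface are not from the talk). Reporting the truth with probability e^ε / (e^ε + 1) bounds the log-ratio of the two report probabilities by ε:

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise report its flip. The ratio of the two report
    probabilities is exactly e^eps, so this satisfies eps-DP."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit
```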

Laplace Mechanism
A researcher poses a query q against a database D; instead of the true answer q(D), the mechanism returns q(D) + η, where the noise η has density h(η) ∝ exp(−|η| / λ).
Privacy depends on the parameter λ (Mean: 0, Variance: 2λ²).

Laplace Mechanism
Thm [Dwork et al., TCC 2006]: If the sensitivity of the query is S, then the Laplace mechanism with λ = S/ε guarantees ε-differential privacy.
Sensitivity: the smallest number S(q) such that for any D, D′ differing in one entry, ||q(D) − q(D′)||_1 ≤ S(q).
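A minimal sketch of the mechanism (my own code; the names and the numpy dependency are illustrative, not from the talk):

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace noise with scale lambda = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query changes by at most 1 when one entry changes,
# so sensitivity = 1.
noisy_count = laplace_mechanism(true_answer=120.0, sensitivity=1.0, epsilon=0.5)
```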

Contingency tables
[Figure: a database D and its contingency table of counts Count(·, ·); each tuple takes one of k = 4 different values]

Laplace Mechanism for Contingency Tables
[Figure: each cell of the contingency table of D, with true count 8, is released as 8 + Lap(2/ε)]
Sensitivity = 2, so each released count has Mean 8 and Variance 8/ε².

Composition Property (Privacy Budget)
If algorithms A_1, A_2, …, A_k use independent randomness and each A_i satisfies ε_i-differential privacy, then outputting all the answers together satisfies differential privacy with ε = ε_1 + ε_2 + … + ε_k.
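A short sketch of how the budget view is used in practice (my own illustration, with hypothetical names): split a total budget evenly across k count queries, each answered with Laplace noise:

```python
import numpy as np

def answer_queries(true_answers, total_epsilon):
    """Sequential composition: give each of the k count queries an
    equal share eps_i = total_epsilon / k of the privacy budget."""
    eps_i = total_epsilon / len(true_answers)
    # Count queries have sensitivity 1, so the Laplace scale is 1/eps_i.
    return [a + np.random.laplace(scale=1.0 / eps_i) for a in true_answers]

# Two queries under a total budget of eps = 1.0: each is answered
# with eps_i = 0.5, and together they satisfy 1.0-DP.
print(answer_queries([120.0, 45.0], total_epsilon=1.0))
```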

Differential Privacy
A privacy definition that is independent of the attacker's prior knowledge. Resists many attacks that other definitions are susceptible to
– Avoids composition attacks
– Claimed to be tolerant against adversaries with arbitrary background knowledge
Allows simple, efficient and useful privacy mechanisms
– Used in LEHD's OnTheMap [M et al., ICDE '08]

Outline
– Background: Differential privacy
– No Free Lunch [Kifer-M SIGMOD '11]: No `one privacy notion to rule them all'
– Pufferfish Privacy Framework [Kifer-M PODS '12]: Navigating the space of privacy definitions
– Blowfish: Practical privacy using policies [ongoing work]

Differential Privacy & Utility
Differentially private mechanisms may not ensure sufficient utility for many applications
– Sparse data: the integrated mean square error due to the Laplace mechanism can be worse than returning a random contingency table for typical values of ε (around 1)
– Social networks [M et al., PVLDB 2011]

Differential Privacy & Privacy
Differentially private algorithms may not limit the ability of an adversary to learn sensitive information about individuals when records in the data are correlated. Correlations across individuals occur in many ways:
– Social networks
– Data with pre-released constraints
– Functional dependencies

Laplace Mechanism and Correlations
[Figure: the contingency table D with its marginal counts (10, 4) published exactly, while cell counts are released with Lap(2/ε) noise]
Does the Laplace mechanism still guarantee privacy?
Auxiliary marginals are published for the following reasons:
1. Legal: 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities

Laplace Mechanism and Correlations
[Figure: because the marginals are known exactly, each noisy release yields another estimate of the same target cell: Count(·, ·) = 8 + Lap(2/ε), Count(·, ·) = 8 − Lap(2/ε), Count(·, ·) = 8 + Lap(2/ε), …]

Laplace Mechanism and Correlations
[Figure: averaging the k noisy estimates of the cell gives Mean 8 and Variance 8/(kε²)]
An adversary can reconstruct the table with high precision for large k.
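A quick simulation (my own sketch, not from the talk) of why the attack works: averaging k independent Lap(2/ε)-noisy views of the same cell shrinks the variance from 8/ε² to 8/(kε²):

```python
import numpy as np

def reconstruct_cell(true_count: float, epsilon: float, k: int) -> float:
    """Average k independent Lap(2/eps)-noisy releases of one cell."""
    estimates = true_count + np.random.laplace(scale=2.0 / epsilon, size=k)
    return estimates.mean()

# With eps = 1 and k = 10000 correlated releases, the estimate is
# essentially exact, even though each single release was "private".
print(reconstruct_cell(true_count=8.0, epsilon=1.0, k=10_000))
```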

No Free Lunch Theorem [Kifer-M SIGMOD '11] (see also [Dwork-Naor JPC '10])
It is not possible to guarantee any utility in addition to privacy, without making assumptions about
– the data generating distribution
– the background knowledge available to an adversary

To sum up …
Differential privacy only captures a small part of the privacy-utility trade-off space
– No Free Lunch Theorem
– Differentially private mechanisms may not ensure sufficient privacy
– Differentially private mechanisms may not ensure sufficient utility

Outline
– Background: Differential privacy
– No Free Lunch [Kifer-M SIGMOD '11]: No `one privacy notion to rule them all'
– Pufferfish Privacy Framework [Kifer-M PODS '12]: Navigating the space of privacy definitions
– Blowfish: Practical privacy using policies [ongoing work]

Pufferfish Framework

Pufferfish Semantics
– What is being kept secret?
– Who are the adversaries?
– How is information disclosure bounded? (similar to ε in differential privacy)

Sensitive Information
Secrets: a set S of potentially sensitive statements, e.g.
– "individual j's record is in the data, and j has Cancer"
– "individual j's record is not in the data"
Discriminative Pairs: mutually exclusive pairs of secrets, e.g.
– ("Bob is in the table", "Bob is not in the table")
– ("Bob has cancer", "Bob has diabetes")

Adversaries
We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data
– We do not assume computational limits
Data Evolution Scenarios: the set of all probability distributions that could have generated the data (… think of the adversary's prior)
– No assumptions: all probability distributions over data instances are possible
– I.I.D.: the set of all f such that P(data = {r_1, r_2, …, r_k}) = f(r_1) × f(r_2) × … × f(r_k)

Information Disclosure
Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every output ω, every discriminative pair (s, s′) ∈ Spairs, and every θ ∈ D under which both s and s′ have nonzero probability:
Pr[M(Data) = ω | s, θ] ≤ e^ε · Pr[M(Data) = ω | s′, θ]

Pufferfish Semantic Guarantee
For every discriminative pair (s, s′) and every prior θ ∈ D, observing the output changes the adversary's odds of s vs s′ by at most a factor of e^ε:
Pr[s | M(Data) = ω, θ] / Pr[s′ | M(Data) = ω, θ] ≤ e^ε · Pr[s | θ] / Pr[s′ | θ]
(posterior odds of s vs s′, bounded by the prior odds of s vs s′)

Applying Pufferfish to Differential Privacy
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– Probability record j is in the table: π_j
– Probability distribution over values of record j: f_j
– For all θ = [f_1, f_2, f_3, …, f_k, π_1, π_2, …, π_k]

Applying Pufferfish to Differential Privacy
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– For all θ = [f_1, f_2, f_3, …, f_k, π_1, π_2, …, π_k]
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ} (as defined above).

Pufferfish & Differential Privacy
Spairs:
– s_i^x : record i takes the value x
– Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

Pufferfish & Differential Privacy
Data evolution:
– For all θ = [f_1, f_2, f_3, …, f_k]
– The adversary's prior may be any distribution that makes records independent.

Pufferfish & Differential Privacy
Spairs:
– s_i^x : record i takes the value x
Data evolution:
– For all θ = [f_1, f_2, f_3, …, f_k]
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ}.

Summary of Pufferfish
A semantic approach to defining privacy
– Enumerates the information that is secret and the set of adversaries
– Bounds the odds ratio of pairs of mutually exclusive secrets
Helps understand the assumptions under which privacy is guaranteed
– Differential privacy is one specific choice of secret pairs and adversaries
How should a data publisher use this framework? Algorithms?

Outline
– Background: Differential privacy
– No Free Lunch [Kifer-M SIGMOD '11]: No `one privacy notion to rule them all'
– Pufferfish Privacy Framework [Kifer-M PODS '12]: Navigating the space of privacy definitions
– Blowfish: Practical privacy using policies [ongoing work]

Blowfish Privacy
A special class of Pufferfish instantiations
– Both pufferfish and blowfish are marine fish of the Tetraodontidae family

Blowfish Privacy
A special class of Pufferfish instantiations that extends differential privacy using policies
– Specification of sensitive information: allows more utility
– Specification of publicly known constraints in the data: ensures privacy in correlated data
Satisfies the composition property

Sensitive Information (recap)
Secrets: a set S of potentially sensitive statements, e.g.
– "individual j's record is in the data, and j has Cancer"
– "individual j's record is not in the data"
Discriminative Pairs: mutually exclusive pairs of secrets, e.g.
– ("Bob is in the table", "Bob is not in the table")
– ("Bob has cancer", "Bob has diabetes")

Sensitive information in Differential Privacy
Spairs:
– s_i^x : record i takes the value x
– Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

Other notions of Sensitive Information
Partitioned Sensitive Information (Medical Data)
– OK to infer whether an individual is healthy or not
– E.g., ("Bob is healthy", "Bob has diabetes") is not a discriminative pair of secrets for any individual

Other notions of Sensitive Information
Distance-based Sensitive Information (Geospatial Data)
– Do not want the attacker to distinguish between "close-by" points in the space
– May distinguish between "far-away" points

Other notions of Sensitive Information
Social Networks: the domain of an individual's record is the power set of V (nodes)
– Edge privacy: the attacker should not distinguish between networks that differ in a single edge
– Node privacy: the attacker should not distinguish between networks that differ in a single node (and all its edges)

Generalization as a graph
Consider a graph G = (V, E), where V is the set of values that an individual's record can take, and E encodes the set of discriminative pairs
– Same for all records.

Blowfish Privacy + "Policy of Secrets"
A mechanism M satisfies Blowfish privacy w.r.t. policy G if
– for every set S of outputs of the mechanism, and
– for every pair of datasets D_1, D_2 that differ in one record, with values x and y such that (x, y) ∈ E:
Pr[M(D_1) ∈ S] ≤ e^ε · Pr[M(D_2) ∈ S]

Blowfish Privacy + "Policy of Secrets"
A mechanism M satisfies Blowfish privacy w.r.t. policy G if
– for every set S of outputs of the mechanism, and
– for every pair of datasets D_1, D_2 that differ in one record, with values x and y such that (x, y) ∈ E:
Pr[M(D_1) ∈ S] ≤ e^ε · Pr[M(D_2) ∈ S]
This implies that for any x and y in the domain,
Pr[M(D_1) ∈ S] ≤ e^{ε · d_G(x, y)} · Pr[M(D_2) ∈ S]
where d_G(x, y) is the shortest distance between x and y in G.
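A small sketch (my own, with a hypothetical policy graph) of the distance-dependent guarantee: a BFS computes d_G(x, y), and the indistinguishability bound degrades as e^{ε · d_G(x, y)}:

```python
from collections import deque

def shortest_distance(edges, x, y):
    """BFS shortest-path distance between values x and y in the policy
    graph G; returns None if they lie in different components
    (the adversary may then fully distinguish x from y)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    queue, seen = deque([(x, 0)]), {x}
    while queue:
        node, d = queue.popleft()
        if node == y:
            return d
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return None

# Example: a line-graph policy over age ranges; adjacent ranges are the
# hardest to tell apart, distant ones are only protected up to e^(3*eps).
edges = [("0-20", "21-40"), ("21-40", "41-60"), ("41-60", "61+")]
print(shortest_distance(edges, "0-20", "61+"))  # 3
```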

Blowfish Privacy + "Policy of Secrets"
Under the same definition, the adversary is allowed to distinguish between values x and y that appear in different disconnected components of G.

Algorithm 1: Randomized Response
Perturb each record in the table independently using a randomized-response distribution. A non-interactive mechanism.
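The slide's exact distribution is not preserved in the transcript; as a plausible stand-in (my own sketch, not the talk's mechanism), here is standard k-ary randomized response over a finite domain, which satisfies ε-DP and therefore also Blowfish under any policy:

```python
import math
import random

def k_ary_randomized_response(value, domain, epsilon):
    """Report the true value with probability e^eps / (e^eps + k - 1),
    otherwise report a uniformly random *other* value. The ratio of any
    two report probabilities is at most e^eps, so this is eps-DP."""
    k = len(domain)
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_truth:
        return value
    others = [v for v in domain if v != value]
    return random.choice(others)
```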

Algorithms for Blowfish
Consider an ordered 1-D attribute
– Dom = {x_1, x_2, x_3, …, x_d}
– E.g., ranges of Age, Salary, etc.
Suppose our policy is: the adversary should not distinguish whether an individual's value is x_j or x_{j+1}
[Figure: the line graph x_1 - x_2 - x_3 - … - x_d]

Algorithms for Blowfish
Suppose we want to release the histogram C(x_1), …, C(x_d) privately
– Number of individuals in each age range
Any differentially private algorithm also satisfies Blowfish
– Can use the Laplace mechanism (with sensitivity 2), as sketched below
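A minimal sketch of this baseline (my own code): each bin gets Lap(2/ε) noise, since changing one record moves a person between two bins, giving L1-sensitivity 2:

```python
import numpy as np

def laplace_histogram(counts, epsilon):
    """Baseline: add Lap(2/eps) noise to each bin count
    (one changed record alters two bins by 1 each)."""
    counts = np.asarray(counts, dtype=float)
    return counts + np.random.laplace(scale=2.0 / epsilon, size=counts.shape)
```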

Ordered Mechanism
We can answer a different set of queries to get a different private estimator for the histogram: the cumulative counts S_1, S_2, …, S_d, where S_i = C(x_1) + C(x_2) + … + C(x_i).

Ordered Mechanism
We can answer each S_i using the Laplace mechanism …
… and the sensitivity of the whole set of queries is only 1
[Figure: changing one tuple from x_2 to x_3 changes C(x_2) by −1 and C(x_3) by +1, so only S_2 changes]

Ordered Mechanism
We can answer each S_i using the Laplace mechanism …
… but the sensitivity of all the queries together is only 1: a factor of 2 improvement over the sensitivity-2 baseline, as sketched below.
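A sketch of the ordered mechanism as I read the slides (assuming the S_i are cumulative counts; my own code): add Lap(1/ε) noise to each prefix sum, then take differences to recover per-bin estimates:

```python
import numpy as np

def ordered_mechanism(counts, epsilon):
    """Release noisy prefix sums S_i = C(x_1)+...+C(x_i) with Lap(1/eps)
    noise (sensitivity 1 under the line-graph policy), then difference
    them to recover per-bin estimates."""
    prefix = np.cumsum(np.asarray(counts, dtype=float))
    noisy_prefix = prefix + np.random.laplace(scale=1.0 / epsilon, size=prefix.shape)
    return np.diff(noisy_prefix, prepend=0.0)
```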

Ordered Mechanism
In addition, the true cumulative counts satisfy the constraint S_1 ≤ S_2 ≤ … ≤ S_d. However, the noisy counts may not satisfy this constraint. We can post-process the noisy counts to enforce it.

Ordered Mechanism
We can post-process the noisy counts to ensure this constraint, projecting them back onto the monotone sequences: an order of magnitude improvement in error for large d.
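One standard way to do this post-processing (my own sketch; the talk does not name the algorithm) is isotonic regression via pool-adjacent-violators. Post-processing consumes no additional privacy budget:

```python
import numpy as np

def isotonic_project(noisy_prefix):
    """Pool-adjacent-violators: L2-project the noisy prefix sums onto
    the monotone cone S_1 <= S_2 <= ... <= S_d."""
    blocks = []  # each block: [mean, weight]
    for v in map(float, noisy_prefix):
        blocks.append([v, 1.0])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for mean, w in blocks:
        out.extend([mean] * int(w))
    return np.array(out)
```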

Ordered Mechanism
By leveraging the weaker sensitive information in the policy, we can provide significantly better utility. This extends to more general policy specifications. Ordered mechanisms and other Blowfish algorithms are being tested on the synthetic data generator for the LODES data product.

Blowfish Privacy & Correlations
Differentially private mechanisms may not ensure privacy when correlations exist in the data. Blowfish can handle correlations expressed as publicly known constraints
– Well-known marginal counts in the data
– Other dependencies
The privacy definition is similar to differential privacy, with a modified notion of neighboring tables.

Other instantiations of Pufferfish
All Blowfish instantiations are extensions of differential privacy using
– Weaker notions of sensitive information
– Allowed knowledge of constraints about the data
– All Blowfish mechanisms satisfy the composition property
We can also instantiate Pufferfish with other "realistic" adversary notions
– Only prior distributions that are similar to the expected data distribution
– Open question: which definitions satisfy the composition property?

Summary
Differential privacy (and the tuning knob ε) is insufficient for trading off privacy for utility in many applications
– Sparse data, social networks, …
The Pufferfish framework allows more expressive privacy definitions
– Can vary sensitive information, adversary priors, and ε
Blowfish shows one way to create more expressive definitions
– Can provide useful, composable mechanisms
There is an opportunity to correctly tune privacy by using these expressive privacy frameworks.

Thank you
[M et al PVLDB '11] A. Machanavajjhala, A. Korolova, A. Das Sarma, "Personalized Social Recommendations – Accurate or Private?", PVLDB 4(7), 2011.
[Kifer-M SIGMOD '11] D. Kifer, A. Machanavajjhala, "No Free Lunch in Data Privacy", SIGMOD 2011.
[Kifer-M PODS '12] D. Kifer, A. Machanavajjhala, "A Rigorous and Customizable Framework for Privacy", PODS 2012.
[ongoing work] A. Machanavajjhala, B. Ding, X. He, "Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies", in preparation.