Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies
Ashwin Machanavajjhala (ashwin@cs.duke.edu)
Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke)
Summer @ Census, 8/15/2013
Overview of the talk
There is an inherent trade-off between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from them. Differential privacy has revolutionized how we reason about privacy:
– a nice tuning knob, ε, for trading off privacy and utility
Overview of the talk
However, differential privacy captures only a small part of the privacy-utility trade-off space:
– No Free Lunch Theorem
– differentially private mechanisms may not ensure sufficient utility
– differentially private mechanisms may not ensure sufficient privacy
Overview of the talk
I will present a new privacy framework that lets data publishers trade off privacy for utility more effectively:
– better control over what to keep secret and who the adversaries are
– can ensure more utility than differential privacy in many cases
– can ensure privacy where differential privacy fails
Outline
Background
– Differential privacy
No Free Lunch [Kifer-M SIGMOD '11]
– No 'one privacy notion to rule them all'
Pufferfish Privacy Framework [Kifer-M PODS '12]
– Navigating the space of privacy definitions
Blowfish: practical privacy using policies [ongoing work]
Data Privacy Problem
[Figure: individuals 1 through N each contribute a record r1, ..., rN to a database DB held by a server]
– Privacy: no breach about any individual
– Utility: useful statistical analyses over DB
Data Privacy in the real world
Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician/Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender
Social Recommendations | Facebook | Another user | Friend links/profile | Recommend other users or ads based on the social network
Many definitions & several attacks
Definitions: K-Anonymity [Sweeney et al., IJUFKS '02], L-Diversity [Machanavajjhala et al., TKDD '07], T-Closeness [Li et al., ICDE '07], E-Privacy [Machanavajjhala et al., VLDB '09], Differential Privacy [Dwork et al., ICALP '06]
Attacks: linkage attack, background knowledge attack, minimality/reconstruction attack, de Finetti attack, composition attack
Differential Privacy
For every pair of inputs D1, D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O:
| log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | ≤ ε   (ε > 0)
Algorithms
– No deterministic algorithm guarantees differential privacy.
– Random sampling does not guarantee differential privacy.
– Randomized response satisfies differential privacy.
Laplace Mechanism
A researcher poses a query q; the database returns the true answer q(D) plus noise η drawn from a Laplace distribution:
h(η) ∝ exp( −|η| / λ )
Privacy depends on the parameter λ. (Mean: 0; variance: 2λ²)
Laplace Mechanism [Dwork et al., TCC 2006]
Thm: If the sensitivity of the query is S, then the Laplace mechanism with λ = S/ε guarantees ε-differential privacy.
Sensitivity: the smallest number S(q) such that for any D, D' differing in one entry,
||q(D) − q(D')||₁ ≤ S(q)
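A minimal sketch of the Laplace mechanism in Python (illustrative; the function and variable names and the use of NumPy are my choices, not from the talk):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return the true answer plus Laplace noise with scale sensitivity/epsilon."""
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

# Example: a counting query ("how many records satisfy P?") has sensitivity 1.
noisy_count = laplace_mechanism(true_answer=128, sensitivity=1.0, epsilon=0.5)
```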
Contingency tables
Database D is summarized as a 2×2 table of counts Count(·, ·):
2  2
2  8
Each tuple takes one of k = 4 different values.
Laplace Mechanism for Contingency Tables
Add Lap(2/ε) noise to every cell:
2 + Lap(2/ε)   2 + Lap(2/ε)
2 + Lap(2/ε)   8 + Lap(2/ε)
Sensitivity = 2: changing one tuple moves it between cells, changing two counts by 1 each.
The noisy count of the largest cell has mean 8 and variance 8/ε².
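The arithmetic behind these numbers, written out (a worked step spelled out by me, not on the slide):

```latex
\|q(D) - q(D')\|_1 = |{+1}| + |{-1}| = 2,
\qquad
\operatorname{Var}\big(\mathrm{Lap}(2/\varepsilon)\big)
  = 2\left(\frac{2}{\varepsilon}\right)^{2}
  = \frac{8}{\varepsilon^{2}}.
```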
Composition Property ("privacy budget")
If algorithms A1, A2, ..., Ak use independent randomness and each Ai satisfies εi-differential privacy, then outputting all the answers together satisfies ε-differential privacy with
ε = ε1 + ε2 + ... + εk
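A small sketch of how composition is used in practice: split a total budget equally across k queries (the function name and the equal split are my choices for illustration):

```python
import numpy as np

def answer_with_budget(true_answers, sensitivities, total_epsilon, rng=None):
    """Answer k queries so the answers jointly satisfy total_epsilon-DP.

    Sequential composition: k independent (total_epsilon/k)-DP answers
    compose to total_epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    eps_each = total_epsilon / len(true_answers)
    return [a + rng.laplace(scale=s / eps_each)
            for a, s in zip(true_answers, sensitivities)]

# Three counting queries (sensitivity 1 each) under a total budget of 1.0.
noisy = answer_with_budget([120, 45, 87], [1, 1, 1], total_epsilon=1.0)
```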
Differential Privacy
A privacy definition that is independent of the attacker's prior knowledge. It withstands many attacks that other definitions are susceptible to:
– avoids composition attacks
– claimed to be tolerant against adversaries with arbitrary background knowledge
It allows simple, efficient and useful privacy mechanisms:
– used in LEHD's OnTheMap [M et al., ICDE '08]
Outline
Background
– Differential privacy
No Free Lunch [Kifer-M SIGMOD '11]
– No 'one privacy notion to rule them all'
Pufferfish Privacy Framework [Kifer-M PODS '12]
– Navigating the space of privacy definitions
Blowfish: practical privacy using policies [ongoing work]
Differential Privacy & Utility
Differentially private mechanisms may not ensure sufficient utility for many applications:
– Sparse data: the integrated mean squared error of the Laplace mechanism can be worse than that of returning a random contingency table, for typical values of ε (around 1)
– Social networks [M et al., PVLDB 2011]
Differential Privacy & Privacy
Differentially private algorithms may not limit an adversary's ability to learn sensitive information about individuals when records in the data are correlated. Correlations across individuals occur in many ways:
– social networks
– data with pre-released constraints
– functional dependencies
Laplace Mechanism and Correlations
Suppose the exact marginals of D (4 and 10 for the rows and for the columns) are published alongside the noisy cells 2 + Lap(2/ε), ..., 8 + Lap(2/ε). Does the Laplace mechanism still guarantee privacy?
Auxiliary marginals are published for the following reasons:
1. Legal: the 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities
Laplace Mechanism and Correlations
Given the exact marginals, every noisy cell yields an estimate of the same count:
– from the noisy 8-cell directly: Count = 8 + Lap(2/ε)
– from a noisy 2-cell and its marginal of 10: Count = 10 − (2 + Lap(2/ε)) = 8 − Lap(2/ε)
– similarly from the remaining cells: 8 ± Lap(2/ε)
Laplace Mechanism and Correlations
Averaging k such independent estimates gives an estimator with mean 8 and variance 8/(kε²), so an adversary can reconstruct the table with high precision for large k.
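The variance arithmetic behind the reconstruction claim (each estimate ĉ_i is independent with variance 8/ε²; this derivation is mine, spelled out from the slide's numbers):

```latex
\operatorname{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k}\hat{c}_i\right)
  = \frac{1}{k^{2}}\sum_{i=1}^{k}\operatorname{Var}(\hat{c}_i)
  = \frac{1}{k^{2}}\cdot k\cdot\frac{8}{\varepsilon^{2}}
  = \frac{8}{k\,\varepsilon^{2}}.
```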
No Free Lunch Theorem [Kifer-M SIGMOD '11] (see also [Dwork-Naor JPC '10])
It is not possible to guarantee any utility in addition to privacy without making assumptions about
– the data-generating distribution, and
– the background knowledge available to an adversary.
To sum up ...
Differential privacy captures only a small part of the privacy-utility trade-off space:
– No Free Lunch Theorem
– differentially private mechanisms may not ensure sufficient privacy
– differentially private mechanisms may not ensure sufficient utility
Outline
Background
– Differential privacy
No Free Lunch [Kifer-M SIGMOD '11]
– No 'one privacy notion to rule them all'
Pufferfish Privacy Framework [Kifer-M PODS '12]
– Navigating the space of privacy definitions
Blowfish: practical privacy using policies [ongoing work]
Pufferfish Framework
Pufferfish Semantics
– What is being kept secret?
– Who are the adversaries?
– How is information disclosure bounded? (similar to ε in differential privacy)
Sensitive Information
Secrets: let S be a set of potentially sensitive statements, e.g.
– "individual j's record is in the data, and j has cancer"
– "individual j's record is not in the data"
Discriminative pairs (Spairs): mutually exclusive pairs of secrets, e.g.
– ("Bob is in the table", "Bob is not in the table")
– ("Bob has cancer", "Bob has diabetes")
Adversaries
We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data.
– We do not assume computational limits.
Data evolution scenarios: the set of all probability distributions that could have generated the data (... think of the adversary's prior).
– No assumptions: all probability distributions over data instances are possible.
– I.I.D.: the set of all f such that P(data = {r1, r2, ..., rk}) = f(r1) × f(r2) × ... × f(rk)
Information Disclosure
Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if the following condition holds.
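The condition, following the statement in [Kifer-M PODS '12] (the slide's formula did not survive extraction, so this is a reconstruction from the paper):

```latex
\forall w,\ \forall (s_i, s_j) \in \mathit{Spairs},\ \forall \theta \in \mathcal{D}
\ \text{with}\ P(s_i \mid \theta) \neq 0,\ P(s_j \mid \theta) \neq 0:
\quad
P\big(M(\mathit{Data}) = w \mid s_i, \theta\big)
  \;\le\; e^{\varepsilon}\, P\big(M(\mathit{Data}) = w \mid s_j, \theta\big).
```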
Pufferfish Semantic Guarantee
For every discriminative pair (s, s') and every prior θ, the posterior odds of s vs s' (after seeing the output) differ from the prior odds of s vs s' by at most a factor of e^ε.
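Written out (the formula on the slide was an image; this spells out the prior-vs-posterior odds bound in its usual form):

```latex
e^{-\varepsilon}
\;\le\;
\frac{\,P(s \mid M(\mathit{Data}) = w, \theta)\,/\,P(s' \mid M(\mathit{Data}) = w, \theta)\,}
     {\,P(s \mid \theta)\,/\,P(s' \mid \theta)\,}
\;\le\;
e^{\varepsilon}.
```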
Applying Pufferfish to Differential Privacy
Spairs:
– "record j is in the table" vs "record j is not in the table"
– "record j is in the table with value x" vs "record j is not in the table"
Data evolution:
– probability that record j is in the table: πj
– probability distribution over values of record j: fj
– all θ = [f1, f2, f3, ..., fk, π1, π2, ..., πk]
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using this Spairs and this set {θ}.
Pufferfish & Differential Privacy
Spairs:
– s_i^x: "record i takes the value x"
– Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.
Data evolution:
– all θ = [f1, f2, f3, ..., fk]: the adversary's prior may be any distribution that makes records independent.
A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using this Spairs and {θ}.
Summary of Pufferfish
A semantic approach to defining privacy:
– enumerates the information that is secret and the set of adversaries
– bounds the odds ratio of pairs of mutually exclusive secrets
It helps us understand the assumptions under which privacy is guaranteed:
– differential privacy is one specific choice of secret pairs and adversaries
How should a data publisher use this framework? What are the algorithms?
Outline
Background
– Differential privacy
No Free Lunch [Kifer-M SIGMOD '11]
– No 'one privacy notion to rule them all'
Pufferfish Privacy Framework [Kifer-M PODS '12]
– Navigating the space of privacy definitions
Blowfish: practical privacy using policies [ongoing work]
Blowfish Privacy
A special class of Pufferfish instantiations. (Both pufferfish and blowfish are marine fish of the family Tetraodontidae.)
Blowfish Privacy
A special class of Pufferfish instantiations that extends differential privacy using policies:
– specification of sensitive information → allows more utility
– specification of publicly known constraints in the data → ensures privacy in correlated data
It satisfies the composition property.
Sensitive Information (recap)
Secrets: let S be a set of potentially sensitive statements, e.g.
– "individual j's record is in the data, and j has cancer"
– "individual j's record is not in the data"
Discriminative pairs (Spairs): mutually exclusive pairs of secrets, e.g.
– ("Bob is in the table", "Bob is not in the table")
– ("Bob has cancer", "Bob has diabetes")
Sensitive information in Differential Privacy
Spairs: s_i^x: "record i takes the value x"
– Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.
Other notions of Sensitive Information
Medical data: it may be OK to infer whether an individual is healthy or not.
– E.g., ("Bob is healthy", "Bob has diabetes") is not a discriminative pair of secrets for any individual.
Partitioned sensitive information: the domain is partitioned, and only pairs of values within the same partition are discriminative.
Other notions of Sensitive Information
Geospatial data:
– We do not want the attacker to distinguish between "close-by" points in the space.
– The attacker may distinguish between "far-away" points.
Distance-based sensitive information: a pair of values is discriminative only if the values are close to each other.
Other notions of Sensitive Information
Social networks: the domain of an individual's record is the power set of V (the set of nodes).
– Edge privacy: protect the presence or absence of individual edges.
– Node privacy: protect a node together with all its edges.
Generalization as a graph
Consider a graph G = (V, E), where V is the set of values that an individual's record can take, and E encodes the set of discriminative pairs.
– The same graph is used for all records.
Blowfish Privacy + "Policy of Secrets"
A mechanism M satisfies Blowfish privacy with respect to policy G if, for every set S of outputs of the mechanism and every pair of datasets D1, D2 that differ in one record whose values x and y satisfy (x, y) ∈ E:
Pr[M(D1) ∈ S] ≤ e^ε · Pr[M(D2) ∈ S]
For arbitrary x and y in the domain, the guarantee degrades with d_G(x, y), the shortest distance between x and y in G. In particular, the adversary is allowed to distinguish between x and y that appear in different connected components of G.
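The distance-dependent form of the guarantee, written out (a reconstruction consistent with the slide's "shortest distance" annotation):

```latex
\Pr[M(D_1) \in S] \;\le\; e^{\varepsilon \, d_G(x, y)} \, \Pr[M(D_2) \in S],
\qquad
d_G(x, y) = \text{shortest-path distance between } x \text{ and } y \text{ in } G.
```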
Algorithm 1: Randomized Response
Perturb each record in the table independently, using a randomized-response distribution over the domain. This is a non-interactive mechanism.
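The slide's exact distribution did not survive extraction; a standard k-ary randomized response (one common instantiation, used here as an assumption) looks like this:

```python
import numpy as np

def randomized_response(value, domain, epsilon, rng=None):
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1); otherwise report a uniform other value."""
    rng = rng or np.random.default_rng()
    k = len(domain)
    keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < keep:
        return value
    others = [v for v in domain if v != value]
    return others[rng.integers(len(others))]

# Non-interactive release: perturb every record independently.
table = ["A", "B", "A", "D", "C"]
private_table = [randomized_response(r, ["A", "B", "C", "D"], epsilon=1.0) for r in table]
```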
Algorithms for Blowfish
Consider an ordered one-dimensional attribute with domain {x1, x2, x3, ..., xd} (e.g., ranges of age, salary, etc.).
Suppose our policy is: the adversary should not distinguish whether an individual's value is xj or xj+1 (the policy graph G is the line x1 – x2 – ... – xd).
Algorithms for Blowfish
Suppose we want to release the histogram privately: the number of individuals C(x1), ..., C(xd) in each range.
Any differentially private algorithm also satisfies Blowfish, so we can use the Laplace mechanism (with sensitivity 2).
Ordered Mechanism
We can instead answer a different set of queries, the cumulative counts S1, S2, ..., Sd with Si = C(x1) + ... + C(xi), to obtain a different private estimator for the histogram.
Ordered Mechanism
We can answer each Si using the Laplace mechanism, and the sensitivity of the entire set of queries is only 1: under the line-graph policy, changing one tuple from x2 to x3 changes only S2 (C(x2) decreases by 1, C(x3) increases by 1, and every other cumulative count is unchanged).
Ordered Mechanism
Answering each Si with Lap(1/ε) noise and recovering C(xi) = Si − Si−1 gives a factor of 2 improvement in variance over the direct Laplace mechanism.
Ordered Mechanism
In addition, the true cumulative counts satisfy the constraint S1 ≤ S2 ≤ ... ≤ Sd. The noisy counts may not satisfy this constraint, but we can post-process them to enforce it.
Ordered Mechanism
Post-processing the noisy cumulative counts to enforce the ordering constraint gives an order-of-magnitude improvement in accuracy for large d.
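A sketch of the whole pipeline in Python (the monotone-repair step below is my own simple choice; the slides say only that the noisy counts are post-processed to satisfy the constraint):

```python
import numpy as np

def ordered_mechanism(counts, epsilon, rng=None):
    """Ordered mechanism for a line-graph Blowfish policy (sketch).

    Releases noisy cumulative counts S_i = C(x_1) + ... + C(x_i); under
    the adjacent-values policy the whole vector has sensitivity 1, so
    each S_i needs only Lap(1/epsilon) noise.
    """
    rng = rng or np.random.default_rng()
    cumulative = np.cumsum(counts).astype(float)
    noisy = cumulative + rng.laplace(scale=1.0 / epsilon, size=len(cumulative))
    ordered = np.maximum.accumulate(noisy)  # crude monotone repair: S_1 <= ... <= S_d
    return np.diff(ordered, prepend=0.0)    # back to per-bucket counts

hist = ordered_mechanism(np.array([2, 5, 1, 7]), epsilon=0.5)
```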
Ordered Mechanism
By leveraging the weaker sensitive-information specification in the policy, we can provide significantly better utility. The approach extends to more general policy specifications. Ordered mechanisms and other Blowfish algorithms are being tested on the synthetic data generator for the LODES data product.
Blowfish Privacy & Correlations
Differentially private mechanisms may not ensure privacy when correlations exist in the data. Blowfish can handle publicly known constraints:
– known marginal counts in the data
– other dependencies
The privacy definition is similar to differential privacy, with a modified notion of neighboring tables.
Other instantiations of Pufferfish
All Blowfish instantiations are extensions of differential privacy that use
– weaker notions of sensitive information, and
– knowledge of constraints about the data;
all Blowfish mechanisms satisfy the composition property.
We can also instantiate Pufferfish with other "realistic" adversary notions:
– only prior distributions that are similar to the expected data distribution
– Open question: which such definitions satisfy the composition property?
Summary
Differential privacy (and the tuning knob ε) is insufficient for trading off privacy and utility in many applications:
– sparse data, social networks, ...
The Pufferfish framework allows more expressive privacy definitions:
– can vary the sensitive information, the adversary's priors, and ε
Blowfish shows one way to create more expressive definitions:
– provides useful, composable mechanisms
These expressive privacy frameworks offer an opportunity to tune privacy correctly.
Thank you
References:
[M et al., PVLDB '11] A. Machanavajjhala, A. Korolova, A. Das Sarma. "Personalized Social Recommendations – Accurate or Private?" PVLDB 4(7), 2011.
[Kifer-M SIGMOD '11] D. Kifer, A. Machanavajjhala. "No Free Lunch in Data Privacy." SIGMOD 2011.
[Kifer-M PODS '12] D. Kifer, A. Machanavajjhala. "A Rigorous and Customizable Framework for Privacy." PODS 2012.
[ongoing work] A. Machanavajjhala, B. Ding, X. He. "Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies." In preparation.