Published by Eustacia Jones. Modified over 9 years ago.
1
Differential Privacy
Xintao Wu
Slides 2-20 adapted from Vitaly Shmatikov, who in turn adapted them from Adam Smith
2
Reading Assignment
• Dwork, “Differential Privacy: A Survey of Results”.
3
Basic Setting
[Figure: a database DB = ($x_1, x_2, x_3, \dots, x_{n-1}, x_n$) is held by a sanitizer San that uses random coins. Users (government, researchers, marketers, …) send query 1 through query T to San and receive answer 1 through answer T.]
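The interactive setting can be sketched as follows. The sketch assumes a sanitizer that draws all of its randomness from one seeded generator (its "random coins") and records every (query, answer) pair as the transcript. The class name `Sanitizer` and the `respond` callback are hypothetical and chosen only for illustration.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical callback shape: respond(db, query, rng) -> answer
Responder = Callable[[Sequence[Any], Any, random.Random], Any]


class Sanitizer:
    """Holds DB = (x_1, ..., x_n) and answers queries 1..T using its own random coins."""

    def __init__(self, db: Sequence[Any], respond: Responder, seed: int) -> None:
        self._db = tuple(db)              # users never see the rows directly
        self._respond = respond           # rule that turns a query into an answer
        self._rng = random.Random(seed)   # the sanitizer's random coins
        self.transcript: List[Tuple[Any, Any]] = []

    def ask(self, query: Any) -> Any:
        answer = self._respond(self._db, query, self._rng)
        self.transcript.append((query, answer))  # record (query_t, answer_t)
        return answer


def exact_count(db, predicate, rng):
    # Non-private responder, used only to exercise the interface; it ignores rng.
    return sum(1 for row in db if predicate(row))


san = Sanitizer([25, 31, 47, 52], exact_count, seed=0)
san.ask(lambda age: age > 30)
san.ask(lambda age: age > 50)
print(san.transcript[0][1], san.transcript[1][1])  # 3 1
```

Users and adversaries observe only the transcript. The privacy definitions in these slides are statements about the distribution of this transcript over the sanitizer's coins.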
4
Examples of Sanitization Methods
• Input perturbation
  - Add random noise to the database, then release it
• Summary statistics
  - Means, variances
  - Marginal totals
  - Regression coefficients
• Output perturbation
  - Summary statistics with noise
• Interactive versions of the above methods
  - An auditor decides which queries are OK and what type of noise to add
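The sketch below contrasts input and output perturbation for the mean of a numeric column. It assumes zero-mean Gaussian noise with a caller-chosen standard deviation. The slide does not fix a noise distribution, and this noise level carries no formal privacy guarantee on its own. The function names and the data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def input_perturbation(data: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Add independent noise to every record, then release the noisy records."""
    return [x + rng.gauss(0.0, sigma) for x in data]


def output_perturbation_mean(data: Sequence[float], sigma: float, rng: random.Random) -> float:
    """Compute the summary statistic exactly, then add noise once to the result."""
    return statistics.fmean(data) + rng.gauss(0.0, sigma)


rng = random.Random(42)
salaries = [52.0, 61.0, 58.0, 70.0, 49.0]          # toy data, in $1000s
noisy_db = input_perturbation(salaries, sigma=5.0, rng=rng)
print(statistics.fmean(noisy_db))                  # any statistic can be computed from noisy_db
print(output_perturbation_mean(salaries, sigma=1.0, rng=rng))
```

Input perturbation releases a noisy database that supports any later query. Output perturbation adds noise only to the answers actually released.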
5
Strawman Definition
• Assume $x_1, \dots, x_n$ are drawn i.i.d. from an unknown distribution
• Candidate definition: sanitization is safe if it only reveals the distribution
• Implied approach:
  - Learn the distribution
  - Release a description of the distribution, or re-sample points from it
• This definition is tautological!
  - The estimate of the distribution depends on the data… so why is it safe?
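The "implied approach" can be sketched by fitting a model and releasing samples from it. For illustration only, we assume a normal model fitted by the sample mean and standard deviation; the function names are hypothetical. The point of the sketch is that the fitted parameters, and therefore the released points, are a function of the individual records.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def fit_normal(data: Sequence[float]) -> Tuple[float, float]:
    """'Learn the distribution': estimate mean and standard deviation from the data."""
    return statistics.fmean(data), statistics.stdev(data)


def resample(data: Sequence[float], m: int, seed: int) -> List[float]:
    """Release m synthetic points drawn from the fitted model instead of the data."""
    mu, sigma = fit_normal(data)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(m)]


heights = [165.0, 170.0, 172.0, 168.0, 175.0]
changed = heights[:-1] + [210.0]   # one record differs
print(fit_normal(heights))
print(fit_normal(changed))         # the released model moves with a single row
```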
6
Blending into a Crowd
• Intuition: “I am safe in a group of k or more”
  - k varies (3… 6… 100… 10,000?)
• Many variations on the theme
  - Adversary wants a predicate g such that $0 < \#\{i \mid g(x_i) = \text{true}\} < k$
• Why?
  - Privacy is “protection from being brought to the attention of others” [Gavison]
  - A rare property helps re-identify someone
  - Implicit: information about a large group is public
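The adversary's goal can be written as a check on a predicate g. The predicate is useful for singling people out exactly when it holds for at least one record but fewer than k records. The function name, records, and predicates below are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

Row = TypeVar("Row")


def isolates(db: Iterable[Row], g: Callable[[Row], bool], k: int) -> bool:
    """True if 0 < #{i | g(x_i) = true} < k, i.e. g picks out a small, non-empty group."""
    count = sum(1 for x in db if g(x))
    return 0 < count < k


db = [
    {"zip": "72701", "age": 34},
    {"zip": "72701", "age": 35},
    {"zip": "72701", "age": 36},
    {"zip": "72703", "age": 81},
]
print(isolates(db, lambda r: r["zip"] == "72701", k=3))  # False: 3 matching rows
print(isolates(db, lambda r: r["age"] > 80, k=3))        # True: a single matching row
```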
7
Clustering-Based Definitions
• Given sanitization S, look at all databases consistent with S
• Safe if no predicate is true for all consistent databases
• k-anonymity
  - Partition D into bins
  - Safe if each bin is either empty or contains at least k elements
• Cell bound methods
  - Release marginal sums

Released (marginal sums only; each cell is known only to lie in an interval):
          brown     blue      total
blond     [0,12]    [0,12]    12
brown     [0,14]    [0,16]    18
total     14        16

Underlying table:
          brown     blue      total
blond     2         10        12
brown     12        6         18
total     14        16
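Both checks from this slide are short in code. This is a sketch under stated assumptions. For k-anonymity, rows are assumed to be already mapped to bin labels. For the cell bound method, the code computes the standard Fréchet bounds max(0, r + c - N) <= cell <= min(r, c), which follow from released row totals r, column totals c, and grand total N. The function names are hypothetical.

For the marginals on this slide (row totals 12 and 18, column totals 14 and 16), the upper bounds agree with the slide. The lower bounds for the brown row come out as 2 and 4, slightly tighter than the intervals starting at 0 shown there.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def is_k_anonymous(bin_labels: Iterable[Hashable], k: int) -> bool:
    """Safe if every non-empty bin holds at least k rows (empty bins never appear here)."""
    return all(size >= k for size in Counter(bin_labels).values())


def cell_bounds(row_totals: Sequence[int], col_totals: Sequence[int]) -> List[List[Tuple[int, int]]]:
    """Fréchet bounds on each cell implied only by the released marginal sums."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must agree")
    return [[(max(0, r + c - n), min(r, c)) for c in col_totals] for r in row_totals]


print(is_k_anonymous(["A", "A", "B", "B", "B"], k=2))  # True
print(is_k_anonymous(["A", "B", "B"], k=2))            # False: bin "A" has one row
print(cell_bounds([12, 18], [14, 16]))
# [[(0, 12), (0, 12)], [(2, 14), (4, 16)]]
```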
8
Issues with Clustering
• Purely syntactic definition of privacy
• What adversary does this apply to?
  - Does not consider adversaries with side information
  - Does not consider an adversarial algorithm for making decisions (inference)
9
“Bayesian” Adversaries
• Adversary outputs a point $z \in D$
• $\text{Score} = 1/f_z$ if $f_z > 0$, and 0 otherwise
  - $f_z$ is the number of matching points in D
• Sanitization is safe if $E(\text{score}) \le \varepsilon$
• Procedure:
  - Assume you know the adversary's prior distribution over databases
  - Given a candidate output, update the prior conditioned on the output (via Bayes' rule)
  - If $\max_z E(\text{score} \mid \text{output}) < \varepsilon$, then it is safe to release
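The procedure can be made concrete for a tiny, fully enumerated world. Everything here is a hypothetical toy:

- two candidate databases under an assumed uniform prior;
- an assumed sanitizer that reports the number of "a" rows correctly with probability 0.75;
- an arbitrary threshold epsilon = 0.9.

The sketch computes the posterior by Bayes' rule and then max_z E(score | output).

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

DB = Tuple[Hashable, ...]


def score(z: Hashable, db: DB) -> float:
    """1/f_z if the guessed point z matches f_z > 0 rows of db, else 0."""
    f_z = db.count(z)
    return 1.0 / f_z if f_z > 0 else 0.0


def posterior(prior: Dict[DB, float], likelihood: Callable[[object, DB], float],
              output: object) -> Dict[DB, float]:
    """Bayes' rule: Pr[db | output] is proportional to Pr[output | db] * Pr[db]."""
    joint = {db: p * likelihood(output, db) for db, p in prior.items()}
    total = sum(joint.values())
    return {db: w / total for db, w in joint.items()}


def max_expected_score(prior, likelihood, output, candidates: Iterable[Hashable]) -> float:
    post = posterior(prior, likelihood, output)
    return max(sum(p * score(z, db) for db, p in post.items()) for z in candidates)


def noisy_count_a(output: object, db: DB) -> float:
    # Assumed sanitizer: reports the true count of "a" rows w.p. 0.75,
    # and the other possible count (1 or 2) w.p. 0.25.
    return 0.75 if output == db.count("a") else 0.25


prior = {("a", "b"): 0.5, ("a", "a"): 0.5}
m = max_expected_score(prior, noisy_count_a, output=1, candidates=["a", "b"])
print(m)  # 0.875
epsilon = 0.9  # arbitrary threshold for this toy
print("safe to release" if m < epsilon else "unsafe")
```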
10
Issues with “Bayesian” Privacy
• Restricts the type of predicates the adversary can choose
• Must know the prior distribution
  - Can one scheme work for many distributions?
11
Classical Intuition for Privacy
• “If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place.” [Dalenius 1977]
  - Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database
• Similar to semantic security of encryption
  - Anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext
12
Problems with Classic Intuition
• Popular interpretation: prior and posterior views about an individual shouldn't change “too much”
  - What if my (incorrect) prior is that every UTCS graduate student has three arms?
• How much is “too much”?
  - Can't achieve cryptographically small levels of disclosure and keep the data useful
  - An adversarial user is supposed to learn unpredictable things about the database
13
Impossibility Result [Dwork]
• Privacy: for some definition of “privacy breach,” $\forall$ distributions on databases, $\forall$ adversaries A, $\exists$ A′ such that
  $\Pr(A(\text{San}) = \text{breach}) - \Pr(A'() = \text{breach}) \le \varepsilon$
  - For reasonable “breach”, if San(DB) contains information about DB, then some adversary breaks this definition
• Example
  - Vitaly knows that Alex Benn is 2 inches taller than the average Russian
  - DB allows computing the average height of a Russian
  - This DB breaks Alex's privacy according to this definition… even if his record is not in the database!
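The Alex Benn example is just arithmetic on a released aggregate. The heights below are made-up placeholders, and the function name is hypothetical. Alex's own record never appears in the database.

```python
import statistics
from typing import Sequence


def infer_height(released_average: float, known_offset: float) -> float:
    """Auxiliary info ('2 inches taller than average') + released average -> Alex's height."""
    return released_average + known_offset


russian_heights_in: Sequence[float] = [68.0, 70.0, 69.0, 71.0, 72.0]  # hypothetical DB, no Alex
avg = statistics.fmean(russian_heights_in)          # what the DB allows computing
print(infer_height(avg, known_offset=2.0))          # 72.0
```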
14
Differential Privacy (1)
[Figure: the basic setting again (DB = ($x_1, \dots, x_n$), San with random coins, query 1 through query T and answer 1 through answer T), with an adversary A in place of the users.]
• Example with Russians and Alex Benn
  - Adversary learns Alex's height even if he is not in the database
• Intuition: “Whatever is learned would be learned regardless of whether or not Alex participates”
  - Dual: whatever is already known, the situation won't get worse
15
Indistinguishability
[Figure: San answers query 1 through query T on DB = ($x_1, x_2, x_3, \dots, x_n$) using random coins, producing transcript S. San answers the same queries on DB′ = ($x_1, x_2, y_3, \dots, x_n$), producing transcript S′.]
• DB and DB′ differ in 1 row
• The distance between the distributions of the transcripts is at most ε
16
Formalizing Indistinguishability
[Figure: adversary A sees either transcript S (from DB) or transcript S′ (from DB′) and must guess which one it got.]
• Definition: San is ε-indistinguishable if $\forall$ A, $\forall$ DB, DB′ which differ in 1 row, $\forall$ sets of transcripts S:
  $\dfrac{p(\text{San}(DB) \in S)}{p(\text{San}(DB') \in S)} \in 1 \pm \varepsilon$
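For a sanitizer with finitely many possible transcripts, the definition can be checked by brute force over every set S. The check runs in both directions, since DB and DB′ play symmetric roles. This is a minimal sketch that assumes the two output distributions are given as explicit probability tables. The function name is hypothetical, and the search is exponential in the number of transcripts.

```python
from itertools import combinations
from typing import Dict, Hashable


def is_eps_indistinguishable(p: Dict[Hashable, float], q: Dict[Hashable, float],
                             eps: float) -> bool:
    """p, q: Pr[San(DB) = t] and Pr[San(DB') = t] for neighboring DB, DB'.
    Checks (1 - eps) * q(S) <= p(S) <= (1 + eps) * q(S), and the same with p, q swapped."""
    outcomes = sorted(set(p) | set(q), key=repr)
    for size in range(1, len(outcomes) + 1):
        for subset in combinations(outcomes, size):
            p_s = sum(p.get(t, 0.0) for t in subset)
            q_s = sum(q.get(t, 0.0) for t in subset)
            for a, b in ((p_s, q_s), (q_s, p_s)):
                if not (1 - eps) * b <= a <= (1 + eps) * b:
                    return False
    return True


p = {"yes": 0.50, "no": 0.50}
q = {"yes": 0.55, "no": 0.45}
print(is_eps_indistinguishable(p, q, eps=0.25))  # True
print(is_eps_indistinguishable(p, q, eps=0.05))  # False
```

A ratio of sums always lies between the smallest and largest per-transcript ratios, so checking single transcripts would suffice. The loop over all subsets simply mirrors the wording "for all sets of transcripts S".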
17
Diff. Privacy in Output Perturbation
[Figure: a user asks the database ($x_1, \dots, x_n$) “Tell me f(x)” and receives f(x) + noise.]
• Intuition: f(x) can be released accurately when f is insensitive to individual entries $x_1, \dots, x_n$
• Global sensitivity: $GS_f = \max_{\text{neighbors } x, x'} \|f(x) - f(x')\|_1$ (the Lipschitz constant of f)
• Theorem: $f(x) + \text{Lap}(GS_f / \varepsilon)$ is ε-indistinguishable
  - Noise is generated from the Laplace distribution
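Here is a minimal sketch of the theorem for a counting query. A counting query has global sensitivity 1, because changing one row changes the count by at most 1. Python's standard library has no Laplace sampler, so the sketch relies on the standard identity that the difference of two independent exponential variables with rate 1/b is Laplace-distributed with scale b. The function names are hypothetical.

```python
import random
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Lap(scale): difference of two i.i.d. Exponential(rate = 1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release f(x) + Lap(GS_f / epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity cannot be negative")
    if sensitivity == 0:
        return true_answer  # a constant query needs no noise
    return true_answer + sample_laplace(sensitivity / epsilon, rng)


def private_count(db: Sequence[Row], predicate: Callable[[Row], bool], epsilon: float,
                  rng: random.Random) -> float:
    """Counting query: changing one row moves the count by at most 1, so GS_f = 1."""
    return laplace_mechanism(sum(1 for x in db if predicate(x)), 1.0, epsilon, rng)


rng = random.Random(7)
ages = [23, 35, 41, 52, 67, 70]
print(private_count(ages, lambda a: a >= 50, epsilon=0.5, rng=rng))  # 3 plus Lap(2) noise
```

A smaller epsilon gives a larger noise scale GS_f / epsilon, trading accuracy for a stronger guarantee.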
18
Sensitivity with Laplace Noise
19
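Differential Privacy: Summary
[Figure: plot of Pr[t] for the two transcript distributions.]
• San gives ε-differential privacy if for all values of DB and Me and all transcripts t:
  $\dfrac{\Pr[\text{San}(DB - Me) = t]}{\Pr[\text{San}(DB + Me) = t]} \le e^{\varepsilon} \approx 1 + \varepsilon$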
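For the Laplace mechanism, the bound in this definition can be checked numerically. The output density at t is proportional to exp(-|t - f| * epsilon / GS_f). So for neighbors with |f(DB + Me) - f(DB - Me)| <= GS_f, the log-ratio of the two densities never exceeds epsilon. The sketch assumes a counting query (GS_f = 1) whose value is 10 without my row and 11 with it, and scans a grid of outputs t. The function names and values are hypothetical.

```python
import math
from typing import Iterable


def laplace_log_density(t: float, center: float, scale: float) -> float:
    """log of the Lap(center, scale) density: -|t - center| / scale - log(2 * scale)."""
    return -abs(t - center) / scale - math.log(2.0 * scale)


def max_log_ratio(f_with: float, f_without: float, sensitivity: float, epsilon: float,
                  grid: Iterable[float]) -> float:
    """max over t of |ln Pr[San(DB + Me) = t] - ln Pr[San(DB - Me) = t]| (as densities)."""
    scale = sensitivity / epsilon
    return max(abs(laplace_log_density(t, f_with, scale)
                   - laplace_log_density(t, f_without, scale)) for t in grid)


epsilon = 0.5
grid = [i / 100.0 for i in range(0, 2101)]          # t from 0.0 to 21.0
worst = max_log_ratio(11.0, 10.0, 1.0, epsilon, grid)
print(worst <= epsilon + 1e-12, round(worst, 6))     # True 0.5
```

The worst case is reached for every t outside the interval between the two true answers, which is why the bound e^ε is tight for this mechanism.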
20
Intuition
• No perceptible risk is incurred by joining DB
• Anything the adversary can do to me, it could do without me (my data)
[Figure: plot of Pr[response], with bad responses marked X.]
21
Differential Privacy
• It aims to ensure that the inclusion or exclusion of a single individual from the database makes almost no statistical difference to the query/mining results (at most a factor of $e^{\varepsilon}$ in the probability of any output).
• A differentially private algorithm provides an assurance that the probability of a particular output is almost the same whether or not Alice's record is included.
• The privacy parameter ε controls the amount by which the distributions induced by two neighboring databases may differ (smaller values enforce a stronger privacy guarantee).
• The definition is agnostic to auxiliary information an adversary may possess, and provides guarantees against arbitrary attacks.
22
Property
23
Hot Research Topics
• PINQ: an interface for aggregate queries
• Differential-privacy-preserving data mining
  - ID3 decision tree
  - K-means clustering
  - Logistic regression
• Contingency table release
• Relaxations of strict differential privacy
  - Shift from global sensitivity to local sensitivity
  - Incorporate a smaller magnitude of added noise
• Applications to social networks