Revealing Information while Preserving Privacy
Kobbi Nissim, NEC Labs, DIMACS
Based on work with: Irit Dinur, Cynthia Dwork and Joe Kilian
2 The Hospital Story
[Figure: patient data is stored in a medical DB; a user sends a query q and receives an answer a.]
3 Easy Tempting Solution (A Bad Solution)
Idea:
a. Remove identifying information (name, SSN, …)
b. Publish data
Observation: 'harmless' attributes uniquely identify many patients (gender, approx. age, approx. weight, ethnicity, marital status, …)
Worse: 'rare' attribute (CF 1/3000)
[Figure: database d with records for Mr. Smith, Ms. John, Mr. Doe.]
4 Our Model: Statistical Database (SDB)
Database: $d \in \{0,1\}^n$ (one bit per record: Mr. Smith, Ms. John, Mr. Doe, …)
Query: $q \subseteq [n]$
Answer: $a_q = \sum_{i \in q} d_i$
5 The Privacy Game: Information-Privacy Tradeoff
Private functions:
– want to hide $\pi_i(d_1, \dots, d_n) = d_i$
Information functions:
– want to reveal $f_q(d_1, \dots, d_n) = \sum_{i \in q} d_i$
Explicit definition of private functions
Crypto: secure function evaluation
– want to reveal $f()$
– want to hide all functions $\pi()$ not computable from $f()$
– Implicit definition of private functions
6 Approaches to SDB Privacy [AW 89]
Query restriction
– Require queries to obey some structure
Perturbation (this talk)
– Give 'noisy' or 'approximate' answers
7 Perturbation
Database: $d = d_1, \dots, d_n$
Query: $q \subseteq [n]$
Exact answer: $a_q = \sum_{i \in q} d_i$
Perturbed answer: $\hat{a}_q$
Perturbation $E$: for all $q$: $|\hat{a}_q - a_q| \le E$
General perturbation: $\Pr_q[\,|\hat{a}_q - a_q| \le E\,] = 1 - \mathrm{neg}(n)$ (alternatively 99%, or 51%)
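The two notions of perturbation on this slide can be made concrete with a minimal Python sketch. The names `exact_answer`, `all_queries` and `fraction_within` are illustrative and do not come from the talk; queries are modelled as sets of 0-based indices, and enumerating all subsets is only feasible for tiny n.

```python
from itertools import combinations


def exact_answer(d, q):
    """Exact answer a_q: the sum of the bits of d at the positions in q."""
    return sum(d[i] for i in q)


def all_queries(n):
    """Enumerate every subset q of {0, ..., n-1} (only feasible for tiny n)."""
    for size in range(n + 1):
        for q in combinations(range(n), size):
            yield frozenset(q)


def fraction_within(d, respond, E):
    """Fraction of all queries q with |respond(q) - a_q| <= E."""
    queries = list(all_queries(len(d)))
    good = sum(1 for q in queries if abs(respond(q) - exact_answer(d, q)) <= E)
    return good / len(queries)


# Hypothetical toy database and a responder that always overshoots by 1.
d = [1, 0, 1, 1]
overshoot = lambda q: exact_answer(d, q) + 1

print(fraction_within(d, overshoot, E=1))    # every query is within E = 1
print(fraction_within(d, overshoot, E=0.5))  # no query is within E = 0.5
```

A fraction of 1 corresponds to "perturbation E" in the strict sense; a fraction such as 0.99 or 0.51 corresponds to the general perturbation variants.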
8 Perturbation Techniques [AW89]
Data perturbation:
– Swapping [Reiss 84] [Liew, Choi, Liew 85]
– Fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]: additive perturbation $d'_i = d_i + E_i$
Output perturbation:
– Random sample queries [Denning 80]: sample drawn from query set
– Varying perturbations [Beck 80]: perturbation variance grows with number of queries
– Rounding [Achugbue, Chin 79], randomized [Fellegi, Phillips 74]
– …
9 Main Question: How much perturbation is needed to achieve privacy?
10 Privacy from $\approx\sqrt{n}$ Perturbation (an example of a useless database)
Database: $d \in_R \{0,1\}^n$
On query $q$:
1. Let $a_q = \sum_{i \in q} d_i$
2. If $|a_q - |q|/2| > E$, return $\hat{a}_q = a_q$
3. Otherwise return $\hat{a}_q = |q|/2$
Privacy is preserved:
– If $E \approx \sqrt{n}\,(\lg n)^2$, whp always use rule 3
– No information about $d$ is given!
– No usability!
Can we do better? Smaller $E$? Usability???
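Rules 1 to 3 above translate almost directly into code. The following is a minimal sketch; the function name `useless_response` and the toy database are illustrative assumptions, and queries are sets of 0-based indices.

```python
def useless_response(d, q, E):
    """Answer query q (a set of 0-based indices) using rules 1-3 of the slide."""
    a_q = sum(d[i] for i in q)       # rule 1: exact answer
    if abs(a_q - len(q) / 2) > E:    # rule 2: far from |q|/2, answer exactly
        return a_q
    return len(q) / 2                # rule 3: otherwise answer |q|/2


d = [1, 0, 1, 0]  # hypothetical toy database
print(useless_response(d, {0, 1, 2, 3}, E=1))  # a_q = 2 = |q|/2, so rule 3 applies
print(useless_response(d, {0, 2}, E=0.5))      # a_q = 2, |q|/2 = 1, so rule 2 applies
```

Whenever rule 3 fires, the answer depends only on the size of q and not on d, which is why the database is private but useless.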
11 (Not) Defining Privacy
Elusive definition:
– Application dependent
– Partial vs. exact compromise
– Prior knowledge, how to model it?
– Other issues …
Instead of defining privacy: what is surely non-private?
– Strong breaking of privacy
12 Strong Breaking of Privacy
The useless database achieves the best possible perturbation: perturbation $\ll \sqrt{n}$ implies no privacy!
Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ such that $\mathrm{dist}(d, d') = o(n)$.
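13 The Adversary as a Decoding Algorithm
[Figure: $d$ ($n$ bits) is encoded as the answers $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$ for the $2^n$ subsets of $[n]$.]
(Recall $\hat{a}_q = \sum_{i \in q} d_i + \mathrm{pert}_q$.)
Decoding problem: given access to $\hat{a}_{q_1}, \dots, \hat{a}_{q_{2^n}}$, reconstruct $d$ in time poly($n$).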
14 Goldreich-Levin Hardcore Bit (side remark)
[Figure: $d$ ($n$ bits) is encoded as $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots$ for the $2^n$ subsets of $[n]$.]
Where $\hat{a}_q = \sum_{i \in q} d_i \bmod 2$ on 51% of the subsets.
The GL algorithm finds, in time poly($n$), a small list of candidates containing $d$.
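15 Comparing the Tasks (side remark)
Encoding:
– Goldreich-Levin: $a_q = \sum_{i \in q} d_i \pmod 2$
– SDB reconstruction: $a_q = \sum_{i \in q} d_i$
Noise:
– Goldreich-Levin: corrupt $\tfrac12 - \varepsilon$ of the queries
– SDB reconstruction: additive perturbation; an $\varepsilon$ fraction of the queries deviate from perturbation
Decoding:
– Goldreich-Levin: list decoding
– SDB reconstruction: $d'$ s.t. $\mathrm{dist}(d, d') < \varepsilon n$ (list decoding impossible)
Queries:
– Goldreich-Levin: dependent
– SDB reconstruction: random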
16 Recall Our Goal: Perturbation $\ll \sqrt{n}$ Implies no Privacy!
Main Theorem: Given a DB response algorithm with perturbation $E \ll \sqrt{n}$, there is a poly-time reconstruction algorithm that outputs a database $d'$ such that $\mathrm{dist}(d, d') = o(n)$.
17 Proof of Main Theorem: The Adversary Reconstruction Algorithm
Query phase: Get $\hat{a}_{q_j}$ for $t$ random subsets $q_1, \dots, q_t$ of $[n]$.
Weeding phase: Solve the linear program:
– $0 \le x_i \le 1$
– $|\sum_{i \in q_j} x_i - \hat{a}_{q_j}| \le E$
Rounding: Let $c_i = \mathrm{round}(x_i)$, output $c$.
Observation: An LP solution always exists, e.g. $x = d$.
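The weeding constraints and the rounding step can be sketched as below. This sketch only checks whether a given vector satisfies the LP constraints and rounds it; it does not solve the LP, which in practice would be handed to an off-the-shelf LP solver. The function names, the tolerance `tol`, the tie-breaking of 0.5 towards 1 and the toy data are all illustrative assumptions.

```python
def satisfies_lp(x, queries, answers, E, tol=1e-9):
    """Check 0 <= x_i <= 1 and |sum_{i in q_j} x_i - answers[j]| <= E for all j."""
    if any(xi < -tol or xi > 1 + tol for xi in x):
        return False
    return all(
        abs(sum(x[i] for i in q) - a_hat) <= E + tol
        for q, a_hat in zip(queries, answers)
    )


def round_solution(x):
    """Rounding phase: c_i = 1 if x_i >= 1/2, else 0."""
    return [1 if xi >= 0.5 else 0 for xi in x]


# Hypothetical toy instance: n = 4, perturbation E = 1.
d = [1, 1, 0, 1]
queries = [{0, 1}, {1, 2, 3}, {0, 3}]
answers = [3, 2, 1]  # each is within E = 1 of the exact answers 2, 2, 2

print(satisfies_lp(d, queries, answers, E=1))             # x = d is always feasible
print(satisfies_lp([0, 0, 0, 0], queries, answers, E=1))  # the all-zero vector is weeded out
print(round_solution([0.9, 0.6, 0.2, 0.7]))               # fractional solution rounded to bits
```

The first call illustrates the observation on the slide: because every perturbed answer is within E of the exact one, the true database itself satisfies all constraints.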
18 Proof of Main Theorem: Correctness of the Algorithm
Consider $x = (0.5, \dots, 0.5)$ as a solution for the LP.
Observation: A random $q$ often shows a $\sqrt{n}$ advantage either to 0's or to 1's.
– Such a $q$ disqualifies $x$ as a solution for the LP
– We prove that if $\mathrm{dist}(x, d) > \varepsilon n$, then whp there will be a $q$ among $q_1, \dots, q_t$ that disqualifies $x$
[Figure: the database $d$, the candidate $x$ and a query $q$.]
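A short calculation may help to see why a random query tends to rule out $x = (0.5, \dots, 0.5)$. It is a sketch under the assumption that each index joins $q$ independently with probability 1/2; the constants are only indicative and are not taken from the talk.

$$\sum_{i \in q} x_i - a_q \;=\; \sum_{i \in q} \left(\tfrac12 - d_i\right) \;=\; \tfrac12\Big(\big|\{i \in q : d_i = 0\}\big| - \big|\{i \in q : d_i = 1\}\big|\Big)$$

Each index contributes $\pm\tfrac12$ with probability $\tfrac12$ and $0$ otherwise, so

$$\mathrm{Var}\Big[\sum_{i \in q} \left(\tfrac12 - d_i\right)\Big] \;=\; n \cdot \tfrac14 \cdot \tfrac14 \;=\; \tfrac{n}{16}, \qquad \text{standard deviation } \tfrac{\sqrt{n}}{4}.$$

By the triangle inequality,

$$\Big|\sum_{i \in q} x_i - \hat{a}_q\Big| \;\ge\; \Big|\sum_{i \in q} x_i - a_q\Big| - |a_q - \hat{a}_q| \;\ge\; \Big|\sum_{i \in q} x_i - a_q\Big| - E,$$

so $x$ violates the LP constraint for $q$ whenever $|\sum_{i \in q} x_i - a_q| > 2E$. With fluctuations of order $\sqrt{n}$ and $E \ll \sqrt{n}$, this happens for a single random $q$ with at least constant probability.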
19 Extensions of the Main Theorem
'Imperfect' perturbation:
– Can approximate the original bit string even if the database answer is within perturbation only for 99% of the queries
Other information functions:
– Given access to "noisy majority" of subsets we can approximate the original bit string
20 Notes on Impossibility Results
Exponential adversary:
– Strong breaking of privacy if $E \ll n$
Polynomial adversary:
– Non-adaptive queries
– Oblivious of perturbation method and database distribution
– Tight threshold $E \approx \sqrt{n}$
What if adversary is more restricted?
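The exponential-adversary bullet can be illustrated by exhaustive search, which the slide does not spell out: ask all $2^n$ queries and output any candidate that agrees with every answer up to $E$. Any such candidate differs from $d$ in at most $4E$ positions, since the set of positions where the candidate is 1 and $d$ is 0 is itself a query and so has size at most $2E$, and likewise for the opposite case. The sketch below assumes a tiny $n$ and integer noise; the function names and the seed are illustrative.

```python
from itertools import product
import random


def all_queries(n):
    """All 2^n subsets of {0, ..., n-1}, each as a tuple of indices."""
    return [tuple(i for i in range(n) if mask >> i & 1) for mask in range(2 ** n)]


def exhaustive_reconstruct(n, answers, E):
    """Return the first candidate c in {0,1}^n that agrees with every
    perturbed answer up to E, i.e. |sum_{i in q} c_i - answers[q]| <= E."""
    for c in product((0, 1), repeat=n):
        if all(abs(sum(c[i] for i in q) - a_hat) <= E for q, a_hat in answers.items()):
            return list(c)
    return None  # cannot happen if all answers really are within E of d


# Hypothetical toy run: n = 8, integer noise in {-1, 0, 1}, so E = 1.
random.seed(0)
n, E = 8, 1
d = [random.randint(0, 1) for _ in range(n)]
answers = {q: sum(d[i] for i in q) + random.randint(-E, E) for q in all_queries(n)}

c = exhaustive_reconstruct(n, answers, E)
distance = sum(ci != di for ci, di in zip(c, d))
print(distance)  # guaranteed to be at most 4 * E
```

The running time is exponential in $n$, which is why this only models the exponential adversary; the polynomial adversary replaces the search by the linear program.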
21 Bounded Adversary Model
Database: $d \in_R \{0,1\}^n$
Theorem: If the number of queries is bounded by $T$, then there is a DB response algorithm with perturbation of ~$\sqrt{T}$ that maintains privacy (with a reasonable definition of privacy).
22 Summary and Open Questions
Very high perturbation is needed for privacy
– Threshold phenomenon: above $\sqrt{n}$: total privacy, below $\sqrt{n}$: none (poly-time adversary)
– Rules out many currently proposed solutions for SDB privacy
– Q: what's on the threshold? Usability?
Main tool: a reconstruction algorithm
– Reconstructing an n-bit string from perturbed partial sums/thresholds
Privacy for a T-bounded adversary with a random database
– $\sqrt{T}$ perturbation
– Q: other database distributions
Q: Crypto and SDB privacy?
23 Our Privacy Definition (bounded adversary model)
[Figure: a database $d \in_R \{0,1\}^n$; given (transcript, $i$) and $d^{-i}$, the adversary tries to guess $d_i$ and fails w.p. $> \tfrac12 - \varepsilon$.]
24 The Adversary as a Decoding Algorithm
[Figure: $d$ is encoded into the partial sums $a_{q_1}, a_{q_2}, a_{q_3}, \dots, a_{q_t}$; perturbation turns them into the perturbed sums $\hat{a}_{q_1}, \hat{a}_{q_2}, \hat{a}_{q_3}, \dots, \hat{a}_{q_t}$, which are decoded into $d'$.]