1
Database Privacy (ongoing work)
Shuchi Chawla, Cynthia Dwork, Adam Smith, Larry Stockmeyer, Hoeteck Wee
2
Shuchi Chawla, Carnegie Mellon University

You are being watched!
- Databases abound: population census, market research
- Used for statistical analysis: explaining phenomena, making predictions
- Prone to malicious use: using an individual's information for marketing or discrimination
3
The Privacy vs. Utility trade-off
- Inherent tension between privacy and utility
- One extreme: no information, complete privacy
- Other extreme: complete information, no privacy
- We want a middle path:
  - preserve macroscopic properties (statistical/distributional information, clustering information)
  - "disguise" individual identifying information
4
What is privacy? [Gavison]
- Protection from being brought to the attention of others
  - inherently valuable
  - attention invites further privacy loss
- Each individual should blend into a sufficiently large crowd
5
Application-oriented approaches
- Statistical approaches
  - Alter the frequency of particular features while preserving means; alternatively, erase records that reveal too much
  - Do not consider the privacy breach possible when information from different records is combined
- Query-based approaches
  - Disallow queries that reveal too much
  - A combination of seemingly innocuous queries can still reveal individual traits
  - Only good for specific applications
6
Towards a general approach
- Allow arbitrary tests and queries
- Preserve macroscopic properties, but not individual records
- Approach: "perturb" individual records appropriately and publish the entire dataset
- The perturbation has to be probabilistic
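The approach above can be sketched in code. This is only an illustrative sketch, not the algorithm from this work: the function `sanitize` and the fixed noise scale `sigma` are assumptions for the example (a real sanitizer would tie the noise magnitude to the local geometry of the data). It shows the shape of the idea: perturb every record probabilistically and publish all of them.

```python
import random

def sanitize(rdb, sigma):
    """Publish a perturbed copy of every record.

    rdb:   list of d-dimensional points (tuples of floats)
    sigma: hypothetical noise scale; the actual algorithm would choose
           the perturbation per point, e.g. based on local density
    """
    sdb = []
    for point in rdb:
        # The perturbation must be probabilistic: a deterministic map
        # could simply be inverted by the adversary.
        noisy = tuple(x + random.gauss(0.0, sigma) for x in point)
        sdb.append(noisy)
    return sdb

rdb = [(1.0, 2.0), (1.1, 2.1), (5.0, 5.0)]
sdb = sanitize(rdb, sigma=0.5)
print(len(sdb))  # every record is published, in perturbed form
```

Note that, unlike the query-based approaches of the previous slide, the whole sanitized dataset is released, so arbitrary tests and queries can be run against it.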
7
A geometric view
- A first attempt: an oversimplified abstract model
- Simplifying assumption: each attribute is real-valued, so think of the data as a metric space
- Real Database (RDB): n unlabeled points in d-dimensional space
- Sanitized Database (SDB): n new points, possibly in a different space
8
The adversary, or Isolator
- Using the SDB and auxiliary information (AUX), the adversary outputs a point q
- q "isolates" a real point x if q is very close to x but not to many other real points
- There is no way to obtain privacy if AUX already reveals too much!
- The SDB compromises privacy if the adversary can considerably increase his probability of isolating a point by looking at the SDB
9
Isolation – a relative notion
- Tightly clustered points have a smaller radius of isolation
- T-radius of x: the distance from x to its T-th nearest neighbor
- Let δ be the distance between q and x; x is isolated if the ball B(q, cδ) contains fewer than T points
- x is "safe" if the distance between x and q is more than T-radius/(c-1)
- c: the privacy parameter, a constant
[Figure: the adversary's point q at distance δ from x, with the ball of radius cδ drawn around q]
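The definition on this slide translates directly into code. A minimal sketch, assuming plain Euclidean distance; the function names `t_radius` and `isolates` are mine, not from the talk:

```python
import math

def t_radius(x, points, t):
    """Distance from x to its t-th nearest neighbor among `points`."""
    # Exclude x itself; points equal to x are treated as x (sketch only).
    dists = sorted(math.dist(x, p) for p in points if p != x)
    return dists[t - 1]

def isolates(q, x, points, c, t):
    """True if q c-isolates x: with delta = d(q, x), the ball
    B(q, c*delta) contains fewer than t of the real points."""
    delta = math.dist(q, x)
    inside = sum(1 for p in points if math.dist(q, p) <= c * delta)
    return inside < t

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
q = (9.5, 10.0)
print(isolates(q, (10.0, 10.0), points, c=4, t=3))  # True: the lone far-away point is isolated
```

Note how the notion is relative: the same guess made inside the tight cluster near the origin fails, because the ball B(q, cδ) then captures at least T points, so tightly clustered points are harder to isolate.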
10
Our contribution
- A precise definition of privacy using T-radii
- A perturbation algorithm, closely linked to the definition of privacy
- A proof that the algorithm preserves privacy under reasonable assumptions
- Ongoing work on showing that macroscopic properties are preserved
11
What about the real world? Lessons from the abstract model
- High dimensionality is our friend
- Outliers: our notion of c-isolation deals with them, since they get perturbed by a very large amount
- The existence of an outlier may still be disclosed
Put more on this slide…
12
What about Outliers?
Bill Gates example here
- Reconsider the definition of privacy; we may want one of three guarantees:
  - do not disclose the existence of an outlier
  - do not disclose anything about an outlier
  - do not disclose the identity of an outlier
- c-isolation falls into the third category