1
Anonymizing Tables for Privacy Protection
Gagan Aggarwal, Tomás Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, An Zhu
http://theory.stanford.edu/~rajeev/privacy.html
2
Krishnaram Kenthapadi, PORTIA Workshop, 8 July 2004

An example: Medical Records
Identifying attributes: SSN, Name. Sensitive attribute: Disease.

SSN   Name    Age   Race    Zipcode   Disease
614   Sara    31    Cauc    94305     Flu
615   Joan    34    Cauc    94307     Cold
629   Kelly   27    Cauc    94301     Diabetes
710   Mike    41    Afr-A   94305     Flu
840   Carl    41    Afr-A   94059     Arthritis
780   Joe     65    Hisp    94042     Heart problem
614   Rob     46    Hisp    94042     Arthritis
3
Medical Records: De-identify & Release
Sensitive attribute: Disease.

Age   Race    Zipcode   Disease
31    Cauc    94305     Flu
34    Cauc    94307     Cold
27    Cauc    94301     Diabetes
41    Afr-A   94305     Flu
41    Afr-A   94059     Arthritis
65    Hisp    94042     Heart problem
46    Hisp    94042     Arthritis
4
Not sufficient! [Swe02, SS98]
Joining the released table with a public database on Age, Race and Zipcode can uniquely identify you!
Quasi-identifiers (Age, Race, Zipcode): release them so that they reveal less information. This is the idea behind the k-anonymity model.
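A minimal sketch of the linking problem, using the released rows above: it counts how many rows share each (Age, Race, Zipcode) combination and then matches one public record against the table. The public record is hypothetical (e.g. a voter-list entry), assumed only for illustration.

```python
from collections import Counter

# Released (de-identified) rows: (Age, Race, Zipcode, Disease).
RELEASED = [
    (31, "Cauc", "94305", "Flu"),
    (34, "Cauc", "94307", "Cold"),
    (27, "Cauc", "94301", "Diabetes"),
    (41, "Afr-A", "94305", "Flu"),
    (41, "Afr-A", "94059", "Arthritis"),
    (65, "Hisp", "94042", "Heart problem"),
    (46, "Hisp", "94042", "Arthritis"),
]

# How many released rows share each quasi-identifier combination?
qi_counts = Counter(row[:3] for row in RELEASED)
print(min(qi_counts.values()), len(qi_counts))  # 1 7: every combination is unique

# Hypothetical public record: a name together with the same quasi-identifiers.
public_record = ("Joan", 34, "Cauc", "94307")
matches = [row[3] for row in RELEASED if row[:3] == public_record[1:]]
print(matches)  # ['Cold']
```

Because every quasi-identifier combination occurs exactly once, a single match is enough to attach a name to a disease.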
5
k-anonymity – Problem Definition
Input: a database of n rows, each with m attributes drawn from a finite alphabet.
Goal: suppress (replace with *) some entries of the table so that each modified row becomes identical to at least k-1 other rows.
The more entries are suppressed, the less useful the modified table.
Objective: minimize the number of suppressed entries.
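The definition can be made concrete with three small checks, sketched below under the assumption that a suppressed entry is written as the string "*" (as on the slides): whether a modified table only suppresses entries of the original, whether it is k-anonymous, and its cost. The toy table is hypothetical.

```python
from collections import Counter

SUPPRESSED = "*"  # marker for a suppressed entry, as on the slides

def is_suppression_of(original, modified):
    """True if `modified` only replaces some entries of `original` by '*'."""
    return len(original) == len(modified) and all(
        len(o) == len(m) and all(mv in (ov, SUPPRESSED) for ov, mv in zip(o, m))
        for o, m in zip(original, modified)
    )

def is_k_anonymous(table, k):
    """True if every row is identical to at least k-1 other rows."""
    counts = Counter(tuple(row) for row in table)
    return all(counts[tuple(row)] >= k for row in table)

def suppression_cost(table):
    """Objective to minimize: the number of suppressed entries."""
    return sum(value == SUPPRESSED for row in table for value in row)

# Hypothetical toy input: n = 4 rows, m = 2 attributes.
original = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
modified = [("a", "*"), ("a", "*"), ("b", "y"), ("b", "y")]

print(is_k_anonymous(original, 2))                                          # False
print(is_suppression_of(original, modified), is_k_anonymous(modified, 2))  # True True
print(suppression_cost(modified))                                           # 2
```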
6
Medical Records: 2-anonymized table

Age   Race    Zipcode   Disease
*     Cauc    *         Flu
*     Cauc    *         Cold
*     Cauc    *         Diabetes
41    Afr-A   *         Flu
41    Afr-A   *         Arthritis
*     Hisp    94042     Heart problem
*     Hisp    94042     Arthritis

Suppressed entries are shown as *. Cost = 10 suppressed entries.
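The 2-anonymized table can be reproduced by grouping rows and suppressing, within each group, every column on which the group disagrees. The sketch below assumes that only the quasi-identifiers (Age, Race, Zipcode) are compared, as in the table where Disease is left intact. The grouping is read off the table. For a table this small, a brute-force search over all groupings (exponential, so only a sanity check) also finds 10, so this anonymization is optimal here.

```python
# Quasi-identifiers (Age, Race, Zipcode) of the seven released rows.
QI = [
    ("31", "Cauc", "94305"), ("34", "Cauc", "94307"), ("27", "Cauc", "94301"),
    ("41", "Afr-A", "94305"), ("41", "Afr-A", "94059"),
    ("65", "Hisp", "94042"), ("46", "Hisp", "94042"),
]
# Grouping read off the 2-anonymized table: Cauc, Afr-A and Hisp rows.
GROUPS = [[0, 1, 2], [3, 4], [5, 6]]

def anonymize(rows, groups):
    """Within each group, suppress every column on which the group's rows disagree."""
    out = [list(row) for row in rows]
    for group in groups:
        for col in range(len(rows[0])):
            if len({rows[i][col] for i in group}) > 1:
                for i in group:
                    out[i][col] = "*"
    return [tuple(row) for row in out]

def cost(table):
    return sum(v == "*" for row in table for v in row)

def partitions(items):
    """Yield every set partition of `items` (exponential: tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):  # put `first` into an existing block ...
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part      # ... or into a block of its own

def optimal_cost(rows, k):
    """Exact minimum cost over all groupings whose groups have size >= k."""
    return min(
        cost(anonymize(rows, p))
        for p in partitions(list(range(len(rows))))
        if all(len(g) >= k for g in p)
    )

table = anonymize(QI, GROUPS)
for row in table:
    print(row)
print("Cost =", cost(table))              # Cost = 10
print("Optimum =", optimal_cost(QI, 2))   # Optimum = 10
```

Given a grouping, suppressing exactly the disagreeing columns is the cheapest choice, so the optimization is really over groupings.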
7
k-anonymity – Results
[MW04]:
- NP-hardness for a linear-size alphabet
- O(k log k)-approximation algorithm
Our results:
- NP-hardness, even for a ternary alphabet
- O(k)-approximation for k-anonymity
- 1.5-approximation for 2-anonymity
- 2-approximation for 3-anonymity
8
O(k)-approximation algorithm (for k = 3)
Create a complete graph such that:
- each row vector in the table is a vertex;
- the weight of an edge is the number of attributes on which the two rows differ (their Hamming distance).

Example:
      Age   Race    Zipcode
r1    31    Cauc    94305
r2    34    Cauc    94307
r3    41    Afr-A   94305
r4    41    Afr-A   94059

Edge weights: (r1,r2) = 2, (r1,r3) = 2, (r3,r4) = 1, (r1,r4) = 3, (r2,r3) = 3, (r2,r4) = 3.
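A minimal sketch of the graph construction for the four example rows. Vertices are row indices 0 to 3 (r1 to r4), and the graph is stored as a dictionary from vertex pairs to Hamming distances, a representation chosen here for brevity.

```python
from itertools import combinations

ROWS = [
    ("31", "Cauc", "94305"),   # r1
    ("34", "Cauc", "94307"),   # r2
    ("41", "Afr-A", "94305"),  # r3
    ("41", "Afr-A", "94059"),  # r4
]

def hamming(u, v):
    """Number of attributes on which two rows differ."""
    return sum(a != b for a, b in zip(u, v))

def complete_graph(rows):
    """Map each vertex pair (i, j), i < j, to the Hamming distance of its rows."""
    return {(i, j): hamming(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}

weights = complete_graph(ROWS)
print(weights)
# {(0, 1): 2, (0, 2): 2, (0, 3): 3, (1, 2): 3, (1, 3): 3, (2, 3): 1}
```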
9
O(k)-approximation algorithm (for k = 3)
We create a forest as follows:
- Each node picks its nearest neighbor and connects to it.
- If the resulting graph has a component with only two nodes, connect this component to the second-nearest neighbor of one of the two nodes.
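The two steps can be sketched as follows. Several details are assumptions not fixed by the slide: ties between equally near neighbors are broken by row index, the node of a two-node component that gets the extra edge is whichever is visited first, and a union-find structure skips any edge whose endpoints are already connected (e.g. mutual nearest neighbors), so the result is guaranteed to be a forest. At least 3 rows are assumed.

```python
ROWS = [
    ("31", "Cauc", "94305"),   # r1
    ("34", "Cauc", "94307"),   # r2
    ("41", "Afr-A", "94305"),  # r3
    ("41", "Afr-A", "94059"),  # r4
]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def by_distance(rows, v):
    """Other vertices ordered nearest first; ties broken by index (an assumption)."""
    others = [u for u in range(len(rows)) if u != v]
    return sorted(others, key=lambda u: (hamming(rows[v], rows[u]), u))

def build_forest(rows):
    """Forest for k = 3 (assumes at least 3 rows); returns a sorted edge list."""
    n = len(rows)
    parent = list(range(n))  # union-find over the current components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []

    def connect(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:  # skip if already connected, so no cycle forms
            parent[ra] = rb
            edges.append((min(a, b), max(a, b)))

    # Step 1: each node connects to its nearest neighbor.
    for v in range(n):
        connect(v, by_distance(rows, v)[0])

    # Step 2: a two-node component is joined to the second-nearest neighbor
    # of one of its nodes (here: whichever node is visited first).
    for v in range(n):
        size = sum(find(u) == find(v) for u in range(n))
        if size == 2:
            connect(v, by_distance(rows, v)[1])

    return sorted(edges)

forest = build_forest(ROWS)
print(forest)                                              # [(0, 1), (0, 2), (2, 3)]
print(sum(hamming(ROWS[a], ROWS[b]) for a, b in forest))   # 5
```

On these rows, step 1 yields the two-node components {r1, r2} and {r3, r4}; step 2 joins them through r1's second-nearest neighbor r3.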
10
An example graph
[Figure: an example weighted graph; the legend distinguishes nearest-neighbor edges from other edges.]
11
The forest obtained
[Figure: the forest obtained from the example graph.]
12
O(k)-approximation algorithm (for k = 3)
The forest has:
- components of size at least 3;
- total edge cost no more than the cost of the optimal solution.
In the optimal solution, each node has at least as many *s as its Hamming distance to its second-nearest neighbor (its group contains at least two other rows, and the node must be starred wherever it differs from either of them).
Each node has at most as many *s as the cost of the tree containing it.
If any component has more than 5 nodes, break it into components of size at least 3 (resp. k).
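The lower-bound argument can be checked numerically on the four example rows: summing each row's distance to its second-nearest neighbor gives a lower bound on the optimal 3-anonymity cost. This is a small sanity check, not part of the algorithm. With only 4 rows and k = 3 the only valid grouping puts all rows together, so the optimum can be computed directly.

```python
ROWS = [
    ("31", "Cauc", "94305"),   # r1
    ("34", "Cauc", "94307"),   # r2
    ("41", "Afr-A", "94305"),  # r3
    ("41", "Afr-A", "94059"),  # r4
]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def second_nn_distance(rows, v):
    """Hamming distance from row v to its second-nearest other row."""
    return sorted(hamming(rows[v], rows[u]) for u in range(len(rows)) if u != v)[1]

per_row = [second_nn_distance(ROWS, v) for v in range(len(ROWS))]
lower_bound = sum(per_row)
print(per_row, lower_bound)  # [2, 3, 2, 3] 10

# Only valid 3-anonymous grouping of 4 rows: all together. Every column
# disagrees, so the optimum suppresses all 4 * 3 = 12 entries.
opt = len(ROWS) * sum(len({row[c] for row in ROWS}) > 1 for c in range(3))
print(opt)  # 12
```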
13
The final partition
[Figure: the final partition obtained from the forest.]
14
Analysis of the algorithm
Cluster the row vectors according to this partition.
Cost incurred ≤ OPT × (size of the largest part) ≤ 5 × OPT.
For general k, the cost of this solution is within a factor of max{3k-5, 2k-1} of the optimal cost.
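The bound rests on a per-node fact: a column on which a part disagrees must change along at least one edge of the tree spanning that part, so each row receives at most as many *s as the tree's cost. The sketch below checks this on the four example rows, with a hypothetical spanning tree (the tree is an assumed input), and tabulates max{3k-5, 2k-1} for small k.

```python
ROWS = [
    ("31", "Cauc", "94305"),
    ("34", "Cauc", "94307"),
    ("41", "Afr-A", "94305"),
    ("41", "Afr-A", "94059"),
]
# Hypothetical input: one part containing all four rows, spanned by this tree.
TREE = [(0, 1), (0, 2), (2, 3)]
PART = [0, 1, 2, 3]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def stars_per_row(rows, part):
    """*s each row of the part receives: the number of columns on which the part disagrees."""
    return sum(len({rows[i][c] for i in part}) > 1 for c in range(len(rows[0])))

tree_cost = sum(hamming(ROWS[a], ROWS[b]) for a, b in TREE)
stars = stars_per_row(ROWS, PART)
part_cost = len(PART) * stars
print(tree_cost, stars, part_cost)  # 5 3 12

# Per-node bound, and the resulting bound for the whole part.
assert stars <= tree_cost
assert part_cost <= len(PART) * tree_cost

def approx_factor(k):
    """Approximation factor stated for general k."""
    return max(3 * k - 5, 2 * k - 1)

print([approx_factor(k) for k in range(2, 7)])  # [3, 5, 7, 10, 13]
```

Summing the per-part bound over all parts gives cost ≤ (largest part size) × (total forest cost), and the forest cost is at most OPT.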
15
Better than an O(k)-approximation?
Not possible using only the graph representation, which loses information about the structure of the problem: there exist two instances with the same underlying graph whose k-anonymity costs differ by a factor of Ω(k).
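One pair of instances with this property is sketched below. It is an illustration of the claim, not necessarily the construction the authors use. Take exactly k rows, so the only valid grouping puts all of them together. The k binary unit vectors and the k "diagonal" rows (i, i) over a k-letter alphabet both give a complete graph in which every edge has weight 2. However, making all rows identical costs k × k entries in the first instance and k × 2 in the second, a ratio of k/2.

```python
from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def graph(rows):
    """Complete graph as {(i, j): Hamming distance}."""
    return {(i, j): hamming(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}

def single_group_cost(rows):
    """Cost when all rows form one group: every row is starred in each disagreeing column."""
    disagree = sum(len({row[c] for row in rows}) > 1 for c in range(len(rows[0])))
    return len(rows) * disagree

k = 6
unit_vectors = [tuple(int(i == c) for c in range(k)) for i in range(k)]  # k binary columns
diagonal = [(i, i) for i in range(k)]                                     # 2 columns, k symbols

print(graph(unit_vectors) == graph(diagonal))                         # True: every edge has weight 2
print(single_group_cost(unit_vectors), single_group_cost(diagonal))   # 36 12
```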
16
Open problems
- Lower bounds on the approximation factor (without assuming the graph representation)
- Extending the k-anonymity model to account for changes in the database: handling inserts, deletes and updates