Published by Reynard Simpson. Modified over 6 years ago.
1
De-identifying Health Data: Measuring and Controlling Disclosure Risk
Traian Marius Truta
2
Traian Marius Truta – DIMACS Tutorial
Content of the Talk
- Global Disclosure Risk
- Remove Identifiers
- Sampling
- Microaggregation
- Any combination of masking techniques
- Anonymity models
- Information Loss
- Greedy algorithms
- Constrained k-anonymity
April 30, 2009 Traian Marius Truta – DIMACS Tutorial
3
Global Disclosure Risk Measures
Assumptions
- The intruder does not know any confidential information.
- The intruder knows all the key and identifier values for the population.
Objectives
- DR measures for specific DC methods (Remove Identifiers, Sampling, Microaggregation, etc.).
- DR measures for any combination of DC methods.
Proposed measures
- DRmin
- DRW
- DRmax
[Truta 2003, Truta 2004]
4
Notations for IM and IMM
- $n$ – the number of entities in the population.
- $F$ – the number of clusters with the same values for key attributes.
- $A_k$ – the set of elements from the $k$-th cluster, for all $k$, $1 \le k \le F$.
- $F_i = |\{A_k : |A_k| = i,\ k = 1, \dots, F\}|$ for all $i$, $1 \le i \le n$. $F_i$ is the number of clusters of size $i$.
- $n_i = |\{x \in A_k : |A_k| = i,\ k = 1, \dots, F\}|$ for all $i$, $1 \le i \le n$. $n_i$ is the number of records in clusters of size $i$.
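The notation above can be made concrete with a short sketch. The function name `cluster_stats` and the six sample key values are illustrative assumptions, not part of the talk; the code only counts clusters of equal key values.

```python
from collections import Counter

def cluster_stats(keys):
    """Return (F, F_i, n_i) for a list of key-attribute tuples, one per record."""
    cluster_sizes = Counter(keys)              # key value -> |A_k|
    F_i = Counter(cluster_sizes.values())      # size i -> number of clusters of size i
    n_i = {i: i * count for i, count in F_i.items()}  # size i -> records in those clusters
    return len(cluster_sizes), dict(F_i), n_i

# Hypothetical (Age, State) key values for six records.
keys = [(44, "MI"), (44, "MI"), (55, "MI"), (25, "IN"), (55, "MI"), (44, "MI")]
F, F_i, n_i = cluster_stats(keys)
print(F, F_i, n_i)
```

Since every record belongs to exactly one cluster, the values of `n_i` always sum to the number of records.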
6
Disclosure Risk Measures for Remove Identifiers Method
RecID  Age  State  Diagnosis     Income  Billing
1      44   MI     AIDS          45,500  1,200
2                  Asthma        37,900  2,500
3      55                        67,000  3,000
4                                21,000  1,000
5                                90,000  900
6      45          Diabetes      48,000  750
7      25   IN                   49,000
8      35                        66,000  2,200
9                                69,000  4,200
10                 Tuberculosis  34,000  3,100
Clusters: {1, 2, 4}, {3, 5, 9}, {6, 10}, {7}, {8}
n = 10; n1 = 2, n2 = 2, n3 = 6
F = 5; F1 = 2, F2 = 1, F3 = 2
7
Disclosure Risk Measures for Remove Identifiers Method
- DRmin – percentage of unique records: $DR_{min} = \frac{F_1}{n}$.
- DRmax – considers probabilistic linkage: $DR_{max} = \frac{1}{n} \sum_{k=1}^{n} F_k$.
- DRW – weights defined by data owner: $DR_W = \frac{1}{w_1 \cdot n} \sum_{k=1}^{n} w_k F_k$.
w = (w1, w2, …, wn) – disclosure risk weight vector.
Properties
a) $w_i \in \mathbb{R}^+$ for all i = 1, .., n;
b) $w_i \ge w_j$ for all $i \le j$, i, j = 1, .., n;
8
Disclosure Risk Measures for Remove Identifiers Method
Initial microdata: the ten-record table from the earlier Remove Identifiers slide.
n = 10; n1 = 2, n2 = 2, n3 = 6
F = 5; F1 = 2, F2 = 1, F3 = 2
w1 = (5, 5, 0, 0, ..., 0)
w2 = (4, 3, 3, 0, ..., 0)
DRmin = 0.2, DRw1 = 0.3, DRw2 = 0.425, DRmax = 0.5
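The four values on this slide can be checked with a small sketch. The formulas used here are an assumption: DRmin = F1/n, DRmax = (sum of Fk)/n and DRW = (sum of wk·Fk)/(w1·n) are the simplest definitions that reproduce 0.2, 0.3, 0.425 and 0.5 from the slide's F values and weight vectors. The function names are illustrative.

```python
def dr_min(F, n):
    """Share of records that are unique on their key values (assumed definition)."""
    return F.get(1, 0) / n

def dr_max(F, n):
    """Each record in a cluster of size k is linked with probability 1/k (assumed definition)."""
    return sum(F.values()) / n

def dr_w(F, n, w):
    """Weighted measure; w[0] is w_1, the weight for clusters of size 1 (assumed definition)."""
    return sum(w[size - 1] * count for size, count in F.items()) / (w[0] * n)

n = 10
F = {1: 2, 2: 1, 3: 2}          # F1, F2, F3 from the slide
w1 = [5, 5] + [0] * 8           # w1 = (5, 5, 0, ..., 0)
w2 = [4, 3, 3] + [0] * 7        # w2 = (4, 3, 3, 0, ..., 0)

print(dr_min(F, n), dr_w(F, n, w1), dr_w(F, n, w2), dr_max(F, n))
```

With a non-increasing weight vector, the weighted measure falls between the minimal and maximal measures, as in the slide's example.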
10
Notations for Masked Microdata
- $f$ – the number of clusters with the same values for key attributes in M. We cluster all records from M based on their key values.
- $B_k$ – the set of elements from the $k$-th cluster, for all $k$, $1 \le k \le f$.
- $f_i = |\{B_k : |B_k| = i,\ k = 1, \dots, f\}|$ for all $i$, $1 \le i \le n$. $f_i$ is the number of clusters of size $i$.
- $t_i = |\{x \in B_k : |B_k| = i,\ k = 1, \dots, f\}|$ for all $i$, $1 \le i \le n$. $t_i$ is the number of records in clusters of size $i$.
- $C$ – the classification matrix. For all $i, j = 1, \dots, n$: $c_{ij} = |\{x : x \in B_k,\ x \in A_p,\ |B_k| = i,\ |A_p| = j,\ k = 1, \dots, f,\ p = 1, \dots, F\}|$. Each element $c_{ij}$ is the number of records that appear in clusters of size $i$ in the masked microdata and appeared in clusters of size $j$ in the initial microdata.
11
Disclosure Risk Measures for Sampling
Initial microdata: the ten-record table from the Remove Identifiers slides.
n = 10; n1 = 2, n2 = 2, n3 = 6
F = 5; F1 = 2, F2 = 1, F3 = 2
Masked microdata (sample of five records):
RecID  Age  State  Diagnosis  Income  Billing
1      44   MI     AIDS       45,500  1,200
2                  Asthma     37,900  2,500
4                             21,000  1,000
8      35                     66,000  2,200
9      55                     69,000  4,200
t = 5; t1 = 2, t2 = 0, t3 = 3
f = 3; f1 = 2, f2 = 0, f3 = 1
12
Algorithm for Creating Classification Matrix
Initialize each element of C with 0.
For each element s from masked microdata MM do
  Count the number of occurrences of the key values of s in masked microdata MM. Let i be this number.
  Count the number of occurrences of the key values of s in initial microdata IM. Let j be this number.
  Increment cij by 1.
End for.
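The loop above translates almost line for line into code. In this sketch the function name is illustrative, and the Age values are stand-ins chosen so that the clusters match the talk's example ({1, 2, 4}, {3, 5, 9}, {6, 10}, {7}, {8}); the sampled records are 1, 2, 4, 8 and 9.

```python
from collections import Counter

def classification_matrix(initial_keys, masked_keys):
    """Build C (1-based indices) from the key values of IM and MM."""
    n = len(initial_keys)
    im_counts = Counter(initial_keys)   # occurrences of each key value in IM
    mm_counts = Counter(masked_keys)    # occurrences of each key value in MM
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for key in masked_keys:
        i = mm_counts[key]              # cluster size of this record in MM
        j = im_counts[key]              # cluster size of this record in IM
        C[i][j] += 1
    return C

# Stand-in key values (Age only), consistent with the clusters on the slides.
age = {1: 44, 2: 44, 3: 55, 4: 44, 5: 55, 6: 45, 7: 25, 8: 35, 9: 55, 10: 45}
initial_keys = [age[r] for r in range(1, 11)]
masked_keys = [age[r] for r in (1, 2, 4, 8, 9)]

C = classification_matrix(initial_keys, masked_keys)
print(C[1][1], C[1][3], C[3][3])
```

Summing row i of C gives t_i, so rows 1 and 3 here give t1 = 2 and t3 = 3, matching the sampling example. The sketch assumes key values are unchanged by masking, which holds for sampling but not for methods that alter key attributes.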
13
Disclosure Risk Measures for Sampling
- disclosure risk weight vector
14
Disclosure Risk Measures for Sampling
Initial microdata (ten records) and masked microdata (sampled records 1, 2, 4, 8 and 9), as on the earlier Sampling slide.
DRmin DRw1 DRw2 DRmax
0.1 0.144 0.233
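Two of the values on this slide can be reproduced from the classification matrix. The formulas below are an assumption, chosen because they give 0.1 and 0.233 for this example: DRmin counts records that are unique in both IM and MM, and DRmax weights each released record by 1/j, where j is its cluster size in the initial microdata. The weighted measure is omitted because its weight matrix is not given in the transcript.

```python
def sampling_dr_min(C, n):
    """Records unique in both masked and initial microdata, over n (assumed definition)."""
    return C.get((1, 1), 0) / n

def sampling_dr_max(C, n):
    """Each released record is linked with probability 1/j (assumed definition)."""
    return sum(count / j for (i, j), count in C.items()) / n

# Non-zero entries c_ij for the sampling example: keys are (i, j).
C = {(1, 1): 1, (1, 3): 1, (3, 3): 3}
n = 10

print(round(sampling_dr_min(C, n), 3), round(sampling_dr_max(C, n), 3))
```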
16
Disclosure Risk Measures for Microaggregation Method
Initial Microdata
RecID  Name           SSN  Age  Sex     Diagnosis
1      John Wayne          8    Male    AIDS
2      Pete Gore           10           Asthma
3      John Banks          19
4      Jessica Casey       23   Female
5      Mary Stone          37
6      Patricia Kopi       43           Diabetes
7      Stan Simms          68
8      Kim Wood            72
17
Disclosure Risk Measures for Microaggregation Method
Univariate microaggregation for attribute Age with size = 2, 4, 8, giving Masked Microdata 1, 2 and 3. Each Age value spans the rows of its group.
RecID  Sex     Diagnosis  Age (MM1)  Age (MM2)  Age (MM3)
1      Male    AIDS       9          15         35
2              Asthma
3                         21
4      Female
5                         40         55
6              Diabetes
7                         70
8
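The three masked Age columns can be reproduced with a minimal sketch of fixed-size univariate microaggregation: sort the values, cut them into consecutive groups, and replace each value by its group mean. The function name is illustrative, and the handling of a short final group (left as its own smaller group) is a simplifying assumption that does not arise in this example.

```python
def microaggregate(values, size):
    """Replace each value by the mean of its group of `size` consecutive sorted values."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    masked = [None] * len(values)
    for start in range(0, len(order), size):
        group = order[start:start + size]
        mean = sum(values[idx] for idx in group) / len(group)
        for idx in group:
            masked[idx] = mean
    return masked

ages = [8, 10, 19, 23, 37, 43, 68, 72]   # Age column of the initial microdata
for size in (2, 4, 8):
    print(size, microaggregate(ages, size))
```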
18
Disclosure Risk Measures for Microaggregation Method
19
Disclosure Risk Measures for Microaggregation Method
Example – Disclosure risk values
W1 W2 W3 W4
MM1 0.5 0.75 0.612
MM2 0.25 0.367
MM3
20
Disclosure Risk Measures for Sampling and Microaggregation Methods
21
Disclosure Risk Measures for Sampling and Microaggregation Methods
22
Global Disclosure Risk Measures
- Remove Identifiers
- Sampling
- Microaggregation, Top and Bottom Coding, etc.
- Combination of those methods
This approach does not work for Random Noise, Data Swapping, Global and Local Recoding, etc.
24
General Disclosure Risk Measures
Ordered Attribute, Partial Ordered Attribute, Unordered Attribute
Inversion, Weak and strong change, Change
Inversion Factor, Change Factor, Inversion-Change Factor
25
Inversion Factor
Initial Microdata
RecID  Age  Zip    Diagnosis  Income
1      17   48202  AIDS       17,000
2      24                     68,000
3      44   48201  Asthma     80,000
4      55   48310             55,000
5      71          Diabetes   23,000
Masked Microdata
RecID  Age  Zip    Diagnosis  Income
1      34   48202  AIDS       17,000
2      24                     68,000
3      81   48201  Asthma     80,000
4      55   48310             55,000
5      71          Diabetes   23,000
Inversions for Age: (1, 2); (3, 4) and (3, 5)
ifAge = 3 / 5 = 0.6
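The inversion count for Age can be checked mechanically. This sketch treats a pair of records as an inversion when their relative order in the initial microdata is reversed in the masked microdata, and divides by the number of records as the slide does (3 / 5); the function name is illustrative.

```python
from itertools import combinations

def inversions(initial, masked):
    """Return the 1-based record pairs whose order flips between initial and masked values."""
    pairs = []
    for a, b in combinations(range(len(initial)), 2):
        if (initial[a] - initial[b]) * (masked[a] - masked[b]) < 0:
            pairs.append((a + 1, b + 1))
    return pairs

initial_age = [17, 24, 44, 55, 71]
masked_age = [34, 24, 81, 55, 71]

pairs = inversions(initial_age, masked_age)
inversion_factor = len(pairs) / len(initial_age)
print(pairs, inversion_factor)
```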
26
Change Factor
Initial Microdata
RecID  Age  Zip    Diagnosis  Income
1      17   48202  AIDS       17,000
2      24                     68,000
3      44   48201  Asthma     80,000
4      55   48310             55,000
5      71          Diabetes   23,000
Masked Microdata
RecID  Age  Zip    Diagnosis  Income
1      17   48202  AIDS       17,000
2      24                     68,000
3      44   48235  Asthma     80,000
4      55   89340             55,000
5      71   48310  Diabetes   23,000
Strong change for Zip: record 4
Weak change for Zip: record 3
cfZip = (wcf(48201, 48235) + 1) / 5 = (0.4 + 1) / 5 = 0.28
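The value 0.28 can be reproduced under one possible reading of the weak change factor. The transcript does not define wcf, so the rule below is an assumption: a change contributes 1 minus the fraction of leading digits the two Zip codes share, which gives 0.4 for 48201 to 48235 and 1 for a strong change such as 48310 to 89340. The Zip values for records 2 and 5 are taken to be those of the rows they are visually grouped with, which is also an assumption.

```python
def change_weight(old, new):
    """Assumed rule: 0 if unchanged, else 1 - (shared leading digits / code length)."""
    if old == new:
        return 0.0
    shared = 0
    for a, b in zip(old, new):
        if a != b:
            break
        shared += 1
    return 1.0 - shared / len(old)

initial_zip = ["48202", "48202", "48201", "48310", "48310"]
masked_zip = ["48202", "48202", "48235", "89340", "48310"]

weights = [change_weight(o, m) for o, m in zip(initial_zip, masked_zip)]
change_factor = sum(weights) / len(weights)
print(weights, round(change_factor, 2))
```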
27
General Disclosure Risk Measures
- icfk – inversion-change factor for attribute k.
- p – number of key attributes.
- v – binary vector associated with the key attributes.
28
General Disclosure Risk Measures
Lemma
- For every disclosure risk weights matrix W the following relations are true: $DR_{min} \le DR_W \le DR_{max}$.
- For every disclosure risk weights matrix W, $0 \le DR_W \le 1$.
29
General Disclosure Risk Measures
Initial Microdata
RecID  SSN  Zip Code  Age  Gender
1           48202     20   M
2                     25   F
3                     30
4           48201     35
5
6           48310     40
7
8           67890     42
9           48319
10
Masked Microdata
RecID  Zip Code  Age  Gender
1      482       20   M
2                25   F
3
4                35
5
6      483
7                40
8      678       30
9                42
10
Age attribute values for records 2, 3, 4, 6, 8, 9 and 10 are changed, followed by global recoding for the Zip Code attribute.
30
General Disclosure Risk Measures
K(1,1,1) = {Zip, Age, Sex}
K(1,0,1) = {Zip, Sex}
K(0,1,1) = {Age, Sex}
K(0,0,1) = {Sex}
Result V
DRmin 0.176 0.12 0.146 vmin=(1,1,1)
DRW 0.205 0.220 vW=(0,1,1)
DRmax 0.308 0.24 0.440 0.2 vmax=(0,1,1)
31
Experimental Data
- Simulated medical record billing data.
- Attributes: Age, Sex, Zip and Amount_Billed.
- Three initial microdata: n = 1,000 (called IM1000), n = 5,000 (IM5000), n = 25,000 (IM25000).
- Key attributes: KA = {Age, Sex, Zip}.
32
Results for Sampling
Sampling when KA is the set of key attributes.
33
Results for Sampling and Microaggregation
Sampling, followed by microaggregation for Age when IM5000 and KA are used.
34
Results for Sampling and Microaggregation
Sampling and microaggregation for Age when IM5000 and KA are used.
36
K-anonymization by Clustering
Let IM be the initial microdata set. The k-anonymization by clustering problem is to find a partition $S = \{cl_1, cl_2, \dots, cl_v\}$ of IM, where $cl_j \subseteq IM$, $j = 1..v$, are called clusters and:
- $\bigcup_{j=1}^{v} cl_j = IM$;
- $cl_i \cap cl_j = \emptyset$, $i, j = 1..v$, $i \ne j$;
- $|cl_j| \ge k$, $j = 1..v$;
- the total information loss $\sum_{j=1}^{v} IL(cl_j)$ is minimized.
37
Generalization Information
Let $cl = \{r_1, r_2, \dots, r_q\} \in S$ be a cluster, $KN = \{N_1, N_2, \dots, N_s\}$ be the set of numerical quasi-identifier attributes and $KC = \{C_1, C_2, \dots, C_t\}$ be the set of categorical quasi-identifier attributes. The generalization information of cl, w.r.t. the quasi-identifier attribute set $K = KN \cup KC$, is the "tuple" gen(cl), having the scheme K, where:
- For each categorical attribute $C_j \in KC$, $gen(cl)[C_j]$ = the lowest common ancestor in $H_{C_j}$ of $\{r_1[C_j], \dots, r_q[C_j]\}$;
- For each numerical attribute $N_j \in KN$, $gen(cl)[N_j]$ = the interval $[\min\{r_1[N_j], \dots, r_q[N_j]\}, \max\{r_1[N_j], \dots, r_q[N_j]\}]$.
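A minimal sketch of gen(cl) follows, assuming each categorical hierarchy is stored as a child-to-parent map. The function names and the dictionary layout are illustrative; the Location hierarchy is the one used later in the talk for the MAGVal example.

```python
def ancestors(value, parent):
    """Path from a value up to the hierarchy root, starting with the value itself."""
    path = [value]
    while value in parent:
        value = parent[value]
        path.append(value)
    return path

def lowest_common_ancestor(values, parent):
    common = ancestors(values[0], parent)
    for v in values[1:]:
        other = set(ancestors(v, parent))
        common = [node for node in common if node in other]
    return common[0]   # lowest node shared by all paths

def gen(cluster, numerical, categorical, parents):
    """Generalization information: intervals for numerical, LCA for categorical attributes."""
    info = {}
    for attr in numerical:
        vals = [r[attr] for r in cluster]
        info[attr] = (min(vals), max(vals))
    for attr in categorical:
        info[attr] = lowest_common_ancestor([r[attr] for r in cluster], parents[attr])
    return info

LOCATION_PARENT = {
    "San Diego": "California", "Los Angeles": "California", "California": "West Coast",
    "Wichita": "Kansas", "Kansas City": "Kansas", "Kansas": "Midwest",
    "Lincoln": "Nebraska", "Nebraska": "Midwest",
    "West Coast": "United States", "Midwest": "United States",
}
cluster = [{"Age": 32, "Location": "San Diego"}, {"Age": 30, "Location": "Los Angeles"}]
result = gen(cluster, ["Age"], ["Location"], {"Location": LOCATION_PARENT})
print(result)
```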
39
Two IL Measures
Discernability metric (DM) [Bayardo 2005]
- penalizes each tuple with the size of the group to which it belongs.
- intuitively, the ideal grouping is the one in which all groups have size k.
- $DM(S) = \sum_{j=1}^{v} |cl_j|^2$
Normalized average cluster size metric (AVG) [LeFevre 2006]
- inversely proportional to the number of clusters (v).
- minimizing AVG is equivalent to maximizing the total number of clusters.
- $AVG(S) = \frac{n}{v \cdot k}$
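Both measures depend only on cluster sizes, which a short sketch makes visible. It assumes the usual forms of the two metrics: DM as the sum of squared cluster sizes, and AVG as n / (v · k). The function names and the sample partition are illustrative.

```python
def discernability(clusters):
    """DM: every tuple is charged the size of its own cluster."""
    return sum(len(cl) ** 2 for cl in clusters)

def normalized_avg_cluster_size(clusters, k):
    """AVG: average cluster size divided by k."""
    n = sum(len(cl) for cl in clusters)
    return n / (len(clusters) * k)

# Hypothetical partition of seven record ids into three clusters, with k = 2.
clusters = [[1, 2], [3, 4], [5, 6, 7]]
print(discernability(clusters), normalized_avg_cluster_size(clusters, 2))
```

For a fixed n and k, AVG reaches its best value of 1 only when every cluster has exactly k tuples.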
40
Information Loss for a Cluster
41
Total Information Loss
Total information loss for a solution S = {cl1, cl2, … , clv} of the k-anonymization by clustering problem is the sum of the information loss measure for all the clusters in S.
43
Algorithms for k-clustering problem
Greedy_k-member_Clustering [Byun 2006]
1. Create one cluster cl with one tuple randomly selected from IM.
2. Find the tuple r that is "closest" to cl with respect to IL. Add r to cl.
3. Repeat step 2 until cl has k tuples.
4. Save cl in the set of final clusters S.
5. IM = IM – cl.
6. Restart from step 1 with the new IM.
Note: The tuples remaining in the last IM (fewer than k) are added to the last computed cluster.
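The steps above can be sketched as follows. This follows the slide's simplified description rather than the full published algorithm, and the information loss function is a stand-in assumption: for numeric tuples it uses cluster size times the sum of normalized attribute ranges. All names are illustrative.

```python
import random

def info_loss(cluster, ranges):
    """Stand-in IL: |cl| times the sum of per-attribute ranges, normalized by global ranges."""
    total = 0.0
    for a, rng in enumerate(ranges):
        vals = [r[a] for r in cluster]
        total += (max(vals) - min(vals)) / rng if rng else 0.0
    return len(cluster) * total

def greedy_k_member(records, k, seed=0):
    if len(records) < k:
        raise ValueError("need at least k records")
    picker = random.Random(seed)
    ranges = [max(col) - min(col) for col in zip(*records)]
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        # Step 1: start a cluster from a randomly selected tuple.
        cl = [remaining.pop(picker.randrange(len(remaining)))]
        # Steps 2-3: repeatedly add the tuple that increases IL the least.
        while len(cl) < k:
            best = min(remaining, key=lambda r: info_loss(cl + [r], ranges))
            remaining.remove(best)
            cl.append(best)
        clusters.append(cl)                # steps 4-5
    clusters[-1].extend(remaining)         # note: leftovers join the last cluster
    return clusters

records = [(25, 50), (27, 52), (60, 90), (62, 95), (26, 51), (61, 93), (40, 70)]
clusters = greedy_k_member(records, 3)
print(clusters)
```

Each inner step scans all remaining tuples, so the sketch is quadratic in the number of records.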
44
Other Algorithms for K-Anonymity
- The general version of optimal k-anonymization for a microdata set is an NP-hard problem. [Aggarwal 2006, Meyerson 2004]
- Curse of dimensionality – for many QI attributes. [Aggarwal 2005]
- Binary Search [Samarati 2001]
- Incognito [LeFevre 2005]
- Mondrian [LeFevre 2006]
- Genetic [Samarati 2001]
- Clustering [Aggarwal 2006, Byun 2006]
45
Constrained k-anonymity [Miller 2008]
46
Maximum Allowed Generalization Value
Let Q be a quasi-identifier attribute (categorical or numerical), and $H_Q$ its predefined value generalization hierarchy. For every leaf value $v \in H_Q$, the maximum allowed generalization value of v, denoted by MAGVal(v), is the value (leaf or non-leaf) in $H_Q$ situated on the path from v to the root, such that:
- for any released microdata, the value v is permitted to be generalized only up to MAGVal(v), and
- when several MAGVals exist on the path between v and the hierarchy root, MAGVal(v) is the first MAGVal that is reached when following the path from v to the root node.
47
Maximum Allowed Generalization Value
Value generalization hierarchy (starred nodes are MAGVals):
United States
  West Coast
    *California*
      San Diego
      Los Angeles
  *Midwest*
    *Kansas*
      Wichita
      Kansas City
    Nebraska
      Lincoln
MAGVal("San Diego") = "California".
MAGVal("Wichita") = "Kansas" (Part 2 of the definition!).
MAGVal("Lincoln") = "Midwest".
MAGSet(Country) = {"California", "Kansas", "Midwest"}.
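The lookup described by the definition is a walk from the leaf toward the root that stops at the first marked node. The sketch below assumes the hierarchy is stored as a child-to-parent map and the MAGVals as a set; both representations and the function name are illustrative. If no node on the path is marked, the root is returned, which corresponds to an unconstrained attribute.

```python
PARENT = {
    "San Diego": "California", "Los Angeles": "California", "California": "West Coast",
    "Wichita": "Kansas", "Kansas City": "Kansas", "Kansas": "Midwest",
    "Lincoln": "Nebraska", "Nebraska": "Midwest",
    "West Coast": "United States", "Midwest": "United States",
}
MAG_NODES = {"California", "Kansas", "Midwest"}

def mag_val(leaf, parent, mag_nodes):
    """First marked node on the path from leaf to root; the root if none is marked."""
    node = leaf
    while node not in mag_nodes and node in parent:
        node = parent[node]
    return node

for city in ("San Diego", "Wichita", "Lincoln"):
    print(city, "->", mag_val(city, PARENT, MAG_NODES))
```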
48
Constrained K-Anonymity
Constraint Violation
We say that the masked microdata MM has a constraint violation if one quasi-identifier value, v, in IM, is generalized in one tuple in MM beyond its specific maximal generalization value, MAGVal(v).
Constrained K-Anonymity
The masked microdata MM satisfies the constrained k-anonymity property if it satisfies k-anonymity and it does not have any constraint violation.
49
Constrained K-Anonymity Example
RecID  Name     Age  Location     Sex     Race   Diagnosis
1      Alice    32   San Diego    Male    White  AIDS
2      Bob      30   Los Angeles                 Asthma
3      Charley  42   Wichita
4      Dave          Kansas City
5      Eva      35   Lincoln      Female         Diabetes
6      John     20                        Black
7      Casey    25
We use the MAGVals defined for the Location attribute. For Age, Sex, and Race we assume the root of the hierarchy is the MAGVal for every leaf value (these QI attributes are unconstrained).
50
Constrained K-Anonymity Example
RecID Age Location Sex Race Diagnosis
1 30-32 California Male White AIDS
2 Asthma
3 30-42 Midwest *
4
5 Diabetes
6 20-25 Black
7
Satisfies 2-anonymity.
Does not satisfy constrained 2-anonymity (Kansas City and Wichita are generalized past their MAGVals).
51
Constrained K-Anonymity Example
RecID Age Location Sex Race Diagnosis
1 30-32 California Male White AIDS
2 Asthma
3 25-42 Kansas *
4
5 20-35 Midwest Diabetes
6
7
Satisfies 2-anonymity and constrained 2-anonymity.
52
A Few Definitions
Constrained K-Anonymization by Clustering
- Find a partition S = {cl1, cl2, … , clv, clv+1}.
- clv+1 – tuples that must be suppressed (minimum set).
- Cost measure is optimized (IL, see its definition in paper).
Generalization Information
- gen(cl) – the least generalized tuple that represents the entire cluster cl.
Maximum Allowed Microdata (MAM)
- Every QI value v is generalized to MAGVal(v).
53
A Few Properties
- For a given IM, if its maximum allowed microdata MAM is not k-anonymous, then any masked microdata obtained from IM by applying generalization only will not satisfy constrained k-anonymity.
- If MAM satisfies k-anonymity then MAM satisfies the constrained k-anonymity property.
- An initial microdata, IM, can be masked to comply with constrained k-anonymity using only generalization if and only if its corresponding MAM satisfies k-anonymity.
54
Can constrained k-anonymity be achieved?
Can an initial microdata IM be masked to satisfy the constrained k-anonymity property using generalization only? We follow the next two steps:
- Compute MAM for IM. This is done by replacing each quasi-identifier attribute value with its corresponding MAGVal.
- If all QI-clusters from MAM have at least k entities, then IM can be masked to satisfy constrained k-anonymity.
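The two steps can be sketched with a lookup table of MAGVals. The function name and the table layout are illustrative assumptions; the data are the Location values of the first five records from the talk's example, used here on their own, so the outcome applies to this five-record subset only.

```python
from collections import Counter

def mam_and_small_clusters(records, mag_val_of, k):
    """Step 1: map every QI value to its MAGVal. Step 2: find records in clusters smaller than k."""
    mam = [tuple(mag_val_of[attr][value] for attr, value in enumerate(r)) for r in records]
    sizes = Counter(mam)
    small = [idx for idx, row in enumerate(mam) if sizes[row] < k]
    return mam, small

# One constrained QI attribute (Location); MAGVals as in the hierarchy example.
MAG_VAL_OF = [{
    "San Diego": "California", "Los Angeles": "California",
    "Wichita": "Kansas", "Kansas City": "Kansas", "Lincoln": "Midwest",
}]
records = [("San Diego",), ("Los Angeles",), ("Wichita",), ("Kansas City",), ("Lincoln",)]

mam, small = mam_and_small_clusters(records, MAG_VAL_OF, 2)
achievable = not small
print(mam, small, achievable)
```

Here the single Midwest record forms a cluster of size 1, so this subset cannot reach constrained 2-anonymity by generalization alone.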
55
More Properties
- OUT represents all entities from QI-clusters from MAM with size < k.
- IM \ OUT can be masked using generalization only to comply with constrained k-anonymity.
- Any subset of IM that contains one or more entities from OUT cannot be masked using generalization only to achieve constrained k-anonymity.
56
Algorithm GreedyCKA
Input: IM – initial microdata; k – as in k-anonymity;
Output: S = {cl1, cl2, …, clv, clv+1} – a solution for the constrained k-anonymization for IM;
Compute MAM and OUT;
S = ∅;
For each QI-cluster cl from MAM \ OUT {
  // By cl we refer to the entities from IM that are clustered together in MAM.
  S' = Greedy_k-member_Clustering(cl, k); // [Byun]
  S = S ∪ S';
}
v = | S |;
clv+1 = OUT;
End GreedyCKA;
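The control flow of GreedyCKA can be sketched independently of the clustering routine it calls. In the sketch below, `chunk_clustering` is a deliberately naive stand-in for Greedy_k-member_Clustering (it cuts each MAM cluster into consecutive groups of k), and `to_mam` is an assumed callback that maps a record to its MAM tuple. The names and the five sample records are illustrative.

```python
from collections import defaultdict

def chunk_clustering(records, k):
    """Stand-in clustering: consecutive groups of k; leftovers join the last group."""
    full = len(records) // k
    groups = [records[i * k:(i + 1) * k] for i in range(full)]
    groups[-1] = groups[-1] + records[full * k:]
    return groups

def greedy_cka(im, to_mam, k, cluster_fn=chunk_clustering):
    """Return (S, OUT): clusters built inside each MAM QI-cluster, plus suppressed records."""
    by_mam = defaultdict(list)
    for record in im:
        by_mam[to_mam(record)].append(record)   # stage 1: form MAM QI-clusters
    solution, out = [], []
    for members in by_mam.values():
        if len(members) < k:
            out.extend(members)                 # cluster too small: suppress
        else:
            solution.extend(cluster_fn(members, k))  # stage 2: cluster within MAM cluster
    return solution, out

MAG = {"San Diego": "California", "Los Angeles": "California",
       "Wichita": "Kansas", "Kansas City": "Kansas", "Lincoln": "Midwest"}
im = [("Alice", "San Diego"), ("Bob", "Los Angeles"), ("Charley", "Wichita"),
      ("Dave", "Kansas City"), ("Eva", "Lincoln")]

solution, out = greedy_cka(im, lambda r: MAG[r[1]], 2)
print(solution, out)
```

Because clustering never crosses a MAM cluster boundary, no value in the result needs to be generalized beyond its MAGVal.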
57
GreedyCKA - Two-Stage Process
Stage 1: form MAM from the initial microdata IM, giving the QI-clusters in MAM and the suppressed tuples.
Stage 2: apply a k-anonymization algorithm on every MAM cluster with more than k elements, giving the final QI-clusters in IM (k = 3).
58
Test Data
- Adult dataset from the UC Irvine Machine Learning Repository. [Newman 1998]
- QI = {Education-num, Work-class, Marital-status, Occupation, Race, Sex, Age, Native-country}.
59
MAGVals and Generalization Hierarchies
Native-country hierarchy nodes (starred as on the slide): Country, *America*, *Africa*, *Asia*, *Europe*, *North_A*, South_A, *Central_A*, *West_E*, East_E, West_A, *East_A*, *North_Af*, South_Af, USA, Canada, Greece, Italy, South Africa, …
Age hierarchy nodes (starred as on the slide): 0-100, *0-19*, 20-59, 60-100, 0-9, 10-19, *20-29*, *30-39*, *40-49*, *50-59*, *60-69*, *70-100*, 1, …, 100
60
Information Loss Results
61
Running Time Results
62
Constraint violations in Greedy_k-member_Clustering
Number of constraint violations, by k:
k    1 constrained attribute (native_country)    2 constrained attributes (native_country, age)
2    605                                         2209
3    991                                         3824
4    1377                                        5297
5    1657                                        6163
6    1906                                        6964
7    2198                                        7743
8    2354                                        8417
9    2550                                        8931
10   2728                                        9549
63
Number of tuples suppressed by GreedyCKA
k: 2 3 4 5 6 7 8 9 10
No of suppressed tuples for 1 constrained attribute – native_country
No of suppressed tuples for 2 constrained attributes – native_country, age
15 24 28 48 60 81 97 106
64
References
- Aggarwal C. (2005), On k-Anonymity and the Curse of Dimensionality, Proceedings of the Very Large Databases (VLDB), 223 – 228.
- Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., and Zhu A. (2005), Anonymizing Tables, Proceedings of the 10th International Conference on Database Theory, 246 – 258.
- Bayardo R.J., Agrawal R. (2005), Data Privacy through Optimal k-Anonymization, Proceedings of the IEEE ICDE, 217 – 228.
- Byun J.W., Kamra A., Bertino E., Li N. (2006), Efficient k-Anonymity using Clustering Technique, CERIAS Tech Report.
- LeFevre K., DeWitt D., Ramakrishnan R. (2005), Incognito: Efficient Full-Domain K-Anonymity, Proceedings of the ACM SIGMOD, Baltimore, Maryland, 49 – 60.
- LeFevre K., DeWitt D., Ramakrishnan R. (2006), Mondrian Multidimensional K-Anonymity, Proceedings of the IEEE International Conference of Data Engineering, Atlanta, Georgia, 25.
- Meyerson A., Williams R. (2004), On the Complexity of Optimal k-Anonymity, Proceedings of the ACM PODS Conference, 223 – 228.
65
References
- Miller J., Campan A., Truta T.M. (2008), Constrained K-Anonymity: Privacy with Generalization Boundaries, Proceedings of the Practical Privacy-Preserving Data Mining Workshop (P3DM2008), In Conjunction with SIAM Conference on Data Mining (SDM), Atlanta.
- Newman D.J., Hettich S., Blake C.L., Merz C.J. (1998), UCI Repository of Machine Learning Databases.
- Samarati P. (2001), Protecting Respondents Identities in Microdata Release, IEEE Transactions on Knowledge and Data Engineering, Vol. 13, No. 6, 1010 – 1027.
- Truta T.M., Fotouhi F., Barth-Jones D. (2003), Privacy and Confidentiality Management for the Microaggregation Disclosure Control Method, Proceedings of the Workshop on Privacy and Electronic Society, In Conjunction with 10th ACM CCS, Washington DC, 21 – 30.
- Truta T.M., Fotouhi F., Barth-Jones D. (2003), Disclosure Risk Measures for Microdata, Proceedings of the International Conference on Scientific and Statistical Database Management, Cambridge, Ma, 15 – 22.
- Truta T.M., Fotouhi F., Barth-Jones D. (2004), Disclosure Risk Measures for Sampling Disclosure Control Method, Proceedings of ACM Symposium on Applied Computing.
- Truta T.M., Fotouhi F., Barth-Jones D. (2004), Assessing Global Disclosure Risk Measures in Masked Microdata, Proceedings of the Workshop on Privacy and Electronic Society, In Conjunction with 11th ACM CCS, Washington DC, 85 – 93.
66
Questions