Privacy-preserving Anonymization of Set Value Data Manolis Terrovitis Institute for the Management of Information Systems (IMIS), RC Athena Nikos Mamoulis.

1 Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Institute for the Management of Information Systems (IMIS), RC Athena
Nikos Mamoulis, University of Hong Kong (HKU)
Panos Kalnis, King Abdullah University of Science and Technology (KAUST)

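2 Motivation
- The attacker may know up to m items of a target's transaction.
- Any m items can form this background knowledge.
- No distinction is made between sensitive and non-sensitive items.
(Figure: Helen's purchases: 0% Milk, Pregnancy test, Beer.)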

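3 Motivation (cont.)
Database:
  Helen: Beer, 0% Milk, Pregnancy test
  John: Cola, Cheese
  Tom: 2% Milk, Coffee
  ….
  Mary: Wine, Beer, Full-fat Milk
Published (names removed):
  t1: Beer, 0% Milk, Pregnancy test
  t2: Cola, Cheese
  t3: 2% Milk, Coffee
  ….
  tn: Wine, Beer, Full-fat Milk
Attacker: "Find all transactions that contain Beer & 0% Milk." If t1 is the only match, the attacker learns that Helen also bought a pregnancy test.
Published after generalizing every kind of milk to Milk (the same query now matches both t1 and tn):
  t1: Beer, Milk, Pregnancy test
  t2: Cola, Cheese
  t3: Milk, Coffee
  ….
  tn: Wine, Beer, Milk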
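A minimal sketch of the attack on this slide, using only the four transactions shown (the elided ones are left out). The helper names `generalize` and `matching` are hypothetical, and the rule "every kind of milk becomes Milk" is read off the last block of the slide.

```python
# Transactions as published with names removed (elided rows omitted).
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Generalization read off the slide: every kind of milk becomes "Milk".
MILK_TYPES = {"0% Milk", "2% Milk", "Full-fat Milk"}


def generalize(items):
    """Replace every specific milk item with the general item "Milk"."""
    return {"Milk" if item in MILK_TYPES else item for item in items}


def matching(db, query):
    """Ids of the transactions that contain every item of the query."""
    return sorted(tid for tid, items in db.items() if query <= items)


generalized = {tid: generalize(items) for tid, items in published.items()}
known = {"Beer", "0% Milk"}  # what the attacker knows about Helen

print(matching(published, known))                # ['t1']
print(matching(generalized, generalize(known)))  # ['t1', 'tn']
```

The attacker's own knowledge is generalized the same way, because after publication only the general item Milk can be queried.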

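4 k^m-anonymity
Set of items: $\mathcal{I}$. Transaction database: $D$, where every transaction $t \in D$ is a subset of $\mathcal{I}$. Query terms: a set $q \subseteq \mathcal{I}$ of at most $m$ items known to the attacker.
k^m-anonymity: $D$ is $k^m$-anonymous if no attacker who knows up to $m$ items of a transaction $t \in D$ can use them to narrow the candidates down to fewer than $k$ transactions:
$$\forall q \subseteq \mathcal{I},\ |q| \le m:\quad \mathrm{sup}_D(q) = 0 \ \text{ or } \ \mathrm{sup}_D(q) \ge k, \qquad \text{where } \mathrm{sup}_D(q) = |\{\, t \in D : q \subseteq t \,\}|.$$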
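A brute-force check of the k^m-anonymity definition, as a sketch: it counts the support of every itemset of at most m items that occurs in the database and accepts only if each of them occurs at least k times (itemsets that never occur have support 0 and are allowed). The function names are illustrative; enumerating all subsets is exponential in m, so this only suits small examples.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(db, m):
    """Support of every itemset with 1..m items that occurs in at least one transaction."""
    counts = Counter()
    for t in db:
        items = sorted(t)
        for size in range(1, m + 1):
            counts.update(combinations(items, size))
    return counts


def is_km_anonymous(db, k, m):
    """True if every itemset of at most m items occurs in 0 or at least k transactions."""
    return all(c >= k for c in itemset_supports(db, m).values())


db = [{"a1", "b1"}, {"a1", "b1"}, {"a1", "b2"}]
print(is_km_anonymous(db, 2, 1))                 # False: {b2} occurs only once
print(is_km_anonymous([{"a1", "b"}] * 3, 3, 2))  # True
```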

5 Related Work: k-Anonymity [Swe02]
Quasi-identifier: Age, ZipCode
(a) Microdata
Age    ZipCode        Disease
42     25000          Flu
46     35000          AIDS
50     20000          Cancer
54     40000          Gastritis
48     50000          Dyspepsia
56     55000          Bronchitis
(b) 2-anonymous microdata
Age    ZipCode        Disease
42-46  25000-35000    Flu
42-46  25000-35000    AIDS
50-54  20000-40000    Cancer
50-54  20000-40000    Gastritis
48-56  50000-55000    Dyspepsia
48-56  50000-55000    Bronchitis
NOT suitable for high-dimensional data such as transactions, where every item acts as a dimension.
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
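To make the two tables concrete, the sketch below (hypothetical helper `is_k_anonymous`) groups rows by their quasi-identifier values and checks that every group has at least k rows. For k = 2 it rejects table (a), where every (Age, ZipCode) pair is unique, and accepts table (b).

```python
from collections import Counter

# Table (a): original microdata as (Age, ZipCode, Disease).
original_rows = [
    (42, 25000, "Flu"), (46, 35000, "AIDS"), (50, 20000, "Cancer"),
    (54, 40000, "Gastritis"), (48, 50000, "Dyspepsia"), (56, 55000, "Bronchitis"),
]

# Table (b): the same rows after generalizing the quasi-identifier.
generalized_rows = [
    ("42-46", "25000-35000", "Flu"), ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"), ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"), ("48-56", "50000-55000", "Bronchitis"),
]

QI = (0, 1)  # column positions of Age and ZipCode


def is_k_anonymous(rows, qi, k):
    """True if each combination of quasi-identifier values is shared by at least k rows."""
    groups = Counter(tuple(row[i] for i in qi) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(original_rows, QI, 2))     # False
print(is_k_anonymous(generalized_rows, QI, 2))  # True
```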

6 Related Work: l-Diversity in Transactions [GTK08]
Requires knowing in advance which items are sensitive and which are non-sensitive.
[GTK08] G. Ghinita, Y. Tao, P. Kalnis, “On the Anonymization of Sparse High-Dimensional Data”, ICDE, 2008.

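7 Our Approach: Generalization
Items are replaced by more general items taken from a generalization hierarchy; the price is information loss.
(Figure: generalization hierarchy and an anonymized example with k=2, m=2.)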
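A small sketch of generalization and information loss, under assumptions: the hierarchy below is hypothetical, and the loss measure is an NCP-style penalty (a generalized node covering n of the |I| original items costs n/|I|, an original item costs 0), averaged over all item occurrences. The slide does not state which measure is used.

```python
# Hypothetical generalization hierarchy: child -> parent; "Drinks" is the root.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "Drinks", "Alcohol": "Drinks",
}
LEAVES = set(PARENT) - set(PARENT.values())  # the original items


def ancestors_or_self(item):
    """The item followed by every node on its path to the root."""
    yield item
    while item in PARENT:
        item = PARENT[item]
        yield item


def leaves_below(node):
    """Original items whose path to the root passes through node."""
    return {leaf for leaf in LEAVES if node in ancestors_or_self(leaf)}


def ncp(node):
    """NCP-style penalty: 0 for an original item, |leaves below| / |LEAVES| otherwise."""
    return 0.0 if node in LEAVES else len(leaves_below(node)) / len(LEAVES)


def generalize(db, mapping):
    """Publish each item as mapping[item]; unmapped items are kept as they are."""
    return [{mapping.get(item, item) for item in t} for t in db]


def information_loss(db):
    """Average penalty over all item occurrences of a (generalized) database."""
    occurrences = [item for t in db for item in t]
    return sum(ncp(item) for item in occurrences) / len(occurrences)


db = [{"Beer", "0% Milk"}, {"Wine", "2% Milk"}]
milk_to_milk = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}
published = generalize(db, milk_to_milk)
print(round(information_loss(published), 3))  # 0.3
```

Here Milk covers 3 of the 5 original items, so each of its two occurrences costs 0.6 and the average over four occurrences is 0.3.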

8 Lattice of Generalizations
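One way to read the lattice, offered as an assumption: with global recoding, every admissible generalization is a cut of the hierarchy (each original item is replaced by exactly one node on its path to the root), and cuts are ordered by how general they are. The sketch enumerates the cuts of a hypothetical hierarchy and tests that order; for this hierarchy it finds five cuts, from the original items up to {Drinks}.

```python
from collections import defaultdict
from itertools import product

# Hypothetical hierarchy (child -> parent); "Drinks" is the root.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "Drinks", "Alcohol": "Drinks",
}
ROOT = "Drinks"
CHILDREN = defaultdict(list)
for child, par in PARENT.items():
    CHILDREN[par].append(child)


def cuts(node):
    """Every cut of the subtree rooted at node, each a frozenset of nodes."""
    result = [frozenset([node])]  # generalize the whole subtree to node itself
    if node in CHILDREN:
        # otherwise combine one cut from each child subtree
        for combo in product(*(cuts(child) for child in CHILDREN[node])):
            result.append(frozenset().union(*combo))
    return result


def ancestors_or_self(node):
    yield node
    while node in PARENT:
        node = PARENT[node]
        yield node


def more_general(upper, lower):
    """True if every node of cut `lower` equals or lies below some node of cut `upper`."""
    return all(any(a in upper for a in ancestors_or_self(n)) for n in lower)


lattice = cuts(ROOT)
print(len(lattice))  # 5
```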

9 Optimal Algorithm
(Figure: three successive snapshots, each labeled Q.)
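The transcript does not preserve the optimal algorithm's steps. As a stand-in, the following brute-force search (not the authors' algorithm) visits every cut of a hypothetical hierarchy, keeps the cuts whose generalized database is k^m-anonymous, and returns the one with the lowest NCP-style information loss (an assumed measure). The number of cuts grows exponentially with the hierarchy, so an exhaustive search of this kind quickly becomes slow.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

# Hypothetical hierarchy (child -> parent); "Drinks" is the root.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "Drinks", "Alcohol": "Drinks",
}
ROOT = "Drinks"
CHILDREN = defaultdict(list)
for child, par in PARENT.items():
    CHILDREN[par].append(child)
LEAVES = set(PARENT) - set(CHILDREN)


def leaves_below(node):
    if node not in CHILDREN:
        return {node}
    return set().union(*(leaves_below(c) for c in CHILDREN[node]))


def cuts(node):
    """Every cut of the subtree rooted at node."""
    result = [frozenset([node])]
    if node in CHILDREN:
        for combo in product(*(cuts(c) for c in CHILDREN[node])):
            result.append(frozenset().union(*combo))
    return result


def apply_cut(db, cut):
    """Publish every item as the node of the cut on its path to the root."""
    def image(item):
        while item not in cut:
            item = PARENT[item]
        return item
    return [{image(i) for i in t} for t in db]


def is_km_anonymous(db, k, m):
    counts = Counter()
    for t in db:
        for size in range(1, m + 1):
            counts.update(combinations(sorted(t), size))
    return all(c >= k for c in counts.values())


def information_loss(db):
    """Average NCP-style penalty over all item occurrences."""
    occurrences = [i for t in db for i in t]
    penalty = sum(0.0 if i in LEAVES else len(leaves_below(i)) / len(LEAVES) for i in occurrences)
    return penalty / len(occurrences)


def optimal_cut(db, k, m):
    """Return (loss, cut) for the cheapest k^m-anonymous cut, or None if none exists."""
    best = None
    for cut in cuts(ROOT):
        generalized = apply_cut(db, cut)
        if is_km_anonymous(generalized, k, m):
            loss = information_loss(generalized)
            if best is None or loss < best[0]:
                best = (loss, cut)
    return best


db = [{"Beer", "0% Milk"}, {"Beer", "2% Milk"}, {"Wine", "Full-fat Milk"}, {"Wine", "0% Milk"}]
loss, cut = optimal_cut(db, k=2, m=2)
print(sorted(cut), round(loss, 3))  # ['Beer', 'Milk', 'Wine'] 0.3
```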

10 Count Tree
(Figure: count tree whose nodes are annotated with support counts.)
- All generalized forms of the paths reside in the tree.
- We can easily find which anonymizations are needed.
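A sketch of a count tree, assuming it is a trie of sorted itemsets in which each node stores the support of the itemset spelled by its path. Each transaction is first extended with the ancestors of its items, so generalized itemsets such as {Alcohol, Milk} are counted alongside the original ones; itemsets holding an item together with its own ancestor are skipped. The hierarchy, class names and method names are hypothetical.

```python
from itertools import combinations

# Hypothetical hierarchy (child -> parent).
PARENT = {"0% Milk": "Milk", "2% Milk": "Milk", "Beer": "Alcohol", "Wine": "Alcohol"}


def ancestors(item):
    """Every node strictly above item in the hierarchy."""
    while item in PARENT:
        item = PARENT[item]
        yield item


class Node:
    def __init__(self):
        self.count = 0      # support of the itemset spelled by the path to this node
        self.children = {}  # item -> Node


class CountTree:
    def __init__(self, m):
        self.m = m
        self.root = Node()

    def add(self, transaction):
        # Extend the transaction with all generalized forms of its items.
        extended = set(transaction)
        for item in transaction:
            extended.update(ancestors(item))
        for size in range(1, self.m + 1):
            for itemset in combinations(sorted(extended), size):
                # Skip itemsets that hold an item together with one of its ancestors.
                if any(a in itemset for x in itemset for a in ancestors(x)):
                    continue
                node = self.root
                for item in itemset:
                    node = node.children.setdefault(item, Node())
                node.count += 1

    def support(self, itemset):
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count

    def violations(self, k):
        """Itemsets that occur in at least one but fewer than k transactions."""
        found, stack = [], [(self.root, ())]
        while stack:
            node, path = stack.pop()
            for item, child in node.children.items():
                itemset = path + (item,)
                if child.count < k:
                    found.append(itemset)
                stack.append((child, itemset))
        return sorted(found)


tree = CountTree(m=2)
for t in [{"0% Milk", "Beer"}, {"2% Milk", "Beer"}, {"0% Milk", "Wine"}]:
    tree.add(t)
print(tree.support({"Beer", "Milk"}))          # 2
print(tree.support({"Alcohol", "Milk"}))       # 3
print(("Milk", "Wine") in tree.violations(2))  # True
```

Here {Milk, Wine} is supported by one transaction, while its generalized form {Alcohol, Milk} is supported by three, which points at generalizing Wine to Alcohol as a candidate repair.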

11 Apriori-based Anonymization
- Global optimal vs. local optimal
- Solution for each path
- We examine the paths by size (Apriori principle)
- Paths with invalid nodes are skipped

12 Apriori-based Anonymization
1. Initialize gen_map
2. For i := 1 to m do
   2.1 For all t ∈ D do
       2.1.1 Extend t according to gen_map
       2.1.2 Add all i-subsets of extended t to count-tree
   2.2 Check all paths in count-tree and update gen_map
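A simplified Python sketch of these steps, under stated assumptions: transactions are mapped through gen_map rather than extended with generalized items, the count tree is replaced by a `Counter` of i-itemsets, and a violating itemset is repaired greedily by lifting one of its items to its parent (the parent covering the fewest original items). This repair rule is not the authors' exact selection rule. Each lift is applied globally to every item below that parent, so gen_map always describes a cut of the hierarchy; the hierarchy and names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical hierarchy (child -> parent); "Drinks" is the root.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "Drinks", "Alcohol": "Drinks",
}
CHILDREN = defaultdict(list)
for child, par in PARENT.items():
    CHILDREN[par].append(child)


def leaves_below(node):
    """Original items in the subtree rooted at node."""
    if node not in CHILDREN:
        return {node}
    return set().union(*(leaves_below(c) for c in CHILDREN[node]))


def count_itemsets(db, gen_map, size):
    """Steps 2.1.1-2.1.2: support of every size-item itemset after applying gen_map."""
    counts = Counter()
    for t in db:
        published = sorted({gen_map[item] for item in t})
        counts.update(combinations(published, size))
    return counts


def apriori_anonymize(db, k, m):
    # Step 1: every item is initially published as itself.
    gen_map = {item: item for t in db for item in t}
    for i in range(1, m + 1):  # step 2: examine itemsets by size
        while True:
            counts = count_itemsets(db, gen_map, i)
            bad = sorted(s for s, c in counts.items() if c < k)
            if not bad:
                break  # every i-itemset now occurs at least k times
            # Step 2.2 (greedy): lift an item of the first violating itemset.
            liftable = [x for x in bad[0] if x in PARENT]
            if not liftable:
                raise ValueError("fewer than k non-empty transactions")
            target = PARENT[min(liftable, key=lambda x: len(leaves_below(PARENT[x])))]
            # Global recoding: every item below target is now published as target.
            below = leaves_below(target)
            for item in gen_map:
                if item in below:
                    gen_map[item] = target
    return gen_map


db = [{"Beer", "0% Milk"}, {"Beer", "2% Milk"}, {"Wine", "Full-fat Milk"}, {"Wine", "0% Milk"}]
gen_map = apriori_anonymize(db, k=2, m=2)
print([sorted(gen_map[item] for item in t) for t in db])
# [['Beer', 'Milk'], ['Beer', 'Milk'], ['Milk', 'Wine'], ['Milk', 'Wine']]
```

On these four transactions, k = 2 and m = 2 lead to generalizing the three milk types to Milk while Beer and Wine stay as they are. Because generalization only merges items, an itemset that already had support of at least k keeps it, which is why sizes can be handled one after another.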

13 Small Datasets (2K-15K, BMS-WebView2): |I| = 40-60, k = 100, m = 3

14 Small Datasets (BMS-WebView2): |D| = 10K, k = 100, m = 1-4

15 Apriori Anonymization for Large Datasets (k = 5, m = 3)
(Figure: running time in seconds; axis marks at 10, 100 and 500 sec.)
|D|     |I|
515K    1657
59K     497
77K     3340

16 Points to Remember
- Anonymization of transactional data
  - The attacker knows m items
  - Any m items can be the quasi-identifier
- Global recoding method
  - Optimal solution: too slow
  - Apriori Anonymization: fast, with low information loss
- Extensions (VLDBJ 2010)
  - Local recoding (sort by Gray order and partition)
  - Global recoding (by partitioning the data domain)
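The local-recoding extension is only named here. As an illustration of its sorting step, assuming each transaction is encoded as a bit vector over a fixed item order: sorting by the position of that vector in the reflected binary Gray code tends to put similar transactions near each other (consecutive Gray codes differ in exactly one bit), and each run of at least k transactions could then be generalized separately. The functions are hypothetical, and the partitioning rule (chunks of k, with a short tail merged into the previous chunk) is a simplification.

```python
def gray_rank(code):
    """Position of a bit vector in the reflected binary Gray code sequence."""
    rank = 0
    while code:
        rank ^= code
        code >>= 1
    return rank


def to_bits(transaction, items):
    """Encode a transaction as an integer; items[0] is the most significant bit."""
    n = len(items)
    return sum(1 << (n - 1 - pos) for pos, item in enumerate(items) if item in transaction)


def gray_order_partition(db, items, k):
    """Sort transactions by Gray rank and cut the order into groups of at least k."""
    ordered = sorted(db, key=lambda t: gray_rank(to_bits(t, items)))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()      # merge a short final group into its neighbor
        groups[-1].extend(tail)
    return groups


items = ["a", "b"]
db = [{"a"}, {"b"}, {"a", "b"}, set()]
groups = gray_order_partition(db, items, 2)
```

With two items the Gray order is 00, 01, 11, 10, so the transactions are ordered {}, {b}, {a, b}, {a}, and the groups are [{}, {b}] and [{a, b}, {a}].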

