Published by Beryl Parker. Modified over 9 years ago.
1 Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Institute for the Management of Information Systems (IMIS), RC Athena
Nikos Mamoulis, University of Hong Kong (HKU)
Panos Kalnis, King Abdullah University of Science and Technology (KAUST)
2 Motivation
Attacker can see up to m items
Any m items
No distinction between sensitive and non-sensitive items
Example: Helen buys 0% Milk, Pregnancy test, Beer
3 Motivation (cont.)
Database:
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
….
Mary: Wine, Beer, Full-fat Milk
Published:
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
….
tn: Wine, Beer, Full-fat Milk
Attacker: find all transactions that contain Beer & 0% Milk
Published, with the milk items generalized to Milk:
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
….
tn: Wine, Beer, Milk
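4 k^m-anonymity
Notation: set of items, transaction, database, query terms
k^m-anonymity: any query with up to m items returns either no transaction or at least k transactions from the database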
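The k^m-anonymity condition can be checked by brute force on a small database. The following is a minimal sketch, assuming the definition is read as "every itemset of at most m items that occurs in some transaction occurs in at least k transactions"; the function name `is_km_anonymous` and the sample data layout are illustrative choices, not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Return True if every itemset of size <= m that appears in at least
    one transaction is contained in at least k transactions."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each itemset has one canonical tuple
        for size in range(1, min(m, len(items)) + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Sample data modeled on the Motivation slides (the rows hidden by "…." are omitted).
published = [
    ["Beer", "0% Milk", "Pregnancy test"],
    ["Cola", "Cheese"],
    ["2% Milk", "Coffee"],
    ["Wine", "Beer", "Full-fat Milk"],
]
print(is_km_anonymous(published, k=2, m=2))
```

The enumeration grows quickly with m and with transaction length, so this is only meant to make the definition concrete.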
5 Related Work: k-Anonymity [Swe02]
(a) Microdata (quasi-identifier: Age, ZipCode)
Age    ZipCode      Disease
42     25000        Flu
46     35000        AIDS
50     20000        Cancer
54     40000        Gastritis
48     50000        Dyspepsia
56     55000        Bronchitis
(b) 2-anonymous microdata
Age    ZipCode      Disease
42-46  25000-35000  Flu
42-46  25000-35000  AIDS
50-54  20000-40000  Cancer
50-54  20000-40000  Gastritis
48-56  50000-55000  Dyspepsia
48-56  50000-55000  Bronchitis
NOT suitable for high-dimensional data
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
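The two tables can be compared with a small check. This is a sketch, assuming Age and ZipCode form the quasi-identifier and that k-anonymity means every quasi-identifier combination is shared by at least k rows; the function name `is_k_anonymous` is illustrative.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if each combination of quasi-identifier values
    appears in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(count >= k for count in groups.values())


microdata = [
    {"Age": "42", "ZipCode": "25000", "Disease": "Flu"},
    {"Age": "46", "ZipCode": "35000", "Disease": "AIDS"},
    {"Age": "50", "ZipCode": "20000", "Disease": "Cancer"},
    {"Age": "54", "ZipCode": "40000", "Disease": "Gastritis"},
    {"Age": "48", "ZipCode": "50000", "Disease": "Dyspepsia"},
    {"Age": "56", "ZipCode": "55000", "Disease": "Bronchitis"},
]
generalized = [
    {"Age": "42-46", "ZipCode": "25000-35000", "Disease": "Flu"},
    {"Age": "42-46", "ZipCode": "25000-35000", "Disease": "AIDS"},
    {"Age": "50-54", "ZipCode": "20000-40000", "Disease": "Cancer"},
    {"Age": "50-54", "ZipCode": "20000-40000", "Disease": "Gastritis"},
    {"Age": "48-56", "ZipCode": "50000-55000", "Disease": "Dyspepsia"},
    {"Age": "48-56", "ZipCode": "50000-55000", "Disease": "Bronchitis"},
]
qi = ["Age", "ZipCode"]
print(is_k_anonymous(microdata, qi, 2), is_k_anonymous(generalized, qi, 2))
```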
6 Related Work: L-diversity in Transactions
Requires knowledge of (non)-sensitive attributes
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
7 Our Approach: Employs Generalization
Generalization hierarchy
Information loss
Example parameters: k=2, m=2
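Generalization along a hierarchy can be sketched as a mapping from items to their generalized form, applied to every transaction. The hierarchy below (three milk items under a hypothetical parent "Milk") is an assumption modeled on the Motivation example, since the hierarchy figure itself is not recoverable here. The loss function is only a crude stand-in (share of item occurrences that were generalized), not the information-loss metric used by the authors.

```python
def generalize(transactions, gen_map):
    """Apply a global recoding: replace each item by its generalized form."""
    return [frozenset(gen_map.get(item, item) for item in t) for t in transactions]


def generalized_fraction(transactions, gen_map):
    """Crude loss proxy (assumption): fraction of item occurrences
    that were replaced by a more general item."""
    total = sum(len(t) for t in transactions)
    changed = sum(
        1 for t in transactions for item in t if gen_map.get(item, item) != item
    )
    return changed / total if total else 0.0


# Hypothetical generalization map: all milk variants become "Milk".
gen_map = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}
database = [
    ["Beer", "0% Milk", "Pregnancy test"],
    ["Cola", "Cheese"],
    ["2% Milk", "Coffee"],
    ["Wine", "Beer", "Full-fat Milk"],
]
published = generalize(database, gen_map)
print(sorted(published[0]), generalized_fraction(database, gen_map))
```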
8 Lattice of Generalizations
9 Optimal Algorithm
10 Count Tree
All generalized forms of the paths reside in the tree
We can easily find which anonymizations are needed
11 Apriori-based Anonymization
Global optimal vs. local optimal solution for each path
We examine the paths by size (Apriori principle)
Paths with invalid nodes are skipped
12 Apriori-based Anonymization
1. Initialize gen_map
2. For i := 1 to m do
   1. For all t ∈ D do
      1. Extend t according to gen_map
      2. Add all i-subsets of extended t to count-tree
   2. Check all paths in count-tree and update gen_map
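The loop above can be sketched in Python under several simplifying assumptions. A `Counter` over itemsets stands in for the count tree, and the hierarchy is given as a hypothetical `parent` dictionary (child to parent). Whenever an itemset has fewer than k supporting transactions, the first item that still has a parent is generalized together with everything below that parent. The real method chooses generalizations to keep information loss low; this sketch does not.

```python
from collections import Counter
from itertools import combinations


def descendants_of(node, parent):
    """All items that have `node` somewhere above them in the hierarchy."""
    found = []
    for item in parent:
        cur = item
        while cur in parent:
            cur = parent[cur]
            if cur == node:
                found.append(item)
                break
    return found


def apriori_anonymize(transactions, parent, k, m):
    """Simplified Apriori-style anonymization (a sketch, not the authors' code)."""
    gen_map = {}  # item -> generalized item currently published in its place
    for i in range(1, m + 1):
        while True:
            counts = Counter()  # stand-in for the count tree
            for t in transactions:
                extended = sorted({gen_map.get(x, x) for x in t})
                for combo in combinations(extended, i):
                    counts[combo] += 1
            rare = [c for c, n in counts.items() if n < k]
            # Assumption: take the first generalizable item of the first rare itemset.
            target = next((x for c in rare for x in c if x in parent), None)
            if target is None:
                break  # nothing rare, or no further generalization possible
            top = parent[target]
            for item in descendants_of(top, parent):
                gen_map[item] = top
    anonymized = [frozenset(gen_map.get(x, x) for x in t) for t in transactions]
    return gen_map, anonymized


# Hypothetical hierarchy and data, for illustration only.
parent = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
          "Beer": "Alcohol", "Wine": "Alcohol"}
D = [["0% Milk", "Beer"], ["2% Milk", "Beer"],
     ["Full-fat Milk", "Wine"], ["0% Milk", "Wine"]]
gen_map, anonymized = apriori_anonymize(D, parent, k=2, m=2)
print(gen_map, anonymized)
```

Processing itemsets by increasing size means that generalizations forced by small itemsets are already in place before larger itemsets are counted, which keeps the number of candidate itemsets down. If rare itemsets remain and no item in them can be generalized further, the sketch simply stops.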
13 Small Datasets (2-15K, BMS-WebView2) |I|=40..60, k=100, m=3
14 Small Datasets (BMS-WebView2) |D|=10K, k=100, m=1..4
15 Apriori Anonymization for Large Datasets
k=5, m=3
|D|    |I|
515K   1657
59K    497
77K    3340
Running times marked on the plot: 500 sec, 10 sec, 100 sec
16 Points to Remember
Anonymization of transactional data
  Attacker knows m items
  Any m items can be the quasi-identifier
Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss
Extensions (VLDBJ 2010)
  Local recoding (sort by Gray order and partition)
  Global recoding (by partitioning the data domain)