Published by Hilary Kory Barton. Modified over 9 years ago.
1 Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis (University of Hong Kong)
Panos Kalnis (National University of Singapore)
www.comp.nus.edu.sg/~kalnis
2 Motivation
Attacker can see up to m items
  Any m items
  No distinction between sensitive and non-sensitive items
[Figure: Helen's purchases: Beer, 0% Milk, Pregnancy test]
3 Motivation (cont.)
Database:
  Helen: Beer, 0% Milk, Pregnancy test
  John: Cola, Cheese
  Tom: 2% Milk, Coffee
  ...
  Mary: Wine, Beer, Full-fat Milk
Published:
  t1: Beer, 0% Milk, Pregnancy test
  t2: Cola, Cheese
  t3: 2% Milk, Coffee
  ...
  tn: Wine, Beer, Full-fat Milk
Attacker: find all transactions that contain Beer and 0% Milk
Published, with the milk variants generalized to Milk:
  t1: Beer, Milk, Pregnancy test
  t2: Cola, Cheese
  t3: Milk, Coffee
  ...
  tn: Wine, Beer, Milk
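The attack on this slide can be sketched in a few lines of Python. This is only an illustration built from the rows shown on the slide: the elided rows are left out, and the `GENERALIZE` mapping and the helper names `generalize` and `matches` are assumptions made for the example.

```python
# Toy data copied from the slide; the elided rows are omitted.
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Assumed mapping: every milk variant is replaced by the general item "Milk".
GENERALIZE = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}


def generalize(items):
    """Replace each item by its generalized form, if it has one."""
    return {GENERALIZE.get(item, item) for item in items}


def matches(db, known_items):
    """Ids of the transactions that contain every item the attacker knows."""
    return sorted(tid for tid, items in db.items() if known_items <= items)


# Before generalization, the two known items single out Helen's transaction.
before = matches(published, {"Beer", "0% Milk"})

# After generalization, the attacker's knowledge becomes {Beer, Milk}.
generalized = {tid: generalize(items) for tid, items in published.items()}
after = matches(generalized, generalize({"Beer", "0% Milk"}))

print(before)  # ['t1']
print(after)   # ['t1', 'tn']
```

On the raw data the query returns a single transaction, so Helen is re-identified; on the generalized data it returns two candidates.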
4 k^m-anonymity
Notation introduced on this slide: set of items, transaction, database, query terms
k^m-anonymity: [formal definition not recovered from the slide]
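The formula itself is missing here, but the rest of the deck (an attacker who knows any m items, with k as the anonymity threshold) suggests the following reading: every combination of at most m items that occurs in some transaction must occur in at least k transactions. A minimal brute-force check under that assumed reading is sketched below; the function name `is_km_anonymous` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(db, k, m):
    """True if every itemset of size <= m that occurs in db occurs in >= k transactions.

    Assumed reading of k^m-anonymity; brute force, suitable for toy data only.
    """
    support = Counter()
    for transaction in db:
        items = sorted(set(transaction))
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return all(count >= k for count in support.values())


db = [
    {"Beer", "Milk", "Pregnancy test"},
    {"Cola", "Cheese"},
    {"Milk", "Coffee"},
    {"Wine", "Beer", "Milk"},
]
print(is_km_anonymous(db, k=2, m=2))  # False: e.g. {Cola} occurs only once

small = [{"a", "b"}, {"a", "b"}, {"a"}]
print(is_km_anonymous(small, k=2, m=2))  # True
```

Enumerating all itemsets is exponential in m, which is one reason a dedicated structure for the counts is useful.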
5 Related Work: k-Anonymity [Swe02]
(a) Microdata (quasi-identifier: Age, ZipCode)
  Age  ZipCode  Disease
  42   25000    Flu
  46   35000    AIDS
  50   20000    Cancer
  54   40000    Gastritis
  48   50000    Dyspepsia
  56   55000    Bronchitis
(b) 2-anonymous microdata
  Age    ZipCode      Disease
  42-46  25000-35000  Flu
  42-46  25000-35000  AIDS
  50-54  20000-40000  Cancer
  50-54  20000-40000  Gastritis
  48-56  50000-55000  Dyspepsia
  48-56  50000-55000  Bronchitis
NOT suitable for high-dimensionality
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
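As a quick check of the two tables, the sketch below groups rows by their quasi-identifier values (Age, ZipCode) and verifies that every group has at least k rows. The row values are copied from the slide; the function name `is_k_anonymous` and the column-index convention are assumptions for the example.

```python
from collections import Counter

microdata = [
    (42, 25000, "Flu"),
    (46, 35000, "AIDS"),
    (50, 20000, "Cancer"),
    (54, 40000, "Gastritis"),
    (48, 50000, "Dyspepsia"),
    (56, 55000, "Bronchitis"),
]

generalized = [
    ("42-46", "25000-35000", "Flu"),
    ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"),
    ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"),
    ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(rows, k, qi_columns=(0, 1)):
    """True if every combination of quasi-identifier values appears in >= k rows."""
    groups = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(microdata, 2))    # False: every (Age, ZipCode) pair is unique
print(is_k_anonymous(generalized, 2))  # True: three groups of two rows each
```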
6 Related Work: L-diversity in Transactions
Requires knowledge of (non)-sensitive attributes
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
7 Our Approach: Employs Generalization
[Figure: generalization hierarchy and the information loss of a generalization; example with k=2, m=2]
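Since the hierarchy figure is not available, here is a minimal sketch of how generalization over a hierarchy might be applied. The `PARENT` hierarchy, the notion of a "cut" (the set of hierarchy nodes chosen as the published level) and both function names are assumptions for illustration, not taken from the slide.

```python
# Hypothetical hierarchy (child -> parent); the slide's own figure is not available.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "ALL", "Alcohol": "ALL",
}


def generalize_item(item, cut):
    """Walk up from item until a node in the cut is reached.

    Items with no ancestor in the cut are published unchanged.
    """
    node = item
    while node not in cut and node in PARENT:
        node = PARENT[node]
    return node if node in cut else item


def generalize_db(db, cut):
    """Apply the same item mapping to every transaction."""
    return [frozenset(generalize_item(item, cut) for item in t) for t in db]


db = [
    {"Beer", "0% Milk"},
    {"2% Milk", "Coffee"},
    {"0% Milk", "Full-fat Milk"},
]
result = generalize_db(db, cut={"Milk"})
print(result[2])  # frozenset({'Milk'})
```

The last transaction shrinks from two items to one: distinct items that share an ancestor collapse into it, which is one visible form of the information loss a generalization incurs.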
8 Lattice of Generalizations
9 Count Tree
[Figure: count tree with per-node counts]
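One plausible way to picture a count tree is as a trie over itemsets in a fixed item order, where each node stores how many transactions contain the itemset spelled by its path. The sketch below is an assumption about the structure, not a reconstruction of the slide: it covers only original items (no generalized ones), and `build_count_tree` and `support` are hypothetical names.

```python
from itertools import combinations


def build_count_tree(db, m):
    """Trie over sorted itemsets of size <= m; each node holds the support of its path."""
    root = {"count": 0, "children": {}}
    for transaction in db:
        items = sorted(set(transaction))
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                node = root
                for item in itemset:
                    node = node["children"].setdefault(
                        item, {"count": 0, "children": {}}
                    )
                # Only the node that ends the path is incremented.
                node["count"] += 1
    return root


def support(tree, itemset):
    """Number of transactions containing itemset (0 if the path is absent)."""
    node = tree
    for item in sorted(itemset):
        node = node["children"].get(item)
        if node is None:
            return 0
    return node["count"]


db = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
tree = build_count_tree(db, m=2)
print(support(tree, {"a"}))       # 3
print(support(tree, {"a", "b"}))  # 2
print(support(tree, {"b", "c"}))  # 1
```

With such a tree, checking the anonymity condition amounts to looking for nodes whose count is below k.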
10 Optimal Algorithm
[Figure: three successive states of Q]
11 “Direct” Anonymization
COUNT({a1, a2}) = 1
Solves each “problem” independently
12 “Apriori-based” Anonymization
Construct the count-tree incrementally
Prune unnecessary branches
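The incremental idea can be illustrated with a deliberately simplified loop: handle itemsets of size 1 first, then size 2, and so on up to m, generalizing whenever an itemset is rarer than k. This is a sketch under strong assumptions and not the authors' algorithm: the `PARENT` hierarchy and all function names are hypothetical, counts are recomputed from scratch instead of being kept in a pruned count tree, and the item to generalize is simply the first one that has a parent rather than the one that minimizes information loss.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-level hierarchy (child -> parent).
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def apply_mapping(db, mapping):
    """Publish each transaction with items replaced by their current generalization."""
    return [frozenset(mapping.get(item, item) for item in t) for t in db]


def lift(mapping, universe, node):
    """Global recoding: map every original item below PARENT[node] to that parent."""
    parent = PARENT[node]
    for item in universe:
        cur = item
        while cur != parent and cur in PARENT:
            cur = PARENT[cur]
        if cur == parent:
            mapping[item] = parent


def apriori_style_anonymize(db, k, m):
    universe = set().union(*db)
    mapping = {}
    for size in range(1, m + 1):  # itemset sizes are handled incrementally
        while True:
            current = apply_mapping(db, mapping)
            support = Counter(
                itemset
                for t in current
                for itemset in combinations(sorted(t), size)
            )
            rare = sorted(s for s, count in support.items() if count < k)
            if not rare:
                break
            liftable = [item for item in rare[0] if item in PARENT]
            if not liftable:
                raise ValueError(f"cannot generalize {rare[0]} any further")
            lift(mapping, universe, liftable[0])
    return apply_mapping(db, mapping), mapping


db = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b1"}]
published, mapping = apriori_style_anonymize(db, k=2, m=2)
print(mapping == {"b1": "B", "b2": "B"})  # True
```

Processing sizes in increasing order is sound here because generalizing items can only merge itemsets, so the support of smaller itemsets that were already frequent enough never drops below k.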
13 Small Datasets (2-15K, BMS-WebView2)
|I|=40..60, k=100, m=3
14 Small Datasets (BMS-WebView2)
|D|=10K, k=100, m=1..4
15 Apriori Anonymization for Large Datasets
k=5, m=3
  |D|   |I|
  515K  1657
  59K   497
  77K   3340
[Figure: running times; values shown: 500 sec, 10 sec, 100 sec]
16 Points to Remember
Anonymization of Transactional Data
  Attacker knows m items
  Any m items can be the quasi-identifier
Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss
On-going work
  Local recoding (sort by Gray order and partition)
  Transactional data in streaming environments
17 Bibliography on LBS Privacy
http://anonym.comp.nus.edu.sg