Published by Rodney Hoover. Modified over 8 years ago.
1
Hybrid l-Diversity*
Mehmet Ercan Nergiz, Muhammed Zahit Gök, Ufuk Özkanlı
mehmet.nergiz@zirve.edu.tr, muhammed.gok@std.zirve.edu.tr, ufuk.ozkanli@zirve.edu.tr
*Project supported by TUBITAK
2
Data Privacy
Hospitals, companies, and other organizations collect personal information.
Such data is highly valuable and needs to be published for research and analysis.
Because it contains sensitive information, it must be properly anonymized before release.
3
Privacy Violation
Private dataset (quasi-identifiers: Age, Gender, Nation):
Age  Gender  Nation    Disease
16   F       Bulgaria  Fever
17   M       Turkey    AIDS
23   F       US        Cancer
25   M       Canada    Flu
Adversary knowledge:
Name  Age  Gender  Nation
Obi   17   M       Turkey
5
Generalization
We say a table T* is a generalization of table T if and only if T*[i][j] is some generalization of T[i][j] for every quasi-identifier attribute i and every tuple j.
Table T:
Age  Gender  Nation    Disease
17   M       Turkey    AIDS
16   F       Bulgaria  Fever
23   F       US        Cancer
25   M       Canada    Flu
Table T*:
Age    Gender  Nation     Disease
11-20  M       E.Europe   AIDS
11-20  F       E.Europe   Fever
21-30  *       N.America  Cancer
21-30  *       N.America  Flu
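The definition above can be checked cell by cell. The following is a minimal sketch, assuming each value has a single parent in a value hierarchy; the `PARENT` map and the function names are hypothetical and only cover the values of this example.

```python
# Hypothetical value hierarchy: each value maps to its parent (one level up).
PARENT = {
    "16": "11-20", "17": "11-20", "23": "21-30", "25": "21-30",
    "11-20": "*", "21-30": "*",
    "M": "*", "F": "*",
    "Turkey": "E.Europe", "Bulgaria": "E.Europe",
    "US": "N.America", "Canada": "N.America",
    "E.Europe": "Europe", "N.America": "America",
    "Europe": "*", "America": "*",
}

def is_generalization_of(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    value = specific
    while value != general:
        if value not in PARENT:  # reached the root without finding `general`
            return False
        value = PARENT[value]
    return True

def table_generalizes(t_star, t, qi_columns):
    """Check every quasi-identifier cell of t_star against the same cell of t."""
    return len(t_star) == len(t) and all(
        is_generalization_of(row_star[j], row[j])
        for row_star, row in zip(t_star, t)
        for j in qi_columns
    )

T = [
    ("17", "M", "Turkey", "AIDS"),
    ("16", "F", "Bulgaria", "Fever"),
    ("23", "F", "US", "Cancer"),
    ("25", "M", "Canada", "Flu"),
]
T_STAR = [
    ("11-20", "M", "E.Europe", "AIDS"),
    ("11-20", "F", "E.Europe", "Fever"),
    ("21-30", "*", "N.America", "Cancer"),
    ("21-30", "*", "N.America", "Flu"),
]
QI = (0, 1, 2)  # Age, Gender, Nation

print(table_generalizes(T_STAR, T, QI))  # True
```

The check is not symmetric: T is not a generalization of T*, because a leaf value such as 17 is never an ancestor of 11-20.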
6
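Single Dimensional Generalization
Value generalization hierarchies (level 0 is the original value):
Age:     16, 17 -> 11-20;  23, 25 -> 21-30;  11-20, 21-30 -> *
Gender:  M, F -> *
Nation:  Brazil, Peru -> S. America;  Canada, USA -> N. America;  France, Spain -> W. Europe;  Bulgaria, Turkey -> E. Europe;  S. America, N. America -> America;  W. Europe, E. Europe -> Europe;  America, Europe -> *
The vector μ = [1,0,1] gives the generalization level applied to Age, Gender, and Nation.
Private dataset T:
Age  Gender  Nation    Disease
17   M       Turkey    AIDS
16   F       Bulgaria  Fever
23   F       US        Cancer
25   F       Canada    Flu
Generalization T*μ, μ = [1,0,1]:
Age    Gender  Nation     Disease
11-20  M       E.Europe   AIDS
11-20  F       E.Europe   Fever
21-30  F       N.America  Cancer
21-30  F       N.America  Flu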
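Applying a level vector μ to a table can be sketched as a lookup per attribute. This is a minimal sketch, not the authors' implementation: `HIERARCHY` and `generalize` are hypothetical names, and the hierarchy only lists the values that occur in the example table.

```python
# For each attribute, map an original value to its generalization at
# level 0, 1, 2, ... (hypothetical encoding of the hierarchies on the slide).
HIERARCHY = {
    "Age": {
        16: ["16", "11-20", "*"],
        17: ["17", "11-20", "*"],
        23: ["23", "21-30", "*"],
        25: ["25", "21-30", "*"],
    },
    "Gender": {"M": ["M", "*"], "F": ["F", "*"]},
    "Nation": {
        "Turkey": ["Turkey", "E.Europe", "Europe", "*"],
        "Bulgaria": ["Bulgaria", "E.Europe", "Europe", "*"],
        "US": ["US", "N.America", "America", "*"],
        "Canada": ["Canada", "N.America", "America", "*"],
    },
}
ATTRIBUTES = ("Age", "Gender", "Nation")

def generalize(table, mu):
    """Generalize every value of attribute k to level mu[k]; keep the sensitive value."""
    result = []
    for row in table:
        qi = tuple(
            HIERARCHY[attr][row[k]][level]
            for k, (attr, level) in enumerate(zip(ATTRIBUTES, mu))
        )
        result.append(qi + row[len(ATTRIBUTES):])
    return result

T = [
    (17, "M", "Turkey", "AIDS"),
    (16, "F", "Bulgaria", "Fever"),
    (23, "F", "US", "Cancer"),
    (25, "F", "Canada", "Flu"),
]

for row in generalize(T, [1, 0, 1]):
    print(row)
```

Because the same level is used for every value of an attribute, all tuples are generalized uniformly, which is what makes the generalization single dimensional.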
7
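Privacy Violation
Adversary knowledge:
Name  Age  Gender  Nation
Obi   17   M       Turkey
Generalization T*μ, μ = [1,0,1]:
Age    Gender  Nation     Disease
11-20  M       E.Europe   AIDS
11-20  F       E.Europe   Fever
21-30  F       N.America  Cancer
21-30  F       N.America  Flu
Only the first tuple matches Obi's quasi-identifiers, so the adversary learns that Obi has AIDS.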
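The linking step on this slide can be sketched in a few lines. This is an illustrative sketch only: `COVERS` is a hypothetical lookup listing, for each value the adversary knows, every generalized value that could stand for it.

```python
# For each known value: the set of released values that are compatible with it
# (the value itself and all of its ancestors). Hypothetical helper data.
COVERS = {
    17: {"17", "11-20", "*"},
    "M": {"M", "*"},
    "Turkey": {"Turkey", "E.Europe", "Europe", "*"},
}

# Released table for mu = [1,0,1], as shown on the slide.
T_STAR = [
    ("11-20", "M", "E.Europe", "AIDS"),
    ("11-20", "F", "E.Europe", "Fever"),
    ("21-30", "F", "N.America", "Cancer"),
    ("21-30", "F", "N.America", "Flu"),
]

def candidate_diseases(known_qi, released):
    """Diseases of all released tuples compatible with the known quasi-identifiers."""
    return [
        row[-1]
        for row in released
        if all(cell in COVERS[known] for known, cell in zip(known_qi, row))
    ]

print(candidate_diseases((17, "M", "Turkey"), T_STAR))  # ['AIDS']
```

A single candidate means the sensitive value is disclosed even though the table was generalized.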
8
l-diversity
An equality group is said to satisfy l-diversity if the probability that any tuple in this group is linked to a sensitive value is at most 1/l.
A table satisfies l-diversity if each equality group in the table is l-diverse.
Private table:
Name   Age  Gender  Nation    Disease
Obi    17   M       Turkey    AIDS
Leila  16   F       Bulgaria  Fever
Padme  23   F       US        Cancer
Yoda   25   F       Canada    Flu
2-diverse generalization:
Age    Gender  Nation     Disease
11-20  *       Europe     AIDS
11-20  *       Europe     Fever
21-30  *       N.America  Cancer
21-30  *       N.America  Flu
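The l-diversity condition stated above can be tested directly: in every equality group, the most frequent sensitive value may account for at most a 1/l fraction of the tuples. A minimal sketch, with the hypothetical function name `is_l_diverse` and tuples laid out as quasi-identifiers followed by the sensitive value:

```python
from collections import Counter, defaultdict

def is_l_diverse(table, l, n_qi):
    """True if, in every equality group, no sensitive value exceeds a 1/l share."""
    groups = defaultdict(list)
    for row in table:
        groups[row[:n_qi]].append(row[n_qi])  # group by quasi-identifier values
    return all(
        # integer form of: max_count / group_size <= 1 / l
        max(Counter(values).values()) * l <= len(values)
        for values in groups.values()
    )

TWO_DIVERSE = [
    ("11-20", "*", "Europe", "AIDS"),
    ("11-20", "*", "Europe", "Fever"),
    ("21-30", "*", "N.America", "Cancer"),
    ("21-30", "*", "N.America", "Flu"),
]
NOT_DIVERSE = [
    ("11-20", "M", "E.Europe", "AIDS"),
    ("11-20", "F", "E.Europe", "Fever"),
    ("21-30", "F", "N.America", "Cancer"),
    ("21-30", "F", "N.America", "Flu"),
]

print(is_l_diverse(TWO_DIVERSE, 2, 3))  # True
print(is_l_diverse(NOT_DIVERSE, 2, 3))  # False
```

The second table fails because two of its equality groups contain a single tuple, so the sensitive value in each is linked with probability 1.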
9
The problem
Find a single-dimensional generalization that satisfies l-diversity and minimizes information loss (maximizes utility).
Algorithms based on heuristics have been proposed.
Common weakness: outliers.
10
Single Dimensional Algorithm
Incognito (LeFevre et al., 2005)
T:
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  23   Spain   47703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
T*μ, μ=[2,3,2] (2-diversity):
Id  Age    Nation  Zip  Dis
q1  11-30  EU      47*  AIDS
q2  11-30  EU      47*  Fever
q3  11-30  EU      47*  Flu
q4  11-30  EU      47*  Fever
q5  31-50  AM      49*  Flu
q6  31-50  AM      49*  AIDS
q7  31-50  AM      49*  Cancer
q8  31-50  AM      49*  Flu
q9  31-50  AM      49*  Fever
11
Outliers
T:
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  23   Spain   47703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
T*μ, μ=[2,2,2]:
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  11-30  W. EU   47*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
At this less generalized level, q4 is alone in its equality group, so the table is not 2-diverse.
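The outlier on this slide can be located mechanically by listing the equality groups that break l-diversity. The sketch below works on the generalized table for μ=[2,2,2] as shown on the slide; `violating_groups` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Generalized table for mu = [2,2,2]: (Id, Age, Nation, Zip, Dis).
T_STAR = [
    ("q1", "11-30", "E.EU", "47*", "AIDS"),
    ("q2", "11-30", "E.EU", "47*", "Fever"),
    ("q3", "11-30", "E.EU", "47*", "Flu"),
    ("q4", "11-30", "W. EU", "47*", "Fever"),
    ("q5", "31-50", "S. AM", "49*", "Flu"),
    ("q6", "31-50", "S. AM", "49*", "AIDS"),
    ("q7", "31-50", "N. AM", "49*", "Cancer"),
    ("q8", "31-50", "N. AM", "49*", "Flu"),
    ("q9", "31-50", "N. AM", "49*", "Fever"),
]

def violating_groups(table, l):
    """Map each equality group that is not l-diverse to the ids of its tuples."""
    groups = defaultdict(list)
    for rid, *qi, disease in table:
        groups[tuple(qi)].append((rid, disease))
    bad = {}
    for qi, members in groups.items():
        most_common = max(Counter(d for _, d in members).values())
        if most_common * l > len(members):  # share of one disease exceeds 1/l
            bad[qi] = [rid for rid, _ in members]
    return bad

print(violating_groups(T_STAR, 2))  # {('11-30', 'W. EU', '47*'): ['q4']}
```

A single tuple forces the whole table to a coarser generalization level, which is the weakness the slide points out.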
12
Our Contribution
We introduce a hybrid anonymization technique that releases a dataset satisfying l-diversity, improves the utility of the released dataset (minimizes information loss), and handles outliers.
The novelty of our approach is that it integrates a data relocation mechanism with generalization.
13
Data Relocation
T:
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  23   Spain   47703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
T*μ, μ=[2,2,2]:
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  11-30  W. EU   47*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
14
Data Relocation
T:
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  23   Spain   47703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
Tr (q4 relocated):
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  34   Brazil  49703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
15
Table Relocation
Tr:
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  34   Brazil  49703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
Tr*μ, μ=[2,2,2]:
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  31-50  S. AM   49*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
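The relocation shown on these slides replaces the quasi-identifier values of q4 and then generalizes the relocated table at μ=[2,2,2]. The sketch below reproduces that step under stated assumptions: `relocate`, `generalize`, and `REGION` are hypothetical names, ages are assumed to lie between 11 and 50, and the level-2 generalizations are hard-coded from the values visible on the slides.

```python
# Private table T: (Id, Age, Nation, Zip, Dis).
T = [
    ("q1", 12, "Greece", "47906", "AIDS"),
    ("q2", 19, "Turkey", "47906", "Fever"),
    ("q3", 17, "Greece", "47907", "Flu"),
    ("q4", 23, "Spain", "47703", "Fever"),
    ("q5", 38, "Brazil", "49705", "Flu"),
    ("q6", 33, "Peru", "49812", "AIDS"),
    ("q7", 41, "USA", "49001", "Cancer"),
    ("q8", 43, "Canada", "49001", "Flu"),
    ("q9", 48, "Canada", "49001", "Fever"),
]

# Nation generalization used by mu = [2,2,2], read off the slides.
REGION = {
    "Greece": "E.EU", "Turkey": "E.EU", "Spain": "W. EU",
    "Brazil": "S. AM", "Peru": "S. AM", "USA": "N. AM", "Canada": "N. AM",
}

def relocate(table, rid, new_qi):
    """Copy of `table` in which tuple `rid` gets new quasi-identifier values."""
    return [
        (row[0], *new_qi, row[-1]) if row[0] == rid else row
        for row in table
    ]

def generalize(row):
    """Generalize one tuple to the level shown for mu = [2,2,2]."""
    rid, age, nation, zip_code, disease = row
    age_range = "11-30" if age <= 30 else "31-50"
    return (rid, age_range, REGION[nation], zip_code[:2] + "*", disease)

T_R = relocate(T, "q4", (34, "Brazil", "49703"))
T_R_STAR = [generalize(row) for row in T_R]

print(T_R_STAR[3])  # ('q4', '31-50', 'S. AM', '49*', 'Fever')
```

After the relocation, q4 shares an equality group with q5 and q6, and the sensitive value of q4 is unchanged; only its quasi-identifiers are no longer truthful.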
16
Hybrid Generalization
Tr (table relocation):
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  34   Brazil  49703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
Tr*μ, μ=[2,2,2] (hybrid generalization):
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  31-50  S. AM   49*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
17
Hybrid Generalization
Tr (p%-table relocation):
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  34   Brazil  49703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
Tr*μ, μ=[2,2,2] (p%-hybrid generalization):
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  31-50  S. AM   49*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
18
Hybrid Anonymization
Tr (p%-table relocation):
Id  Age  Nation  Zip    Dis
q1  12   Greece  47906  AIDS
q2  19   Turkey  47906  Fever
q3  17   Greece  47907  Flu
q4  34   Brazil  49703  Fever
q5  38   Brazil  49705  Flu
q6  33   Peru    49812  AIDS
q7  41   USA     49001  Cancer
q8  43   Canada  49001  Flu
q9  48   Canada  49001  Fever
Tr*μ, μ=[2,2,2] (hybrid anonymization, 2-diversity):
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  31-50  S. AM   49*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
19
The problem Given a private Table T, find a hybrid anonymization of T which satisfies l-diversity and minimizes information loss (maximizes utility)
20
Hybrid Anonymizations
We propose solutions in different adversarial models.
1. Classical Adversary
21
Hybrid Algorithm
Parameters: distortion limit (p) = 50%, l = 2
T*μ, μ=[2,2,2]:
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  11-30  W. EU   47*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
22
Hybrid Algorithm
Parameters: distortion limit (p) = 50%, l = 2
q4  11-30  W. EU  47*  14K
T*μ, μ=[2,2,2]:
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  11-30  W. EU   47*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
23
Hybrid Algorithm
Parameters: distortion limit (p) = 50%, l = 2
q4  11-30  W. EU  47*  14K
Tr*μ, μ=[2,2,2] (2-diversity):
Id  Age    Nation  Zip  Dis
q1  11-30  E.EU    47*  AIDS
q2  11-30  E.EU    47*  Fever
q3  11-30  E.EU    47*  Flu
q4  11-30  E.EU    47*  Fever
q5  31-50  S. AM   49*  Flu
q6  31-50  S. AM   49*  AIDS
q7  31-50  N. AM   49*  Cancer
q8  31-50  N. AM   49*  Flu
q9  31-50  N. AM   49*  Fever
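The slides show only the input and output of this step, so the following is a rough sketch of one way such a relocation pass could work, not the authors' algorithm. It assumes the distortion limit p caps the fraction of tuples that may be relocated, and it moves each outlier into the l-diverse group that shares the most generalized quasi-identifier values with it. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def is_diverse(members, l):
    """members: list of (id, sensitive). True if no value exceeds a 1/l share."""
    counts = Counter(s for _, s in members)
    return max(counts.values()) * l <= len(members)

def relocate_outliers(table, l, p):
    """Move tuples of non-l-diverse groups into the closest l-diverse group.

    Assumption: at most int(p * len(table)) tuples may be relocated.
    """
    budget = int(p * len(table))
    groups = defaultdict(list)
    for rid, *qi, sensitive in table:
        groups[tuple(qi)].append((rid, sensitive))
    bad = [qi for qi, m in groups.items() if not is_diverse(m, l)]
    good = [qi for qi in groups if qi not in bad]
    moves = {}
    for qi in bad:
        for rid, sensitive in groups[qi]:
            if len(moves) >= budget:
                break
            # Only consider targets that stay l-diverse after receiving the tuple.
            candidates = [
                g for g in good if is_diverse(groups[g] + [(rid, sensitive)], l)
            ]
            if not candidates:
                continue
            # Closeness heuristic (assumption): number of shared generalized values.
            target = max(candidates, key=lambda g: sum(a == b for a, b in zip(g, qi)))
            groups[target].append((rid, sensitive))
            moves[rid] = target
    return [(rid, *moves.get(rid, tuple(qi)), s) for rid, *qi, s in table]

T_STAR = [
    ("q1", "11-30", "E.EU", "47*", "AIDS"),
    ("q2", "11-30", "E.EU", "47*", "Fever"),
    ("q3", "11-30", "E.EU", "47*", "Flu"),
    ("q4", "11-30", "W. EU", "47*", "Fever"),
    ("q5", "31-50", "S. AM", "49*", "Flu"),
    ("q6", "31-50", "S. AM", "49*", "AIDS"),
    ("q7", "31-50", "N. AM", "49*", "Cancer"),
    ("q8", "31-50", "N. AM", "49*", "Flu"),
    ("q9", "31-50", "N. AM", "49*", "Fever"),
]

result = relocate_outliers(T_STAR, l=2, p=0.5)
print(result[3])  # ('q4', '11-30', 'E.EU', '47*', 'Fever')
```

With these parameters the sketch relocates only q4, into the E.EU group, which matches the table on the slide; a real implementation would also need a proper distance measure and a search over generalization levels.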
24
Hybrid Anonymizations
We propose solutions in different adversarial models.
1. Classical Adversary
2. Statistical Adversary
25
Hybrid Anonymizations
We propose solutions in different adversarial models.
1. Classical Adversary
2. Statistical Adversary (Pearson's chi-square test)
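The slide names Pearson's chi-square test without saying what it is applied to. For reference, the statistic is the sum over categories of (observed - expected)^2 / expected. The sketch below computes it for made-up counts; the numbers and the choice of what to compare are illustrative assumptions, not values from the presentation.

```python
def pearson_chi_square(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative counts only (not from the slides): three categories,
# 60 observations, uniform expectation.
observed = [10, 20, 30]
expected = [20, 20, 20]

statistic = pearson_chi_square(observed, expected)
print(statistic)  # 10.0
```

With three categories there are 2 degrees of freedom, and the 5% critical value of the chi-square distribution is about 5.991, so a statistic of 10.0 would lead a tester to reject the hypothesis that the observed counts follow the expected distribution.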
26
Hybrid Anonymizations
We propose solutions in different adversarial models.
1. Classical Adversary
2. Statistical Adversary
3. Algorithm-aware Adversary
27
Experiments
In the utility cost experiments, all hybrid approaches give better results than the Incognito algorithm in nearly all cases.
In the classical adversary setting, with only 1% relocations, our algorithm creates hybrid anonymizations with 70% better utility than Incognito.
The hybrid and randomized hybrid algorithms perform better than the statistical hybrid algorithms.
28
Conclusion
We introduced hybrid generalizations, in which a limited number of data elements can be relocated.
We showed that relocations can potentially increase the utility of anonymizations at the cost of truthfulness.
As possible future work, new hybrid algorithms can be designed for other privacy metrics such as δ-presence or (α,k)-anonymity.
29
Questions? Thank You