Hybrid l-Diversity* Mehmet Ercan Nergiz, Muhammed Zahit Gök, Ufuk Özkanlı


1 Hybrid l-Diversity* Mehmet Ercan Nergiz (mehmet.nergiz@zirve.edu.tr), Muhammed Zahit Gök (muhammed.gok@std.zirve.edu.tr), Ufuk Özkanlı (ufuk.ozkanli@zirve.edu.tr) *Project supported by TUBITAK

2 Data Privacy: Personal information is collected by hospitals, companies, etc. The potential value of such data is great, so it needs to be published for research and analysis. Such data contains sensitive information, so it needs to be anonymized properly.

3 Privacy Violation
Private Dataset (quasi-identifiers: Age, Gender, Nation)
Age | Gender | Nation | Disease
16 | F | Bulgaria | Fever
17 | M | Turkey | AIDS
23 | F | US | Cancer
25 | M | Canada | Flu
Adversary knowledge
Name | Age | Gender | Nation
Obi | 17 | M | Turkey

4 Privacy Violation (same tables as slide 3): matching the adversary's record (17, M, Turkey) against the quasi-identifiers of the private dataset singles out one row, which links Obi to AIDS.
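The linking step on slides 3 and 4 can be sketched as a join on the quasi-identifiers. This is only an illustration: the function name `link` and the tuple layout are assumptions, not something the slides define.

```python
# Private dataset from slide 3: (Age, Gender, Nation, Disease).
PRIVATE = [
    (16, "F", "Bulgaria", "Fever"),
    (17, "M", "Turkey", "AIDS"),
    (23, "F", "US", "Cancer"),
    (25, "M", "Canada", "Flu"),
]
# Adversary knowledge: (Name, Age, Gender, Nation).
ADVERSARY = [("Obi", 17, "M", "Turkey")]


def link(adversary, private):
    """Map each known name to the sensitive values of all rows that
    share its quasi-identifiers (hypothetical helper)."""
    result = {}
    for name, *qi in adversary:
        result[name] = [row[-1] for row in private if list(row[:-1]) == qi]
    return result


print(link(ADVERSARY, PRIVATE))  # {'Obi': ['AIDS']}
```

A single matching row means the adversary learns the sensitive value with certainty, which is the violation the following slides set out to prevent.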

5 Generalization: We say a table T* is a generalization of table T if and only if T*[i][j] is some generalization of T[i][j] for every quasi-identifier attribute i and every row j.
Table T
Age | Gender | Nation | Disease
17 | M | Turkey | AIDS
16 | F | Bulgaria | Fever
23 | F | US | Cancer
25 | M | Canada | Flu
Table T*
Age | Gender | Nation | Disease
11-20 | M | E.Europe | AIDS
11-20 | F | E.Europe | Fever
21-30 | * | N.America | Cancer
21-30 | * | N.America | Flu

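6 Single Dimensional Generalization
Private Dataset T
Age | Gender | Nation | Disease
17 | M | Turkey | AIDS
16 | F | Bulgaria | Fever
23 | F | US | Cancer
25 | F | Canada | Flu
Generalization hierarchies (level 0 is the raw value)
Age: 16, 17 -> 11-20; 23, 25 -> 21-30; 11-20, 21-30 -> *
Gender: M, F -> *
Nation: Bulgaria, Turkey -> E. Europe; France, Spain -> W. Europe; Canada, USA -> N. America; Brazil, Peru -> S. America; E. Europe, W. Europe -> Europe; N. America, S. America -> America; Europe, America -> *
Generalization T*μ with μ = [1,0,1]
Age | Gender | Nation | Disease
11-20 | M | E.Europe | AIDS
11-20 | F | E.Europe | Fever
21-30 | F | N.America | Cancer
21-30 | F | N.America | Flu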
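A minimal sketch of how a level vector μ could be applied to the table on slide 6, assuming μ lists the hierarchy levels for Age, Gender, and Nation in that order. The function names and the dictionary encoding of the Nation hierarchy are illustrative assumptions.

```python
# Parent links of the Nation hierarchy read off the slide ("US" follows the
# spelling used in the table). This encoding is an assumption.
NATION_PARENT = {
    "Turkey": "E.Europe", "Bulgaria": "E.Europe",
    "France": "W.Europe", "Spain": "W.Europe",
    "US": "N.America", "Canada": "N.America",
    "Brazil": "S.America", "Peru": "S.America",
    "E.Europe": "Europe", "W.Europe": "Europe",
    "N.America": "America", "S.America": "America",
    "Europe": "*", "America": "*",
}


def generalize_age(age, level):
    """Level 0: raw age, level 1: ten-year range such as 11-20, else '*'."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age - 1) // 10 * 10 + 1
        return f"{low}-{low + 9}"
    return "*"


def generalize_gender(gender, level):
    """Level 0 keeps the value, any higher level suppresses it."""
    return gender if level == 0 else "*"


def generalize_nation(nation, level):
    """Climb `level` steps up the hierarchy, stopping at '*'."""
    for _ in range(level):
        nation = NATION_PARENT.get(nation, "*")
    return nation


def generalize_table(rows, mu):
    """Apply one level per attribute to every row (single-dimensional)."""
    age_l, gender_l, nation_l = mu
    return [
        (generalize_age(a, age_l), generalize_gender(g, gender_l),
         generalize_nation(n, nation_l), disease)
        for a, g, n, disease in rows
    ]


T = [(17, "M", "Turkey", "AIDS"), (16, "F", "Bulgaria", "Fever"),
     (23, "F", "US", "Cancer"), (25, "F", "Canada", "Flu")]

for row in generalize_table(T, (1, 0, 1)):
    print(row)
```

Because every value of an attribute is lifted to the same level, one outlying tuple can force the whole column to a coarser level.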

7 Privacy Violation
Adversary knowledge
Name | Age | Gender | Nation
Obi | 17 | M | Turkey
Generalization T*μ with μ = [1,0,1]
Age | Gender | Nation | Disease
11-20 | M | E.Europe | AIDS
11-20 | F | E.Europe | Fever
21-30 | F | N.America | Cancer
21-30 | F | N.America | Flu

8 l-diversity: An equality group is said to satisfy l-diversity if the probability that any tuple in this group is linked to a sensitive value is at most 1/l. A table satisfies l-diversity if each equality group in the table is l-diverse.
PrivateTable
Name | Age | Gender | Nation | Disease
Obi | 17 | M | Turkey | AIDS
Leila | 16 | F | Bulgaria | Fever
Padme | 23 | F | US | Cancer
Yoda | 25 | F | Canada | Flu
2-diversity
Age | Gender | Nation | Disease
11-20 | * | Europe | AIDS
11-20 | * | Europe | Fever
21-30 | * | N.America | Cancer
21-30 | * | N.America | Flu
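The definition on slide 8 can be checked mechanically: within each equality group, the most frequent sensitive value may account for at most a 1/l share of the tuples. The sketch below assumes rows are tuples whose first `qi_count` fields are the quasi-identifiers; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def is_l_diverse(rows, l, qi_count=3):
    """True if, in every equality group, each sensitive value occurs in at
    most a 1/l fraction of the group's tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[:qi_count])].append(row[qi_count])
    return all(
        max(Counter(values).values()) * l <= len(values)
        for values in groups.values()
    )


# 2-diverse table from slide 8.
DIVERSE = [
    ("11-20", "*", "Europe", "AIDS"),
    ("11-20", "*", "Europe", "Fever"),
    ("21-30", "*", "N.America", "Cancer"),
    ("21-30", "*", "N.America", "Flu"),
]
# Generalization with mu = [1,0,1] from slide 7.
LEAKY = [
    ("11-20", "M", "E.Europe", "AIDS"),
    ("11-20", "F", "E.Europe", "Fever"),
    ("21-30", "F", "N.America", "Cancer"),
    ("21-30", "F", "N.America", "Flu"),
]

print(is_l_diverse(DIVERSE, 2))  # True
print(is_l_diverse(LEAKY, 2))    # False
```

The second table fails because the group (11-20, M, E.Europe) holds a single tuple, so its disease is disclosed with probability 1.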

9 The problem: Find a single dimensional generalization which satisfies l-diversity and minimizes information loss (maximizes utility). Algorithms based on heuristics have been proposed. Common weakness: outliers.

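10 Single Dimensional Algorithm: Incognito (LeFevre et al., 2005)
T
Id | Age | Nation | Zip | Dis
q1 | 12 | Greece | 47906 | AIDS
q2 | 19 | Turkey | 47906 | Fever
q3 | 17 | Greece | 47907 | Flu
q4 | 23 | Spain | 47703 | Fever
q5 | 38 | Brazil | 49705 | Flu
q6 | 33 | Peru | 49812 | AIDS
q7 | 41 | USA | 49001 | Cancer
q8 | 43 | Canada | 49001 | Flu
q9 | 48 | Canada | 49001 | Fever
T*μ with μ=[2,3,2] (2-diversity)
Id | Age | Nation | Zip | Dis
q1 | 11-30 | EU | 47* | AIDS
q2 | 11-30 | EU | 47* | Fever
q3 | 11-30 | EU | 47* | Flu
q4 | 11-30 | EU | 47* | Fever
q5 | 31-50 | AM | 49* | Flu
q6 | 31-50 | AM | 49* | AIDS
q7 | 31-50 | AM | 49* | Cancer
q8 | 31-50 | AM | 49* | Flu
q9 | 31-50 | AM | 49* | Fever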

11 Outliers
T: same private table as on slide 10.
T*μ with μ=[2,2,2]
Id | Age | Nation | Zip | Dis
q1 | 11-30 | E.EU | 47* | AIDS
q2 | 11-30 | E.EU | 47* | Fever
q3 | 11-30 | E.EU | 47* | Flu
q4 | 11-30 | W. EU | 47* | Fever
q5 | 31-50 | S. AM | 49* | Flu
q6 | 31-50 | S. AM | 49* | AIDS
q7 | 31-50 | N. AM | 49* | Cancer
q8 | 31-50 | N. AM | 49* | Flu
q9 | 31-50 | N. AM | 49* | Fever

12 Our Contribution: We introduce a hybrid anonymization technique which releases a dataset that satisfies l-diversity, improves the utility of the released dataset (minimizes information loss), and handles outliers. Our approach is novel in integrating a data relocation mechanism with generalizations.

13 Data Relocation: T and its generalization T*μ with μ=[2,2,2], as on slide 11. Tuple q4 (11-30, W. EU, 47*, Fever) is the only tuple in its equality group.

14 Data Relocation
T: same private table as on slide 13, where q4 is (23, Spain, 47703, Fever).
T^r (q4 relocated)
Id | Age | Nation | Zip | Dis
q1 | 12 | Greece | 47906 | AIDS
q2 | 19 | Turkey | 47906 | Fever
q3 | 17 | Greece | 47907 | Flu
q4 | 34 | Brazil | 49703 | Fever
q5 | 38 | Brazil | 49705 | Flu
q6 | 33 | Peru | 49812 | AIDS
q7 | 41 | USA | 49001 | Cancer
q8 | 43 | Canada | 49001 | Flu
q9 | 48 | Canada | 49001 | Fever

15 Table Relocation
T^r: the relocated table from slide 14 (q4 is 34, Brazil, 49703, Fever).
T^r*μ with μ=[2,2,2]
Id | Age | Nation | Zip | Dis
q1 | 11-30 | E.EU | 47* | AIDS
q2 | 11-30 | E.EU | 47* | Fever
q3 | 11-30 | E.EU | 47* | Flu
q4 | 31-50 | S. AM | 49* | Fever
q5 | 31-50 | S. AM | 49* | Flu
q6 | 31-50 | S. AM | 49* | AIDS
q7 | 31-50 | N. AM | 49* | Cancer
q8 | 31-50 | N. AM | 49* | Flu
q9 | 31-50 | N. AM | 49* | Fever

16 Hybrid Generalization: T^r is a table relocation of T, and its generalization T^r*μ (μ=[2,2,2]) is a hybrid generalization of T. Tables T^r and T^r*μ are the same as on slide 15.

17 Hybrid Generalization: T^r is a p%-table relocation of T, and T^r*μ (μ=[2,2,2]) is a p%-hybrid generalization of T. Tables T^r and T^r*μ are the same as on slide 15.

18 Hybrid Anonymization: T^r is a p%-table relocation of T, and its generalization T^r*μ (μ=[2,2,2]) satisfies 2-diversity, which makes it a hybrid anonymization of T. Tables T^r and T^r*μ are the same as on slide 15.

19 The problem: Given a private table T, find a hybrid anonymization of T which satisfies l-diversity and minimizes information loss (maximizes utility).

20 Hybrid Anonymizations: We propose solutions in different adversarial models. 1. Classical Adversary

21 Hybrid Algorithm. Parameters: distortion limit p = 50%, l = 2.
T*μ with μ=[2,2,2]
Id | Age | Nation | Zip | Dis
q1 | 11-30 | E.EU | 47* | AIDS
q2 | 11-30 | E.EU | 47* | Fever
q3 | 11-30 | E.EU | 47* | Flu
q4 | 11-30 | W. EU | 47* | Fever
q5 | 31-50 | S. AM | 49* | Flu
q6 | 31-50 | S. AM | 49* | AIDS
q7 | 31-50 | N. AM | 49* | Cancer
q8 | 31-50 | N. AM | 49* | Flu
q9 | 31-50 | N. AM | 49* | Fever

22 Hybrid Algorithm. Parameters: distortion limit p = 50%, l = 2. Same T*μ (μ=[2,2,2]) as on slide 21, with one row shown separately:
q4 | 11-30 | W. EU | 47* | 14K

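23 Hybrid Algorithm. Parameters: distortion limit p = 50%, l = 2. Row shown separately: q4 | 11-30 | W. EU | 47* | 14K
T^r*μ with μ=[2,2,2] (2-diversity)
Id | Age | Nation | Zip | Dis
q1 | 11-30 | E.EU | 47* | AIDS
q2 | 11-30 | E.EU | 47* | Fever
q3 | 11-30 | E.EU | 47* | Flu
q4 | 11-30 | E.EU | 47* | Fever
q5 | 31-50 | S. AM | 49* | Flu
q6 | 31-50 | S. AM | 49* | AIDS
q7 | 31-50 | N. AM | 49* | Cancer
q8 | 31-50 | N. AM | 49* | Flu
q9 | 31-50 | N. AM | 49* | Fever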
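The slides show the input and output of the hybrid algorithm but not its steps, so the following is only a rough sketch of the idea under stated assumptions, not the authors' algorithm. It assumes that the distortion limit p caps the fraction of tuples that may be relocated, that relocation can be modeled directly on generalized quasi-identifiers, and that an outlier moves to the closest group (fewest differing attributes) that stays l-diverse. The name `relocate_outliers` is hypothetical.

```python
from collections import Counter, defaultdict


def relocate_outliers(rows, l, p):
    """rows: list of (id, qi_tuple, sensitive). Returns the relocated rows,
    or None if the assumed relocation budget p is exceeded or no target fits."""
    groups = defaultdict(list)
    for rid, qi, s in rows:
        groups[qi].append((rid, s))

    def diverse(members):
        counts = Counter(s for _, s in members)
        return max(counts.values()) * l <= len(members)

    good = {qi: m for qi, m in groups.items() if diverse(m)}
    outliers = [(rid, qi, s) for qi, m in groups.items()
                if qi not in good for rid, s in m]
    if len(outliers) > p * len(rows):
        return None  # would exceed the assumed distortion limit
    for rid, qi, s in outliers:
        # Only groups that remain l-diverse after receiving the tuple qualify.
        candidates = [t for t, m in good.items() if diverse(m + [(rid, s)])]
        if not candidates:
            return None
        target = min(candidates,
                     key=lambda t: sum(a != b for a, b in zip(t, qi)))
        good[target].append((rid, s))
    return [(rid, qi, s) for qi, m in good.items() for rid, s in m]


E, W = ("11-30", "E.EU", "47*"), ("11-30", "W. EU", "47*")
S, N = ("31-50", "S. AM", "49*"), ("31-50", "N. AM", "49*")
T_MU = [("q1", E, "AIDS"), ("q2", E, "Fever"), ("q3", E, "Flu"),
        ("q4", W, "Fever"), ("q5", S, "Flu"), ("q6", S, "AIDS"),
        ("q7", N, "Cancer"), ("q8", N, "Flu"), ("q9", N, "Fever")]

for row in relocate_outliers(T_MU, l=2, p=0.5):
    print(row)
```

On the slide's data this moves only q4, from the W. EU group into the E.EU group, which reproduces the table on slide 23.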

24 Hybrid Anonymizations We propose solutions in different adversarial models. 1. Classical Adversary 2. Statistical Adversary

25 Hybrid Anonymizations: We propose solutions in different adversarial models. 1. Classical Adversary 2. Statistical Adversary (Pearson's chi-square test)

26 Hybrid Anonymizations: We propose solutions in different adversarial models. 1. Classical Adversary 2. Statistical Adversary 3. Algorithm-aware Adversary

27 Experiments: In the utility cost experiments, all hybrid approaches give better results than Incognito in nearly all cases. In the classical adversary setting, with only 1% relocations, our algorithm creates hybrid anonymizations with 70% better utility than Incognito. The hybrid and randomized hybrid algorithms perform better than the statistical hybrid algorithms.

28 Conclusion: We introduced hybrid generalizations in which a limited number of data elements can be relocated. We showed that relocations can potentially increase the utility of anonymizations at the cost of truthfulness. As one possible direction for future work, new hybrid algorithms can be designed for other privacy metrics such as δ-presence or (α,k)-anonymity.

29 Questions? Thank You

