Published by Benjamin Robbins. Modified over 9 years ago.
1
Task 1: Privacy Preserving Genomic Data Sharing
Presented by Noman Mohammed
School of Computer Science, McGill University
24 March 2014
2
Reference
N. Mohammed, R. Chen, B. C. M. Fung, and P. S. Yu. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 493-501, 2011.
3
Outline
- Privacy Models
- Algorithm for Relational Data
- Algorithm for Genomic Data
- Conclusion
4
Overview
- Privacy model
- Anonymization algorithm
- Data utility
5
k-Anonymity [Samarati & Sweeney, PODS 1998]
Quasi-identifier (QID): the set of re-identification attributes.
k-anonymity: each record cannot be distinguished from at least k-1 other records in the table with respect to the QID.

Raw patient table
Job       Sex     Age  Disease
Engineer  Male    36   Cancer
Engineer  Male    38   Cancer
Lawyer    Male    38   Cancer
Musician  Female  30   Flu
Musician  Female  30   Hepatitis
Dancer    Female  30   Fever
Dancer    Female  30   Hepatitis

3-anonymous patient table
Job           Sex     Age      Disease
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Artist        Female  [30-35]  Flu
Artist        Female  [30-35]  Hepatitis
Artist        Female  [30-35]  Fever
Artist        Female  [30-35]  Hepatitis
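The k-anonymity condition can be checked mechanically by grouping records on their QID values. The sketch below is illustrative only: the function name `is_k_anonymous` and the dictionary layout are assumptions, while the rows are the two patient tables from the slide.

```python
from collections import Counter

def is_k_anonymous(records, qid, k):
    """Return True if every QID value combination occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return all(size >= k for size in groups.values())

QID = ("Job", "Sex", "Age")

# Raw patient table from the slide.
raw = [
    {"Job": "Engineer", "Sex": "Male", "Age": "36", "Disease": "Cancer"},
    {"Job": "Engineer", "Sex": "Male", "Age": "38", "Disease": "Cancer"},
    {"Job": "Lawyer", "Sex": "Male", "Age": "38", "Disease": "Cancer"},
    {"Job": "Musician", "Sex": "Female", "Age": "30", "Disease": "Flu"},
    {"Job": "Musician", "Sex": "Female", "Age": "30", "Disease": "Hepatitis"},
    {"Job": "Dancer", "Sex": "Female", "Age": "30", "Disease": "Fever"},
    {"Job": "Dancer", "Sex": "Female", "Age": "30", "Disease": "Hepatitis"},
]

# 3-anonymous patient table from the slide.
anonymized = (
    [{"Job": "Professional", "Sex": "Male", "Age": "[36-40]", "Disease": "Cancer"}] * 3
    + [{"Job": "Artist", "Sex": "Female", "Age": "[30-35]", "Disease": d}
       for d in ("Flu", "Hepatitis", "Fever", "Hepatitis")]
)

print(is_k_anonymous(raw, QID, 3))         # False: some QID groups have one record
print(is_k_anonymous(anonymized, QID, 3))  # True: group sizes are 3 and 4
```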
6
Differential Privacy [DMNS, TCC 06]
(Figure: privacy mechanism A)
7
Differential Privacy
A non-interactive privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D and D', and for any possible sanitized database D*,
$$\Pr_A[A(D) = D^*] \le \exp(\varepsilon) \times \Pr_A[A(D') = D^*]$$
D and D' are neighbours if they differ on at most one record.
8
Laplace Mechanism
Sensitivity: $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$
For a counting query f: $\Delta f = 1$
For example, for a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) maintains ε-differential privacy.
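A minimal sketch of the Laplace mechanism for a counting query follows. The function names, the seed, and the value ε = 0.5 are assumptions made for illustration. The last lines check numerically that, for two neighbouring counts (10 and 11), the ratio of output densities stays within exp(ε).

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return Q(D) + Laplace(sensitivity / epsilon)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def laplace_pdf(x, mean, scale):
    """Density of Laplace(mean, scale) at x."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

rng = random.Random(0)   # fixed seed, for repeatability only
epsilon = 0.5            # hypothetical privacy budget
print(noisy_count(10, epsilon, rng))

# Neighbouring datasets change a count by at most 1 (here 10 vs. 11).
# Evaluate the density ratio on a grid of possible outputs from 0.0 to 20.9.
worst_ratio = max(
    laplace_pdf(x / 10.0, 10, 1.0 / epsilon) / laplace_pdf(x / 10.0, 11, 1.0 / epsilon)
    for x in range(0, 210)
)
print(worst_ratio <= math.exp(epsilon) + 1e-12)
```

A smaller ε gives a larger noise scale 1/ε, so stronger privacy costs accuracy on each released count.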
9
Outline
- Privacy Models
- Algorithm for Relational Data
- Algorithm for Genomic Data
- Conclusion
10
Non-interactive Framework
(Figure label: 0 + Lap(1/ε))
11
Non-interactive Framework
For high-dimensional data, the noise is too large.
(Figure label: 0 + Lap(1/ε))
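A rough way to see why the noise dominates is to count the cells of the full contingency table. The sketch below assumes, purely for illustration, 200 records and attributes with 3 values each, and that every cell receives independent Lap(1/ε) noise. Since the expected absolute value of Lap(b) is b, the expected total absolute noise is the number of cells divided by ε.

```python
from math import prod

def contingency_cells(domain_sizes):
    """Number of cells in the full contingency table over all attributes."""
    return prod(domain_sizes)

def expected_total_noise(domain_sizes, epsilon):
    """Expected sum of |noise| over all cells when each cell gets Lap(1/epsilon)."""
    # E|Lap(b)| = b, so each cell contributes 1/epsilon on average.
    return contingency_cells(domain_sizes) / epsilon

n_records = 200   # hypothetical dataset size
epsilon = 1.0     # hypothetical privacy budget
for n_attributes in (2, 5, 10):
    sizes = [3] * n_attributes   # hypothetical: 3 values per attribute
    cells = contingency_cells(sizes)
    noise = expected_total_noise(sizes, epsilon)
    print(n_attributes, cells, noise, noise > n_records)
```

With 10 such attributes the table already has 59049 cells, so the accumulated noise is far larger than the 200 true records spread across them.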
12
Non-interactive Framework
13
Anonymization Algorithm

Taxonomy trees
Job: Any_Job -> Professional (Engineer, Lawyer), Artist (Dancer, Writer)
Age: [18-65) -> [18-40) ([18-30), [30-40)), [40-65)

Partitions, from the root down
Job           Age      Class  Count
Any_Job       [18-65)  4Y4N   8
Artist        [18-65)  2Y2N   4
Professional  [18-65)  2Y2N   4
Artist        [18-40)  2Y2N   4
Artist        [40-65)  0Y0N   0
Professional  [18-40)  2Y1N   3
Professional  [40-65)  0Y1N   1
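The partitioning on this slide can be sketched as repeated generalization followed by counting. Everything below is an illustration under stated assumptions: the eight records are hypothetical and were chosen only so that the counts match the slide, and the helper names are invented here. Noise is added to the final group counts as Lap(1/ε); a full release would also have to perturb empty groups such as Artist [40-65), which this sketch omits.

```python
import random
from collections import Counter

# Taxonomy from the slide: Any_Job -> Professional/Artist -> leaf jobs.
JOB_PARENT = {"Engineer": "Professional", "Lawyer": "Professional",
              "Dancer": "Artist", "Writer": "Artist"}

def generalize_job(job, level):
    """level 0 = Any_Job, level 1 = Professional/Artist, level 2 = raw value."""
    if level == 0:
        return "Any_Job"
    return JOB_PARENT[job] if level == 1 else job

def generalize_age(age, cuts):
    """Map an age to the half-open interval [lo-hi) given sorted cut points."""
    for lo, hi in zip(cuts, cuts[1:]):
        if lo <= age < hi:
            return f"[{lo}-{hi})"
    raise ValueError(f"age {age} outside {cuts}")

def partition_counts(records, job_level, age_cuts):
    """Count records per (generalized job, generalized age, class)."""
    return Counter(
        (generalize_job(job, job_level), generalize_age(age, age_cuts), cls)
        for job, age, cls in records
    )

def add_laplace(count, epsilon, rng):
    """count + Lap(1/epsilon), sampled as a difference of exponentials."""
    return count + rng.expovariate(epsilon) - rng.expovariate(epsilon)

# Hypothetical records (job, age, class), chosen to reproduce the slide's counts.
records = [
    ("Dancer", 25, "Y"), ("Dancer", 32, "N"),
    ("Writer", 28, "Y"), ("Writer", 35, "N"),
    ("Engineer", 30, "Y"), ("Engineer", 38, "N"),
    ("Lawyer", 27, "Y"), ("Lawyer", 50, "N"),
]

root = partition_counts(records, 0, [18, 65])            # Any_Job, [18-65)
after_job = partition_counts(records, 1, [18, 65])       # Job specialized
after_age = partition_counts(records, 1, [18, 40, 65])   # Age specialized too

rng = random.Random(0)   # fixed seed, for repeatability only
released = {group: add_laplace(n, 1.0, rng) for group, n in after_age.items()}
for group, value in sorted(released.items()):
    print(group, round(value, 2))
```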
14
Candidate Selection
We favor the specialization with the maximum Score value.
First utility function: ∆u = [formula missing]
Second utility function: ∆u = 1
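The slide text does not say how the Score and its sensitivity ∆u are combined into a private choice. One standard option is the exponential mechanism, which picks a candidate with probability proportional to exp(ε · Score / (2∆u)). The sketch below assumes that mechanism; the candidate names and scores are hypothetical.

```python
import math
import random

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Pick a key with probability proportional to exp(eps * score / (2 * sensitivity))."""
    names = list(scores)
    top = max(scores.values())
    # Subtracting the top score leaves the probabilities unchanged and avoids overflow.
    weights = [math.exp(epsilon * (scores[n] - top) / (2.0 * sensitivity))
               for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)                        # fixed seed, for repeatability only
candidate_scores = {"Job": 4.0, "Age": 1.0}   # hypothetical Score values
picks = [exponential_mechanism(candidate_scores, 0.5, 1.0, rng) for _ in range(1000)]
print(picks.count("Job"), picks.count("Age"))
```

Higher-scoring specializations are favored, but every candidate keeps a nonzero chance, and a larger ∆u flattens the distribution for the same ε.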
15
Anonymization Algorithm
Step costs annotated on the algorithm listing: O(A_pr × |D| log |D|), O(|candidates|), O(|D|), O(|D| log |D|), O(1)
16
Anonymization Algorithm
Step costs annotated on the algorithm listing: O(A_pr × |D| log |D|), O(|candidates|), O(|D|), O(|D| log |D|), O(1)
Overall: O((A_pr + h) × |D| log |D|)
17
Outline
- Privacy Models
- Algorithm for Relational Data
- Algorithm for Genomic Data
- Conclusion
18
case_chr2_29504091_30044866
rs11686243 AG AG AA AG GG AA AG AA GG AG AA AA AA AA AA AA AA GG GG AG AG AA AG GG AA AA GG AG AG AG GG AG AA AA AG AG AG AG AG AG AA AG GG AG AA GG GG GG GG AG AG AG AG AA GG GG GG AG AA AG GG AG AA GG GG AG AG AG AG AG AA AA AG AG AG AA AG AG AG AG GG AG AG AG GG GG AG AG GG AG AG AG AA AA GG AG AA GG AA AA AG GG AG AG AG AG AG AG AG AG GG GG AA AG AG AG AG AA AG GG AG GG AA AG GG AG AG AG AA AG AG AG GG AG GG AG GG AG AG AG GG AG AG GG GG AG AG GG AA GG AA AG AG AG AG GG AG AA AG GG GG AG AG AG AG AG GG AG AG AA AG AA AA AG GG AA AG AG GG AG GG AG AG GG GG AG AG AA AG AG AG GG AG GG GG AG AG GG AG GG
rs4426491 CC CC CC CT CT CC CT CC CT CT CC CC CC CC CC CC CC CT CT CT CC CC CT CT CC CC CT CC CT CC CT CC CC CC CT ….
rs4305230 CC CC CC CT CT CC CT CC TT CT CC CC CC CC CC CC CC TT TT CT CC CC CT CT CC CC CT CC CT CC TT CC CC CC CT ….
19
Raw Data
Case  rs11686243  rs4426491  rs4305230  rs4630725  …
1     AG          CC
2     AG          CC
3     AA          CC
4     AG          CT
5     GG          CT
…     …           …
198   GG          TT
199   AG          CT
200   GG          CT
20
Blocks/Attributes
Case  rs11686243  rs4426491
1     AG          CC
2     AG          CC
3     AA          CC
4     AG          CT
5     GG          CT

Unique combinations: AG CC, AA CC, AG CT, GG CT
Taxonomy tree: Any -> AG CC, AA CC, AG CT, GG CT
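The two-level tree for a block can be derived directly from the data: the leaves are the distinct genotype combinations, and the root is Any. A minimal sketch, assuming a block is stored as one tuple of genotypes per case (the function name `block_taxonomy` is invented here):

```python
def block_taxonomy(block_rows):
    """Build a two-level taxonomy: 'Any' at the root, unique combinations as leaves."""
    # dict.fromkeys keeps the first-seen order while dropping duplicates.
    leaves = list(dict.fromkeys(" ".join(row) for row in block_rows))
    return {"Any": leaves}

# The five cases shown on the slide for (rs11686243, rs4426491).
block = [("AG", "CC"), ("AG", "CC"), ("AA", "CC"), ("AG", "CT"), ("GG", "CT")]
print(block_taxonomy(block))
# {'Any': ['AG CC', 'AA CC', 'AG CT', 'GG CT']}
```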
21
Taxonomy Trees for Attributes
SNP data was split evenly into N/6 blocks (attributes), where N is the number of SNPs.
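Splitting N SNPs evenly into N/6 blocks means each block holds 6 consecutive SNPs. The sketch below takes that reading as an assumption; the SNP identifiers are hypothetical placeholders, and the handling of a leftover shorter block when N is not a multiple of 6 is also an assumption.

```python
def split_into_blocks(snp_ids, block_size=6):
    """Split an ordered list of SNP ids into consecutive blocks of block_size."""
    return [snp_ids[i:i + block_size] for i in range(0, len(snp_ids), block_size)]

snps = [f"snp{i}" for i in range(12)]   # hypothetical ids, N = 12
blocks = split_into_blocks(snps)
print(len(blocks))    # 2, i.e. N / 6
print(blocks[0])      # the first six SNP ids
```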
22
Hierarchy Tree for Chr2
23
Hierarchy Tree for Chr10
24
Genomic Data

Taxonomy trees
Block 1: Any -> AG CC, AA CC
Block 3: Any -> CC GG, CT AG

Partitions, from the root down
Block 1  Block 2  Block 3  Count
Any      Any      Any      200
AA CC    Any      Any      130
AG CC    Any      Any      70
AA CC    Any      CC GG    60
AA CC    Any      CT AG    70
AG CC    Any      CC GG    30
AG CC    Any      CT AG    40
25
Anonymized Data
Case  rs11686243  rs4426491  rs4305230  rs4630725  …
1     AG          CC         Any
2     AG          CC         Any
3     …
4     …
5     AA          CC         Any
…     AA          CC         Any
…     …
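One way to read this table: blocks that were specialized keep their values, and every other block is published as Any. The sketch below illustrates that reading under assumptions: blocks of two SNPs, a record stored as one string per block, and a placeholder value for the second block, which the slides do not show in full.

```python
def generalize_record(block_values, specialized_blocks):
    """Keep values of specialized blocks; replace all other blocks with 'Any'."""
    return tuple(
        value if index in specialized_blocks else "Any"
        for index, value in enumerate(block_values)
    )

# Case 1: block 0 = (rs11686243, rs4426491) is taken from the slide;
# the block 1 value is a hypothetical placeholder.
case_1 = ("AG CC", "CC CC")
print(generalize_record(case_1, {0}))   # ('AG CC', 'Any')
```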
26
Heterogeneous Healthcare Data
Relational data: ID, Job, Age. Genomic data: rs4305230, rs4630725, …
ID  Job       Age  rs4305230  rs4630725  …
1   Engineer  50   AG         CC         …
2   Doctor    45   AA         CT         …
3   …         …    …          …          …
27
Conclusions
Privacy-Preserving Genomic Data Release
- Tree-based approach is promising
Future work
- Partitioning the SNPs to generate blocks
- Utility function for specialization
- Two-level tree vs. multi-level hierarchy trees
- Single-dimension vs. multi-dimensional partitioning
28
Thank You!