Task 1: Privacy Preserving Genomic Data Sharing Presented by Noman Mohammed School of Computer Science McGill University 24 March 2014.


2 Reference  N. Mohammed, R. Chen, B. C. M. Fung, and P. S. Yu. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2011.

3 Outline  Privacy Models  Algorithm for Relational Data  Algorithm for Genomic Data  Conclusion

4 Overview  Privacy model  Anonymization algorithm  Data utility

5 k-Anonymity [Samarati & Sweeney, PODS 1998]
 Quasi-identifier (QID): the set of re-identification attributes.
 k-anonymity: each record cannot be distinguished from at least k-1 other records in the table wrt the QID.

Raw patient table:
Job       Sex     Age  Disease
Engineer  Male    36   Cancer
Engineer  Male    38   Cancer
Lawyer    Male    38   Cancer
Musician  Female  30   Flu
Musician  Female  30   Hepatitis
Dancer    Female  30   Fever
Dancer    Female  30   Hepatitis

3-anonymous patient table:
Job           Sex     Age      Disease
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Professional  Male    [36-40]  Cancer
Artist        Female  [30-35]  Flu
Artist        Female  [30-35]  Hepatitis
Artist        Female  [30-35]  Fever
Artist        Female  [30-35]  Hepatitis
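The k-anonymity condition above is easy to check mechanically. A minimal sketch over the slide's 3-anonymous table (the function name is illustrative, not from the slides):

```python
from collections import Counter

def is_k_anonymous(records, qid_indices, k):
    """A table is k-anonymous wrt the QID if every QID value combination
    that appears is shared by at least k records."""
    groups = Counter(tuple(r[i] for i in qid_indices) for r in records)
    return all(count >= k for count in groups.values())

# the 3-anonymous table from the slide; QID = (Job, Sex, Age)
table = [
    ("Professional", "Male",   "[36-40]", "Cancer"),
    ("Professional", "Male",   "[36-40]", "Cancer"),
    ("Professional", "Male",   "[36-40]", "Cancer"),
    ("Artist",       "Female", "[30-35]", "Flu"),
    ("Artist",       "Female", "[30-35]", "Hepatitis"),
    ("Artist",       "Female", "[30-35]", "Fever"),
    ("Artist",       "Female", "[30-35]", "Hepatitis"),
]
```

The male group has 3 records and the female group 4, so the table is 3-anonymous but not 4-anonymous.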

6 Differential Privacy [DMNS, TCC 06] (figure: randomized mechanism A)

7 Differential Privacy
A non-interactive privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D and D' and for any possible sanitized database D*,

Pr_A[A(D) = D*] ≤ exp(ε) × Pr_A[A(D') = D*]

D and D' are neighbours if they differ in at most one record.
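As a quick numeric sanity check of this inequality (a sketch, not part of the slides; it assumes a counting query with sensitivity 1, so Laplace noise of scale 1/ε): the ratio of output densities on neighbouring databases never exceeds exp(ε).

```python
import math

def laplace_pdf(x, scale):
    """Density of the Laplace(0, scale) distribution at x."""
    return math.exp(-abs(x) / scale) / (2 * scale)

eps = 0.5
scale = 1.0 / eps          # noise scale for a counting query (sensitivity 1)
c, c_neighbor = 5, 6       # true counts on neighbouring databases D and D'

# ratio of output densities at a few possible sanitized outputs t
worst_ratio = max(laplace_pdf(t - c, scale) / laplace_pdf(t - c_neighbor, scale)
                  for t in [0.0, 3.0, 5.0, 5.5, 6.0, 10.0])
```

Here `worst_ratio` equals exp(0.5·1) = exp(ε), attained exactly at outputs on the far side of both counts, which is the bound the definition requires.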

8 Laplace Mechanism
For example, for a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) maintains ε-differential privacy.

Sensitivity: ∆f = max_{D,D'} ||f(D) − f(D')||_1. For a counting query f: ∆f = 1.
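A minimal sketch of the Laplace mechanism for a counting query (assuming NumPy; `laplace_count` is an illustrative name):

```python
import numpy as np

def laplace_count(records, predicate, epsilon):
    """epsilon-differentially private count: a counting query has
    L1 sensitivity 1, so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
```

Each released count consumes ε of the privacy budget; answering many such queries requires splitting the budget across them.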

9 Outline  Privacy Models  Algorithm for Relational Data  Algorithm for Genomic Data  Conclusion

10 Non-interactive Framework (figure: contingency table of counts, each cell released as its count + Lap(1/ε), e.g. 0 + Lap(1/ε))

11 Non-interactive Framework  For high-dimensional data, the added noise is too big relative to the true counts (figure: 0 + Lap(1/ε))

12 Non-interactive Framework

13 Anonymization Algorithm

Generalized counts:
Job           Age      Class  Count
Any_Job       [18-65)  4Y4N   8
Artist        [18-65)  2Y2N   4
Professional  [18-65)  2Y2N   4

Specializing Age [18-65) into [18-40) and [40-65):
Artist        [18-40)  2Y2N   4
Artist        [40-65)  0Y0N   0
Professional  [18-40)  2Y1N   3
Professional  [40-65)  0Y1N   1

Taxonomy trees:
Job: Any_Job → Professional {Engineer, Lawyer}, Artist {Dancer, Writer}
Age: [18-65) → [18-40) {[18-30), [30-40)}, [40-65)

14 Candidate Selection
 We favor the specialization with the maximum Score value.
 First utility function: ∆u =
 Second utility function: ∆u = 1
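Selecting a candidate by Score in a differentially private way is done with the exponential mechanism. A generic sketch (the slide's specific Score functions are not reproduced here; `choose_candidate` is an illustrative name, and `delta_u` is the sensitivity of the utility function):

```python
import numpy as np

def choose_candidate(candidates, scores, epsilon, delta_u):
    """Exponential mechanism: pick candidate v with probability
    proportional to exp(epsilon * score(v) / (2 * delta_u))."""
    scores = np.asarray(scores, dtype=float)
    # shift by the max score for numerical stability (does not change probabilities)
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * delta_u))
    probs = weights / weights.sum()
    return candidates[np.random.choice(len(candidates), p=probs)]
```

High-Score candidates are exponentially more likely to be chosen, but every candidate keeps nonzero probability, which is what protects the selection step.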

15 Anonymization Algorithm (per-step costs annotated on the slide: O(A_pr × |D| log |D|), O(|candidates|), O(|D|), O(|D| log |D|), O(1))

16 Anonymization Algorithm  Per-step costs: O(A_pr × |D| log |D|), O(|candidates|), O(|D|), O(|D| log |D|), O(1)  Total: O((A_pr + h) × |D| log |D|)
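The loop these costs annotate alternates specialization with noisy publication. A rough sketch of those two operations (helper names and the taxonomy representation are illustrative simplifications, not the paper's exact algorithm; budget splitting is omitted):

```python
import numpy as np

def specialize(groups, attr, taxonomy):
    """Refine one generalized attribute: replace its current value in each
    group with that value's children and re-partition the records.
    `groups` maps a tuple of generalized values to a list of records;
    `taxonomy` supplies the children of a value and a covers() test."""
    new_groups = {}
    for key, records in groups.items():
        for child in taxonomy["children"][key[attr]]:
            members = [r for r in records if taxonomy["covers"](child, r[attr])]
            new_key = key[:attr] + (child,) + key[attr + 1:]
            new_groups[new_key] = members
    return new_groups

def publish(groups, epsilon):
    """Final step: release each leaf partition with a Laplace-noised count."""
    return {key: len(recs) + np.random.laplace(scale=1.0 / epsilon)
            for key, recs in groups.items()}
```

Running `specialize` h times and then `publish` once mirrors the O((A_pr + h) × |D| log |D|) structure above: each specialization touches the affected records, and the noisy counts are computed in one final pass.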

17 Outline  Privacy Models  Algorithm for Relational Data  Algorithm for Genomic Data  Conclusion

18 case_chr2_… (one row of genotype calls per SNP):
rs…  AG AG AA AG GG AA AG AA GG AG AA AA AA AA AA AA AA GG GG AG …
rs…  CC CC CC CT CT CC CT CC CT CT CC CC CC CC CC CC CC CT CT CT …
rs…  CC CC CC CT CT CC CT CC TT CT CC CC CC CC CC CC CC TT TT CT …

19 Raw Data
Case  rs…  rs…  …
1     AG   CC   …
2     AG   CC   …
3     AA   CC   …
4     AG   CT   …
5     GG   CT   …
…     …    …    …
198   GG   TT   …
199   AG   CT   …
200   GG   CT   …

20 Blocks/Attributes
Case  rs…  rs…
1     AG   CC
2     AG   CC
3     AA   CC
4     AG   CT
5     GG   CT

Unique combinations: AG CC, AA CC, AG CT, GG CT
Taxonomy for this block: Any → {AG CC, AA CC, AG CT, GG CT}
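The children of the root "Any" can be derived directly from the data: they are the distinct genotype combinations actually observed in the block. A minimal sketch (`block_taxonomy` is an illustrative name):

```python
def block_taxonomy(block_genotypes):
    """Children of the root 'Any' for one block are the distinct
    genotype combinations observed in the data, in order of appearance."""
    seen = []
    for combo in block_genotypes:
        if combo not in seen:
            seen.append(combo)
    return {"Any": seen}

# the five cases from the slide, restricted to one two-SNP block
cases = [("AG", "CC"), ("AG", "CC"), ("AA", "CC"), ("AG", "CT"), ("GG", "CT")]
```

This yields exactly the four unique combinations listed on the slide as the children of Any.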

21 Taxonomy Trees for Attributes  SNP data was split evenly into N/6 blocks (attributes), where N is the number of SNPs.
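That split can be sketched as follows (a block size of 6 SNPs per block is assumed from the N/6 figure; the function name is illustrative):

```python
def split_into_blocks(snp_ids, block_size=6):
    """Partition the SNP list into consecutive fixed-size blocks; each block
    then acts as a single attribute whose values are genotype combinations.
    The last block may be shorter if N is not a multiple of block_size."""
    return [snp_ids[i:i + block_size]
            for i in range(0, len(snp_ids), block_size)]
```

Grouping SNPs into blocks reduces the dimensionality of the table, which is what makes the noise added per partition tolerable.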

22 Hierarchy Tree for Chr2

23 Hierarchy Tree for Chr10

24 Genomic Data

Block taxonomies:
Block 1: Any → {AG CC, AA CC}
Block 3: Any → {CC GG, CT AG}

Counts before specializing Block 3:
Block 1  Block 2  Block 3  Count
Any      Any      Any      200
AA CC    Any      Any      130
AG CC    Any      Any      70

After specializing Block 3:
AA CC    Any      CC GG    60
AA CC    Any      CT AG    70
AG CC    Any      CC GG    30
AG CC    Any      CT AG    40

25 Anonymized Data
Case  rs…  rs…  rs… rs… …
1     AG   CC   Any
2     AG   CC   Any
3     …
4     …
5     AA   CC   Any
…     AA   CC   Any

26 Heterogeneous Healthcare Data (relational data + genomic data)
ID  Job       Age  rs…  rs…  …
1   Engineer  50   AG   CC   …
2   Doctor    45   AA   CT   …
3   …         …    …    …    …

27 Conclusions
 Privacy-Preserving Genomic Data Release: the tree-based approach is promising
 Future work:
 Partitioning the SNPs to generate blocks
 Utility function for specialization
 Two-level tree vs. multi-level hierarchy trees
 Single-dimension vs. multi-dimensional partitioning

28 Thank You!