Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies
Florian Tramèr, Zhicong Huang, Erman Ayday, Jean-Pierre Hubaux

Presentation transcript:

Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies
Florian Tramèr, Zhicong Huang, Erman Ayday, Jean-Pierre Hubaux
ACM CCS 2015, Denver, Colorado, USA. October 15, 2015.

Outline
Data Privacy and Membership Disclosure
– Differential Privacy (DP)
– Positive Membership Privacy (PMP)
– Prior-Belief Families and Equivalence between DP and PMP
Bounded Priors
– Modeling Adversaries with Limited Background Knowledge
– Inference Attacks for Genome-Wide Association Studies (GWAS)
Evaluation
– Perturbation Mechanisms for GWAS
– Trading Privacy, Medical Utility and Cost

Differential Privacy [1,2]
Belonging to a dataset ≈ not belonging to it.
A mechanism M provides ε-DP iff for any datasets T1 and T2 differing in a single element, and any S ⊆ range(M), we have:
Pr[M(T1) ∈ S] ≤ e^ε · Pr[M(T2) ∈ S]
We focus on bounded DP (the results hold for the unbounded case).
[Figure: neighboring datasets. Unbounded DP: T2 is T1 with one element added or removed. Bounded DP: T2 is T1 with one element replaced.]
[1] Dwork. "Differential Privacy". Automata, Languages and Programming. 2006.
[2] Dwork et al. "Calibrating Noise to Sensitivity in Private Data Analysis". TCC 2006.
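As a concrete illustration (our own addition, not from the slides): a minimal Laplace-mechanism sketch in Python, assuming a numeric query with known sensitivity. The function name and parameters are ours.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release true_value with eps-DP by adding Laplace(sensitivity/eps) noise."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query. Replacing one record (bounded DP) changes a
# count of records with a given property by at most 1, so sensitivity = 1.
noisy_count = laplace_mechanism(true_value=1234, sensitivity=1.0, epsilon=np.log(2))
```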

Positive Membership Privacy [1]
Data privacy as protection against membership disclosure:
– The adversary should not learn whether an entity from a universe U = {t1, t2, …} belongs to the dataset T.
Privacy: posterior belief ≈ prior belief, for all entities.
Impossible in general! (no free lunch)
[Figure: an adversary observes the mechanism output M(T), asks "Is t in the dataset?", and concludes "I am now sure t is in the dataset!"]
[1] Li et al. "Membership Privacy: A Unifying Framework for Privacy Definitions". CCS 2013.

Prior-Belief Families [1]
Adversary's prior belief: a distribution over 2^U.
Range of adversaries: a distribution family D.
A mechanism M satisfies (ε, D)-PMP iff for any S ⊆ range(M), any prior distribution D ∈ D, and any entity t ∈ U, we have:
Pr[t ∈ T | M(T) ∈ S] ≤ e^ε · Pr[t ∈ T]   (posterior ≤ e^ε · prior)
Pr[t ∉ T | M(T) ∈ S] ≥ e^(−ε) · Pr[t ∉ T]   (1 − posterior ≥ e^(−ε) · (1 − prior))
[1] Li et al. "Membership Privacy: A Unifying Framework for Privacy Definitions". CCS 2013.
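To make the bound concrete (our own addition): if the mechanism's output satisfies Pr[S | t ∈ T] ≤ e^ε · Pr[S | t ∉ T], Bayes' rule gives the worst-case posterior, and the sketch below checks the PMP inequality numerically.

```python
import numpy as np

def worst_case_posterior(prior, epsilon):
    """Worst-case posterior Pr[t in T | output in S] when the likelihood
    ratio Pr[S | t in T] / Pr[S | t not in T] is at most e^eps (Bayes' rule)."""
    r = np.exp(epsilon)
    return prior * r / (prior * r + (1 - prior))

p, eps = 0.5, np.log(1.5)
post = worst_case_posterior(p, eps)
print(post, "<=", np.exp(eps) * p)  # 0.6 <= 0.75: the bound holds with slack
```

Note the slack at p = ½: for priors away from 0 and 1 the posterior bound is not tight, which is exactly what the bounded-priors relaxation exploits.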

PMP ⟺ DP [1]
Mutually independent distributions:
– Distribution family D_B: each entity t is in the dataset independently with some probability p_t, conditioned on the size of the dataset being fixed.
Theorem [1]: a mechanism satisfies (ε, D_B)-PMP iff it satisfies bounded ε-DP.
A similar result exists for unbounded DP.
[1] Li et al. "Membership Privacy: A Unifying Framework for Privacy Definitions". CCS 2013.
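To make the family D_B concrete (our own illustration, with hypothetical names): a draw from a mutually independent prior conditioned on the dataset size, via simple rejection sampling.

```python
import numpy as np

def sample_conditioned_membership(p, size, rng=np.random.default_rng()):
    """Draw a membership vector from a mutually independent prior
    (entity i is in the dataset with probability p[i]), conditioned on
    exactly `size` entities being members (rejection sampling)."""
    p = np.asarray(p)
    while True:
        members = rng.random(len(p)) < p
        if members.sum() == size:
            return members

# Example: 10 entities, each a priori in T with probability 1/2; |T| = 5.
print(sample_conditioned_membership(np.full(10, 0.5), size=5))
```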

Bounded Priors
Observation: D_B includes adversarial priors with arbitrarily high certainty about all entities.
Do we care about such strong adversaries?
– All entities except one entity t′ then have no privacy a priori (w.r.t. membership in T).
– The membership status of t′ is known with arbitrarily high certainty.
– Unless membership is extremely rare or extremely likely, this means the adversary has strong background knowledge.
– How do we model an adversary with limited a priori knowledge?
[Figure: the prior Pr[t′ ∈ T] on the [0, 1] axis; values near 0 or 1 mean high certainty, values near ½ mean low certainty.]

Bounded Priors
We consider adversaries with the following priors:
– Entities are independent (the size of the dataset is possibly known).
– The adversary might know the membership status of some entities (Pr[t ∈ T] ∈ {0, 1}).
– For an "unknown" entity, the membership status is uncertain a priori: a ≤ Pr[t ∈ T] ≤ b.
Questions:
– Is such a model relevant in practice?
– What utility can we gain by considering a relaxed adversarial setting?
[Figure: the interval [a, b] inside [0, 1], bounding the prior Pr[t ∈ T] for unknown entities.]

Bounded Priors in Practice: Example
Genome-Wide Association Studies:
– Case-control study.
– Patient in the case group ⟺ patient has some disease.
– Find out which genetic variations (SNPs) are associated with the disease, e.g., with a chi-square test for each SNP.
[Figure: example genotypes at one SNP. Cases: AT AA TT AT AA TT AA AT. Controls: AT TT AT AA TT AA TT.]
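As an illustration (our own addition): a chi-square association test for one SNP from allele counts, using scipy. The counts below are made up.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 allelic contingency table for one SNP:
# rows = cases/controls, columns = counts of allele A and allele T.
table = np.array([[1200,  800],    # cases
                  [1000, 1000]])   # controls
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")  # a small p suggests association
```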

Bounded Priors in Practice: Example
Re-identification attacks [1,2]:
– Collect published aggregate statistics for the case/control groups.
– Use a victim's DNA sample and statistical testing to distinguish between:
  H0: the victim is not in the case group
  H1: the victim is in the case group (the victim has the disease)
– Attack assumptions (some of them implicit):
  N_case and N_ctrl are known (usually published).
  Entities are independent.
  Prior: Pr[t ∈ T] = N_case / (N_case + N_ctrl), typically ½ in evaluations.
The attacks are taken seriously! (some statistics were removed from open databases) [3]
[1] Homer et al. "Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays". PLoS Genetics. 2008.
[2] Wang et al. "Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study". CCS 2009.
[3] Zerhouni and Nabel. "Protecting Aggregate Genomic Data". Science. 2008.
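For intuition (a hedged sketch, not the exact test from either paper): a Homer-style distance statistic compares the victim's genotypes to the published case-group allele frequencies versus reference population frequencies; a clearly positive total across many SNPs suggests membership in the case group.

```python
import numpy as np

def homer_statistic(victim, case_freq, pop_freq):
    """Homer-style membership score: sum over SNPs of
    |victim - pop_freq| - |victim - case_freq|.
    `victim` holds allele dosages in [0, 1] (e.g., 0, 0.5, 1);
    positive totals indicate the victim is closer to the case group."""
    victim, case_freq, pop_freq = map(np.asarray, (victim, case_freq, pop_freq))
    return np.sum(np.abs(victim - pop_freq) - np.abs(victim - case_freq))
```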

Achieving PMP for Bounded Priors
Recall (ε, D_B)-PMP:
– The posterior is bounded in terms of the prior and e^ε.
– These bounds are tight iff Pr[t ∈ T] ∈ {0, 1}.
For bounded priors Pr[t ∈ T] ∈ [a, b]:
– The multiplicative factor is smaller.
– The perturbation required to achieve ε-PMP depends on [a, b].
– The minimal perturbation is required when a = b = ½.
– The actual utility gain remains to be evaluated.
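The quantitative payoff, as a sketch we derived from Bayes' rule under an e^ε′ likelihood-ratio bound (the paper's theorem may be stated differently): for a target PMP level ε and prior range [a, b], a strictly larger DP parameter ε′ already suffices.

```python
import numpy as np

def sufficient_dp_epsilon(eps_pmp, a, b):
    """DP parameter eps' such that eps'-DP gives (eps, D_[a,b])-PMP,
    obtained by bounding the Bayes posterior when the likelihood ratio
    is at most e^eps'. A sketch matching the talk's numbers, not
    necessarily the paper's exact statement."""
    r1 = np.inf
    if np.exp(eps_pmp) * a < 1:  # from: posterior <= e^eps * prior
        r1 = np.exp(eps_pmp) * (1 - a) / (1 - np.exp(eps_pmp) * a)
    r2 = 1 + (np.exp(eps_pmp) - 1) / b  # from: 1-posterior >= e^-eps * (1-prior)
    return np.log(min(r1, r2))

# Reproduces the evaluation's setting: target eps = ln(1.5), a = b = 1/2.
print(sufficient_dp_epsilon(np.log(1.5), 0.5, 0.5))  # ln(2) ≈ 0.693
```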

Evaluation
Statistical privacy for GWAS:
– Laplace / exponential mechanisms for chi-square statistics [1,2,3].
What we want to achieve:
– Privacy: ε-PMP for the adversarial setting of Homer et al. and Wang et al., compared to an adversary with an unbounded prior belief.
– Utility: a high probability of outputting the correct SNPs.
– Low cost: a good privacy-utility tradeoff for small studies [4] (N ≃ 2000).
[1] Uhler, Slavkovic, and Fienberg. "Privacy-Preserving Data Sharing for Genome-Wide Association Studies". Journal of Privacy and Confidentiality. 2013.
[2] Yu et al. "Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies". Journal of Biomedical Informatics. 2014.
[3] Johnson and Shmatikov. "Privacy-Preserving Data Exploration in Genome-Wide Association Studies". KDD 2013.
[4] Spencer et al. "Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip". PLoS Genetics. 2009.

Evaluation
GWAS simulation with 8532 SNPs, 2 of them truly associated:
– Variable sample size N (N_case = N_ctrl).
– Satisfy PMP for ε = ln(1.5).
– Mechanism M protects against an adversary with an unbounded prior: M must satisfy ε-DP.
– Mechanism M′ protects against an adversary with a bounded prior (a = b = ½): it is sufficient for M′ to satisfy ε′-DP for ε′ = ln(2).
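A minimal sketch (ours) of the shape of the Laplace-based selection compared in this evaluation: perturb each SNP's chi-square statistic and keep the top k. The correct noise calibration comes from the cited works (roughly proportional to k · sensitivity / ε); here it is left as a parameter.

```python
import numpy as np

def noisy_top_k(chi2_stats, k, noise_scale, rng=np.random.default_rng()):
    """Perturb each SNP's chi-square statistic with Laplace noise and
    report the indices of the k largest noisy values. `noise_scale`
    must be calibrated to the statistic's sensitivity and the privacy
    budget as analyzed in the cited works; we take it as given."""
    noisy = np.asarray(chi2_stats, float) + rng.laplace(scale=noise_scale,
                                                        size=len(chi2_stats))
    return np.argsort(noisy)[-k:][::-1]  # top-k indices, largest first
```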

Evaluation
Exponential mechanism from [1]:
[Figure: evaluation results for the exponential mechanism; the plot is not preserved in this transcript.]
[1] Johnson and Shmatikov. "Privacy-Preserving Data Exploration in Genome-Wide Association Studies". KDD 2013.
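For reference (a generic sketch, not Johnson and Shmatikov's exact scoring function): the exponential mechanism samples an output with probability proportional to exp(ε · q / (2Δq)), where q is a quality score and Δq its sensitivity.

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity, rng=np.random.default_rng()):
    """Sample one candidate index with probability proportional to
    exp(epsilon * score / (2 * sensitivity)). Generic form; Johnson and
    Shmatikov use GWAS-specific scores (e.g., distance to significance)."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)
```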

Conclusion
Membership privacy is easier to guarantee against adversaries with bounded priors.
We can tailor privacy mechanisms to specific attacks/threats:
– For GWAS, the known attacks rely on assumptions about the adversary's prior.
– Evaluations can tell us whether a reasonable privacy/utility tradeoff is even possible [1].
Future work: "differential privacy in practice"
– Investigate the privacy/utility tradeoff for state-of-the-art attacks: what values of ε are appropriate?
– Scenario 1: all attacks are defeated by DP, and high utility remains. Are there stronger inference attacks [2]?
– Scenario 2: defeating the attacks destroys utility. Are there more efficient DP mechanisms?
[1] Fredrikson et al. "Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing". USENIX Security 2014.
[2] Sankararaman et al. "Genomic Privacy and Limits of Individual Detection in a Pool". Nature Genetics. 2009.