Real-time Protection for Open Beacon Network


Real-time Protection for Open Beacon Network Open Data and Big Science S07 Diyue Bu, Xiaofeng Wang, Haixu Tang School of Informatics, Computing and Engineering Indiana University Bloomington

Disclosure I and my spouse/partner have no relevant relationships with commercial interests to disclose. AMIA 2018 | amia.org

Learning Objectives. After participating in this session, the learner should be better able to: (1) describe the Beacon Network; (2) identify the potential attacks on the Beacon Network; (3) explain the mechanism of the real-time flipping mitigation method.

Presentation Outline. Background Introduction; Beacon Network; Attacks on Beacon Network; Existing Mitigation Methods; Proposed Method: Real-time Flipping (RTF) Method; Experiments & Results; Secure-Beacon Implementation.

The Beacon Network (https://beacon-network.org/). “The Beacon Network is a search engine across the world's public beacons. It enables global discovery of genetic mutations, federated across a large and growing network of shared genetic datasets.” Reference: Global Alliance for Genomics and Health. A federated ecosystem for sharing genomic, clinical data. Science 352, 1278–1280 (2016). What's a Beacon? Beacon is a genetic mutation sharing platform developed by the Global Alliance for Genomics and Health. A beacon is a web service that any institution can implement to share genetic data. A beacon answers questions of the form "Do you have information about the following mutation?" and responds with "Yes" or "No", potentially along with more information. A site offering this service is called a "beacon". This open web service is designed to be technically simple while giving data generators options for distributing data through proportional safeguards.
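The Yes/No query model described above can be illustrated with a minimal sketch. This is a toy, not the GA4GH Beacon API: the class name `Beacon`, the `query` method, and the representation of a variant as a `(chromosome, position, alternate_allele)` tuple are all assumptions made for illustration.

```python
class Beacon:
    """Toy beacon: stores a set of variants and answers presence queries."""

    def __init__(self, variants):
        # Each variant is a hypothetical (chromosome, position, alt_allele) tuple.
        self._variants = set(variants)

    def query(self, chrom, pos, alt):
        # True corresponds to "Yes", False to "No".
        return (chrom, pos, alt) in self._variants


beacon = Beacon([("10", 12345, "A"), ("21", 67890, "T")])
print(beacon.query("10", 12345, "A"))
print(beacon.query("10", 12345, "G"))
```

Note that the beacon reveals only whether at least one stored genome carries the variant, which is exactly the signal the attacks on the following slides exploit.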

Attacks on Beacon Network. Shringarpure and Bustamante (SB) attack [1]; “optimal” attack [2]; inference attack using linkage disequilibrium (LD) and a Markov chain model [3]. References: 1. Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646. 2. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017). 3. von Thenen, Nora, et al. "Inference attacks against genomic data-sharing beacons." GenoPri17.

SB (LRT) Attack. Given the responses to n queries, the attacker tests H0: the queried victim's genome is not in the target database, against H1: the queried victim's genome is in the target database. The power of the test measures the re-identification risk of an individual genome in a genomic database. It indicates how confidently attackers can conclude that the victim (with the queried variants) is present in the target database, that is, the probability of correctly rejecting the null hypothesis over multiple tests. Reference: Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646.
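A rough sketch of the test statistic may help make the hypotheses concrete. The slides do not give the formula, so the form below is an assumption: it uses known per-SNP alternate allele frequencies, a database of `n_individuals` diploid genomes, and a small sequencing-error rate `delta`, with the statistic defined as log-likelihood under H0 minus log-likelihood under H1. The function name and parameters are hypothetical.

```python
import math


def lrt_statistic(responses, freqs, n_individuals, delta=1e-6):
    """Log-likelihood ratio (H0 minus H1) over a sequence of beacon responses.

    responses: booleans, True if the beacon answered "Yes" for that SNP.
    freqs: assumed alternate allele frequency of each queried SNP (0 < f < 1).
    Every queried SNP is assumed to be carried by the victim.
    """
    total = 0.0
    for answered_yes, f in zip(responses, freqs):
        # Probability that none of N (or N - 1) diploid genomes carries the allele.
        d_n = (1.0 - f) ** (2 * n_individuals)
        d_n1 = (1.0 - f) ** (2 * (n_individuals - 1))
        if answered_yes:
            total += math.log(1.0 - d_n) - math.log(1.0 - delta * d_n1)
        else:
            total += math.log(d_n) - math.log(delta * d_n1)
    return total


# All "Yes" answers on rare SNPs push the statistic negative (evidence for H1).
print(lrt_statistic([True] * 10, [0.001] * 10, 1000))
```

Under this sign convention, the attacker rejects H0 (concludes the victim is in the database) when the statistic falls below a threshold. A single "No" answer contributes a large positive term, which is why masking even a few "Yes" answers weakens the attack.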

Attacks on Beacon Network. Shringarpure and Bustamante (SB) attack [1]: an inference attack based on a log-likelihood ratio test. “Optimal” attack [2]: queries variants in rare-first order. Inference attack using linkage disequilibrium (LD) and a Markov chain model [3]. References: 1. Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646. 2. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017). 3. von Thenen, Nora, et al. "Inference attacks against genomic data-sharing beacons." GenoPri17.

Previous Mitigation Methods. Random Flipping (RF) method [1]: randomly mask a proportion (ε) of rare SNPs. Query Budget method [1]: remove an individual's genome information if a high re-identification risk is detected. Rare SNPs: SNPs carried by only one individual in the database. Mask/flip SNPs: a mechanism used by several mitigation methods, which switches the answer to a query from Yes to No. Answers are never switched from No to Yes, because such a search result may confuse researchers. Reference: 1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
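A minimal sketch of random flipping, under stated assumptions: genomes are represented as a hypothetical mapping from individual ID to a set of variant identifiers, "rare" means carried by exactly one individual (as defined above), and the mask is drawn once so that repeated queries get consistent answers. The function names are illustrative, not taken from the cited work.

```python
import random
from collections import Counter


def build_random_flip_mask(genomes, epsilon, seed=0):
    """Choose which rare variants to hide, each with probability epsilon."""
    carriers = Counter(v for variants in genomes.values() for v in variants)
    rare = sorted(v for v, count in carriers.items() if count == 1)
    rng = random.Random(seed)
    return {v for v in rare if rng.random() < epsilon}


def answer(genomes, mask, variant):
    """Answer a query; masking only ever turns a Yes into a No."""
    present = any(variant in variants for variants in genomes.values())
    return present and variant not in mask


genomes = {"p1": {"a", "b"}, "p2": {"b", "c"}}
mask = build_random_flip_mask(genomes, epsilon=0.15)
print(answer(genomes, mask, "b"))  # common variant, never masked
```

Because the mask contains only variants that are actually present, an absent variant can never be reported as present.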

Previous Mitigation Methods. Strategic Flipping (SF) method [1]: mask the k percent of variants with the largest discriminative power. Eliminating Random Positions and Biased Randomized Response [2]. Pitfall: out-of-target variants are flipped. References: 1. Wan, Zhiyu, et al. "Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services." BMC Medical Genomics. 2. Al Aziz, Md Momin, et al. "Aftermath of Bustamante attack on genomic beacon service." BMC Medical Genomics.
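The selection step of strategic flipping can be sketched as a top-k% cut over per-variant scores. How discriminative power is computed is not given in the slides, so the scores here are a hypothetical precomputed input, and rounding the cut-off up with `math.ceil` is also an assumption.

```python
import math


def strategic_flip_mask(scores, k_percent):
    """Return the k percent of variants with the largest score.

    scores: mapping from variant identifier to a precomputed
    discriminative-power score (hypothetical input).
    """
    n_mask = math.ceil(len(scores) * k_percent / 100)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_mask])


scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
print(strategic_flip_mask(scores, 50))
```

Unlike random flipping, the mask here is fixed by the ranking, so the same variants are hidden regardless of who is being queried.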

Real-time Flipping (RTF) Method. Mask a different proportion of rare SNPs for each individual. More efficient masking; real-time performance; better utility; safer environment; lower re-identification risk. Utility: number of correctly answered queries / total number of queries. The smaller the p-value, the larger the noise.
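The utility measure defined above is a simple ratio and can be computed as follows. The function name and the representation of answers as parallel boolean lists are assumptions for illustration.

```python
def utility(true_answers, served_answers):
    """Fraction of queries whose served answer matches the true answer."""
    if len(true_answers) != len(served_answers):
        raise ValueError("answer lists must have the same length")
    if not true_answers:
        raise ValueError("at least one query is required")
    correct = sum(t == s for t, s in zip(true_answers, served_answers))
    return correct / len(true_answers)


# One of four answers was flipped from Yes to No.
print(utility([True, True, False, True], [True, False, False, True]))
```

Every masked "Yes" lowers this ratio, so a method that masks fewer variants for the same level of protection scores higher.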

Experiments. Data: Phase 3 of the 1000 Genomes Project. Beacon database: 1,235 unrelated individuals. Control group: 300 genomes, used to validate the power of the LRT attack. The LRT attack is performed with queries in four orders: random; rare-first (“optimal” attack) [1]; discriminative-first [2]; typical user (statistics from Beacon Browser logs) [1]. Allele frequencies (AFs) are assumed to be known (ExAC). References: 1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017). 2. Wan, Zhiyu, et al. "Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services." BMC Medical Genomics.
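One common way to estimate the power of such an attack from a control group is sketched below. This is an assumption about the evaluation, not a procedure stated in the slides: the rejection threshold is taken as the empirical alpha-quantile of the control-group statistics, and smaller statistics are assumed to favor H1. The function name and the default `alpha=0.05` are hypothetical.

```python
def empirical_power(case_stats, control_stats, alpha=0.05):
    """Fraction of in-database genomes flagged at false-positive rate <= alpha.

    case_stats: test statistics for genomes that are in the beacon.
    control_stats: test statistics for control genomes that are not.
    """
    ordered = sorted(control_stats)
    # Threshold chosen so that at most alpha of the controls fall strictly below it.
    threshold = ordered[int(alpha * len(ordered))]
    rejected = sum(1 for s in case_stats if s < threshold)
    return rejected / len(case_stats)


controls = [float(i) for i in range(100)]
cases = [-1.0, -2.0, 10.0, 3.0]
print(empirical_power(cases, controls))
```

A power near 1.0 means almost every database member can be re-identified at the chosen false-positive rate, while a low value means the attack rarely succeeds.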

Experiments: Statistics. Beacon database: 5,046,666 variants from Chr10 and Chr21, of which 2,002,246 (39.7%) are rare variants. Chr10: 3,992,219 variants, of which 1,588,903 (39.8%) are rare. Chr21: 1,054,447 variants, of which 413,343 (39.2%) are rare. Parameters: random flipping method, ε = 0.15 (flip 15% of rare SNPs); strategic flipping method, k = 5 (flip the 5% of SNPs with the largest discriminative power).

Re-identification Risk (power of the LRT). Under the rare-first order: RF's power increases to 1.0 (100% re-identification risk [1]) when 1,000 rare SNPs are queried, while RTF remains around 0.3 (low re-identification risk [1]). Under the other three query orders: SF's power increases to 1.0 (100% re-identification risk [1]) when 1,000 rare SNPs are queried, while RTF remains around 0.3 (low re-identification risk [1]). Chr1 & Chr21 for random & typical user. Reference: 1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).

The percentage of flipped rare SNPs. Rare-first: RF's power increases to 1.0 when 1,000 rare SNPs are queried, while RTF remains around 0.3. Other three orders: SF's power increases to 1.0 when 1,000 rare SNPs are queried, while RTF remains around 0.3.

The percentage of flipped SNPs. Figure: the percentage of flipped rare SNPs under different query patterns.

Secure-Beacon Workflow

Secure-Beacon Interface

Acknowledgements: NIH R01HG007078 and U01EB023685; NSF CNS1408874; Indiana University Initiative of Precision Health.

AMIA is the professional home for more than 5,400 informatics professionals, representing frontline clinicians, researchers, public health experts and educators who bring meaning to data, manage information and generate new knowledge across the research and healthcare enterprise.

Thank you! Email me at: diybu(at)indiana.edu