Real-time Protection for Open Beacon Network
Open Data and Big Science (S07)
Diyue Bu, Xiaofeng Wang, Haixu Tang
School of Informatics, Computing and Engineering, Indiana University Bloomington
Disclosure I and my spouse/partner have no relevant relationships with commercial interests to disclose. AMIA 2018 | amia.org
Learning Objectives
After participating in this session the learner should be better able to:
- Describe the Beacon Network.
- Identify the potential attacks on the Beacon Network.
- Explain the mechanism of the real-time flipping mitigation method.
Presentation Outline
- Background Introduction
- Beacon Network
- Attacks on Beacon Network
- Existing Mitigation Methods
- Proposed Method: Real-time Flipping (RTF) Method
- Experiments & Results
- Secure-Beacon Implementation
The Beacon Network
https://beacon-network.org/
“The Beacon Network is a search engine across the world's public beacons. It enables global discovery of genetic mutations, federated across a large and growing network of shared genetic datasets.”
Reference: Global Alliance for Genomics and Health. "A federated ecosystem for sharing genomic, clinical data." Science 352, 1278–1280 (2016).
What's a Beacon?
- Beacon is a genetic mutation sharing platform developed by the Global Alliance for Genomics and Health.
- A beacon is a web service that any institution can implement to share genetic data.
- A beacon answers questions of the form "Do you have information about the following mutation?" and responds with "Yes" or "No", potentially along with more information.
- A site offering this service is called a "beacon".
- This open web service is designed to be technically simple while giving data generators options for distributing data through proportional safeguards.
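The yes/no query interface described on this slide can be sketched as follows. This is a minimal illustration only, not the GA4GH Beacon API: the `ToyBeacon` class, its `query` method, and the (chromosome, position, alternate allele) variant key are hypothetical names chosen for the example.

```python
class ToyBeacon:
    """Hypothetical in-memory beacon; not the GA4GH Beacon API."""

    def __init__(self, genomes):
        # genomes: mapping from an individual ID to a set of variants,
        # each variant keyed as (chromosome, position, alternate_allele).
        self.variants = set()
        for individual_variants in genomes.values():
            self.variants.update(individual_variants)

    def query(self, chromosome, position, alternate_allele):
        # Answer "Yes" (True) if at least one genome carries the variant.
        return (chromosome, position, alternate_allele) in self.variants


beacon = ToyBeacon({
    "sample_a": {("10", 1000, "T"), ("21", 2000, "G")},
    "sample_b": {("10", 1000, "T")},
})
print(beacon.query("10", 1000, "T"))  # True
print(beacon.query("21", 9999, "A"))  # False
```

The response reveals only presence or absence, which is exactly the signal the attacks on the following slides accumulate over many queries.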
Attacks on Beacon Network
- Shringarpure and Bustamante (SB) Attack [1]
- “Optimal” Attack [2]
- Inference attack using linkage disequilibrium (LD) & Markov chain model [3]
References:
1. Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646.
2. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
3. von Thenen, Nora, et al. "Inference Attacks Against Genomic Data-Sharing Beacons." GenoPri17.
SB (LRT) Attack
Given the responses to n queries, the attacker tests:
- H0: The queried victim’s genome is not in the target database.
- H1: The queried victim’s genome is in the target database.
The power of the test measures the re-identification risk of an individual genome in a genomic database.
Power of the test: the probability of correctly rejecting the null hypothesis over multiple tests. It indicates how confidently the attacker can conclude that the victim (with the queried variants) is present in the target database.
Reference: Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646.
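The slide does not spell out the test statistic. The sketch below assumes one commonly used form of the log-likelihood ratio for this setting, with known per-variant alternate allele frequencies: a beacon of N diploid individuals answers "No" for a variant of frequency f with probability (1 - f)^(2N) under H0, and with probability δ(1 - f)^(2(N-1)) under H1, where δ is a small sequencing-error rate. The function names, the default `delta`, and the way the rejection threshold is taken from a control group are assumptions made for illustration.

```python
import math


def lrt_statistic(responses, allele_freqs, n_individuals, delta=1e-6):
    """Log-likelihood ratio log L(H0) - log L(H1) over the queried variants.

    responses: beacon answers (True = Yes) for variants the victim carries.
    allele_freqs: alternate allele frequency of each queried variant,
                  assumed known and strictly between 0 and 1.
    """
    total = 0.0
    for answered_yes, f in zip(responses, allele_freqs):
        # Probability that none of N (or N - 1) diploid genomes carries it.
        d_n = (1.0 - f) ** (2 * n_individuals)
        d_n_minus_1 = (1.0 - f) ** (2 * (n_individuals - 1))
        if answered_yes:
            total += math.log(1.0 - d_n) - math.log(1.0 - delta * d_n_minus_1)
        else:
            total += math.log(d_n) - math.log(delta * d_n_minus_1)
    return total


def empirical_power(member_stats, control_stats, alpha=0.05):
    """Fraction of database members for whom H0 is rejected.

    The threshold is chosen so that at most a fraction alpha of the
    control genomes (known to be outside the database) fall below it.
    """
    ranked = sorted(control_stats)
    index = max(int(alpha * len(ranked)) - 1, 0)
    threshold = ranked[index]
    rejected = sum(1 for s in member_stats if s < threshold)
    return rejected / len(member_stats)
```

Under this form, every "Yes" on a rare variant pushes the statistic down (towards H1), while a single "No" pushes it sharply up, which is why masking a few rare variants can protect an individual.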
Attacks on Beacon Network
- Shringarpure and Bustamante (SB) Attack [1]: an inference attack based on the log-likelihood ratio test
- “Optimal” Attack [2]: query variants in rare-first order
- Inference attack using linkage disequilibrium (LD) & Markov chain model [3]
References:
1. Shringarpure, Suyash S., and Carlos D. Bustamante. "Privacy risks from genomic data-sharing beacons." The American Journal of Human Genetics 97.5 (2015): 631-646.
2. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
3. von Thenen, Nora, et al. "Inference Attacks Against Genomic Data-Sharing Beacons." GenoPri17.
Previous Mitigation Methods
- Random Flipping (RF) Method [1]: randomly mask a proportion (ℇ) of rare SNPs.
- Query Budget Method [1]: remove an individual’s genome information if a high re-identification risk is detected.
Rare SNPs: SNPs carried by only one individual in the database.
Mask/flip SNPs: a mechanism used by several mitigation methods, which switches the answer to a query from Yes to No. The answer is never switched from No to Yes, because such a search result may confuse researchers.
Reference:
1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
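The random flipping idea on this slide can be sketched as below, under simplifying assumptions: the beacon is represented by a hypothetical `carrier_counts` mapping from variant to number of carriers, a fixed fraction `epsilon` of the rare SNPs is drawn once with a seeded generator, and answers are only ever switched from Yes to No. The function names are illustrative and not taken from the cited paper.

```python
import random


def choose_masked_snps(carrier_counts, epsilon, seed=0):
    """Randomly pick a fraction epsilon of the rare SNPs to mask."""
    # Rare SNPs: carried by exactly one individual in the database.
    rare = sorted(v for v, count in carrier_counts.items() if count == 1)
    rng = random.Random(seed)
    n_masked = round(epsilon * len(rare))
    return set(rng.sample(rare, n_masked))


def answer_query(variant, carrier_counts, masked):
    """Beacon answer after flipping: masked Yes answers become No."""
    present = carrier_counts.get(variant, 0) > 0
    # A No is never turned into a Yes.
    return present and variant not in masked


# 20 rare SNPs and 5 common ones in a toy database.
counts = {f"rare_{i}": 1 for i in range(20)}
counts.update({f"common_{i}": 40 for i in range(5)})
masked = choose_masked_snps(counts, epsilon=0.15)
print(len(masked))  # 3
```

Because the masked set is drawn without regard to who carries which SNP, some individuals may lose few or none of their rare SNPs.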
Previous Mitigation Methods
- Strategic Flipping (SF) Method [1]: mask the k percent of variants with the largest discriminative power.
- Eliminating Random Positions & Biased Randomized Response [2]
- Pitfall: flip out-of-target variants
References:
1. Wan, Zhiyu, et al. "Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services." BMC medical genomics
2. Al Aziz, Md Momin, et al. "Aftermath of bustamante attack on genomic beacon service." BMC medical genomics
Real-time Flipping (RTF) Method
- Mask a different proportion of rare SNPs for each individual
- More efficient masking
- Real-time performance
- Better utility
- Safer environment
- Lower re-identification risk
Utility: number of correctly answered queries / total number of queries.
The smaller the p-value, the larger the noise.
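The utility measure defined on this slide is a simple ratio. A minimal sketch, assuming the true answers and the answers actually returned by the protected beacon are available as two equal-length lists (the `utility` function name is hypothetical):

```python
def utility(true_answers, returned_answers):
    """Number of correctly answered queries / total number of queries."""
    if not true_answers or len(true_answers) != len(returned_answers):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for t, r in zip(true_answers, returned_answers) if t == r)
    return correct / len(true_answers)


truth = [True, True, False, True]
returned = [True, False, False, True]  # one Yes was flipped to No
print(utility(truth, returned))  # 0.75
```

Every flipped answer lowers this ratio, so a method that masks fewer SNPs for the same protection scores better.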
Experiments
- Data: Phase 3 of the 1000 Genomes Project
- Beacon database: 1,235 unrelated individuals
- Control group: 300 genomes
- Perform the LRT attack with queries in the following orders:
  - Random
  - Rare-first (“optimal” attack) [1]
  - Discriminative-first [2]
  - Typical user (statistics from Beacon Browser logs) [1]
Allele frequencies (AFs) are assumed to be known (ExAC).
The control group is used to validate the power of the LRT attack.
References:
1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
2. Wan, Zhiyu, et al. "Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services." BMC medical genomics
Experiments: Statistics
Beacon database
- 5,046,666 variants from Chr10 and Chr21 (3,992,219 from Chr10; 1,054,447 from Chr21)
- 2,002,246 (39.7%) rare variants (Chr10: 1,588,903, 39.8%; Chr21: 413,343, 39.2%)
Parameters
- Random flipping method: ℇ = 0.15 (flip 15% of rare SNPs)
- Strategic flipping method: k = 5 (flip the 5% of SNPs with the largest discriminative power)
Re-identification Risk (power of LRT test)
- Under rare-first order: RF’s power increases to 1.0 (100% re-identification risk [1]) when 1000 rare SNPs are queried, while RTF remains around 0.3 (low re-identification risk [1]).
- Under the other three query orders: SF’s power increases to 1.0 (100% re-identification risk [1]) when 1000 rare SNPs are queried, while RTF remains around 0.3 (low re-identification risk [1]).
Chr1 & Chr21 for random & typical user
Reference:
1. Raisaro, Jean Louis, et al. "Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks." Journal of the American Medical Informatics Association (2017).
The percentage of flipped rare SNPs
- Rare-first order: RF’s power increases to 1.0 when 1000 rare SNPs are queried, while RTF remains around 0.3.
- Other three query orders: SF’s power increases to 1.0 when 1000 rare SNPs are queried, while RTF remains around 0.3.
The percentage of flipped SNPs
The percentage of flipped rare SNPs under different query patterns
Secure-Beacon Workflow
Secure-Beacon Interface
Acknowledgements
- NIH R01HG007078 and U01EB023685
- NSF CNS1408874
- Indiana University Initiative of Precision Health
Thank you!
Email me at: diybu(at)indiana.edu