Part II: Potential Genetic Privacy Risks



Genetic Privacy

Genetic Privacy
Techniques for breaching genetic privacy: identity or trait attacks that aggregate data such as genotypes, genome-wide association study (GWAS) statistics, DNA sequences, and metadata.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Infringement of Genetic Privacy
- Identity attack: personally identifiable information
- Trait attack: trait information associated with genomic information (e.g. genotypes/sequences)
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Genetic Privacy Risks
- Identity tracing attacks
- Attribute disclosure attacks via DNA
- Completion attacks
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Identity tracing attacks
The goal of identity tracing attacks is to uniquely identify an anonymous DNA sample using quasi-identifiers, residual pieces of information that are embedded in the dataset.
- Searching with metadata
- Identity tracing by genealogical triangulation
- Identity tracing by phenotypic prediction
- Identity tracing by side-channel leaks
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Searching with metadata
- Unrestricted demographic information conveys substantial power for identity tracing.
- The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule limits which demographic identifiers may be released.
- Pedigree structures contain rich information, especially when large kinships are available.
- Another vulnerability of pedigrees is that demographic quasi-identifiers can be combined across records to boost identity tracing despite HIPAA protections.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.
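
To illustrate why combined demographic fields are powerful quasi-identifiers, the sketch below counts how many records become unique as fields are combined. The records, field layout, and function name are hypothetical and chosen only for illustration; they do not come from the slides or the cited review.

```python
from collections import Counter

# Hypothetical de-identified records: (birth_year, sex, zip3). Illustrative only.
records = [
    (1970, "F", "021"),
    (1970, "F", "021"),
    (1970, "M", "021"),
    (1985, "F", "100"),
    (1985, "F", "945"),
    (1985, "M", "945"),
]

def unique_fraction(rows, fields):
    """Fraction of rows whose combination of the chosen fields is unique."""
    keys = [tuple(row[i] for i in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

sex_only = unique_fraction(records, [1])         # one field
year_sex = unique_fraction(records, [0, 1])      # two fields
all_three = unique_fraction(records, [0, 1, 2])  # three fields

print(sex_only, year_sex, all_three)
```

In this toy table no record is unique by sex alone, while most records are unique once all three fields are combined, which is the effect an adversary relies on.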

Identity tracing by genealogical triangulation
- Genetic genealogy attracts millions of individuals interested in their ancestry or in discovering distant relatives.
- One potential route of identity tracing is surname inference from Y-chromosome data.
- The main limitation of surname inference is that haplotype matching relies on comparing Y-chromosome short tandem repeats (Y-STRs).
- An open research question is the utility of non-Y-chromosome markers for genealogical triangulation.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.
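
Surname inference by Y-STR haplotype matching can be sketched roughly as follows. The surnames, repeat counts, mismatch threshold, and function names are all hypothetical; real genealogy databases use many more markers and more careful matching rules, so this is only a minimal illustration of the idea.

```python
# Hypothetical surname database: surname -> Y-STR profile (marker -> repeat count).
# The repeat counts are made up for illustration.
genealogy_db = {
    "Smith": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13},
    "Jones": {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 13},
    "Brown": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 12},
}

def mismatches(a, b):
    """Count markers typed in both profiles whose repeat counts differ."""
    shared = set(a) & set(b)
    return sum(1 for marker in shared if a[marker] != b[marker])

def infer_surname(target, db, max_mismatches=1):
    """Return the closest surname, or None if no profile is close enough.

    Assumes every profile types the same markers as the target.
    """
    best_score, best_name = min(
        (mismatches(target, profile), name) for name, profile in db.items()
    )
    return best_name if best_score <= max_mismatches else None

anonymous_sample = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
unrelated_sample = {"DYS19": 16, "DYS390": 21, "DYS391": 9, "DYS393": 15}

match = infer_surname(anonymous_sample, genealogy_db)
no_match = infer_surname(unrelated_sample, genealogy_db)
print(match, no_match)
```

The sketch also shows the limitation noted on the slide: the inference works only for samples that carry Y-STR data and only for surnames already present in the database.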

Identity tracing by phenotypic prediction
- Predictions of visible phenotypes from genetic data could serve as quasi-identifiers for identity tracing.
- Twin studies have estimated high heritabilities for various visible traits such as height and facial morphology.
- Age prediction is possible from DNA specimens derived from blood samples.
- But the applicability of these DNA-derived quasi-identifiers for identity tracing has yet to be demonstrated.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Identity tracing by side-channel leaks
- Side-channel attacks exploit quasi-identifiers that are unintentionally encoded in the building blocks and structure of the database rather than in the actual data that is meant to be public.
- The mechanism used to generate database accession numbers can also leak personal information.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Attribute disclosure attacks via DNA (ADAD)
ADAD attack: The adversary gains access to the DNA sample of the target. He or she uses the identified DNA to search genetic databases with sensitive attributes. A match between the identified DNA and the database links the person and the attribute.
- The simplest scenario
- The summary statistic scenario
- The gene expression scenario
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

ADAD: the simplest scenario
- The adversary can simply match the genotype data that is associated with the identity of the individual against the genotype data that is associated with the attribute.
- Such an attack requires only a small number of autosomal single nucleotide polymorphisms (SNPs).
- ADAD is a theoretical vulnerability of virtually any individual-level DNA-derived omics dataset, such as RNA-seq and personal proteomics.
- Genome-wide association studies (GWAS) are highly vulnerable to ADAD.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.
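
A back-of-the-envelope calculation suggests why a small SNP panel can be enough for matching. The sketch assumes biallelic SNPs in Hardy-Weinberg equilibrium, unrelated individuals, and independent SNPs (no linkage disequilibrium); the panel size of 30 and the allele frequency of 0.5 are illustrative assumptions, not figures from the slides.

```python
def per_snp_match_probability(maf):
    """Probability that two unrelated people share a genotype at one biallelic
    SNP, assuming Hardy-Weinberg equilibrium."""
    p, q = maf, 1.0 - maf
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)

def random_match_probability(mafs):
    """Probability of a chance match across all SNPs, assuming independence."""
    prob = 1.0
    for maf in mafs:
        prob *= per_snp_match_probability(maf)
    return prob

single = per_snp_match_probability(0.5)        # 0.375 for a SNP with MAF 0.5
panel = random_match_probability([0.5] * 30)   # 30 hypothetical independent SNPs
print(single, panel)
```

Under these assumptions the chance that two unrelated people agree on every SNP shrinks geometrically with panel size, so a few dozen well-chosen SNPs already make an accidental match very unlikely.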

ADAD: the summary statistic scenario
- When the target's genotypes are part of the case group, the group's allele frequencies are biased towards the target's genotypes relative to the allele frequencies of the general population.
- The actual risk of ADAD has been the subject of intense debate.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.
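
The bias described above can be turned into a membership test. The sketch below follows the general form of the distance statistic used in Homer-style attacks: for each SNP it compares how far the target's genotype is from the reference population frequency versus the pooled (case group) frequency, then summarises the differences with a one-sample t statistic. The simulated data, pool size, SNP count, and function names are assumptions made for illustration only.

```python
import math
import random

def genotype(p, rng):
    """Draw one diploid genotype as an allele fraction in {0, 0.5, 1}."""
    return ((rng.random() < p) + (rng.random() < p)) / 2

def membership_t(target, reference_freqs, pool_freqs):
    """t statistic of D_j = |y_j - ref_j| - |y_j - pool_j| across SNPs.

    Large positive values suggest the target is closer to the pool than to
    the reference population, i.e. likely part of the pool.
    """
    d = [abs(y - r) - abs(y - m)
         for y, r, m in zip(target, reference_freqs, pool_freqs)]
    mean = sum(d) / len(d)
    var = sum((x - mean) ** 2 for x in d) / (len(d) - 1)
    return mean / math.sqrt(var / len(d))

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n_snps, pool_size = 10000, 25

# Simulated population allele frequencies and a simulated case group.
reference_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
pool = [[genotype(p, rng) for p in reference_freqs] for _ in range(pool_size)]
pool_freqs = [sum(person[j] for person in pool) / pool_size
              for j in range(n_snps)]
outsider = [genotype(p, rng) for p in reference_freqs]

t_member = membership_t(pool[0], reference_freqs, pool_freqs)
t_outsider = membership_t(outsider, reference_freqs, pool_freqs)
print(t_member, t_outsider)
```

In this idealised simulation the pool member scores far higher than the outsider. Real studies involve reference-panel mismatch, genotyping error, and correlated SNPs, which is part of why the practical risk has been debated.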

ADAD: the gene expression scenario
- NIH's Gene Expression Omnibus (GEO) publicly holds hundreds of thousands of gene expression profiles.
- ADAD technique:
  1. The method starts with a training step that employs a standard expression quantitative trait loci (eQTL) analysis with a reference dataset.
  2. Next, the algorithm scans the public expression profiles.
  3. Last, the algorithm matches the target's genotype with the inferred allelic distributions of each expression profile and tests the hypothesis that the match is random.
- This ADAD technique has the potential for relatively high accuracy in ideal conditions.
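
The three steps above can be sketched in a heavily simplified form. The sketch assumes the training step has already produced, for each eQTL gene, a mean expression level per genotype (0, 1, or 2 copies of an allele) and a shared standard deviation; it models expression as Gaussian with a uniform genotype prior and omits the final test of whether the match is random. The gene names, numbers, and functions are hypothetical and are not taken from the published method.

```python
import math

# Hypothetical output of the eQTL training step:
# gene -> (mean expression for genotypes 0, 1, 2; shared standard deviation).
eqtl_model = {
    "GENE_A": ((1.0, 2.0, 3.0), 0.5),
    "GENE_B": ((5.0, 4.0, 3.0), 0.5),
}

def genotype_posterior(expr, means, sd):
    """Posterior over genotypes 0/1/2 given one expression value,
    assuming Gaussian noise and a uniform prior over genotypes."""
    lik = [math.exp(-0.5 * ((expr - mu) / sd) ** 2) for mu in means]
    total = sum(lik)
    return [value / total for value in lik]

def match_score(target_genotypes, profile, model):
    """Log-probability of the target's genotypes under the allelic
    distributions inferred from one expression profile."""
    score = 0.0
    for gene, g in target_genotypes.items():
        means, sd = model[gene]
        score += math.log(genotype_posterior(profile[gene], means, sd)[g])
    return score

# Scan hypothetical public expression profiles for the best match.
target = {"GENE_A": 2, "GENE_B": 0}
public_profiles = {
    "profile_1": {"GENE_A": 2.9, "GENE_B": 5.1},
    "profile_2": {"GENE_A": 1.1, "GENE_B": 3.2},
}
best = max(public_profiles,
           key=lambda name: match_score(target, public_profiles[name], eqtl_model))
print(best)
```

With only two genes the ranking is trivial; the point is that each eQTL contributes a little evidence about genotype, and the evidence accumulates across many genes.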

Completion attacks
- Genotype imputation: Jim Watson's predisposition for Alzheimer's disease could be inferred at the ApoE locus despite masking of this gene.
- In the basic setting, the adversary obtains access to a single genetic dataset of a known individual. He then exploits this information to estimate genetic predispositions for relatives whose genetic information is inaccessible.
- This attack takes advantage of self-identified genetic datasets from OpenSNP.org.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.
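
The basic setting can be illustrated with simple Mendelian inheritance. The sketch below estimates a child's genotype distribution at one biallelic locus from a single known parent, assuming the other parent's transmitted allele is drawn at the population allele frequency (random mating). The function name and the example frequency of 0.1 are assumptions for illustration.

```python
def child_genotype_distribution(parent_genotype, allele_freq):
    """Distribution of a child's risk-allele count (0, 1, or 2).

    parent_genotype: risk-allele count (0, 1, or 2) of the known parent.
    allele_freq: population frequency of the risk allele, used for the
                 allele transmitted by the unknown parent.
    """
    t = parent_genotype / 2  # chance the known parent transmits the risk allele
    q = allele_freq          # chance the unknown parent transmits it
    return {
        0: (1 - t) * (1 - q),
        1: t * (1 - q) + (1 - t) * q,
        2: t * q,
    }

# Known parent is heterozygous; risk allele frequency assumed to be 0.1.
posterior = child_genotype_distribution(1, 0.1)
carrier_given_parent = posterior[1] + posterior[2]
carrier_prior = 1 - (1 - 0.1) ** 2  # carrier probability with no family data
print(carrier_given_parent, carrier_prior)
```

Here one relative's public genotype raises the estimated carrier probability for the child from about 0.19 to about 0.55, even though the child never shared any data.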

Completion attacks
- In the advanced setting, the adversary has access to the genealogical and genetic information of multiple relatives of the target.
- The algorithm finds relatives of the target that donated their DNA to the reference panel and that reside on a unique genealogical path that includes the target.
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409.

Homer's attack
- Homer's attack motivated the NIH to move genotype and phenotype data from the public domain to controlled access through dbGaP.
- The paper uses high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture.
Homer N, Szelinger S, Redman M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008;