GenoGuard: Protecting Genomic Data against Brute-Force Attacks

Presentation transcript:

GenoGuard: Protecting Genomic Data against Brute-Force Attacks
36th IEEE Symposium on Security and Privacy, May 18, 2015
Zhicong Huang (1), Erman Ayday (2), Jacques Fellay (3), Jean-Pierre Hubaux (1), Ari Juels (4)
(1) School of Computer and Communication Sciences, EPFL
(2) Bilkent University
(3) School of Life Sciences, EPFL
(4) Cornell Tech (Jacobs)

The Genomic Avalanche Is Coming…

Things are Moving
CS giants start proposing genome-related services
- Google Genomics (API to store, process, explore, and share DNA data)
- IBM Research (computational genomics)
- Microsoft Research (genomic research in collaboration with Sanger Center)
- Apple (the ResearchKit program)
Global Alliance for Genomics & Health
- Definition of a common framework for effective, responsible and secure sharing of genomic and clinical data
- Security Working Group: security infrastructure policy and technology
- http://genomicsandhealth.org/our-work/working-groups/security-working-group/work-products

Background: Genomics

Genomics Background
Single Nucleotide Variant (SNV)
- 4 million SNVs per individual, a subset of the 50 million SNVs that have been discovered
- Major allele (0), minor allele (1)
- Correlations between SNVs
Genotype data (to be protected)
- Consider a pair of chromosomes (out of the 23 pairs in the human genome)
- Each SNV position is encoded as the number of minor alleles (0, 1, or 2)
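The genotype encoding described on this slide can be illustrated with a short Python sketch. The function name `genotype` and the two allele lists are hypothetical and chosen only for illustration; the encoding rule itself (count the minor alleles at each SNV position across the two chromosome copies) is the one stated above.

```python
def genotype(hap_a, hap_b):
    """Count minor alleles per SNV position across the two chromosome copies."""
    if len(hap_a) != len(hap_b):
        raise ValueError("both copies must cover the same SNV positions")
    return [a + b for a, b in zip(hap_a, hap_b)]

# Hypothetical alleles at 5 SNV positions: 0 = major allele, 1 = minor allele.
copy_one = [0, 1, 0, 1, 1]
copy_two = [0, 1, 1, 0, 1]

genotype_seq = genotype(copy_one, copy_two)
print(genotype_seq)  # [0, 2, 1, 1, 2]
```

Each entry of the result lies in {0, 1, 2}, which is the alphabet the later slides assume for a sequence M.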

Genomic Privacy
High sensitivity
- Predisposition to disease
- Genetic discrimination: denial of access to health insurance, education, and employment
- GATTACA, 1997 movie
Long-term data value
- But attackers' computing power keeps increasing
Can the protection survive longer?

Background: Honey Encryption [1]
[1] A. Juels, T. Ristenpart. Honey Encryption: Security Beyond the Brute-Force Bound. EUROCRYPT, 2014.

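Honey Encryption
Conventional encryption (figure labels: message Gene_Q; strings eiKangLpkandlf and ddUoIOkesLhKnb)
- Correct password: decryption returns the message
- Wrong password: decryption returns a string that is recognizably not a valid message
Honey encryption
- Correct password: decryption returns the message Gene_Q
- Wrong password: decryption returns a different but plausible message, such as Gene_R
The threat of brute-force attacks is mitigated by using honey encryption.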

Distribution-Transforming Encoder (DTE)
The encoder maps a message to a seed, and the seed is what gets encrypted. The decoder maps a seed back to a message.
Seed space (uniform) to message space (non-uniform):
- 00: "Gene_Q", p = 1/4
- 01: "Gene_R", p = 1/4
- 10 and 11: "Gene_S", p = 1/2
pm: original message distribution
pd: DTE message distribution (the probability of getting a message by decoding a randomly picked seed)

Distribution-Transforming Encoder (DTE)
With seeds 00 for "Gene_Q" (p = 1/4), 01 for "Gene_R" (p = 1/4), and 10 and 11 for "Gene_S" (p = 1/2), each message owns a share of the uniform seed space equal to its probability, so pd = pm.
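The toy DTE on this slide, combined with the "encrypt the seed" step, can be sketched as follows. This is a minimal illustration and not the construction from [1] or the GenoGuard code: the XOR pad, the PBKDF2 parameters, the salt, and all function names are assumptions made for the example. Only the seed-to-message table comes from the slide.

```python
import hashlib
import secrets

# Seed table from the slide: 2-bit seeds, "Gene_S" owns two seeds (p = 1/2).
SEEDS = {"Gene_Q": [0b00], "Gene_R": [0b01], "Gene_S": [0b10, 0b11]}
DECODE = {seed: msg for msg, seeds in SEEDS.items() for seed in seeds}

def pad_from_password(password, salt):
    """Derive a 2-bit pad from the password (KDF parameters are assumptions)."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)
    return key[0] & 0b11

def honey_encrypt(message, password, salt):
    seed = secrets.choice(SEEDS[message])            # DTE encode: pick a seed of the message
    return seed ^ pad_from_password(password, salt)  # encrypt the seed

def honey_decrypt(ciphertext, password, salt):
    seed = ciphertext ^ pad_from_password(password, salt)
    return DECODE[seed]                              # DTE decode: every seed maps to a message

salt = b"demo-salt"  # hypothetical fixed salt, for reproducibility of the demo only
ciphertext = honey_encrypt("Gene_Q", "correct horse", salt)
print(honey_decrypt(ciphertext, "correct horse", salt))  # Gene_Q
print(honey_decrypt(ciphertext, "wrong guess", salt))    # always one of the three valid messages
```

Because every 2-bit value is a valid seed, a wrong password never produces an error or gibberish, which is the property the honey encryption slide relies on.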

GenoGuard

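DTE on Genome Sequences
n is the number of SNVs.
Probability of a sequence M = (m1, m2, …, mn), where each mi is from the set {0, 1, 2}, by the chain rule:
P(M) = P(m1) · P(m2 | m1) · … · P(mn | m1, m2, …, mn-1)
To encode the sequence: divide the seed space step by step in a traversal of the sequence, conditioning each step on the subsequence seen so far, up to (m1, m2, …, mn-1).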

Example (pd = pm)
Number of SNVs: n = 3
Sequence: M = (0, 2, 1)
Seed space: [0, 2^L - 1] (L-bit representation)
P(m1 = 0) = 0.6
P(m2 = 2 | m1 = 0) = 0.1
P(m3 = 1 | M1,2 = (0, 2)) = 0.3
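The interval subdivision in this example can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: L = 32 is chosen arbitrarily, the function names are hypothetical, and the table in `cond_probs` is hypothetical except for the three probabilities given on the slide (0.6, 0.1, 0.3).

```python
import secrets

L = 32  # seed length in bits (assumption)

def cond_probs(prefix):
    """Hypothetical model: P(next genotype = 0, 1, 2 | prefix).
    Only 0.6, 0.1 and 0.3 (for the path 0, 2, 1) come from the slide."""
    table = {
        (): (0.6, 0.3, 0.1),      # P(m1 = 0) = 0.6
        (0,): (0.7, 0.2, 0.1),    # P(m2 = 2 | m1 = 0) = 0.1
        (0, 2): (0.5, 0.3, 0.2),  # P(m3 = 1 | (0, 2)) = 0.3
    }
    return table.get(tuple(prefix), (0.5, 0.3, 0.2))

def split(lo, hi, probs):
    """Split the integer interval [lo, hi] into three consecutive subintervals."""
    size = hi - lo + 1
    c1 = lo + int(size * probs[0])
    c2 = lo + int(size * (probs[0] + probs[1]))
    return [(lo, c1 - 1), (c1, c2 - 1), (c2, hi)]

def interval(seq):
    """Narrow [0, 2^L - 1] while traversing the sequence."""
    lo, hi = 0, 2 ** L - 1
    for i, m in enumerate(seq):
        lo, hi = split(lo, hi, cond_probs(seq[:i]))[m]
    return lo, hi

def encode(seq):
    lo, hi = interval(seq)
    return lo + secrets.randbelow(hi - lo + 1)  # uniform seed inside the final interval

def decode(seed, n):
    lo, hi, seq = 0, 2 ** L - 1, []
    for _ in range(n):
        for g, (a, b) in enumerate(split(lo, hi, cond_probs(seq))):
            if a <= seed <= b:
                seq.append(g)
                lo, hi = a, b
                break
    return seq

seed = encode([0, 2, 1])
print(decode(seed, 3))  # [0, 2, 1]
lo, hi = interval([0, 2, 1])
print((hi - lo + 1) / 2 ** L)  # close to 0.6 * 0.1 * 0.3 = 0.018
```

The share of the seed space assigned to M = (0, 2, 1) is approximately the product of the three conditional probabilities, which is what makes pd track pm.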

Finite-Precision DTE
Under an L-bit representation of the seed space, pd only approximates pm (pm ≈ pd): for some messages the probability increases, for others it decreases.
[Figure: probability against message, comparing pm with pd under L-bit representation]
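The effect of finite precision can be made concrete with a small calculation. The sketch below is an assumption-laden illustration: the rounding rule (round down, let the last message absorb the remainder), the function names, and the uniform three-message distribution are all hypothetical and are not taken from the slides.

```python
from fractions import Fraction

def seed_counts(probs, L):
    """Give each message a whole number of L-bit seeds, roughly proportional to pm."""
    total = 2 ** L
    counts = [int(p * total) for p in probs]  # round down
    counts[-1] += total - sum(counts)         # last message absorbs the remainder
    return counts

def max_gap(probs, L):
    """Largest |pd - pm| over all messages for an L-bit seed space."""
    total = 2 ** L
    return max(abs(Fraction(c, total) - p)
               for c, p in zip(seed_counts(probs, L), probs))

# Hypothetical pm that cannot be represented exactly with any number of bits.
p_m = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
for bits in (2, 8, 16, 32):
    print(bits, float(max_gap(p_m, bits)))
```

In this toy setting the gap between pd and pm shrinks as L grows, which is consistent with the slide's statement that pm ≈ pd under an L-bit representation.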

A Simple Brute-Force Attack
- One correct password among 1000 passwords
- Compute the probability of each decrypted sequence
[Figure: results for conventional encryption and for GenoGuard]
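The GenoGuard side of this experiment can be imitated with a toy simulation. Everything specific here is an assumption for illustration: the 64-bit seed, the independent per-SNV model with probabilities 0.6, 0.3, 0.1, the SHA-256 pad standing in for a real key-derivation function, the password list, and the 10-SNV sequence. The conventional-encryption baseline from the slide is not reproduced.

```python
import hashlib
import math
import secrets

L = 64               # seed length in bits (assumption)
WEIGHTS = (6, 3, 1)  # hypothetical P(genotype = 0, 1, 2) = 0.6, 0.3, 0.1 at every SNV
N = 10               # toy sequence length

def split(lo, hi):
    """Split [lo, hi] into three subintervals proportional to WEIGHTS."""
    size = hi - lo + 1
    c1 = lo + size * WEIGHTS[0] // 10
    c2 = lo + size * (WEIGHTS[0] + WEIGHTS[1]) // 10
    return [(lo, c1 - 1), (c1, c2 - 1), (c2, hi)]

def encode(seq):
    lo, hi = 0, 2 ** L - 1
    for g in seq:
        lo, hi = split(lo, hi)[g]
    return lo + secrets.randbelow(hi - lo + 1)

def decode(seed):
    lo, hi, seq = 0, 2 ** L - 1, []
    for _ in range(N):
        for g, (a, b) in enumerate(split(lo, hi)):
            if a <= seed <= b:
                seq.append(g)
                lo, hi = a, b
                break
    return seq

def pad(password):
    """Password-derived L-bit pad (plain SHA-256 is a stand-in for a real KDF)."""
    digest = hashlib.sha256(password.encode()).digest()
    return int.from_bytes(digest[: L // 8], "big")

def log_prob(seq):
    return sum(math.log(WEIGHTS[g] / 10) for g in seq)

true_seq = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]
passwords = [f"pw{i}" for i in range(1000)]  # candidate list; "pw417" is the real one
ciphertext = encode(true_seq) ^ pad("pw417")

# The attacker decrypts with every candidate and scores each output.
decrypted = [decode(ciphertext ^ pad(pw)) for pw in passwords]
scores = sorted(log_prob(s) for s in decrypted)
print("true sequence recovered:", decrypted[417] == true_seq)
print("all outputs valid:", all(len(s) == N and set(s) <= {0, 1, 2} for s in decrypted))
print("true log-probability:", round(log_prob(true_seq), 2))
print("median decoy log-probability:", round(scores[len(scores) // 2], 2))
```

Every wrong password yields a well-formed genotype sequence drawn approximately from the model, so the attacker cannot discard candidates simply because the output looks invalid.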

Defense against Attacks with Phenotypic Traits (1)
Ancestry (the data is taken from the HapMap project, an international project for finding genetic variation associated with human disease)
[Figure: principal component 1 against principal component 2, with clusters labeled Asian, European, and African]

Defense against Attacks with Phenotypic Traits (2)
Ancestry
- Different DTEs for different ancestries
- Wrong password: wrong sequence, yet consistent ancestry
Other traits: privacy loss quantification
[Figure: principal component 1 against principal component 2; decrypt as European (red symbols "+")]

Performance

Performance of Encryption (Decryption)
Setup: Python; cluster with 22 nodes; 3.40GHz Intel Xeon CPU E31270; 64-bit Linux Debian system
[Figures: time (seconds) per chromosome and against chromosome length (# of SNVs), showing encoding time, decoding time, and password-based encryption (decryption) time]
Linear cost depending on the number of SNVs: 0.5 ms / SNV

Conclusion and Future Work
GenoGuard provides protection of genomic data against brute-force attacks
- A privacy-preserving solution that takes into account the special characteristics of genomic data
Future investigation
- Extension to other sensitive sequential data
- More investigation of privacy erosion under data model evolution
Source code: https://github.com/acs6610987/GenoGuard
To learn more about genomic privacy, a website for the research community: https://genomeprivacy.org