Published by Ashley Cooper. Modified over 6 years ago.
1
GenoGuard: Protecting Genomic Data against Brute-Force Attacks
36th IEEE Symposium on Security and Privacy, May 18, 2015
Zhicong Huang (1), Erman Ayday (2), Jacques Fellay (3), Jean-Pierre Hubaux (1), Ari Juels (4)
(1) School of Computer and Communication Sciences, EPFL
(2) Bilkent University
(3) School of Life Sciences, EPFL
(4) Cornell Tech (Jacobs)
2
The Genomic Avalanche Is Coming…
3
Things are Moving
CS giants are starting to propose genome-related services:
  Google Genomics (API to store, process, explore, and share DNA data)
  IBM Research (computational genomics)
  Microsoft Research (genomic research in collaboration with Sanger Center)
  Apple (the ResearchKit program)
Global Alliance for Genomics & Health:
  Definition of a common framework for effective, responsible, and secure sharing of genomic and clinical data
  Security Working Group: security infrastructure policy and technology
4
Background: Genomics
5
Genomics Background
Single Nucleotide Variant (SNV)
  4 million SNVs per individual, a subset of the 50 million SNVs that have been discovered
  Major allele (0), minor allele (1)
  Correlations between SNVs
Genotype data (to be protected)
  Consider a pair of chromosomes (out of the 23 pairs in the human genome)
  At each SNV position, the genotype is encoded as the number of minor alleles (0, 1, or 2)
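The genotype encoding described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `encode_genotype`, the string representation of the two chromosome copies, and the example alleles are assumptions made here, not part of the presentation.

```python
def encode_genotype(copy_a, copy_b, minor_alleles):
    """Return the number of minor alleles (0, 1, or 2) at each SNV position.

    copy_a, copy_b: the alleles observed on the two chromosomes of a pair.
    minor_alleles:  the minor allele at each SNV position (hypothetical data).
    """
    if not (len(copy_a) == len(copy_b) == len(minor_alleles)):
        raise ValueError("all inputs must cover the same SNV positions")
    return [
        int(a == minor) + int(b == minor)
        for a, b, minor in zip(copy_a, copy_b, minor_alleles)
    ]


# Four made-up SNV positions on one chromosome pair.
genotype = encode_genotype("AGTC", "AGCC", "GGCT")
print(genotype)  # [0, 2, 1, 0]
```

Each entry is the per-position value that the rest of the talk treats as the data to be protected.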
6
Genomic Privacy
High sensitivity
  Predisposition to disease
  Genetic discrimination: denial of access to health insurance, education, and employment
  Illustration: GATTACA (1997 movie)
Long-term data value
  But attackers' computing power keeps increasing
Can the protection survive longer?
7
Background: Honey Encryption [1]
[1] A. Juels, T. Ristenpart. Honey Encryption: Security Beyond the Brute-Force Bound. EUROCRYPT, 2014.
8
Honey Encryption
Ciphertext: ddUoIOkesLhKnb
Conventional encryption
  Correct password: decrypts to the message Gene_Q
  Wrong password: decrypts to an invalid message (eiKangLpkandlf)
Honey encryption
  Correct password: decrypts to the message Gene_Q
  Wrong password: decrypts to a plausible but wrong message (Gene_R)
The threat of brute-force attacks is mitigated by using honey encryption.
9
Distribution-Transforming Encoder (DTE)
The encoder maps a message from the (non-uniform) message space to a seed in the (uniform) seed space; the seed is then encrypted. The decoder maps a seed back to a message.
  Seed 00: "Gene_Q", p = 1/4
  Seed 01: "Gene_R", p = 1/4
  Seeds 10 and 11: "Gene_S", p = 1/2
pm: original message distribution
pd: DTE message distribution (the probability of getting a message by decoding a randomly picked seed)
10
Distribution-Transforming Encoder (DTE)
Seeds 00 and 01 decode to "Gene_Q" and "Gene_R" (p = 1/4 each); seeds 10 and 11 both decode to "Gene_S" (p = 1/2). Decoding a uniformly random seed therefore reproduces the message distribution: pd = pm.
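The toy DTE on this slide is small enough to write out directly. The sketch below assumes the seed-to-message assignment read off the slide (00 to "Gene_Q", 01 to "Gene_R", 10 and 11 to "Gene_S"); the names `DECODE_TABLE`, `encode`, `decode`, and `dte_distribution` are illustrative, not taken from the GenoGuard code.

```python
import random
from collections import Counter

# Seed-to-message table as read from the slide: four 2-bit seeds, three messages.
DECODE_TABLE = {0b00: "Gene_Q", 0b01: "Gene_R", 0b10: "Gene_S", 0b11: "Gene_S"}


def decode(seed):
    """Map a seed back to its message."""
    return DECODE_TABLE[seed]


def encode(message, rng=random):
    """Pick uniformly among the seeds that decode to `message`."""
    seeds = [s for s, m in DECODE_TABLE.items() if m == message]
    if not seeds:
        raise ValueError(f"message not in message space: {message!r}")
    return rng.choice(seeds)


def dte_distribution():
    """p_d: probability of each message when decoding a uniformly random seed."""
    counts = Counter(DECODE_TABLE.values())
    return {m: c / len(DECODE_TABLE) for m, c in counts.items()}


print(dte_distribution())  # {'Gene_Q': 0.25, 'Gene_R': 0.25, 'Gene_S': 0.5}
```

Because "Gene_S" owns two of the four seeds, a uniformly drawn seed decodes to it half of the time, which is exactly the pd = pm condition stated on the slide.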
11
GenoGuard
12
DTE on Genome Sequences
n is the number of SNVs.
Probability of a sequence M = (m1, m2, …, mn), where each mi is from the set {0, 1, 2}, by the chain rule:
  P(M) = P(m1) · P(m2 | m1) · … · P(mn | m1, m2, …, mn-1)
To encode the sequence: divide the seed space in a traversal of the sequence.
Subsequence: (m1, m2, …, mn-1)
13
Example
Number of SNVs: n = 3
Sequence: M = (0, 2, 1)
Seed space: [0, 2^L - 1] (L-bit representation)
P(m1 = 0) = 0.6
P(m2 = 2 | m1 = 0) = 0.1
P(m3 = 1 | M1,2 = (0, 2)) = 0.3
pd = pm
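One way to picture the traversal in this example is as repeated subdivision of the seed interval, one cut per SNV. The sketch below is a minimal illustration under stated assumptions, not the GenoGuard implementation: it uses exact fractions instead of finite precision, a half-open interval [0, 2^L), L = 16, and a made-up conditional model in which only the three probabilities from the slide (0.6, 0.1, 0.3) are real; every other number is filler chosen so that each row sums to 1.

```python
import math
import random
from fractions import Fraction as F

L = 16  # seed length in bits (assumed for the example)

# Conditional distributions P(m_i = 0, 1, 2 | prefix). Only P(m1=0)=0.6,
# P(m2=2 | 0)=0.1 and P(m3=1 | (0,2))=0.3 come from the slide; the rest is invented.
MODEL = {
    (): (F(6, 10), F(3, 10), F(1, 10)),
    (0,): (F(7, 10), F(2, 10), F(1, 10)),
    (0, 2): (F(5, 10), F(3, 10), F(2, 10)),
}
DEFAULT = (F(6, 10), F(3, 10), F(1, 10))  # hypothetical fallback


def cond_probs(prefix):
    return MODEL.get(tuple(prefix), DEFAULT)


def split(lo, hi, probs):
    """Cut [lo, hi) into three sub-intervals, one per genotype 0, 1, 2."""
    bounds = [lo]
    for p in probs:
        bounds.append(bounds[-1] + (hi - lo) * p)
    return [(bounds[g], bounds[g + 1]) for g in range(3)]


def seed_interval(sequence):
    """Narrow [0, 2^L) step by step while traversing the sequence."""
    lo, hi = F(0), F(2 ** L)
    for i, g in enumerate(sequence):
        lo, hi = split(lo, hi, cond_probs(sequence[:i]))[g]
    return lo, hi


def encode(sequence, rng=random):
    """Return a uniformly chosen integer seed inside the sequence's interval."""
    lo, hi = seed_interval(sequence)
    first, last = math.ceil(lo), math.ceil(hi) - 1
    if first > last:
        raise ValueError("interval holds no integer seed; increase L")
    return rng.randint(first, last)


def decode(seed, n):
    """Follow the sub-interval containing `seed` for n positions."""
    lo, hi = F(0), F(2 ** L)
    sequence = []
    for _ in range(n):
        for g, (a, b) in enumerate(split(lo, hi, cond_probs(sequence))):
            if a <= seed < b:
                sequence.append(g)
                lo, hi = a, b
                break
    return tuple(sequence)


M = (0, 2, 1)
lo, hi = seed_interval(M)
print((hi - lo) / 2 ** L)    # 9/500, i.e. 0.6 * 0.1 * 0.3 = 0.018
print(decode(encode(M), 3))  # (0, 2, 1)
```

The fraction of the seed space assigned to M equals its probability under the model, which is the pd = pm property in the exact-arithmetic case.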
14
Finite-Precision DTE
With an L-bit representation of the seed space, pd only approximates pm (pm ≈ pd): the probability increases for some messages and decreases for others.
[Figure: probability per message, comparing pm with pd under the L-bit representation]
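The effect of finite precision can be reproduced with a small experiment. The sketch below assumes a simple scheme in which cumulative probabilities are rounded to the nearest integer seed boundary; the rounding rule, the three-message distribution, and the function name `dte_distribution` are assumptions for illustration and may differ from what GenoGuard does.

```python
from fractions import Fraction


def dte_distribution(pm, L):
    """p_d when cumulative probabilities are rounded to L-bit seed boundaries."""
    total = 2 ** L
    pd = {}
    cumulative = Fraction(0)
    start = 0
    for message, p in pm.items():
        cumulative += p
        end = round(cumulative * total)  # nearest integer seed boundary
        pd[message] = Fraction(end - start, total)
        start = end
    return pd


# Hypothetical message distribution that is not representable with few bits.
pm = {"A": Fraction(6, 10), "B": Fraction(3, 10), "C": Fraction(1, 10)}

for L in (2, 8, 16):
    pd = dte_distribution(pm, L)
    worst = max(abs(pd[m] - pm[m]) for m in pm)
    print(L, float(worst))
```

With L = 2 the message "B" gains probability (1/2 instead of 3/10) and "C" loses all of it, while at L = 16 every deviation is at most 2^-16, which matches the qualitative picture of pd approaching pm as L grows.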
15
A Simple Brute-Force Attack
One correct password among 1000 passwords
Compute the probability of each decrypted sequence
[Figure: two panels, "Conventional encryption" and "GenoGuard"]
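A toy version of this experiment shows why the attack separates the two schemes. Everything in the sketch below is an assumption made for illustration: the three-message space, the 2-bit seed table, the password list, and above all the XOR pad derived from SHA-256, which stands in for a real password-based encryption scheme and is not secure. The attacker here only checks whether a decryption is a valid message, a simplification of ranking decrypted sequences by probability.

```python
import hashlib

# Toy 2-bit DTE (hypothetical): seeds 2 and 3 both decode to "Gene_S".
SEED_TO_MESSAGE = {0: "Gene_Q", 1: "Gene_R", 2: "Gene_S", 3: "Gene_S"}
VALID = {m.encode() for m in SEED_TO_MESSAGE.values()}


def pad(password, nbytes):
    """Password-derived pad; a stand-in for real password-based encryption."""
    return hashlib.sha256(password.encode()).digest()[:nbytes]


def xor(data, key):
    return bytes(a ^ b for a, b in zip(data, key))


def conventional_encrypt(message, password):
    data = message.encode()
    return xor(data, pad(password, len(data)))


def conventional_decrypt(ciphertext, password):
    return xor(ciphertext, pad(password, len(ciphertext)))


def honey_encrypt(seed, password):
    return seed ^ (pad(password, 1)[0] & 0b11)


def honey_decrypt(ciphertext, password):
    seed = ciphertext ^ (pad(password, 1)[0] & 0b11)
    return SEED_TO_MESSAGE[seed]


passwords = [f"pw{i:04d}" for i in range(1000)]
correct = "pw0417"

# Conventional: wrong passwords give byte strings outside the message space.
c_conv = conventional_encrypt("Gene_Q", correct)
survivors_conv = [p for p in passwords if conventional_decrypt(c_conv, p) in VALID]

# Honey encryption: every password decrypts to some valid message.
c_honey = honey_encrypt(0, correct)  # seed 0 encodes "Gene_Q"
survivors_honey = [p for p in passwords
                   if honey_decrypt(c_honey, p).encode() in VALID]

print(len(survivors_conv), len(survivors_honey))
```

In the conventional case the validity check leaves essentially only the correct password, whereas with the honey scheme all 1000 candidates survive and the attacker learns nothing from the check.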
16
Defense against Attacks with Phenotypic Traits (1)
Ancestry (data taken from the HapMap project, an international project to find genetic variants associated with human disease)
[Figure: principal component 1 vs. principal component 2, with clusters labeled Asian, European, and African]
17
Defense against Attacks with Phenotypic Traits (2)
Ancestry
  Different DTEs for different ancestries
  Wrong password: wrong sequence, yet consistent ancestry
Other traits
Privacy loss quantification
[Figure: principal component 1 vs. principal component 2; sequences decrypted as European are shown as red "+" symbols]
18
Performance
19
Performance of Encryption (Decryption)
Setup: Python implementation; cluster with 22 nodes; 3.40 GHz Intel Xeon CPU E31270; 64-bit Debian Linux
Linear cost depending on the number of SNVs: 0.5 ms / SNV
[Figures: time (seconds) per chromosome and against chromosome length (# of SNVs, 100'000 marked on the axis); series: encoding time, decoding time, password-based encryption (decryption) time]
20
Conclusion and Future Work
GenoGuard provides protection of genomic data against brute-force attacks
  A privacy-preserving solution that takes into account the special characteristics of genomic data
Future investigation
  Extension to other sensitive sequential data
  Further investigation of privacy erosion under data model evolution
Source code:
To learn more about genomic privacy
  A website for the research community: