De-anonymizing Genomic Databases Using Phenotypic Traits Humbert et al. Proceedings on Privacy Enhancing Technologies 2015 (2) :99-114.

1 De-anonymizing Genomic Databases Using Phenotypic Traits Humbert et al. Proceedings on Privacy Enhancing Technologies 2015 (2) :99-114

2 Synopsis This paper investigates the threat of de-anonymizing genomic databases in order to acquire sensitive information about an individual's traits (e.g. susceptibility to disease). The authors implement an Identification attack and a Perfect Match attack. The attacker also attempts to predict the individual's susceptibility to Alzheimer's disease. Lastly, the authors examine how the number of phenotypic traits included affects attack performance. My focus is the validity of the authors' concluding statements.

3 Focus Two statements made by the authors are examined more closely later in the presentation:
1. “our results demonstrate that the more distinguishable two individuals are, the more successful the perfect matching is. This leads us to conclude that the matching risk will continuously increase with the progress of genomic knowledge.”
2. “our results demonstrate the serious de-anonymization threat currently posed to individuals sharing their SNPs in genomic databases”
To understand the context of these statements, a quick overview of the two attack types and their respective results is given first.

4 Genotype & Phenotype A gene is a region of DNA that encodes a specific protein, and the expression of this protein can be observed as a trait in the individual. Genotype = the set of genetic variants (alleles) an individual carries. Phenotype = the individual's set of observable characteristics. The attack exploits this inherent link. In particular, the authors focus on single-nucleotide polymorphisms (SNPs) and their association with various characteristics.

5 Identification Attack The attacker observes a single person and records, as accurately as possible, that person's observable phenotypic traits. The attacker then gains access to a genomic database and tries to match the person's phenotype with the correct genotype. The genotypes are ranked by the probability of the observed phenotype given each genotype, with the goal that the top-ranked genotype is the person's own. How is performance measured? By the fraction of trials in which the top-ranked genotype was the correct match.
Results (population of 80): 13% in the supervised case, 5% in the unsupervised case.
Results (population of 10): 52% in the supervised case, 44% in the unsupervised case (unlikely?).
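To make the ranking step concrete, here is a minimal sketch of an identification attack under loudly stated assumptions: a hypothetical toy model in which each trait depends on a single SNP (coded as a minor-allele count of 0, 1 or 2), made-up conditional probabilities, and traits treated as independent given the genotype. None of the names (`MODEL`, `log_likelihood`, `rank_genotypes`, `top1_accuracy`) or numbers come from the paper; it only illustrates ranking by P(phenotype | genotype) and scoring by top-1 accuracy.

```python
import math

# Hypothetical toy model (illustrative numbers, not from the paper): each trait
# is driven by one SNP, coded as the count of minor alleles (0, 1 or 2).
# MODEL[trait][allele_count][trait_value] = P(trait_value | allele_count).
MODEL = {
    "eye_color": {0: {"brown": 0.8, "blue": 0.2},
                  1: {"brown": 0.6, "blue": 0.4},
                  2: {"brown": 0.1, "blue": 0.9}},
    "freckles": {0: {"yes": 0.2, "no": 0.8},
                 1: {"yes": 0.4, "no": 0.6},
                 2: {"yes": 0.7, "no": 0.3}},
}


def log_likelihood(phenotype, genotype):
    """log P(phenotype | genotype), assuming traits are independent given the genotype.

    phenotype maps trait -> observed value; genotype maps trait -> allele count
    at the SNP associated with that trait.
    """
    return sum(math.log(MODEL[trait][genotype[trait]][value])
               for trait, value in phenotype.items())


def rank_genotypes(phenotype, database):
    """Genotype IDs sorted from most to least likely for the observed phenotype."""
    return sorted(database,
                  key=lambda gid: log_likelihood(phenotype, database[gid]),
                  reverse=True)


def top1_accuracy(trials, database):
    """Fraction of (phenotype, true_id) trials whose top-ranked genotype is correct."""
    hits = sum(rank_genotypes(ph, database)[0] == true_id for ph, true_id in trials)
    return hits / len(trials)


# Hypothetical three-person database and two observation trials.
database = {"A": {"eye_color": 2, "freckles": 2},
            "B": {"eye_color": 0, "freckles": 0},
            "C": {"eye_color": 1, "freckles": 1}}
trials = [({"eye_color": "blue", "freckles": "yes"}, "A"),
          ({"eye_color": "brown", "freckles": "no"}, "C")]
print(rank_genotypes(trials[0][0], database))
print(top1_accuracy(trials, database))
```

Here the first trial ranks A, C, B and succeeds, while in the second trial genotype B explains "brown, no freckles" better than the true owner C, so top-1 accuracy is 0.5. Even a correct model can rank a wrong genotype first when traits are only probabilistically linked to SNPs.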

6 Perfect Match Attack The attacker has access to a genomic database and to the collection of corresponding phenotypes. The objective is to match every genotype with exactly one phenotype. This can be visualized as a weighted bipartite graph in which the edge between a genotype vertex and a phenotype vertex is weighted by the log-likelihood of that phenotype given that genotype. The authors use the Blossom algorithm to find the matching that maximizes the sum of the edge weights. How is performance measured? By the fraction of correctly matched pairs.
Results (population of 80): 16% in the supervised case, 8% in the unsupervised case.
Results (population of 10): 65% in the supervised case, 58% in the unsupervised case.
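The matching objective can be sketched for a tiny example. The paper uses the Blossom algorithm; the sketch below swaps in plain exhaustive search over all one-to-one assignments, which finds the same maximum-weight perfect matching but is only feasible for a handful of individuals (for realistic bipartite sizes, Blossom or the Hungarian algorithm would be used instead). Because the weights are log-likelihoods, maximizing their sum is the same as choosing the jointly most likely assignment. The weights, the ground truth, and the function names are hypothetical.

```python
from itertools import permutations


def perfect_match(weights):
    """Maximum-weight perfect matching found by exhaustive search.

    weights[g][p] is the log-likelihood of phenotype p given genotype g.
    Returns a tuple `assign` where assign[g] is the phenotype index matched to
    genotype g. The search tries all n! assignments, so it only works for tiny n.
    """
    n = len(weights)
    return max(permutations(range(n)),
               key=lambda perm: sum(weights[g][perm[g]] for g in range(n)))


def matching_accuracy(assign, truth):
    """Fraction of genotypes paired with their true phenotype."""
    return sum(a == t for a, t in zip(assign, truth)) / len(truth)


# Toy log-likelihoods (illustrative only). Row g = genotype, column p = phenotype.
weights = [[-1.0, -3.0, -2.0],
           [-2.0, -1.5, -4.0],
           [-1.2, -2.5, -3.0]]
truth = (0, 1, 2)  # hypothetical ground truth: genotype g belongs to phenotype g

assign = perfect_match(weights)
print(assign, matching_accuracy(assign, truth))
```

Genotypes 0 and 2 would each individually prefer phenotype 0, so the matching has to trade off globally: the optimum is (2, 1, 0) with total weight -4.7, which beats the true assignment (total -5.5) and therefore gets only one of three pairs right.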

7 Conclusions Two quotes concern the future and current threat level of the discussed attacks. The first concerns the potential future weakness: “our results demonstrate that the more distinguishable two individuals are, the more successful the perfect matching is. This leads us to conclude that the matching risk will continuously increase with the progress of genomic knowledge.” A few points should be considered in regard to this quote:
1. The authors do not mention the probable increase in the number of genomes being sequenced and shared that accompanies progress in genomic research. Their perfect matching results show that the larger the genomic database, the lower the matching performance. Will this cancel out the advantage gained from increased genomic knowledge?
2. How likely an Identification attack is compared with a Perfect Match attack, given that the quote concerns perfect matching specifically.
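One way to put the database-size effect in perspective is to compare the reported accuracies against random guessing. For identification, a random top-1 guess among n genotypes is right with probability 1/n; for perfect matching, a uniformly random one-to-one assignment has on average one correct pair (the expected number of fixed points of a random permutation is 1), so the expected fraction correct is also 1/n. The sketch below applies this to the supervised perfect-matching results quoted above; the function name is illustrative.

```python
def chance_accuracy(n):
    """Expected fraction of correct matches when guessing uniformly at random.

    Identification: a random top-1 guess among n genotypes is right with
    probability 1/n. Perfect matching: a uniformly random one-to-one assignment
    has on average one correct pair, so the expected fraction is also 1/n.
    """
    return 1 / n


# Supervised perfect-matching accuracies reported above, keyed by population size.
reported = {10: 0.65, 80: 0.16}

for n, acc in reported.items():
    base = chance_accuracy(n)
    print(f"n={n}: reported {acc:.0%}, chance {base:.2%}, about {acc / base:.1f}x chance")
```

At 10 individuals the supervised attack is about 6.5 times better than chance; at 80 it is about 12.8 times better, even though its absolute accuracy is much lower. Both the absolute and the relative view are relevant when weighing point 1.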

8 “serious” “currently” Secondly, the authors describe the situation as: “our results demonstrate the serious de-anonymization threat currently posed to individuals sharing their SNPs in genomic databases”. This is the core focus of my presentation. I consider the current attack performance too low to be a cause for concern. It is possible that future attempts, after further genomic development, will increase this risk. Claim: even if the attack success rate were higher, it is unclear whether the results would lead to discrimination.

9 Certainty of Identification Unlike some other attacks, this one cannot simply be repeated until a verified outcome is achieved, because the attacker has no way to verify that they have the right match. Moreover, the attack is not purely about identification: as has been suggested, it could be used to discriminate against the individual. Intuitively, match accuracy would need to be considerably higher than it currently is, because most people are less likely to ‘discriminate’ if they are unsure whether the match is correct.

10 Legal barriers The skill and effort required to carry out such an attack would deter many people.
Counterpoint: if this were offered as a service, much like a background check, people would not need the skills to do the job themselves, and this sensitive information would be easily accessible to anyone willing to pay for it.
Counter-counter: given the various laws protecting the privacy of these genomes, would such a service ever be openly available to the general public without legal action being taken?
Note: how could illegally obtained information be used openly in settings where the reasons for a decision must be stated explicitly, e.g. many types of insurance?

11 Motivation It may be idealistic to think that people are generally becoming more accepting of each other's differences, but I think this plays a part in the threat of attack. It would be different if exposing this information brought the attacker a clear personal gain, but the attack appears to affect only the person being de-anonymized. If a ‘vigilante’ attacker were to de-anonymize multiple genomes and make them public, anyone viewing this information might unknowingly ‘discriminate’. I cannot see why anyone would do this.

12 Last remarks Ultimately I am not concerned about this implementation method posing a current threat to genomic privacy. This is not to say that future developments in both the attack and genomic research will not increase the threat. This is entirely possible but does not override the conclusions made in this paper concerning the current threat level.

