De-anonymizing Genomic Databases Using Phenotypic Traits Humbert et al. Proceedings on Privacy Enhancing Technologies 2015 (2) :99-114.


Synopsis This paper investigates the threat of de-anonymizing genomic databases in order to acquire sensitive information about an individual’s traits (e.g. susceptibility to disease). The authors implement an Identification attack and a Perfect Match attack. The attacker also attempts to predict the individual’s susceptibility to Alzheimer's disease. Lastly, the authors note how the number of phenotypic traits included influences attack performance. My focus is the validity of the authors’ concluding statements.

Focus Two statements made by the authors are examined more closely later in the presentation. They are: “our results demonstrate that the more distinguishable two individuals are, the more successful the perfect matching is. This leads us to conclude that the matching risk will continuously increase with the progress of genomic knowledge.” “our results demonstrate the serious de-anonymization threat currently posed to individuals sharing their SNPs in genomic databases” To understand the context of these statements, a quick overview of the two attack types and their respective results is given first.

Genotype & Phenotype A gene is a region of DNA that encodes the production of a specific protein. The expression of this protein is observed as a trait in the individual. Genotype = set of genes. Phenotype = set of characteristics. The attack exploits this inherent link. In particular, the authors focus on single nucleotide polymorphisms (SNPs) and how their occurrence relates to various characteristics.

Identification Attack The attacker observes a single person and records that person’s observable phenotypic traits as accurately as possible. The attacker then gains access to a genomic database and tries to match the person’s phenotype with the correct genotype. The genotypes are ranked by the probability of the phenotype given the genotype, with the goal that the top-ranked genotype is the person’s own. How is performance measured? By how often the top-ranked genotype was the correct match. Results (population of 80): 13% in the supervised case, 5% in the unsupervised case. Results (population of 10): 52% in the supervised case, 44% in the unsupervised case (unlikely?).
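The ranking step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the conditional probability tables in TRAIT_MODELS, the trait names, and the three-genotype database are invented for the example, and the traits are assumed to be independent so that their log-probabilities can simply be summed.

```python
import math

# Hypothetical tables of P(trait value | SNP genotype), one table per trait.
# SNP genotypes are coded 0, 1, 2; all probabilities are made up.
TRAIT_MODELS = [
    {0: {"blue": 0.8, "brown": 0.2},
     1: {"blue": 0.4, "brown": 0.6},
     2: {"blue": 0.1, "brown": 0.9}},
    {0: {"fair": 0.7, "dark": 0.3},
     1: {"fair": 0.5, "dark": 0.5},
     2: {"fair": 0.2, "dark": 0.8}},
]


def log_likelihood(phenotype, genotype):
    """Log P(phenotype | genotype), assuming the traits are independent."""
    return sum(
        math.log(model[snp][trait])
        for model, snp, trait in zip(TRAIT_MODELS, genotype, phenotype)
    )


def rank_genotypes(phenotype, database):
    """Return genotype ids ordered from most to least likely match."""
    return sorted(
        database,
        key=lambda gid: log_likelihood(phenotype, database[gid]),
        reverse=True,
    )


# Hypothetical anonymized database: genotype id -> SNP values, one per trait.
database = {"g1": (0, 0), "g2": (1, 2), "g3": (2, 2)}
observed = ("brown", "dark")  # traits the attacker recorded for the target

ranking = rank_genotypes(observed, database)
print(ranking)  # the attack counts as a success if ranking[0] is the target
```

The performance figure quoted on the slide corresponds to repeating this for many targets and counting how often the first entry of the ranking is the right genotype.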

Perfect Match Attack The attacker has access to a genomic database and the collection of corresponding phenotypes. The objective is to match every genotype with exactly one phenotype. This can be visualized as a weighted bipartite graph in which the edge between a genotype vertex and a phenotype vertex is weighted by the log-likelihood of that pairing. The authors used the Blossom algorithm to maximize the sum of the edge weights. How is performance measured? By the ratio of correctly matched pairs. Results (population of 80): 16% in the supervised case, 8% in the unsupervised case. Results (population of 10): 65% in the supervised case, 58% in the unsupervised case.
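To make the matching objective concrete, here is a minimal Python sketch. It does not use the Blossom algorithm: for readability it tries every possible assignment by brute force, which is only feasible for a handful of individuals. The weight matrix and the ground truth are invented for illustration.

```python
from itertools import permutations


def best_perfect_matching(weights):
    """weights[i][j] is the log-likelihood of phenotype i given genotype j.

    Returns a tuple m in which m[i] is the genotype index assigned to
    phenotype i, chosen to maximize the total weight of the matching.
    Brute force over all n! assignments, so only usable for tiny n.
    """
    n = len(weights)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(weights[i][perm[i]] for i in range(n)),
    )


def matching_accuracy(assignment, truth):
    """Ratio of correctly matched pairs."""
    return sum(a == t for a, t in zip(assignment, truth)) / len(truth)


# Hypothetical log-likelihoods for three phenotypes (rows) and genotypes (columns).
weights = [
    [-0.5, -2.0, -3.0],
    [-0.4, -0.9, -2.5],
    [-3.0, -1.0, -0.3],
]

assignment = best_perfect_matching(weights)
print(assignment, matching_accuracy(assignment, truth=(0, 1, 2)))
```

In this toy matrix, phenotype 1 on its own would prefer genotype 0 (weight -0.4), but genotype 0 is an even better fit for phenotype 0, so the joint optimum gives phenotype 1 its second choice. That is the difference between matching everyone at once and ranking each person separately.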

Conclusions Two quotes concern the future and current threat level of the attacks discussed. The first relates to potential future weakness: “our results demonstrate that the more distinguishable two individuals are, the more successful the perfect matching is. This leads us to conclude that the matching risk will continuously increase with the progress of genomic knowledge.” A few points should be considered in regard to this quote: 1. The authors do not mention the probable increase in the number of genomes being sequenced and shared that accompanies the development of genomic research. Their perfect matching results show that the larger the genomic database, the lower the matching performance. Will this cancel out the advantage gained from the increase in genomic knowledge? 2. The likelihood of an Identification Attack versus a Perfect Match attack.

“serious” “currently” Secondly, they describe the situation as: “our results demonstrate the serious de-anonymization threat currently posed to individuals sharing their SNPs in genomic databases” This is the core focus of my presentation. I consider the attack performance currently too low to cause concern. It is possible that future attempts, after further genomic development, will increase this risk. Claim: Even if the attack success rate were higher, it is unclear whether this would lead to discrimination based on the results.

Certainty of Identification Unlike other attacks, this attack cannot be repeated until a desirable, verified outcome is achieved, because the attacker has no way to verify that they have the right match. Following from this, the purpose of the attack is not purely identification; as suggested, it could be used to discriminate against the individual. Intuitively, the accuracy of the match may have to be considerably higher than it is, because most people will be less likely to ‘discriminate’ if they are unsure about the match.

Legal barriers The skill and effort required to carry out such an attack would deter many people. Counterpoint: if this were provided as a service, almost like a background check, people would not need the skills to do the job themselves, and anyone with the money would have easy access to this sensitive information. Counter-counter: With various laws surrounding the privacy of these genomes, would such a service ever be openly available to the general public without legal action being taken? Note: How could illegally obtained information be used in any overt way where the reasoning for a decision needs to be explicitly stated, e.g. many types of insurance?

Motivation It may be idealistic to think that people are generally becoming more accepting of each other's differences, but I think this plays a part in the threat of attack. It would be different if exposing this information brought a clear personal gain to the attacker, but the attack appears to affect only the person being de-anonymized. If a ‘vigilante’ attacker were to de-anonymize multiple genomes and make them public, then anyone viewing this information might unknowingly ‘discriminate’. I cannot see a reason why one might do this.

Last remarks Ultimately I am not concerned about this implementation method posing a current threat to genomic privacy. This is not to say that future developments in both the attack and genomic research will not increase the threat. This is entirely possible but does not override the conclusions made in this paper concerning the current threat level.