From Genome-Wide Association Studies to Medicine Florian Schmitzberger - CS 374 – 4/28/2009 Stanford University Biomedical Informatics
Topics 1. Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008); 2. Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008); 3. Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)
Topic 1: Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008)
Genome-wide association studies
Source: Hardy et al. Genomewide Association Studies and Human Disease. N Eng J Med, 360: ; 17 (2009)
Human Genome Research Over Time Source: Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008);
Linkage Analysis Source: genome.wellcome.ac.uk
Initial Lessons
1. “Candidate gene” approach inadequate.
2. Mutations that cause disease often change protein structure. Example: hemoglobin subunit beta mutation in sickle-cell disease.
3. Loci often have many rare disease-causing alleles.
4. 90% of sites of genetic variation are common variants in the population.
Common disease – common variant (CDCV): Common polymorphisms (minor allele frequency > 1%) contribute to susceptibility to disease. We can use GWAS to see how common variants contribute to disease, which gives us ideas on which positions to investigate.
Tag SNPs Source: The International HapMap Consortium The International HapMap Project Nature Vol /
GWAS – General Lessons Learned
1. GWAS work: reproducible associations have grown from ~two dozen to >150.
2. Effect sizes are modest for common variants (mostly small increases in risk).
3. Power to detect associations has been low.
4. Association studies have identified regions rather than causal genes.
5. A single locus may contain more than one risk variant.
6. A single locus may contain both common and rare variants.
7. There is great variation between ethnic groups.
Sample size required for P < 10^-8. Source: Altshuler et al.
GWAS – Common Diseases: Lessons Learned
1. The risk attributable to loci already identified by GWAS is currently underestimated because of as-yet-unknown mutations.
2. Many more disease loci remain to be found (studies so far have had low statistical power).
3. Some loci will contain only rare variants, which won't be found using common polymorphisms.
Disease Risk vs. Disease Mechanism: The primary value of genetic mapping is not risk prediction but gaining knowledge about mechanisms of disease.
GWAS: The Path Ahead
1. Increased sample sizes: with 1000 cases and 1000 controls, a 20% variant with a 1.3-fold increase in risk is detected with 1% power; with 5000 cases and 5000 controls, 98% power.
2. Different ancestry groups.
3. Find rare mutations in suspect loci (1000 Genomes Project).
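The sample-size figures above can be roughly reproduced with a normal-approximation power calculation. This is only a sketch under stated assumptions, not necessarily the calculation Altshuler et al. used: it reads "20% variant" as the risk-allele frequency in controls, "1.3 increase in risk" as a per-allele odds ratio, uses a two-sided two-proportion z-test on allele counts, and takes P < 10^-8 as the significance threshold. The function name `allelic_test_power` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=1e-8):
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Assumptions (not from the slides): per-allele odds ratio, allele
    frequency given for controls, unpooled normal approximation.
    """
    std_normal = NormalDist()
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)
    # Each person contributes two alleles to the comparison.
    se = sqrt(case_freq * (1 - case_freq) / (2 * n_cases)
              + control_freq * (1 - control_freq) / (2 * n_controls))
    z_effect = (case_freq - control_freq) / se
    # Critical value for a two-sided test at level alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Probability of exceeding the threshold in the expected direction
    # (the opposite tail is negligible for these inputs).
    return std_normal.cdf(z_effect - z_crit)


power_small = allelic_test_power(1000, 1000, 0.20, 1.3)
power_large = allelic_test_power(5000, 5000, 0.20, 1.3)
print(f"1000 cases / 1000 controls: power ~ {power_small:.3f}")
print(f"5000 cases / 5000 controls: power ~ {power_large:.3f}")
```

Under these assumptions the sketch gives about 1% power for the smaller study and about 98% for the larger one, in line with the figures on the slide.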
Topic 2: Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008)
C-Reactive Protein (CRP): Elevated levels of CRP are associated with increased risk of ischemic heart disease and cerebrovascular disease. Studies of >40,000 people, of whom ~4,000 had disease; followed for years; CRP levels measured; genotyped for four CRP polymorphisms.
Results (diagram): CRP polymorphisms → increased CRP levels; increased CRP levels → increased likelihood of disease. Open question: do CRP polymorphisms → increased likelihood of disease? Source: Zacho et al.
Increased CRP levels are associated with increased disease risk (figures from Zacho et al.).
Possible issues with this study: CRP polymorphisms could lead to higher plasma levels of less active CRP (unlikely, since the polymorphisms are not near the coding region). Limitations of the four individual studies. Variability with race (only white participants were studied). Potential lack of statistical power.
Conclusion: Genetic variants that lead to increased CRP levels do not lead to an increased risk of ischemic heart disease or cerebrovascular disease. Increased CRP levels are therefore likely to be a marker rather than a cause of disease.
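The reasoning behind this conclusion can be illustrated with a small simulation. It is a toy sketch on entirely synthetic data, not a reanalysis of Zacho et al.: it assumes a hypothetical unmeasured factor that raises both CRP and disease risk, a genotype that raises CRP, and no causal effect of CRP itself on disease. All effect sizes, frequencies, and names (`simulate`, `disease_rate`) are invented for illustration.

```python
import math
import random


def simulate(n=20000, seed=0):
    """Generate (risk_allele_count, crp_level, has_disease) for n synthetic people."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Risk-allele count 0..2 at a made-up allele frequency of 0.3.
        genotype = (rng.random() < 0.3) + (rng.random() < 0.3)
        # Hypothetical unmeasured factor that drives both CRP and disease.
        confounder = rng.gauss(0, 1)
        crp = 0.5 * genotype + confounder + rng.gauss(0, 1)
        # Disease depends on the confounder only, not on CRP or genotype.
        p_disease = 1 / (1 + math.exp(-(-2 + confounder)))
        rows.append((genotype, crp, rng.random() < p_disease))
    return rows


def disease_rate(rows):
    return sum(d for _, _, d in rows) / len(rows)


rows = simulate()
median_crp = sorted(crp for _, crp, _ in rows)[len(rows) // 2]
high_crp = [r for r in rows if r[1] > median_crp]
low_crp = [r for r in rows if r[1] <= median_crp]
carriers = [r for r in rows if r[0] >= 1]
noncarriers = [r for r in rows if r[0] == 0]

# Difference in mean CRP between carriers and non-carriers.
crp_lift = (sum(r[1] for r in carriers) / len(carriers)
            - sum(r[1] for r in noncarriers) / len(noncarriers))
# Difference in disease rate by measured CRP and by genotype.
crp_gap = disease_rate(high_crp) - disease_rate(low_crp)
genotype_gap = disease_rate(carriers) - disease_rate(noncarriers)

print(f"CRP raised in carriers by:            {crp_lift:.2f}")
print(f"Disease rate gap, high vs low CRP:     {crp_gap:.3f}")
print(f"Disease rate gap, carriers vs others:  {genotype_gap:.3f}")
```

In this setup measured CRP tracks disease while the CRP-raising genotype does not, which is the pattern expected when CRP is a marker rather than a cause. If CRP were causal, the genotype gap would be expected to be clearly positive as well.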
Topic 3: Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)
Statistical methods to evaluate markers in genetic testing: 1. ROC (receiver operating characteristic) curves 2. Logistic regression
Genetic testing for the public Sources: 23andme.com decodeme.com navigenics.com
Classification-based statistics: evaluate how well one can distinguish between cases and controls.
Diagnostic test result vs. disease status:
                           Disease YES        Disease NO
Diagnostic test positive   True Positive      False Positive
Diagnostic test negative   False Negative     True Negative
Sensitivity = TP / (TP + FN). With this test, how many of the people who are actually ill will I catch?
Specificity = TN / (TN + FP). With this test, will I tell too many people they might be ill?
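As a small worked example of the two formulas, the sketch below computes sensitivity and specificity from the four cells of the table. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """Fraction of people with the disease that the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of people without the disease that the test reports as negative."""
    return tn / (tn + fp)


# Hypothetical counts: 100 people with the disease, 900 without.
tp, fn = 80, 20
tn, fp = 870, 30

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"Sensitivity: {sens:.3f}")
print(f"Specificity: {spec:.3f}")
```

With these made-up counts the test catches 80 of the 100 ill people and wrongly alarms 30 of the 900 healthy ones.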
ROC curves. Important measure: area under the curve (AUC). Source: medcalc.be
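One way to see where an ROC curve and its AUC come from is to sweep a decision threshold over a set of test scores and record the true-positive rate against the false-positive rate at each step. The sketch below does this in plain Python with the trapezoidal rule. The scores, labels, and function names are illustrative assumptions, not data from the presentation.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs for every threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everyone scoring >= threshold tests positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores; label 1 = case, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to a test no better than chance, and 1.0 to perfect separation of cases and controls.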
Odds Ratios (risk analysis)
OR = (odds of an event occurring in one group) / (odds of an event occurring in the control group)
OR < 1: event less likely in the first group. OR = 1: equal likelihood. OR > 1: event more likely in the first group.
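The definition can be made concrete with a short calculation. The counts below are hypothetical, and the function name `odds_ratio` is simply chosen for this sketch.

```python
def odds_ratio(events_group, nonevents_group, events_control, nonevents_control):
    """Odds of the event in one group divided by the odds in the control group."""
    odds_group = events_group / nonevents_group
    odds_control = events_control / nonevents_control
    return odds_group / odds_control


# Hypothetical counts: 30 of 100 people in the first group have the event,
# compared with 10 of 100 in the control group.
example_or = odds_ratio(30, 70, 10, 90)
print(f"Odds ratio: {example_or:.2f}")
```

A value above 1, as here, means the event is more likely in the first group than in the control group.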
Take-home message: “Strong association (low p-value) does not guarantee effective discrimination between cases and controls (classification). Excellent classification (high AUC) does not guarantee good prediction of actual risk.” - Jakobsdottir et al.
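The first half of this message can be checked with a little arithmetic. For a single binary marker the ROC curve has only one interior point, so the AUC reduces to (sensitivity + specificity) / 2. The sketch below applies this to a marker with a given odds ratio. The marker frequency and odds ratio are hypothetical values, not figures from Jakobsdottir et al.

```python
def binary_marker_auc(odds_ratio, freq_controls):
    """AUC of a yes/no marker, given its odds ratio and its frequency in controls."""
    # Marker frequency in cases implied by the odds ratio.
    freq_cases = (odds_ratio * freq_controls
                  / (1 - freq_controls + odds_ratio * freq_controls))
    sensitivity = freq_cases            # cases who carry the marker
    specificity = 1 - freq_controls     # controls who do not carry it
    return (sensitivity + specificity) / 2


# Hypothetical marker: carried by 10% of controls, odds ratio of 3.
marker_auc = binary_marker_auc(odds_ratio=3.0, freq_controls=0.10)
print(f"AUC for OR = 3, control frequency 10%: {marker_auc:.3f}")
```

Even an odds ratio of 3, which would be large for a common variant, gives an AUC of about 0.575 here, not far above the 0.5 of a coin flip.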
Source: newscientist.com