SNP Discovery and Analysis Application to Association Studies

SNP Discovery and Analysis Application to Association Studies Mark J. Rieder, PhD Dana Crawford, PhD SeattleSNPs PGA Morehouse University May 2, 2005 Thank you. I am pleased to be here today to share our work on the genetic determinants of warfarin dosing. Specifically, we looked at the recently discovered gene vitamin K epoxide reductase (VKORC1) and its effect on normal warfarin dosing in clinical patients. I will begin with some background on the clinical uses of warfarin and on the genetic variants in the cytochrome P450 2C9 gene that affect warfarin metabolism, and then I'll describe our work on VKORC1 haplotypes, which predict warfarin dose across the normal clinical range.

Practical Aspects of SNP Association Studies SNP Discovery: Where do I find SNPs to use in my association studies? (e.g. databases, direct resequencing) SNP Selection: How do I choose SNPs that are informative? (i.e. assessing SNP correlation - linkage disequilibrium) SNP Associations: What analyses can I perform after genotyping these SNPs? (e.g. single SNP data, haplotype data) SNP Replication/Function: How is function predicted or assessed? (e.g. nonsynonymous SNPs, conserved non-coding regions (CNS), transcription factor binding sites, gene expression)

SeattleSNPs Program for Genomic Applications: Overview Aim 1: To establish a variation discovery resource capable of comprehensive resequencing of candidate genes related to HLBS. Biological Focus: Inflammation Genes and Pathways: Coagulation, Complement, Cytokines Interacting Partners

SNPs in Candidate Genes (SeattleSNPs) Average gene size: 26.5 kb Comparing 2 haploid genomes: ~1 SNP in 1,200 bp All SNPs: ~130 per gene (1 in 200 bp) - 15,000,000 SNPs SNPs > 0.05 MAF: ~44 per gene (1 in 600 bp) - 6,000,000 SNPs

SeattleSNPs PGA: Candidate Gene SNP Resource 4.6 Mb in 47 individuals = 216 Mb total sequence Define sequence diversity - catalogue all SNPs Select “optimal” tagSNPs sets Determine haplotype structure Provide necessary baseline data for association studies

Warfarin Pharmacogenetics Background Warfarin characteristics Pharmacokinetics/Pharmacodynamics Discovery of VKORC1 VKORC1 - SNP Discovery VKORC1 - SNP Selection (tagSNPs) VKORC1 - SNP Testing SNP/Haplotype Inference Haplotype Inference, Testing VKORC1 - SNP Replication/Function

Pharmacogenomics as a Model for Association Studies Clear genotype-phenotype link intervention variable response Pharmacokinetics - 5x variation Quantitative intervention and response drug dose, response time, metabolism rate, etc. Target/metabolism of drug generally known gene target that can be tested directly with response Reduce variability and identify outliers. Prospective testing Personalized Medicine

Warfarin Background Commonly prescribed oral anti-coagulant In 2003, 21.2 million prescriptions were written for warfarin (Coumadin) Prescribed following MI, atrial fibrillation, stroke, venous thrombosis, prosthetic heart valve replacement, and following major surgery Difficult to determine effective dosage Narrow therapeutic range - Monitoring of prothrombin time (INR) - 2.0 - 3.0 Large inter-individual variation Very effective rat poison! First, let me give a brief background on warfarin and its significance in clinical medicine. Point 1, point 2. At any given time in the US about 2 million people are on chronic therapeutic warfarin. It is generally prescribed for the indications in point 3, and also prophylactically following major surgery. Two important issues arise when warfarin is used in the clinic. First, a patient must be brought to a stable maintenance dose, and there is large inter-individual variation here (1-10 mg/d); patients are generally started at 5 mg/d and then titrated from that point. Second, once the maintenance dose has been achieved, the issue becomes keeping them within their very narrow therapeutic range. This is important, of course, because excess dosing leads to hemorrhage and too low a dose cannot provide protection against clotting. The whole dosing procedure is therefore tightly monitored using the prothrombin time, as measured by the INR, and kept within the normalized range of 2-3.

Warfarin dose distribution Ave: 5.2 mg/d n = 186 European-American 30x dose variability However, as I mentioned, achieving the proper maintenance dose can be difficult because of the large inter-individual range of dose, from about 1 to 10 mg/d. Here is the data from the UW European-American clinical patients we looked at in this study. Of course a physician can determine the warfarin dose empirically, but we would really like to understand the pharmacokinetic and pharmacodynamic genetic influences that underlie this distribution. 20-25% Patient/Clinical/Environmental Factors Pharmacokinetic/Pharmacodynamic - Genetic

Warfarin inhibits the vitamin K cycle Epoxide Reductase γ-Carboxylase (GGCX) Warfarin Inactivation CYP2C9 Pharmacokinetic Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z)

Warfarin Metabolism (Pharmacokinetics) Major pathway for termination of pharmacologic effect is through metabolism of S-warfarin in the liver by CYP2C9 CYP2C9 SNPs alter warfarin metabolism: CYP2C9*1 (WT) - normal CYP2C9*2 (Arg144Cys) - low/intermediate CYP2C9*3 (Ile359Leu) - low CYP2C9 alleles occur at a significant minor allele frequency European: *2 - 10.7% *3 - 8.5 % Asian: *2 - 0% *3 - 1-2% African-American: *2 - 2.9% *3 - 0.8% The other important factor is the CYP2C9 variant alleles occur at an appreciable frequency in the general population.

Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes (Higashi et al., JAMA 2002) WARFARIN MAINTENANCE DOSE (mg warfarin/day) N = 127, 28, 4, 18, 3, 5 TIME TO STABLE ANTICOAGULATION CYP2C9-WT ~90 days CYP2C9-Variant ~180 days *2 or *3 carriers take longer to reach stable anticoagulation There are clear practical implications of the genetic effects of CYP2C9. - Variant alleles have significant clinical impact - Still large variability in warfarin dose (15-fold) in *1/*1 “controls”?

Analysis of Independent Predictors of Warfarin Dose Adapted from Gage et al., Thromb Haemost, 2004

Variable                                Change in Warfarin Dose   P value
Target INR, per 0.5 increase            21%                       <0.0005
BMI, per SD                             14%                       <0.0001
Ethnicity (African-American, [Asian])   13%, [10-15%]             0.003
Age, per decade                         13%                       <0.0001
Gender, Female                          12%                       <0.0001
Drugs (Amiodarone)                      24%                       0.007
CYP2C9*2, per allele                    19%                       <0.0001
CYP2C9*3, per allele                    30%                       <0.0001

~ 30% of the variability in warfarin dose is explained by these factors What other candidate genes are influencing warfarin dosing?

Warfarin acts as a vitamin K antagonist Pharmacodynamic Epoxide Reductase CYP2C9 Inactivation γ-Carboxylase (GGCX) Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z)

New Target Protein for Warfarin Epoxide Reductase (VKORC1) γ-Carboxylase (GGCX) Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z) Rost et al. & Li et al., Nature (2004) 5 kb - chr 16 The epoxide reductase is encoded by the gene now named VKORC1. The work by Rost et al. showed that overt warfarin resistance was due to non-synonymous mutations in VKORC1; that is, patients needing doses of 25-50 mg/d had clear predisposing mutations. Interestingly, NO nonsynonymous mutations were found in control chromosomes.

Warfarin Resistance VKORC1 Polymorphisms Rost et al., Nature (2004) Rare non-synonymous mutations in VKORC1 causative for warfarin resistance (15-35 mg/d) NO non-synonymous mutations found in ‘control’ chromosomes (n = ~400) Point out that the normal warfarin dose is 5 mg/d.

Inter-Individual Variability in Warfarin Dose: Genetic Liabilities SENSITIVITY: CYP2C9 coding SNPs - *3/*3 RESISTANCE: VKORC1 nonsynonymous coding SNPs Common VKORC1 non-coding SNPs? [Figure: frequency vs. warfarin maintenance dose (mg/day), dose axis marked at 0.5, 5, and 15]

SNP Discovery: Resequencing VKORC1 PCR amplicons --> Resequencing of the complete genomic region 5 Kb upstream and each of the 3 exons and intronic segments; ~11 Kb SeattleSNPs PGA - pga.gs.washington.edu (24 African-Am./23 Europeans) Warfarin treated clinical patients (UWMC): 186 European Other populations: 96 European, 96 African-Am., 120 Asian

SNP Discovery: Resequencing Results Summary of PGA samples (European, n = 23) Total: 13 SNPs identified 10 common/3 rare (<5% MAF) Clinical Samples (European patients n = 186) Total: 28 SNPs identified 10 common/18 rare (<5% MAF) 15 - intronic/regulatory 7 - promoter SNPs 2 - 3’ UTR SNPs 3 - synonymous SNPs 1 - nonsynonymous - single heterozygous indiv. - highest warfarin dose = 15.5 mg/d How does the comprehensive SNP discovery compare to what was known for this gene?

SNP Discovery: dbSNP database -NCBI SNP database

SNP Discovery: dbSNP database SeattleSNPs Resequencing 28 SNPs --> 15 SNPs gene region 10 dbSNPs 8/10 confirmations 3 frequency/genotype data 7 new dbSNP entries generated by SeattleSNPs resequencing 8 dbSNPs/15 SNPs (~50%)

SNP Discovery: dbSNP database Nickerson and Kruglyak, Nature Genetics, 2001 Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs (validated)!

SNP discovery is dependent on your sample population size Example: 2 chromosomes differing at one site GTTACGCCAATACAGGATCCAGGAGATTACC GTTACGCCAATACAGCATCCAGGAGATTACC [Figure: fraction of SNPs discovered vs. minor allele frequency (MAF, 0.0 to 0.5), with curves for 2, 8, 16, 24, 48, and 96 chromosomes]
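The dependence of SNP discovery on sample size can be sketched with a simple sampling argument. This is a minimal illustration, not the calculation behind the slide's figure: it assumes chromosomes are drawn independently from the population, so a biallelic SNP is discovered whenever both alleles appear at least once in the sample. The function name `discovery_probability` is hypothetical.

```python
def discovery_probability(maf: float, n_chromosomes: int) -> float:
    """Chance that both alleles of a biallelic SNP show up in the sample.

    Assumes independent sampling of chromosomes (an assumption of this
    sketch). The SNP is missed only if every sampled chromosome carries
    the major allele, or every one carries the minor allele.
    """
    all_major = (1.0 - maf) ** n_chromosomes
    all_minor = maf ** n_chromosomes
    return 1.0 - all_major - all_minor


if __name__ == "__main__":
    # Chromosome counts taken from the slide's figure legend.
    for n in (2, 8, 16, 24, 48, 96):
        cells = ", ".join(
            f"MAF {m:.2f}: {discovery_probability(m, n):.2f}"
            for m in (0.01, 0.05, 0.10, 0.50)
        )
        print(f"{n:>2} chromosomes -> {cells}")
```

Under this assumption, common SNPs are found with very few chromosomes, while SNPs near 1% MAF remain likely to be missed even with dozens of chromosomes.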

SNP Discovery: dbSNP database [Figure: SNPs by minor allele frequency (MAF), SeattleSNPs resequencing vs. dbSNP (Perlegen/HapMap); brackets mark 25%, 50%, and 75%] Rarer and population specific SNPs are found by resequencing

dbSNP: Increasing numbers of SNPs now have genotype data (HapMap Phase II, Perlegen data)

Current State of dbSNP Many SNPs left to validate and characterize.

Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs validated! Coming Soon! 5.0 million validated SNPs with genotypes!

SNP Discovery: dbSNP database dbSNP Issues: Not comprehensive catalog (50% of SNPs) Is the data confirmed? (50% are validated) Information about allele frequency/population (50%) No information about SNP correlations (linkage disequilibrium) genotyping efficiency

SNP Selection: Using Linkage Disequilibrium Common SNPs VKORC1 - 28 total - 10 SNPs > 10% MAF Evaluate linkage disequilibrium (non-random association of alleles at different sites) [Figure: frequency vs. warfarin dose (mg/d)] Does common variation in VKORC1 have a role in determining warfarin dose?

SNP Selection: Using Linkage Disequilibrium

Chromosome   Site 1   Site 2
Maternal     C        A
Paternal     T        G

Allele frequencies: Site 1: C 50%, T 50%; Site 2: A 50%, G 50%

Possible 2-site comb.   Expected Freq.      Observed Freq.
C A                     0.5 X 0.5 = 0.25    0.50 *
C G                     0.5 X 0.5 = 0.25    0.01
T A                     0.5 X 0.5 = 0.25    0.01
T G                     0.5 X 0.5 = 0.25    0.48 *

* Sites Correlated

SNP Selection: Using Linkage Disequilibrium SNP discovery data (i.e. population of samples with genotypes) Find all correlated SNPs to minimize the total number of SNPs Maintains genetic information (correlations) for that locus LD_Select - SNP tagging/binning algorithm - based on LD (r2), not haplotypes Carlson, et al. AJHG (2004)
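The binning idea can be sketched as a greedy procedure. This is a simplified reading of r2-based binning, not the LD_Select code itself: repeatedly pick the SNP that exceeds the r2 threshold with the most remaining SNPs, put it and its partners in one bin, and call any SNP that exceeds the threshold with every other bin member a tagSNP. The function name `greedy_ld_bins`, the SNP labels, and the r2 values are all made up for illustration.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Greedy r2 binning sketch (hypothetical helper, not LD_Select itself).

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({snp_a, snp_b}) -> pairwise r2.
          Missing pairs are treated as r2 = 0.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # SNP that exceeds the threshold with the most remaining SNPs
        # (ties go to the first SNP in list order).
        best = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [t for t in remaining if linked(best, t)]
        # A tagSNP must exceed the threshold with every member of its bin.
        tags = [s for s in members if all(linked(s, t) for t in members)]
        bins.append({"members": members, "tag_snps": tags})
        remaining = [s for s in remaining if s not in members]
    return bins


if __name__ == "__main__":
    # Made-up r2 values for five hypothetical SNPs.
    toy_r2 = {
        frozenset(("s1", "s2")): 0.95,
        frozenset(("s1", "s3")): 0.85,
        frozenset(("s2", "s3")): 0.60,
        frozenset(("s4", "s5")): 0.90,
    }
    for b in greedy_ld_bins(["s1", "s2", "s3", "s4", "s5"], toy_r2):
        print(b)
```

In the toy data, s1 tags a three-SNP bin on its own because s2 and s3 are not strongly correlated with each other, while either s4 or s5 can tag the second bin. Genotyping one tagSNP per bin is what reduces the total number of SNPs.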

SNP Selection: VG/LD_Select on the Web pga.gs.washington.edu/VG2

SNP Selection: tagSNP Data

SNP Selection: VKORC1 tagSNPs

SNP Testing: VKORC1 tagSNPs Five Bins to Test: (381, 3673, 6484, 6853, 7566); (2653, 6009); (861); (5808); (9041) e.g. Bin 1 - SNP 381, genotypes C/C, C/T, T/T Bin 1 - p < 0.001 Bin 2 - p < 0.02 Bin 3 - p < 0.01 Bin 4 - p < 0.001 Bin 5 - p < 0.001 SNP x SNP interactions - haplotype analysis?

VKORC1 Summary: SNP Discovery/SNP Selection VKORC1 candidate gene for warfarin dose response SNP discovery performed using PCR/resequencing to catalog common SNPs 28 SNPs found 10 common SNPs SNP discovery using dbSNP 8/10 dbSNPs confirmed 7 new SNPs added SNP Selection using linkage disequilibrium 10 common SNPs (> 10% MAF) 5 informative SNPs for genotyping

Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Pick tagSNPs Infer haplotypes Test for association

Haplotypes in Genetic Association Studies How can you get haplotypes? What information do you get from haplotypes? How do you use haplotypes to find tagSNPs? How do you use haplotypes to test for associations?

Haplotypes – The Definition “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997

Constructing Haplotypes Collect pedigrees Somatic cell hybrids (human-rodent hybrid) Allele-specific PCR [Figure: example genotypes at SNP 1 (C/T) and SNP 2 (A/G) resolved into haplotypes]

Constructing Haplotypes Examples of Haplotype Inference Software: EM Algorithm Haploview http://www.broad.mit.edu/mpg/haploview/index.php Arlequin http://lgb.unige.ch/arlequin/ PHASE v2.1 http://www.stat.washington.edu/stephens/software.html HAPLOTYPER http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm
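The EM idea behind statistical haplotype inference can be sketched for the smallest case, two biallelic SNPs. This is a minimal illustration under stated assumptions (random mating, unrelated individuals, no missing data), not the implementation used by any of the programs listed above. Only individuals heterozygous at both SNPs are ambiguous, and EM splits them between the two possible phasings in proportion to the current haplotype frequency estimates. The function name and genotype coding (0, 1, 2 copies of allele "1") are assumptions of this sketch.

```python
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (allele at SNP 1, allele at SNP 2)


def em_two_snp_haplotypes(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele "1".
    """
    fixed = {h: 0.0 for h in HAPS}  # counts from unambiguous individuals
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 00+11 or 01+10
            continue
        a_pair = (0, 1) if g1 == 1 else (g1 // 2, g1 // 2)
        b_pair = (0, 1) if g2 == 1 else (g2 // 2, g2 // 2)
        for hap in zip(a_pair, b_pair):
            fixed[hap] += 1

    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPS}  # flat starting guess
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00+11.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: recount haplotypes with the ambiguous ones split by w.
        counts = dict(fixed)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        freqs = {h: c / total for h, c in counts.items()}
    return freqs


if __name__ == "__main__":
    # Made-up sample: 4 homozygotes of each type plus 2 double heterozygotes.
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_two_snp_haplotypes(sample))
```

With more SNPs the number of possible haplotype pairs per individual grows quickly, which is one reason dedicated software is used in practice.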

Haplotypes in SeattleSNPs >200 genes re-sequenced in inflammation response 2 populations: European- and African-Americans PHASEv2.0 results posted on website Interactive tool (VH1) to visualize and sort haplotypes http://pga.gs.washington.edu

Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Recombination Natural selection Population history Population demography Haplotype block definition Pick tagSNPs Infer haplotypes Test for association

Measuring Pair-wise SNP Correlations SNP correlation described by linkage disequilibrium (LD) Pair-wise measures of LD: D´ and r2 $D = p_{AB} - p_A p_B$ $D' = D / D_{max}$ (related to recombination) $r^2 = D^2 / [f(A_1) f(A_2) f(B_1) f(B_2)]$ (related to power)
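As a worked example, the definitions above can be applied to the two-site example from the earlier "Using Linkage Disequilibrium" slide (observed haplotype frequencies 0.50, 0.01, 0.01, 0.48). This is a minimal sketch; the function name `ld_stats` is hypothetical, and D´ is computed with the common convention of dividing |D| by the largest value D could take given the allele frequencies.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A (site 1) and allele B (site 2).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Observed haplotype frequencies from the earlier two-site slide:
    # C-A 0.50, C-G 0.01, T-A 0.01, T-G 0.48.
    p_ca = 0.50
    p_c = 0.50 + 0.01  # allele C at site 1
    p_a = 0.50 + 0.01  # allele A at site 2
    d, d_prime, r2 = ld_stats(p_ca, p_c, p_a)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

For these frequencies D is about 0.24, D´ about 0.96, and r2 about 0.92, so either site would serve as a good proxy for the other.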

Using LD and Haplotypes to Pick tagSNPs r2 is related to power: the sample size needed to keep power constant scales with 1/r2 r2 = 1.0: 1,000 cases, 1,000 controls r2 = 0.80: 1,250 cases, 1,250 controls D´ is related to recombination history D´ = 1 no recombination D´ < 1 historical recombination Example: LDSelect Example: Haplotype “blocks”
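The 1/r2 rule on this slide is simple arithmetic, sketched below. The function name `inflated_sample_size` is hypothetical; the rule itself is the standard approximation that testing a tagSNP instead of the causal SNP requires roughly 1/r2 times as many samples for the same power.

```python
import math


def inflated_sample_size(n_direct: int, r2: float) -> int:
    """Samples needed when testing a proxy SNP with correlation r2.

    n_direct: samples needed if the causal SNP itself were genotyped.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    # Small tolerance so floating-point noise does not round up by one.
    return math.ceil(n_direct / r2 - 1e-9)


if __name__ == "__main__":
    for r2 in (1.0, 0.8, 0.5):
        print(f"r2 = {r2:.2f}: {inflated_sample_size(1000, r2)} cases and controls")
```

This is why a threshold such as r2 = 0.80 is a common compromise: it costs about 25% more samples in exchange for genotyping far fewer SNPs.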

Haplotype “Blocks” Daly et al. Nat. Genet. (2001) Strong LD Few haplotypes Represent most chromosomes

Block Definitions Daly et al. Nat. Genet. (2001) D´ [Gabriel et al. Science (2002)]

Block Definitions Four-gamete test: <4 haplotypes, D´=1 block 4 haplotypes, D´<1 boundary To answer this question, we decided to identify blocks using all common SNPs and coding SNPs. To form blocks, we applied the four-gamete test using the software by Zhang and Jin known as HaploBlockFinder. In this test, each pair of SNPs is tested for evidence of historical recombination. That is, for two sites, A and B, with two alleles each, if all four haplotypes are present in the population, it is considered evidence of recombination. If fewer than four haplotypes are present in the population, there is no evidence of recombination. A block is defined as a set of SNPs that do not show evidence of historical recombination, and the boundaries are marked by SNPs that do show evidence of recombination. I should note that this method of defining blocks is very stringent, and that there are other methods used to define blocks, one of which you will hear more about later this afternoon. Using the four-gamete test to define blocks, we find that for some genes, using coding variation yields blocks similar to those from all common variation. However, for most genes, we observed fewer blocks when coding SNPs were used compared with all common SNPs. Overall, fewer blocks were observed as the SNP density decreased.
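The four-gamete test described in these notes can be sketched in a few lines. This is a simplified illustration, not HaploBlockFinder's algorithm: it scans SNPs left to right and starts a new block whenever the next SNP shows all four two-site haplotypes with any SNP already in the current block. The function names and the toy haplotypes are made up, and the sketch assumes phased haplotypes at biallelic sites.

```python
def shows_recombination(haplotypes, i, j):
    """Four-gamete test for sites i and j (biallelic sites assumed).

    Returns True when all four two-site haplotypes are observed, which is
    taken as evidence of historical recombination between the sites.
    """
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def four_gamete_blocks(haplotypes):
    """Greedy left-to-right partition of sites into blocks (sketch only)."""
    n_sites = len(haplotypes[0])
    blocks, current = [], [0]
    for site in range(1, n_sites):
        if any(shows_recombination(haplotypes, prev, site) for prev in current):
            blocks.append(current)  # boundary: close the current block
            current = [site]
        else:
            current.append(site)
    blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Made-up phased haplotypes over four sites.
    toy = ["AACC", "AACT", "TTGC", "TTGT"]
    print(four_gamete_blocks(toy))
```

In the toy data the first three sites show only two haplotypes between them, while the fourth site shows all four combinations with the first and so starts a new block. Removing SNPs from the input can only merge blocks, which is consistent with the note that fewer blocks were observed as SNP density decreased.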

Haplotype Blocks and tagSNPs Identifying blocks and tagSNPs: Manually Algorithms HaploBlockFinder Haploview

Haplotype Blocks and tagSNPs IL1B: 19 SNPs (MAF >5%) 4 “common” haplotypes

HaploBlockFinder Output VKORC1 European-Americans

HaploBlockFinder Output Haplotype blocks LD matrix VKORC1 European-Americans

LD and tagSNPs using Haploview VKORC1 European-Americans PHASEv2.1 data

Minimal set of tagSNPs based on r2

Where to Find Tagging Software HaploBlockFinder http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi Haploview http://www.broad.mit.edu/personal/jcbarret/haplo/ LDSelect http://droog.gs.washington.edu/ldSelect.html SNPtagger http://www.well.ox.ac.uk/~xiayi/haplotype/index.html TagIT http://popgen.biol.ucl.ac.uk/software.html tagSNPs http://www-rcf.usc.edu/~stram/tagSNPs.html

Haplotypes, TagSNPs, and Caveats Haplotypes are inferred Block-like structure assumed for some software Different block definitions Block boundaries sensitive to marker density Genotype savings may not be great (recombination)

Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Pick tagSNPs Infer haplotypes Test for association Genetic diversity of sample Multi-SNP analysis

Multi-SNP testing: Haplotypes Five tagSNPs (10 total SNPs) 186 warfarin patients (European) PHASE v2.1 9 haplotypes/5 common (>5%)

Multi-SNP testing: Haplotypes Test for association between haplotype and warfarin dose using multiple linear regression Adjusted for all significant covariates: age, sex, amiodarone, CYP2C9 genotype

Multi-SNP testing: Haplotypes Explore the evolutionary relationship across haplotypes (381, 3673, 6484, 6853, 7566) 5808 CCGATCTCTG-H1 A CCGAGCTCTG-H2 861 TCGGTCCGCA-H7 TAGGTCCGCA-H8 B 9041 TACGTTCGCG-H9 VKORC1 haplotypes cluster into divergent clades Patients can be assigned a clade diplotype: e.g. Patient 1 - H1/H2 = A/A Patient 2 - H1/H7 = A/B Patient 3 - H7/H9 = B/B
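The clade assignment on this slide amounts to a lookup. The sketch below maps each common haplotype to the clade shown on the slide (H1 and H2 to A; H7, H8, and H9 to B) and reproduces the three patient examples. The function and dictionary names are hypothetical, and haplotypes outside the five listed are deliberately not handled.

```python
# Clade membership as listed on the slide.
CLADE_OF_HAPLOTYPE = {"H1": "A", "H2": "A", "H7": "B", "H8": "B", "H9": "B"}


def clade_diplotype(hap1: str, hap2: str) -> str:
    """Collapse a pair of haplotypes into a clade diplotype such as 'A/B'.

    Raises KeyError for haplotypes not in the table (a choice made for
    this sketch, not something the slide specifies).
    """
    clades = sorted(CLADE_OF_HAPLOTYPE[h] for h in (hap1, hap2))
    return "/".join(clades)


if __name__ == "__main__":
    patients = {"Patient 1": ("H1", "H2"),
                "Patient 2": ("H1", "H7"),
                "Patient 3": ("H7", "H9")}
    for name, (h1, h2) in patients.items():
        print(f"{name}: {h1}/{h2} = {clade_diplotype(h1, h2)}")
```

Collapsing nine haplotypes into three diplotype groups keeps each group large enough to compare mean warfarin dose across them.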

VKORC1 clade diplotypes show a strong association with warfarin dose [Figure: warfarin dose (low to high) by clade diplotype (A/A, A/B, B/B) for all patients (n = 181), 2C9 WT patients (n = 124), and 2C9 VAR patients (n = 57); * and † mark significance] Independent of INR levels across all groups

Multi-SNP testing: Haplotypes European - mean ~ 5 mg/d African-American - higher ~ 6.0-7.0 mg/d Asian - lower ~ 3.0-3.5 mg/d Hypothesis: VKORC1 haplotypes contribute to racial variability in warfarin dosing. “Control” populations: 120 Europeans 96 African-Americans 120 Asian

Multi-SNP testing: Haplotypes Explore the evolutionary relationship across populations European (CEPH) Clade Distribution B (58%) A (37%) Asian (Han) Clade Distribution Low dose phenotype A (89%) B (11%) African-American Clade Distribution High dose phenotype A (14%) B (47%) Other (39%) Clade A = Low Clade B = High

Common Errors in Association Studies Bell and Cardon (2001) Small sample size Subgroup analysis and multiple testing Random error Poorly matched control group Failure to attempt study replication Failure to detect LD with adjacent loci Overinterpreting results and positive publication bias Unwarranted ‘candidate gene’ declaration after identifying association in arbitrary genetic region e.g., Second case/control study Gene expression studies

SNP Replication: VKORC1 Univ. of Washington n = 185 Washington University n = 386 (Brian Gage, Howard McLeod, Charles Eby) [Figure: warfarin dose by clade diplotype (AA, AB, BB) for all patients, 2C9 WT patients, and 2C9 VAR patients at each site; * and † mark significance] In a patient population nearly double in size, our results were nearly identical. This was done at a completely independent site with different physicians. 21% variance in dose explained

SNP Function: VKORC1 Expression Mechanism: No nonsynonymous SNPs Several SNPs are present in evolutionarily conserved non-coding regions mRNA expression in human liver cell lines

SNP Function: VKORC1 Expression Expression in human liver tissue (n = 53) shows a graded change in expression.

VKORC1 SNP alters liver-specific binding site Biological plausibility leads us to believe that VKORC1 is the relevant gene and that SNPs within this gene are leading to expression changes. However, we thought it would be interesting to perform a bioinformatic experiment to see how far a SNP association with warfarin dose could extend.

SNP Discovery and Analysis Application to Association Studies Summary Databases and resources available for SNP discovery Software for tagSNP selection available Both single and multi-SNP analysis are useful Replication required by several journals

SeattleSNPs Genotyping Service Free genotyping (BeadArray or SNPlex) Emphasis on young investigators Research related to heart, lung, blood, or sleep disorders Moderate to large population samples Apply at pga.gs.washington.edu Due: May 15th, 2005