SNP Discovery and Analysis: Application to Association Studies Mark J. Rieder, PhD Dana Crawford, PhD Deborah Nickerson, PhD SeattleSNPs PGA July 19-20,

1 SNP Discovery and Analysis: Application to Association Studies Mark J. Rieder, PhD Dana Crawford, PhD Deborah Nickerson, PhD SeattleSNPs PGA July 19-20, 2005

2 Practical Aspects of SNP Association Studies
1. SNP Discovery: Where do I find SNPs to use in my association studies? (e.g. databases, direct resequencing)
2. SNP Selection: How do I choose SNPs that are informative? (i.e. assessing SNP correlation - linkage disequilibrium)
3. SNP Associations: What analyses can I perform after genotyping these SNPs? (e.g. single SNP data, haplotype data)
4. SNP Replication/Function: How is function predicted or assessed? (e.g. nonsynonymous SNPs, conserved non-coding regions (CNS), transcription factor binding sites, gene expression)

3 SeattleSNPs Program for Genomic Applications: Overview Aim 1: To establish a variation discovery resource capable of comprehensive resequencing of candidate genes related to HLBS. Biological Focus: Inflammation. Genes and Pathways: Coagulation, Complement, Cytokines; Interacting Partners

4 SNPs in Candidate Genes (SeattleSNPs) Average gene size: 26.5 kb. Comparing 2 haploid genomes: ~1 SNP in 1,200 bp. All SNPs: ~130 per gene (1 per 200 bp), 15,000,000 SNPs. SNPs > 0.05 MAF: ~44 per gene (1 per 600 bp), 6,000,000 SNPs.

5 SeattleSNPs PGA: Candidate Gene SNP Resource 4.9 Mb in 47 individuals = 230 Mb total sequence Define sequence diversity - catalogue all SNPs Select “optimal” tagSNPs sets Determine haplotype structure Provide necessary baseline data for association studies

6 Warfarin Pharmacogenetics
1. Background: Warfarin characteristics; Pharmacokinetics/Pharmacodynamics; Discovery of VKORC1
2. VKORC1 - SNP Discovery
3. VKORC1 - SNP Selection (tagSNPs)
4. VKORC1 - SNP Testing: SNP/Haplotype Inference; Haplotype Inference, Testing
5. VKORC1 - SNP Replication/Function

7 Pharmacogenomics as a Model for Association Studies Reduce variability and identify outliers. Prospective testing Personalized Medicine Clear genotype-phenotype link intervention variable response Pharmacokinetics - 5x variation Quantitative intervention and response drug dose, response time, metabolism rate, etc. Target/metabolism of drug generally known gene target that can be tested directly with response

8 Warfarin Background Commonly prescribed oral anti-coagulant. In 2003, 21.2 million prescriptions were written for warfarin (Coumadin). Prescribed following MI, atrial fibrillation, stroke, venous thrombosis, prosthetic heart valve replacement, and following major surgery. Difficult to determine effective dosage: - Narrow therapeutic range - Monitoring of prothrombin time (INR): 2.0 - 3.0 - Large inter-individual variation

9 Add warfarin dose distribution Patient/Clinical/Environmental Factors Ave: 5.2 mg/d n = 186 European-American 30x dose variability Pharmacokinetic/Pharmacodynamic - Genetic

10 Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z) Epoxide Reductase γ-Carboxylase (GGCX) Warfarin inhibits the vitamin K cycle Warfarin Inactivation CYP2C9 Pharmacokinetic

11 Warfarin Metabolism (Pharmacokinetics) Major pathway for termination of pharmacologic effect is through metabolism of S-warfarin in the liver by CYP2C9. CYP2C9 SNPs alter warfarin metabolism: CYP2C9*1 (WT) - normal; CYP2C9*2 (Arg144Cys) - low/intermediate; CYP2C9*3 (Ile359Leu) - low. CYP2C9 alleles occur at a significant minor allele frequency: European: *2 - 10.7%, *3 - 8.5%; Asian: *2 - 0%, *3 - 1-2%; African-American: *2 - 2.9%, *3 - 0.8%

12 Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes (Higashi et al., JAMA 2002) WARFARIN MAINTENANCE DOSE [Figure: maintenance dose (mg warfarin/day) by CYP2C9 genotype group; N = 127, 28, 4, 18, 3, 5] - Variant alleles have significant clinical impact - Still large variability in warfarin dose (15-fold) in *1/*1 “controls”? TIME TO STABLE ANTICOAGULATION CYP2C9-WT ~90 days; CYP2C9-Variant ~180 days. *2 or *3 carriers take longer to reach stable anticoagulation

13 Analysis of Independent Predictors of Warfarin Dose
Variable | Change in Warfarin Dose | P value
Target INR, per 0.5 increase | 21% | <0.0005
BMI, per SD | 14% | <0.0001
Ethnicity (African-American, [Asian]) | 13%, [10-15%] | 0.003
Age, per decade | 13% | <0.0001
Gender, Female | 12% | <0.0001
Drugs (Amiodarone) | 24% | 0.007
CYP2C9*2, per allele | 19% | <0.0001
CYP2C9*3, per allele | 30% | <0.0001
Adapted from Gage et al., Thromb Haemost, 2004
~30% of the variability in warfarin dose is explained by these factors
What other candidate genes are influencing warfarin dosing?

14 Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z) Epoxide Reductase γ-Carboxylase (GGCX) Warfarin acts as a vitamin K antagonist Warfarin Inactivation CYP2C9 Pharmacodynamic

15 New Target Protein for Warfarin Epoxide Reductase γ-Carboxylase (GGCX) Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z) Rost et al. & Li et al., Nature (2004) (VKORC1) 5 kb - chr 16

16 Warfarin Resistance VKORC1 Polymorphisms Rare non-synonymous mutations in VKORC1 causative for warfarin resistance (15-35 mg/d). NO non-synonymous mutations found in ‘control’ chromosomes (n = ~400). Rost et al., Nature (2004)

17 Inter-Individual Variability in Warfarin Dose: Genetic Liabilities [Figure: frequency vs. warfarin maintenance dose (mg/day); axis marks 0.5, 5, 15] SENSITIVITY: CYP2C9 coding SNPs - *3/*3. RESISTANCE: VKORC1 nonsynonymous coding SNPs. Common VKORC1 non-coding SNPs?

18 SNP Discovery: Resequencing VKORC1 PCR amplicons --> Resequencing of the complete genomic region 5 Kb upstream and each of the 3 exons and intronic segments; ~11 Kb SeattleSNPs PGA - pga.gs.washington.edu (24 African-Am./23 Europeans) Warfarin treated clinical patients (UWMC): 186 European Other populations: 96 European, 96 African-Am., 120 Asian

19 SNP Discovery: Resequencing Results Summary of PGA samples (European, n = 23): Total: 13 SNPs identified, 10 common/3 rare (<5% MAF). Clinical Samples (European patients, n = 186): Total: 28 SNPs identified, 10 common/18 rare (<5% MAF): 15 - intronic/regulatory; 7 - promoter SNPs; 2 - 3’ UTR SNPs; 3 - synonymous SNPs; 1 - nonsynonymous (single heterozygous indiv., highest warfarin dose = 15.5 mg/d). How does the comprehensive SNP discovery compare to what was known for this gene?

20 dbSNP -NCBI SNP database SNP Discovery: dbSNP database

21 SNP Discovery: dbSNP database (VKORC1) SeattleSNPs Resequencing: 28 SNPs --> 15 SNPs in gene region. dbSNP: 10 dbSNPs; 8/10 confirmations; 3 with frequency/genotype data; 7 new dbSNP entries generated by SeattleSNPs resequencing. 8 dbSNPs/15 SNPs (~50%)

22 SNP Discovery: dbSNP database Nickerson and Kruglyak, Nature Genetics, 2001 Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs (validated)!

23 SNP discovery is dependent on your sample population size [Figure: fraction of SNPs discovered (0.0 to 1.0) vs. minor allele frequency (MAF, 0.0 to 0.5), with curves for 2, 8, 16, 24, 48, and 96 chromosomes. Inset: two aligned sequences (2 chromosomes) differing at one site: GTTACGCCAATACAGGATCCAGGAGATTACC and GTTACGCCAATACAGCATCCAGGAGATTACC]
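
Note: the curves on this slide can be approximated with a simple sampling argument. The sketch below assumes that chromosomes are drawn independently from the population, so a SNP is "discovered" when both alleles appear at least once in the sample. The formula and the function name `detection_probability` are illustrative assumptions, not taken from the slide.

```python
def detection_probability(maf: float, n_chromosomes: int) -> float:
    """Chance that both alleles of a SNP are seen in a random sample.

    Assumes independent sampling of chromosomes (an assumption of this sketch).
    The SNP is missed only if every sampled chromosome carries the same allele.
    """
    all_major = (1.0 - maf) ** n_chromosomes
    all_minor = maf ** n_chromosomes
    return 1.0 - all_major - all_minor


# Chromosome counts match the curve labels on the slide.
for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {detection_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```

Under this model, 96 chromosomes (roughly the SeattleSNPs panel of 47 individuals) recover nearly all SNPs with MAF of 5% or more, while a comparison of 2 chromosomes finds only a small fraction of them.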

24 SNP Discovery: dbSNP database Rarer and population specific SNPs are found by resequencing [Figure: minor allele frequency (MAF) distributions, dbSNP (Perlegen/HapMap) vs. SeattleSNPs; marks at 25%, 50%, 75%]

25 dbSNP: Increasing numbers of SNPs now have genotype data (Perlegen data, HapMap Phase II)

26 Current State of dbSNP Many SNPs left to validate and characterize.

27 Development of a genome-wide SNP map: How many SNPs? Nickerson and Kruglyak, Nature Genetics, 2001 ~ 10 million common SNPs (>1- 5% MAF) - 1/300 bp Mar 2005 - 5.0 million (validated - 1/600 bp) 5.0/10.0 = 50% of all common SNPs validated! Coming Soon! 5.0 million validated SNPs with genotypes!

28 dbSNP Issues: Not comprehensive catalog (50% of SNPs) Is the data confirmed? (50% are validated) Information about allele frequency/population (50%) No information about SNP correlations (linkage disequilibrium) genotyping efficiency SNP Discovery: dbSNP database

29 Common SNPs VKORC1 - 28 total - 10 SNPs > 10% MAF Evaluate linkage disequilibrium (non-random association of genotype data) Does common variation in VKORC1 have a role in determining warfarin dose? Warfarin Dose (mg/d) Frequency SNP Selection: Using Linkage Disequilibrium

30 SNP Selection: Using Linkage Disequilibrium
Site 1: C : 50%, T : 50%. Site 2: A : 50%, G : 50%.
Possible 2-site comb. | Expected Freq. | Observed Freq.
C A | 0.5 X 0.5 = 0.25 | 0.50 *
T G | 0.5 X 0.5 = 0.25 | 0.48 *
C G | 0.5 X 0.5 = 0.25 | 0.01
T A | 0.5 X 0.5 = 0.25 | 0.01
* Sites Correlated
[Figure: maternal and paternal chromosomes showing alleles at Site 1 (C, T) and Site 2 (A, G)]
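
Note: the comparison on this slide can be worked through in a few lines. The sketch below uses only the allele and haplotype frequencies shown on the slide; the variable and function names are illustrative. The difference between observed and expected haplotype frequency is the disequilibrium coefficient D for that allele pair.

```python
# Allele frequencies at the two sites, as given on the slide.
allele_freq = {"C": 0.5, "T": 0.5, "A": 0.5, "G": 0.5}

# Observed two-site haplotype frequencies, as given on the slide.
observed = {
    ("C", "A"): 0.50,
    ("T", "G"): 0.48,
    ("C", "G"): 0.01,
    ("T", "A"): 0.01,
}


def expected_frequency(site1_allele: str, site2_allele: str) -> float:
    """Haplotype frequency expected if the two sites were independent."""
    return allele_freq[site1_allele] * allele_freq[site2_allele]


# Observed minus expected: positive for over-represented haplotypes.
excess = {hap: obs - expected_frequency(*hap) for hap, obs in observed.items()}

for hap, value in excess.items():
    print(f"{hap[0]}-{hap[1]}: expected {expected_frequency(*hap):.2f}, "
          f"observed {observed[hap]:.2f}, difference {value:+.2f}")
```

The C-A and T-G haplotypes are far more common than independence predicts, which is what the asterisks on the slide flag as correlated sites.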

31 SNP discovery data (i.e. population of samples with genotypes). Find all correlated SNPs to minimize the total number of SNPs. Maintains genetic information (correlations) for that locus. LD_Select - SNP tagging/binning algorithm - based on LD (r²), not haplotypes. Carlson, et al. AJHG (2004)
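
Note: the binning idea can be illustrated with a simplified greedy sketch. This is not the LD_Select implementation; it only shows the general idea of grouping SNPs whose pairwise r² exceeds a threshold. The input format (a dict keyed by `frozenset` pairs), the SNP names, the r² values, and the 0.8 threshold are all hypothetical choices made for this example.

```python
def greedy_bins(snps, r2, threshold=0.8):
    """Group SNPs into bins of mutually informative markers (simplified sketch).

    snps      : list of SNP identifiers
    r2        : dict mapping frozenset({a, b}) -> pairwise r^2 (hypothetical format)
    threshold : r^2 cut-off for calling two SNPs correlated (assumed value)
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Seed the bin with the SNP correlated with the most remaining SNPs.
        seed = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [t for t in remaining if linked(seed, t)]
        # A member correlated with every other member can tag the whole bin.
        tags = [m for m in members if all(linked(m, t) for t in members)]
        bins.append({"members": members, "tags": tags})
        remaining = [t for t in remaining if t not in members]
    return bins


# Hypothetical SNPs and r^2 values, for illustration only.
example_r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s3")): 0.50,
}
bins = greedy_bins(["s1", "s2", "s3", "s4"], example_r2)
for i, b in enumerate(bins, start=1):
    print(f"Bin {i}: members={b['members']} tagSNP candidates={b['tags']}")
```

Genotyping one tagSNP candidate per bin then stands in for all members of that bin, which is how the number of SNPs to genotype is reduced while the correlations at the locus are retained.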

32 SNP Selection: VG/LD_Select on the Web pga.gs.washington.edu/VG2

33 SNP Selection: tagSNP Data

34 SNP Selection: VKORC1 tagSNPs

35 SNP Testing: VKORC1 tagSNPs
Five Bins to Test:
1. 381, 3673, 6484, 6853, 7566 (Bin 1 - p < 0.001)
2. 2653, 6009 (Bin 2 - p < 0.02)
3. 861 (Bin 3 - p < 0.01)
4. 5808 (Bin 4 - p < 0.001)
5. 9041 (Bin 5 - p < 0.001)
e.g. Bin 1 - SNP 381 [Figure: genotypes C/C, C/T, T/T]
SNP x SNP interactions - haplotype analysis?

36 VKORC1 Summary: SNP Discovery/SNP Selection
1. VKORC1 candidate gene for warfarin dose response
2. SNP discovery performed using PCR/resequencing to catalog common SNPs: 28 SNPs found, 10 common SNPs
3. SNP discovery using dbSNP: 8/10 dbSNPs confirmed, 7 new SNPs added
4. SNP Selection using linkage disequilibrium: 10 common SNPs (> 10% MAF), 5 informative SNPs for genotyping

37 Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Pick tagSNPs Infer haplotypes Test for association

38 Haplotypes in Genetic Association Studies 1. How can you get haplotypes? 2. What information do you get from haplotypes? 3. How do you use haplotypes to find tagSNPs? 4. How do you use haplotypes to test for associations?

39 Haplotypes – The Definition “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997

40 Constructing Haplotypes Collect pedigrees; Somatic cell hybrids (Human, Rodent, Hybrid); Allele-specific PCR. [Figure: example genotypes at SNP 1 (C/T) and SNP 2 (A/G), e.g. C/T, A/G; C/C, A/G; T/T, G/G; C/T, A/A, with the haplotypes resolved by each method]

41 Constructing Haplotypes Examples of Haplotype Inference Software: EM Algorithm Haploview http://www.broad.mit.edu/mpg/haploview/index.php Arlequin http://lgb.unige.ch/arlequin/ PHASE v2.1 http://www.stat.washington.edu/stephens/software.html HAPLOTYPER http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm
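
Note: to make the EM idea concrete, here is a minimal sketch for the simplest case of two biallelic SNPs. It is not the algorithm used by any of the listed programs (PHASE, for example, uses a Bayesian method); the genotype coding (0, 1, or 2 copies of an arbitrarily chosen allele per SNP), the function name `em_two_snp`, and the example genotypes are assumptions made for illustration. Only double heterozygotes are phase-ambiguous, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def em_two_snp(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes : list of (g1, g2); each value is the count (0, 1, 2) of the
                allele coded '1' at that SNP (coding is an assumption here).
    Returns a dict mapping haplotype (a, b), with a, b in {0, 1}, to frequency.
    """
    freqs = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: double heterozygote is either 00/11 (cis) or 01/10 (trans).
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1.0 - w
                counts[(1, 0)] += 1.0 - w
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                hap_a = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                hap_b = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                counts[hap_a] += 1.0
                counts[hap_b] += 1.0
        # M-step: new frequencies are the expected haplotype counts.
        freqs = {hap: c / n_chrom for hap, c in counts.items()}
    return freqs


# Made-up genotypes: 3 homozygous 0/0, 3 homozygous 2/2, 4 double heterozygotes.
example_freqs = em_two_snp([(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 4)
for hap, f in sorted(example_freqs.items()):
    print(hap, round(f, 4))
```

In this made-up sample the unambiguous individuals carry only the 00 and 11 haplotypes, so the estimates converge toward assigning the double heterozygotes to the 00/11 pairing as well.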

42 Haplotypes in SeattleSNPs >200 genes re-sequenced in inflammation response 2 populations: European- and African-Americans PHASEv2.0 results posted on website Interactive tool (VH1) to visualize and sort haplotypes http://pga.gs.washington.edu

43 Haplotypes in SeattleSNPs

54 Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Pick tagSNPs Infer haplotypesTest for association Recombination Natural selection Population history Population demography Haplotype block definition

55 Measuring Pair-wise SNP Correlations SNP correlation described by linkage disequilibrium (LD). Pair-wise measures of LD: D´ and r². $D = p_{AB} - p_A p_B$; $D' = D / D_{max}$ (related to recombination). $r^2 = D^2 / [f(A_1) f(A_2) f(B_1) f(B_2)]$ (related to power)
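
Note: the two measures can be computed directly from one haplotype frequency and the two allele frequencies. The sketch below follows the definitions on this slide; the function name `ld_stats`, the use of |D| so that D´ falls between 0 and 1, and the example frequencies are assumptions made for illustration.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first SNP
    p_b  : frequency of allele B at the second SNP
    """
    d = p_ab - p_a * p_b
    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Denominator is the product of all four allele frequencies.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele A (30%) is only ever seen with allele B (50%).
d, d_prime, r2 = ld_stats(p_ab=0.30, p_a=0.30, p_b=0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

The example shows why the two measures answer different questions: D´ reaches 1 because one of the four possible haplotypes is absent, yet r² stays well below 1 because the allele frequencies differ, so one SNP is an imperfect proxy for the other.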

56 Using LD and Haplotypes to Pick tagSNPs r² is inversely related to power: sample size scales as 1/r². 1,000 cases and 1,000 controls at r² = 1.0 correspond to 1,250 cases and 1,250 controls at r² = 0.80 (Example: LDSelect). D´ is related to recombination history: D´ = 1, no recombination; D´ < 1, historical recombination (Example: Haplotype “blocks”)
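
Note: the 1/r² relationship on this slide is simple arithmetic. The sketch below reproduces the slide's example; the function name `inflated_sample_size` is an illustrative choice.

```python
def inflated_sample_size(n_direct: float, r2: float) -> float:
    """Sample size needed when testing a tagSNP instead of the causal SNP.

    n_direct : sample size that would suffice if the causal SNP were genotyped
    r2       : r^2 between the tagSNP and the causal SNP (0 < r2 <= 1)
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return n_direct / r2


# Slide example: 1,000 cases at r^2 = 1.0 become 1,250 cases at r^2 = 0.80.
print(inflated_sample_size(1000, 1.0))
print(inflated_sample_size(1000, 0.80))
```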

57 Haplotype “Blocks” Strong LD. Few haplotypes represent most chromosomes. Daly et al Nat. Genet. (2001)

58 Block Definitions D´ [Gabriel et al Science (2002)] Daly et al Nat. Genet. (2001)

59 Block Definitions Four-gamete test (possible gametes: AB, ab, Ab, aB): <4 haplotypes, D´ = 1: block; 4 haplotypes, D´ < 1: boundary
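
Note: the four-gamete test is easy to state in code. The sketch below assumes phased two-SNP haplotypes are already available as allele pairs; the function name and the example haplotypes are illustrative.

```python
def four_gamete_test(haplotypes) -> bool:
    """Return True if all four gametes are observed for a pair of biallelic SNPs.

    haplotypes : iterable of (allele_at_site1, allele_at_site2) pairs.
    True suggests historical recombination (a block boundary on the slide);
    False means fewer than four haplotypes were seen (consistent with a block).
    """
    return len(set(haplotypes)) >= 4


# Three distinct haplotypes: no evidence of recombination between the sites.
within_block = [("A", "B"), ("a", "b"), ("A", "b"), ("A", "B")]
# Adding the fourth gamete (a, B) implies a boundary between the sites.
across_boundary = within_block + [("a", "B")]

print(four_gamete_test(within_block))
print(four_gamete_test(across_boundary))
```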

60 Haplotype Blocks and tagSNPs Identifying blocks and tagSNPs: Manually Algorithms – Haploview

61 Haplotype Blocks and tagSNPs IL1B: 19 SNPs (MAF >5%) 4 “common” haplotypes tagSNPs

62 Haplotype Blocks and tagSNPs Identifying blocks and tagSNPs: Manually Algorithms – HaploView

64 LD and tagSNPs using Haploview VKORC1 European-Americans PHASEv2.1 data

68 Minimal set of tagSNPs based on r²

70 Where to Find Tagging Software HaploBlockFinder http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi LDSelect http://droog.gs.washington.edu/ldSelect.html SNPtagger http://www.well.ox.ac.uk/~xiayi/haplotype/index.html TagIT http://popgen.biol.ucl.ac.uk/software.html tagSNPs http://www-rcf.usc.edu/~stram/tagSNPs.html Haploview http://www.broad.mit.edu/personal/jcbarret/haplo/

71 Haplotypes, TagSNPs, and Caveats Haplotypes are inferred Block-like structure assumed for some software Different block definitions Block boundaries sensitive to marker density Genotype savings may not be great (recombination)

72 Haplotypes in Genetic Association Studies Two main approaches with haplotypes: Haplotypes Pick tagSNPs Genotype samples Pick tagSNPs Infer haplotypes Test for association Genetic diversity of sample Multi-SNP analysis

73 Five tagSNPs (10 total SNPs) 186 warfarin patients (European) PHASE v2.1 9 haplotypes/5 common (>5%) Multi-SNP testing: Haplotypes

74 Test for association between haplotype and warfarin dose using multiple linear regression Adjusted for all significant covariates: age, sex, amiodarone, CYP2C9 genotype

75 Multi-SNP testing: Haplotypes
VKORC1 haplotypes cluster into divergent clades (A and B):
CCGATCTCTG - H1
CCGAGCTCTG - H2
TCGGTCCGCA - H7
TAGGTCCGCA - H8
TACGTTCGCG - H9
[Figure labels: (381, 3673, 6484, 6853, 7566), 5808, 9041, 861]
Patients can be assigned a clade diplotype: e.g. Patient 1 - H1/H2 = A/A; Patient 2 - H1/H7 = A/B; Patient 3 - H7/H9 = B/B
Explore the evolutionary relationship across haplotypes
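
Note: assigning a clade diplotype is a simple lookup once each patient's two haplotypes are known. The sketch below uses the haplotype-to-clade mapping implied by the slide's patient examples (H1 and H2 in clade A; H7 and H9 in clade B). Placing H8 in clade B is an assumption based on its position in the figure, and the function name is illustrative.

```python
# Haplotype -> clade, following the slide's examples.
CLADE = {
    "H1": "A",
    "H2": "A",
    "H7": "B",
    "H8": "B",  # assumed from the figure layout, not stated in the examples
    "H9": "B",
}


def clade_diplotype(hap1: str, hap2: str) -> str:
    """Collapse a pair of haplotypes into an order-independent clade diplotype."""
    return "/".join(sorted((CLADE[hap1], CLADE[hap2])))


# The three patients from the slide.
for patient, (h1, h2) in enumerate([("H1", "H2"), ("H1", "H7"), ("H7", "H9")], start=1):
    print(f"Patient {patient} - {h1}/{h2} = {clade_diplotype(h1, h2)}")
```

Sorting the two clade labels means H7/H1 and H1/H7 both map to A/B, so each patient falls into one of three groups for the dose comparison.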

76 VKORC1 clade diplotypes show a strong association with warfarin dose [Figure: warfarin dose (Low to High) by clade diplotype (A/A, A/B, B/B) for all patients (n = 181), 2C9 WT patients (n = 124), and 2C9 VAR patients (n = 57); bars annotated with * and †] Independent of INR levels across all groups

77 European - mean ~ 5 mg/d African-American - higher ~ 6.0-7.0 mg/d Asian - lower ~ 3.0-3.5 mg/d Hypothesis: VKORC1 haplotypes contribute to racial variability in warfarin dosing. “Control” populations: 120 Europeans 96 African-Americans 120 Asian Multi-SNP testing: Haplotypes

78 Multi-SNP testing: Haplotypes Clade A = Low, Clade B = High. European (CEPH) Clade Distribution: B (58%), A (37%). African-American Clade Distribution (high dose phenotype): A (14%), B (47%), Other (39%). Asian (Han) Clade Distribution (low dose phenotype): A (89%), B (11%). Explore the evolutionary relationship across populations

79 Common Errors in Association Studies, Bell and Cardon (2001): Small sample size; Subgroup analysis and multiple testing; Random error; Poorly matched control group; Failure to attempt study replication; Failure to detect LD with adjacent loci; Overinterpreting results and positive publication bias; Unwarranted ‘candidate gene’ declaration after identifying association in arbitrary genetic region. e.g., Second case/control study; Gene expression studies

80 SNP Replication: VKORC1 Univ. of Washington, n = 185. Washington University, n = 386 (Brian Gage, Howard McLeod, Charles Eby). [Figure: for each cohort, warfarin dose by clade diplotype (AA, AB, BB) for all patients, 2C9 WT patients, and 2C9 VAR patients; bars annotated with * and †] 21% variance in dose explained

81 SNP Function: VKORC1 Expression mechanism No nonsynonymous SNPs Several SNPs are present in evolutionarily conserved non-coding regions - mRNA expression in human liver cell lines

82 SNP Function: VKORC1 Expression Expression in human liver tissue (n = 53) shows a graded change in expression.

83 VKORC1 SNP alters liver-specific binding site

84 Databases and resources available for SNP discovery Software for tagSNP selection available Both single and multi-SNP analysis are useful Replication required by several journals SNP Discovery and Analysis Application to Association Studies Summary

85 SeattleSNPs Genotyping Service Free genotyping (BeadArray or SNPlex) Emphasis on young investigators Research related to heart, lung, blood, or sleep disorders Moderate to large population samples Apply at pga.gs.washington.edu Due: October 15 th, 2005

86 SNP Typing Formats Scale: Low, Medium, High. Microtiter Plates - Fluorescence: e.g. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping. Size Analysis by Electrophoresis: e.g. SNPlex - Intermediate multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput. Arrays - Custom or Universal: e.g. Illumina, ParAllele, Affymetrix - Highly multiplexed - 1,500 SNPs and beyond (500K+)

87 Taqman Genotyping with fluorescence-based homogenous assays (single-tube assay) = 1 SNP/ tube

89 Technological Leap - No advance PCR. Universal PCR is performed after preparing multiple regions for analysis. Several approaches are based on specific primers applied directly to genomic DNA, followed by PCR of the ligated products, with different strategies and different readouts (SNPlex, Illumina, ParAllele). Also, reduced representation (Affymetrix): cut with a restriction enzyme, ligate linkers, amplify from the linkers, and read out by chip hybridization.

90 9. Characterize on Capillary Sequencer Detection SNP 1 SNP 2

92 Multiplexed Genotyping - Universal Tag Readouts [Figure: Locus 1 specific sequence with Tag1 sequence, paired with cTag1 sequence on the substrate (bead or chip); Locus 2 specific sequence with Tag2 sequence, paired with cTag2 sequence on the substrate (bead or chip); Tags 1 to 4 on a chip array or bead array; alleles C, T, A, G] Multiplex ~1,000 SNPs. Not dependent on primary PCR. Illumina, ParAllele, Affymetrix

93 Illumina Platform 96 Multi-array Matrix matches standard microtiter plates ~ 1,500 SNPs typed per matrix for 96 samples

94 Affymetrix’s 100K Chip http://www.affymetrix.com/products/arrays/specific/100k.affx Optimized for 250-2000bp

95 High Throughput Chip Formats

96 Defining the scale of the genotyping project is key to selecting an approach:
5 to 10 SNPs in a candidate gene - Many approaches (expensive, ~0.60 per SNP/genotype)
48 (to 96) SNPs in a handful of candidate genes (~0.25 per SNP/genotype)
384 to 1,536 SNPs (~0.15 - 0.08 per SNP/genotype)
10,000 cSNPs - defined format (~0.05 per SNP/genotype)
100,000 Genic SNPs - defined format (~0.005 per SNP/genotype)
500,000 SNPs - defined format (~0.004 per SNP/genotype)
1000 individuals: $6,000; $12,000; $57,600-122,880; $500,000; $2,000,000
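
Note: the dollar figures on this slide follow from SNPs x cost per genotype x 1,000 individuals. The sketch below redoes that arithmetic with the per-genotype costs from the slide; the tier labels, the choice of 10 and 48 SNPs for the first two tiers, and the function name are illustrative. The transcript lists five totals for six tiers, so the pairing of totals to tiers is inferred from the arithmetic.

```python
N_INDIVIDUALS = 1000  # sample size used for the totals on the slide

# (label, number of SNPs, cost in dollars per SNP genotype)
TIERS = [
    ("10 SNPs, candidate gene", 10, 0.60),
    ("48 SNPs, handful of genes", 48, 0.25),
    ("384 SNPs", 384, 0.15),
    ("1,536 SNPs", 1536, 0.08),
    ("10,000 cSNPs", 10_000, 0.05),
    ("100,000 genic SNPs", 100_000, 0.005),
    ("500,000 SNPs", 500_000, 0.004),
]


def project_cost(n_snps: int, cost_per_genotype: float,
                 n_individuals: int = N_INDIVIDUALS) -> float:
    """Total genotyping cost: SNPs x individuals x cost per genotype."""
    return n_snps * cost_per_genotype * n_individuals


for label, n_snps, cost in TIERS:
    print(f"{label:<28} ${project_cost(n_snps, cost):>12,.0f}")
```

The cost per genotype falls by more than two orders of magnitude across the tiers, but the total project cost still rises steeply, which is why the slide treats project scale as the first decision.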

97 Acknowledgements Allan Rettie, Medicinal Chemistry Alex Reiner Dave Veenstra Dave Blough Ken Thummel Noel Hastings Maggie Ahearn Josh Smith Chris Baier Peggy Dyer-Robertson Washington University Brian Gage Howard McLeod Charles Eby Joyce You - Hong Kong
