HapMap: application in the design and interpretation of association studies Mark J. Daly, PhD on behalf of The International HapMap Consortium.

1 HapMap: application in the design and interpretation of association studies Mark J. Daly, PhD on behalf of The International HapMap Consortium

2 Goals of this segment Briefly summarize HapMap design and current status Discuss the application of HapMap to all aspects of association study design, analysis and interpretation

3 HapMap Project High-density SNP genotyping across the genome provides information about –SNP validation, frequency, assay conditions –correlation structure of alleles in the genome A freely-available public resource to increase the power and efficiency of genetic association studies to medical traits All data is freely available on the web for application in study design and analyses as researchers see fit

4 HapMap Samples 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI) 90 individuals (30 trios) of European descent from Utah (CEU) 45 Han Chinese individuals from Beijing (CHB) 45 Japanese individuals from Tokyo (JPT)

5 HapMap progress PHASE I – completed, described in Nature paper * 1,000,000 SNPs successfully typed in all 270 HapMap samples * ENCODE variation reference resource available PHASE II – data generation complete, data released this past Monday * >3,500,000 SNPs typed in total !!!

6 ENCODE-HAPMAP variation project Ten “typical” 500kb regions 48 samples sequenced All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples Current data set – 1 SNP every 279 bp A much more complete variation resource by which the genome-wide map can be evaluated

7 Completeness of dbSNP Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP

8 Recombination hotspots are widespread and account for LD structure 7q21

9 Utility of LD in association study “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”
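The correlation this slide refers to is usually measured as r² between two biallelic sites. A minimal Python sketch, assuming phased haplotypes coded as 0/1 allele lists; the function name and the toy data are illustrative assumptions, not part of the presentation:

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs.

    a, b: equal-length lists of 0/1 alleles, one entry per phased haplotype.
    Returns 0.0 if either SNP is monomorphic (r^2 is undefined there).
    """
    n = len(a)
    p_a = sum(a) / n                                  # allele frequency at SNP a
    p_b = sum(b) / n                                  # allele frequency at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                              # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


# Hypothetical data: 8 haplotypes, a causal variant and an imperfect tag.
causal = [1, 1, 1, 0, 0, 0, 0, 0]
tag = [1, 1, 0, 0, 0, 0, 0, 0]
r2_self = r_squared(causal, causal)   # a variant is perfectly correlated with itself
r2_tag = r_squared(causal, tag)       # the tag captures the causal variant only partly
```

The higher the r² between the causal variant and a typed SNP, the smaller the sample-size penalty paid for testing the typed SNP instead of the causal variant itself.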

10 Coverage of Phase II HapMap (estimated from ENCODE data)
From Table 6, “A Haplotype Map of the Human Genome”, Nature
Panel      % r² > 0.8    max r²
YRI        81            0.90
CEU        94            0.97
CHB+JPT    94            0.97
% r² > 0.8: percentage of deeply ascertained common variants highly correlated with a HapMap SNP
max r²: average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP

13 Coverage of Phase II HapMap (estimated from ENCODE data) Vast majority of common variation (MAF > 0.05) captured by Phase II HapMap
Panel      % r² > 0.8    max r²
YRI        81%           0.90
CEU        94%           0.97
CHB+JPT    94%           0.97
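Both columns of the coverage table can be computed from a deeply ascertained set of variants and a panel of typed SNPs. The sketch below is one plausible way to do that, assuming 0/1-coded phased haplotypes; the function names and the four toy haplotypes are hypothetical, and this is not the Consortium's actual pipeline:

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given as 0/1 allele lists (one entry per haplotype)."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    d = sum(x & y for x, y in zip(a, b)) / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def coverage(panel, targets, threshold=0.8):
    """Return (% of targets with max r^2 > threshold, mean max r^2).

    panel:   list of typed SNPs (each a 0/1 allele list)
    targets: list of deeply ascertained SNPs that are not on the panel
    """
    best = [max(r_squared(t, s) for s in panel) for t in targets]
    pct = 100.0 * sum(r > threshold for r in best) / len(best)
    mean_max = sum(best) / len(best)
    return pct, mean_max


# Hypothetical toy data: 4 haplotypes, 2 typed SNPs, 3 untyped targets.
panel = [[1, 1, 0, 0], [1, 0, 1, 0]]
targets = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0]]
pct, mean_max = coverage(panel, targets)
```

In this toy case two of the three targets have a perfect proxy on the panel, and the third reaches only r² = 1/3 with either typed SNP.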

14 Applying the HapMap Study design - tagging Study coverage evaluation Study analysis - improving association testing Study interpretation –Comparison of multiple studies –Connection to genes/genomic features –Integration with expression and other functional data Other uses of HapMap data –Admixture, LOH, selection

15 Tagging from HapMap Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies

17 Pairwise tagging [Figure: six SNPs (1 A/T, 2 G/A, 3 G/C, 4 T/C, 5 G/C, 6 A/C) shown on their haplotypes, with groups of SNPs in high r² marked] Tags: SNP 1, SNP 3, SNP 6 (3 in total). Test for association: SNP 1, SNP 3, SNP 6. After Carlson et al. (2004) AJHG 74:106

18 Pairwise Tagging Efficiency
Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r² thresholds
Pairwise threshold    YRI        CEU        CHB+JPT
r² ≥ 0.5              324,865    178,501    159,029
r² ≥ 0.8              474,409    293,835    259,779
r² = 1                604,886    447,579    434,476
Tag SNPs were picked to capture common SNPs in release 16c.1 for every 7,000 SNP bin using Haploview. Tagging Phase I HapMap offers 2-5x gains in efficiency
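Pairwise tagging can be illustrated with a simple greedy procedure: repeatedly pick the SNP that captures the most still-untagged SNPs at the chosen r² threshold. This is a sketch in the spirit of the approach cited on the previous slide, not the algorithm implemented in Haploview or Tagger; the SNP names and the four toy haplotypes are assumptions chosen to mirror the six-SNP figure:

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given as 0/1 allele lists (one entry per haplotype)."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    d = sum(x & y for x, y in zip(a, b)) / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_pairwise_tags(snps, threshold=0.8):
    """Pick tags so that every SNP has r^2 >= threshold with at least one tag.

    snps: dict mapping SNP name -> 0/1 allele list.
    """
    names = list(snps)
    # Each SNP captures itself plus every SNP it is highly correlated with.
    captures = {
        a: {a} | {b for b in names if r_squared(snps[a], snps[b]) >= threshold}
        for a in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Ties are broken by input order, so the result is deterministic.
        best = max(names, key=lambda a: len(captures[a] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical haplotypes: SNP1~SNP2, SNP3~SNP5 and SNP4~SNP6 are perfectly correlated.
snps = {
    "SNP1": [1, 1, 0, 0], "SNP2": [1, 1, 0, 0], "SNP3": [1, 0, 1, 0],
    "SNP4": [1, 0, 0, 0], "SNP5": [1, 0, 1, 0], "SNP6": [1, 0, 0, 0],
}
tags = greedy_pairwise_tags(snps)
```

Three tags suffice for the six SNPs here. The sketch returns SNP4 where the slide lists SNP6; the two are interchangeable because they are perfectly correlated in the toy data.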

19 Use of haplotypes can improve genotyping efficiency [Figure: six SNPs (1 A/T, 2 G/A, 3 G/C, 4 T/C, 5 G/C, 6 A/C) shown on their haplotypes] Pairwise: Tags: SNP 1, SNP 3, SNP 6 (3 in total). Test for association: SNP 1, SNP 3, SNP 6. With haplotypes: Tags: SNP 1, SNP 3 (2 in total). Test for association: SNP 1 captures 1+2, SNP 3 captures 3+5, “AG” haplotype captures SNP 4+6. Tags in multi-marker test should be conditional on significance of LD in order to avoid overfitting
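The multi-marker idea can be checked numerically: build an indicator for carrying the "AG" haplotype at SNP 1 and SNP 3, then measure its r² with the untyped SNPs. The four haplotype strings below are hypothetical, chosen only to be consistent with the alleles listed on the slide; the helper names are likewise assumptions:

```python
def r_squared(a, b):
    """r^2 between two 0/1 indicator lists (one entry per haplotype)."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    d = sum(x & y for x, y in zip(a, b)) / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def allele_indicator(haps, pos, allele):
    """1 for each haplotype carrying `allele` at 0-based position `pos`."""
    return [int(h[pos] == allele) for h in haps]


def haplotype_indicator(haps, spec):
    """1 for each haplotype matching every (position -> allele) pair in `spec`."""
    return [int(all(h[p] == a for p, a in spec.items())) for h in haps]


# Hypothetical haplotypes over SNPs 1..6 (alleles A/T, G/A, G/C, T/C, G/C, A/C).
HAPS = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]

ag = haplotype_indicator(HAPS, {0: "A", 2: "G"})   # "AG" at SNP 1 and SNP 3
snp1 = allele_indicator(HAPS, 0, "A")
snp4 = allele_indicator(HAPS, 3, "T")
snp6 = allele_indicator(HAPS, 5, "A")

r2_single = r_squared(snp1, snp6)   # SNP 1 alone is a weak proxy for SNP 6
r2_hap4 = r_squared(ag, snp4)       # the two-SNP haplotype is a perfect proxy
r2_hap6 = r_squared(ag, snp6)
```

In this toy example neither typed SNP alone predicts SNP 4 or SNP 6 well, but the haplotype defined by both does, which is why two tags can replace three.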

20 Efficiency and power [Figure: relative power (%) plotted against average marker density (per kb), tag SNPs compared with random SNPs] P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005. ~300,000 tag SNPs needed to cover common variation in whole genome in CEU

21 How to pick tag SNPs? What is the genetic hypothesis? Which variants do you want to test for a role in disease? –functional annotation (coding SNPs) –allele frequency (HapMap ascertainment) –previously implicated associations Go to http://www.hapmap.org – DCC supported interactive tagging Export HapMap data into tools such as Tagger, Haploview (www.broad.mit.edu/mpg)

22 Will tag SNPs picked from HapMap apply to other population samples? Population differences add very little inefficiency. Platform presentation: Paul de Bakker (#223: Sat 9.30) [Figure: comparison of tag transferability across samples; panel labels: CEU; Whites from Los Angeles, CA; Botnia, Finland. CEU = Utah residents with European ancestry (CEPH)]

23 Applying the HapMap Study design - tagging Study coverage evaluation Study analysis - improving association testing Study interpretation –Comparison of multiple studies –Connection to genes/genomic features –Integration with expression and other functional data Other uses of HapMap data –Admixture, LOH, selection

24 Genome-wide association coverage If genome-wide products are typed on the HapMap sample panel, the SNPs on HapMap not included in the panel provide an evaluation for the coverage of the product –ENCODE (deep ascertainment) –Phase II (dense, genome-wide)

25 Association tests with fixed markers [Figure: six SNPs (1 A/T, 2 G/A, 3 G/C, 4 T/C, 5 G/C, 6 A/C) shown on their haplotypes; marked SNPs = SNP on whole-genome product (~1 - 5% common variation directly assayed)] Tests of association: SNP 1, SNP 3

26 Association tests with fixed markers [Figure: the same six SNPs, with groups of SNPs in high r² marked] Tests of association: SNP 1, SNP 3

27 Association tests with fixed markers [Figure: the same six SNPs, with groups of SNPs in high r² marked] Tests of association: SNP 1, SNP 3. SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5

28 Genome-wide products can capture most common variation Example: 500K data generated by Affymetrix and recently submitted to HapMap DCC

29 More on this topic Platform presentations tomorrow morning 8 AM sharp: –Peer –Jorgenson –Lazarus –As well as several detailed posters!

30 Applying the HapMap Study design - tagging Study coverage evaluation Study analysis - improving association testing Study interpretation –Comparison of multiple studies –Connection to genes/genomic features –Integration with expression and other functional data Other uses of HapMap data –Admixture, LOH, selection

31 Can incorporating tests of haplotypes of SNPs on the genome-wide product improve this coverage?

32 Improving association power using data from HapMap [Figure: six SNPs (1 A/T, 2 G/A, 3 G/C, 4 T/C, 5 G/C, 6 A/C) shown on their haplotypes] Tests of association: SNP 1, SNP 3. SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5

34 Improving association power using data from HapMap [Figure: the same six SNPs on their haplotypes] Tests of association: SNP 1, SNP 3, “AG haplotype”. SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6

35 Haplotypes increase coverage

36 Applying the HapMap Study design - tagging Study coverage evaluation Study analysis - improving association testing Study interpretation –Connection to genes/genomic features –Comparison of multiple association studies –Integration with expression and other functional data Other uses of HapMap data –Admixture, LOH, selection

37 Integration with genomic features Positive association to a SNP on HapMap enables detailed interpretation: –How many other SNPs are in LD with this SNP? –What genes are in LD with this SNP? –What coding variants and putative functional variants are in LD with this SNP? Potential to improve power by modifying Bayesian priors of each association test based on this information
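One way to read the last point is as a Bayes update: the annotation of a SNP and of everything in LD with it sets the prior, and the association data supply a Bayes factor. The sketch below shows only that arithmetic; the prior values and the Bayes factor are made-up illustrations, and the presentation does not specify how priors would actually be chosen:

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of true association.

    posterior odds = prior odds * Bayes factor, converted back to a probability.
    prior must lie strictly between 0 and 1.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical priors: a SNP with no annotated features in LD versus a SNP
# in LD with a coding variant, both with the same strength of evidence.
bf = 1000.0
post_unannotated = posterior_probability(1e-4, bf)
post_coding = posterior_probability(1e-3, bf)
```

With identical evidence, the SNP given the higher prior ends up with a substantially higher posterior, which is the sense in which informed priors could improve power.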

38 Example: Complement Factor H - AMD Original SNP hit in Affy 100K experiment – rs380390 Extent and structure of LD from HapMap aids in the fine mapping phase of project Klein et al Science 2005

39 Example: Complement Factor H - AMD rs380390

40 Example: Complement Factor H - AMD rs380390

41 Meta-analysis of association studies When different marker sets are used to study association (candidate gene or genome-wide), results can be readily integrated when all markers are typed on HapMap samples
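A sketch of how results from different marker sets might be linked: for each pair of SNPs typed in two different studies, look up their r² in a shared reference panel and keep the pairs that act as proxies for one another. The marker names, the reference haplotypes, and the function name are all hypothetical:

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given as 0/1 allele lists (one entry per haplotype)."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    d = sum(x & y for x, y in zip(a, b)) / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def cross_study_proxies(reference, study_a, study_b, threshold=0.8):
    """List (snp_a, snp_b, r2) for marker pairs correlated in the reference panel.

    reference: dict mapping SNP name -> 0/1 allele list from the reference samples
    study_a, study_b: SNP names typed in each study (all present in `reference`)
    """
    pairs = []
    for a in study_a:
        for b in study_b:
            r2 = r_squared(reference[a], reference[b])
            if r2 >= threshold:
                pairs.append((a, b, r2))
    return pairs


# Hypothetical reference haplotypes for markers used in two studies.
reference = {"m1": [1, 1, 0, 0], "m2": [1, 1, 0, 0], "m3": [1, 0, 1, 0]}
proxies = cross_study_proxies(reference, ["m1"], ["m2", "m3"])
```

Here m1 and m2 were never typed in the same study, yet the reference panel shows they carry the same information, so their association results can be compared directly.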

43 Example: DTNBP1 and schizophrenia Multiple studies have described modest association to schizophrenia Most studies have examined small numbers of non-overlapping sets of SNPs HapMap data can be used to determine whether these association finding Derek Morris, Mousumi Mutsuddi (WCPG meeting)

44 Extensive LD across DTNBP1 Phase II HapMap - 186 SNPs 180 kb

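45 Phylogeny of DTNBP1 tag SNPs [Figure: haplotype phylogeny rooted at the ancestral haplotype, with branches labeled by tag SNP changes 4 (G→A), 5 (C→T), 2 (A→G), 7 (C→T), 10 (A→T), 3 (G→A); haplotypes over tag SNPs 2, 4, 5, 3, 10, 7: AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; haplotype frequencies 6%, 33%, 42%, 8%, 11%]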

46 Associated alleles reported [Figure: haplotypes over tag SNPs 2, 4, 5, 3, 10, 7, with the alleles reported as associated in each study marked] Studies shown: Straub 2002, Van den Oord 2003

47 Associated alleles reported [Figure: same haplotypes] Studies shown: Straub 2002, Van den Oord 2003, Schwab 2003

48 Associated alleles reported [Figure: same haplotypes] Studies shown: Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003

49 Associated alleles reported [Figure: same haplotypes] Studies shown: Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003, Williams 2004, Bray 2005

50 Associated alleles reported [Figure: same haplotypes] Studies shown: Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003, Williams 2004, Bray 2005, Kirov 2004

51 Inconsistent findings No consistently associated SNP/haplotype pattern across studies All studies (European-derived populations) had allele/haplotype frequencies compatible with HapMap-CEU sample HapMap can successfully relate associations from diverse marker sets

52 Other Applications – Structural Variation 3 papers coming out in the next month describe use of HapMap data to identify large, common deletion polymorphisms LD around these polymorphisms permits their assessment with tag SNPs/haplotypes in genome-wide association studies

53 Other Applications – Admixture Scanning HapMap data provides a rich source of highly differentiated SNPs for design of admixture panels Fine mapping of admixture signals can be focused on the full set of highly differentiated alleles in any region of the genome
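Selecting markers for an admixture panel can start from a simple ranking by allele frequency difference between two populations. The sketch below assumes per-population allele frequencies are already available; the SNP names and frequencies are invented for illustration and are not HapMap values:

```python
def rank_by_differentiation(freq_pop1, freq_pop2, top=2):
    """Return the `top` SNPs with the largest absolute allele frequency difference.

    freq_pop1, freq_pop2: dicts mapping SNP name -> frequency of the same allele.
    Only SNPs present in both populations are considered.
    """
    delta = {
        snp: abs(freq_pop1[snp] - freq_pop2[snp])
        for snp in freq_pop1
        if snp in freq_pop2
    }
    return sorted(delta, key=delta.get, reverse=True)[:top]


# Hypothetical allele frequencies in two populations.
freq_pop1 = {"snpA": 0.90, "snpB": 0.50, "snpC": 0.10}
freq_pop2 = {"snpA": 0.10, "snpB": 0.45, "snpC": 0.60}
panel = rank_by_differentiation(freq_pop1, freq_pop2)
```

A real panel design would also need to space markers across the genome and avoid markers in strong LD with each other in the parental populations; the sketch covers only the ranking step.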

54 Other Applications – LOH HapMap identifies –Regions of extended LD that may manifest themselves as unusually long stretches of homozygosity in individual samples –The catalog of large deletion variants on the HapMap will differentiate between LOH that is potentially de novo and causal, and that which is simply commonly segregating in the population LOH analysis cognizant of HapMap patterns under development
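The starting point for this kind of LOH analysis is finding long stretches of homozygous genotypes in one sample. A minimal sketch, assuming genotypes are given in map order as two-character strings; it deliberately does not model the HapMap-aware step (comparing runs against known regions of extended LD or catalogued deletions), which the slide describes as still under development:

```python
def longest_homozygous_run(genotypes):
    """Return (start_index, length) of the longest run of homozygous genotypes.

    genotypes: list of two-character strings such as "AA" or "AG", in map order.
    Returns (0, 0) if there are no homozygous genotypes.
    """
    best_start, best_len = 0, 0
    start = None                      # start of the current run, if any
    for i, g in enumerate(genotypes):
        if g[0] == g[1]:              # homozygous call extends (or opens) a run
            if start is None:
                start = i
            length = i - start + 1
            if length > best_len:
                best_start, best_len = start, length
        else:                         # heterozygous call ends the run
            start = None
    return best_start, best_len


# Hypothetical genotypes for one individual along a chromosome.
sample = ["AG", "AA", "CC", "TT", "CT", "GG"]
run_start, run_length = longest_homozygous_run(sample)
```

A run flagged this way would then be checked against population patterns: a stretch that coincides with a region of extended LD or a common deletion is less likely to be a de novo, causal event.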

55 Early results encouraging At this meeting –Arking and colleagues describe identification of variant altering QT-interval –Herbert and colleagues describe a novel gene for obesity –Wijmenga and colleagues describe a novel gene for celiac disease

