Haplotypes and imputed genotypes in diverse human populations
Noah Rosenberg
April 29, 2009
Human Genome Diversity Cell Line Panel: 525,910 single-nucleotide polymorphisms (SNPs) in 29 populations
M Jakobsson et al. (2008) Nature 451:
Overview
- How do we measure and compare haplotype diversity across populations?
- Imputation in diverse populations
Which populations and genomic sites have more haplotype diversity?
[Figure: sampled haplotypes from Population 1 and Population 2]
[Figure: haplotypes from Population 1 and Population 2 grouped into haplotype clusters]
P Scheet, M Stephens (2006) AJHG 78:
[Figure: Population 1 haplotypes assigned, one cluster at a time, to haplotype clusters labeled Blue, Green, Orange, Pink, and Yellow]
Population 1 has less haplotype diversity; Population 2 has more haplotype diversity.
[Figure: haplotype clusters in Population 1 and Population 2]
Haplotype cluster frequencies for a “typical” genomic region M Jakobsson et al. (2008) Nature 451:
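Pointwise encoding turns each haplotype at a site into a single cluster label, after which cluster frequencies are simple counts. The sketch below assumes, hypothetically, that a fastPHASE-style model has already produced per-haplotype cluster membership probabilities at one site; the function names are illustrative, not taken from any published software.

```python
from collections import Counter


def encode_pointwise(cluster_probs):
    """Assign each haplotype to its most probable cluster at one site.

    cluster_probs[h][k] is the (assumed) probability that haplotype h
    belongs to cluster k at this site. Ties go to the lowest cluster index.
    """
    return [max(range(len(p)), key=p.__getitem__) for p in cluster_probs]


def cluster_frequencies(labels):
    """Relative frequency of each cluster label among the sampled haplotypes."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


# Toy probabilities for four haplotypes and three clusters (invented values).
probs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
labels = encode_pointwise(probs)
freqs = cluster_frequencies(labels)
```

Repeating this at every site gives a per-site cluster frequency profile for each population, which is the kind of quantity summarized on this slide.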
More haplotype diversity in Africa
[Figure panels by region: Africa, Europe, Middle East, C Asia, Asia, Oceania, America]
M Jakobsson et al. (2008) Nature 451:
Less haplotype homozygosity and more haplotype diversity in Africa M Jakobsson et al. (2008) Nature 451:
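Homozygosity and diversity can be made concrete with standard formulas: cluster homozygosity as the sum of squared cluster frequencies, and cluster heterozygosity as its unbiased complement, n/(n-1) times (1 minus that sum). This is a sketch assuming cluster labels have already been assigned; the estimator used by Jakobsson et al. may differ in detail, and the toy labels are invented.

```python
from collections import Counter


def cluster_homozygosity(labels):
    """Plug-in haplotype cluster homozygosity: sum of squared cluster frequencies."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())


def cluster_heterozygosity(labels):
    """Unbiased haplotype cluster heterozygosity: n/(n-1) * (1 - sum p_k^2)."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    return n / (n - 1) * (1.0 - cluster_homozygosity(labels))


# Invented samples of 8 haplotypes each, labeled by cluster color.
pop1 = ["blue"] * 6 + ["green"] * 2
pop2 = ["blue", "green", "orange", "pink", "yellow", "blue", "green", "orange"]
```

Here pop1 concentrates in two clusters (homozygosity 0.625), while pop2 spreads over five (homozygosity 0.21875), so pop2 has the higher heterozygosity, mirroring the "less diversity / more diversity" contrast above.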
Genetic diversity declines with distance from Africa
[Figure: haplotype heterozygosity by population, plotted against distance from Africa]
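One way to summarize a decline like this is an ordinary least-squares line of haplotype heterozygosity on each population's distance from Africa; a negative slope quantifies the loss of diversity per unit distance. The sketch below is a generic fit, not necessarily the analysis in the original figure, and the distances (hypothetical units of 1,000 km) and heterozygosities are invented toy values.

```python
def least_squares_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Toy inputs only: distance from Africa and haplotype heterozygosity per population.
distance = [0, 5, 10, 15, 20]
heterozygosity = [0.90, 0.86, 0.82, 0.78, 0.74]
intercept, slope = least_squares_fit(distance, heterozygosity)
```

For these toy values the fitted slope is about -0.008 per distance unit, with intercept about 0.9.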
Haplotype clusters recover population structure
[Figure: populations grouped by region: Africa, Middle East, Europe, Central/South Asia, East Asia, Oceania, America]
M Jakobsson et al. (2008) Nature 451:
Haplotype clusters recover population structure M Jakobsson et al. (2008) Nature 451:
Low haplotype diversity in the lactase region in Europe
[Figure panels by region: Africa, Europe, Middle East, C Asia, Asia, Oceania, America]
M Jakobsson et al. (2008) Nature 451:
Haplotype cluster homozygosity as a test for selection
[Figure panels: random region; lactase region]
M Jakobsson et al. (2008) Nature 451:
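A common way to turn homozygosity into a selection screen is an outlier test: compare a candidate region's cluster homozygosity with the distribution over many random genomic regions and report the fraction of background regions at least as homozygous. This is a generic empirical-tail sketch, assumed here for illustration; it is not claimed to be the exact statistic used by Jakobsson et al.

```python
def empirical_upper_tail(focal, background):
    """Fraction of background values at least as large as the focal value.

    A small fraction flags the focal region as unusually homozygous
    relative to the genomic background, as expected near a selective sweep.
    """
    if not background:
        raise ValueError("background must be non-empty")
    return sum(v >= focal for v in background) / len(background)
```

For example, a candidate region with homozygosity 0.9 against invented background values [0.1, 0.2, 0.3, 0.95] gives a tail fraction of 0.25. With real data the background would contain many regions, and a pseudocount such as (count + 1) / (n + 1) is often used to avoid reporting zero.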
Haplotype diversity: summary
- Haplotype clusters can be used to encode haplotypes pointwise for measurement of diversity
- Haplotype cluster diversity is greatest in Africa
- Low haplotype cluster diversity can potentially be used to detect selection
Overview
- Measuring haplotype diversity using haplotype clusters
- Imputation in diverse populations
Genotypes can be imputed using a reference panel, but imperfectly
[Figure: study sample genotyped at a subset of positions, aligned to a more densely genotyped reference panel]
- Imputed genotypes can be tested for disease association
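The core idea of reference-panel imputation can be shown with a deliberately simplified rule: find the reference haplotype that best matches a study haplotype at its genotyped positions, then copy that haplotype's alleles into the untyped positions. This toy is an assumption for illustration only; tools such as MACH instead model each study haplotype as a mosaic of reference haplotypes using a hidden Markov model.

```python
def impute_by_best_match(study, reference):
    """Fill missing alleles (None) in one study haplotype by copying from the
    reference haplotype with the fewest mismatches at typed positions.

    Alleles are coded 0/1. Ties go to the first best-matching reference haplotype.
    """
    typed = [i for i, a in enumerate(study) if a is not None]

    def mismatches(ref):
        return sum(study[i] != ref[i] for i in typed)

    best = min(reference, key=mismatches)
    return [best[i] if a is None else a for i, a in enumerate(study)]


# Toy reference panel of three haplotypes over five sites (invented).
reference = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [1, 1, 1, 0, 0]]
study = [0, None, 1, None, 0]  # sites 1 and 3 were not genotyped
imputed = impute_by_best_match(study, reference)
```

The imperfection noted on the slide shows up even here: if no reference haplotype closely resembles the study haplotype, the copied alleles are often wrong.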
Evaluating imputation accuracy in worldwide populations
- 443 individuals in 29 populations from the Human Genome Diversity Panel, genotyped at >500,000 SNPs (Jakobsson et al. 2008, Nature 451:)
- 420 HapMap reference haplotypes of ~2,000,000 SNPs, omitting offspring in trios
- Randomly hide 15% of genotypes in HGDP individuals and impute them with MACH
- Measure the proportion of alleles imputed correctly
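The masking-and-scoring procedure can be sketched independently of the imputation software. The version below assumes genotypes coded as allele counts (0, 1, 2), hides a random 15% of observed entries, and scores allelic accuracy as 2 - |true - imputed| correct alleles per masked genotype; this scoring rule is a common convention assumed here, and the imputation step itself (MACH in the study) is not reproduced.

```python
import random


def mask_genotypes(genotypes, fraction=0.15, seed=0):
    """Randomly choose a fraction of observed genotype entries to hide.

    genotypes: per-individual lists of allele counts (0, 1, 2) or None.
    Returns the set of (individual, site) index pairs to mask.
    """
    cells = [(i, j) for i, row in enumerate(genotypes)
             for j, g in enumerate(row) if g is not None]
    rng = random.Random(seed)
    k = round(fraction * len(cells))
    return set(rng.sample(cells, k))


def allelic_accuracy(true, imputed, masked):
    """Proportion of alleles imputed correctly over the masked genotypes.

    A masked genotype contributes 2 - |true - imputed| correct alleles out of 2.
    """
    if not masked:
        raise ValueError("no masked genotypes to score")
    correct = sum(2 - abs(true[i][j] - imputed[i][j]) for i, j in masked)
    return correct / (2 * len(masked))
```

For instance, true genotypes [0, 1, 2] imputed as [0, 0, 0] give 2 + 1 + 0 = 3 correct alleles out of 6, an allelic accuracy of 0.5.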
Imputation accuracy is predicted by haplotype diversity
[Figure: imputation accuracy plotted against haplotype diversity]
L Huang et al. (2008) AJHG 84:
Imputation accuracy is greatest with a close reference panel L Huang et al. (2008) AJHG 84:
Highest-accuracy reference panels match geographic locations
[Figure panels: Africa; Europe/W Asia; E Asia/Oceania/Americas]
L Huang et al. (2008) AJHG 84:
Imputation accuracy can be increased using HapMap mixtures
- Instead of imputing based on separate HapMap panels, impute from mixtures
- Choose mixtures to have optimal size given specified ratios
L Huang et al. (2008) AJHG 84:
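Choosing a mixture "of optimal size given specified ratios" can be read as: take the largest number of haplotypes from each panel such that the counts are exactly in the requested ratio and no panel is over-drawn. The sketch below implements that reading, which is an assumption rather than the published procedure. The panel sizes are also assumed for illustration (120 CEU, 120 YRI, and 180 CHB+JPT unrelated haplotypes, summing to the 420 reference haplotypes listed earlier).

```python
from functools import reduce
from math import gcd


def max_mixture(panel_sizes, ratios):
    """Largest mixture whose per-panel counts are exactly in the given ratio.

    panel_sizes: available haplotypes per panel.
    ratios: non-negative integer weights per panel (0 = panel not used).
    Returns the number of haplotypes to draw from each panel in `ratios`.
    """
    used = {p: r for p, r in ratios.items() if r > 0}
    if not used:
        raise ValueError("at least one ratio must be positive")
    g = reduce(gcd, used.values())
    reduced = {p: r // g for p, r in used.items()}  # e.g. 2:2 becomes 1:1
    # Largest multiplier m such that m * ratio fits within every panel used.
    m = min(panel_sizes[p] // r for p, r in reduced.items())
    return {p: m * reduced.get(p, 0) for p in ratios}


sizes = {"CEU": 120, "YRI": 120, "CHB+JPT": 180}  # assumed panel sizes
equal_mix = max_mixture(sizes, {"CEU": 1, "YRI": 1, "CHB+JPT": 1})
two_to_one = max_mixture(sizes, {"CEU": 2, "YRI": 1, "CHB+JPT": 0})
```

With these assumptions, a 1:1:1 mixture uses 120 haplotypes from each panel (360 total), limited by CEU and YRI, while a 2:1 CEU:YRI mixture uses 120 CEU and 60 YRI haplotypes. Reducing the ratio by its greatest common divisor first keeps a request like 2:2 from needlessly shrinking the mixture.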
Imputation accuracy can be increased using HapMap mixtures L Huang et al. (2008) AJHG 84:
Summary: imputation accuracy
Strategies to improve imputation studies:
- Increased sample size
- Improved imputation algorithms
- Improved use of reference panels
- Development of additional reference panels
- Improved haplotyping
- Use of additional data from relatives
Imputation: summary
- Imputation error, and the sample size inflation needed to compensate for it, are greatest in Africa
- Several strategies may be available for improving imputation, including use of HapMap mixtures
Rosenberg lab: James Degnan, Mike DeGiorgio, Lucy Huang, Mattias Jakobsson, Trevor Pemberton, Paul Scheet, Zach Szpiech, Jenna VanLiere, Chaolong Wang
Collaborators: Goncalo Abecasis (Michigan), Raph Gibbs (NIA), John Hardy (UCL), Yun Li (Michigan), Sonja Scholz (NIA), Andy Singleton (NIA)
Funding: Alfred P. Sloan Foundation; Burroughs Wellcome Fund; National Institutes of Health; U of M Rackham Graduate School [M DeGiorgio]; U of M Center for Genetics in Health and Medicine [M Jakobsson]