Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry

Presentation transcript:

Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry. Oscar Lao, Kate van Duijn, Paula Kersbergen, Peter de Knijff, Manfred Kayser. The American Journal of Human Genetics, Volume 78, Issue 4, Pages 680-690 (April 2006). DOI: 10.1086/501531. Copyright © 2006 The American Society of Human Genetics.

Figure 1 Percentage of information explained when the number of markers that are ascertained from 8,491 SNPs by use of the genetic algorithm based on the informativeness of assignment index (In) is increased from 1 to 10, given four continental groups and the YCC panel (see main text for details). The 95% CI of each SNP combination was computed by resampling the same number of chromosomes from the populations and computing In 1,000 times.
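
The index and the resampling step in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the standard single-locus definition of informativeness for assignment (Rosenberg et al. 2003), In = sum over alleles j of [ -p_j ln p_j + sum over populations i of (p_ij / K) ln p_ij ], where p_j is the mean frequency of allele j across the K populations. The function names, the toy allele lists, and the percentile method for the interval are assumptions for illustration; the paper's multi-SNP In and its genetic algorithm are not reproduced here.

```python
import math
import random


def informativeness(freqs):
    """Single-locus informativeness for assignment (In).

    freqs[i][j] is the frequency of allele j in population i.
    The value lies between 0 (identical frequencies) and ln(K).
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        p_mean = sum(pop[j] for pop in freqs) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        for pop in freqs:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total


def bootstrap_ci(samples, n_boot=1000, seed=0):
    """Percentile 95% interval for In at one biallelic SNP.

    samples[i] lists the allele (0 or 1) carried by each sampled
    chromosome of population i (hypothetical input layout). Each
    replicate redraws the same number of chromosomes per population
    with replacement and recomputes In.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        freqs = []
        for chroms in samples:
            draw = [rng.choice(chroms) for _ in chroms]
            p1 = sum(draw) / len(draw)
            freqs.append([1.0 - p1, p1])
        values.append(informativeness(freqs))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]


# Toy data: two populations, 20 chromosomes each (made-up counts).
toy = [[0] * 18 + [1] * 2, [0] * 3 + [1] * 17]
print(bootstrap_ci(toy, n_boot=1000, seed=1))
```

Two populations fixed for different alleles give the maximum value ln 2, and populations with identical allele frequencies give 0, which is a quick sanity check on the function.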

Figure 2 STRUCTURE analysis of the YCC samples, with K=2, 3, or 4 groups, performed using genotypes of the 10 most informative SNPs ascertained using the genetic algorithm with the total YCC data. STRUCTURE analyses were computed using a model without admixture (A) and a model with admixture (B). Each analysis was repeated five times, after a Markov chain Monte Carlo (MCMC) burn-in period of 50,000 iterations and considering the next 200,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The natural logarithm of the estimated probability of the data (lnp) is as follows. In panel A, for K=2, lnp=−762.2; for K=3, lnp=−629.2; and, for K=4, lnp=−557.4. In panel B, for K=2, lnp=−764.9; for K=3, lnp=−631.2; and, for K=4, lnp=−559.5.

Figure 3 MDS plot based on the In matrix computed between pairs of populations by use of the genotypes of the 10 most informative SNPs in the 51 population samples from CEPH-HGDP. Four clusters of populations can be identified: (i) sub-Saharan African populations, (ii) American populations, (iii) Eastern Asian and Oceanian populations, and (iv) European, Middle Eastern, North African, and Central/South Asian populations.
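
As a rough illustration of how a pairwise matrix like the one in Figure 3 can be turned into a two-dimensional plot, the sketch below implements classical (Torgerson) multidimensional scaling with NumPy. Treating the pairwise In values as dissimilarities and using the classical rather than a non-metric variant are both assumptions; the caption does not say which MDS method was used, and the toy matrix is made up.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical MDS: embed n items from an n x n dissimilarity matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy dissimilarities for four hypothetical populations forming two
# tight pairs; the values are illustrative only.
positions = np.array([0.0, 1.0, 10.0, 11.0])
toy = np.abs(positions[:, None] - positions[None, :])
coords = classical_mds(toy)
print(coords.shape)  # (4, 2)
```

Each row of `coords` is one population; plotting the first column against the second gives the kind of scatter in which clusters of populations can be read off.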

Figure 4 STRUCTURE analysis of the CEPH-HGDP samples, with K=2, 3, 4, or 5 groups, performed using genotypes of the 10 most informative SNPs ascertained using the genetic algorithm with the total YCC data. Two different STRUCTURE analyses were computed: a population model without admixture (A) and a population model with admixture (B). Each analysis was repeated five times after an MCMC burn-in period of 100,000 iterations and considering the next 10,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The lnp, assuming K groups, is as follows. In panel A, for K=2, lnp=−11,801.2; for K=3, lnp=−10,977.3; for K=4, lnp=−10,279.2; and, for K=5, lnp=−10,324.9. In panel B, for K=2, lnp=−11,886.2; for K=3, lnp=−11,070.6; for K=4, lnp=−10,345.5; and, for K=5, lnp=−10,456.9. Cen. Af. Rep. = Central African Republic; S. Afr. = South Africa.
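
The lnp values listed in the Figure 4 caption can be compared across K with a few lines of arithmetic. The sketch below uses the simple heuristic of preferring the K with the highest lnp and also reports the gain from each additional group; the caption does not state which selection criterion the authors applied, so this rule and the helper names are assumptions.

```python
# lnp values copied from the Figure 4 caption, keyed by K.
lnp_no_admixture = {2: -11801.2, 3: -10977.3, 4: -10279.2, 5: -10324.9}
lnp_admixture = {2: -11886.2, 3: -11070.6, 4: -10345.5, 5: -10456.9}


def best_k(lnp_by_k):
    """Return the K with the highest estimated log probability."""
    return max(lnp_by_k, key=lnp_by_k.get)


def gains(lnp_by_k):
    """Change in lnp when moving from K-1 to K groups."""
    ks = sorted(lnp_by_k)
    return {k: round(lnp_by_k[k] - lnp_by_k[prev], 1)
            for prev, k in zip(ks, ks[1:])}


for label, table in [("no admixture", lnp_no_admixture),
                     ("admixture", lnp_admixture)]:
    print(label, best_k(table), gains(table))
```

Under both models lnp rises up to K=4 and then drops at K=5 (by 45.7 and 111.4 units, respectively), which is consistent with the four clusters described for these samples.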

Figure 5 STRUCTURE analysis of each of the four groups detected in the HGDP-CEPH populations by the previous STRUCTURE analysis (see main text) that considers models without admixture (A) and with admixture (B) and assumes K=2. A certain degree of population (sub)structure can be observed only in the case of American populations, but it disappears when three groups are considered (data not shown). Each analysis was repeated five times, after an MCMC burn-in period of 200,000 iterations and considering the next 200,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The lnp, assuming K=2, is as follows. In panel A, for sub-Saharan Africa, lnp=−958.3; for America, lnp=−1,048.1; for East Asia and Oceania, lnp=−3,262.0; and, for Europe, the Middle East, Central/South Asia, and North Africa, lnp=−5,321.5. In panel B, for sub-Saharan Africa, lnp=−946.7; for America, lnp=−1,057.4; for East Asia and Oceania, lnp=−3,263.5; and, for Europe, the Middle East, Central/South Asia, and North Africa, lnp=−5,433.1.

Figure 6 BAPS 3.2 clustering results for K=2, 3, 4, and 5 groups in the HGDP-CEPH panel by use of the 10 most informative SNPs ascertained using the genetic algorithm with the YCC data. Each column represents an individual. The log (marginal likelihood) for K=2 groups is −11,687.5; for K=3, −10,832.6; for K=4, −10,164.8; and, for K=5, −10,024.32.

Figure 7 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs952718 and the ABCA12 gene. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs6758257 to rs6753310; see main text for details).
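
Panels A and B of the sliding-window figures combine two simple operations: a per-window mean and a comparison against an empirical distribution. The sketch below shows one way to compute both. The per-SNP statistic, the window size, the step, and the one-sided definition of the empirical P value (fraction of reference values at least as large as the observed one) are all assumptions here, since the caption defers these details to the main text.

```python
def sliding_window_means(values, window, step=1):
    """Mean of `values` in each window of `window` consecutive SNPs."""
    return [sum(values[start:start + window]) / window
            for start in range(0, len(values) - window + 1, step)]


def empirical_p(observed, reference):
    """Fraction of reference values that are >= the observed value."""
    return sum(1 for v in reference if v >= observed) / len(reference)


# Hypothetical per-SNP statistic along a region, and a hypothetical
# reference distribution standing in for the genome-wide gene set.
per_snp = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
reference = [0.05 * i for i in range(20)]

window_means = sliding_window_means(per_snp, window=3)
p_values = [empirical_p(m, reference) for m in window_means]
# Windows whose empirical P value falls at or below the 0.05 cutoff.
significant = [i for i, p in enumerate(p_values) if p <= 0.05]
print(window_means, p_values, significant)
```

The largest run of consecutive significant windows would then delimit the region used to define the core haplotype, as the caption describes.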

Figure 8 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs722869 and the VRK1 gene. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs1957137 to rs17191471; see main text for details).

Figure 9 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs1858465. A, Sliding-window plot of the mean value observed for each window. B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs2137476 to rs1398515; see main text for details).

Figure 10 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs1344870. A, Sliding-window plot of the mean value observed for each window. B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs2335092 to rs1898300; see main text for details).

Figure 11 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs1876482 (1 of the 10 most informative SNPs identified), which is located in the LOC442008 gene, by use of Perlegen data. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs12619554 to rs4832712; see main text for details). Note the high frequency of the third haplotype in the case of Asian populations and the slow decay of the EHH of that haplotype compared with the other haplotypes both within and between populations.
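
The EHH decay mentioned in panel D can be made concrete with a small sketch. It assumes the usual definition of extended haplotype homozygosity (Sabeti et al. 2002): among chromosomes carrying a given core haplotype, the probability that two chromosomes drawn at random are identical over the whole interval from the core out to a given marker. The string encoding of haplotypes, the function signature, and the toy data are hypothetical, and only extension to the right of the core is shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, core, upto):
    """EHH of `core` when extended rightwards to marker index `upto`.

    haplotypes: phased chromosomes as strings, one character per SNP.
    core_start, core_end: slice bounds of the core region.
    upto: exclusive end index of the extended interval (>= core_end).
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core]
    n = len(carriers)
    if n < 2:
        return 0.0
    # Group carriers by their full extended haplotype and count the
    # pairs of chromosomes that are identical over the interval.
    counts = Counter(h[core_start:upto] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy phased data: five chromosomes, four SNPs, core at SNPs 0-1.
toy = ["0101", "0101", "0110", "0111", "1100"]
decay = [ehh(toy, 0, 2, "01", upto) for upto in range(2, 5)]
print(decay)
```

EHH starts at 1 at the core and can only decrease as the interval grows. A haplotype under recent positive selection is expected to keep high EHH over longer distances than other haplotypes of similar frequency, which is the pattern the caption points out for the third haplotype in Asian populations.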