Volume 150, Issue 3, Pages (August 2012)

Slides:



Advertisements
Similar presentations
Supplementary Figure S1 Distribution of observed (blue) and Poisson expected (red) standard deviation of human-chimpanzee divergence over different window.
Advertisements

Human Genetic Diversity By Nataliya Prokhnevska Stanley David Alexander Reese Ryan Haddock JuneHee Chang.
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Signatures of Selection
Xiaoshu Chen, Jianzhi Zhang  Cell Systems 
The Structure of Common Genetic Variation in United States Populations
Michael Dannemann, Janet Kelso  The American Journal of Human Genetics 
The Heritage of Pathogen Pressures and Ancient Demography in the Human Innate- Immunity CD209/CD209L Region  Luis B. Barreiro, Etienne Patin, Olivier Neyrolles,
Jianbin Wang, H. Christina Fan, Barry Behr, Stephen R. Quake  Cell 
Volume 129, Issue 2, Pages (April 2007)
Tracing the Route of Modern Humans out of Africa by Using 225 Human Genome Sequences from Ethiopians and Egyptians  Luca Pagani, Stephan Schiffels, Deepti.
Population Genetic Structure of the People of Qatar
Volume 26, Issue 7, Pages (April 2016)
Introgression of Neandertal- and Denisovan-like Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors  Michael Dannemann, Aida M.
The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection
Model-free Estimation of Recent Genetic Relatedness
Demographic History of Oceania Inferred from Genome-wide Data
Volume 26, Issue 24, Pages (December 2016)
Evidence for Balancing Selection from Nucleotide Sequence Analyses of Human G6PD  Brian C. Verrelli, John H. McDonald, George Argyropoulos, Giovanni Destro-Bisol,
Volume 146, Issue 6, Pages (September 2011)
Hi-C Analysis in Arabidopsis Identifies the KNOT, a Structure with Similarities to the flamenco Locus of Drosophila  Stefan Grob, Marc W. Schmid, Ueli.
Haplotype Estimation Using Sequencing Reads
Alessia Ranciaro, Michael C. Campbell, Jibril B
Identification and Validation of Genetic Variants that Influence Transcription Factor and Cell Signaling Protein Levels  Ronald J. Hause, Amy L. Stark,
Brian K. Maples, Simon Gravel, Eimear E. Kenny, Carlos D. Bustamante 
Signatures of Purifying and Local Positive Selection in Human miRNAs
Alternative Splicing QTLs in European and African Populations
Gene-Expression Variation Within and Among Human Populations
Leslie S. Emery, Joseph Felsenstein, Joshua M. Akey 
Biased Gene Conversion Skews Allele Frequencies in Human Populations, Increasing the Disease Burden of Recessive Alleles  Joseph Lachance, Sarah A. Tishkoff 
CA3 Retrieves Coherent Representations from Degraded Input: Direct Evidence for CA3 Pattern Completion and Dentate Gyrus Pattern Separation  Joshua P.
Michael Dannemann, Janet Kelso  The American Journal of Human Genetics 
Rajiv C. McCoy, Jon Wakefield, Joshua M. Akey  Cell 
A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence  Haiyi Lou, Yan Lu,
Genomic Signatures of Selective Pressures and Introgression from Archaic Hominins at Human Innate Immunity Genes  Matthieu Deschamps, Guillaume Laval,
Ida Moltke, Matteo Fumagalli, Thorfinn S. Korneliussen, Jacob E
Volume 85, Issue 4, Pages (February 2015)
Japanese Population Structure, Based on SNP Genotypes from 7003 Individuals Compared to Other Ethnic Groups: Effects on Population-Based Association Studies 
Volume 173, Issue 1, Pages e9 (March 2018)
Joseph Rodriguez, Jerome S. Menet, Michael Rosbash  Molecular Cell 
Robust Inference of Identity by Descent from Exome-Sequencing Data
Volume 155, Issue 4, Pages (November 2013)
Genomic Flatlining in the Endangered Island Fox
Catarina D. Campbell, Nick Sampas, Anya Tsalenko, Peter H
Matthieu Foll, Oscar E. Gaggiotti, Josephine T
Highly Punctuated Patterns of Population Structure on the X Chromosome and Implications for African Evolutionary History  Charla A. Lambert, Caitlin F.
Complex Signatures of Natural Selection at the Duffy Blood Group Locus
Shuhua Xu, Wei Huang, Ji Qian, Li Jin 
Jonathan K. Pritchard, Joseph K. Pickrell, Graham Coop  Current Biology 
Stephen V. David, Benjamin Y. Hayden, James A. Mazer, Jack L. Gallant 
by Benjamin Vernot, Serena Tucci, Janet Kelso, Joshua G
Selina Vattathil, Joshua M. Akey  Cell 
A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence  Haiyi Lou, Yan Lu,
Volume 122, Issue 6, Pages (September 2005)
Volume 22, Issue 1, Pages (January 2012)
Identifying Darwinian Selection Acting on Different Human APOL1 Variants among Diverse African Populations  Wen-Ya Ko, Prianka Rajan, Felicia Gomez, Laura.
Trevor J. Pemberton, Chaolong Wang, Jun Z. Li, Noah A. Rosenberg 
Complex History of Admixture between Modern Humans and Neandertals
Yu Zhang, Tianhua Niu, Jun S. Liu 
Volume 25, Issue 10, Pages (May 2015)
Xiaoshu Chen, Jianzhi Zhang  Cell Systems 
Matthew A. Saunders, Jeffrey M. Good, Elizabeth C. Lawrence, Robert E
Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach  Marc A. Coram, Sophie I. Candille, Qing.
The Genomic Footprints of the Fall and Recovery of the Crested Ibis
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Introgression of Neandertal- and Denisovan-like Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors  Michael Dannemann, Aida M.
The Heritage of Pathogen Pressures and Ancient Demography in the Human Innate- Immunity CD209/CD209L Region  Luis B. Barreiro, Etienne Patin, Olivier Neyrolles,
Brian C. Verrelli, Sarah A. Tishkoff 
Population Genetic Structure of the People of Qatar
Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction by Jacqueline A. Robinson, Jannikke Räikkönen,
Presentation transcript:

Volume 150, Issue 3, Pages 457-469 (August 2012) Evolutionary History and Adaptation from High-Coverage Whole-Genome Sequences of Diverse African Hunter-Gatherers  Joseph Lachance, Benjamin Vernot, Clara C. Elbers, Bart Ferwerda, Alain Froment, Jean-Marie Bodo, Godfrey Lema, Wenqing Fu, Thomas B. Nyambo, Timothy R. Rebbeck, Kun Zhang, Joshua M. Akey, Sarah A. Tishkoff  Cell  Volume 150, Issue 3, Pages 457-469 (August 2012) DOI: 10.1016/j.cell.2012.07.009 Copyright © 2012 Elsevier Inc. Terms and Conditions

Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure 1 Genomic Variation in African Hunter-Gatherers and Other Global Populations (A) Hunter-gatherer populations sequenced in our study (five males per population). HapMap abbreviations of publicly available genomes are also listed. (B and C) Numbers indicate how many variants belong to each subset of populations. (D and E) Principal component analysis of 68 high-coverage genomes. Pygmy genomes are indicated by green, Hadza by blue, Sandawe by red, and non-hunter-gatherer by gray circles. (F) Neighbor joining tree based on pairwise identity-by-state matrix distances using high-coverage whole-genome sequences from 68 individuals. See also Table S1, Table S2, Figure S2, and Figure S4. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure 2 Times until Most Recent Common Ancestry and Evidence of Archaic Introgression (A) TMRCA of top candidate regions (solid lines) and of all regions (dotted lines) for the Pygmy, Hadza, and Sandawe hunter-gatherer populations and two European populations. Note that TMRCA represents the estimated time of divergence between the anatomically modern human and candidate introgressed sequences (Supplemental Information). TMRCA values of top candidate regions are significantly older than for random genomic regions (Kruskal-Wallis test, p < 2.2 × 10−16), but TMRCA values of top candidate regions from each population are not significantly different (Kruskal-Wallis test, p = 1). (B and C) STRUCTURE plots showing the proportion of ancestry for each individual based on the most likely number of subpopulations (K = 2 for putatively introgressed regions in B, and K = 3 for random regions in C). For each population, a “virtual” genome was constructed by concatenating sequence from individuals containing the putatively introgressed sequence (B) or from arbitrary individuals (C). Pi, Hi, and Si denote the virtual genomes constructed for the Pygmy, Hadza, and Sandawe samples, respectively. (D) TMRCA values of top candidate regions for introgression unique to a single hunter-gatherer population are significantly lower than TMRCA values of regions shared between all hunter-gatherer populations (Wilcoxon rank sum test, p = 2.2 × 10−5). (E) Genomic distribution of the top 350 introgressed regions for the Pygmy, Hadza, and Sandawe populations and for two European populations in 2 Mb windows. Colors indicate whether windows contain introgressed regions from a single hunter-gatherer population (orange), multiple hunter-gatherer populations (blue), or hunter-gatherer and European populations (open black circle). Counts are for hunter-gatherer regions only. See also Figure S7. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure 3 Characteristics of S∗ in Real and Simulated Data (A and B) Neanderthal variants are not enriched in top candidate regions for three African hunter-gatherer populations (A) but are enriched in top candidate regions from two European populations (B). (C) Box-and-whisker plot of TMRCA estimates for top 0.5% of 50 kb regions in simulated data, varying time of split with the archaic population from 300 kya to 1000 kya; introgression was simulated into Europeans (white boxes) and Yorubans (gray boxes). See also Figure S7. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure 4 Divergent Genomic Regions between Hunter-Gatherers and Non-Hunter-Gatherers and Genomic Distributions of Ancestry Informative Markers Each dot represents a nonoverlapping 100 kb window. Colors correspond to different chromosomes. For each population, genes found in the top ten windows are listed in bold. If no genes are present in a top ten window, the nearest gene is listed in normal font. (A–C) Number of LSBL outliers (top 1%) per 100 kb window. (D–F) Number of AIMs per 100 kb window. The 3p14.3 AIM cluster spans multiple 100 kb windows and includes the genes HESX1, APPL1, ASB14, and DNAH12. See also Table S5 and Table S6. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure 5 Pygmy AIMs, Allele Frequencies, and Height Associations (A) Pygmy ancestry informative markers (AIMs) (green lines) located on chromosome 3. Green shading in the LSBL plot indicates genomic regions with an excess of LSBL outliers near the 3p14.3 and 3p11.2 AIM clusters. Black triangles indicate SNPs genotyped in a larger sample of Pygmy and Bantu individuals. (B) Allele frequencies of Pygmy AIMs genotyped in a broad sample of Central African Pygmy and Bantu individuals. Significant associations (p < 0.05) with height are indicated for males (green asterisks) and for both sexes pooled together (black asterisks). Sample sizes are also listed (n = the number of genotyped individuals per population). See also Table S7. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S1 Quality Control, Related to Table 1 (A) Venn diagram showing the variants in dbSNP 131 (23.4M variants) and hunter-gatherer genomes (13.4M variants). All shapes are drawn to scale, and overlap between each of these two sets amounts to 7.9 million variants. (B) Fully called sites have lower error rates. Sites were binned according to whether they were fully called or called in a subset of 15 hunter-gatherer genomes. Discordance rates are for variant positions in technical replicates, and overestimate actual error rates. (C) Most variant sites are fully called in all 15 hunter-gatherers. (D) Summed chi-square statistics from Pygmy, Hadza, and Sandawe populations (departure from Hardy-Weinberg proportions filter). Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S2 Variants from Whole-Genome Sequencing, Related to Figure 1 (A) Variants per sequenced genome for different populations. Non-hunter-gatherer populations analyzed are Yoruba (YRI), Asian (ASN, i.e., CHB and JPT), and European (CEU). “Novel” refers to variants absent from dbSNP131. (B) Power-law parameters for the number of variants observed in each sequenced genome. (C, E, G, and I) Histograms showing the number of variants per Mb. Each histogram contains 150 bins. (D, F, H, and J) Genomic distribution of variants per Mb. (C) and (D) refer to the pooled set of all 15 hunter-gatherer genomes. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S3 Signatures of Purifying Selection in Geographically Diverse Populations with Different Subsistence Patterns, Related to Table 2 To control for sample size, four diploid genomes were analyzed for each population. Hunter-gatherer (HG) populations are Pygmies, Hadza, and Sandawe. African agriculturalists and pastoralists (AP) are YRI, LWK, and MKK. Non-African agriculturalists (NA) are CEU, CHB, and JPT. (A) Mean derived allele frequencies (DAF) for intergenic and exon SNPs. (B) DAF in exon SNPs relative to intergenic SNPs. (C) PolyPhen-2 data indicating the proportion of nonsynonymous variants classified as benign, possibly damaging, or probably damaging. (D) Number of nonsynonymous variants per genome divided by the number of synonymous variants per genome. (E) Statistical tests between groups of populations. ∗∗ indicates p < 0.01 and ∗∗∗ indicates p < 10−4. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S4 PCA Plots, Related to Figure 1 (A–F) In each panel the x axis corresponds to PC1. The proportion of the variance explained by each PC is indicated along each axis, and individuals are represented by population name. Pygmies are labeled green, Hadza are labeled blue, and Sandawe are labeled red. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S5 Runs of Homozygosity, Related to Table 2 (A and B) In each panel cROH corresponds to the cumulative number of base pairs observed in runs of homozygosity. A run of homozygosity is defined as a 100kb region lacking heterozygote calls (A) or a 1Mb region lacking heterozygote calls (B). One genotyping error is tolerated per 50kb. Colors of points differ for Pygmies (red), Hadza (blue), and Sandawe (red). (C) Statistics for runs of homozygosity. Population means ± one standard deviation are listed. cROH refers to the cumulative number of base pairs found in runs of homozygosity (ROH). CV refers to the coefficient of variation (standard deviation / mean). Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S6 Derived Allele Frequency Distributions for Hunter-Gatherer Populations, Related to Table 2 (A) Derived allele frequency (DAF) distributions for hunter-gatherer populations and null expectations from the neutral theory (infinite sites model with constant population size). (B–D) Allele frequency distributions for pairs of populations using whole-genome sequencing data. (E–G) Allele frequency distributions for pairs of populations using Illumina1M genotyping array data. In each panel, x- and y-axes correspond to the derived allele frequency in a particular population and the z-axis corresponds to the proportion of SNPs with a particular set of allele frequencies. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions

Figure S7 S∗ Statistics Are Robust at Detecting Introgression, Related to Figures 2 and 3 (A) False Discovery Rate (FDR) for top 0.5% of simulated 50kb regions, ranked by S∗, under several demographic parameters. (B) Recombination rate of top candidate regions (red dashed line), and of all regions, in three hunter gatherer populations. (C) Box-and-whisker plot of TMRCA estimates for shared and unique top candidate regions in Pygmy, Hadza and Sandawe. (D) Tail of simulated S∗ distribution over 0%–3.8% introgressed sequence per individual. (E) Tail of observed S∗ distribution for 11 populations sequenced by Complete Genomics. Cell 2012 150, 457-469DOI: (10.1016/j.cell.2012.07.009) Copyright © 2012 Elsevier Inc. Terms and Conditions