Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia  Kevin J. Galinsky, Gaurav Bhatia, Po-Ru Loh, Stoyan Georgiev,

Slides:



Advertisements
Similar presentations
Previous Estimates of Mitochondrial DNA Mutation Level Variance Did Not Account for Sampling Error: Comparing the mtDNA Genetic Bottleneck in Mice and.
Advertisements

The Structure of Common Genetic Variation in United States Populations
Katarzyna Bryc, Eric Y. Durand, J
Combining Evidence of Natural Selection with Association Analysis Increases Power to Detect Malaria-Resistance Variants  George Ayodo, Alkes L. Price,
Genetic Landscape of Eurasia and “Admixture” in Uyghurs
Genome-wide Comparison of African-Ancestry Populations from CARe and Other Cohorts Reveals Signals of Natural Selection  Gaurav Bhatia, Nick Patterson,
Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool  Luca Pagani, Toomas Kivisild, Ayele Tarekegn,
Rapid Simulation of P Values for Product Methods and Multiple-Testing Adjustment in Association Studies  S.R. Seaman, B. Müller-Myhsok  The American Journal.
Tracing the Route of Modern Humans out of Africa by Using 225 Human Genome Sequences from Ethiopians and Egyptians  Luca Pagani, Stephan Schiffels, Deepti.
Katarzyna Bryc, Eric Y. Durand, J
Was ADH1B under Selection in European Populations?
The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection
Model-free Estimation of Recent Genetic Relatedness
Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits  Nicholas Mancuso, Huwenbo Shi, Pagé.
Comparing Algorithms for Genotype Imputation
Haplotype Estimation Using Sequencing Reads
Haplotypes and Linkage Disequilibrium at the Phenylalanine Hydroxylase Locus, PAH, in a Global Representation of Populations  Judith R. Kidd, Andrew J.
Genome-wide Comparison of African-Ancestry Populations from CARe and Other Cohorts Reveals Signals of Natural Selection  Gaurav Bhatia, Nick Patterson,
Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation- Maximization Algorithm for Unphased Diploid Genotype Data  Daniele.
Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry  Oscar.
Improved Heritability Estimation from Genome-wide SNPs
QTL Fine Mapping by Measuring and Testing for Hardy-Weinberg and Linkage Disequilibrium at a Series of Linked Marker Loci in Extreme Samples of Populations 
Rounak Dey, Ellen M. Schmidt, Goncalo R. Abecasis, Seunggeun Lee 
Elizabeth Theusch, Analabha Basu, Jane Gitschier 
Gene-Expression Variation Within and Among Human Populations
Arpita Ghosh, Fei Zou, Fred A. Wright 
Jacob E. Crawford, Ricardo Amaru, Jihyun Song, Colleen G
A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence  Haiyi Lou, Yan Lu,
HYST: A Hybrid Set-Based Test for Genome-wide Association Studies, with Application to Protein-Protein Interaction-Based Association Analysis  Miao-Xin.
Genomic Signatures of Selective Pressures and Introgression from Archaic Hominins at Human Innate Immunity Genes  Matthieu Deschamps, Guillaume Laval,
Variant Association Tools for Quality Control and Analysis of Large-Scale Sequence and Genotyping Array Data  Gao T. Wang, Bo Peng, Suzanne M. Leal  The.
Transethnic Genetic-Correlation Estimates from Summary Statistics
Japanese Population Structure, Based on SNP Genotypes from 7003 Individuals Compared to Other Ethnic Groups: Effects on Population-Based Association Studies 
Xiangqing Sun, Robert Elston, Nathan Morris, Xiaofeng Zhu 
Maximizing the Power of Principal-Component Analysis of Correlated Phenotypes in Genome-wide Association Studies  Hugues Aschard, Bjarni J. Vilhjálmsson,
Genetic Evidence for Recent Population Mixture in India
Selection and Reduced Population Size Cannot Explain Higher Amounts of Neandertal Ancestry in East Asian than in European Human Populations  Bernard Y.
Ivan P. Gorlov, Olga Y. Gorlova, Shamil R. Sunyaev, Margaret R
Jeffrey Staples, Dandi Qiao, Michael H. Cho, Edwin K
A Weighted False Discovery Rate Control Procedure Reveals Alleles at FOXA2 that Influence Fasting Glucose Levels  Chao Xing, Jonathan C. Cohen, Eric Boerwinkle 
Matthieu Foll, Oscar E. Gaggiotti, Josephine T
Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies 
Brian P. McEvoy, Joanne M. Lind, Eric T. Wang, Robert K
Genotype Imputation with Millions of Reference Samples
The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research  Matthew R. Nelson, Katarzyna Bryc,
Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent  Sharon R. Browning, Brian L. Browning  The.
Hugues Aschard, Bjarni J. Vilhjálmsson, Amit D. Joshi, Alkes L
Shuhua Xu, Wei Huang, Ji Qian, Li Jin 
Brian P. McEvoy, Joanne M. Lind, Eric T. Wang, Robert K
A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence  Haiyi Lou, Yan Lu,
Huwenbo Shi, Gleb Kichaev, Bogdan Pasaniuc 
A Fast, Powerful Method for Detecting Identity by Descent
Response to Shen et al. The American Journal of Human Genetics
Xiaoquan Wen, Yeji Lee, Francesca Luca, Roger Pique-Regi 
Jared R. Kohler, David J. Cutler 
L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals  Xiaowei Wu, Mary Sara McPeek 
Xiang Wan, Can Yang, Qiang Yang, Hong Xue, Xiaodan Fan, Nelson L. S
Complex History of Admixture between Modern Humans and Neandertals
Features of Evolution and Expansion of Modern Humans, Inferred from Genomewide Microsatellite Markers  Lev A. Zhivotovsky, Noah A. Rosenberg, Marcus W.
Leslie S. Emery, Kevin M. Magnaye, Abigail W. Bigham, Joshua M
Yu Zhang, Tianhua Niu, Jun S. Liu 
Tao Wang, Robert C. Elston  The American Journal of Human Genetics 
Iuliana Ionita-Laza, Seunggeun Lee, Vlad Makarov, Joseph D
Abraham's Children in the Genome Era: Major Jewish Diaspora Populations Comprise Distinct Genetic Clusters with Shared Middle Eastern Ancestry  Gil Atzmon,
Haplotypes and Linkage Disequilibrium at the Phenylalanine Hydroxylase Locus, PAH, in a Global Representation of Populations  Judith R. Kidd, Andrew J.
Enhanced Localization of Genetic Samples through Linkage-Disequilibrium Correction  Yael Baran, Inés Quintela, Ángel Carracedo, Bogdan Pasaniuc, Eran Halperin 
Functional Architectures of Local and Distal Regulation of Gene Expression in Multiple Human Tissues  Xuanyao Liu, Hilary K. Finucane, Alexander Gusev,
Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach  Marc A. Coram, Sophie I. Candille, Qing.
Beyond GWASs: Illuminating the Dark Road from Association to Function
A Transient Pulse of Genetic Admixture from the Crusaders in the Near East Identified from Ancient Genome Sequences  Marc Haber, Claude Doumet-Serhal,
Presentation transcript:

Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia  Kevin J. Galinsky, Gaurav Bhatia, Po-Ru Loh, Stoyan Georgiev, Sayan Mukherjee, Nick J. Patterson, Alkes L. Price  The American Journal of Human Genetics  Volume 98, Issue 3, Pages 456-472 (March 2016) DOI: 10.1016/j.ajhg.2015.12.022 Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 1 Running Time and Memory Requirements of FastPCA and Other Algorithms The CPU time and memory usage of FastPCA scale linearly with the number of individuals. On the other hand, smartpca and PLINK2-pca scale between quadratically and cubically, depending on whether computing the GRM (quadratic) or the eigendecomposition (cubic) is the rate-limiting step. The running time of flashpca scales quadratically (because it computes the GRM), but its memory usage scales linearly because it stores the normalized genotype matrix in memory. With 50k individuals, smartpca exceeded the time constraint (100 hr) and flashpca exceeded the memory constraint (40 GB). With 70k individuals, PLINK2-pca exceeded the memory constraint (40 GB). Run times are based on one core of a 2.26-GHz Intel Xeon L5640 processor; we caution that run time comparisons might vary by a small constant factor as a function of the computing environment. Numerical data are provided in Table S1 and the error bars indicate ± 1 SD. The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 2 Accuracy of FastPCA and PLINK2-pca FastPCA and PLINK2-pca were run on simulated populations of varying divergence. The simulated data comprised 50k SNPs and 10k total individuals from six subpopulations derived from a single ancestral population. PCs computed by PLINK2-pca and FastPCA were compared to the true population PCs and to each other using the mean of explained variances (MEV) metric (see Material and Methods). FastPCA explained the same amount of true population variance as PLINK2-pca in all experiments, and the methods output nearly identical PCs (MEV > 0.999). The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 3 Power of PC-Based Selection Statistic The allele frequency difference at selected SNPs was varied between two populations separated by varying FST. The significance threshold was set to 8.3 × 10−7 based on 60K SNPs tested. (A) With 50k samples, the power curves for FST = {0.05,0.02,0.01,0.005,0.002,0.001,0.0005} showed a phase change. (B) Varying the number of samples for FST = 0.001 demonstrated that this phase change was more gradual at smaller sample sizes. (C) Varying the number of samples at FST = 0.01 showed that the impact of sample size was less pronounced than at FST = 0.001. The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 4 FastPCA Results on GERA Dataset FastPCA and SNPweights66 were run on the GERA cohort and the principal components from FastPCA were plotted. Individuals were colored by mapping Northwest European (NW), Southeast European (SE), and Ashkenazi Jewish (AJ) ancestry estimated by SNPweights to the red/green/blue color axes (see Material and Methods). PC1 and PC2 (top) separate the GERA cohort into northwest (NW), southeast (SE), and Ashkenazi Jewish (AJ) subpopulations. PC3 separates the AJ and SE individuals, and PC3 and PC4 (bottom) further separates the NW European individuals. The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 5 Separation of Irish, Eastern European, and Northern European Individuals in GERA Data We report results of projecting POPRES67 individuals onto top PCs. The plot of PC3 versus PC4 (bottom) shows that the Northwest European (NW) individuals are further separated into Irish and Eastern European and Northern European populations. Projected populations were colored based on correspondence to the ancestry assignment from SNPweights,66 except that Irish and Eastern European individuals were colored purple and orange, respectively, to indicate additional population structure. The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 6 Signals of Selection in the Top PCs of GERA Data We display Manhattan plots for selection statistics computed using each of the top four PCs. The gray line indicates the genome-wide significance threshold of 2.05 × 10−8 based on 2,435,924 hypotheses tested (α = 0.05, 608,981 SNPs × 4 PCs). The American Journal of Human Genetics 2016 98, 456-472DOI: (10.1016/j.ajhg.2015.12.022) Copyright © 2016 The American Society of Human Genetics Terms and Conditions