Genotype Imputation with Millions of Reference Samples

Slides:



Advertisements
Similar presentations
Presented by Qing Duan Dr. Yun Li group UNC at Chapel Hill
Advertisements

Fragile X and X-Linked Intellectual Disability: Four Decades of Discovery Herbert A. Lubs, Roger E. Stevenson, Charles E. Schwartz The American Journal.
Constrained Score Statistics Identify Genetic Variants Interacting with Multiple Risk Factors in Barrett’s Esophagus  James Y. Dai, Jean de Dieu Tapsoba,
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Michael Dannemann, Janet Kelso  The American Journal of Human Genetics 
A Genomewide Admixture Mapping Panel for Hispanic/Latino Populations
Marc A. Coram, Huaying Fang, Sophie I. Candille, Themistocles L
Rapid Simulation of P Values for Product Methods and Multiple-Testing Adjustment in Association Studies  S.R. Seaman, B. Müller-Myhsok  The American Journal.
Chao Tian, David A. Hinds, Russell Shigeta, Sharon G
Model-Free Linkage Analysis with Covariates Confirms Linkage of Prostate Cancer to Chromosomes 1 and 4  Katrina A.B. Goddard, John S. Witte, Brian K.
Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia  Kevin J. Galinsky, Gaurav Bhatia, Po-Ru Loh, Stoyan Georgiev,
High-Resolution Genetic Maps Identify Multiple Type 2 Diabetes Loci at Regulatory Hotspots in African Americans and Europeans  Winston Lau, Toby Andrew,
Comparing Algorithms for Genotype Imputation
Henry R. Johnston, David J. Cutler 
Haplotype Estimation Using Sequencing Reads
Linkage Thresholds for Two-stage Genome Scans
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Improved Heritability Estimation from Genome-wide SNPs
Brian K. Maples, Simon Gravel, Eimear E. Kenny, Carlos D. Bustamante 
The American Journal of Human Genetics 
The Structure of Linkage Disequilibrium at the DBH Locus Strongly Influences the Magnitude of Association between Diallelic Markers and Plasma Dopamine.
Rounak Dey, Ellen M. Schmidt, Goncalo R. Abecasis, Seunggeun Lee 
Weight Loss after Gastric Bypass Is Associated with a Variant at 15q26
10 Years of GWAS Discovery: Biology, Function, and Translation
Michael Dannemann, Janet Kelso  The American Journal of Human Genetics 
Variant Association Tools for Quality Control and Analysis of Large-Scale Sequence and Genotyping Array Data  Gao T. Wang, Bo Peng, Suzanne M. Leal  The.
A Flexible Bayesian Framework for Modeling Haplotype Association with Disease, Allowing for Dominance Effects of the Underlying Causative Variants  Andrew.
Kristina Allen-Brady, Peggy A. Norton, James M
Transethnic Genetic-Correlation Estimates from Summary Statistics
Japanese Population Structure, Based on SNP Genotypes from 7003 Individuals Compared to Other Ethnic Groups: Effects on Population-Based Association Studies 
Volume 173, Issue 1, Pages e9 (March 2018)
Random-Effects Model Aimed at Discovering Associations in Meta-Analysis of Genome- wide Association Studies  Buhm Han, Eleazar Eskin  The American Journal.
Sequencing the IL4 locus in African Americans implicates rare noncoding variants in asthma susceptibility  Gabe Haller, BA, Dara G. Torgerson, PhD, Carole.
A Weighted False Discovery Rate Control Procedure Reveals Alleles at FOXA2 that Influence Fasting Glucose Levels  Chao Xing, Jonathan C. Cohen, Eric Boerwinkle 
Studying Gene and Gene-Environment Effects of Uncommon and Common Variants on Continuous Traits: A Marker-Set Approach Using Gene-Trait Similarity Regression 
Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies 
Constrained Score Statistics Identify Genetic Variants Interacting with Multiple Risk Factors in Barrett’s Esophagus  James Y. Dai, Jean de Dieu Tapsoba,
A Three–Single-Nucleotide Polymorphism Haplotype in Intron 1 of OCA2 Explains Most Human Eye-Color Variation  David L. Duffy, Grant W. Montgomery, Wei.
10 Years of GWAS Discovery: Biology, Function, and Translation
Five Years of GWAS Discovery
Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent  Sharon R. Browning, Brian L. Browning  The.
Hugues Aschard, Bjarni J. Vilhjálmsson, Amit D. Joshi, Alkes L
Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test  Michael C. Wu, Seunggeun Lee, Tianxi Cai, Yun Li, Michael.
Highly Punctuated Patterns of Population Structure on the X Chromosome and Implications for African Evolutionary History  Charla A. Lambert, Caitlin F.
Dan-Yu Lin, Zheng-Zheng Tang  The American Journal of Human Genetics 
Limited Clinical Utility of Non-invasive Prenatal Testing for Subchromosomal Abnormalities  Kitty K. Lo, Evangelia Karampetsou, Christopher Boustred,
Erratum The American Journal of Human Genetics
Daniel J. Schaid, Shannon K. McDonnell, Scott J. Hebbring, Julie M
A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals  Brian L. Browning, Sharon.
Huwenbo Shi, Gleb Kichaev, Bogdan Pasaniuc 
A Common Genetic Variant in the Neurexin Superfamily Member CNTNAP2 Increases Familial Risk of Autism  Dan E. Arking, David J. Cutler, Camille W. Brune,
Gad Kimmel, Ron Shamir  The American Journal of Human Genetics 
Imputing Phenotypes for Genome-wide Association Studies
A Fast, Powerful Method for Detecting Identity by Descent
Stephen Leslie, Peter Donnelly, Gil McVean 
Wei Pan, Il-Youp Kwak, Peng Wei  The American Journal of Human Genetics 
L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals  Xiaowei Wu, Mary Sara McPeek 
Yu Zhang, Tianhua Niu, Jun S. Liu 
A Definitive Haplotype Map as Determined by Genotyping Duplicated Haploid Genomes Finds a Predominant Haplotype Preference at Copy-Number Variation Events 
Efficient Computation of Significance Levels for Multiple Associations in Large Studies of Correlated Data, Including Genomewide Association Studies 
Iuliana Ionita-Laza, Seunggeun Lee, Vlad Makarov, Joseph D
Sarah E. Medland, Dale R. Nyholt, Jodie N. Painter, Brian P
Enhanced Localization of Genetic Samples through Linkage-Disequilibrium Correction  Yael Baran, Inés Quintela, Ángel Carracedo, Bogdan Pasaniuc, Eran Halperin 
Evaluating the Effects of Imputation on the Power, Coverage, and Cost Efficiency of Genome-wide SNP Platforms  Carl A. Anderson, Fredrik H. Pettersson,
Giulio Genovese, Robert E. Handsaker, Heng Li, Eimear E
Genotype-Imputation Accuracy across Worldwide Human Populations
Harold A. Nieuwboer, René Pool, Conor V. Dolan, Dorret I
Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach  Marc A. Coram, Sophie I. Candille, Qing.
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Zuoheng Wang, Mary Sara McPeek  The American Journal of Human Genetics 
Presentation transcript:

Genotype Imputation with Millions of Reference Samples Brian L. Browning, Sharon R. Browning  The American Journal of Human Genetics  Volume 98, Issue 1, Pages 116-126 (January 2016) DOI: 10.1016/j.ajhg.2015.11.020 Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 1 Genotype Imputation Accuracy for Beagle v.4.1, Minimac3, and Impute2 Genotype imputation accuracy when imputing genotypes from reference panels of increasing size. The 1000 Genomes Project data for chromosome 20 were divided into a reference panel with 2,452 sequenced individuals and an imputation target with 52 individuals genotyped on the Illumina Omni2.5 array and having all other sequenced variants masked. The UK10K Project data for chromosome 20 was used to impute the 503 designated European samples from the 1000 Genomes Project. The target samples were genotyped on the Illumina Omni2.5 array and had all other sequenced variants masked. The three largest reference panels have 10 Mb of simulated sequence data for 50,000, 100,000, and 200,000 individuals. For each simulated reference panel, the imputation target was 1,000 simulated individuals genotyped for 3,333 markers in the 10 Mb region, corresponding to a genome-wide array with 1M SNPs. Imputed genotypes were binned according to the minor allele count of the marker in the reference panel. The squared correlation between the imputed minor-allele dose and the true minor-allele dosage is reported for the genotypes in each minor allele count bin. The horizontal axis in each panel is on a log scale. Impute2 was not run with the 100,000 and 200,000 member reference panels because of memory constraints. When running Impute2 with 50,000 reference samples, the 10 Mb region was broken into six 1.67 Mb windows with a 250 kb buffer appended to the end of each window in order to avoid exceeding the available computer memory. The American Journal of Human Genetics 2016 98, 116-126DOI: (10.1016/j.ajhg.2015.11.020) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 2 Memory Use and Computation Time for Beagle v.4.1 and Minimac3 for VCF Reference Data Three reference panels in VCF format with 50,000, 100,000, and 200,000 individuals and 10 Mb of simulated sequence data were used to impute genotypes in 1,000 individuals genotyped on a SNP array with 3,333 markers in the 10 Mb region, corresponding to a genome-wide array with 1M SNPs. Beagle v.4.1 was run with 12 computational threads, and Minimac3 was run with one computational thread. CPU time includes the sum of the computation time consumed by each computational thread. The American Journal of Human Genetics 2016 98, 116-126DOI: (10.1016/j.ajhg.2015.11.020) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 3 Memory Use and Computation Time for Beagle v.4.1 and Minimac3 for Pre-processed Reference Data Three reference panels with 50,000, 100,000, and 200,000 individuals and 10 Mb of simulated sequence data were used to impute genotypes in 1,000 individuals genotyped on a SNP array with 3,333 markers in the 10 Mb region, corresponding to a genome-wide array with 1M SNPs. Reference data are in bref format (Beagle) and m3vcf format (Minimac3). Beagle v.4.1 was run with 12 computational threads, and Minimac3 was run with 1 computational thread. CPU time includes the sum of the computation time consumed by each computational thread. The American Journal of Human Genetics 2016 98, 116-126DOI: (10.1016/j.ajhg.2015.11.020) Copyright © 2016 The American Society of Human Genetics Terms and Conditions

Figure 4 Memory Use and Computation Time for Beagle v.4.1 for Millions of Reference Samples Beagle’s memory requirements and computation time for imputing 10 Mb of simulated sequence data from binary reference files having one million to five million reference samples, each with 1,294,053 markers. The simulated imputation target was 1,000 individuals genotyped on a 1M SNP array (3,333 markers in the 10 Mb region). CPU time includes the sum of the computation time consumed by each computational thread. All Beagle analyses used 12 computational threads. The wall clock computation time required to prepare each binary reference file was approximately four to five times greater than the wall clock imputation time reported in this figure. The American Journal of Human Genetics 2016 98, 116-126DOI: (10.1016/j.ajhg.2015.11.020) Copyright © 2016 The American Society of Human Genetics Terms and Conditions