Brian K. Maples, Simon Gravel, Eimear E. Kenny, Carlos D. Bustamante 

Slides:



Advertisements
Similar presentations
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Advertisements

Sensitivity Analysis of the MGMT-STP27 Model and Impact of Genetic and Epigenetic Context to Predict the MGMT Methylation Status in Gliomas and Other.
The Structure of Common Genetic Variation in United States Populations
Katarzyna Bryc, Eric Y. Durand, J
Michael Dannemann, Janet Kelso  The American Journal of Human Genetics 
Sensitivity Analysis of the MGMT-STP27 Model and Impact of Genetic and Epigenetic Context to Predict the MGMT Methylation Status in Gliomas and Other.
Imputation-based local ancestry inference in admixed populations
Itsik Pe’er, Yves R. Chretien, Paul I. W. de Bakker, Jeffrey C
Genomic Patterns of Homozygosity in Worldwide Human Populations
Katarzyna Bryc, Eric Y. Durand, J
Introgression of Neandertal- and Denisovan-like Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors  Michael Dannemann, Aida M.
The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection
Model-free Estimation of Recent Genetic Relatedness
Daniel Greene, Sylvia Richardson, Ernest Turro 
Alicia R. Martin, Christopher R. Gignoux, Raymond K
Comparing Algorithms for Genotype Imputation
Henry R. Johnston, David J. Cutler 
Haplotype Estimation Using Sequencing Reads
Linkage Thresholds for Two-stage Genome Scans
Estimating Kinship in Admixed Populations
David H. Spencer, Kerry L. Bubb, Maynard V. Olson 
Thomas Willems, Melissa Gymrek, G
Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation- Maximization Algorithm for Unphased Diploid Genotype Data  Daniele.
Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry  Oscar.
The Genetic Legacy of the Indian Ocean Slave Trade: Recent Admixture and Post- admixture Selection in the Makranis of Pakistan  Romuald Laso-Jadart, Christine.
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Alternative Splicing QTLs in European and African Populations
Gene-Expression Variation Within and Among Human Populations
Biased Gene Conversion Skews Allele Frequencies in Human Populations, Increasing the Disease Burden of Recessive Alleles  Joseph Lachance, Sarah A. Tishkoff 
XMCPDT Does Have Correct Type I Error Rates
Variant Association Tools for Quality Control and Analysis of Large-Scale Sequence and Genotyping Array Data  Gao T. Wang, Bo Peng, Suzanne M. Leal  The.
Ida Moltke, Matteo Fumagalli, Thorfinn S. Korneliussen, Jacob E
Volume 173, Issue 1, Pages e9 (March 2018)
SNP Arrays in Heterogeneous Tissue: Highly Accurate Collection of Both Germline and Somatic Genetic Information from Unpaired Single Tumor Samples  Guillaume.
Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation  Jeffrey M. Kidd, Simon Gravel, Jake.
Sequencing the IL4 locus in African Americans implicates rare noncoding variants in asthma susceptibility  Gabe Haller, BA, Dara G. Torgerson, PhD, Carole.
Robust Inference of Identity by Descent from Exome-Sequencing Data
Mamoru Kato, Yusuke Nakamura, Tatsuhiko Tsunoda 
Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies 
Brian P. McEvoy, Joanne M. Lind, Eric T. Wang, Robert K
Genotype Imputation with Millions of Reference Samples
Pier Francesco Palamara, Laurent C. Francioli, Peter R
Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent  Sharon R. Browning, Brian L. Browning  The.
Privacy Risks from Genomic Data-Sharing Beacons
Highly Punctuated Patterns of Population Structure on the X Chromosome and Implications for African Evolutionary History  Charla A. Lambert, Caitlin F.
Complex Signatures of Natural Selection at the Duffy Blood Group Locus
Shuhua Xu, Wei Huang, Ji Qian, Li Jin 
Pier Francesco Palamara, Todd Lencz, Ariel Darvasi, Itsik Pe’er 
Brian P. McEvoy, Joanne M. Lind, Eric T. Wang, Robert K
A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals  Brian L. Browning, Sharon.
Leonardo Arbiza, Srikanth Gottipati, Adam Siepel, Alon Keinan 
Gad Kimmel, Ron Shamir  The American Journal of Human Genetics 
Exploring Population Admixture Dynamics via Empirical and Simulated Genome-wide Distribution of Ancestral Chromosomal Segments  Wenfei Jin, Sijia Wang,
A Fast, Powerful Method for Detecting Identity by Descent
Xiaoquan Wen, Yeji Lee, Francesca Luca, Roger Pique-Regi 
Identifying Darwinian Selection Acting on Different Human APOL1 Variants among Diverse African Populations  Wen-Ya Ko, Prianka Rajan, Felicia Gomez, Laura.
Trevor J. Pemberton, Chaolong Wang, Jun Z. Li, Noah A. Rosenberg 
Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data  Zihuai.
Complex History of Admixture between Modern Humans and Neandertals
Madhumathi Rao, MD, V.S. Balakrishnan, MD, PhD 
Leslie S. Emery, Kevin M. Magnaye, Abigail W. Bigham, Joshua M
Yu Zhang, Tianhua Niu, Jun S. Liu 
A Definitive Haplotype Map as Determined by Genotyping Duplicated Haploid Genomes Finds a Predominant Haplotype Preference at Copy-Number Variation Events 
Enhanced Localization of Genetic Samples through Linkage-Disequilibrium Correction  Yael Baran, Inés Quintela, Ángel Carracedo, Bogdan Pasaniuc, Eran Halperin 
Evaluating the Effects of Imputation on the Power, Coverage, and Cost Efficiency of Genome-wide SNP Platforms  Carl A. Anderson, Fredrik H. Pettersson,
Giulio Genovese, Robert E. Handsaker, Heng Li, Eimear E
Genotype-Imputation Accuracy across Worldwide Human Populations
A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea  Fernando L. Mendez, Joseph C.
Introgression of Neandertal- and Denisovan-like Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors  Michael Dannemann, Aida M.
Beyond GWASs: Illuminating the Dark Road from Association to Function
Presentation transcript:

RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference  Brian K. Maples, Simon Gravel, Eimear E. Kenny, Carlos D. Bustamante  The American Journal of Human Genetics  Volume 93, Issue 2, Pages 278-288 (August 2013) DOI: 10.1016/j.ajhg.2013.06.020 Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 1 The LAI Algorithm To illustrate the working of RFMix, we consider a single admixed chromosome from an individual with ancestry from two diverged populations. (A) For building reference panels, samples are collected from proxy populations related to the ancestral populations. Phased chromosomes are divided into windows of equal size on the basis of genetic distance. (B) For each window, a random forest is trained to distinguish ancestry by using the reference panels. (C) Considering the admixed chromosome, each tree in the random forest generates a fractional vote for each ancestry by following the path through the tree corresponding to the admixed sequence. (D) These votes are summed, producing posterior ancestry probabilities within each window. These posterior probabilities are used for determining the most likely sequence of ancestry across windows via MAP inference (black line) or via max marginalization of the forward-backward posterior probabilities (not shown). (E) The local ancestries inferred by MAP across the admixed chromosome. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 2 Comparison of Diploid Ancestry Error between LAMP-HAP and RFMix Inferences for Simulated Latinos across a Range of Real-World Scenarios We simulated ten Latino individuals as described in the text and used reference panels composed of 30 ideal samples (A). We then simulated an additional 30 Latino individuals and used reference panels composed of three ideal samples (B), 30 proxy samples (C), or three proxy samples (D). The 0th EM iteration of RFMix refers to the initial round of learning and inference shown in Figure 1. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 3 Phase Correction with Local-Ancestry Information Fraction of SNP pairs that are within contiguous heterozygous ancestry regions and that are phased correctly with respect to each other as a function of the genetic distance between them. Blue and red lines correspond to simulated Latino and African American samples, respectively. Dashed and solid lines correspond to phasings that have and have not been corrected with local ancestry, respectively. The horizontal line at 0.5 marks the expected performance of random phasing. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 4 Accuracy of LAI of Subcontinental Admixtures with a Data Set Integrating SNP Array Data and Exome and Whole-Genome Sequence Data We inferred ancestry for simulated admixed individuals. Low- and high-confidence call sets were generated with posterior-probability thresholds of 50%, 90%, 99%, 99.9%, or 99.99%. We show the accuracy in the resulting call sets as a function of the proportion of the genome that did not meet the threshold. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 5 Integrating Exome and Whole-Genome Sequence Data with SNP Array Data We generated haploid ancestry call sets by using posterior-probability thresholds of 50%, 90%, 99%, 99.9%, or 99.99% and simulated individuals, as in Figure 4, for different ancestry pairs and data sets. The following abbreviations are used: WGS, 1000 Genomes integrated data set; OMNI, 1000 Genomes OMNI 2.5M data set; Affy, the data set composed of the subset of WGS sites present on the Affy 6.0 SNP array. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 6 Effect of Reference-Panel Size on Performance of Subcontinental Admixture Deconvolution with the 1000 Genomes Integrated Whole-Genome Call Set We simulated admixed individuals as we did for Figures 4 and 5 and inferred local ancestry by using different reference-panel sizes that correspond to roughly 1/2, 1/4, and 1/8 of the original panel. Error bars represent the SEM. Because the same admixed individuals are used in simulations with different panel sizes, errors for different sample sizes are correlated. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions

Figure 7 Native American Ancestry in African Americans For the HapMap African American genome, the proportion inferred to have Native American ancestry (blue) is compared to the proportion inferred in a simulated population with 0.5% of Native American ancestry (green). For comparison, we estimated false-positive (“FP”) rates on the basis of simulation with and without Native American ancestry. To ensure that the false-positive rates correspond to a realistic situation, we simulated individuals by using segments of CEU, YRI, and NAT ancestry and performed inference by using TSI, YRI, and NAT reference panels. The American Journal of Human Genetics 2013 93, 278-288DOI: (10.1016/j.ajhg.2013.06.020) Copyright © 2013 The American Society of Human Genetics Terms and Conditions