MStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations Suyash Shringarpure and Eric.

Presentation transcript:

mStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations
Suyash Shringarpure and Eric Xing
School of Computer Science, Carnegie Mellon University
ICML 2008
Presented by Haojun Chen

Outline
- Background
- Structure Model
- mStruct Model
- Experiment Results
- Summary

Background
- Allele: one member of a pair or series of different forms of a gene
- Population structure analysis aims to shed light on the evolutionary history of modern human populations
- Microsatellite and single nucleotide polymorphism (SNP) data: the basis of population structure analysis
- State-of-the-art method: Structure

Structure Model
- x: microsatellite alleles
- Each ancestral population has a unique set of population-specific multinomial distributions
- Each distribution is a vector of multinomial parameters, a.k.a. the allele frequency profile (AP), of the allele distribution at locus i in ancestral population k
- Total number of observed marker alleles at locus i
- Total number of marker loci
- Total number of individuals
- Individual-specific admixing coefficient vector
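To make the roles of these quantities concrete, here is a minimal sketch of how an admixture model combines them: the probability of observing an allele at a locus is a mixture of the ancestral allele frequency profiles, weighted by the individual's admixing coefficients. The names `theta_n` and `beta_i`, and the toy numbers, are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def allele_probabilities(theta_n, beta_i):
    """Marginal allele distribution at one locus for one individual.

    theta_n : shape (K,), admixing coefficients of individual n (sums to 1)
    beta_i  : shape (K, L_i); row k is the allele frequency profile of
              ancestral population k at locus i (each row sums to 1)
    """
    theta_n = np.asarray(theta_n, dtype=float)
    beta_i = np.asarray(beta_i, dtype=float)
    # Mixture over ancestral populations:
    # p(x = l) = sum_k theta_n[k] * beta_i[k, l]
    return theta_n @ beta_i

# Toy example (hypothetical values): K = 2 populations, L_i = 3 alleles.
theta = np.array([0.7, 0.3])
beta = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.2, 0.7]])
probs = allele_probabilities(theta, beta)
print(probs)
```

The third allele is absent from the first population's profile, so any probability it receives comes entirely from the second population's share of the individual's ancestry.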

Pitfall of Structure
- There is no mutation model for modern individual alleles with respect to common prototypes in the modern populations
- Every unique allele in the modern population is assumed to have a distinct ancestral frequency, rather than allowing the possibility that it is just a descendant of some common ancestral allele

mStruct Model
- Set of ancestral alleles
- Mutation parameter associated with a locus
- Frequencies of the ancestral alleles
- Total number of ancestral alleles
- Microsatellite mutation model
- SNP mutation model
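The formulas for the two mutation models are not preserved in the transcript. As a rough illustration of what such models can look like, the sketch below uses two assumed forms: a stepwise-style microsatellite kernel whose probability decays geometrically with the repeat-count distance from the ancestral allele, and an SNP kernel that keeps the ancestral allele with high probability. Both functional forms, the function names, and the parameter `delta` are assumptions for illustration and may differ from what the talk presented.

```python
import numpy as np

def microsatellite_kernel(ancestor, alleles, delta):
    """P(observed allele | ancestral allele) with geometric decay.

    ancestor : repeat count of the ancestral allele
    alleles  : 1-D array of possible observed repeat counts
    delta    : value in (0, 1); smaller values make mutations rarer
    """
    alleles = np.asarray(alleles)
    # Weight shrinks by a factor of delta per repeat unit of distance.
    weights = delta ** np.abs(alleles - ancestor)
    return weights / weights.sum()

def snp_kernel(ancestor, n_alleles, delta):
    """Keep the ancestral allele with probability 1 - delta; otherwise
    switch uniformly to one of the other alleles."""
    probs = np.full(n_alleles, delta / (n_alleles - 1))
    probs[ancestor] = 1.0 - delta
    return probs

# Hypothetical usage: ancestral allele with 10 repeats, observed range 8..12.
alleles = np.arange(8, 13)
p_micro = microsatellite_kernel(10, alleles, delta=0.2)
p_snp = snp_kernel(0, n_alleles=2, delta=0.05)
print(p_micro, p_snp)
```

Under either kernel, several distinct observed alleles can be explained by one ancestral allele, which is the behaviour the plain Structure model lacks.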

Generative Process
- Generative process for Structure [equations not preserved in transcript]
- Generative process for mStruct: step 2.2 above is replaced by [equation not preserved in transcript]
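Since the step-by-step equations are missing here, the following is a hedged sketch of a standard admixture generative process and of how a mutation step could replace the direct allele draw. It assumes step 2.1 picks an ancestral population and step 2.2 draws the allele; the function `simulate_individual`, its arguments, and the simplification that ancestral and observed alleles share one index space are all assumptions made for illustration.

```python
import numpy as np

def simulate_individual(rng, alpha, beta, kernel=None, ploidy=2):
    """Sample one individual's admixing vector and genotype.

    alpha  : shape (K,), Dirichlet hyperparameter (assumed prior)
    beta   : list over loci; beta[i] has shape (K, L_i), allele
             frequency profiles at locus i
    kernel : None for the Structure-style process. For the mStruct-style
             variant, a function kernel(i, k, c) returning a length-L_i
             probability vector over observed alleles given ancestral
             allele c of population k at locus i.
    """
    K = len(alpha)
    theta = rng.dirichlet(alpha)            # step 1: admixing coefficients
    genotype = []
    for i, beta_i in enumerate(beta):
        n_alleles = beta_i.shape[1]
        copies = []
        for _ in range(ploidy):
            z = rng.choice(K, p=theta)              # step 2.1: population
            x = rng.choice(n_alleles, p=beta_i[z])  # step 2.2: allele
            if kernel is not None:
                # mStruct-style: x is an ancestral allele; the observed
                # allele is drawn from a mutation distribution around it.
                x = rng.choice(n_alleles, p=kernel(i, z, x))
            copies.append(int(x))
        genotype.append(copies)
    return theta, np.array(genotype)

def smooth(i, k, c, n_alleles=4, delta=0.2):
    """Hypothetical mutation kernel: geometric decay in index distance."""
    w = delta ** np.abs(np.arange(n_alleles) - c)
    return w / w.sum()

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5, 0.5])                  # K = 3 populations
beta = [rng.dirichlet(np.ones(4), size=3) for _ in range(5)]  # 5 loci
theta, geno = simulate_individual(rng, alpha, beta)
theta_m, geno_m = simulate_individual(rng, alpha, beta, kernel=smooth)
```

Passing `kernel=None` gives the Structure-style draw; supplying a kernel changes only the allele-sampling step, which mirrors the slide's statement that a single step is replaced.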

mStruct Model Inference
- MCMC: slow
- Variational inference for the hidden variables
- Variational EM for the hyperparameters

Synthetic Data
- Twenty microsatellite genotype datasets with 100 individuals from 3 ancestral populations at 50 genotype loci

HGDP Microsatellite Data
- Model selection by BIC (Bayesian Information Criterion) score
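As a reminder of how BIC-based model selection works, the sketch below scores candidate numbers of ancestral populations K using the common "lower is better" form, BIC = -2 ln L + p ln n. The log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not results from the talk, and the talk may have used a different sign convention.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'lower is better' form: -2 ln L + p ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fits: (K, maximized log-likelihood, free-parameter count).
fits = [(2, -1250.0, 40), (3, -1180.0, 60), (4, -1170.0, 80)]
n_obs = 500  # hypothetical number of observations

scores = {k: bic(ll, p, n_obs) for k, ll, p in fits}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this toy example, moving from K = 3 to K = 4 improves the likelihood only slightly, so the parameter penalty dominates and the smaller model is preferred.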

HGDP Microsatellite Data
- am-spectrum: spectra of different ancestral populations
- gm-spectrum: spectra of different geographical populations
- 1056 individuals from 52 populations at 377 autosomal microsatellite loci

Contour of Mutation Rates

Summary
- mStruct takes into account both genetic admixture and allele mutation effects
- mStruct: an extended LDA that allows noisy observations
- A variational inference algorithm that allows tractable inference was developed for mStruct
- Other applications: images, text, and so on