Maximum-likelihood estimation of admixture proportions from genetic data
Jinliang Wang

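[Figure: the admixture model. Ancestral population $P_0$ splits into parental populations $P_1$ and $P_2$ (sizes $n_1$, $n_2$), which drift for $\xi$ generations. They then contribute in proportions $p_1$ and $p_2$ to the hybrid population $P_h$. After admixture, $P_1$, $P_h$ and $P_2$ (sizes $N_1$, $N_h$, $N_2$) drift for $\psi$ generations and are sampled as $S_1$, $S_h$, $S_2$.]
Scaled drift times: $t_1 = \xi/2n_1$, $t_2 = \xi/2n_2$, $T_1 = \psi/2N_1$, $T_h = \psi/2N_h$, $T_2 = \psi/2N_2$
Parameters: $\Omega = \{p_1, t_1, t_2, T_1, T_h, T_2\}$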

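[Figure: the admixture model with allele frequencies added. Populations $P_0$, $P_1$, $P_2$, $P_h$; sizes $n_1$, $n_2$ before admixture and $N_1$, $N_h$, $N_2$ after; admixture proportions $p_1$, $p_2$; samples $S_1$, $S_h$, $S_2$; drift durations $\xi$ and $\psi$.]
Scaled drift times: $t_1 = \xi/2n_1$, $t_2 = \xi/2n_2$, $T_1 = \psi/2N_1$, $T_h = \psi/2N_h$, $T_2 = \psi/2N_2$
Parameters: $\Omega = \{p_1, t_1, t_2, T_1, T_h, T_2\}$
Allele frequencies: $w$ in $P_0$; $x_1$, $x_h$, $x_2$ at the admixture event; $y_1$, $y_h$, $y_2$ at sampling
Sample allele counts: $c_1$, $c_h$, $c_2$, with $C = (c_1, c_2, c_3)$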

Likelihood function

Terms of the likelihood:
- Random sampling
- Admixture and genetic drift
- Genetic drift
- Prior on $w$
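The equation itself did not survive in the transcript. One plausible reading of the four labelled terms, using the symbols from the model figure, is sketched below. The exact conditioning and the order of integration are assumptions, not a reproduction of the original slide.

```latex
L(\Omega \mid C) =
  \int\!\!\int\!\!\int
  \underbrace{\Pr(C \mid y)}_{\text{random sampling}}\;
  \underbrace{\Pr(y \mid x, p_1, T_1, T_h, T_2)}_{\text{admixture and genetic drift}}\;
  \underbrace{\Pr(x \mid w, t_1, t_2)}_{\text{genetic drift}}\;
  \underbrace{\Pr(w)}_{\text{prior on } w}
  \; dy \, dx \, dw
```

Here $y = (y_1, y_h, y_2)$ and $x = (x_1, x_2)$, with $x_h$ determined by $x_1$, $x_2$ and $p_1$.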

Allele frequencies in $P_0$: $w$

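Genetic drift after population split
[Figure: $P_0$ (allele frequencies $w$) splits into $P_1$ and $P_2$ of sizes $n_1$ and $n_2$; after $\xi$ generations their allele frequencies are $x_1$ and $x_2$.]
$t_1 = \xi/2n_1$, $t_2 = \xi/2n_2$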

Genetic drift in independent populations

Genetic drift: the diffusion approximation
$t_i = \xi/2n_i$
Crow and Kimura (1970), p. 382
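The role of the scaled time $t_i = \xi/2n_i$ can be illustrated with a small Wright-Fisher simulation. This is a minimal sketch and not the method from the slides: the function names, the seed and the parameter values are all illustrative assumptions. It compares the simulated variance in allele frequency with the standard results $w(1-w)[1-(1-1/2n)^{\xi}]$ (exact) and $w(1-w)(1-e^{-t})$ (diffusion time scale).

```python
import math
import random


def drift_variance_exact(w, n, generations):
    """Variance of allele frequency after Wright-Fisher drift starting at w."""
    return w * (1.0 - w) * (1.0 - (1.0 - 1.0 / (2 * n)) ** generations)


def drift_variance_diffusion(w, t):
    """Same variance expressed on the scaled time t = generations / (2n)."""
    return w * (1.0 - w) * (1.0 - math.exp(-t))


def simulate_drift(w, n, generations, rng):
    """One trajectory: binomial resampling of 2n gene copies per generation."""
    x = w
    for _ in range(generations):
        copies = sum(1 for _ in range(2 * n) if rng.random() < x)
        x = copies / (2 * n)
    return x


def simulated_variance(w, n, generations, replicates, seed=1):
    """Sample variance of the final frequency over independent replicates."""
    rng = random.Random(seed)
    finals = [simulate_drift(w, n, generations, rng) for _ in range(replicates)]
    mean = sum(finals) / replicates
    return sum((x - mean) ** 2 for x in finals) / (replicates - 1)


if __name__ == "__main__":
    w, n, xi = 0.5, 50, 20          # illustrative values only
    t = xi / (2 * n)                # scaled drift time, as on the slide
    print(drift_variance_exact(w, n, xi))
    print(drift_variance_diffusion(w, t))
    print(simulated_variance(w, n, xi, replicates=1000))
```

Only the ratio $\xi/2n$ enters the diffusion result, which is consistent with the parameter set $\Omega$ holding scaled times rather than separate generation counts and population sizes.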

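The admixture event
[Figure: $P_1$ and $P_2$ (allele frequencies $x_1$, $x_2$) contribute in proportions $p_1$ and $p_2$ to the hybrid population $P_h$, whose allele frequencies are $x_h$.]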
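The slide gives only the figure labels. Assuming the usual instantaneous-mixing model, in which the two proportions sum to one, the hybrid allele frequencies at the moment of admixture would be:

```latex
x_h = p_1 x_1 + p_2 x_2, \qquad p_2 = 1 - p_1
```

Under this assumption $x_h$ carries no extra randomness, which is why $p_1$ alone appears in $\Omega$.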

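Genetic drift since admixture event
[Figure: after admixture, $P_1$, $P_h$ and $P_2$ (sizes $N_1$, $N_h$, $N_2$) drift for $\psi$ generations; allele frequencies change from $x_1$, $x_h$, $x_2$ to $y_1$, $y_h$, $y_2$.]
$T_1 = \psi/2N_1$, $T_h = \psi/2N_h$, $T_2 = \psi/2N_2$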

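Random sampling
[Figure: samples $S_1$, $S_h$, $S_2$ are drawn from $P_1$, $P_h$, $P_2$; population allele frequencies $y_1$, $y_h$, $y_2$ give sample allele counts $c_1$, $c_h$, $c_2$.]
$C = (c_1, c_2, c_3)$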
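The sampling step is naturally modelled as a multinomial draw of allele counts given the current population frequencies. The sketch below assumes that model; the function name and the example numbers are illustrative and do not come from the slides.

```python
import math


def multinomial_logpmf(counts, freqs):
    """Log-probability of observing allele counts given allele frequencies.

    counts: observed number of copies of each allele in one sample
    freqs:  population frequency of each allele (should sum to 1)
    """
    total = sum(counts)
    # log of the multinomial coefficient total! / (c_1! c_2! ...)
    log_coef = math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts)
    # alleles with zero count contribute nothing to the product
    log_prob = sum(c * math.log(f) for c, f in zip(counts, freqs) if c > 0)
    return log_coef + log_prob


if __name__ == "__main__":
    # hypothetical sample of 10 gene copies at a locus with three alleles
    print(multinomial_logpmf([6, 3, 1], [0.5, 0.3, 0.2]))
```

One such term would be evaluated per sampled population, with $y_1$, $y_h$ or $y_2$ as the frequencies.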

Likelihood function
Terms of the likelihood:
- Random sampling
- Admixture and genetic drift
- Genetic drift
- Prior on $w$

African-American Admixture Proportions

Profile log-likelihoods for New York
[Figure: profile log-likelihood curves for the proportion of European ancestry, drift before the admixture event, and drift since the admixture event.]
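A profile log-likelihood fixes the parameter of interest and maximizes over the remaining parameters at each fixed value. The toy sketch below shows only that mechanic and is not the model from the slides: it ignores drift, uses a single biallelic locus, treats the allele frequency `a` in one parental population as known, and profiles out the unknown frequency `b` in the other. All names and numbers are hypothetical.

```python
import math


def binom_loglik(k, n, q):
    """Binomial log-likelihood for k copies out of n, dropping the constant term."""
    return k * math.log(q) + (n - k) * math.log(1.0 - q)


def loglik(p1, b, a, hybrid, parental2):
    """Joint log-likelihood of the hybrid sample and the parental-2 sample.

    The hybrid frequency is the mixture p1 * a + (1 - p1) * b (no drift).
    hybrid and parental2 are (copies of the allele, total gene copies).
    """
    y_h = p1 * a + (1.0 - p1) * b
    return (binom_loglik(hybrid[0], hybrid[1], y_h)
            + binom_loglik(parental2[0], parental2[1], b))


def profile_loglik(p1_grid, b_grid, a, hybrid, parental2):
    """For each fixed p1, maximize the log-likelihood over the nuisance b."""
    return [max(loglik(p1, b, a, hybrid, parental2) for b in b_grid)
            for p1 in p1_grid]


if __name__ == "__main__":
    p1_grid = [i / 100 for i in range(101)]
    b_grid = [j / 100 for j in range(1, 100)]
    # hypothetical data: 34/100 copies in the hybrid sample, 20/100 in parental 2
    profile = profile_loglik(p1_grid, b_grid, a=0.9,
                             hybrid=(34, 100), parental2=(20, 100))
    best = max(range(len(p1_grid)), key=lambda i: profile[i])
    print(p1_grid[best], profile[best])
```

Plotting `profile` against `p1_grid` gives a curve of the kind the slide shows. In the full model the maximization would run over all other members of $\Omega$ rather than over a single frequency.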

Application to canid populations: Grey wolf and coyote in North America

[Figure: two admixture scenarios. In each, a common ancestor gives rise to grey wolf and coyote. One scenario yields a wolf-like hybrid, the other a coyote-like hybrid.]

Discussion
- Suitable data
- Assumptions of the method given the model
- Comparing the model to other scenarios
- Aspects of the data used for inference

Discussion: Suitable data
Human data
- Genotypes at 10 nuclear loci.
- Chosen because they are either African- or European-specific, or highly differentiated between the two.
Canid data
- 10 microsatellite loci.
- Neither species-specific nor highly differentiated between wolves and coyotes.

Discussion: Assumptions of the method given the model
- Alleles are inherited independently across loci in the admixture event
- Drift acts independently on alleles across loci
- Alleles in a sampled individual are independent across loci
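The practical consequence of these independence assumptions is presumably that the multilocus likelihood factorizes over loci. A hedged sketch, where the locus index $l$ and the count $K$ of loci are notation introduced here:

```latex
L(\Omega \mid C) = \prod_{l=1}^{K} L_l(\Omega \mid C_l)
\quad\Longrightarrow\quad
\log L(\Omega \mid C) = \sum_{l=1}^{K} \log L_l(\Omega \mid C_l)
```

The parameters in $\Omega$ are shared across loci, while the allele frequencies $w$, $x$ and $y$ are integrated out separately for each locus.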

Discussion: Assumptions of the method given the model
- The prior distribution on $w$ is flat, not U-shaped
- Admixture occurs instantaneously
- The effect of mutation in perturbing allele frequencies is negligible

Discussion: Comparing the model to other scenarios
- Modern 'pure' populations need to be sampled
- Thus the 'structure' of the population is assumed to be known
- If modern 'pure' populations cannot be sampled, the method cannot make inferences on the admixture proportions

Discussion: Aspects of the data used for inference
- Inference proceeds solely on the basis of allele frequencies
- Linkage disequilibrium (LD) is
  - firstly, not used for inference
  - secondly, assumed to be negligible
- LD might be exploited to
  - enhance inference when modern 'pure' populations are sampled
  - relax the necessity to sample modern 'pure' populations at all