The Coalescent Theory and Coalescent-Based Population Genetics Programs


Overview
- Set up IMa run
- The theory
- Influence
- Computer programs
- IMa tutorial

Set up IMa Run
- Download data file from Wiki
- Open terminal
- Type command: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b -L0.5 -p 45
- Can vary numbers for q’s, t, and m’s


COALESCENT THEORY
- Formalized in 1982 by Kingman in “The Coalescent”
- Based on main idea of:
  - Retrospective model of population genetics
  - Dependent on ancestral population size and time since divergence

COALESCENT THEORY
- Terms:
  - Coalescence: two lineages tracing back to a common ancestor at a particular time
  - Effective population size (N_e): size of the idealized Wright-Fisher population that would show the same amount of genetic drift; usually smaller than the census size
  - Theta, Θ: capacity of a population to maintain genetic variability (Θ = 4N_eμ)
  - Incomplete lineage sorting: failure of lineages to coalesce within their most recent shared ancestral population

Wright-Fisher Model
- Describes genetic drift in a finite population
- Assumptions
  - N diploid organisms
  - Monoecious reproduction with infinite number of gametes
  - Non-overlapping generations
  - Random mating
  - No mutation
  - No selection
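The assumptions above can be sketched as a short drift simulation. This is a minimal illustration, not code from any of the programs discussed later: the function name `wright_fisher`, the population size, and the starting frequency are all hypothetical. Each generation, every one of the 2N gene copies picks its parent copy at random, with no mutation and no selection.

```python
import random

def wright_fisher(n_diploid, p0, generations, rng):
    """Track one allele's frequency under pure drift (hypothetical helper)."""
    copies = 2 * n_diploid              # N diploid organisms carry 2N gene copies
    count = round(p0 * copies)          # starting number of copies of the allele
    freqs = [count / copies]
    for _ in range(generations):
        p = count / copies
        # Non-overlapping generations: every copy in the new generation
        # draws its parent at random from the previous generation.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        freqs.append(count / copies)
    return freqs

rng = random.Random(1)                  # fixed seed so the run is repeatable
trajectory = wright_fisher(n_diploid=50, p0=0.5, generations=200, rng=rng)
print(trajectory[-1])                   # final allele frequency for this run
```

Rerunning with a smaller `n_diploid` should show the frequency reaching 0 or 1 sooner, which is the drift the model describes.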


Incomplete Lineage Sorting Degnan & Salter (2005)

COALESCENT THEORY
- Gives the mathematical expectation of the distribution of time back to coalescence
- Seeks to predict the amount of time elapsed between the introduction of a mutation and the arising of a particular allele/gene distribution in the population

[Figure sequence; axis labels: Present, Past]

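Mathematical Representation
- $\Theta = 4 N_e \mu$
- $P(\text{coalescent event}) = \dfrac{1}{2N_e}$ (per generation, for one pair of lineages)
- $P_c(t) = \left(1 - \dfrac{1}{2N_e}\right)^{t-1} \dfrac{1}{2N_e}$
- $E(t_k) = \dfrac{2}{k(k-1)}$ (in units of $2N_e$ generations, while $k$ lineages remain)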
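The formulas on this slide can be checked numerically. The sketch below is illustrative only: the function names and the value N_e = 50 are assumptions, not part of the original slides. It evaluates P_c(t), sums E(t_k) to get the expected time to the most recent common ancestor of a sample, and compares the mean of simulated pairwise coalescence times with the expected 2N_e generations.

```python
import random

def prob_first_coalescence(t, n_e):
    """P_c(t): a pair of lineages first coalesces exactly t generations back."""
    p = 1.0 / (2 * n_e)
    return (1 - p) ** (t - 1) * p

def expected_interval(k):
    """E(t_k), in units of 2*N_e generations, while k lineages remain."""
    return 2.0 / (k * (k - 1))

def expected_tmrca(n):
    """Sum of E(t_k) for k = 2..n; algebraically this equals 2 * (1 - 1/n)."""
    return sum(expected_interval(k) for k in range(2, n + 1))

def simulate_pair_time(n_e, rng):
    """Step back one generation at a time until the pair coalesces."""
    t = 1
    while rng.random() >= 1.0 / (2 * n_e):
        t += 1
    return t

rng = random.Random(42)
n_e = 50                                   # hypothetical effective size
draws = [simulate_pair_time(n_e, rng) for _ in range(20000)]
mean_time = sum(draws) / len(draws)        # expected to be close to 2 * n_e
print(mean_time, expected_tmrca(10))
```

Because E(t_k) shrinks quickly as k grows, most of the expected tree height comes from the final interval with two lineages.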


Influence
- Population Genetics
- Phylogenetics
- Statistical Phylogeography

Population Genetics
- Theory describes the genealogical relationships among individuals in a Wright-Fisher population

Phylogenetics
- Gene tree-Species tree
- Predicts certain distribution of gene tree frequencies
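As one concrete case of such a predicted distribution, consider three species with species tree ((A,B),C) and one sampled lineage each. If the internal branch lasts T coalescent units (generations divided by 2N_e), the standard result is that the matching gene tree has probability 1 - (2/3)e^{-T} and each of the two discordant trees has probability (1/3)e^{-T}. The sketch below assumes that setting; the function name and the example numbers are hypothetical.

```python
import math

def three_taxon_gene_tree_probs(t_generations, n_e):
    """Gene tree topology probabilities for species tree ((A,B),C).

    Assumes one sampled lineage per species, a diploid population of
    constant effective size n_e on the internal branch, and no gene flow.
    """
    branch = t_generations / (2.0 * n_e)      # internal branch in coalescent units
    discordant = math.exp(-branch) / 3.0      # each mismatching topology
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # topology matching the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }

# Hypothetical values: 10,000 generations of shared ancestry, N_e = 10,000
probs = three_taxon_gene_tree_probs(10_000, 10_000)
print(probs)
```

Short internal branches or large populations push all three probabilities toward 1/3, which is where incomplete lineage sorting becomes common.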

Statistical Phylogeography
- Individual gene trees contain information about past demographic events when rate of coalescence different between


Computer Programs
- Kuhner, 2008
  - BEAST
  - GENETREE
  - LAMARC
  - MIGRATE-N
  - IM/IMa
  - IMa2

Computer Programs
- Coalescent Simulators
  - Approximate Bayesian Computation
    - DIY-ABC
    - PopABC
  - Simulation (Using “Pipeline” Approach)
    - GENOME
    - ms
    - COAL
    - CoaSim


Introduction
- MCMC simulation of gene genealogies
- IM simulates model parameters

Hey, J. (2006)

Introduction cont’d
- Assumptions
  - No other populations more closely related
  - Selective neutrality
  - No recombination within loci
  - Free recombination between loci
  - Mutation model chosen is correct
    - Infinite sites
    - Hasegawa-Kishino-Yano
    - Stepwise
    - Compound locus

Input File
    Example data for IM
    # im test data
    population1 population2
    3
    locus I ( , )
    pop1_1 ACTACTGTCATGA
    pop2_1 AGTACTATCACGA
    hapstrexample J
    pop1_ GTAC
    pop1_ GTAT
    pop2_ GTAT
    strexample S ( , )
    strpop11a 23
    strpop11b 26
    strpop21a 25
    strpop21b 31

Command Line (terminal)
- Command line: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b -L10000 -p 45
- More complex run line: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 10 -qA 300 -m 12 -m 23 -t 80 -n 20 -b -L 0.5 -fl -g -p 45
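Command lines copied out of slide software often carry typographic dashes (en dash, em dash, minus sign), which a shell does not treat as option prefixes. The sketch below is a hypothetical helper, not part of IMa: it replaces those characters with plain hyphens and splits the result into an argument list. The pasted string is a shortened version of the run line above, using only flags that appear on the slide.

```python
import shlex

# Typographic characters that word processors substitute for "-"
DASH_LOOKALIKES = {"\u2013": "-", "\u2014": "-", "\u2212": "-"}

def clean_command(text):
    """Return an argument list with dash look-alikes replaced by '-'."""
    for bad, good in DASH_LOOKALIKES.items():
        text = text.replace(bad, good)
    return shlex.split(text)

# Shortened run line as it might be pasted from a slide (note the en dashes)
pasted = ("ima -i IMaEliurus -o IMaEliurus.out -q1 10 \u2013q2 20 "
          "-m1 1 -m2 1 \u2013n 10 -t 80")
args = clean_command(pasted)
print(args)
```

The resulting list could be handed to `subprocess.run(args)` on a machine where the `ima` executable is on the path; that step is left out here because it depends on the local installation.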

Important Note!
- An “IMrun” file containing only the word “yes” is needed for the run to continue indefinitely (or until it crashes or DSCR kicks the job)

Output File (.out)
- MCMC information
  - Summary
  - Acceptance rates
  - Autocorrelation
  - ESS
  - Chain swapping

Output File (.out)
- Marginal Peak
- Marginal distributions
  - Minbin
  - Maxbin
  - HiPt
  - HiSmth
  - Mean
  - 95lo/hi
  - HPD90lo/hi
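To make the summary columns above more concrete, the sketch below computes analogous quantities from a list of posterior draws: a histogram peak (comparable in spirit to HiPt), the mean, a central 95% interval, and the shortest interval holding 90% of the draws (a highest posterior density interval). This is an assumption-laden illustration and not IMa's own calculation; the function name, the bin count, and the exponential test data are all hypothetical.

```python
import random

def summarize_posterior(samples, n_bins=20, mass=0.90):
    """Summaries of posterior draws (illustrative, not IMa's algorithm)."""
    xs = sorted(samples)
    n = len(xs)
    mean = sum(xs) / n
    # Central 95% interval from the sorted draws
    lo95 = xs[int(0.025 * n)]
    hi95 = xs[min(n - 1, int(0.975 * n))]
    # HPD interval: the narrowest window that contains `mass` of the draws
    window = max(1, round(mass * n))
    start = min(range(n - window + 1),
                key=lambda i: xs[i + window - 1] - xs[i])
    hpd = (xs[start], xs[start + window - 1])
    # Histogram peak: midpoint of the fullest bin
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for x in xs:
        counts[min(n_bins - 1, int((x - lo) / width))] += 1
    peak = lo + (counts.index(max(counts)) + 0.5) * width
    return {"mean": mean, "lo95": lo95, "hi95": hi95,
            "hpd90": hpd, "peak": peak}

rng = random.Random(7)
draws = [rng.expovariate(1.0) for _ in range(5000)]   # skewed stand-in posterior
summary = summarize_posterior(draws)
print(summary)
```

For a skewed distribution like this one, the HPD interval sits closer to the peak than the central interval does, which is why the two are reported separately.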

Output File (.out)
- ASCII
  - Curves
  - Plots

Output File (.out.ti)
- Contains nothing meant to be read directly
- Can be used on subsequent runs when in “L mode”

How can I get a “good” run?
- Conduct a preliminary run
- Duration?
  - Ideally, stop once the run has reached stationarity and convergence
  - Assess autocorrelation
- Use Metropolis-coupled MCMC
- Run many, many times (well, at least 3)
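Assessing autocorrelation can be sketched outside IMa as well. The code below is a generic estimate, not the one IMa reports: it computes lag autocorrelations of a parameter trace and turns them into an effective sample size (ESS) by summing positive autocorrelations until the first non-positive one. The function names and both test chains are hypothetical.

```python
import random

def autocorr(xs, lag):
    """Sample autocorrelation of a trace at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

def effective_sample_size(xs, max_lag=None):
    """ESS = n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(xs)
    max_lag = max_lag or n // 2
    total = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorr(xs, lag)
        if rho <= 0:            # stop at the first non-positive estimate
            break
        total += rho
    return n / (1 + 2 * total)

rng = random.Random(3)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):           # strongly autocorrelated chain (AR(1), 0.9)
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(ess_independent, ess_sticky)
```

A chain with a low ESS relative to its length has not explored the posterior well, so a longer run or more heated chains may be needed before trusting the marginal summaries.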

Robustness of Coalescent
- Violations of assumptions:
  - Intralocus recombination
  - Population structure
  - Gene flow from unsampled populations
  - Linkage among loci
  - Divergent selection
  - Different model of substitution

Questions?