Coalescent Theory and Coalescent-Based Population Genetics Programs
Overview: Set up IMa run; The theory; Influence; Computer programs; IMa tutorial
Set Up IMa Run
Download the data file from the Wiki
Open a terminal
Type the command:
    ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b -L0.5 -p 45
The numbers for the q's, t, and m's can be varied
COALESCENT THEORY: Formalized in 1982 by Kingman in “The Coalescent”. Main idea: a retrospective model of population genetics, dependent on ancestral population size and time since divergence.
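COALESCENT THEORY Terms: Coalescence: two lineages tracing back to a common ancestor at a particular time. Effective population size ($N_e$): size of the idealized Wright-Fisher population that would experience the same amount of genetic drift; usually smaller than the census size. Theta ($\Theta$): capacity of a population to maintain genetic variability ($\Theta = 4N_e\mu$, where $\mu$ is the mutation rate). Incomplete lineage sorting: failure of lineages from the same population to coalesce before (looking backward in time) the next-older divergence event.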
Wright-Fisher Model: describes genetic drift in a finite population. Assumptions: N diploid organisms; monoecious reproduction with an infinite number of gametes; non-overlapping generations; random mating; no mutation; no selection.
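The assumptions listed for the Wright-Fisher model can be illustrated with a minimal simulation sketch. The function name `wright_fisher_drift` and its parameters are hypothetical, chosen only for illustration. Each generation, all 2N gene copies are redrawn at random from the previous generation's allele frequency, with no mutation and no selection.

```python
import random

def wright_fisher_drift(n_diploid, p0, generations, seed=None):
    """Allele-frequency trajectory under pure drift (illustrative sketch only)."""
    rng = random.Random(seed)
    copies = 2 * n_diploid  # N diploid organisms carry 2N gene copies
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Non-overlapping generations and random mating: every gene copy in
        # the next generation is drawn independently with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

# Hypothetical example values: N = 50, starting frequency 0.5, 200 generations.
traj = wright_fisher_drift(n_diploid=50, p0=0.5, generations=200, seed=1)
print(traj[-1])
```

With small N the frequency tends to wander to 0 or 1 quickly; larger N slows the drift, which is the behavior the effective population size is meant to capture.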
Incomplete Lineage Sorting (Degnan & Salter, 2005)
COALESCENT THEORY: Gives the mathematical expectation of the distribution of times back to coalescence. Seeks to predict the amount of time elapsed between the introduction of a mutation and the appearance of a particular allele/gene distribution in the population.
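The retrospective view can be sketched by tracing sampled gene copies backward through a Wright-Fisher population until they share one ancestor. This is an illustrative sketch, not taken from any of the programs discussed here; the function name `generations_to_mrca` and the example values are assumptions.

```python
import random

def generations_to_mrca(sample_size, n_diploid, rng):
    """Trace sampled gene copies backward until one ancestral lineage remains."""
    copies = 2 * n_diploid
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each lineage picks its parent copy uniformly at random; lineages
        # that pick the same parent coalesce into one.
        parents = {rng.randrange(copies) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations

rng = random.Random(42)
# Hypothetical example: pairs of gene copies in a population of N = 100.
times = [generations_to_mrca(2, 100, rng) for _ in range(2000)]
mean_time = sum(times) / len(times)
print(mean_time)  # expected to be close to 2N = 200 generations for a pair
```

Repeating the simulation shows how widely individual coalescence times scatter around their expectation, which is why single-locus estimates carry so much uncertainty.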
[Figure sequence: time axis running from Present to Past]
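Mathematical Representation: $\Theta = 4N_e\mu$. Probability that two lineages coalesce in a given generation: $P(\text{coalescent event}) = \frac{1}{2N_e}$. Probability that two lineages coalesce exactly $t$ generations back: $P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \frac{1}{2N_e}$. Expected waiting time while $k$ lineages remain, in units of $2N_e$ generations: $E(t_k) = \frac{2}{k(k-1)}$.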
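The formulas on this slide can be checked numerically with a short sketch. The function names are hypothetical, and the sketch assumes the standard convention that E(t_k) is measured in units of 2N_e generations. Summing E(t_k) from k = n down to 2 gives the expected time to the most recent common ancestor of n lineages, 2(1 - 1/n) in the same units.

```python
from fractions import Fraction

def theta(n_e, mu):
    """Theta = 4 * N_e * mu."""
    return 4 * n_e * mu

def prob_coalescence_at(t, n_e):
    """P_c(t): two lineages first coalesce exactly t generations back."""
    p = 1 / (2 * n_e)
    return (1 - p) ** (t - 1) * p

def expected_interval(k):
    """E(t_k) in units of 2*N_e generations while k lineages remain."""
    return Fraction(2, k * (k - 1))

def expected_tmrca(n):
    """Sum of E(t_k) for k = 2..n, which equals 2 * (1 - 1/n)."""
    return sum(expected_interval(k) for k in range(2, n + 1))

# Hypothetical example values: N_e = 10000, mu = 1e-8, a sample of 10 lineages.
print(theta(10_000, 1e-8))
print(expected_interval(2), expected_tmrca(10))
```

Note that the last interval (k = 2) alone accounts for more than half of the expected tree height, so the deepest part of a genealogy is also the most variable.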
Influence: Population Genetics; Phylogenetics; Statistical Phylogeography
Population Genetics: the theory describes the genealogical relationships among individuals in a Wright-Fisher population.
Phylogenetics: gene tree versus species tree; the theory predicts a certain distribution of gene tree frequencies.
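As one concrete instance of a predicted gene tree distribution, the sketch below uses the standard three-species result from coalescent theory, which is not stated on the slide: the gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T), where T is the internal branch length in units of 2N_e generations. The function name and the example values are hypothetical.

```python
import math

def three_taxon_gene_tree_probs(t_generations, n_e):
    """Return (concordant, discordant_1, discordant_2) topology probabilities."""
    branch = t_generations / (2 * n_e)   # internal branch in coalescent units
    discordant_each = math.exp(-branch) / 3
    concordant = 1 - 2 * discordant_each
    return concordant, discordant_each, discordant_each

# Hypothetical example: internal branch of 20000 generations, N_e = 10000.
print(three_taxon_gene_tree_probs(20_000, 10_000))
```

Short internal branches or large ancestral populations push all three topologies toward 1/3 each, which is the incomplete lineage sorting regime.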
Statistical Phylogeography: individual gene trees contain information about past demographic events when the rate of coalescence is different between
Computer Programs (Kuhner, 2008): BEAST; GENETREE; LAMARC; MIGRATE-N; IM/IMa; IMa2
Computer Programs: Coalescent Simulators. Approximate Bayesian Computation: DIY-ABC; PopABC. Simulation (using “pipeline” approach): GENOME; ms; COAL; CoaSim.
Introduction: MCMC simulation of gene genealogies; IM simulates model parameters (Hey, J., 2006)
Introduction (cont’d): Assumptions: no other populations more closely related to the sampled populations than they are to each other; selective neutrality; no recombination within loci; free recombination between loci; the mutation model chosen is correct (infinite sites, Hasegawa-Kishino-Yano, stepwise, or compound locus).
Input File
    Example data for IM
    # im test data
    population1 population2
    3
    locus I ( , )
    pop1_1 ACTACTGTCATGA
    pop2_1 AGTACTATCACGA
    hapstrexample J
    pop1_ GTAC
    pop1_ GTAT
    pop2_ GTAT
    strexample S ( , )
    strpop11a 23
    strpop11b 26
    strpop21a 25
    strpop21b 31
Command Line (terminal)
Command line:
    ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b -L10000 -p 45
Command Line (terminal)
Command line:
    ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b -L -p 45
More complex run line:
    ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 10 -qA 300 -m1 2 -m2 3 -t 80 -n 20 -b -L 0.5 -fl -g -p 45
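Commands copied from slides often pick up typographic dashes, which a command-line program is unlikely to recognize as option prefixes. The sketch below is one way to clean such a string before running it; the helper name `normalize_command` is hypothetical, and the flags are exactly those shown on the slide, with no claims about their meaning.

```python
import shlex

# Typographic dashes that word processors substitute for the ASCII hyphen.
DASHES = {"\u2013": "-", "\u2014": "-", "\u2212": "-"}

def normalize_command(command):
    """Replace typographic dashes with '-' and split into an argument list."""
    for bad, good in DASHES.items():
        command = command.replace(bad, good)
    return shlex.split(command)

raw = ("ima -i IMaEliurus -o IMaEliurus.out -q1 10 \u2013q2 20 -m1 1 -m2 1 "
       "\u2013n 10 -t 80 -b \u2013L10000 \u2013p 45")
args = normalize_command(raw)
print(args)
# To launch the run (assuming `ima` is on the PATH):
#   import subprocess; subprocess.run(args, check=True)
```

Passing an argument list to `subprocess.run` rather than a shell string avoids any further quoting surprises.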
Important Note! An “IMrun” file containing only the word “yes” is needed for the run to continue indefinitely (or until it crashes or DSCR kicks the job).
Output File (.out): MCMC information: summary; acceptance rates; autocorrelation; ESS (effective sample size); chain swapping
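To make the autocorrelation and ESS entries concrete, here is a minimal sketch of how both can be estimated from a list of sampled parameter values. It is a common textbook estimator, not necessarily the one IMa uses internally; the function names and the example chains are assumptions.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0:
        return 0.0
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorrelation(chain, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

rng = random.Random(0)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    # Each value depends strongly on the previous one (poorly mixing chain).
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))
print(effective_sample_size(independent), effective_sample_size(sticky))
```

Both chains hold 2000 values, but the strongly autocorrelated one carries far fewer effectively independent samples.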
Output File (.out): Marginal peak; marginal distributions: Minbin, Maxbin, HiPt, HiSmth, Mean, 95lo/hi, HPD90lo/hi
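The Mean and HPD90lo/hi columns can be understood through a small sketch that computes a mean and a 90% highest posterior density interval from raw samples. This is an illustration only: IMa reports these values from its own marginal curves, and the function names and toy data here are assumptions. The interval is found as the shortest window covering 90% of the sorted samples, which presumes a single-peaked distribution.

```python
import math

def posterior_mean(samples):
    """Arithmetic mean of the sampled values."""
    return sum(samples) / len(samples)

def hpd_interval(samples, mass=0.90):
    """Shortest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of samples the interval must cover (small tolerance guards
    # against floating-point error in mass * n).
    k = max(1, math.ceil(mass * n - 1e-9))
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Toy data with one extreme value in the upper tail.
samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
print(posterior_mean(samples), hpd_interval(samples))
```

In the toy data the single extreme value drags the mean far from the bulk of the samples while the HPD interval excludes it, which is one reason to read the peak and HPD columns alongside the mean.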
Output File (.out): ASCII curves; plots
Output File (.out.ti): No outward information; can be used in subsequent runs when in “L mode”
How can I get a “good” run? Conduct a preliminary run. Duration? Ideally, until the run has reached stationarity and convergence. Assess autocorrelation. Use Metropolis-coupled MCMC. Run many, many times (well, at least 3).
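The chain-swapping step of Metropolis-coupled MCMC can be sketched as follows. This is the generic textbook acceptance rule for swapping the states of two chains heated to different degrees, not code from IMa; the function name, the heating values, and the log-probabilities are hypothetical.

```python
import math

def swap_accept_prob(beta_i, beta_j, logp_i, logp_j):
    """Probability of accepting a state swap between chains i and j.

    beta_* are the heating exponents (1.0 for the cold chain, smaller for
    hotter chains); logp_* are the unheated log posterior values of the
    current states of each chain.
    """
    log_ratio = (beta_i - beta_j) * (logp_j - logp_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

# Hot chain (beta 0.5) has found a better state than the cold chain: always swap.
print(swap_accept_prob(1.0, 0.5, -12.0, -10.0))
# Cold chain already holds the better state: swap only sometimes.
print(swap_accept_prob(1.0, 0.5, -10.0, -12.0))
```

Hot chains cross valleys in the likelihood surface more easily, and swaps let the cold chain inherit those discoveries. Swap rates near zero suggest the heating values are spaced too far apart.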
Robustness of the Coalescent to violations of its assumptions: intralocus recombination; population structure; gene flow from unsampled populations; linkage among loci; divergent selection; a different model of substitution
Questions?