The Coalescent Theory and Coalescent-Based Population Genetics Programs

1 The Coalescent Theory and Coalescent-Based Population Genetics Programs

2 Overview  Set up IMa run  The theory  Influence  Computer programs  IMa tutorial

3 Set up IMa Run  Download data file from Wiki  Open terminal  Type command: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L0.5 -p 45  The values given for the q’s, t, and m’s can be varied

5 COALESCENT THEORY  Formalized in 1982 by Kingman in “The Coalescent”  Main idea: a retrospective model of population genetics that traces sampled gene copies backward in time to their common ancestors  Coalescence times depend on ancestral population size and time since divergence

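11 COALESCENT THEORY  Terms:  Coalescence: two lineages, traced backward in time, merging in a common ancestor at a particular time  Effective population size ($N_e$): size of an idealized Wright-Fisher population that would experience the same amount of genetic drift; usually smaller than the census size  Theta ($\Theta = 4N_e\mu$, with $\mu$ the mutation rate per generation): capacity of a population to maintain genetic variability  Incomplete lineage sorting: failure of two lineages to coalesce in their most recent shared ancestral population, so that the gene tree can disagree with the population tree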

13 Wright-Fisher Model  Describes genetic drift in a finite population  Assumptions  N diploid organisms (2N gene copies)  Monoecious reproduction with an infinite number of gametes  Non-overlapping generations  Random mating  No mutation  No selection
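The assumptions listed for the Wright-Fisher model translate almost directly into a simulation. The following is a minimal sketch, not taken from the slides: the function name `wright_fisher` and its parameters are illustrative choices, and a single biallelic locus is assumed.

```python
import random


def wright_fisher(n_diploid, p0, generations, seed=None):
    """Simulate allele-frequency drift at one biallelic locus.

    n_diploid   : number of diploid individuals N (so 2N gene copies)
    p0          : starting frequency of allele A
    generations : number of non-overlapping generations to simulate
    Returns a list of frequencies, starting value included.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Every gene copy in the next generation draws its parent copy at
        # random from the current generation (random mating, no mutation,
        # no selection), so the count of A is a binomial draw.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    traj = wright_fisher(n_diploid=50, p0=0.5, generations=200, seed=1)
    print(f"frequency of A after 200 generations: {traj[-1]:.2f}")
```

Running it with a smaller `n_diploid` shows faster loss or fixation of the allele, which is the drift behaviour the model describes.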

16 Incomplete Lineage Sorting Degnan & Salter (2005)

17 COALESCENT THEORY  Gives the mathematical expectation of the distribution of times back to coalescence  Seeks to predict the amount of time elapsed between the introduction of a mutation and the appearance of a particular allele/gene distribution in the population

18 to 26 [Diagram sequence with a time axis labelled Present and Past]

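27 Mathematical Representation  $\Theta = 4N_e\mu$  Probability that two lineages coalesce in a given generation: $P(\text{coalescence}) = \frac{1}{2N_e}$  Probability that they first coalesce exactly $t$ generations back: $P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \frac{1}{2N_e}$  Expected time during which $k$ lineages remain, in units of $2N_e$ generations: $E(t_k) = \frac{2}{k(k-1)}$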
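The expressions on this slide can be checked numerically. The sketch below is an illustration only: the function names are made up here, and the simulation uses the usual continuous-time approximation (exponential waiting times with rate k(k-1)/2, time measured in units of 2N_e generations) rather than the discrete geometric form.

```python
import random


def p_coalesce_at(t, n_e):
    """P_c(t): two lineages first coalesce exactly t generations back."""
    p = 1.0 / (2 * n_e)
    return (1 - p) ** (t - 1) * p


def expected_t_k(k):
    """E(t_k), in units of 2*N_e generations, while k lineages remain."""
    return 2.0 / (k * (k - 1))


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n samples."""
    return sum(expected_t_k(k) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA under the continuous-time approximation."""
    total = 0.0
    for k in range(n, 1, -1):
        # With k lineages there are k(k-1)/2 pairs that could coalesce.
        total += rng.expovariate(k * (k - 1) / 2.0)
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_tmrca(10, rng) for _ in range(20000)]
    print("expected TMRCA :", expected_tmrca(10))
    print("simulated mean :", sum(draws) / len(draws))
```

Summing E(t_k) from k = 2 to n gives 2(1 - 1/n), so the expected depth of the whole genealogy stays below two time units however many samples are added.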

29 Influence  Population Genetics  Phylogenetics  Statistical Phylogeography

30 Population Genetics  Theory describes the genealogical relationships among individuals in a Wright-Fisher population

31 Phylogenetics  Gene tree-Species tree  Predicts certain distribution of gene tree frequencies
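As a concrete case of a predicted gene tree distribution, the standard three-taxon result can be computed directly. This formula is not stated on the slide and is supplied here as a well-known illustration: with an internal species-tree branch of length T in coalescent units (assumed here to be generations divided by 2N_e), the gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T). The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(T):
    """Gene tree topology probabilities for a three-taxon species tree.

    T is the internal branch length in coalescent units.
    """
    discordant = math.exp(-T) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant,
        "discordant_1": discordant,
        "discordant_2": discordant,
    }


if __name__ == "__main__":
    for T in (0.0, 0.5, 1.0, 3.0):
        probs = three_taxon_gene_tree_probs(T)
        print(T, {name: round(p, 3) for name, p in probs.items()})
```

Short internal branches push all three topologies toward equal frequency, which is where incomplete lineage sorting is most visible.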

32 Statistical Phylogeography  Individual gene trees contain information about past demographic events when rate of coalescence different between

34 Computer Programs  Kuhner, 2008  BEAST  GENETREE  LAMARC  MIGRATE-N  IM/IMa  IMa2

40 Computer Programs  Coalescent Simulators  Approximate Bayesian Computation: DIY-ABC, PopABC  Simulation (using “pipeline” approach): GENOME, ms, COAL, CoaSim

48 Introduction  MCMC simulation of gene genealogies  IM uses the sampled genealogies to estimate model parameters  Hey, J. (2006)

49 Introduction cont’d  Assumptions  No other populations are more closely related to the sampled populations than they are to each other  Selective neutrality  No recombination within loci  Free recombination between loci  The chosen mutation model is correct (infinite sites, Hasegawa-Kishino-Yano, stepwise, or compound locus)

50 Input File

Example data for IM
# im test data
population1 population2
3
locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)
pop1_1    ACTACTGTCATGA
pop2_1    AGTACTATCACGA
hapstrexample 2 1 4 J2 0.75
pop1_1    13 34 GTAC
pop1_2    12 35 GTAT
pop2_1    12 37 GTAT
strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)
strpop11a 23
strpop11b 26
strpop21a 25
strpop21b 31
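A quick structural check of an input file can catch miscounted sample sizes before a long run. The sketch below rests on an assumption inferred from the example rather than stated on the slide: each locus header is read as name, sample size in population 1, sample size in population 2, sequence length, and mutation model, followed by one data line per sampled gene copy. The function `read_loci` is hypothetical and is not part of IM or IMa.

```python
EXAMPLE = """\
Example data for IM
# im test data
population1 population2
3
locus1 1 1 13 I 1 0.0000000008 (0.0000000001, 0.0000000015)
pop1_1    ACTACTGTCATGA
pop2_1    AGTACTATCACGA
hapstrexample 2 1 4 J2 0.75
pop1_1    13 34 GTAC
pop1_2    12 35 GTAT
pop2_1    12 37 GTAT
strexample 2 2 1 S1 1 0.00001 (0.000001, 0.00005)
strpop11a 23
strpop11b 26
strpop21a 25
strpop21b 31
"""


def read_loci(text):
    """Return (population names, list of locus summaries)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumed layout: title line, optional '#' comment lines,
    # population names, number of loci, then one block per locus.
    body = [ln for ln in lines[1:] if not ln.startswith("#")]
    pop_names = body[0].split()
    n_loci = int(body[1])
    loci, pos = [], 2
    for _ in range(n_loci):
        fields = body[pos].split()
        name, n1, n2, model = fields[0], int(fields[1]), int(fields[2]), fields[4]
        rows = body[pos + 1 : pos + 1 + n1 + n2]
        if len(rows) != n1 + n2:
            raise ValueError(f"locus {name}: expected {n1 + n2} data lines")
        loci.append({"name": name, "n1": n1, "n2": n2, "model": model,
                     "samples": [row.split()[0] for row in rows]})
        pos += 1 + n1 + n2
    return pop_names, loci


if __name__ == "__main__":
    pops, loci = read_loci(EXAMPLE)
    for locus in loci:
        print(locus["name"], locus["model"], locus["n1"] + locus["n2"], "gene copies")
```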

53 Command Line (terminal) Command line: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L10000 -p 45

54 Command Line (terminal) Command line: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 20 -m1 1 -m2 1 -n 10 -t 80 -b 100000 -L100000 -p 45 More complex run line: ima -i IMaEliurus -o IMaEliurus.out -q1 10 -q2 10 -qA 300 -m 12 -m 23 -t 80 -n 20 -b 100000 -L 0.5 -fl -g1 0.01 -p 45
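Run lines copied out of slides or word processors can pick up en dashes in place of hyphens, and a shell passes those through literally, so the program does not see them as flags. A small helper, sketched here with a hypothetical name, normalizes the dashes and splits the line into arguments. It does not run `ima` and does not validate any flag.

```python
import shlex

# Typographic dashes that autocorrect commonly substitutes for "-".
DASHES = {"\u2013": "-", "\u2014": "-", "\u2212": "-"}


def normalize_command(cmd):
    """Replace typographic dashes with ASCII hyphens and split into argv."""
    for bad, good in DASHES.items():
        cmd = cmd.replace(bad, good)
    return shlex.split(cmd)


if __name__ == "__main__":
    pasted = "ima -i IMaEliurus -o IMaEliurus.out -q1 10 \u2013q2 20 \u2013n 10"
    print(normalize_command(pasted))
```

The resulting list could be handed to `subprocess.run` once the `ima` executable is known to be on the PATH.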

55 Important Note!  To keep a run going indefinitely (or until it crashes or DSCR kicks the job), the run needs an “IMrun” file that contains only the word “yes”

56 Output File: .out  MCMC information  Summary  Acceptance rates  Autocorrelation  ESS (effective sample size)  Chain swapping

57 Output File: .out  Marginal Peak  Marginal distributions  Minbin  Maxbin  HiPt  HiSmth  Mean  95lo/hi  HPD90lo/hi
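To make the HPD90 entries concrete, a 90% highest posterior density interval can be pictured as the narrowest interval that holds 90% of the posterior mass. The sketch below computes it from a list of sampled values; this is a generic sample-based method chosen for illustration and is not claimed to be how IMa derives the values in its output file. The function name is hypothetical.

```python
import math


def hpd_interval(samples, mass=0.90):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of points the interval must cover (small guard against
    # floating-point overshoot before rounding up).
    k = min(n, max(1, math.ceil(mass * n - 1e-9)))
    # Slide a window of k consecutive points and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    skewed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(hpd_interval(skewed))
```

For a skewed posterior the HPD interval and the equal-tailed 95lo/hi style interval can differ noticeably, which is one reason both kinds of summary are worth reading.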

58 Output File: .out  ASCII  Curves  Plots

59 Output File: .out.ti  No human-readable summary  Can be used on subsequent runs when in “L mode”

60 How can I get a “good” run?  Conduct preliminary run  Duration?  Ideally, once run reaches stationarity and convergence  Assess autocorrelation  Use Metropolis-coupled MCMC  Run many, many times (well, at least 3)
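Autocorrelation and effective sample size can also be estimated from any saved chain of parameter values. The following sketch uses a common generic estimator (sum the autocorrelations until the first non-positive one); it is an assumption made for illustration and not the calculation IMa itself reports. Function names are hypothetical.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0:
        return 0.0
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(chain, max_lag=None):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(chain)
    max_lag = max_lag or n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorr(chain, lag)
        if rho <= 0:
            break  # truncate once the estimate is dominated by noise
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


if __name__ == "__main__":
    rng = random.Random(0)
    # A strongly autocorrelated toy chain (AR(1) with coefficient 0.9).
    chain, x = [], 0.0
    for _ in range(2000):
        x = 0.9 * x + rng.gauss(0, 1)
        chain.append(x)
    print("lag-1 autocorrelation:", round(autocorr(chain, 1), 2))
    print("effective sample size:", round(effective_sample_size(chain)))
```

A chain of 2000 steps with high autocorrelation can carry the information of only about a hundred independent draws, which is why long runs and several replicate runs are recommended.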

61 Robustness of Coalescent  Violations of assumptions:  Intralocus recombination  Population structure  Gene flow from unsampled populations  Linkage among loci  Divergent selection  Different model of substitution

63 Questions?

