How missing data and taxon sampling play the role in Phylogeny reconstruction? A case study on a five-gene dataset of Eurotiomycetous endophytic fungi.


1 How missing data and taxon sampling play the role in Phylogeny reconstruction? A case study on a five-gene dataset of Eurotiomycetous endophytic fungi Ko-Hsuan Chen Systematic Biology, 2012 Fall

2 Introduction Species tree: inferred from several gene trees. Supermatrix: concatenate multi-gene data, then build the tree. Should taxa with incomplete multi-locus data be kept or not?
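The supermatrix step on this slide can be sketched in Python as follows. This is a minimal illustration, not the workflow used in the talk: the function name `build_supermatrix`, the toy sequences, and the choice of `?` as the missing-data symbol are all assumptions.

```python
def build_supermatrix(genes, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    genes: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Taxa absent from a gene are padded with the `missing` symbol.
    Returns (matrix, partitions); partitions holds 1-based, inclusive
    (gene, start, end) column ranges.
    """
    taxa = sorted({t for aln in genes.values() for t in aln})
    matrix = {t: "" for t in taxa}
    partitions = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for t in taxa:
            # A taxon without this gene contributes only missing-data symbols.
            matrix[t] += aln.get(t, missing * length)
        partitions.append((gene, start, start + length - 1))
        start += length
    return matrix, partitions


# Toy data (hypothetical): taxon "C" lacks the second gene.
toy = {
    "nLSU": {"A": "ACGT", "B": "ACGA", "C": "ACGG"},
    "mtSSU": {"A": "TTA", "B": "TTG"},
}
matrix, partitions = build_supermatrix(toy)
```

Keeping a taxon with incomplete loci therefore means carrying a block of missing-data symbols for every gene it lacks, which is the situation the rest of the talk evaluates.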

3 Objective See how adding taxa with missing data affects the tree. Compare the sensitivity of different methods to missing data.

4 Methods Focus on a fungal Class: Eurotiomycetes. --Endophytes: Endophytic/Endo-lichenic fungi (28 taxa) --Reference taxa (38 taxa)

5 Methods Five-locus dataset. Phylogenetic tree reconstruction: Likelihood: RAxML; Bayesian: BEAST

6 5-loci data

7 Different runs: taxa with different numbers of genes
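One way to define such runs is to group taxa by how many loci they have. The sketch below is an assumption about how the runs might be built: the slide does not say whether each run is cumulative, so the "at least k loci" rule, the function name, and the taxon names are hypothetical. Only the five gene names come from the talk.

```python
def taxa_with_min_loci(loci_per_taxon, min_loci):
    """Return the sorted taxa that have at least `min_loci` sequenced loci."""
    return sorted(
        taxon for taxon, loci in loci_per_taxon.items() if len(loci) >= min_loci
    )


# Hypothetical coverage table: which loci were sequenced for each taxon.
coverage = {
    "taxon1": {"nLSU", "nSSU", "mtSSU", "MCM7", "RPB1"},
    "taxon2": {"nLSU", "nSSU", "mtSSU"},
    "taxon3": {"nLSU"},
}

# One taxon list per run: run k keeps taxa with k or more loci.
runs = {k: taxa_with_min_loci(coverage, k) for k in range(1, 6)}
```

Under this rule, run 5 is the complete-data matrix and run 1 is the most inclusive matrix with the most missing data.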

8 Find model PartitionFinder Find the model by actually starting the run. 1st: MCMC chain length 10,000,000 for each gene separately, testing GTR+G+I. 2nd: observe their behavior in Tracer. 3rd: link those genes having similar features.

9 Tracer file for mtSSU, nSSU, nLSU/frequency [Figure: Tracer output, panels nLSU, nSSU, mtSSU]

10 Settings for Analysis * Bootstrap replicates=1000

11 BEAUti/BEAST

12 Compare the three separate chains to see whether they converge
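The talk compares the chains visually. As a numerical complement, the sketch below computes the Gelman-Rubin potential scale reduction factor for one parameter sampled in several chains. This is a swapped-in diagnostic, not something the slides report; the function name and the toy samples are hypothetical, and real input would be post-burn-in samples of one parameter from each chain.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for one parameter across equal-length chains.

    Values near 1 suggest the chains sample the same distribution;
    values well above 1 suggest they have not converged.
    Raises ZeroDivisionError if every chain is constant.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)            # W
    between = n * variance([mean(c) for c in chains])     # B
    pooled = (n - 1) / n * within + between / n           # pooled variance estimate
    return (pooled / within) ** 0.5


# Toy samples (hypothetical): three chains that agree, three that do not.
agreeing = [[1.0, 2.0, 3.0], [2.0, 3.0, 1.0], [3.0, 1.0, 2.0]]
diverged = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
rhat_ok = potential_scale_reduction(agreeing)
rhat_bad = potential_scale_reduction(diverged)
```

A common rule of thumb treats values below about 1.1 as acceptable, but that cutoff is a convention rather than part of this analysis.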

13

14 Bootstrap probability/posterior probability response to missing data [Figure: y-axis, percentage of nodes; x-axis, taxa included in the analysis, grouped by number of loci]

15 Bootstrap probability/posterior probability response to missing data [Figure: y-axis, percentage of nodes; x-axis, taxa included in the analysis, grouped by number of loci]
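The "percentage of nodes" plotted on these slides can be computed from a list of node support values. The sketch below assumes the common cutoffs of 70 for bootstrap and 0.95 for posterior probability; the slides do not state which thresholds were used, and the support values here are made up for illustration.

```python
def percent_nodes_supported(supports, threshold):
    """Percentage of internal nodes whose support is at or above `threshold`."""
    if not supports:
        raise ValueError("no support values given")
    hits = sum(1 for s in supports if s >= threshold)
    return 100.0 * hits / len(supports)


# Hypothetical support values for the same four nodes in two analyses.
bootstrap = [100, 85, 60, 40]        # RAxML bootstrap percentages
posterior = [1.0, 0.99, 0.97, 0.6]   # BEAST posterior probabilities

bp_percent = percent_nodes_supported(bootstrap, 70)    # assumed BP cutoff
pp_percent = percent_nodes_supported(posterior, 0.95)  # assumed PP cutoff
```

Repeating this for each run (taxa with different numbers of loci) gives one point per run for each support measure.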

16 Conclusion Missing data do not change the topology much. When BP is not high, PP sometimes tends to be very high. Missing data have a stronger effect on bootstrap values than on posterior probabilities. (Be careful about posterior probability...)

17 Model/partition setting: compare with the result of PartitionFinder Choose the best one based on BIC: (LSU) (mtSSU) (nSSU) (MCM7_pos1, RPB1_pos1) (MCM7_pos2, RPB1_pos2) (MCM7_pos3) (RPB1_pos3)
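To rerun RAxML with this scheme, it has to be written as a partition file. The sketch below turns the scheme listed on this slide into RAxML-style partition lines. The column ranges in `gene_ranges` are hypothetical (the slides give no alignment coordinates), the function name is an assumption, and the code assumes each protein-coding gene starts at codon position 1.

```python
def raxml_partition_lines(scheme, gene_ranges):
    """Build RAxML partition-file lines from a partitioning scheme.

    scheme: list of subsets; each subset is a list of data-block names
            such as "LSU" or "MCM7_pos1".
    gene_ranges: dict mapping gene name -> (start, end), 1-based inclusive.
    """
    lines = []
    for subset in scheme:
        parts = []
        for block in subset:
            gene, _, pos = block.partition("_pos")
            start, end = gene_ranges[gene]
            if pos:
                # Every third column, offset by the codon position.
                parts.append(f"{start + int(pos) - 1}-{end}\\3")
            else:
                parts.append(f"{start}-{end}")
        lines.append(f"DNA, {'_'.join(subset)} = {', '.join(parts)}")
    return lines


# Scheme as listed on the slide (best by BIC).
scheme = [
    ["LSU"], ["mtSSU"], ["nSSU"],
    ["MCM7_pos1", "RPB1_pos1"], ["MCM7_pos2", "RPB1_pos2"],
    ["MCM7_pos3"], ["RPB1_pos3"],
]
# Hypothetical column ranges in the concatenated alignment.
gene_ranges = {
    "LSU": (1, 1000), "mtSSU": (1001, 1600), "nSSU": (1601, 2600),
    "MCM7": (2601, 3200), "RPB1": (3201, 3900),
}
lines = raxml_partition_lines(scheme, gene_ranges)
```

Writing `"\n".join(lines)` to a text file gives input for RAxML's `-q` option.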

18 Model/partition setting: compare with the result of PartitionFinder In BEAST, I set: (LSU) (mtSSU) (nSSU) (MCM7_pos1) (RPB1_pos1) (MCM7_pos2) (RPB1_pos2) (MCM7_pos3) (RPB1_pos3) PartitionFinder scheme: (LSU) (mtSSU) (nSSU) (MCM7_pos1, RPB1_pos1) (MCM7_pos2, RPB1_pos2) (MCM7_pos3) (RPB1_pos3) Never converged...

19 Comparison between my result and the result after changing the settings according to PartitionFinder (RAxML) The results are similar!

20 For BEAST: how about a relaxed rate? Change the setting from “strict clock” to lognormal

21 Relaxed rate result

22

23 Supertree method Input the RAxML trees for every gene; Clann searches the possible tree space. Failed to test the tree...

24 Compare RAxML tree (supermatrix method) with Clann tree (supertree method)
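The slide compares the two trees by eye. A simple numerical comparison is a clade-based Robinson-Foulds distance, sketched below for rooted trees written as nested tuples. This is an illustrative simplification, not the comparison used in the talk: real trees would be parsed from Newick files, the function names are hypothetical, and the toy trees are made up.

```python
def clades(tree):
    """Return the set of non-trivial clades (as frozensets of leaf names).

    tree: a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def rooted_rf(tree_a, tree_b):
    """Number of clades found in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Toy trees (hypothetical) on the same four taxa.
supermatrix_tree = (("A", "B"), ("C", "D"))
supertree = (("A", "C"), ("B", "D"))
same_as_first = (("B", "A"), ("D", "C"))  # same clades, different order

d_conflict = rooted_rf(supermatrix_tree, supertree)
d_same = rooted_rf(supermatrix_tree, same_as_first)
```

A distance of zero means the two trees contain the same clades; larger values mean more conflicting groupings.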

25 Thank you for your attention!

