Published by Brittney Charity King. Modified over 9 years ago.
1
Lecture 13 – Performance of Methods. Folks often use the term “reliability” without a very clear definition of what it is. Methods of assessing performance - Simulation. Simulation has the enormous advantage that a large number of replicates can be examined, which allows us to account for stochasticity. There are two kinds. Prospective simulations are those in which a set of conditions is specified a priori, and this defines the conditions under which data are simulated. Retrospective simulations are those in which the simulation conditions are defined by analysis of a particular data set that is relevant to some question.
2
Methods of assessing performance - Simulation. Huelsenbeck & Hillis (1993. Syst. Biol., 42:247) led to a host of prospective simulation studies that have tremendously advanced our understanding of the conditions across which phylogenetic estimation methods perform well. Prospective simulations have been very important.
3
Methods of assessing performance - Simulation. Danger – Easy to stack the deck. Tateno et al. (1994. Mol. Biol. Evol., 11:261-277) simulated data under an F84+Γ model of sequence evolution. They then compared how well NJ on Γ-corrected distances estimated the tree against how well ML under an equal-rates model did. Of course, NJ on Γ-corrected distances performed better than ML with an equal-rates model, but this is not an appropriate comparison, because only the NJ analysis accommodated the rate heterogeneity used to generate the data. Another weakness is that we must use relatively simple models to simulate the data, and this may compromise the generality of our results.
4
Methods of assessing performance - Congruence. Use of well-corroborated phylogenies. “Trees of natural taxa, well supported by many independent lines of evidence, should be used in the same way as the known phylogenies of simulations and of certain laboratory and domesticated groups, i.e., as standards for evaluating the accuracy of different phylogenetic methods.” (Miyamoto & Fitch. 1995. Syst. Biol. 44:64) The advantage is that the data have been produced by the actual complex evolutionary process that has led to the diversity of the group being used, circumventing the weakness of simulation. There are several weaknesses, though. The history of the group can’t be manipulated to explore different combinations of branch lengths and properties of the data. Replication is non-existent. The approach also assumes that the gene tree equals the species tree (coalescent stochasticity is ignored, as is HGT/hybridization).
5
Methods of assessing performance - Experimental phylogenies. Sequences evolve via natural processes, and the tree topology can be anything the investigator chooses. We can store ancestors and access the ancestral character states directly. Bull et al. (1997. Genetics. 147:1497)
[Figure: tree diagrams with tips A through H; annotation: “Subject A & D to similar selection.”]
6
Criteria - Consistency. A statistically consistent estimator is one that converges to the true value of the parameter being estimated as the amount of data increases.
[Figure: Prob. Correct versus Sequence Length; annotation: average gene length.]
So under the conditions simulated here (FZ tree and GTR+I+Γ), 3 of the 9 methods are inconsistent.
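The idea of checking consistency by simulation can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the analysis behind the slide: it uses the Jukes-Cantor model rather than GTR+I+Γ, a four-taxon tree ((A,B),(C,D)), and four-taxon parsimony as the only method. The function names and branch lengths are hypothetical.

```python
import math
import random


def jc_evolve(state, t, rng):
    """Evolve one nucleotide (coded 0..3) along a branch of length t
    (expected substitutions per site) under Jukes-Cantor."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return state
    # Otherwise move to one of the three other states, equally likely.
    return rng.choice([s for s in range(4) if s != state])


def simulate_site(branches, rng):
    """Simulate one site pattern (A, B, C, D) on the true tree ((A,B),(C,D)).
    branches = (tA, tB, tC, tD, t_internal); all values are assumptions."""
    t_a, t_b, t_c, t_d, t_int = branches
    u = rng.randrange(4)           # state at the node joining A and B
    v = jc_evolve(u, t_int, rng)   # state at the node joining C and D
    return (jc_evolve(u, t_a, rng), jc_evolve(u, t_b, rng),
            jc_evolve(v, t_c, rng), jc_evolve(v, t_d, rng))


def parsimony_picks_true_tree(sites):
    """Four-taxon parsimony: each tree is supported by one informative
    pattern. Ties are counted as failures."""
    ab_cd = sum(1 for a, b, c, d in sites if a == b and c == d and a != c)
    ac_bd = sum(1 for a, b, c, d in sites if a == c and b == d and a != b)
    ad_bc = sum(1 for a, b, c, d in sites if a == d and b == c and a != b)
    return ab_cd > max(ac_bd, ad_bc)


def prob_correct(branches, n_sites, n_reps, seed=0):
    """Fraction of simulated replicates in which the true tree is recovered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        sites = [simulate_site(branches, rng) for _ in range(n_sites)]
        hits += parsimony_picks_true_tree(sites)
    return hits / n_reps
```

Calling `prob_correct` over a range of `n_sites` values gives a curve of the kind plotted on this slide. With long branches on A and C and short branches elsewhere (a Felsenstein-zone shape), the fraction correct for parsimony should fall rather than rise as sequences get longer, which is what inconsistency looks like in practice.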
7
Criteria - Efficiency. How many data are required to get the right answer?
[Figure: Prob. Correct versus Sequence Length; annotation: true model.]
In the figure above, estimation with GTR+Γ is more efficient than JC+Γ. All are consistent, but MP and ML with equal-rates (ER) models are most efficient.
8
Criteria - Robustness. How sensitive is a method to violation of its assumptions?
[Figure: Prob. Correct versus Sequence Length.]
Sequences were simulated with GTR+I+Γ, but any model that incorporates among-site rate variation (ASRV) somehow (I, Γ, I+Γ) is consistent. ML is robust to violations, as long as something is done to accommodate ASRV.
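For readers unfamiliar with how rate variation among sites is usually represented, here is a minimal sketch of drawing per-site relative rates under an invariable-sites-plus-gamma scheme. It is an illustration only: the function name, the parameter names `alpha` and `p_invariable`, and the rescaling to an overall mean rate of 1 are assumptions, and the slide does not say which parameter values were used.

```python
import random


def draw_site_rates(n_sites, alpha, p_invariable=0.0, seed=0):
    """Draw one relative rate per site.

    With probability p_invariable a site gets rate 0 (invariable).
    Otherwise its rate comes from a gamma distribution with shape alpha
    and mean 1, rescaled so the mean over all sites is still 1.
    """
    if alpha <= 0 or not (0.0 <= p_invariable < 1.0):
        raise ValueError("need alpha > 0 and 0 <= p_invariable < 1")
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_invariable:
            rates.append(0.0)
        else:
            # gammavariate(shape, scale) has mean shape * scale = 1 here.
            rate = rng.gammavariate(alpha, 1.0 / alpha)
            rates.append(rate / (1.0 - p_invariable))
    return rates
```

Setting `p_invariable=0.0` gives a Γ-only scheme, and a very large `alpha` approaches equal rates. In a simulation, each site's branch lengths would be multiplied by its rate before sequences are evolved.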
9
Interaction of Topology and Performance.
[Figure panels: FZ tree; inverse FZ tree; equal branch-length (B.L.) tree.]
10
Efficiency of Parsimony in the Inverse-Felsenstein Zone. Swofford et al. (2001. Syst. Biol., 50:525-539) examined the situation in detail. The probability that a state shared by the long-branch taxa actually evolved on the internal branch and changed nowhere else is the probability that any site pattern of the form xxyy is the result of a true synapomorphy. The probability of the site pattern xxyy being seen in the data under any scenario is 0.1172. Thus ca. 97% of the sites that have the pattern xxyy will have experienced multiple hits.
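The arithmetic connecting the two quoted figures can be written out as follows. The probability of a true synapomorphy does not appear in this transcript, so the value below is only back-calculated from the 0.1172 and the roughly 97% given above; treat it as an approximation, not a number taken from the paper.

```latex
\[
\Pr(\text{multiple hits} \mid xxyy)
  = 1 - \frac{\Pr(\text{true synapomorphy})}{\Pr(xxyy)}
\]
\[
\Pr(\text{true synapomorphy})
  \approx (1 - 0.97) \times 0.1172 \approx 0.0035
\]
```

The first line holds because every true synapomorphy yields the pattern xxyy, so the true-synapomorphy sites are a subset of the xxyy sites.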