1
Why Models of Sequence Evolution Matter
Differences accumulate linearly with time for only a very short time after two taxa diverge. Two pairs of taxa that have different divergence times may have a similar number of differences between them: saturation. Plot: number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy for time since divergence between the two taxa.
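The saturation pattern can be sketched numerically. This is a minimal sketch that assumes the Jukes-Cantor (JC) model, which this slide does not name; the function name `expected_p_distance` is illustrative only. Under JC, the expected proportion of differing sites after d substitutions per site is p = 3/4 (1 - e^(-4d/3)).

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under Jukes-Cantor (an assumed model)
    after d substitutions per site have accumulated between two taxa."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Genetic distance d stands in for time since divergence (the x-axis of the plot).
for d in (0.05, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:>4}: expected p-distance = {expected_p_distance(d):.3f}")
```

For small d the observed proportion tracks d almost one-to-one; for large d it flattens toward 0.75, so pairs with very different divergence times end up showing nearly the same number of differences.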
2
Models of sequence evolution expect multiple hits.
So even though there have been 4 substitutions, when we compare these two lineages we can detect only 3 differences.
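A toy illustration of how a multiple hit hides a substitution. The ancestral sequence and the substitution events below are hypothetical, chosen only to reproduce the 4-substitutions, 3-differences count; they are not the events drawn on the slide.

```python
def apply_substitutions(seq, events):
    """Apply (position, new_base) substitution events to seq, in order."""
    bases = list(seq)
    for pos, new_base in events:
        bases[pos] = new_base
    return "".join(bases)

ancestor = "AAAAAA"                                 # hypothetical ancestral sequence
lineage1_events = [(0, "C"), (2, "G"), (2, "T")]    # site 2 is hit twice
lineage2_events = [(4, "G")]

lineage1 = apply_substitutions(ancestor, lineage1_events)
lineage2 = apply_substitutions(ancestor, lineage2_events)

substitutions = len(lineage1_events) + len(lineage2_events)
differences = sum(a != b for a, b in zip(lineage1, lineage2))
print(substitutions, differences)  # 4 3
```

The second hit at site 2 overwrites the first, so only one difference is visible there even though two substitutions occurred.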
3
Branch-length Information
Let’s assume a true tree.

A ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC

This tree is 37 steps, whereas the true tree is 38 steps. Increase the # of replicates: it keeps happening. Increase the # of bp: it happens with certainty.
4
Branch-length Information
Now let’s subject the sequences to an ME search. First, we need to convert the character-by-taxon matrix to a matrix of pairwise distances: p-distances above the diagonal, JC distances below. [Distance matrix for taxa A, B, C, and D; values not preserved.]
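The conversion from characters to pairwise distances can be sketched as follows. This is a minimal sketch, not the software used for the slides: the sequences are the four from the earlier "Branch-length Information" slide, the helper names are illustrative, and the JC correction is the standard d = -3/4 ln(1 - 4p/3).

```python
import math

# The four sequences from the "Branch-length Information" slide.
seqs = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

def p_distance(s1, s2):
    """Proportion of sites that differ between two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc_distance(p):
    """Jukes-Cantor corrected distance; undefined (infinite) once p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

taxa = sorted(seqs)
print("     " + "".join(f"{t:>8}" for t in taxa))
for i, ti in enumerate(taxa):
    cells = []
    for j, tj in enumerate(taxa):
        p = p_distance(seqs[ti], seqs[tj])
        if i < j:
            cells.append(f"{p:8.3f}")               # above diagonal: p-distance
        elif i > j:
            cells.append(f"{jc_distance(p):8.3f}")  # below diagonal: JC distance
        else:
            cells.append(f"{'-':>8}")
    print(f"{ti:<5}" + "".join(cells))
```

Each JC distance is at least as large as the matching p-distance, and the gap widens for the most divergent pairs, which is where uncorrected distances underestimate the most.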
5
Branch-length Information
Optimum tree is the true tree. We’re getting pretty lousy estimates of branch lengths; under these conditions, branch-length estimates would converge on the true values with more data. So in the simulation, the methods that are agnostic with respect to multiple hits (MP and ME using p-distances) incorrectly unite the long-branch taxa (A & D). ML can avoid long-branch attraction.
6
Long-branch attraction
Let’s assume that there is an A in both the short-branch taxa. There are four possibilities for states at the other two terminals.
1) The long-branch taxa could both have A.
2 & 3) One of the long-branch taxa has a substitution to nucleotide X (= G, C, or T).
7
Long-branch attraction
4) There could be substitutions to X1 and X2 along both long branches. If X1 ≠ X2, the site is uninformative. If X1 = X2, the site is misleading.

X1 X2    X1 X2    X1 X2
C  C     G  C     T  C
C  G     G  G     T  G
C  T     G  T     T  T

So 1/3 of all possibilities result in a convergence.
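The 1/3 figure can be checked by enumerating the combinations. This sketch assumes, as the slide's counting implicitly does, that each of the three alternative nucleotides is equally likely on each long branch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

ancestral = "A"
alternatives = [b for b in "ACGT" if b != ancestral]  # C, G, T

# Every (X1, X2) pair when both long branches carry a substitution away from A.
pairs = list(product(alternatives, repeat=2))
convergent = [(x1, x2) for x1, x2 in pairs if x1 == x2]

fraction_convergent = Fraction(len(convergent), len(pairs))
print(f"{len(convergent)} of {len(pairs)} combinations converge: {fraction_convergent}")
```

With equal weights, 3 of the 9 pairs put the same derived state on both long branches, which is the misleading case.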
8
The Importance of Branch Lengths
Fitch optimization assigns the state set {A,C,G} to node 2. With a large # of terminals with A, this is a slowly evolving site. Three reconstructions are possible:
C at node 2: transversion to A along short branch a, no change along b, and a change to G along g.
G at node 2: transition to A along short branch a, no change along g, and a change to C along b.
A at node 2: no change along short branch a, a change to C along b, and a change to G along g.
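A minimal sketch of the Fitch down-pass shows how an ambiguous state set such as {A,C,G} arises. The tree below is a hypothetical rooted binary tree written as nested tuples; it is not the tree drawn on the slide.

```python
def fitch_downpass(node):
    """Return (state_set, steps) for a rooted binary tree given as nested
    2-tuples, with single-character strings at the tips."""
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_set, left_steps = fitch_downpass(left)
    right_set, right_steps = fitch_downpass(right)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: keep it, no extra step.
        return shared, left_steps + right_steps
    # Children disagree: take the union and count one step.
    return left_set | right_set, left_steps + right_steps + 1

tree = ((("A", "A"), "C"), "G")  # hypothetical example tree
states, steps = fitch_downpass(tree)
print(sorted(states), steps)  # ['A', 'C', 'G'] 2
```

The down-pass uses no branch lengths, so all three states in the set are treated as equally good.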
9
The Importance of Branch Lengths
ML can voice a preference among {A,C,G} here, where parsimony can’t. This is because ML accounts for branch lengths in calculating reconstruction probabilities. No change along the short branch with changes along both long branches is more likely than a change along the short branch coupled with no change along one of the long branches. All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (& node 1) contributes the most to the single-site likelihood.
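The preference can be made concrete with a small calculation. This is a sketch under stated assumptions, not the slide's analysis: it uses the Jukes-Cantor model (so transitions and transversions are not distinguished), the branch lengths `t_short` and `t_long` are hypothetical, and node 1 is held fixed at A rather than summed over.

```python
import math

def p_same(t):
    """JC probability that a site has the same state at both ends of a branch of length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

def p_diff(t):
    """JC probability of ending in one specific different state after a branch of length t."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)

t_short, t_long = 0.01, 1.0  # hypothetical branch lengths (substitutions per site)

def reconstruction_probability(state_at_node2):
    """Joint probability of one reconstruction: node 2 connects to node 1 (fixed
    at A) by the short branch, and to terminals C and G by the long branches."""
    prob = 0.25  # JC equilibrium frequency of the state at node 2
    for descendant, t in (("A", t_short), ("C", t_long), ("G", t_long)):
        prob *= p_same(t) if descendant == state_at_node2 else p_diff(t)
    return prob

for state in "ACGT":
    print(state, f"{reconstruction_probability(state):.6f}")
```

Every state at node 2 gets a nonzero probability, but with a short branch leading to node 1 the reconstruction with A dominates the sum.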