Published by Christine Quinn. Modified over 9 years ago.
1
Why Models of Sequence Evolution Matter
The plot shows the number of differences between each pair of taxa against the genetic distance between those two taxa. The x-axis is a proxy for time since divergence between the two taxa. Differences accumulate linearly with time for only a very short time after two taxa diverge. Two pairs of taxa that have different divergence times may have a similar number of differences between them: saturation.
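The saturation described on this slide can be sketched numerically. The slide does not name a substitution model, so the sketch below assumes the Jukes-Cantor (JC69) model, under which the expected proportion of differing sites for a true distance d is 3/4 (1 - exp(-4d/3)). The function name is illustrative only.

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under JC69 (an assumed model)
    for a true distance d, measured in substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Near d = 0 the curve is almost linear; for large d it flattens toward 0.75.
for d in (0.05, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true distance {d:4.2f} -> expected proportion of differences {expected_p_distance(d):.3f}")
```

Two very different true distances (say 2.0 and 5.0) give nearly the same expected proportion of differences, which is the pattern the slide calls saturation.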
2
Multiple Hits
Even though there have been 4 substitutions, when we compare these two lineages we can detect only 3 differences. Models of sequence evolution expect multiple hits.
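A small simulation illustrates why observed differences undercount substitutions. This is only a sketch: the sequence length, the number of substitutions, the uniform choice of site and replacement base, and the function names are all assumptions made for illustration, not details taken from the slide.

```python
import random

BASES = "ACGT"

def evolve(seq, n_subs, rng):
    """Apply n_subs substitutions at randomly chosen sites.
    Each substitution always changes the base, so a site hit twice
    can end up back at its original state."""
    seq = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)

def observed_differences(a, b):
    """Count sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(1)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(100))
for n_subs in (5, 25, 100, 400):
    left = evolve(ancestor, n_subs, rng)
    right = evolve(ancestor, n_subs, rng)
    print(f"substitutions: {2 * n_subs:4d}  observed differences: {observed_differences(left, right)}")
```

The observed count can never exceed the number of substitutions, and it falls further behind as the same sites are hit repeatedly.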
3
Branch-length Information
Let’s assume a true tree.

A  ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B  ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C  ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D  AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC

Increase the number of replicates: this keeps happening. Increase the number of bp: it happens with certainty. This tree is 37 steps, and the true tree is 38 steps.
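The step counts can be checked with a short Fitch parsimony sketch. It assumes the trailing fragments in the transcript (AA, T, T, C) are the wrapped ends of the four sequences, giving 40 bp each. The nested-tuple tree encoding and the function names are illustrative choices, not part of the original slides.

```python
# Sequences from the slide; the second literal on each line is the wrapped tail.
SEQS = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGT" "AA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTA" "T",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTA" "T",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTA" "C",
}

def fitch_steps(tree, site):
    """Fitch downpass for one site.
    tree is a taxon name or a pair of subtrees; site maps taxon -> base.
    Returns (state set at this node, number of steps below it)."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], site)
    right_set, right_steps = fitch_steps(tree[1], site)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, seqs):
    """Total parsimony length: sum of Fitch steps over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_steps(tree, {taxon: s[i] for taxon, s in seqs.items()})[1]
        for i in range(n_sites)
    )

# The three possible unrooted topologies for four taxa, rooted on the
# internal branch (the parsimony length does not depend on the rooting).
TREES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(tree, SEQS) for name, tree in TREES.items()}
for name, steps in lengths.items():
    print(name, steps)
```

With the sequences joined this way, the tree that unites A and D comes out at 37 steps and the other two at 38 and 39, which agrees with the 37 and 38 quoted on the slide.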
4
Branch-length Information
Now let’s subject the sequences to an ME search. First, we need to convert the character-by-taxon matrix to a matrix of pairwise distances:

      A      B      C      D
A     -      0.400  0.400  0.575
B     0.572  -      0.200  0.525
C     0.572  0.232  -      0.475
D     1.091  0.903  0.752  -
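The conversion from sequences to distances can be sketched as follows. The values above the diagonal are reproduced by plain p-distances (proportion of differing sites over 40 bp). The values below the diagonal agree to about three decimals with the Jukes-Cantor correction, so the sketch assumes JC69 for them. The slide itself does not name the correction, and the function names are illustrative.

```python
import math
from itertools import combinations

# Sequences from the earlier slide, assuming the wrapped tails are rejoined.
SEQS = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGT" "AA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTA" "T",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTA" "T",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTA" "C",
}

def p_distance(a, b):
    """Proportion of aligned sites at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(p):
    """Jukes-Cantor corrected distance (assumed correction); needs p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for s, t in combinations(SEQS, 2):
    p = p_distance(SEQS[s], SEQS[t])
    print(f"{s}-{t}: p = {p:.3f}  corrected = {jc69_distance(p):.4f}")
```

The correction stretches the large distances far more than the small ones: A-D grows from 0.575 to about 1.09, while B-C grows only from 0.200 to about 0.23.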
5
Branch-length Information
So in the simulation, the methods that are agnostic with respect to multiple hits (MP and ME using p-distances) incorrectly unite the long-branch taxa (A & D). ML can avoid long-branch attraction: the optimum tree is the true tree. We’re getting pretty lousy estimates of branch lengths; under these conditions, branch-length estimates would converge on true values with more data.
6
Long-branch attraction
Let’s assume that there is an A in both the short-branch taxa. There are four possibilities for states at the other two terminals.
1) The long-branch taxa could have A.
2 & 3) One of the long-branch taxa has a substitution to nucleotide X (= G, C, or T).
4) There could be substitutions to X1 and X2 along both long branches. If X1 ≠ X2, the site is uninformative. If X1 = X2, the site is misleading.
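The relative weight of the four cases can be sketched under some simplifying assumptions that are not in the slide: the JC69 model, both long branches starting from A, equal long-branch lengths, and no change on the short branches. Under JC69 a changed site lands on each of the three other bases with equal probability, so two independent changes match one third of the time.

```python
import math

def jc69_change_prob(t):
    """Probability that the two ends of a branch of length t differ under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def long_branch_cases(t_long):
    """Probabilities of the four cases, assuming both long branches start at A
    and have the same length t_long (illustrative assumptions)."""
    p = jc69_change_prob(t_long)
    return {
        "1) both long-branch taxa have A": (1 - p) ** 2,
        "2 & 3) exactly one has X": 2 * p * (1 - p),
        "4) X1 != X2 (uninformative)": p * p * 2 / 3,
        "4) X1 == X2 (misleading)": p * p / 3,
    }

for t in (0.1, 0.5, 1.0, 2.0):
    cases = long_branch_cases(t)
    print(f"long-branch length {t}:")
    for label, prob in cases.items():
        print(f"  {label}: {prob:.3f}")
```

As the long branches lengthen, the share of misleading sites rises toward 3/16, which is how sites that falsely support uniting the two long-branch taxa can come to outnumber the sites that support the true grouping.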
7
The Importance of Branch Lengths
Fitch Optimization: {A} {A,C,G}
A large number of terminals have A, so this is a slowly evolving site. The possible reconstructions at node 2 are:
C at node 2: transversion to A along the short branch, no change along one long branch, and a change to G along the other.
G at node 2: transition to A along the short branch, no change along one long branch, and a change to C along the other.
A at node 2: no change along the short branch, a change to C along one long branch, and a change to G along the other.
8
The Importance of Branch Lengths
{A} {A,C,G}
ML can voice a preference here, where parsimony can’t. This is because ML accounts for branch lengths in calculating reconstruction probabilities. No change along a short branch and changes along both long branches is more likely than a change along the short branch coupled with no change along one of the long branches. All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (& node 1) contributes the most to the single-site likelihood.
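The argument can be made concrete with a simplified single-site calculation. The sketch assumes the JC69 model (which does not distinguish transitions from transversions), treats node 2 as joined by one short branch to a node fixed at A and by two long branches to terminals with C and G, and uses made-up branch lengths of 0.05 and 1.0. None of these values come from the slides.

```python
import math

def jc69_prob(i, j, t):
    """JC69 probability of ending in state j after time t, starting from i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def node2_contributions(t_short, t_long):
    """Contribution of each possible state at node 2 to the site likelihood:
    prior 1/4 times the three branch transition probabilities."""
    return {
        x: 0.25
        * jc69_prob(x, "A", t_short)  # short branch toward the A side
        * jc69_prob(x, "C", t_long)   # long branch to the terminal with C
        * jc69_prob(x, "G", t_long)   # long branch to the terminal with G
        for x in "ACGT"
    }

contrib = node2_contributions(t_short=0.05, t_long=1.0)
site_likelihood = sum(contrib.values())
for state, value in contrib.items():
    print(f"{state} at node 2: {value:.5f} ({value / site_likelihood:.1%} of the site likelihood)")
```

Every state at node 2 contributes something, but with a short branch toward A the reconstruction with A dominates, whereas Fitch optimization treats A, C and G as equally good.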