Why Models of Sequence Evolution Matter


1 Why Models of Sequence Evolution Matter. Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy for time since divergence between the two taxa. Differences accumulate linearly with time for only a very short time after two taxa diverge. Two pairs of taxa that have different divergence times may have a similar number of differences between them (saturation).
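A minimal sketch of why the curve flattens, assuming the Jukes-Cantor (JC69) model, which the slide itself does not name. Under that assumption the expected proportion of differing sites is p = 3/4 (1 - e^(-4d/3)), where d is the expected number of substitutions per site. The function name and the sample distances are illustrative only.

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under JC69 (an assumed model)
    for a true distance d, in expected substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Illustrative distances: p tracks d closely at first, then levels off
# below its ceiling of 0.75.
for d in (0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:4.2f}  expected p = {expected_p_distance(d):.3f}")
```

Doubling d from 2 to 4 changes the expected p by less than 0.05, so pairs with very different divergence times can show nearly the same number of differences.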

2 Multiple Hits. So even though there have been 4 substitutions, when we compare these two lineages we can detect only 3 differences. Models of sequence evolution expect multiple hits.
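The bookkeeping can be sketched with a toy example. The ancestral sequence and the substitution events below are hypothetical, chosen only to reproduce the 4-substitutions, 3-differences situation; they are not taken from the slide's figure.

```python
def apply_substitutions(ancestor, events):
    """Apply (site, new_base) events in order. Returns the descendant
    sequence and the number of substitutions that actually occurred."""
    seq = list(ancestor)
    count = 0
    for site, new_base in events:
        if seq[site] != new_base:  # a substitution must change the state
            seq[site] = new_base
            count += 1
    return "".join(seq), count

def count_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

ancestor = "ACGTACGT"                      # hypothetical ancestor
events_1 = [(0, "G"), (0, "T"), (3, "C")]  # site 0 is hit twice
events_2 = [(5, "A")]

lineage_1, n1 = apply_substitutions(ancestor, events_1)
lineage_2, n2 = apply_substitutions(ancestor, events_2)
total_substitutions = n1 + n2
observed_differences = count_differences(lineage_1, lineage_2)
print(total_substitutions, observed_differences)  # prints: 4 3
```

The second hit at site 0 overwrites the first, so it leaves no additional trace in the pairwise comparison.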

3 Branch-length Information. Let’s assume a true tree.

A  ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B  ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C  ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D  AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC

Increase # replicates: keeps happening. Increase # bp: happens with certainty. This tree is 37 steps, and the true tree is 38 steps.

4 Branch-length Information. Now let’s subject the sequences to an ME search. First, we need to convert the character by taxon matrix to a matrix of pairwise distances:

      A       B       C       D
A     -       0.400   0.400   0.575
B     0.572   -       0.200   0.525
C     0.572   0.232   -       0.475
D     1.091   0.903   0.752   -
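The conversion can be sketched as follows. The upper triangle of the matrix matches uncorrected p-distances computed from the four 40-bp sequences on the previous slide (with the trailing characters of each row joined to the sequence). The lower triangle appears consistent, to within rounding, with a Jukes-Cantor correction; the slide does not name the correction, so JC69 is an assumption here, as are the function names.

```python
import math
from itertools import combinations

ALIGNMENT = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

def p_distance(s1, s2):
    """Uncorrected proportion of sites that differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc69_distance(p):
    """JC69-corrected distance (assumed model); undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance is saturated; JC69 distance undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for x, y in combinations(ALIGNMENT, 2):
    p = p_distance(ALIGNMENT[x], ALIGNMENT[y])
    print(f"{x}-{y}: p = {p:.3f}  JC69 = {jc69_distance(p):.3f}")
```

The correction stretches the large distances most: the A-D distance grows from 0.575 to about 1.09, while B-C grows only from 0.200 to about 0.23.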

5 Branch-length Information. So in the simulation, the methods that are agnostic with respect to multiple hits (MP and ME using p-distances) incorrectly unite the long-branch taxa (A & D). ML can avoid long-branch attraction: the optimum tree is the true tree. We’re getting pretty lousy estimates of branch lengths; under these conditions, branch-length estimates would converge on true values with more data.

6 Long-branch attraction. Let’s assume that there is an A in both the short-branch taxa. There are four possibilities for states at the other two terminals.
1) The long-branch taxa could both have A.
2 & 3) One of the long-branch taxa has a substitution to nucleotide X (= G, C, or T).
4) There could be substitutions to X1 and X2 along both long branches. If X1 ≠ X2, the site is uninformative. If X1 = X2, the site is misleading.
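A rough sketch of how often each case arises, under simplifying assumptions that are not in the slide: both long branches start from A, the short branches never change, and substitutions follow JC69, so a changed terminal is equally likely to show any of the three other nucleotides. The branch lengths tried below are hypothetical.

```python
import math

def prob_change(t):
    """JC69 probability (assumed model) that the state at the end of a
    branch of length t differs from the state at its start."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def long_branch_cases(t_long):
    """Probabilities of the four cases for two long branches of equal
    length, both starting from A."""
    p = prob_change(t_long)
    return {
        "1) both keep A": (1 - p) ** 2,
        "2 & 3) exactly one changes to X": 2 * p * (1 - p),
        "4) both change, X1 != X2 (uninformative)": p * p * 2 / 3,
        "4) both change, X1 == X2 (misleading)": p * p / 3,
    }

for t in (0.1, 0.5, 1.0, 2.0):  # hypothetical long-branch lengths
    cases = long_branch_cases(t)
    print(f"t = {t}")
    for label, prob in cases.items():
        print(f"  {label}: {prob:.3f}")
```

Under these assumptions the share of misleading sites grows with the length of the long branches, which is the raw material for long-branch attraction.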

7 The Importance of Branch Lengths. Fitch optimization gives the state sets {A} and {A,C,G}. A large # of terminals have A, so this is a slowly evolving site. Three reconstructions are possible at node 2. C at node 2: a transversion to A along the short branch, no change along one long branch and a change to G along the other. G at node 2: a transition to A along the short branch, no change along one long branch and a change to C along the other. A at node 2: no change along the short branch, a change to C along one long branch and a change to G along the other.

8 The Importance of Branch Lengths {A} {A,C,G} ML can voice a preference here, where parsimony can’t. This is because ML accounts for branch lengths in calculating reconstruction probabilities. No change along a short branch and changes along both long branches is more likely than a change along the short branch coupled with no change along one of the long branches. All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (& node 1) contributes the most to the single-site likelihood.
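A minimal sketch of that comparison, under assumptions not in the slides: only the three branches meeting at node 2 are considered (a short branch ending in A and two long branches ending in C and G), base frequencies are equal, and substitutions follow JC69, which does not distinguish transitions from transversions. The branch lengths 0.05 and 1.0 are hypothetical.

```python
import math

def jc69_prob(start, end, t):
    """JC69 probability (assumed model) of observing `end` after a branch
    of length t that begins in state `start`."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if start == end else 0.25 - 0.25 * e

def node2_contributions(t_short, t_long):
    """Contribution of each possible node-2 state to the site likelihood,
    for tips A (short branch), C and G (long branches)."""
    return {
        state: 0.25  # equal base frequencies (assumption)
        * jc69_prob(state, "A", t_short)
        * jc69_prob(state, "C", t_long)
        * jc69_prob(state, "G", t_long)
        for state in "ACGT"
    }

contributions = node2_contributions(t_short=0.05, t_long=1.0)
site_likelihood = sum(contributions.values())  # all reconstructions count
for state, value in contributions.items():
    print(f"{state} at node 2: {value / site_likelihood:.3f} of the site likelihood")
```

With these hypothetical lengths, A at node 2 accounts for most of the site likelihood. If the three branches are given equal lengths, A, C, and G contribute equally, which mirrors the ambiguity that parsimony reports.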

