
Molecular phylogenetics 1
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Sections 6.1-2.


2 Distances vs. discrete characters
This division is based on how the data are treated:
Distance methods first convert the aligned sequences into a matrix of pairwise distances, then input that matrix into a tree-building method
Discrete methods consider each nucleotide site (or some function of each site) separately
[Figure: an alignment, with sequences and sites labelled]
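As a minimal sketch of the first step of a distance method, the Python below converts an alignment into a pairwise distance matrix. It assumes the distance is simply the number of sites at which two aligned sequences differ (no correction for multiple substitutions); the function name `distance_matrix` and the four sequences are hypothetical.

```python
from itertools import combinations


def distance_matrix(alignment):
    """Return a symmetric matrix (dict of dicts) of pairwise differences.

    `alignment` maps a sequence name to an aligned sequence string.
    The distance used here is the raw count of differing sites.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")

    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        diffs = sum(x != y for x, y in zip(alignment[a], alignment[b]))
        matrix[a][b] = matrix[b][a] = diffs
    return matrix


# Hypothetical four-sequence, seven-site alignment.
alignment = {
    "A": "GGAAAAT",
    "B": "AAGAAAT",
    "C": "AAAGGAA",
    "D": "AAAAAGA",
}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, [row[other] for other in alignment])
```

Only the matrix is handed on to the tree-building step; which individual sites produced each distance is no longer available, which is the information a discrete method keeps.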

3 Distances vs. discrete characters
[Figure: an alignment of four sequences (1 to 4) at seven sites (1 to 7), and the parsimony tree with sites 1 to 7 marked on its branches]

4 Distances vs. discrete characters
[Figure: the same four sequences and the distance tree, with branch lengths 2, 1, 2, 1 and 1]

5 Distances vs. discrete characters
The trees obtained by parsimony (a discrete method) and by minimum evolution (a distance method) are identical in topology and branch lengths:
Parsimony analysis identifies seven substitutions and places them on the five branches of the tree
The distance tree apportions the observed distances between sequences over the branches of the tree
Under parsimony each site requires one change, giving a total of seven changes
Summing the branch lengths of the distance tree gives the same value: 2 + 1 + 2 + 1 + 1 = 7
The parsimony tree gives additional information: which site contributes to which branch, plus the ancestral states
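The alignment behind the slide's figure cannot be recovered from this transcript, so the sketch below uses a hypothetical four-sequence, seven-site alignment constructed so that, on the unrooted tree AB|CD, each site changes exactly once and the branch lengths come out as 2, 1, 2, 1 and 1. It counts parsimony changes with Fitch's algorithm and derives the distance-tree branch lengths with the exact four-sequence formulas, which assume perfectly additive distances. In general, minimum evolution fits branch lengths to each candidate tree (for example by least squares) and prefers the tree with the smallest total; this sketch only evaluates the one tree.

```python
from itertools import combinations

# Hypothetical alignment: four sequences, seven sites, built so that each
# site changes once on the unrooted tree AB|CD.
ALIGNMENT = {
    "A": "GGAAAAT",
    "B": "AAGAAAT",
    "C": "AAAGGAA",
    "D": "AAAAAGA",
}
TREE = (("A", "B"), ("C", "D"))  # where the root sits does not affect the Fitch count


def fitch(tree, site):
    """Fitch's algorithm for one site: return (state set, number of changes)."""
    if isinstance(tree, str):  # a leaf: the observed nucleotide
        return {ALIGNMENT[tree][site]}, 0
    left_states, left_changes = fitch(tree[0], site)
    right_states, right_changes = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


n_sites = len(ALIGNMENT["A"])
parsimony_length = sum(fitch(TREE, site)[1] for site in range(n_sites))

# Pairwise differences: the distance matrix a distance method would use.
d = {a: {} for a in ALIGNMENT}
for a, b in combinations(ALIGNMENT, 2):
    d[a][b] = d[b][a] = sum(x != y for x, y in zip(ALIGNMENT[a], ALIGNMENT[b]))

# Branch lengths for the quartet AB|CD, assuming the distances are additive.
branches = {
    "A": (d["A"]["B"] + d["A"]["C"] - d["B"]["C"]) / 2,
    "B": (d["A"]["B"] + d["B"]["C"] - d["A"]["C"]) / 2,
    "C": (d["C"]["D"] + d["A"]["C"] - d["A"]["D"]) / 2,
    "D": (d["C"]["D"] + d["A"]["D"] - d["A"]["C"]) / 2,
    "internal": (d["A"]["C"] + d["B"]["D"] - d["A"]["B"] - d["C"]["D"]) / 2,
}
distance_length = sum(branches.values())

print("parsimony changes:", parsimony_length)  # 7
print("branch lengths:", branches, "sum:", distance_length)  # sum: 7.0
```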

6 Clustering methods vs. search methods
Clustering methods follow a set of steps (an algorithm) and arrive at a tree:
Advantages:
–Easy to implement, resulting in very fast computer programs
–Always produce a single tree
Disadvantages:
–Results obtained from simple clustering algorithms often depend on the order in which sequences are added to the growing tree
–Do not allow competing hypotheses to be evaluated: two different trees could explain the data equally well, but there is no way of measuring the fit between tree and data
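To make the idea of a clustering algorithm concrete, here is a minimal Python sketch of UPGMA, one of the clustering algorithms listed later in this lecture. It is simplified: it returns only the tree topology as nested tuples (no branch lengths), and the distances are hypothetical. It follows a fixed set of steps and always ends with exactly one tree. When two pairs tie for the smallest distance, `min` keeps whichever pair it meets first, so with tied data the result can depend on the order in which the sequences are listed.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster sequences with UPGMA and return the tree as nested tuples.

    `distances` maps frozenset({name1, name2}) to a distance. Only the
    topology is returned; branch lengths are omitted for brevity.
    """
    d = dict(distances)
    sizes = {name: 1 for name in names}  # current clusters -> number of sequences

    while len(sizes) > 1:
        # Join the closest pair of clusters (ties go to the first pair found).
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        joined = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average of the
        # distances to its two parts.
        for c in sizes:
            d[frozenset((joined, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[joined] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between four sequences.
pairs = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
tree = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
print(tree)  # ('D', ('C', ('A', 'B')))
```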

7 A clustering method
[Figure: stepwise construction of a tree. Start tree: A, B and C. Round 1: decide where to place the next sequence, D, then add it to the tree. Round 2: repeat for E]
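The figure describes adding sequences one at a time to a growing tree. The slide does not say how the placement is decided, so this sketch assumes, for illustration only, that each new sequence goes on the branch giving the lowest parsimony score (Fitch's algorithm). Trees are nested tuples, and the function names (`attach_everywhere`, `stepwise_addition`) and data are hypothetical.

```python
def fitch(tree, alignment, site):
    """Fitch's algorithm for one site: return (possible states, changes)."""
    if isinstance(tree, str):  # a leaf: the observed nucleotide
        return {alignment[tree][site]}, 0
    left_states, left_changes = fitch(tree[0], alignment, site)
    right_states, right_changes = fitch(tree[1], alignment, site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, alignment, site)[1] for site in range(n_sites))


def attach_everywhere(tree, leaf):
    """Yield each tree made by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in attach_everywhere(right, leaf):
            yield (left, new_right)


def stepwise_addition(alignment, order):
    """Build a tree by adding sequences one at a time in the given order."""
    tree = ((order[0], order[1]), order[2])  # start tree of three sequences
    for name in order[3:]:
        # Rooted tuples list some unrooted placements twice; duplicates get
        # the same score, so they only cost a little extra work.
        tree = min(attach_everywhere(tree, name),
                   key=lambda candidate: parsimony_score(candidate, alignment))
    return tree


alignment = {  # hypothetical data
    "A": "GGAAAAT",
    "B": "AAGAAAT",
    "C": "AAAGGAA",
    "D": "AAAAAGA",
}
tree = stepwise_addition(alignment, ["A", "B", "C", "D"])
print(tree, parsimony_score(tree, alignment))  # (((A, B), C), D) with score 7
```

Each round keeps only the single best placement and never revisits earlier choices, so when placements tie, a different input order can lead to a different final tree.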

8 Search methods
Tree-building methods in this class use optimality criteria to choose among the set of all possible trees:
The criterion is used to assign a “score” or “rank” to each tree, which is a function of the relationship between the tree and the data
They require an explicit function relating tree and data (e.g. a model of how sequences evolve)
They allow comparison of how well competing hypotheses of evolutionary relationships fit the data
The major disadvantage is that optimality methods are computationally very expensive, because two problems must be solved:
–For a given data set and tree, what is the optimality value?
–Which of all possible trees has the best optimality value?

9 An optimality method
[Figure: the 15 possible unrooted trees for five sequences (A to E), each annotated with its value under the optimality criterion]
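A minimal Python sketch of an exhaustive optimality search: enumerate all 15 unrooted trees for five sequences, score each one, and keep the best. The slide does not name the criterion, so parsimony (Fitch's algorithm) is assumed here; the alignment and all function names are hypothetical. Each unrooted tree is stored rooted on the branch leading to the first sequence, so that attaching each new sequence to every branch produces every unrooted tree exactly once.

```python
def fitch(tree, alignment, site):
    """Fitch's algorithm for one site: return (possible states, changes)."""
    if isinstance(tree, str):
        return {alignment[tree][site]}, 0
    left_states, left_changes = fitch(tree[0], alignment, site)
    right_states, right_changes = fitch(tree[1], alignment, site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, alignment, site)[1] for site in range(n_sites))


def attach_everywhere(tree, leaf):
    """Yield each tree made by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in attach_everywhere(right, leaf):
            yield (left, new_right)


def all_unrooted_trees(names):
    """Every unrooted binary tree for `names` (at least three), each stored
    as (first name, rooted subtree of the remaining names)."""
    outgroup, first, second, *rest = names
    subtrees = [(first, second)]
    for leaf in rest:
        subtrees = [new for tree in subtrees for new in attach_everywhere(tree, leaf)]
    return [(outgroup, subtree) for subtree in subtrees]


# Hypothetical five-sequence alignment.
ALIGNMENT = {
    "A": "GATAA",
    "B": "GATAA",
    "C": "AAAAG",
    "D": "AGATA",
    "E": "AGATA",
}
trees = all_unrooted_trees(list(ALIGNMENT))
scores = {tree: parsimony_score(tree, ALIGNMENT) for tree in trees}
best = min(scores, key=scores.get)
print(len(trees), "trees; best:", best, "score:", scores[best])
```

Both expensive steps from the previous slide appear here: `parsimony_score` answers "what is the optimality value of this tree?", and the loop over `trees` answers "which tree is best?" by brute force.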

10 Non-deterministic polynomial-completeness problems
Non-deterministic polynomial-complete (NP-complete) problems are a class of problems for which no efficient algorithm is known to exist
The problem of finding the optimal evolutionary tree under a variety of criteria (e.g. minimum evolution, maximum parsimony) is NP-complete:
For even a modest number of sequences (e.g. 20) it is impossible to guarantee that the optimal tree has been found
In such cases we must rely on heuristics to find something approaching the best tree, but the result may be far from optimal
Example: human mitochondrial DNA, for which different researchers obtained quite different trees using different heuristic searches
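A short calculation shows why exhaustive search stops being an option at around 20 sequences. The number of distinct unrooted, fully bifurcating trees for n sequences is (2n - 5)!!, the product 3 × 5 × ... × (2n - 5). This count is not itself a proof of NP-completeness, but it shows how quickly the space of trees grows. The function name is hypothetical.

```python
def n_unrooted_trees(n):
    """Number of unrooted, fully bifurcating trees for n labelled sequences: (2n - 5)!!"""
    if n < 3:
        raise ValueError("need at least three sequences")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (5, 10, 20):
    print(n, n_unrooted_trees(n))
# 5 15
# 10 2027025
# 20 221643095476699771875
```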

11 An heuristic method

12 Subtree methods
The effectiveness of an heuristic search depends in part on the number of trees examined, and examining many trees can be computationally demanding
An alternative approach is to divide the set of sequences into smaller subsets and find the optimal tree for each subset:
The smallest unrooted tree is a quartet (four sequences)
Each quartet has three possible unrooted trees
Quartet puzzling follows two steps:
–For each quartet, identify the optimal tree
–Take all the four-sequence trees from step 1 and assemble them into a single tree
Because of homoplasy, the quartet trees will usually conflict, so no single tree contains them all; the best tree is then the one that contains the most quartet trees (but finding it is also an NP-complete problem)
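A minimal sketch of step 1 only: for every quartet, choose one of its three possible unrooted trees. The slides do not say how each quartet is scored, so as a stand-in chosen for brevity this uses the four-point condition on a distance matrix (for additive distances, the correct split wx|yz has the smallest sum d(w,x) + d(y,z)). Step 2, assembling the quartet trees, is omitted. The distances are hypothetical, generated from the tree ((A,B),C,(D,E)) with every branch of length 1.

```python
from itertools import combinations


def best_quartet_tree(d, quartet):
    """Choose one of the three unrooted trees for four sequences.

    Four-point condition: with additive distances, the true split wx|yz
    has the smallest sum d[w][x] + d[y][z] of the three possible pairings.
    """
    w, x, y, z = quartet
    candidates = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(candidates,
               key=lambda s: d[s[0][0]][s[0][1]] + d[s[1][0]][s[1][1]])


def quartet_trees(d):
    """Step 1: the preferred tree for every quartet of sequences."""
    return {q: best_quartet_tree(d, q) for q in combinations(sorted(d), 4)}


# Hypothetical additive distances from the tree ((A,B),C,(D,E)),
# with every branch of length 1.
pairs = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
    ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2,
}
d = {}
for (a, b), value in pairs.items():
    d.setdefault(a, {})[b] = value
    d.setdefault(b, {})[a] = value

quartets = quartet_trees(d)
for quartet, tree in quartets.items():
    print(quartet, "->", tree)
```

With these error-free distances all five quartet trees agree with one tree; with real data they can conflict, and step 2 then has to reconcile them.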

13 Comparing tree-building methods
Tree-building method (rows) vs. type of data (columns):
Tree-building method | Distances                | Nucleotide sites
Clustering algorithm | UPGMA, Neighbour joining | (none)
Optimality criterion | Minimum evolution        | Maximum parsimony, Maximum likelihood

14 Comparing tree-building methods
Efficiency: effectively, the time a computer program takes to find a tree
Since finding the optimal tree under virtually all optimality criteria is NP-complete, efficient tree-searching algorithms that guarantee finding the best tree are unlikely to exist
Some optimality criteria can be evaluated more quickly than others: in a given time, a heuristic search using parsimony can explore a much larger number of trees than a search using likelihood
Power: a measure of how much data are needed before we can be reasonably sure of arriving at the correct result
A method may be theoretically appealing, but if it requires huge numbers of sites it is not practical

15 Comparing tree-building methods
Consistency: will the method converge on the true tree as more data are added? Inconsistent methods will fail even if data are continually added
Robustness: all tree-building methods make (implicit or explicit) assumptions about evolutionary processes; robustness is how sensitive a method is to violations of its underlying model, which can lead to poor estimates of phylogeny (e.g. violation of the assumption of a molecular clock)
Falsifiability: the ability to tell whether these assumptions have been violated, i.e. whether we should not be using the method at all!

