#31 - Phylogenetics Character-Based Methods

1 #31 - Phylogenetics Character-Based Methods
BCB 444/544, Lecture 31 (11/05/07)
Phylogenetics – Character-Based Methods
#31_Nov05
BCB 444/544 F07 ISU Terribilini; BCB 444/544 Fall 07 Dobbs

2 Required Reading (before lecture)
Fri Oct 30 - Lecture 30: Phylogenetics – Distance-Based Methods
  Chp 11 - pp 142 – 169
Mon Nov 5 - Lecture 31: Phylogenetics – Parsimony and ML
Wed Nov 7 - Lecture 32: Machine Learning
Fri Nov 9 - Lecture 33: Functional and Comparative Genomics
  Chp 17 and Chp 18

3 Assignments & Announcements
Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

4 BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of project, to Drena & Michael
  After response/approval, then:
Part 2 - More detailed outline of project
  Read a few papers and summarize status of problem
  Schedule meeting with Drena & Michael to discuss ideas

5 Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
Nov 7 Wed - BBMB Seminar, 4:10 in 1414 MBB
  Sharon Roth Dent, MD Anderson Cancer Center
  Role of chromatin and chromatin modifying proteins in regulating gene expression
Nov 8 Thurs - BBMB Seminar, 4:10 in 1414 MBB
  Jianzhi George Zhang, U. Michigan
  Evolution of new functions for proteins
Nov 9 Fri - BCB Faculty Seminar, 2:10 in 102 SciI
  Amy Andreotti, ISU
  Something about NMR

6 Chp 11 – Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
  Distance-Based Methods
  Character-Based Methods
  Phylogenetic Tree Evaluation
  Phylogenetic Programs

7 Tree Construction
Two main categories of tree building methods
  Distance-based: overall similarity between sequences
  Character-based: consider the entire MSA

8 Summary of Distance-Based Methods
Clustering-based methods:
  Computationally very fast and can handle large datasets that other methods cannot
  Not guaranteed to find the best tree
Optimality-based methods:
  Better overall accuracies
  Computationally slow
All distance-based methods lose all sequence information and cannot infer the most likely state at an internal node

9 Character-Based Methods
Based directly on the sequence characters in the MSA rather than overall distances
Count mutational events accumulated on sequences
Evolutionary dynamics of each character can be studied and ancestral sequences inferred
Two popular approaches
  Parsimony
  Maximum Likelihood (ML)

10 Parsimony
Parsimony is based on Occam’s Razor – the simplest explanation is most likely correct
Goal: Find the tree that allows evolution of the sequences with the fewest changes

11 Parsimony
Parsimony score of a tree: The smallest (weighted) number of steps required by the tree
Two parsimony problems:
  Large Parsimony problem: Find the tree with the lowest parsimony score
  Small Parsimony problem: Given a tree, find its parsimony score
Use the small parsimony problem to solve the large parsimony problem

12 Algorithms for Small Parsimony
Fitch’s algorithm:
  Based on set operations
  Evolutionary steps have the same weight
Sankoff’s algorithm:
  Based on dynamic programming
  Allows steps to have different weights
Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site
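
A minimal sketch of Fitch's set-based pass for one site. The nested-tuple tree encoding and the names `fitch`, `tree` and `site` are illustrative assumptions, not taken from the lecture.

```python
def fitch(tree, leaf_states):
    """Return (state_set, steps) for one alignment column.

    tree: a leaf name (str) or a pair (left, right) of subtrees (assumed encoding).
    leaf_states: dict mapping leaf name -> observed character at this site.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_steps = fitch(tree[0], leaf_states)
    right_set, right_steps = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


tree = (("a", "b"), ("c", "d"))
site = {"a": "A", "b": "G", "c": "C", "d": "T"}
root_set, steps = fitch(tree, site)
print(steps)  # 3
```

Summing `steps` over all columns of the MSA gives the unweighted parsimony score of the tree.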

13 Fitch’s Algorithm Example

14 Sankoff’s Algorithm
Allows for different weights for different evolutionary steps
Transitions (A <-> G or C <-> T) are more probable than transversions, so give a lower weight to transitions
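
A sketch of Sankoff's dynamic programming for one site. The weights (1 for a transition, 2 for a transversion), the nested-tuple tree encoding and all function names are assumptions chosen for illustration.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def cost(x, y, transition=1, transversion=2):
    """Weight of changing x into y (the weights are assumed example values)."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return transition if same_class else transversion


def sankoff(tree, leaf_states):
    """Return {base: minimum cost of the subtree if its root carries that base}."""
    if isinstance(tree, str):
        # A leaf costs 0 for its observed base and is impossible for the others.
        return {b: (0 if b == leaf_states[tree] else float("inf")) for b in BASES}
    child_tables = [sankoff(child, leaf_states) for child in tree]
    return {
        b: sum(min(cost(b, c) + table[c] for c in BASES) for table in child_tables)
        for b in BASES
    }


table = sankoff((("a", "b"), ("c", "d")), {"a": "A", "b": "G", "c": "C", "d": "T"})
score = min(table.values())
print(score)  # 4: two transitions (1 each) plus one transversion (2)
```

The sketch returns only the score. Recovering the ancestral states would need a traceback that records which child state achieved each minimum.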

15 Sankoff’s Algorithm Example

16 Sankoff’s Algorithm Traceback

17 Searching for a Most Parsimonious Tree
Solving the large parsimony problem requires searching all possible trees (or does it?)
  Exhaustive search (exact)
  Branch-and-Bound (exact)
  Heuristic search methods (not exact)

18 Exhaustive Search
Build the only possible unrooted tree for three taxa (the three taxa can be chosen at random)
Try all possible places to add the fourth taxon and score each tree
Try all places to add the fifth taxon to the trees and score again
…
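
The enumeration described above can be sketched as follows. Each unrooted tree is written as `(first_taxon, rest)` with `rest` a nested tuple, which is an assumed encoding. `score` stands for any tree-scoring function, for example a Fitch-based parsimony count, and is a hypothetical parameter.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)  # attach on the edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def all_trees(taxa):
    """Yield every unrooted binary tree on `taxa` (needs at least 3 taxa)."""
    def grow(rest, remaining):
        if not remaining:
            yield (taxa[0], rest)
            return
        for bigger in insertions(rest, remaining[0]):
            yield from grow(bigger, remaining[1:])

    yield from grow((taxa[1], taxa[2]), list(taxa[3:]))


def exhaustive_search(taxa, score):
    """Score every tree and return (best score, list of all best trees)."""
    best_score, best_trees = float("inf"), []
    for tree in all_trees(taxa):
        s = score(tree)
        if s < best_score:
            best_score, best_trees = s, [tree]
        elif s == best_score:
            best_trees.append(tree)
    return best_score, best_trees


four = list(all_trees(["a", "b", "c", "d"]))
five = list(all_trees(["a", "b", "c", "d", "e"]))
print(len(four), len(five))  # 3 15
```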

19 Why Finding a True Tree is Difficult
Number of rooted ($N_r$) and unrooted ($N_u$) trees for $n$ species (or sequences)
The number of possible trees grows faster than exponentially with $n$
$N_r = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
$N_u = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
To find the best tree, you must explore all possibilities (or must you?)
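
A quick way to see how fast these counts grow is to evaluate the two formulas directly. The function names are illustrative.

```python
from math import factorial


def num_rooted(n):
    """Number of rooted binary trees on n taxa (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted(n):
    """Number of unrooted binary trees on n taxa (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, num_rooted(n), num_unrooted(n))
```

For 10 taxa this already gives 34,459,425 rooted and 2,027,025 unrooted trees.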

20 Adding the Fourth Taxon

21 Adding the Fifth Taxon

23 Branch and Bound
Similar to exhaustive search except that we maintain the score of the best tree obtained so far
If the score of the current tree exceeds the current best score, backtrack and take the next available path
Main idea: The parsimony score of a tree cannot decrease when we add another taxon

24 Branch and Bound
When a tip of the search tree is reached, the tree is either optimal (and retained) or suboptimal (and rejected)
When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most parsimonious trees will have been identified

25 Branch and Bound

26 Branch and Bound
One way to find a reasonable upper bound quickly:
  Use UPGMA or NJ to build a complete tree
  Calculate the parsimony score of this tree and use it as the initial upper bound (best score so far) in our search
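
A compact sketch of the branch-and-bound search, using unweighted Fitch counts as the score. The tree encoding `(first_taxon, rest)`, the toy alignment and every function name are assumptions. `upper_bound` is where the score of a quickly built tree (for example from NJ) would be passed in.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def fitch_steps(tree, column):
    """Fitch pass for one site; returns (state set, number of changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch_steps(tree[0], column), fitch_steps(tree[1], column)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def parsimony_score(tree, alignment):
    """Sum of Fitch steps over all sites; only taxa present in `tree` are read."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {t: seq[i] for t, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def branch_and_bound(alignment, upper_bound=float("inf")):
    taxa = sorted(alignment)
    best = {"score": upper_bound, "trees": []}

    def extend(rest, remaining):
        tree = (taxa[0], rest)
        score = parsimony_score(tree, alignment)
        if score > best["score"]:
            return  # adding taxa cannot lower the score, so prune this path
        if not remaining:
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(tree)
            return
        for bigger in insertions(rest, remaining[0]):
            extend(bigger, remaining[1:])

    extend((taxa[1], taxa[2]), taxa[3:])
    return best["score"], best["trees"]


alignment = {"a": "AAC", "b": "AAC", "c": "GGT", "d": "GGT", "e": "GGC"}
best_score, best_trees = branch_and_bound(alignment)
print(best_score, len(best_trees))
```

Paths are pruned only when the partial score is strictly greater than the bound, so trees that tie with the best score are all kept. If `upper_bound` is set below the true optimum, the search returns no trees.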

27 Heuristic Search
Shortcuts have been designed to reduce the search space
Idea: Build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees
Branch swapping
  Nearest neighbor interchange
  Subtree pruning and regrafting
  Tree bisection and reconnection
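
As an illustration of branch swapping, here is a sketch of nearest neighbor interchange on rooted nested-tuple trees, with a simple hill climb on top. The encoding, the function names and the `score` callable are assumptions. For an unrooted tree written as `(first_taxon, rest)`, applying `nni_neighbors` to `rest` gives its 2(n-3) NNI neighbors.

```python
def nni_neighbors(tree):
    """Yield the trees one nearest neighbor interchange away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if isinstance(left, tuple):
        l1, l2 = left
        yield ((right, l2), l1)  # swap l1 with the sibling subtree
        yield ((l1, right), l2)  # swap l2 with the sibling subtree
    if isinstance(right, tuple):
        r1, r2 = right
        yield (r1, (left, r2))   # swap r1 with the sibling subtree
        yield (r2, (r1, left))   # swap r2 with the sibling subtree
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def hill_climb(start, score):
    """Move to the best-scoring NNI neighbor until no neighbor improves."""
    current, current_score = start, score(start)
    while True:
        candidates = [(score(t), t) for t in nni_neighbors(current)]
        better = [c for c in candidates if c[0] < current_score]
        if not better:
            return current_score, current
        current_score, current = min(better, key=lambda pair: pair[0])


print(list(nni_neighbors(("b", ("c", "d")))))
```

Like any hill climb, this can stop at a local optimum, which is why the heuristic is not exact.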

28 Nearest-Neighbor Interchange

29 Subtree Pruning and Regrafting

30 Tree Bisection and Reconnection

31 Stepwise Addition – Another Heuristic
A greedy method
Start with 3-taxon tree
Add one taxon at a time
Keep only the best tree found so far
No guarantee of optimality, but may provide a good starting point for a search

32 Maximum Likelihood Method
ML is based on a Markov model of evolution
  Observed: The species labeling the leaves
  Hidden: The ancestral states
  Transition probabilities: The mutation probabilities
Assumptions:
  Only mutations are allowed
  Sites are independent

33 Models of Evolution at a Site
Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$
Where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
Branches may have different lengths
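
A small sketch of such a matrix and of how a longer branch can be handled. The one-parameter model (every substitution equally likely, total probability `p` per time unit) and the function names are assumptions. A branch of `t` whole time units uses the matrix multiplied by itself `t` times.

```python
BASES = "ACGT"


def one_step_matrix(p):
    """m[i][j] for one time unit: stay with 1 - p, each change with p / 3."""
    return {i: {j: (1 - p if i == j else p / 3) for j in BASES} for i in BASES}


def multiply(a, b):
    """Matrix product of two base-indexed transition matrices."""
    return {
        i: {j: sum(a[i][k] * b[k][j] for k in BASES) for j in BASES}
        for i in BASES
    }


def matrix_for_branch(m, t):
    """Transition probabilities along a branch of t whole time units."""
    result = {i: {j: float(i == j) for j in BASES} for i in BASES}  # identity
    for _ in range(t):
        result = multiply(result, m)
    return result


m = one_step_matrix(0.1)
m2 = matrix_for_branch(m, 2)
print(round(m2["A"]["A"], 6), round(m2["A"]["G"], 6))
```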

34 The Probability of an Assignment
[Figure: tree with root T, internal nodes G and T, and leaves A, G, C, T]
$\text{Probability} = m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$
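
The product above can be evaluated for any assignment of the three internal nodes. The toy matrix (0.9 to stay, 0.1/3 for each change) and the function name are assumptions for illustration, and the tree shape is read off the formula: root X with children Y and Z, Y above leaves A and G, Z above leaves C and T.

```python
BASES = "ACGT"
# Assumed toy model: a base is unchanged with probability 0.9 per branch.
m = {i: {j: (0.9 if i == j else 0.1 / 3) for j in BASES} for i in BASES}


def assignment_probability(m, x, y, z, leaves=("A", "G", "C", "T")):
    """Product of the six branch probabilities for internal states x, y, z."""
    l1, l2, l3, l4 = leaves
    return m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]


p = assignment_probability(m, "T", "G", "T")
print(p)
```

Like the formula on the slide, this sketch does not include a prior probability for the state at the root.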

35 Ancestral Reconstruction: Most Likely Assignment
[Figure: tree with root X, internal nodes Y and Z, and leaves A, G, C, T]
$L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using Viterbi algorithm
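
A Viterbi-style (max-product) recursion over the tree avoids enumerating every combination of X, Y and Z. In this sketch leaves are written directly as their observed base, and the toy matrix and function name are assumptions.

```python
BASES = "ACGT"
# Assumed toy model: a base is unchanged with probability 0.9 per branch.
m = {i: {j: (0.9 if i == j else 0.1 / 3) for j in BASES} for i in BASES}


def max_product(tree, m):
    """Return {state: best probability of the subtree below, given that state}."""
    if isinstance(tree, str):  # leaf labelled with its observed base
        return {s: (1.0 if s == tree else 0.0) for s in BASES}
    tables = [max_product(child, m) for child in tree]
    result = {}
    for s in BASES:
        value = 1.0
        for table in tables:
            # Best choice of child state, given state s at this node.
            value *= max(m[s][c] * table[c] for c in BASES)
        result[s] = value
    return result


tree = (("A", "G"), ("C", "T"))
best = max(max_product(tree, m).values())
print(best)
```

Only the maximum value is returned here. Recovering the most likely X, Y and Z would need a traceback that stores the maximizing child state at each node.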

36 Likelihood of a Tree
[Figure: tree with root X, internal nodes Y and Z, and leaves A, G, C, T]
$L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using forward algorithm
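
Replacing the maximum by a sum gives the forward-style (sum-product) recursion for the likelihood at one site. As before, leaves are written as their observed base, and the toy matrix and function name are assumptions.

```python
BASES = "ACGT"
# Assumed toy model: a base is unchanged with probability 0.9 per branch.
m = {i: {j: (0.9 if i == j else 0.1 / 3) for j in BASES} for i in BASES}


def conditional_likelihoods(tree, m):
    """Return {state: probability of the leaves below, given that state}."""
    if isinstance(tree, str):  # leaf labelled with its observed base
        return {s: (1.0 if s == tree else 0.0) for s in BASES}
    tables = [conditional_likelihoods(child, m) for child in tree]
    result = {}
    for s in BASES:
        value = 1.0
        for table in tables:
            # Sum over every possible state of the child.
            value *= sum(m[s][c] * table[c] for c in BASES)
        result[s] = value
    return result


tree = (("A", "G"), ("C", "T"))
site_likelihood = sum(conditional_likelihoods(tree, m).values())
print(site_likelihood)
```

This follows the slide's formula, which sums over the root state without a prior; weighting each root state by a base frequency is a common refinement. Because sites are assumed independent, the likelihood of a whole alignment is the product of the per-site values.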

37 Maximum Likelihood Comments
ML is robust
ML converges to the correct answer as more data is added
Can put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
ML can be slow

38 Phylogenetic Tree Evaluation
Bootstrapping
Jackknifing
Bayesian Simulation
Statistical difference tests (are two trees significantly different?)
  Kishino-Hasegawa Test (paired t-test)
  Shimodaira-Hasegawa Test (χ² test)

39 Bootstrapping
A bootstrap sample is obtained by sampling sites randomly with replacement
Obtain a data matrix with the same number of taxa and number of characters as the original one
Construct trees for samples
For each branch in the original tree, compute the fraction of bootstrap samples in which that branch appears
  Assigns a bootstrap support value to each branch
Idea: If a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples
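
The resampling and counting steps can be sketched as below. `build_tree` is a hypothetical callable standing in for any tree-building method that returns a nested-tuple tree, and groupings are compared as rooted clades for simplicity, which is an assumption; a real analysis of unrooted trees would compare splits.

```python
import random


def bootstrap_sample(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {t: "".join(seq[i] for i in columns) for t, seq in alignment.items()}


def clades(tree):
    """Return (leaves below this node, set of leaf sets of all internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clades | right_clades | {leaves}


def bootstrap_support(alignment, build_tree, replicates=100, seed=0):
    """Fraction of replicate trees that contain each clade of the original tree."""
    rng = random.Random(seed)
    reference = {
        c for c in clades(build_tree(alignment))[1] if len(c) < len(alignment)
    }
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        found = clades(build_tree(bootstrap_sample(alignment, rng)))[1]
        for clade in reference & found:
            counts[clade] += 1
    return {clade: n / replicates for clade, n in counts.items()}


aln = {"a": "ACGT", "b": "AGGT", "c": "TCGA"}
print(bootstrap_sample(aln, random.Random(1)))
```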

40 Bootstrapping Comments
Bootstrapping doesn’t really assess the accuracy of a tree; it only indicates the consistency of the data
To get reliable statistics, bootstrapping needs to be done on your tree 500 – 1000 times. This is a big problem if your tree took a few days to construct

41 Jackknifing
Another resampling technique
Randomly delete half of the sites in the dataset
Construct a new tree with this smaller dataset, see how often taxa are grouped
Advantage – sites aren’t duplicated
Disadvantage – again really only measuring consistency of the data
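
The only change from bootstrapping is the sampling step, sketched here. The function name and the `keep_fraction` parameter are illustrative assumptions; sites are drawn without replacement, so none is duplicated.

```python
import random


def jackknife_sample(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of the columns, in their original order."""
    length = len(next(iter(alignment.values())))
    kept = sorted(rng.sample(range(length), int(length * keep_fraction)))
    return {t: "".join(seq[i] for i in kept) for t, seq in alignment.items()}


aln = {"a": "ABCDEFGH", "b": "abcdefgh"}
print(jackknife_sample(aln, random.Random(0)))
```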

42 Bayesian Simulation
Using a Bayesian ML method to produce a tree automatically calculates the probability of many trees during the search
Most trees sampled in the Bayesian ML search are near an optimal tree

43 Phylogenetic Programs
Huge list at:
PAUP* – one of the most popular programs, commercial, Mac and Unix only, nice user interface
PHYLIP – free, multiplatform, a bit difficult to use but web servers make it easier
WebPhylip – another interface for PHYLIP online

44 Phylogenetic Programs
TREE-PUZZLE – uses a heuristic to allow ML on large datasets, also available as a web server
PHYML – web based, uses genetic algorithm
MrBayes – Bayesian program, fast and can handle large datasets, multiplatform download
BAMBE – web based Bayesian program

45 Final Comments on Phylogenetics
No method is perfect
Different methods make very different assumptions
If multiple methods using different assumptions come up with similar results, we should trust the results more than those of any single method

