Phylogenetic Trees - Parsimony (Tutorial #12)


1 Phylogenetic Trees - Parsimony
Tutorial #12
Next semester: project in advanced algorithms for phylogenetic reconstruction (236512).
Initial details in: http://www.cs.technion.ac.il/~moran/lab06.htm (come to me for more details).

2 Phylogenetic Reconstruction
We’d like to study the evolutionary history of species.
Distance-based approach:
– Calculate (ML) pairwise evolutionary distances between species
– Find the edge-weighted tree that best describes this metric
– Major drawback: loss of information when reducing the data to pairwise distances
Character-based approach:
– Consider the character vector of each species (morphological or bio-molecular characters)
– Optimization criteria: parsimony, likelihood / posterior probability

3 Most Parsimonious Tree
Parsimony score: the number of character changes (mutations) along the evolutionary tree (a tree that carries labels on its internal vertices).
Most parsimonious tree: the tree with the minimal parsimony score (minimal evolution principle).
[Figure: example over the leaves AGA, AAA, AAG, GGA, with labeled internal vertices and a change count on each edge; one labeled tree has score 4, the other score 3.]
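The definition of the parsimony score can be sketched in a few lines of Python. The tree below is a hypothetical labeling over the four leaves of the example (the slide's figure did not survive extraction, so the topology and internal labels are assumptions), and the names `hamming` and `parsimony_score` are illustrative only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length character vectors differ."""
    if len(a) != len(b):
        raise ValueError("character vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges):
    """Sum of character changes over all (parent_label, child_label) edges."""
    return sum(hamming(parent, child) for parent, child in edges)


# Hypothetical fully labeled tree (an assumption, not the slide's figure):
# root AAA with internal children AGA and AAA;
# AGA has leaves AGA and GGA, AAA has leaves AAA and AAG.
edges = [
    ("AAA", "AGA"),  # root -> internal node, 1 change
    ("AGA", "AGA"),  # internal -> leaf, 0 changes
    ("AGA", "GGA"),  # internal -> leaf, 1 change
    ("AAA", "AAA"),  # root -> internal node, 0 changes
    ("AAA", "AAA"),  # internal -> leaf, 0 changes
    ("AAA", "AAG"),  # internal -> leaf, 1 change
]

print(parsimony_score(edges))
```

Representing the tree as a list of labeled edges is enough here because the score depends only on the label pairs at the two ends of each edge.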

4 Small vs. Large Parsimony
We break the problem into two:
1. Small parsimony: given the topology, find the best assignment to the internal nodes
2. Large parsimony: find the topology that gives the best score
– Large parsimony is NP-hard
– We’ll show solutions to small parsimony (Fitch’s and Sankoff’s algorithms)
Input to small parsimony: a tree with character-state assignments to the leaves.
Example (leaves: Aardvark, Bison, Chimp, Dog, Elephant):
A: CAGGTA
B: CAGACA
C: CGGGTA
D: TGCACT
E: TGCGTA

5 Fitch’s Algorithm
Execute independently for each character (dynamic programming framework):
1. Bottom-up phase: determine the set of possible states for each internal node
2. Top-down phase: pick a state for each internal node
[Figure: tree over Aardvark, Bison, Chimp, Dog, Elephant with leaf sequences CAGGTA, CAGACA, CGGGTA, TGCACT, TGCGTA; the third character (G, G, G, C, C) is highlighted.]
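Because each character is handled independently, the score of a whole alignment is the sum of per-column scores. The sketch below illustrates this on the five example sequences. The topology `(((A, B), C), (D, E))` is an assumption made for illustration (the slide's tree figure is not recoverable), and `fitch_score` uses the standard Fitch intersection/union rule for the bottom-up phase.

```python
def fitch_score(tree, column):
    """Unweighted parsimony score of one character.

    tree   -- nested 2-tuples of taxon names (binary tree)
    column -- dict mapping taxon name to its state for this character
    """
    def visit(node):
        if isinstance(node, str):            # leaf: singleton set, no cost
            return {column[node]}, 0
        (r1, s1), (r2, s2) = visit(node[0]), visit(node[1])
        common = r1 & r2
        if common:                           # intersection is free
            return common, s1 + s2
        return r1 | r2, s1 + s2 + 1          # each union costs one change

    return visit(tree)[1]


def alignment_score(tree, seqs):
    """Sum the per-character scores over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_score(tree, {name: seq[pos] for name, seq in seqs.items()})
        for pos in range(length)
    )


seqs = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA", "D": "TGCACT", "E": "TGCGTA"}
tree = ((("A", "B"), "C"), ("D", "E"))   # assumed topology, for illustration only

print(alignment_score(tree, seqs))
```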

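6 Fitch’s Algorithm: Bottom-up phase
Determine the set of possible states for each internal node.
Initialization: $R_i = \{s_i\}$ for each leaf $i$.
Do a post-order (from leaves to root) traversal of the tree:
– Determine $R_i$ of internal node $i$ with children $j, k$:
  $R_i = R_j \cap R_k$ if $R_j \cap R_k \neq \emptyset$, and $R_i = R_j \cup R_k$ otherwise
Parsimony score = number of union operations.
[Figure: example tree annotated with the state sets computed bottom-up; score = 3.]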
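A minimal sketch of the bottom-up phase for a single character, assuming a binary tree written as nested 2-tuples whose leaves are one-character strings. The example tree is hypothetical (it is not the tree from the slide's figure), and the function name is illustrative.

```python
def fitch_bottom_up(tree):
    """Return (R, score) for the root of `tree`.

    R is the set of candidate states and score counts the union operations
    performed in the subtree.
    """
    if isinstance(tree, str):                 # leaf: R_i = {s_i}
        return {tree}, 0
    left, right = tree
    r_left, score_left = fitch_bottom_up(left)
    r_right, score_right = fitch_bottom_up(right)
    common = r_left & r_right
    if common:                                # non-empty intersection
        return common, score_left + score_right
    return r_left | r_right, score_left + score_right + 1


# Hypothetical example tree over the leaf states C, T, T, A, G, T.
tree = ((("C", "T"), "T"), (("A", "G"), "T"))
states, score = fitch_bottom_up(tree)
print(sorted(states), score)
```

The recursion visits children before their parent, which is exactly a post-order traversal.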

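7 Fitch’s Algorithm: Top-down phase
Pick a state for each internal node.
Pick an arbitrary state in $R_{root}$ for the root.
Do a pre-order (from root to leaves) traversal of the tree:
– Determine $s_j$ of internal node $j$ with parent $i$:
  $s_j = s_i$ if $s_i \in R_j$, otherwise $s_j$ is an arbitrary state in $R_j$
Complexity: $O(mnk)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states.
[Figure: the same example tree with the chosen states; score = 3.]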
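Both phases can be combined in a short sketch. The tree encoding (nested 2-tuples with one-character leaves), the example tree, and the function names are assumptions made for illustration; "arbitrary" choices are made deterministic here by taking the alphabetically smallest state.

```python
def bottom_up(tree):
    """Annotate every node as (state_set, children); children is () for a leaf."""
    if isinstance(tree, str):
        return ({tree}, ())
    left, right = bottom_up(tree[0]), bottom_up(tree[1])
    common = left[0] & right[0]
    return (common or (left[0] | right[0]), (left, right))


def top_down(node, parent_state=None):
    """Return the tree as (state, children) with one state chosen per node."""
    states, children = node
    if parent_state in states:
        state = parent_state          # keep the parent's state when allowed
    else:
        state = min(states)           # arbitrary choice, made deterministic
    return (state, tuple(top_down(child, state) for child in children))


def count_changes(labeled):
    """Number of edges whose two endpoints carry different states."""
    state, children = labeled
    return sum((child[0] != state) + count_changes(child) for child in children)


# Hypothetical example tree over the leaf states C, T, T, A, G, T.
tree = ((("C", "T"), "T"), (("A", "G"), "T"))
labeled = top_down(bottom_up(tree))
print(labeled[0], count_changes(labeled))
```

Counting the changes on the labeled tree gives the same value as the number of union operations in the bottom-up phase, which is a quick sanity check on the assignment.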

8 Weighted Parsimony: Sankoff’s Algorithm
Each mutation a↔b has its own cost, $S(a,b)$.
1. Bottom-up phase: determine $R_i(s)$, the cost of an optimal state assignment for the subtree of $i$ when $i$ is assigned state $s$
2. Top-down phase: pick optimal states for each internal node
Fitch’s algorithm as a special case: $R_i$ is the set of states that yield a minimal-cost subtree of $i$.
Same as the algorithm for optimal lifted tree alignment (Tutorial #4).

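9 Sankoff’s Algorithm: Bottom-up phase
Determine $R_i(s)$ for each internal node.
Initialization: for each leaf $i$, $R_i(s) = 0$ if $s = s_i$ and $R_i(s) = \infty$ otherwise.
Do a post-order (from leaves to root) traversal of the tree:
– Determine $R_i$ of internal node $i$ with children $j, k$:
  $R_i(s) = \min_{s'} [R_j(s') + S(s,s')] + \min_{s'} [R_k(s') + S(s,s')]$
– Remember the pointers $s \to s'$ (the minimizing $s'$ for each child)
– Natural generalization for non-binary trees: sum over all children
[Figure: example tree; recoverable labels: C, T, A, G, T, T.]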
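A minimal sketch of the bottom-up recursion, written so that it also covers non-binary trees (each node is a tuple of any number of children). The cost function `ts_tv_cost` (transitions cost 1, transversions cost 2) and the example tree are assumptions chosen for illustration; they do not come from the slides.

```python
import math

PURINES = {"A", "G"}


def ts_tv_cost(a, b):
    """Hypothetical cost matrix S(a, b): 0 if equal, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def sankoff_bottom_up(tree, states, cost):
    """Return the table {s: R(s)} for the root of `tree`."""
    if isinstance(tree, str):                       # leaf: 0 for its own state
        return {s: (0 if s == tree else math.inf) for s in states}
    tables = [sankoff_bottom_up(child, states, cost) for child in tree]
    return {
        s: sum(min(t[s2] + cost(s, s2) for s2 in states) for t in tables)
        for s in states
    }


# Hypothetical example tree over the leaf states C, T, T, A, G, T.
tree = ((("C", "T"), "T"), (("A", "G"), "T"))
table = sankoff_bottom_up(tree, "ACGT", ts_tv_cost)
print(table)
```

With unit costs (`lambda a, b: int(a != b)`) the minimum of the root table coincides with the unweighted parsimony score of the same tree.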

10 Sankoff’s Algorithm: Top-down phase
Pick a state for each internal node.
Select the minimal-cost state for the root (the $s$ minimizing $R_{root}(s)$).
Do a pre-order (from root to leaves) traversal of the tree:
– For internal node $j$ with parent $i$, select the state that produced the minimal cost at $i$ (use the pointers kept in the 1st stage)
Complexity: $O(mnk^2)$, where $m$ = #characters, $n$ = #taxa/nodes, $k$ = #states.
[Figure: example tree; recoverable labels: C, T, A, G, T, T.]
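The pointer-based traceback can be sketched as follows. The bottom-up pass is repeated here so the example is self-contained; the data layout `(table, pointers, children)`, the cost function, and the example tree are all assumptions made for illustration.

```python
import math


def ts_tv_cost(a, b):
    """Hypothetical S(a, b): 0 if equal, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if (a in "AG") == (b in "AG") else 2


def bottom_up(tree, states, cost):
    """Return (table, pointers, children); pointers[s] lists the best child states."""
    if isinstance(tree, str):
        return ({s: (0 if s == tree else math.inf) for s in states}, {}, ())
    children = tuple(bottom_up(child, states, cost) for child in tree)
    table, pointers = {}, {}
    for s in states:
        best = [min(states, key=lambda s2: child[0][s2] + cost(s, s2))
                for child in children]
        pointers[s] = best
        table[s] = sum(child[0][b] + cost(s, b) for child, b in zip(children, best))
    return (table, pointers, children)


def top_down(node, state=None):
    """Follow the stored pointers from the root; return (state, children)."""
    table, pointers, children = node
    if state is None:                          # root: state minimizing R_root(s)
        state = min(table, key=table.get)
    if not children:
        return (state, ())
    return (state, tuple(top_down(child, s2)
                         for child, s2 in zip(children, pointers[state])))


def total_cost(labeled, cost):
    """Weighted cost of a fully labeled tree."""
    state, children = labeled
    return sum(cost(state, child[0]) + total_cost(child, cost) for child in children)


tree = ((("C", "T"), "T"), (("A", "G"), "T"))   # hypothetical example tree
labeled = top_down(bottom_up(tree, "ACGT", ts_tv_cost))
print(labeled[0], total_cost(labeled, ts_tv_cost))
```

Ties are broken by the order of `states`, so a different ordering may produce a different, equally cheap assignment.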

11 Fitch’s Algorithm as a Special Case of Sankoff’s Algorithm
Unweighted parsimony:
Sankoff’s algorithm:
– $R_i(s)$: cost of an optimal subtree of $i$ when $i$ is assigned state $s$
Fitch’s algorithm:
– Score(i): cost of an optimal state assignment for the subtree of $i$
– $R_i$: set of states assigned to $i$ by optimal state assignments for the subtree of $i$
We need to show that:
1. An optimal tree assigns node $i$ a state from $R_i$.
2. Fitch’s bottom-up recursive formula for $R_i$ is correct: $R_i = R_j \cap R_k$ if the intersection is non-empty, and $R_i = R_j \cup R_k$ otherwise (check for yourselves).

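12 Fitch’s Algorithm as a Special Case of Sankoff’s Algorithm (cont.)
Unweighted parsimony:
– Score(i): cost of an optimal state assignment for the subtree of $i$
– $R_i$: set of states assigned to $i$ by optimal state assignments for the subtree of $i$
We need to show that:
1. An optimal tree assigns node $i$ a state from $R_i$.
– Trivially true for the root.
– Assume (to the contrary) that in an optimal assignment some node $j$, with parent $i$, is assigned $s_j \notin R_j$.
– $s_j \notin R_j \Rightarrow R_j(s_j) \geq \mathrm{Score}(j) + 1$, because the parsimony score is an integer.
– Hence, by switching from $s_j$ to some $s \in R_j$ we do not raise the parsimony score: the subtree of $j$ becomes cheaper by at least 1, and the edge $(i, j)$ becomes more expensive by at most 1.
Why is this not the case for the weighted version?
[Figure: tree with the root, node $i$, and its child $j$ marked.]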
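The exchange step can be written out as a pair of inequalities. This is a sketch under two assumptions not stated on the slide: $s_i$ denotes the state of the parent $i$ of $j$, and $\mathbf{1}[\cdot]$ is 1 when its condition holds and 0 otherwise. Only the subtree of $j$ and the edge $(i, j)$ are affected by the switch.

```latex
% Assumptions: s_i is the state of j's parent i; 1[.] is the 0/1 indicator;
% s is any state in R_j, so R_j(s) = Score(j).
\begin{align*}
\text{before the switch:}\quad
  & R_j(s_j) + \mathbf{1}[s_i \neq s_j] \;\geq\; R_j(s_j) \;\geq\; \mathrm{Score}(j) + 1, \\
\text{after the switch:}\quad
  & R_j(s) + \mathbf{1}[s_i \neq s] \;=\; \mathrm{Score}(j) + \mathbf{1}[s_i \neq s] \;\leq\; \mathrm{Score}(j) + 1 .
\end{align*}
```

In the weighted version the second line becomes $\mathrm{Score}(j) + S(s_i, s)$, and nothing bounds $S(s_i, s)$ by the saving made inside the subtree, so the same argument does not go through.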

13 Exploring the Space of Trees
We saw how to find an optimal state assignment for a given tree topology.
We need to explore the space of topologies.
Given n sequences there are (2n-3)!! possible rooted trees and (2n-5)!! possible unrooted trees:

taxa (n)   # rooted trees   # unrooted trees
3          3                1
4          15               3
5          105              15
6          945              105
8          135,135          10,395
10         34,459,425       2,027,025
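The counts in the table follow directly from the two double-factorial formulas. A short sketch (function names are illustrative) that reproduces them:

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n):
    """Number of rooted binary tree topologies on n labeled taxa: (2n-3)!!."""
    return double_factorial(2 * n - 3)


def num_unrooted_trees(n):
    """Number of unrooted binary tree topologies on n labeled taxa: (2n-5)!!."""
    return double_factorial(2 * n - 5)


for n in (3, 4, 5, 6, 8, 10):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

The growth is super-exponential in n, which is why exhaustive enumeration of topologies is only feasible for a handful of taxa.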

14 Exploring the Space of Trees (cont.)
Possible solutions:
1. Heuristic solutions for “traveling” through “topology space”
2. Find a (basic) topology using distance-based methods (NJ)
Notice another problem:
– We obtain state assignments to taxa using multiple alignment
– We obtain an optimal MA using the topology of the phylogenetic tree (e.g. CLUSTAL)
Solution: again, use some initial topology (via NJ).
[Figure: a multiple alignment whose columns are the characters $C_1, C_2, \ldots, C_m$.]

