UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter 17.4-6: Strings and Evolutionary Trees

UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter : Strings and Evolutionary Trees Lecturer: Dr. Rose Slides by: Dr. Rose April 10, 2007

Ultrametric Problem Centrality
Four related tree problems:
1. Ultrametric
2. Additive
3. Binary perfect phylogeny
4. Tree compatibility
All can be solved as ultrametric tree problems. Recall that tree compatibility reduces to perfect phylogeny. Now we reduce the additive tree and (binary) perfect phylogeny problems to the ultrametric tree problem.

Ultrametric Problem: Additive Trees
Goal: reduce the additive tree problem to the ultrametric problem.
Complexity: O(n²) reduction.
Approach: create a matrix D´ such that D´ is ultrametric ⇔ D is additive.
We will start by describing a reduction that involves a tree T for D and a tree T´ for D´. We will then describe a direct reduction of D to D´.

Ultrametric Problem: Additive Trees
Assume that D is additive.
Assume that we know of an additive tree T for D.
Assume that each of the n taxa in D labels a leaf of T.
Idea: label the nodes of T to create an ultrametric tree T´.
Q: How can we do this?

Ultrametric Problem: Additive Trees
A: we will do the following:
– Select one node as the root.
– Stretch the leaf edges so that all leaves are equidistant from the root.
Let v be the row of D containing the largest entry, and let m_v denote the value of this entry. Select node v as the root of T. This creates a directed (rooted) tree.

Ultrametric Problem: Additive Trees
Example: A is the row of D containing the largest entry. Select node A as the root of T.

Ultrametric Problem: Additive Trees
Stretch the leaf edges:
– For each leaf i, add m_A − D(A, i) to the edge into leaf i.
– Every leaf is now at distance m_A from the root A.

Ultrametric Problem: Additive Trees
The resulting tree T´ is:
– a rooted, edge-weighted tree;
– one in which every leaf is at distance m_v from the root;
– one in which each internal node is equidistant from all leaves in its subtree.

Ultrametric Problem: Additive Trees
Since each internal node is equidistant from the leaves in its subtree, label each internal node by this unique distance. These labels define an ultrametric matrix D´: D´(i, j) is the label of the least common ancestor of leaves i and j in T´.
Q: How can we go directly from matrix D to matrix D´ without constructing T and T´?

Ultrametric Problem: Additive Trees
Consider leaves i and j in T:
– Let node w be their least common ancestor.
– Let x be the distance from the root v to w.
– Let y be the distance from w to leaf i.
– Let z be the distance from w to leaf j.

Ultrametric Problem: Additive Trees
Q: What is the distance from w to i in T´?
A: y + m_v − D(v, i).
Q: Where does m_v − D(v, i) come from?
A: Recall that we added m_v − D(v, i) to the edge into leaf i when stretching the leaf edges.

Ultrametric Problem: Additive Trees
Gusfield presents the following lemma: without knowing T or T´ explicitly, we can deduce that
D´(i, j) = m_v + (D(i, j) − D(v, i) − D(v, j))/2
Q: Is this equation correct? With z the distance from w to leaf j, we have D(i, j) = y + z, D(v, i) = x + y and D(v, j) = x + z, so
D´(i, j) = m_v + ((y + z) − (x + y) − (x + z))/2 = m_v − 2x/2 = m_v − x.
This is exactly the distance from w to leaf i in T´, since y + m_v − D(v, i) = y + m_v − (x + y) = m_v − x. So the lemma is correct. The tempting alternative 2m_v + D(i, j) − D(v, i) − D(v, j) = 2m_v − 2x is the length of the path between leaves i and j in T´, i.e., 2D´(i, j), not D´(i, j).

Ultrametric Problem: Additive Trees
This brings us to the following theorem:
Theorem: If D is an additive matrix, then D´ is ultrametric, where D´(i, j) = m_v + (D(i, j) − D(v, i) − D(v, j))/2.
Proof. We have shown that:
– D´(i, j) = y + m_v − D(v, i)
– y = D(v, i) − x
– x = (D(v, i) + D(v, j) − D(i, j))/2
Putting it all together establishes the equation in the theorem. Since D´(i, j) is the label of the least common ancestor of i and j in the ultrametric tree T´, D´ satisfies the ultrametric requirement.
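A minimal Python sketch of the theorem, assuming D is a symmetric list of lists with a zero diagonal. The helper `ultrametric_transform` (a hypothetical name) picks v as the first row containing the largest entry and fills D´ from the formula above; `is_ultrametric` (also hypothetical) checks the three-point condition, i.e., that in every triple of leaves the two largest D´ values are equal. The 4 × 4 matrix in the check is an illustrative additive matrix made up for this sketch, not the slides' example.

```python
from itertools import combinations


def ultrametric_transform(D):
    """Return (v, m_v, Dp) for a symmetric distance matrix D.

    v is the first row holding the largest entry of D, m_v that entry, and
    Dp[i][j] = m_v + (D[i][j] - D[v][i] - D[v][j]) / 2 for i != j.
    """
    n = len(D)
    v = max(range(n), key=lambda r: max(D[r]))  # first row with the max entry
    m_v = max(D[v])
    Dp = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        Dp[i][j] = Dp[j][i] = m_v + (D[i][j] - D[v][i] - D[v][j]) / 2
    return v, m_v, Dp


def is_ultrametric(Dp):
    """Three-point condition: in every triple, the two largest values tie."""
    for i, j, k in combinations(range(len(Dp)), 3):
        _, mid, top = sorted((Dp[i][j], Dp[i][k], Dp[j][k]))
        if mid != top:
            return False
    return True
```

For D = [[0, 5, 7, 11], [5, 0, 8, 12], [7, 8, 0, 6], [11, 12, 6, 0]], the root is row 1 with m_v = 12, and D´ has entries 5 (leaves 2, 3), 9 (leaf 0 with 2 or 3) and 12 (leaf 1 with everything), which passes the three-point check. Exact comparisons are safe here because halving small integers is exact in floating point; real data may need a tolerance.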

Ultrametric Problem: Additive Trees
Q: What is the value of y?
A: y = D(v, i) − x, because the path from v to leaf i passes through w.
Q: What is the value of x in terms of values in D?
A: x = (D(v, i) + D(v, j) − D(i, j))/2.
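Substituting the expressions for y and x into the first identity of the proof gives the formula in the theorem. The value of x follows from the three path lengths in the additive tree T, writing z for the distance from w to leaf j:

```latex
\begin{aligned}
D(v,i) + D(v,j) - D(i,j) &= (x+y) + (x+z) - (y+z) = 2x,\\
D'(i,j) &= y + m_v - D(v,i) = \bigl(D(v,i) - x\bigr) + m_v - D(v,i) = m_v - x\\
        &= m_v - \tfrac{1}{2}\bigl(D(v,i) + D(v,j) - D(i,j)\bigr)\\
        &= m_v + \tfrac{1}{2}\bigl(D(i,j) - D(v,i) - D(v,j)\bigr).
\end{aligned}
```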

Ultrametric Problem: Additive Trees
So: D additive ⇒ D´ ultrametric.
By contraposition: D´ not ultrametric ⇒ D not additive.
Q: Does D´ ultrametric ⇒ D additive?
A: Theorem: D´ ultrametric ⇒ D additive.
Proof (constructive). Let T´´ be the ultrametric tree for D´. Assign weights to the edges of T´´:
– Note: the sum of the edge weights from a leaf up to an ancestor must match the ancestor's label.
– For each edge (p, q), assign the weight |p − q|, where p and q denote the labels of the two endpoints (treating leaves as labeled 0).

Ultrametric Problem: Additive Trees
Assign weights to the edges of T´´ (continued):
– Note that the path distance between leaves i and j is twice the value labeling their least common ancestor.
– Hence, this path has length 2D´(i, j) = 2m_v + D(i, j) − D(v, i) − D(v, j).
– Now shrink the edge into each leaf i by m_v − D(v, i).
– The path from leaf i to leaf j now has length D(i, j).
The result is an additive tree for matrix D, obtained from D´'s ultrametric tree. Putting all of this together gives a method for constructing an additive tree for an additive matrix.
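The shrinking step can be checked term by term: start from the leaf-to-leaf path length in T´´ and subtract the amounts removed from the two leaf edges.

```latex
\underbrace{2m_v + D(i,j) - D(v,i) - D(v,j)}_{2D'(i,j)\ \text{in}\ T''}
  \;-\; \bigl(m_v - D(v,i)\bigr) \;-\; \bigl(m_v - D(v,j)\bigr) \;=\; D(i,j).
```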

Ultrametric Problem: Additive Trees
Additive Tree Algorithm
1. Create matrix D´ from D.
2. Create the ultrametric tree T´´ from D´.
3. Create T from T´´:
 – Label each edge (p, q) with the value |p − q|.
 – For each leaf i, shrink the leaf edge by m_v − D(v, i).
Note: no step takes more than O(n²) time.
Theorem: An additive tree for an additive matrix can be constructed in O(n²) time.
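A minimal end-to-end sketch of the three steps, assuming exact arithmetic, at least two taxa, and an additive input matrix. One substitution should be flagged: step 2 here builds T´´ by repeatedly merging the pair of clusters with the smallest D´ value, a simple O(n³) agglomerative method that reproduces the ultrametric tree when D´ is exactly ultrametric; it is not the O(n²) construction the theorem refers to. Leaves are treated as labeled 0, and the names `additive_tree` and `path_length` are hypothetical.

```python
from itertools import combinations


def additive_tree(D):
    """Sketch: build an edge-weighted tree whose leaf-to-leaf paths reproduce D.

    Assumes len(D) >= 2. Leaves are nodes 0..n-1; internal nodes get ids
    n, n+1, ... Returns (parent, weight): parent[c] is c's parent and
    weight[c] is the weight of the edge into c.
    """
    n = len(D)
    v = max(range(n), key=lambda r: max(D[r]))  # row with the largest entry
    m_v = max(D[v])

    def dprime(i, j):  # step 1: D'(i, j) for leaves i != j
        return m_v + (D[i][j] - D[v][i] - D[v][j]) / 2

    # Step 2 (swapped-in method): merge the two clusters with the smallest D'.
    label = {i: 0.0 for i in range(n)}  # leaves are treated as labeled 0
    rep = {i: i for i in range(n)}      # one leaf representing each cluster
    parent, active, next_id = {}, list(range(n)), n
    while len(active) > 1:
        a, b = min(combinations(active, 2),
                   key=lambda p: dprime(rep[p[0]], rep[p[1]]))
        label[next_id] = dprime(rep[a], rep[b])
        rep[next_id] = rep[a]
        parent[a] = parent[b] = next_id
        active = [c for c in active if c not in (a, b)] + [next_id]
        next_id += 1

    # Step 3: edge (p, q) gets |label(p) - label(q)|, then shrink leaf edges.
    weight = {c: abs(label[p] - label[c]) for c, p in parent.items()}
    for i in range(n):
        weight[i] -= m_v - D[v][i]
    return parent, weight


def path_length(parent, weight, i, j):
    """Sum of edge weights on the tree path between nodes i and j."""
    dist_from_i, d, node = {}, 0.0, i
    while True:  # record every ancestor of i with its distance from i
        dist_from_i[node] = d
        if node not in parent:
            break
        d += weight[node]
        node = parent[node]
    d, node = 0.0, j
    while node not in dist_from_i:  # climb from j until reaching i's ancestors
        d += weight[node]
        node = parent[node]
    return d + dist_from_i[node]
```

On the illustrative matrix D = [[0, 5, 7, 11], [5, 0, 8, 12], [7, 8, 0, 6], [11, 12, 6, 0]], the root v is row 1 (m_v = 12). After shrinking, the edge into leaf v has weight 0, so v coincides with the root of T, and every recovered `path_length(parent, weight, i, j)` equals D[i][j].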

Ultrametric Problem: Additive Trees
Example: Given D, first find D´.
Recall: D´(i, j) = m_v + (D(i, j) − D(v, i) − D(v, j))/2

Ultrametric Problem: Additive Trees
Example: From D´, find T´´.
Recall: label the inner edges (p, q) by |p − q|.

Ultrametric Problem: Additive Trees
Example: From T´´, find T.
Recall: shrink the edge into leaf i by m_v − D(v, i).

Ultrametric Problem: Additive Trees
Example: Finally, compare the derived T with the original tree as a sanity check.

Ultrametric Problem: Perfect Phylogeny
We now recast perfect phylogeny in terms of an ultrametric tree problem.
Definition: D_M, the n by n matrix of shared characters.
More formally: given the n by m character matrix M, define the n by n matrix D_M: for each pair of objects p and q, set D_M(p, q) to the number of characters that p and q both possess.
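Computing D_M is a pairwise count of shared 1s, i.e., D_M = M Mᵀ for a 0/1 matrix. A minimal sketch, assuming M is given as a list of 0/1 rows (one row per object); the function name is hypothetical, and the diagonal entry D_M(p, p) is simply the number of characters p possesses.

```python
def shared_character_matrix(M):
    """D_M[p][q] = number of characters that objects p and q both possess.

    M is an n-by-m 0/1 character matrix given as a list of rows.
    """
    n = len(M)
    return [[sum(a & b for a, b in zip(M[p], M[q])) for q in range(n)]
            for p in range(n)]
```

For the made-up matrix with rows 110000, 101100, 101010 and 000001, the result is [[2, 1, 1, 0], [1, 3, 2, 0], [1, 2, 3, 0], [0, 0, 0, 1]].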

Ultrametric Problem: Perfect Phylogeny
Lemma: If M has a perfect phylogeny, then D_M is a min-ultrametric matrix.
Proof: convert M's perfect phylogeny T into a min-ultrametric tree for D_M.
– Let T be the perfect phylogeny for M.
– Label T's root with zero.
– Traverse T from top to bottom; for each node v:
 Let p_v be the number labeling v's parent.
 Let e_v be the number of characters labeling the edge into v.
 Label node v with the sum p_v + e_v.

Ultrametric Problem: Perfect Phylogeny
– The label of node v is the number of characters common to all leaves in the subtree rooted at v.
– Hence, if v is the least common ancestor of leaves p and q (for example, their immediate parent), then the label of v is D_M(p, q).
– The numbers labeling nodes on any path from the root are strictly increasing.
⇒ The result is a min-ultrametric tree for matrix D_M.
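The labeling in the proof can be sketched as follows, assuming the perfect phylogeny is given as a `children` dict plus a dict `edge_chars` mapping each node to the set of characters on the edge into it (both representations and the function names are hypothetical). `check_lca_labels` verifies the key claim: the label of the least common ancestor of two leaves equals the number of characters they share.

```python
from itertools import combinations


def label_nodes(children, edge_chars, root):
    """Root gets 0; every other node v gets p_v + e_v, where p_v is its
    parent's label and e_v the number of characters on the edge into v."""
    label = {root: 0}
    stack = [root]
    while stack:  # top-down traversal
        u = stack.pop()
        for v in children.get(u, []):
            label[v] = label[u] + len(edge_chars.get(v, set()))
            stack.append(v)
    return label


def check_lca_labels(children, edge_chars, root):
    """True if, for every pair of leaves p, q, the label of their LCA
    equals D_M(p, q), the number of characters p and q share."""
    parent = {c: u for u, cs in children.items() for c in cs}
    label = label_nodes(children, edge_chars, root)

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    def chars_of(leaf):  # characters on the root-to-leaf path
        return set().union(*(edge_chars.get(n, set()) for n in path_to_root(leaf)))

    leaves = [n for n in parent if n not in children]
    for p, q in combinations(leaves, 2):
        p_anc = path_to_root(p)
        lca = next(n for n in path_to_root(q) if n in p_anc)
        if label[lca] != len(chars_of(p) & chars_of(q)):
            return False
    return True
```

For instance, with the made-up tree r → X (c1), X → a (c2), X → Y (c3), Y → b (c4), Y → c (c5) and r → d (c6), the labels are r: 0, X: 1, d: 1, a: 2, Y: 2, b: 3, c: 3; leaves b and c meet at Y with label 2, matching their two shared characters c1 and c3.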

Ultrametric Problem: Perfect Phylogeny
Algorithm: perfect phylogeny via ultrametrics
1. Create matrix D_M from M.
2. Attempt to create a min-ultrametric tree T´ from D_M. If this is not possible, then M has no perfect phylogeny.
3. If T´ was successfully created in step 2, attempt to label its edges with the m characters of M. If this is not possible, then M has no perfect phylogeny. Otherwise, the modified T´ is the perfect phylogeny T.
Note: a min-ultrametric tree T´ may exist for D_M even though M has no perfect phylogeny, hence the check in step 3.

Ultrametric Problem: Perfect Phylogeny
Final notes on the centrality of the ultrametric problem. We have seen that the following problems:
1. perfect phylogeny
2. tree compatibility
can be cast as ultrametric problems. However, this is not an efficient way to solve these problems.

Maximum Parsimony
Maximum parsimony:
– Perfect phylogeny is a special instance.
– It can be viewed as a Steiner tree problem on a hypercube.
Presentation approach:
– Introduce Steiner trees
– Hypercube graphs
– Maximum parsimony as a Steiner tree problem

Maximum Parsimony
Definitions:
Let N be a set of nodes.
Let E be a set of undirected edges with non-negative weights.
Let G = (N, E) be an undirected graph.
Let X ⊆ N be a subset of nodes.
A Steiner tree ST for X is any connected subtree of G that contains all nodes of X and possibly nodes in N − X.
Weighted Steiner Tree Problem: given G and X, find the Steiner tree of minimum total weight.

Maximum Parsimony
More definitions:
A hypercube of dimension d is an undirected graph with 2^d nodes, labeled 0 to 2^d − 1, in which two nodes are adjacent exactly when their labels differ in a single bit position.
The weighted Steiner tree problem on hypercubes: the weighted Steiner tree problem in which G must be a hypercube.

Maximum Parsimony
More definitions:
Maximum parsimony: Occam's razor applied to phylogenetic reconstruction, i.e., a preference for trees requiring fewer evolutionary events to explain the data.
Gusfield's definition: the Maximum Parsimony problem is the unweighted Steiner tree problem on a d-dimensional hypercube.

Maximum Parsimony
More about the hypercube formulation of MP:
– The input taxa in X are described as binary vectors of length d.
– Recall: adjacent nodes differ in only one label bit position.
– Correspondingly, taxa that differ by a single mutation are adjacent.
⇒ There is a Steiner tree for X with l edges iff there is a corresponding phylogenetic tree that entails l character-state mutations.
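To make the hypercube correspondence concrete, a minimal sketch with taxa encoded as d-bit integers (an assumption of this illustration; the function names are hypothetical). Two labels are adjacent in the hypercube exactly when their Hamming distance is 1, and a tree edge between labels at Hamming distance h expands into a path of h unit hypercube edges, so the tree's hypercube weight equals its number of character-state mutations.

```python
def hamming(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


def hypercube_adjacent(a, b):
    """Hypercube nodes are adjacent iff their labels differ in exactly one bit."""
    return hamming(a, b) == 1


def mutations_on_tree(edges, labels):
    """Total character-state changes on a tree whose nodes carry d-bit labels.

    Each tree edge (u, w) corresponds to hamming(labels[u], labels[w])
    unit edges of the hypercube.
    """
    return sum(hamming(labels[u], labels[w]) for u, w in edges)
```

For example, the taxa 110, 101 and 011 joined through the extra Steiner node 111 form a tree with 3 hypercube edges and 3 mutations.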

Steiner interpretation of Perfect Phylogeny
Define a nontrivial binary character to be a character possessed by some taxa but not all.
Consider an MP dataset of d nontrivial binary characters.
Q: What is the minimum number of mutations in the MP tree?
A: At least d, since every nontrivial character must change state on at least one edge.

Steiner interpretation of Perfect Phylogeny
Q: What is the relation to binary perfect phylogeny?
A: The binary perfect phylogeny problem is equivalent to asking whether there is an MP solution with a cost of exactly d.
Q: What about generalized perfect phylogeny?
A: It is similar. The lower bound must reflect:
– the number of character states in the input taxa;
– a character having r states in the input taxa is allowed only r − 1 transitions.
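The lower bound can be computed directly from the data. A minimal sketch, assuming each taxon is a string with one symbol per character (the function name is hypothetical): each character with r observed states contributes r − 1, so for binary data the bound is the number d of nontrivial characters. Per the slides, a binary perfect phylogeny exists exactly when some MP tree achieves this cost.

```python
def parsimony_lower_bound(taxa):
    """Sum over characters of (r - 1), r = number of distinct observed states.

    For binary characters this counts the nontrivial characters
    (r = 2); trivial characters (r = 1) contribute nothing.
    """
    if not taxa:
        return 0
    m = len(taxa[0])
    return sum(len({t[c] for t in taxa}) - 1 for c in range(m))
```

For the taxa 110, 101 and 011 the bound is 3, which the star through 111 attains; a character column that is constant across all taxa adds nothing.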

Steiner interpretation of Perfect Phylogeny
Complexity:
– There is no known efficient solution for the Steiner tree problem on unweighted graphs.
– There is a polynomial-time solution for the generalized perfect phylogeny problem when r is fixed.
⇒ This particular Steiner tree question (does a tree meeting the lower bound exist?) can be answered in polynomial time when r is fixed.

Steiner interpretation of Perfect Phylogeny
MP approximations:
– The weighted Steiner tree problem on hypercubes is NP-hard.
– There is an approximation method with an error bound of a factor of 11/6.
– A minimum spanning tree (MST) can also be used to find a Steiner tree with weight less than twice that of the optimal Steiner tree.
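A sketch of the MST idea, assuming taxa are equal-length binary strings (the function name is hypothetical): run Prim's algorithm on the complete graph over the terminals X, weighting each pair by its Hamming distance, which is their shortest-path length in the hypercube. Expanding every MST edge into a shortest hypercube path gives a connected subgraph containing X, and hence a Steiner tree whose weight is at most the MST weight, which is less than twice the optimum.

```python
def mst_over_terminals(taxa):
    """Prim's algorithm on the complete graph over the terminals, with
    edge weight = Hamming distance. Returns (total_weight, edges)."""

    def dist(i, j):
        return sum(a != b for a, b in zip(taxa[i], taxa[j]))

    n = len(taxa)
    # best[k] = (weight, tree node) of the cheapest known edge into k
    best = {k: (dist(0, k), 0) for k in range(1, n)}
    edges, total = [], 0
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, i = best.pop(j)
        edges.append((i, j))
        total += w
        for k in best:  # updating existing keys while iterating is safe
            d = dist(j, k)
            if d < best[k][0]:
                best[k] = (d, j)
    return total, edges
```

For 110, 101 and 011 every pair is at distance 2, so the MST weighs 4, against an optimum of 3 (the star through 111), well within the factor-2 bound.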

Phylogenetic Alignment
Recall: phylogenetic alignment was discussed in section 14.8.
– The focus was on deriving a multiple alignment informed by evolutionary history.
– The tree emphasized specific alignment groupings.
– Internal node sequences were a secondary artifact.

Phylogenetic Alignment
Phylogenetic alignment as a parsimony problem:
– In contrast, we are now interested in the internal sequences.
– These sequences are waypoints in the evolutionary trajectory leading to the extant taxa.
– Phylogenetic alignment is thus a parsimony problem.

Phylogenetic Alignment
Hypothesis: optimal phylogenetic alignment describes evolutionary history.
Assumptions:
– Edit distance realistically models evolutionary distance.
– A globally optimal phylogenetic alignment captures the essence of the evolutionary process.
We will look at minimum mutation, a variant of phylogenetic alignment.

Fitch-Hartigan minimum mutation problem
Definition: the minimum mutation problem is a variant of the phylogenetic alignment problem. The input comprises:
1. a tree;
2. strings labeling the leaves;
3. a multiple alignment of those strings.

Fitch-Hartigan minimum mutation problem
Q: If you are given the tree and the multiple alignment, what is left to compute?
A: The mutations that account for the input data. These mutations should form:
1. a minimum sequence of site mutations that is
2. compatible with the given tree and
3. compatible with the given multiple alignment.

Fitch-Hartigan minimum mutation problem
Q: How is the input data used to determine the minimum sequence of mutations?
1. The multiple alignment associates each amino acid with a specific position (column).
2. The evolutionary history of the sequences is then treated as the combined but independent evolutionary histories of the individual positions.
3. The tree guides the order of mutations for each position.

Fitch-Hartigan minimum mutation problem
Assumptions:
– Each column of the alignment can be solved separately.
– The strings labeling inner nodes adhere to the same alignment.
The problem thus reduces to a computation at a single position.

Fitch-Hartigan minimum mutation problem
Minimum mutation for a single position:
Input:
1. a rooted tree with n nodes;
2. each leaf labeled by a single character.
Output:
1. each interior node labeled by a single character;
2. such that the labeling minimizes the number of edges whose two endpoints have different labels.

Fitch-Hartigan minimum mutation problem
Algorithmic approach: dynamic programming.
Let T_v denote the subtree rooted at node v.
Let C(v) be the cost of the optimal solution for T_v.
Let C(v, x) be the cost of the optimal solution for T_v when v must be labeled by x.
Let v_i denote the i-th child of node v.
Base case: for each leaf v, specify C(v) and C(v, x) for all x ∈ Σ:
– C(v) = 0, and C(v, x) = 0 if leaf v is labeled by x;
– C(v, x) = ∞ if leaf v is not labeled by x.

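Fitch-Hartigan minimum mutation problem
When v is an internal node:
C(v, x) = Σ_i min( C(v_i, x), C(v_i) + 1 ), and C(v) = min over x ∈ Σ of C(v, x).
That is, each child either keeps the label x at no cost, or takes its own best label at the cost of one mutation on the edge (v, v_i).
The recurrences start from the base cases and are evaluated bottom up, from the leaves. After all C(v, x) are computed, backtracking is used to extract the solution.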

Fitch-Hartigan minimum mutation problem
Backtracking process:
– The root r is labeled by a character x such that C(r) = C(r, x).
– The traversal then proceeds top-down.
– If v is labeled x, then child v_i is labeled:
 character x if C(v_i) + 1 > C(v_i, x);
 otherwise, a character y such that C(v_i) = C(v_i, y).
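The dynamic program and the backtracking rule above can be sketched in Python as follows. The tree encoding (a `children` dict in which leaves do not appear as keys) and the function name `min_mutation` are assumptions made for this illustration; the internal-node recurrence is the unit-cost one, C(v, x) = Σ_i min(C(v_i, x), C(v_i) + 1). Ties are broken by taking the first minimizing character in alphabet order.

```python
import math


def min_mutation(children, leaf_char, root, alphabet):
    """Bottom-up DP for C(v, x), then top-down backtracking.

    children: dict node -> list of children (leaves are not keys).
    leaf_char: dict leaf -> its character.
    Returns (C(root), labeling) with labeling a dict node -> character.
    """
    C = {}     # C[v][x]: optimal cost of subtree T_v when v is labeled x
    best = {}  # best[v] = C(v) = min over x of C(v, x)

    def up(v):
        kids = children.get(v, [])
        if not kids:  # base case: leaf
            C[v] = {x: (0 if x == leaf_char[v] else math.inf) for x in alphabet}
        else:
            for c in kids:
                up(c)
            C[v] = {x: sum(min(C[c][x], best[c] + 1) for c in kids)
                    for x in alphabet}
        best[v] = min(C[v].values())

    labeling = {}

    def down(v, x):
        labeling[v] = x
        for c in children.get(v, []):
            if best[c] + 1 > C[c][x]:  # keeping x is strictly cheaper
                down(c, x)
            else:                      # switch to an optimal label y of c
                down(c, min(alphabet, key=lambda y: C[c][y]))

    up(root)
    down(root, min(alphabet, key=lambda x: C[root][x]))
    return best[root], labeling
```

For a root with two internal children whose leaves are (A, C) and (A, A), the sketch returns cost 1 and labels every internal node A, so the single mutation falls on the edge into the C leaf.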

Fitch-Hartigan minimum mutation problem
Let's evaluate an example:
C(v) = 0 and C(v, x) = 0 if leaf v is labeled by x; otherwise C(v, x) = ∞.

Fitch-Hartigan minimum mutation problem
Time complexity:
Bottom-up portion:
– Let σ = |Σ|.
– Each node is evaluated with respect to each x ∈ Σ.
– For n nodes this gives O(nσ).
The backtracking portion is O(n).
Overall: O(nσ).

Maximum Parsimony
Most widely used tree-building algorithm.
Differs from distance-based algorithms:
– It does not build trees from distances.
– Parsimony is used to compute the cost of a tree.
– A search strategy is used to search through the space of tree topologies.
– Goal: find the tree topology with the overall minimum cost.

Traditional Parsimony
Algorithm: traditional parsimony [Fitch 1971]
Goal: count the number of substitutions at a site.
Method: recursion, keeping track of
– C, the current cost;
– R_k, the set of residues at k, the current node.

Traditional Parsimony
Algorithm: traditional parsimony [Fitch 1971]
C = 0, k = root   // initialize the cost and start at the root
TP(k) {
  if k is a leaf then return {x_k}
  R_left = TP(k.left)
  R_right = TP(k.right)
  if R_left ∩ R_right ≠ ∅
    return R_left ∩ R_right
  else {
    C = C + 1
    return R_left ∪ R_right
  }
}
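A runnable Python version of the TP pseudocode, assuming the rooted binary tree is a dict mapping each internal node to its (left, right) pair; the function name `fitch` is hypothetical. Instead of a global C, the count is kept in a closure variable, and the residue set R_k of every node is recorded so it can be reused by a traceback.

```python
def fitch(tree, leaf_residue, root):
    """Fitch (1971) parsimony at one site on a rooted binary tree.

    tree maps each internal node to its (left, right) children; leaves are
    the nodes not in tree. Returns (C, R), with C the substitution count and
    R[k] the residue set computed at node k.
    """
    R = {}
    cost = 0

    def tp(k):
        nonlocal cost
        if k not in tree:          # leaf: return {x_k}
            R[k] = {leaf_residue[k]}
            return R[k]
        left, right = tree[k]
        r_left, r_right = tp(left), tp(right)
        common = r_left & r_right
        if common:                 # nonempty intersection: no substitution
            R[k] = common
        else:                      # disjoint sets: one substitution, take union
            cost += 1
            R[k] = r_left | r_right
        return R[k]

    tp(root)
    return cost, R
```

With leaves A, C, A, A on the tree ((l1, l2), (l3, l4)) the cost is 1; with four distinct residues A, C, G, T it is 3.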

Traditional Parsimony
Let's evaluate an example:
if R_left ∩ R_right ≠ ∅, return R_left ∩ R_right;
else C = C + 1, return R_left ∪ R_right.

Traditional Parsimony
There is a traceback procedure for finding ancestral assignments.
Q: How do you think the traceback works?
A: Start from the root:
1. Pick a residue from the root's set.
2. For each child, pick the parent's residue if the child's set contains it.
3. If a child's set does not contain the parent's residue, randomly select a residue from the child's set.
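The traceback can be sketched on top of the sets R_k produced by the bottom-up pass. This version (function name hypothetical) takes the tree as a dict from each internal node to its children plus the precomputed sets, keeps the parent's residue whenever the child's set contains it, and otherwise draws from the child's set with a supplied `random.Random`, so results are reproducible when seeded.

```python
import random


def fitch_traceback(tree, R, root, rng=None):
    """Assign one residue per node from the Fitch sets R, top-down.

    tree: dict internal node -> (left, right); R: dict node -> residue set.
    """
    rng = rng or random.Random()
    assignment = {}

    def pick(k, parent_res):
        if parent_res is not None and parent_res in R[k]:
            assignment[k] = parent_res                # reuse parent's residue
        else:
            assignment[k] = rng.choice(sorted(R[k]))  # random pick from the set
        for child in tree.get(k, ()):
            pick(child, assignment[k])

    pick(root, None)
    return assignment
```

On the sets from the A, C, A, A example (R_root = {A}, R_u = {A, C}), every internal node receives A and only the C leaf differs from its parent, reproducing the single substitution counted by TP.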

Traditional Parsimony
Let's perform the traceback on our example: