Computational Genomics 5a: Distance-Based Tree Reconstruction (cont.)
Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
2 Phylogenetic Trees - Methods
There are several methods for constructing trees and for estimating how well a tree describes the data (and thus the evolutionary process):
- Distance-based methods
- Parsimony (character-based) methods
- Likelihood methods
- Whole genome/proteome methods
3 Additive Distances
We say that a distance metric D on L objects is additive if there is an unrooted binary tree T on L leaves, with positive edge weights, that realizes the distance D. Namely, for all i, j: D(i,j) = D_T(i,j), where D_T(i,j) is the length of the path between leaves i and j in T.
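4 Characterizing Additive Distances
An additive distance is fully characterized by the four point condition: any 4 points can be renamed i, j, k, l such that
    D(i,j) + D(k,l) <= D(i,k) + D(j,l) = D(i,l) + D(j,k)
That is, of the three pairwise sums, the two largest are equal.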
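The four point condition can be checked mechanically. The following is a minimal sketch, assuming the input is already a symmetric metric with a zero diagonal; the function name `is_additive` and the tolerance parameter are illustrative choices, not part of the slides. The matrix `D5` is the 5-object example (A, B, C, D, E) used on the algorithm slides.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four point test on a symmetric distance matrix given as a list of lists."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        # The three ways to split {i, j, k, l} into two pairs.
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # Additivity requires the two largest sums to be equal.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# The 5-object example matrix (rows and columns in the order A, B, C, D, E).
D5 = [[0, 2, 7, 4, 7],
      [2, 0, 7, 4, 7],
      [7, 7, 0, 7, 6],
      [4, 4, 7, 0, 7],
      [7, 7, 6, 7, 0]]

print(is_additive(D5))  # True
```

The test examines every quadruple, so this sketch takes time proportional to the number of 4-subsets of the objects.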
5 Trees from Additive Distances: Algorithm
1. Verify that the distance matrix constitutes an additive metric.
2. Choose a pair of objects; this gives the first path in the tree.
3. Choose a third object and establish the linear equations that let the object branch off the path.
4. Choose a pair of leaves in the tree constructed so far and compute the point at which a newly chosen object is inserted.
   a. If the new path branches off an existing branch in the tree: do the insertion step once more, replacing one of the two original leaves by another leaf along the branching path.
   b. Once the new path branches off an edge in the tree, this insertion is finished.

Distance matrix (upper triangle):
       A  B  C  D  E
    A  0  2  7  4  7
    B     0  7  4  7
    C        0  7  6
    D           0  7
    E              0

[Figure: the first path, from A to C, of length 7]
6 Trees from Additive Distances: Algorithm
[Figure: the third object B branches off the path from A to C at a new internal node X; edge labels 1 (A to X), 1 (B to X), 6 (X to C)]
7 Trees from Additive Distances: Algorithm
The branch point X of the third object B is determined by three linear equations:
    d(A,B) = d(A,X) + d(X,B)
    d(A,C) = d(A,X) + d(X,C)
    d(B,C) = d(B,X) + d(X,C)
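The three equations d(A,B) = d(A,X) + d(X,B), d(A,C) = d(A,X) + d(X,C) and d(B,C) = d(B,X) + d(X,C) have a closed-form solution. A small sketch, with a hypothetical function name, applied to the example values d(A,B) = 2, d(A,C) = 7, d(B,C) = 7:

```python
def three_leaf_branches(d_ab, d_ac, d_bc):
    """Solve the three path equations for the edge lengths to the branch point X."""
    d_ax = (d_ab + d_ac - d_bc) / 2
    d_bx = (d_ab + d_bc - d_ac) / 2
    d_cx = (d_ac + d_bc - d_ab) / 2
    return d_ax, d_bx, d_cx

# Example matrix entries: d(A,B) = 2, d(A,C) = 7, d(B,C) = 7.
print(three_leaf_branches(2, 7, 7))  # (1.0, 1.0, 6.0)
```

Each length is obtained by adding the two equations that contain it and subtracting the third.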
8 Trees from Additive Distances: Algorithm
[Figure: the tree on A, B, C with the fourth object D inserted; one edge labeled 1]
9 Trees from Additive Distances: Algorithm
[Figure: the tree on A, B, C, D with a tentative insertion of E, an edge labeled 5, marked "NO!"]
10 Trees from Additive Distances: Algorithm
[Figure: the tree on A, B, C, D, E after inserting E; two edges labeled 3]
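The insertion step can be sketched as follows: for a path between two leaves a and c already in the tree, the point where a new object branches off is located by its offset from a. This is an illustrative sketch only; the function name `insertion_point` and the dictionary layout are assumptions, and the handling of case 1 (re-running the step inside a subtree) is left out.

```python
def insertion_point(d, a, c, new):
    """Return (offset from a along the path a..c, pendant edge length) for `new`."""
    offset = (d[a][new] + d[a][c] - d[c][new]) / 2
    pendant = d[a][new] - offset
    return offset, pendant

labels = "ABCDE"
rows = [[0, 2, 7, 4, 7],
        [2, 0, 7, 4, 7],
        [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7],
        [7, 7, 6, 7, 0]]
# Distance matrix as a dict of dicts keyed by object name.
d = {p: {q: rows[i][j] for j, q in enumerate(labels)}
     for i, p in enumerate(labels)}

print(insertion_point(d, "A", "C", "D"))  # (2.0, 2.0)
print(insertion_point(d, "A", "C", "E"))  # (4.0, 3.0)
```

On the path from A to C, B's branch point X sits at offset 1, so D (offset 2) and E (offset 4) each fall strictly inside an edge rather than on an existing branch point. For E, the pendant length 3 and the remaining 7 - 4 = 3 to C agree with the two edges labeled 3 in the figure.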
11 Trees from Additive Distances: Algorithm
[Figure: the same tree on A, B, C, D, E, annotated "is this necessary?"]
12 Reconstructing a Tree from an Additive Distance
[Figure: the distance matrix on A, B, C, D, E and the reconstructed tree]
By the algorithm, given a distance matrix constituting an additive metric, the topology of the corresponding additive tree is unique.
Q: Given an additive metric on n leaves, what is the running time of the algorithm?
A: The number of phases is n and the work per phase is O(n), so the total is O(n^2).
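One way to convince oneself that the reconstructed tree realizes the matrix is to recompute all leaf-to-leaf path lengths. In this sketch the edge weights are derived from the example matrix with the branch-point equations, not read off the slide figure; X is the branch point named on the slides, while Y and Z are hypothetical names for the other two internal nodes.

```python
from collections import defaultdict

# Weighted edges of the reconstructed tree (Y and Z are hypothetical names).
edges = [("A", "X", 1), ("B", "X", 1), ("X", "Y", 1), ("D", "Y", 2),
         ("Y", "Z", 2), ("C", "Z", 3), ("E", "Z", 3)]

adj = defaultdict(list)
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

def tree_distances(source):
    """Path lengths from `source` to every node, by depth-first traversal."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

leaves = "ABCDE"
D_T = [[tree_distances(p)[q] for q in leaves] for p in leaves]
for row in D_T:
    print(row)
```

The printed rows can be compared entry by entry with the distance matrix on A, B, C, D, E.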
13 Approximating Additive Metrics
In practice, the distance matrix between molecular sequences will not be additive. In such a case we want to find a tree T whose distance matrix is “close” to the given one.
The methods for exact tree reconstruction provide a repertoire of heuristics for tree construction, based on approximating additive metrics. These heuristics give exact results on additive metrics, but their performance on non-additive metrics is unclear.
14 Neighbor Finding
How can we find, from distances alone, a pair of sisters (neighboring leaves)?
The closest pair of nodes are not necessarily neighboring leaves.
[Figure: a tree on leaves A, B, C, D]
Next, we show a way to find neighbors from distances.
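A small numeric illustration of why the closest pair can mislead. The quartet tree below is hypothetical (its edge lengths are made up for this sketch and are not taken from the slide figure): A and B are sisters, C and D are sisters, and the long pendant edges of B and D make A and C the closest pair.

```python
from itertools import combinations

# Hypothetical tree ((A,B),(C,D)): pendant edge lengths and one internal edge.
pendant = {"A": 1, "B": 5, "C": 1, "D": 5}
internal = 1
side = {"A": 0, "B": 0, "C": 1, "D": 1}  # which end of the internal edge

def tree_dist(p, q):
    """Path length between two leaves of the quartet tree."""
    cross = internal if side[p] != side[q] else 0
    return pendant[p] + pendant[q] + cross

pairs = list(combinations("ABCD", 2))
closest = min(pairs, key=lambda pq: tree_dist(*pq))
print(closest, tree_dist(*closest))  # ('A', 'C') 3
print(tree_dist("A", "B"))           # 6
```

A and C are at distance 3 but lie on opposite sides of the internal edge, while the true sisters A and B are at distance 6.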
15 Neighbor Joining Algorithm: Outline
- Identify a pair of leaves u, v as neighbors.
- Combine u, v into a new node, w.
- Update the distance matrix: calculate w's distance from any other node x of the tree using
      D(w,x) = (D(u,x) + D(v,x) - D(u,v)) / 2
  Notice that all 3 quantities on the right-hand side are known.
- When only 3 nodes are left, compute the 3 distances and finish.
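The update step can be sketched in a few lines, assuming the standard rule D(w,x) = (D(u,x) + D(v,x) - D(u,v)) / 2. The function name `join`, the node name "W" and the dictionary layout are illustrative assumptions; C and E are taken as the neighbors because they are sisters in the tree reconstructed on the earlier slides.

```python
def join(d, u, v, w):
    """Return a new distance dict in which u and v are replaced by the node w."""
    others = [x for x in d if x not in (u, v)]
    new = {x: {y: d[x][y] for y in others} for x in others}
    new[w] = {w: 0}
    for x in others:
        # All three quantities on the right-hand side are known.
        dwx = (d[u][x] + d[v][x] - d[u][v]) / 2
        new[w][x] = dwx
        new[x][w] = dwx
    return new

labels = "ABCDE"
rows = [[0, 2, 7, 4, 7],
        [2, 0, 7, 4, 7],
        [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7],
        [7, 7, 6, 7, 0]]
d = {p: {q: rows[i][j] for j, q in enumerate(labels)}
     for i, p in enumerate(labels)}

reduced = join(d, "C", "E", "W")
print(reduced["W"])  # {'W': 0, 'A': 4.0, 'B': 4.0, 'D': 4.0}
```

The original dictionary is left unmodified, which keeps the sketch easy to test; an in-place update would save memory on large inputs.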
16 Neighbor Joining Algorithm
- Identify a pair of neighbors i, j among n leaves.
- Combine i, j into a new node u.
- Update the distance matrix.
- When only 3 nodes are left, finish.
Let r_i be the sum of distances from i to every other node: r_i = sum over k of D(i,k).
The measure between i and j we use in the algorithm is
    X_D(i,j) = (n - 2) D(i,j) - (r_i + r_j)
[Figure: a tree with leaves i, j, k, l, m, n]
18 Neighbor Finding: Saitou & Nei Method
Theorem (Saitou & Nei): Assume D is additive and all tree edge weights are positive. If X_D(i,j) is minimal (among all pairs of leaves), then i and j are sister taxa in the tree.
[Figure: a tree with leaves i, j, k, l, m and subtrees T1, T2]
The proof is rather involved, and will be skipped (no tears pls).
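The theorem can be tried out on the additive 5-object example. This sketch assumes the measure has the standard form X_D(i,j) = (n - 2) D(i,j) - (r_i + r_j), with r_i the sum of distances from i to all other nodes; the function name `x_values` is hypothetical.

```python
from itertools import combinations

def x_values(d):
    """X_D(i,j) = (n - 2) * d[i][j] - (r_i + r_j) for every pair of nodes."""
    n = len(d)
    r = {i: sum(d[i].values()) for i in d}  # the diagonal contributes 0
    return {(i, j): (n - 2) * d[i][j] - (r[i] + r[j])
            for i, j in combinations(d, 2)}

labels = "ABCDE"
rows = [[0, 2, 7, 4, 7],
        [2, 0, 7, 4, 7],
        [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7],
        [7, 7, 6, 7, 0]]
d = {p: {q: rows[i][j] for j, q in enumerate(labels)}
     for i, p in enumerate(labels)}

x = x_values(d)
best = min(x, key=x.get)
print(best, x[best])    # ('C', 'E') -36
print(x[("A", "B")])    # -34
```

The minimizing pair is (C, E), at distance 6, even though A and B are the closest pair at distance 2. Both pairs are sisters in the tree reconstructed on the earlier slides.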
19 Complexity of Neighbor Joining Algorithm
Naive implementation:
Initialization: Θ(L^2) to compute the X_D(i,j)'s.
Each iteration:
- O(L) to update {X_D(i,k) : i ∈ L} for the new node k.
- O(L^2) to find the minimal X_D(i,j).
Total of O(L^3).
- This can be improved using better data structures (e.g. a heap).
[Figure: nodes i, j, k, m]
20 Reconstructing Trees from Additive Matrices
[Figure: the distance matrix on A, B, C, D, E and the reconstructed tree]
Q: Do we have to test additivity before running NJ?
A: No. By the Saitou-Nei theorem, if the matrix is additive, NJ will construct the correct tree. The algorithm need not know anything about the matrix in advance!
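To see NJ recover the sister pairs of the additive example without any additivity test, here is a topology-only sketch. It is a simplified illustration under stated assumptions: the measure is taken as X_D(i,j) = (n - 2) D(i,j) - (r_i + r_j), the update rule as D(w,x) = (D(u,x) + D(v,x) - D(u,v)) / 2, branch lengths are not computed, and ties go to the first minimal pair in iteration order. The function name `nj_joins` is hypothetical.

```python
from itertools import combinations

def nj_joins(labels, rows):
    """Return the leaf sets merged at each NJ join, until 3 nodes remain."""
    nodes = [frozenset([x]) for x in labels]
    d = {p: {q: rows[i][j] for j, q in enumerate(nodes)}
         for i, p in enumerate(nodes)}
    joins = []
    while len(d) > 3:
        n = len(d)
        r = {i: sum(d[i].values()) for i in d}
        # Pair minimizing X_D; min() keeps the first pair among ties.
        u, v = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        w = u | v  # the new node is named by the leaves below it
        joins.append(w)
        others = [x for x in d if x not in (u, v)]
        dw = {x: (d[u][x] + d[v][x] - d[u][v]) / 2 for x in others}
        d = {x: {y: d[x][y] for y in others} for x in others}
        for x in others:
            d[x][w] = dw[x]
        d[w] = dict(dw)
        d[w][w] = 0
    return joins

rows = [[0, 2, 7, 4, 7],
        [2, 0, 7, 4, 7],
        [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7],
        [7, 7, 6, 7, 0]]
joins = nj_joins("ABCDE", rows)
print([sorted(j) for j in joins])  # [['C', 'E'], ['A', 'B']]
```

The two joins, {C, E} and then {A, B}, leave D attached between them, which is the topology obtained from the insertion algorithm on the same matrix.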
21 Running NJ: Example on 4 Leaves
       A  B  C  D
    A  0  2  3  6
    B  2  0  3  5
    C  3  3  0  6
    D  6  5  6  0
[Figure: A and B joined at a new node U]
Remark: The X_D values imply that the distances are not additive (why?).
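A worked computation that suggests one answer to the remark, assuming the measure has the standard form X_D(i,j) = (n - 2) D(i,j) - (r_i + r_j) with n = 4:

```latex
\begin{align*}
r_A &= 2 + 3 + 6 = 11, & r_B &= 2 + 3 + 5 = 10, \\
r_C &= 3 + 3 + 6 = 12, & r_D &= 6 + 5 + 6 = 17, \\[4pt]
X_D(A,B) &= 2 \cdot 2 - (11 + 10) = -17, & X_D(A,C) &= 2 \cdot 3 - (11 + 12) = -17, \\
X_D(A,D) &= 2 \cdot 6 - (11 + 17) = -16, & X_D(B,C) &= 2 \cdot 3 - (10 + 12) = -16, \\
X_D(B,D) &= 2 \cdot 5 - (10 + 17) = -17, & X_D(C,D) &= 2 \cdot 6 - (12 + 17) = -17.
\end{align*}
```

The minimum, -17, is attained by (A,B), (A,C), (B,D) and (C,D). If the matrix were additive with positive edge weights, the Saitou & Nei theorem would make every minimizing pair a pair of sisters, but in a binary tree on 4 leaves A cannot be a sister of both B and C. Hence the distances are not additive.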
22 Updated Distance Matrix, Choosing A,B as Neighbors
       U    C    D
    U  0    2    4.5
    C  2    0    6
    D  4.5  6    0
[Figure: A and B joined at U; U and D joined at a new node V]
Notice that now we have only one choice: the neighbors are U and D.
23 Final Distance Matrix
       V    C
    V  0    5.6
    C  5.6  0
[Figure: the final tree, with A and B joined at U, U and D joined at V, and C attached to V]
Remark: Resulting tree is unrooted.
24 Reconstructing Trees from Non-Additive Matrices
Q: What if the distance matrix is not additive?
A: We could still run NJ!
Q: But can anything be said about the resulting tree?
A: Not really. The resulting tree topology may even depend on how ties are resolved along the way.
Remark: This was indeed the case in the last example.
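25 Almost Additive Matrix
A distance matrix d’ is “almost additive” with respect to a tree T if there exists an additive matrix D, realized by T, such that
    max over i,j of |d’(i,j) - D(i,j)| < (1/2) min over edges e of T of l(e),
where l(e) is the weight of edge e.
Atteson: If d’ is almost additive with respect to a tree T, then the output of NJ is a tree T’ with the same topology as T.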
26 Distance Matrix Example
27 Unrooted Tree - NJ Root
28 Output - NJ Tree
Branch length is proportional to distance.
29 N-J Method produces an Unrooted, Additive tree