dij(T) - the length of a path between leaves i and j

Distance in Trees
dij(T) - the length of the path between leaves i and j in tree T
[Figure: weighted tree with the path between leaves i and j highlighted; in the example, d1,4 = 69]
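To make the definition concrete, here is a minimal Python sketch that computes dij(T) by walking the unique path between two vertices of a weighted tree. The edge-list representation, the function name `path_length`, and the choice of example tree are assumptions made for illustration; the edge lengths are the ones recovered in the worked example later in this deck (v-a 6, w-a 4, a-c 3, c-z 7, c-b 3, b-x 5, b-y 4).

```python
from collections import defaultdict

def path_length(edges, i, j):
    """Length of the unique path between vertices i and j in a weighted tree."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Depth-first walk; in a tree, skipping the parent is enough to avoid cycles.
    stack = [(i, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == j:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("i and j are not connected")

# Hypothetical encoding of the deck's worked-example tree.
EDGES = [("v", "a", 6), ("w", "a", 4), ("a", "c", 3), ("c", "z", 7),
         ("c", "b", 3), ("b", "x", 5), ("b", "y", 4)]

d_vx = path_length(EDGES, "v", "x")
d_xy = path_length(EDGES, "x", "y")
```

With these edge lengths, `d_vx` is 6 + 3 + 3 + 5 and `d_xy` is 5 + 4.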

Phylogenetic Tree Reconstruction
Input: Distance matrix D
Output: Binary tree T such that dij(T) = Dij

Reconstructing a 3 Leaved Tree
Tree reconstruction for any 3x3 matrix is straightforward.
We have 3 leaves i, j, k and a center vertex c.
Observe:
dic + djc = Dij
dic + dkc = Dik
djc + dkc = Djk

Reconstructing a 3 Leaved Tree (cont’d)
Add the first two equations:
  dic + djc = Dij
+ dic + dkc = Dik
  2dic + djc + dkc = Dij + Dik
Substitute djc + dkc = Djk:
  2dic + Djk = Dij + Dik
  dic = (Dij + Dik – Djk)/2
Similarly,
  djc = (Dij + Djk – Dik)/2
  dkc = (Dik + Djk – Dij)/2
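The three closed-form expressions above translate directly into code. The following is a small sketch; the function name `three_leaf_tree` and the argument order are assumptions chosen for illustration.

```python
def three_leaf_tree(d_ij, d_ik, d_jk):
    """Edge lengths from leaves i, j, k to the center vertex c of a 3-leaf tree."""
    d_ic = (d_ij + d_ik - d_jk) / 2
    d_jc = (d_ij + d_jk - d_ik) / 2
    d_kc = (d_ik + d_jk - d_ij) / 2
    return d_ic, d_jc, d_kc

# Example values taken from the deck's later worked example
# (leaves a, b, z with Dab = 6, Daz = 10, Dbz = 10).
d_ac, d_bc, d_zc = three_leaf_tree(6, 10, 10)
```

For these inputs the three pairwise sums of the returned lengths reproduce the input distances, which is a quick sanity check for any 3x3 matrix.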

Trees with > 3 Leaves
An unrooted binary tree with n leaves has 2n-3 edges.
This means fitting a given tree to a distance matrix D requires solving a system of “n choose 2” equations with 2n-3 variables.
For n > 3 this system is not always solvable.

The Four Point Condition
Compute:
1. Dij + Dkl
2. Dik + Djl
3. Dil + Djk
[Figure: the three pairings of leaves i, j, k, l on a four-leaf tree, labeled 1, 2 and 3]
2 and 3 represent the same number: the length of all edges + the middle edge (it is counted twice).
1 represents a smaller number: the length of all edges – the middle edge.

The Four Point Condition
Four point condition: for every i, j, k, l, two of the sums Dij + Dkl, Dik + Djl, Dil + Djk are equal and the third sum is smaller (or equal, when the middle edge has length 0).
Definition: An n x n matrix D is additive provided there exists a tree T with D(T) = D. (Note: T is unique.)
Theorem: D is additive if and only if the four point condition holds for every quartet 1 ≤ i, j, k, l ≤ n.
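The theorem suggests a brute-force additivity test: check the condition on every quartet. The sketch below assumes D is a symmetric list-of-lists with a zero diagonal; the names `four_point_ok` and `is_additive` and the tolerance parameter are illustrative assumptions. It accepts a quartet when the two largest sums are equal, which also covers the case where all three sums coincide. The example matrix over leaves v, w, x, y, z is the one implied by the edge lengths of the deck's worked example.

```python
from itertools import combinations

def four_point_ok(D, i, j, k, l, tol=1e-9):
    """True if the two largest of the three pairings' sums are equal."""
    sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
    return abs(sums[2] - sums[1]) <= tol

def is_additive(D, tol=1e-9):
    """Check the four point condition on every quartet of distinct indices."""
    n = len(D)
    return all(four_point_ok(D, *q, tol) for q in combinations(range(n), 4))

# Rows and columns in the order v, w, x, y, z.
D = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
additive = is_additive(D)
```

This check runs in O(n^4) time, so it is only practical for small matrices.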

Additive Distance Matrices
Matrix D is
ADDITIVE if there exists a tree T with dij(T) = Dij
NON-ADDITIVE otherwise

Reconstructing Additive Distances Given T
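[Figure: tree T with leaves v, w, x, y, z; edge lengths unknown]
D    v   w   x   y   z
v    0  10  17  16  16
w   10   0  15  14  14
x   17  15   0   9  15
y   16  14   9   0  14
z   16  14  15  14   0
If we know T and D, but do not know the length of each edge, we can reconstruct those lengths.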

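[Figure: tree T, in which leaves v and w hang from the internal vertex a]
dvx + dwx = 2 dax + dvw
dax = ½ (dvx + dwx – dvw)
day = ½ (dvy + dwy – dvw)
daz = ½ (dvz + dwz – dvw)
Replacing v and w by a gives the reduced matrix D1:
D1   a   x   y   z
a    0  11  10  10
x   11   0   9  15
y   10   9   0  14
z   10  15  14   0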

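D1 is reduced in the same way to D2 (x and y replaced by b) and then to D3:
D2   a   b   z
a    0   6  10
b    6   0  10
z   10  10   0
D3   a   c
a    0   3
c    3   0
d(a, c) = 3
d(b, c) = d(a, b) – d(a, c) = 3
d(c, z) = d(a, z) – d(a, c) = 7
d(b, x) = d(a, x) – d(a, b) = 5
d(b, y) = d(a, y) – d(a, b) = 4
d(a, w) = d(z, w) – d(a, z) = 4
d(a, v) = d(z, v) – d(a, z) = 6
Correct!!!
[Figure: tree T with the recovered edge lengths: v-a 6, w-a 4, a-c 3, c-b 3, c-z 7, b-x 5, b-y 4]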

Distance Based Phylogeny Problem
Goal: Reconstruct an evolutionary tree from a distance matrix
Input: n x n distance matrix Dij
Output: weighted tree T with n leaves fitting D
If D is additive, this problem has a solution and there is a simple algorithm to solve it.

Using Neighboring Leaves to Construct the Tree
Find neighboring leaves i and j with parent k
Remove the rows and columns of i and j
Add a new row and column corresponding to k, where the distance from k to any other leaf m can be computed as:
Dkm = (Dim + Djm – Dij)/2
Compress i and j into k, iterate algorithm for rest of tree
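One compression step can be sketched as follows. The sketch assumes the neighboring pair (i, j) is already known and that D is stored as a dict of dicts keyed by leaf label; the function name `merge_neighbors` and this storage layout are illustrative assumptions. The example reuses the deck's five-leaf matrix and merges v and w into a.

```python
def merge_neighbors(D, i, j, k):
    """Return a new matrix in which neighboring leaves i and j are replaced by parent k."""
    others = [m for m in D if m not in (i, j)]
    # Copy the part of the matrix that does not involve i or j.
    new = {m: {p: D[m][p] for p in others} for m in others}
    new[k] = {k: 0}
    for m in others:
        d_km = (D[i][m] + D[j][m] - D[i][j]) / 2
        new[k][m] = d_km
        new[m][k] = d_km
    return new

LABELS = ["v", "w", "x", "y", "z"]
ROWS = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D = {a: dict(zip(LABELS, row)) for a, row in zip(LABELS, ROWS)}
D1 = merge_neighbors(D, "v", "w", "a")
```

Here `D1["a"]["x"]`, `D1["a"]["y"]` and `D1["a"]["z"]` come out as 11, 10 and 10, matching the values ½ (dvx + dwx – dvw) and so on from the worked example.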

Finding Neighboring Leaves
To find neighboring leaves we simply select a pair of closest leaves. WRONG

Closest leaves aren’t necessarily neighbors
i and j are neighbors, but (dij = 13) > (djk = 12)
Finding a pair of neighboring leaves is a nontrivial problem!

Degenerate Triples
A degenerate triple is a set of three distinct elements 1 ≤ i, j, k ≤ n where Dij + Djk = Dik.
Element j in a degenerate triple i, j, k lies on the evolutionary path from i to k (or is attached to this path by an edge of length 0).
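A direct search for a degenerate triple can be sketched as below. The function name `find_degenerate_triple`, the 0-based list-of-lists layout, and the tolerance parameter are assumptions for illustration. The first example matrix is the deck's five-leaf matrix (no zero-length hanging edge, so no degenerate triple); the second is the same matrix with every off-diagonal entry reduced by 8, which is what shortening every hanging edge by 4 would produce.

```python
from itertools import permutations

def find_degenerate_triple(D, tol=1e-9):
    """Return the first (i, j, k) with Dij + Djk = Dik, or None if there is none."""
    n = len(D)
    for i, j, k in permutations(range(n), 3):
        if abs(D[i][j] + D[j][k] - D[i][k]) <= tol:
            return i, j, k
    return None

# Rows and columns in the order v, w, x, y, z.
D = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D_TRIMMED = [
    [0, 2, 9, 8, 8],
    [2, 0, 7, 6, 6],
    [9, 7, 0, 1, 7],
    [8, 6, 1, 0, 6],
    [8, 6, 7, 6, 0],
]
before = find_degenerate_triple(D)
after = find_degenerate_triple(D_TRIMMED)
```

In the trimmed matrix, leaf w (index 1) sits on the path from v (index 0) to x (index 2), since 2 + 7 = 9. The search is O(n^3).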

Looking for Degenerate Triples
If distance matrix D has a degenerate triple i, j, k, then j can be “removed” from D, thus reducing the size of the problem.
If distance matrix D does not have a degenerate triple i, j, k, one can “create” a degenerate triple in D by shortening all hanging edges (in the tree).

Shortening Hanging Edges to Produce Degenerate Triples
Shorten all “hanging” edges (edges that connect leaves to the rest of the tree) until a degenerate triple is found

Finding Degenerate Triples
If there is no degenerate triple, all hanging edges are reduced by the same amount δ, so that all pair-wise distances in the matrix are reduced by 2δ. Eventually this process collapses one of the leaves (when δ = length of shortest hanging edge), forming a degenerate triple i,j,k and reducing the size of the distance matrix D. The attachment point for j can be recovered in the reverse transformations by saving Dij for each collapsed leaf.

Reconstructing Trees for Additive Distance Matrices
Trim(D, δ)
  for all 1 ≤ i ≠ j ≤ n
    Dij = Dij - 2δ
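Trim can be sketched in Python together with one way of obtaining δ. The slides do not say how the trimming parameter is computed from D; the sketch below assumes D is additive and uses the fact that, in that case, the hanging edge of leaf j has length min over i, k of (Dij + Djk – Dik)/2. The function names are illustrative, and `trim` returns a new matrix rather than modifying D in place.

```python
from itertools import combinations

def hanging_edge_length(D, j):
    """Length of the edge attaching leaf j, assuming D is additive (needs n >= 3)."""
    others = [m for m in range(len(D)) if m != j]
    return min((D[i][j] + D[j][k] - D[i][k]) / 2
               for i, k in combinations(others, 2))

def trimming_parameter(D):
    """Length of the shortest hanging edge (an assumed definition of delta)."""
    return min(hanging_edge_length(D, j) for j in range(len(D)))

def trim(D, delta):
    """Reduce every off-diagonal entry by 2 * delta."""
    n = len(D)
    return [[D[i][j] - 2 * delta if i != j else D[i][j] for j in range(n)]
            for i in range(n)]

# Rows and columns in the order v, w, x, y, z.
D = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
delta = trimming_parameter(D)
trimmed = trim(D, delta)
```

For this matrix the shortest hanging edges are those of w and y, so every pairwise distance shrinks by twice that length.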

AdditivePhylogeny Algorithm
AdditivePhylogeny(D)
  if D is a 2 x 2 matrix
    T = tree of a single edge of length D1,2
    return T
  if D is non-degenerate
    Compute trimming parameter δ
    Trim(D, δ)
  else
    δ = 0
  Find a triple i, j, k in D such that Dij + Djk = Dik
  x = Dij
  Remove jth row and jth column from D
  T = AdditivePhylogeny(D)
  Traceback

AdditivePhylogeny (cont’d)
Traceback
  Add a new vertex v to T at distance x from i on the path from i to k
  Add j back to T by creating an edge (v, j) of length 0
  for every leaf l in T
    if distance from l to v in the tree ≠ Dl,j
      output “matrix is not additive”
      return
  Extend all “hanging” edges by length δ
  return T

Neighbor Joining Algorithm
In 1987 Naruya Saitou and Masatoshi Nei developed a neighbor joining algorithm for phylogenetic tree reconstruction.
Finds a pair of leaves that are close to each other but far from other leaves: implicitly finds a pair of neighboring leaves.
Advantages: works well for additive and also for non-additive matrices; it does not rely on the flawed molecular clock assumption.

Neighbor-Joining
Guaranteed to produce the correct tree if distance is additive
May produce a good tree even when distance is not additive
Let 𝒞 = the set of current clusters.
Step 1: Finding neighboring clusters
Define: u(C) = 1/(|𝒞| - 2) Σ_{C′ ∈ 𝒞} D(C, C′)
u(C) measures separation of C from other clusters
Want to minimize D(C1, C2) and maximize u(C1) + u(C2)
Magic trick: Choose C1 and C2 that minimize D(C1, C2) - (u(C1) + u(C2))
Claim: Above ensures that Dij is minimal iff i, j are neighbors
Proof: Very technical, please read Durbin et al.!
[Figure: four-leaf tree with leaves 1, 2, 3, 4 and edge lengths 0.1, 0.1, 0.1, 0.4, 0.4]

Algorithm: Neighbor-joining
Initialization:
  Form n clusters, one for each leaf node
  Define T to be the set of leaf nodes, one per sequence
Iteration:
  Pick C1, C2 s.t. D(C1, C2) – (u(C1) + u(C2)) is minimal
  Merge C1 and C2 into a new cluster C with |C1| + |C2| elements
  Add a new vertex C to T and connect it to vertices C1 and C2
  Assign length 1/2 (D(C1, C2) + u(C1) - u(C2)) to edge (C1, C)
  Assign length 1/2 (D(C1, C2) + u(C2) - u(C1)) to edge (C2, C)
  Remove rows and columns from D corresponding to C1 and C2
  Add row and column to D for new cluster C
Termination:
  When only one cluster remains
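The iteration above can be sketched in Python as follows. Two details are assumptions not stated on the slide: the distance from the new cluster C to any other cluster M is taken as ½ (D(C1, M) + D(C2, M) – D(C1, C2)), the same formula used earlier for compressing neighboring leaves, and the loop stops when two clusters remain and joins them with a single edge. The function name, the internal node names (`n0`, `n1`, ...), and the edge-list output are illustrative choices.

```python
def neighbor_joining(D, labels):
    """Return tree edges as (node, node, length) tuples for a distance matrix D."""
    dist = {a: {b: D[i][j] for j, b in enumerate(labels)}
            for i, a in enumerate(labels)}
    edges = []
    next_id = 0
    while len(dist) > 2:
        nodes = list(dist)
        n = len(nodes)
        # u(C): separation of each cluster from all the others.
        u = {a: sum(dist[a][b] for b in nodes if b != a) / (n - 2) for a in nodes}
        # Pick the pair minimizing D(C1, C2) - (u(C1) + u(C2)).
        c1, c2 = min(
            ((p, q) for idx, p in enumerate(nodes) for q in nodes[idx + 1:]),
            key=lambda pq: dist[pq[0]][pq[1]] - u[pq[0]] - u[pq[1]],
        )
        new = f"n{next_id}"  # hypothetical name for the new internal vertex
        next_id += 1
        d12 = dist[c1][c2]
        edges.append((c1, new, 0.5 * (d12 + u[c1] - u[c2])))
        edges.append((c2, new, 0.5 * (d12 + u[c2] - u[c1])))
        # Assumed update rule for distances from the new cluster.
        row = {m: 0.5 * (dist[c1][m] + dist[c2][m] - d12)
               for m in nodes if m not in (c1, c2)}
        del dist[c1]
        del dist[c2]
        for m in dist:
            del dist[m][c1]
            del dist[m][c2]
            dist[m][new] = row[m]
        row[new] = 0.0
        dist[new] = row
    a, b = list(dist)
    edges.append((a, b, dist[a][b]))  # join the last two clusters
    return edges

LABELS = ["v", "w", "x", "y", "z"]
ROWS = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
edges = neighbor_joining(ROWS, LABELS)
```

On this additive matrix the sketch returns seven edges whose leaf edge lengths agree with the worked example (up to floating-point rounding). Ties in the selection criterion are broken by iteration order, so the internal node names may differ between equivalent runs.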
