Published by Colleen Logan. Modified over 9 years ago.
1
The Saitou & Nei Neighbor Joining Algorithm
© Shlomo Moran & Ilan Gronau
2
Recall: Distance-Based Reconstruction
Input: distances between all taxon pairs.
Output: an edge-weighted tree that best describes the distances.
[Figure: example of an edge-weighted tree.]
3
Requirements from Distance-Based Tree-Reconstruction Algorithms
1. Consistency: if the input metric is a tree metric, the returned tree should be the (unique) tree that fits this metric.
2. Efficiency: polynomial time, preferably no more than $O(n^3)$, where $n$ is the number of leaves (i.e., the distance matrix is $n \times n$).
3. Robustness: if the input matrix is "close" to a tree metric, the algorithm should return the corresponding tree.
Definition: tree metrics, or additive distances, are distances that can be realized by a weighted tree.
A natural family of algorithms that satisfy requirements 1 and 2 is called "Neighbor Joining", presented next. Then we present one such algorithm that is known to be robust in practice.
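To make the definition of a tree metric concrete, here is a minimal Python sketch that computes the leaf-to-leaf distances realized by a weighted tree. The function name `tree_distances`, the edge-list format, and the example tree `EDGES` are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def tree_distances(edges, leaves):
    """Return D with D[a][b] = weighted path length between leaves[a] and leaves[b]."""
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    def dist_from(src):
        # Depth-first traversal; in a tree every vertex is reached by a unique path.
        dist = {src: 0}
        stack = [src]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + w
                    stack.append(y)
        return dist

    rows = []
    for i in leaves:
        dist = dist_from(i)
        rows.append([dist[j] for j in leaves])
    return rows

# Hypothetical example tree: leaves 0..4, internal vertices "u", "v", "w".
EDGES = [(0, "u", 2), (1, "u", 3), ("u", "v", 4), (2, "v", 1),
         ("v", "w", 2), (3, "w", 5), (4, "w", 1)]
D = tree_distances(EDGES, [0, 1, 2, 3, 4])
```

Any matrix produced this way is additive by construction, so it is a convenient input for checking the consistency requirement on small cases.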
4
The Neighbor Joining Tree-Reconstruction Scheme
Start with an $n \times n$ distance matrix $D$ over a set $S$ of $n$ taxa (or vertices, or leaves).
1. Use $D$ to select a pair of neighboring leaves (cherries) $i,j$.
2. Define a new vertex $v$ as the parent of the cherries $i,j$.
3. Compute a reduced $(n-1) \times (n-1)$ distance matrix $D'$ over $S' = (S \setminus \{i,j\}) \cup \{v\}$.
Important: the distances from $v$ to the other vertices in $S'$ must be computed so that $D'$ is a distance matrix of the reduced tree $T'$, obtained by pruning $i,j$ from $T$.
[Figure: the cherries $i,j$ with their parent $v$, and the matrices $D$ and $D'$.]
5
The Neighbor Joining Tree-Reconstruction Scheme
4. Apply the method recursively on the reduced matrix $D'$ to get the reduced tree $T'$.
5. In $T'$, add $i,j$ as children of $v$ (and possibly update edge lengths).
Recursion base: when there are only two objects, return a tree with 2 leaves.
[Figure: the reduced tree $T'$ with leaf $v$, and the tree obtained by attaching $i,j$ to $v$.]
Question: how can we find cherries?
6
Consistency of Neighbor Joining
Theorem: Assume that the following holds for each input tree metric $D$ defined by some weighted tree $T$:
1. Correct Neighbor Selection: the vertices chosen at step 1 are cherries in $T$.
2. Correct Updating: the reduced matrix $D'$ is a distance matrix of some weighted tree $T'$, which is obtained by replacing in $T$ the cherries $i,j$ by their parent $v$ ($T'$ is the reduced tree).
Then the neighbor joining scheme is consistent: for each $D$ which defines a tree metric, it returns the corresponding tree $T$.
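One update rule that satisfies the Correct Updating condition on tree metrics is $D'(v,k) = (D(i,k) + D(j,k) - D(i,j))/2$, which follows from additivity when $i,j$ are cherries with parent $v$. The slides do not state a specific rule at this point, so the sketch below is an assumption; the function name `reduce_matrix` and the example matrix are illustrative.

```python
def reduce_matrix(D, i, j):
    """Join cherries i and j into a new vertex v.

    Returns (D2, keep): keep lists the surviving original indices in order,
    and D2 is the reduced matrix over keep + [v], with v as the last index.
    """
    n = len(D)
    keep = [k for k in range(n) if k not in (i, j)]
    # Distances among the surviving vertices are unchanged.
    D2 = [[D[a][b] for b in keep] for a in keep]
    # Distance from the new parent v to every surviving vertex k.
    to_v = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
    for row, d in zip(D2, to_v):
        row.append(d)
    D2.append(to_v + [0])
    return D2, keep

# Hypothetical additive matrix on 5 leaves in which leaves 0 and 1 are cherries.
D = [[0, 5, 7, 13, 9],
     [5, 0, 8, 14, 10],
     [7, 8, 0, 8, 4],
     [13, 14, 8, 0, 6],
     [9, 10, 4, 6, 0]]
D2, keep = reduce_matrix(D, 0, 1)
```

If $i,j$ are not cherries, this rule still returns a matrix, but it need not be the distance matrix of a reduced tree, which is why the theorem requires both conditions.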
7
Least Common Ancestor Depth
Let $i,j$ be leaves in $T$, and let $r \neq i,j$ be a vertex in $T$.
$LCA_r(i,j)$ is the Least Common Ancestor of $i$ and $j$ when $r$ is viewed as a root. If $r$ is fixed we just write $LCA(i,j)$.
$d_T(r, LCA(i,j))$ is the "depth of $LCA_r(i,j)$".
[Figure: a tree rooted at $r$ with leaves $i,j$; the path from $r$ to $LCA(i,j)$ has length $d_T(r, LCA(i,j))$.]
8
Cherries maximize the LCA Depth
Let $T$ be a weighted tree with a root $r$. For leaves $i,j \neq r$, let $L(i,j) = d_T(r, LCA(i,j))$.
If $L(i,j) = \max_{i',j'} L(i',j')$, then $i$ and $j$ are cherries.
[Figure: a tree rooted at $r$ in which the cherries $i,j$ share the parent $v$.]
This property can be used to select cherry pairs. The "Saitou & Nei" NJ algorithm uses a variant of this property.
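For additive distances, the LCA depth can be read directly off the matrix, since $D(i,j) = D(r,i) + D(r,j) - 2\,d_T(r, LCA(i,j))$. The sketch below uses this to pick the pair with the deepest LCA. The function names and the example matrix are illustrative assumptions.

```python
from itertools import combinations

def lca_depth(D, r, i, j):
    """Depth of LCA_r(i, j), assuming D is additive (a tree metric)."""
    return (D[r][i] + D[r][j] - D[i][j]) / 2

def deepest_lca_pair(D, r):
    """Return a pair of vertices other than r that maximizes the LCA depth."""
    others = [x for x in range(len(D)) if x != r]
    return max(combinations(others, 2), key=lambda p: lca_depth(D, r, *p))

# Hypothetical additive matrix on 5 leaves; leaves 0 and 1 are cherries.
D = [[0, 5, 7, 13, 9],
     [5, 0, 8, 14, 10],
     [7, 8, 0, 8, 4],
     [13, 14, 8, 0, 6],
     [9, 10, 4, 6, 0]]
pair = deepest_lca_pair(D, r=4)
```

With leaf 4 as the root, the pair (0, 1) is selected, with LCA depth 7. On ties, `max` returns the first maximizing pair in enumeration order.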
9
Saitou & Nei's Neighbor Joining Algorithm (1987)
~13,000 citations (Science Citation Index)
Implemented in numerous phylogenetic packages
Fastest implementation: $\Theta(n^3)$
Usually referred to as "the NJ algorithm"
Identified by its neighbor-selection criterion
Saitou & Nei's neighbor-selection criterion: select $i,j$ which maximize $Q(i,j) = r_i + r_j - (n-2)\,D(i,j)$, where $r_i = \sum_k D(i,k)$.
10
Consistency of the Saitou & Nei method
Theorem (Saitou & Nei): Assume all edge weights of $T$ are positive. If $Q(i,j) = \max_{i',j'} Q(i',j')$, then $i$ and $j$ are cherries in the tree.
Proof: in the following slides.
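The theorem can be checked on a small example. The sketch below assumes the criterion in its standard maximization form, $Q(i,j) = r_i + r_j - (n-2)\,D(i,j)$ with $r_i = \sum_k D(i,k)$; the function names and the example matrix are illustrative.

```python
from itertools import combinations

def q_values(D):
    """Return {(i, j): Q(i, j)} for all pairs i < j."""
    n = len(D)
    r = [sum(row) for row in D]  # r[i] = sum of distances from i to all vertices
    return {(i, j): r[i] + r[j] - (n - 2) * D[i][j]
            for i, j in combinations(range(n), 2)}

def select_pair(D):
    """Return a pair maximizing Q (first one found on ties)."""
    Q = q_values(D)
    return max(Q, key=Q.get)

# Hypothetical additive matrix on 5 leaves; the cherries are (0, 1) and (3, 4).
D = [[0, 5, 7, 13, 9],
     [5, 0, 8, 14, 10],
     [7, 8, 0, 8, 4],
     [13, 14, 8, 0, 6],
     [9, 10, 4, 6, 0]]
Q = q_values(D)
best = select_pair(D)
```

Here $Q(0,1) = 56$ is the maximum and $Q(3,4) = 52$ is second; every pair that is not a cherry scores at most 44.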
11
1st step in the proof: Express Saitou & Nei's selection criterion in terms of LCA distances
Saitou & Nei's selection criterion: select $i,j$ which maximize
$Q(i,j) = r_i + r_j - (n-2)\,D(i,j) = 2\left[\sum_{r \neq i,j} d_T(r, LCA_r(i,j)) + D(i,j)\right]$, where $r_i = \sum_k D(i,k)$.
Intuition: NJ "tries" to select taxon pairs with the deepest LCA on average. The addition of $D(i,j)$ is needed to make the formula consistent.
Next we prove the above equality.
12
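Proof of equality in previous slide
For every leaf $r \neq i,j$: $D(i,j) = D(r,i) + D(r,j) - 2\,d_T(r, LCA_r(i,j))$.
Summing this identity over all leaves $r \neq i,j$ gives the equality, with $r_i = \sum_k D(i,k)$ and $r_j = \sum_k D(j,k)$.
[Figure: the paths from $r$ to $i$ and from $r$ to $j$ share the segment from $r$ to $LCA_r(i,j)$.]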
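The summation can be written out step by step. This is a sketch under the assumption that the criterion is $Q(i,j) = r_i + r_j - (n-2)\,D(i,j)$ with $r_i = \sum_k D(i,k)$. Note that $r$ denotes the root leaf, while $r_i, r_j$ denote row sums.

```latex
\begin{align*}
2\,d_T\bigl(r, LCA_r(i,j)\bigr)
  &= D(r,i) + D(r,j) - D(i,j) \qquad (r \neq i,j) \\
2\sum_{r \neq i,j} d_T\bigl(r, LCA_r(i,j)\bigr)
  &= \sum_{r \neq i,j} D(r,i) + \sum_{r \neq i,j} D(r,j) - (n-2)\,D(i,j) \\
  &= \bigl(r_i - D(i,j)\bigr) + \bigl(r_j - D(i,j)\bigr) - (n-2)\,D(i,j) \\
  &= Q(i,j) - 2\,D(i,j)
\end{align*}
```

Rearranging gives $Q(i,j) = 2\left[\sum_{r \neq i,j} d_T(r, LCA_r(i,j)) + D(i,j)\right]$.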
13
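2nd step in proof: Consistency of Saitou & Nei Neighbor Selection
For a vertex $i$ and an edge $e$: $N_i(e) = |\{r \in S : e \text{ is on } path(i,r)\}|$.
Then: $Q'(i,j) = \sum_{e \in path(i,j)} w(e) + \sum_{e \notin path(i,j)} N_i(e)\,w(e)$, where $Q'(i,j) = Q(i,j)/2 = \sum_{r \neq i,j} d_T(r, LCA_r(i,j)) + D(i,j)$.
Note: if $e'$ is a "leaf edge", then $w(e')$ is added exactly once to $Q'(i,j)$.
[Figure: leaves $i,j$ joined by $path(i,j)$, an edge $e$ leading to the rest of $T$, and a leaf $r$.]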
14
Consistency of Saitou & Nei (cont.)
Assume for contradiction that $Q'(i,j)$ is maximized for $i,j$ which are not cherries.
Let (see the figure below): $path(i,j) = (i,\ldots,k,j)$; $T_1$ = the subtree rooted at $k$; $T_2 = T \setminus T_1$. Assume WLOG that $T_1$ has at most $n/2$ leaves.
Let $i',j'$ be any two cherries in $T_1$. We will show that $Q'(i',j') > Q'(i,j)$.
[Figure: leaves $i,j$ and vertex $k$ on $path(i,j)$, the subtrees $T_1$ and $T_2$, and the cherries $i',j'$ in $T_1$.]
15
Consistency of Saitou & Nei (cont.)
Proof that $Q'(i',j') > Q'(i,j)$:
Each leaf edge $e$ adds $w(e)$ both to $Q'(i,j)$ and to $Q'(i',j')$, so we can ignore the contribution of leaf edges to both $Q'(i,j)$ and $Q'(i',j')$.
[Figure: as in the previous slide.]
16
Consistency of Saitou & Nei (end)
Contribution of internal edges to $Q'(i,j)$ and to $Q'(i',j')$:
| Location of internal edge $e$ | # times $w(e)$ is added to $Q'(i,j)$ | # times $w(e)$ is added to $Q'(i',j')$ |
|---|---|---|
| $e \in path(i,j)$ | $1$ | $N_{i'}(e) \geq 2$ |
| $e \in path(i',j)$ | $N_i(e) < n/2$ | $N_{i'}(e) \geq n/2$ |
| $e \in T \setminus path(i,i')$ | $N_i(e)$ | $N_{i'}(e) = N_i(e)$ |
Since there is at least one internal edge $e$ in $path(i,j)$, $Q'(i',j') > Q'(i,j)$. QED
[Figure: as in the previous slide.]
17
Complexity of the Saitou & Nei NJ Algorithm
Initialization: $\Theta(n^2)$ to compute $Q(i,j)$ for all $i,j \in L$.
Each iteration: $O(n^2)$ to find the maximal $Q(i,j)$ and to update the values of $Q(x,y)$.
Total: $O(n^3)$.
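Putting the pieces together, here is a minimal $O(n^3)$ sketch of the whole procedure: $n-2$ iterations, each doing $O(n^2)$ work. It is not taken from the slides. It assumes the maximization form $Q(i,j) = r_i + r_j - (n-2)\,D(i,j)$, the update $D'(v,k) = (D(i,k) + D(j,k) - D(i,j))/2$, and the commonly used edge-length formula $w(i,v) = D(i,j)/2 + (r_i - r_j)/(2(n-2))$. The function name, the internal labels `v0, v1, ...`, and the example matrix are illustrative.

```python
from itertools import combinations

def neighbor_joining(D, names):
    """Return the reconstructed tree as a list of weighted edges (a, b, w)."""
    D = [[float(x) for x in row] for row in D]
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Neighbor selection: maximize Q(i, j) = r_i + r_j - (n - 2) * D(i, j).
        i, j = max(combinations(range(n), 2),
                   key=lambda p: r[p[0]] + r[p[1]] - (n - 2) * D[p[0]][p[1]])
        v = f"v{next_id}"
        next_id += 1
        # Edge lengths from i and j to their new parent v (assumed formula).
        w_i = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((nodes[i], v, w_i))
        edges.append((nodes[j], v, D[i][j] - w_i))
        # Reduced matrix over the surviving vertices plus v (v is last).
        keep = [k for k in range(n) if k not in (i, j)]
        to_v = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] + [to_v[x]] for x, a in enumerate(keep)]
        D.append(to_v + [0.0])
        nodes = [nodes[k] for k in keep] + [v]
    # Recursion base: two objects left, joined by a single edge.
    edges.append((nodes[0], nodes[1], D[0][1]))
    return edges

# Hypothetical additive matrix on leaves a..e.
D = [[0, 5, 7, 13, 9],
     [5, 0, 8, 14, 10],
     [7, 8, 0, 8, 4],
     [13, 14, 8, 0, 6],
     [9, 10, 4, 6, 0]]
tree = neighbor_joining(D, ["a", "b", "c", "d", "e"])
```

On this additive input the sketch recovers all seven edge weights of the tree that generated the matrix. Recomputing the row sums in every iteration keeps the code short and still costs only $O(n^2)$ per iteration.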