The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau.

1 The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau

2 Recall: Distance-Based Reconstruction
Input: distances between all taxon pairs.
Output: an edge-weighted tree that best describes the distances.
[Figure: example edge-weighted tree]

3 Requirements from Distance-Based Tree-Reconstruction Algorithms
1. Consistency: if the input metric is a tree metric, the returned tree should be the (unique) tree that fits this metric.
2. Efficiency: polynomial time, preferably no more than O(n^3), where n is the number of leaves (i.e., the distance matrix is n × n).
3. Robustness: if the input matrix is “close” to a tree metric, the algorithm should return the corresponding tree.
Definition: a tree metric (or additive distances) is a set of distances that can be realized by a weighted tree.
A natural family of algorithms satisfying requirements 1 and 2 is called “Neighbor Joining”, presented next. We then present one such algorithm, which is known to be robust in practice.
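To make the definition of a tree metric concrete, here is a minimal Python sketch that derives the leaf-to-leaf distances realized by a weighted tree. The function name `tree_metric` and the four-leaf tree in `EXAMPLE_EDGES` are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def tree_metric(edges, leaves):
    """Return {(a, b): path length} for all ordered leaf pairs of a weighted tree.

    edges  -- list of (u, v, weight) tuples describing an undirected tree
    leaves -- the vertices whose pairwise distances are wanted
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dist_from(src):
        # Depth-first traversal; in a tree every vertex is reached exactly once.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        return dist

    D = {}
    for a in leaves:
        da = dist_from(a)
        for b in leaves:
            D[a, b] = da[b]
    return D


# Hypothetical example: leaves a, b hang off internal vertex x; c, d hang off y.
EXAMPLE_EDGES = [("a", "x", 1), ("b", "x", 4), ("x", "y", 1),
                 ("c", "y", 1), ("d", "y", 4)]
EXAMPLE_D = tree_metric(EXAMPLE_EDGES, ["a", "b", "c", "d"])
```

In this example the closest pair of leaves is (a, c), which is not a pair of cherries, so picking the minimum-distance pair would not be a correct neighbor selection.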

4 The Neighbor Joining Tree-Reconstruction Scheme
Start with an n × n distance matrix D over a set S of n taxa (or vertices, or leaves).
1. Use D to select a pair of neighboring leaves (cherries) i, j.
2. Define a new vertex v as the parent of the cherries i, j.
3. Compute a reduced (n−1) × (n−1) distance matrix D’ over S’ = (S \ {i, j}) ∪ {v}.
Important: the distances from v to the other vertices in S’ must be computed so that D’ is a distance matrix of the reduced tree T’, obtained by pruning i, j from T.
[Figure: the matrix D and the reduced matrix D’; cherries i, j with parent v]

5 The Neighbor Joining Tree-Reconstruction Scheme
4. Apply the method recursively on the reduced matrix D’ to get the reduced tree T’.
5. In T’, add i, j as children of v (and possibly update edge lengths).
Recursion base: when there are only two objects, return a tree with 2 leaves.
Question: how can we find cherries?
[Figure: the reduced matrix D’ and reduced tree T’; cherries i, j re-attached to v]

6 Consistency of Neighbor Joining
Theorem: Assume that the following holds for each input tree metric D defined by some weighted tree T:
1. Correct Neighbor Selection: the vertices chosen at step 1 are cherries in T.
2. Correct Updating: the reduced matrix D’ is a distance matrix of some weighted tree T’, which is obtained by replacing in T the cherries i, j by their parent v (T’ is the reduced tree).
Then the neighbor joining scheme is consistent: for each D which defines a tree metric, it returns the corresponding tree T.

7 Least Common Ancestor Depth
Let i, j be leaves in T, and let r ≠ i, j be a vertex in T. LCA_r(i,j) is the Least Common Ancestor of i and j when r is viewed as the root. If r is fixed we just write LCA(i,j). d_T(r, LCA(i,j)) is the “depth of LCA_r(i,j)”.
[Figure: leaves i, j and root r, with the path of length d_T(r, LCA(i,j)) marked]

8 Cherries maximize the LCA Depth
Let T be a weighted tree with a root r. For leaves i, j ≠ r, let L(i,j) = d_T(r, LCA(i,j)). If L(i,j) = max_{i’,j’} L(i’,j’), then i and j are cherries. This property can be used to select pairs of cherries. The “Saitou&Nei” NJ algorithm uses a variant of this property.
[Figure: cherries i, j with parent v in a tree rooted at r]
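The LCA depth can be read directly off the distance matrix: in a tree metric, d_T(r, LCA_r(i,j)) = (D(r,i) + D(r,j) − D(i,j)) / 2. The following minimal Python sketch uses that identity to pick the pair with the deepest LCA for a fixed root r. The names `lca_depth`, `deepest_pair`, and the example matrix `D_LCA` are hypothetical; the matrix is the metric of a small four-leaf tree with cherries (r, a) and (b, c).

```python
from itertools import combinations


def lca_depth(D, r, i, j):
    """Depth of LCA_r(i, j), computed from distances only (tree metric assumed)."""
    return (D[r][i] + D[r][j] - D[i][j]) / 2


def deepest_pair(D, r):
    """Return the pair of vertices other than r whose LCA is deepest from r."""
    others = [x for x in D if x != r]
    return max(combinations(others, 2), key=lambda p: lca_depth(D, r, *p))


# Hypothetical tree: r-x (1), a-x (1), x-y (2), b-y (3), c-y (1).
D_LCA = {
    "r": {"r": 0, "a": 2, "b": 6, "c": 4},
    "a": {"r": 2, "a": 0, "b": 6, "c": 4},
    "b": {"r": 6, "a": 6, "b": 0, "c": 4},
    "c": {"r": 4, "a": 4, "b": 4, "c": 0},
}
BEST = deepest_pair(D_LCA, "r")
```

Here `BEST` is the pair (b, c), whose LCA is the internal vertex y at depth 3 from r.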

9 Saitou & Nei’s Neighbor Joining Algorithm (1987)
• ~13,000 citations (Science Citation Index)
• Implemented in numerous phylogenetic packages
• Fastest implementation: Θ(n^3)
• Usually referred to as “the NJ algorithm”
• Identified by its neighbor-selection criterion
[Equation: Saitou & Nei’s neighbor-selection criterion; not preserved in the transcript]

10 Consistency of Saitou&Nei method
Theorem (Saitou&Nei): Assume all edge weights of T are positive. If Q(i,j) = max_{i’,j’} Q(i’,j’), then i and j are cherries in the tree.
Proof: in the following slides.

11 Saitou & Nei’s selection criterion: select i, j which maximize Q(i,j) [equation not preserved in the transcript].
1st step in the proof: express the Saitou&Nei selection criterion in terms of LCA distances [equality not preserved in the transcript].
Intuition: NJ “tries” to select taxon pairs with the deepest average LCA. The addition of D(i,j) is needed to make the formula consistent.
Next we prove the above equality.
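The selection step can be sketched in a few lines of Python. Since the formula itself did not survive in the transcript, this sketch assumes the criterion in its commonly stated maximization form, Q(i,j) = r_i + r_j − (n−2)·D(i,j) with r_i = Σ_k D(i,k); the slides may normalize it differently. The names `q_values`, `select_pair`, and the matrix `D_Q` are hypothetical.

```python
from itertools import combinations


def q_values(D):
    """Q(i, j) = r_i + r_j - (n - 2) * D[i][j] for all i < j (assumed form)."""
    n = len(D)
    r = [sum(row) for row in D]  # r_i: sum of distances from i to all taxa
    return {(i, j): r[i] + r[j] - (n - 2) * D[i][j]
            for i, j in combinations(range(n), 2)}


def select_pair(D):
    """Return the index pair that maximizes Q (first one found on ties)."""
    Q = q_values(D)
    return max(Q, key=Q.get)


# Hypothetical tree metric on taxa 0..3 (cherries are (0, 1) and (2, 3));
# taxa 0 and 2 are the closest pair although they are not cherries.
D_Q = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]
Q_EXAMPLE = q_values(D_Q)
PAIR = select_pair(D_Q)
```

In this example Q is 24 for the two cherry pairs and 22 for every other pair, so the closest pair (0, 2) is correctly rejected.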

12 Proof of equality in previous slide
[Derivation not preserved in the transcript; surviving labels: r_i, r_j, and −2d(r, LCA_r(i,j))]
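A possible reconstruction of the equality, offered as a sketch and not as the slide's own derivation: it assumes the criterion has the commonly stated form Q(i,j) = r_i + r_j − (n−2)D(i,j), with r_i the sum of distances from i to all taxa, and uses the fact that in a tree metric D(r,i) + D(r,j) − D(i,j) equals twice the depth of LCA_r(i,j).

```latex
\begin{align*}
r_i &= \sum_{k \in S} D(i,k), \qquad Q(i,j) = r_i + r_j - (n-2)\,D(i,j) \\
D(i,j) - D(r,i) - D(r,j) &= -2\, d_T\bigl(r, \mathrm{LCA}_r(i,j)\bigr)
    \quad \text{for } r \neq i,j \\
Q(i,j) &= 2D(i,j) + \sum_{r \neq i,j} \bigl[ D(r,i) + D(r,j) - D(i,j) \bigr] \\
       &= 2D(i,j) + 2 \sum_{r \neq i,j} d_T\bigl(r, \mathrm{LCA}_r(i,j)\bigr)
\end{align*}
```

Under this reading, Q(i,j)/2 is D(i,j) plus the sum of LCA depths over all other taxa, and the weight of every leaf edge appears in it exactly once. That the Q’ of the later slides is this halved quantity is an assumption.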

13 2nd step in proof: Consistency of Saitou&Nei Neighbor Selection
For a vertex i and an edge e: N_i(e) = |{r ∈ S : e is on path(i,r)}|.
Then: [equation not preserved in the transcript]
Note: if e’ is a “leaf edge”, then w(e’) is added exactly once to Q(i,j).
[Figure: leaves i, j and r; an edge e; path(i,j); the rest of T]

14 Consistency of Saitou&Nei (cont)
Assume for contradiction that Q’(i,j) is maximized for i, j which are not cherries.
Let (see the figure): path(i,j) = (i, ..., k, j); T_1 = the subtree rooted at k; T_2 = T \ T_1. Assume WLOG that T_1 has at most n/2 leaves.
Let i’, j’ be any two cherries in T_1. We will show that Q’(i’,j’) > Q’(i,j).
[Figure: leaves i, j; vertex k; subtrees T_1 and T_2; cherries i’, j’ in T_1]

15 Consistency of Saitou&Nei (cont)
Proof that Q’(i’,j’) > Q’(i,j):
Each leaf edge e adds w(e) both to Q’(i,j) and to Q’(i’,j’), so we can ignore the contribution of leaf edges to both Q’(i,j) and Q’(i’,j’).
[Figure: leaves i, j; vertex k; subtrees T_1 and T_2; cherries i’, j’ in T_1]

16 Consistency of Saitou&Nei (end)
Contribution of internal edges to Q’(i,j) and to Q’(i’,j’):
Location of internal edge e | # times w(e) is added to Q’(i,j) | # times w(e) is added to Q’(i’,j’)
e ∈ path(i,j) | 1 | N_i’(e) ≥ 2
e ∈ path(i’,j) | N_i(e) < n/2 | N_i’(e) ≥ n/2
e ∈ T \ path(i,i’) | N_i(e) = N_i’(e) (same count in both)
Since there is at least one internal edge e in path(i,j), Q’(i’,j’) > Q’(i,j). QED
[Figure: leaves i, j; vertex k; subtrees T_1 and T_2; cherries i’, j’ in T_1]

17 Complexity of Saitou&Nei NJ Algorithm
Initialization: Θ(n^2) to compute Q(i,j) for all i, j ∈ L.
Each iteration: O(n^2) to find the maximal Q(i,j) and to update the values of Q(x,y).
Total: O(n^3)
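Putting the scheme together, here is a minimal Python sketch of a cubic-time neighbor joining loop. It is an illustration under stated assumptions, not the authors' implementation: the selection criterion is taken in the form Q(i,j) = r_i + r_j − (n−2)·D(i,j), and the reduction D’(v,k) = (D(i,k) + D(j,k) − D(i,j)) / 2 and the edge-length formula are the commonly used NJ update rules, which are not spelled out in the transcript. The names `neighbor_joining`, `NJ_D`, and the generated vertex labels `v0`, `v1`, ... are hypothetical.

```python
def neighbor_joining(matrix, names):
    """Return a list of (u, v, length) edges of the reconstructed tree."""
    # Work on a dict-of-dicts copy so new internal vertices can be added.
    D = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # O(n^2) scan for the pair maximizing Q (assumed criterion form).
        pairs = ((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:])
        i, j = max(pairs,
                   key=lambda p: r[p[0]] + r[p[1]] - (n - 2) * D[p[0]][p[1]])
        v = f"v{next_id}"
        next_id += 1
        # Lengths of the edges (i, v) and (j, v); assumed standard NJ formula.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, v, li), (j, v, D[i][j] - li)]
        # Reduced matrix: distances from the new parent v to the other vertices.
        D[v] = {v: 0.0}
        for k in nodes:
            if k not in (i, j):
                D[v][k] = D[k][v] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [v]
    a, b = nodes  # recursion base: two objects joined by a single edge
    edges.append((a, b, D[a][b]))
    return edges


# Hypothetical tree metric: a-x (1), b-x (4), x-y (1), c-y (1), d-y (4).
NJ_D = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]
NJ_EDGES = neighbor_joining(NJ_D, ["a", "b", "c", "d"])
```

Each iteration recomputes the row sums and rescans all pairs in O(n^2) time, and there are n − 2 iterations, which gives the O(n^3) total stated above. On the example metric the sketch recovers the generating tree, including its edge lengths.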

