The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau.

Slides:



Advertisements
Similar presentations
Great Theoretical Ideas in Computer Science
Advertisements

Maximum Parsimony Probabilistic Models of Evolutions Distance Based Methods Lecture 12 © Shlomo Moran, Ilan Gronau.
WSPD Applications.
Divide and Conquer. Subject Series-Parallel Digraphs Planarity testing.
2/14/13CMPS 3120 Computational Geometry1 CMPS 3120: Computational Geometry Spring 2013 Planar Subdivisions and Point Location Carola Wenk Based on: Computational.
3.3 Spanning Trees Tucker, Applied Combinatorics, Section 3.3, by Patti Bodkin and Tamsen Hunter.
One of the most important problems is Computational Geometry is to find an efficient way to decide, given a subdivision of E n and a point P, in which.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
פרויקט בתכנות מחקר השוואתי בשחזור עצי אבולוציה: אלגוריתמים קיימים מול תכנות בשלמים אביב 2013 מרצה: שלמה מורן מנחה חיצוני: יוסי שילוח Website:
PLGW01 - September Inferring Phylogenies from LCA distances (back to the basics of distance-based phylogenetic reconstruction) Ilan Gronau Shlomo.
CSL758 Instructors: Naveen Garg Kavitha Telikepalli Scribe: Manish Singh Vaibhav Rastogi February 7 & 11, 2008.
1 Discrete Structures & Algorithms Graphs and Trees: II EECE 320.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
. Phylogenetic Trees Lecture 11 Sections 7.1, 7.2, in Durbin et al.
Bioinformatics Algorithms and Data Structures
Fast Algorithms for Minimum Evolution Richard Desper, NCBI Olivier Gascuel, LIRMM.
. Perfect Phylogeny Tutorial #11 © Ilan Gronau Original slides by Shlomo Moran.
. Advanced programming Algorithms for reconstructing phylogenetic trees spring 2006 Lecturer: Shlomo Moran, Taub 639, tel 4363 TA: Ilan Gronau,
. Distance-Based Phylogenetic Reconstruction ( part II ) Tutorial #11 © Ilan Gronau.
Great Theoretical Ideas in Computer Science.
The Tree of Life From Ernst Haeckel, 1891.
. Robustness to Noise in Distance-Based Phylogenetic Reconstruction Methods Tutorial #12 © Ilan Gronau.
Greedy Algorithms Reading Material: Chapter 8 (Except Section 8.5)
Linear Least Squares and its applications in distance matrix methods Presented by Shai Berkovich June, 2007 Seminar in Phylogeny, CS Based on the.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Rooted Trees. More definitions parent of d child of c sibling of d ancestor of d descendants of g leaf internal vertex subtree root.
Greedy Algorithms Like dynamic programming algorithms, greedy algorithms are usually designed to solve optimization problems Unlike dynamic programming.
Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.
. Robustness to Noise in Distance-Based Phylogenetic Reconstruction Methods Tutorial #13 © Ilan Gronau.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
. Phylogenetic Trees (2) Lecture 12 Based on: Durbin et al Section 7.3, 7.8, Gusfield: Algorithms on Strings, Trees, and Sequences Section 17.
Estimating Evolutionary Distances from DNA Sequences Lecture 14 ©Shlomo Moran, parts based on Ilan Gronau.
Phylogenetic Trees Lecture 2
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Algorithm Animation for Bioinformatics Algorithms.
CSC 2300 Data Structures & Algorithms February 6, 2007 Chapter 4. Trees.
. Phylogenetic Trees (2) Lecture 12 Based on: Durbin et al Section 7.3, 7.8, Gusfield: Algorithms on Strings, Trees, and Sequences Section 17.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
CMSC 341 Introduction to Trees. 8/3/2007 UMBC CMSC 341 TreeIntro 2 Tree ADT Tree definition  A tree is a set of nodes which may be empty  If not empty,
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
Plgw03, 17/12/07 1 On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities Ilan Gronau Shlomo Moran Technion – Israel Institute of Technology.
The Neighbor Joining Tree-Reconstruction Technique Lecture 13 ©Shlomo Moran & Ilan Gronau.
Discrete Structures Lecture 12: Trees Ji Yanyan United International College Thanks to Professor Michael Hvidsten.
5.5.2 M inimum spanning trees  Definition 24: A minimum spanning tree in a connected weighted graph is a spanning tree that has the smallest possible.
Ch.6 Phylogenetic Trees 2 Contents Phylogenetic Trees Character State Matrix Perfect Phylogeny Binary Character States Two Characters Distance Matrix.
5.5.3 Rooted tree and binary tree  Definition 25: A directed graph is a directed tree if the graph is a tree in the underlying undirected graph.  Definition.
Lectures on Greedy Algorithms and Dynamic Programming
Introduction to Graph Theory
3.2 Matchings and Factors: Algorithms and Applications This copyrighted material is taken from Introduction to Graph Theory, 2 nd Ed., by Doug West; and.
5.5.2 M inimum spanning trees  Definition 24: A minimum spanning tree in a connected weighted graph is a spanning tree that has the smallest possible.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Recursive Data Structures and Grammars  Themes  Recursive Description of Data Structures  Recursive Definitions of Properties of Data Structures  Recursive.
CMSC 341 Introduction to Trees. 2/21/20062 Tree ADT Tree definition –A tree is a set of nodes which may be empty –If not empty, then there is a distinguished.
. Perfect Phylogeny Tutorial #10 © Ilan Gronau Original slides by Shlomo Moran.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
1 CMSC 341 Introduction to Trees Textbook sections:
Section Recursion 2  Recursion – defining an object (or function, algorithm, etc.) in terms of itself.  Recursion can be used to define sequences.
Prüfer code algorithm Algorithm (Prüfer code)
Consistent and Efficient Reconstruction of Latent Tree Models
dij(T) - the length of a path between leaves i and j
Computing Connected Components on Parallel Computers
B+ Tree.
COMP 6/4030 ALGORITHMS Prim’s Theorem 10/26/2000.
Taibah University College of Computer Science & Engineering Course Title: Discrete Mathematics Code: CS 103 Chapter 10 Trees Slides are adopted from “Discrete.
Basic Graph Algorithms
Phylogeny.
Perfect Phylogeny Tutorial #10
Presentation transcript:

The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau

2 Recall: Distance-Based Reconstruction: Input: distances between all taxon-pairs Output: a tree (edge-weighted) best-describing the distances

3 Requirements from Distance-Based Tree-Reconstruction Algorithms 1.Consistency: If the input metric is a tree metric, the returned tree should be the (unique) tree which fits this metric. 2.Efficiency: poly-time, preferably no more than O(n 3 ), where n is the number of leaves (ie, the distance matrix is nXn). 3.Robustness: if the input matrix is “close” to tree metric, the algorithm should return the corresponding tree. Definition: Tree metric or additive distances are distances which can be realized by a weighted tree. A natural family of algorithms which satisfy 1 and 2 is called “Neighbor Joining”, presented next. Then we present one such algorithm which is known to be robust in practice.

4 The Neighbor Joining Tree-Reconstruction Scheme 1. Use D to select pair of neighboring leaves (cherries) i,j 2.Define a new vertex v as the parent of the cherries i,j 3.Compute a reduced (n-1) ✕( n-1) distance matrix D’, over S’=S \ {i,j}  {v}: Important: need to compute distances from v to other vertices in S’, s.t. D’ is a distance matrix of the reduced tree T’, obtained by prunning i,j from T. Start with an n ✕ n distance matrix D over a set S of n taxa (or vertices, or leaves) D’ D i v j

5 The Neighbor Joining Tree-Reconstruction Scheme 4.Apply the method recursively on the reduced matrix D’, to get the reduced tree T’. 5.In T’, add i,j as children of v (and possibly update edge lengths). Recursion base: when there are only two objects, return a tree with 2 leaves. v j i D’ v T’ Question: how can we find cherries?

6 Consistency of Neighbor Joining Theorem: Assume that the following holds for each input tree-metric D defined by some weighted tree T: 1.Correct Neighbor Selection: The vertices chosen at step 1 are cherries in T. 2.Correct Updating: The reduced matrix D’ is a distance matrix of some weighted tree T’, which is obtained by replacing in T the cherries i,j by their parent v (T’ is the reduced tree). Then the neighbor joining scheme is consistent: For each D which defines a tree metric it returns the corresponding tree T.

7 Least Common Ancestor Depth Let i,j be leaves in T, and let r  i,j be a vertex in T. LCA r (i,j) is the Least Common Ancestor of i and j when r is viewed as a root. If r is fixed we just write LCA(i,j). d T (r,LCA(i,j)) is the “depth of LCA r (i,j)”. i j r d T (r,LCA(i,j))

8 Let T be a weighted tree, with a root r. For leaves i,j ≠r, let L (i,j)=d T (r,LCA(i,j)) Then if : Cherries maximize the LCA Depth i j r j v Then i and j are cherries. This property can be used to select cherries pairs. The “Saitou&Nei” NJ algorithm uses a variant of this property.

9 Saitou & Nei’s Neighbor Joining Algorithm (1987)  ~13,000 citations ( Science Citation Index )  Implemented in numerous phylogenetic packages  Fastest implementation - θ(n 3 )  Usually referred to as “the NJ algorithm”  Identified by its neigbor selection criterion Saitou & Nei’s  neighbor-selection criterion

10 Consistency of Seitou&Nei method Theorem (Saitou&Nei) Assume all edge weights of T are positive. If Q(i,j)=max {i’,j’} Q(i’,j’), then i and j are cherries in the tree. Proof: in the following slides.

Intuition: NJ “tries” to selects taxon-pairs with average deepest LCA The addition of D(i,j) is needed to make the formula consistent. Next we prove the above equality. Saitou & Nei’s Selection criterion: Select i,j which maximize 1 st step in the proof: Express Saitou&Nei selection criterion in terms of LCA distances

12 Proof of equality in previous slide -2d(r,LCA r (i,j)) riri rjrj

13 2 nd step in proof: Consistency of Saitou&Nei Neighbor Selection For a vertex i, and an edge e: N i (e) = |{r  S : e is on path(i,r)}| Then: Note: If e’ is a “leaf edge”, then w(e’) is added exactly once to Q(i,j). i j r Rest of T e path(i,j)

14 Let (see the figure below): path(i,j) = (i,...,k,j). T 1 = the subtree rooted at k. WLOG that T 1 has at most n/2 leaves. T 2 = T \ T 1. i j k T1T1 T2T2 Assume for contradiction that Q’(i,j) is maximized for i,j which are not cherries. i’ j’ Let i’,j’ be any two cherries in T 1. We will show that Q’(i’,j’) > Q’(i,j). Consistency of Saitou&Nei (cont)

15 i j k T1T1 T2T2 Proof that Q’(i’,j’)>Q’(i,j): i’ j’ Each leaf edge e adds w(e) both to Q’(i,j) and to Q’(i’,j’), so we can ignore the contribution of leaf edges to both Q’(i,j) and Q’(i’,j’) Consistency of Saitou&Nei (cont)

16 i j k T1T1 T2T2 i’ j’ Location of internal edge e # w(e) added to Q’(i,j) # w(e) added to Q’(i’,j’) e  path(i,j) 1N i’ (e)≥2 e  path(i’,j) N i (e) < n/2N i’ (e) ≥ n/2 e  T\path(i,i’) N i (e) =N i’ (e) Since there is at least one internal edge e in path(i,j), Q’(i’,j’) > Q’(i,j). QED Contribution of internal edges to Q(i,j) and to Q(i’,j’) Consistency of Saitou&Nei (end)

17 Initialization: θ(n 2 ) to compute Q(i,j) for all i,j  L. Each Iteration: u O(n 2 ) to find the maximal Q(i,j), and to update the values of Q(x,y) Total: O(n 3 ) Complexity of Seitou&Nei NJ Algorithm