Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida.

Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope
Ruriko Yoshida

Origins of Species Figure Genomes 3 (© Garland Science 2007)

Tree (or web) of life bacteria archaea eukarya
History of life is written in the genes. Mount DW (2001) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York

Goals of molecular phylogenetics
Read the evolutionary history of organisms genes Stechmann A, Cavalier ST (2002) Rooting the eukaryote tree by using a derived gene fusion. Science 297: 89-91

Gene tree on a Species tree
Maddison WP (1997) Gene trees in species trees. Systematic Biology 46:

Phylogenetic tree reconstruction
The maximum likelihood estimation (MLE) methods: These methods describe evolution in terms of a discrete-state continuous-time Markov process. The Balanced Minimum Evolution (BME) method: This is a distance based method and Weighted Least Square method. The Neighbor-Joining (NJ) method is a greedy algorithm of the BME method (Steel and Gasquel, 2008). Bayesian inference for trees: Use Bayes Theorem and MCMC to estimate the posterior distribution rather than obtaining the point estimation.

Distance Matrix A distance matrix for a tree T is a matrix D whose entry Dij stands for the mutation distance between i and j.

Distance matrix 6 8 9 12 11 7 10 3 5 4

Balanced Minimum Evolution
BME is a weighted least squares distance based method which puts more emphasis on the shorter distances. Given a distance matrix D, the BME method can assign edge lengths to any binary tree topology T with n leaves. Goal of BME, given fixed D, is to find the binary tree T with the smallest sum of total branch lengths ∆D(T) (assigned by BME). min ∆D(T) for all (2n-5)!! tree topologies. 1 2 3 4 5 6 8 9 12 11 7 10 BME + =

Balanced Minimum Evolution
The BME is also a distance based method. A weighted LS method to find the closest tree metric such that the total branch lengths of the tree is the smallest. Based on Pauplin’s formula, ∆D(τ), which estimates the total length of a tree, based on: [Pauplin 2000 J Mol Evol 51] (1) its topology τ, (2) an estimated distance matrix D = (Dij). The BME method is to find the tree topologyτ such that min ∆D(τt) for τt, t=1,··· ,(2n−5)!!.

Pauplin’s formula Pauplin’s formula is defined as:
∆D(τ) = Σi<j Wij(τ)Dij, where Wij(τ) = (2)(1−# of branches between i and j) for a particular tree topology τ.

Example For the tree topology above, we have

Pauplin’s formula Note that Pauplin’s formula can be seen as a linear programming problem such that min D· x such that x∈PMEn where PMEn = conv{Wτ1 ,···, Wτ(2n-5)!!}. We call PMEn a BME polytope. Thus, the set of all D such that the topology τt is minimal is the normal cone at a vertex Wτt. We call this BME cone for a topology τt.

Combinatorics of the BME polytopes
For up to n = 7 taxa, we computed BME polytopes and studied their structure. n dimension of BME polytope F-vector 4 2 (3, 3) 5 (15, 105, 250, 210, 52) 6 9 (105, 5460, ?, ?, ?, 90262) 7 14 (945, , ?, ?, ?, ?, ?) For n = 5,6, the number of edges is n choose 2, so all pairs of bifurcating tree topologies τ1, τ2 on n ≤ 6 taxa can be cooptimal for BME, which we found surprising. But for n = 7, there is one combinatorial type of non-edge.

Combinatorial type of non-edge
with n = 7.

Edges and non-edges of the BME polytope
We still do not understand which pairs of trees will form edges on the BME polytope. If we did understand the edges, then we might be able to devise a competitive alternative to FastME that improves trees by walking along edges on the BME polytope, rather than performing tree moves. Edge-walking is called the simplex algorithm in linear programming, and it works very well in practice.

SPR move adjacency Theorem [Haws, Hodge, and Y, 2010] If a pair of bifurcating tree topologies τ1, τ2 on n taxa are adjacent by a subtree prune regraft (SPR) move then Wτ1 and Wτ2 are on an edge of the BME polytope PMEn. This means that a pair of bifurcating tree topologies τ1, τ2 on n taxa adjacent by a SPR move implies they are adjacent by an edge on the BME polytope.

Subtree Prune Regraft (SPR) moves
First, select a subtree of the given tree. Next, we detach the selected subtree. Then we attempt to regraft it onto another branch of the remaining tree, in such a way that a new tree is formed.

NNI move adjacency Corollary [Haws, Hodge, and Y, 2010] If a pair of bifurcating tree topologies τ1, τ2 on n taxa are adjacent by a Nearest Neighbor Interchange (NNI) move then Wτ1 and Wτ2 are on an edge of the BME polytope PMEn. Thus we know that the software FASTME attempts to find an optimal solution via edge moves over BME polytope.

Balanced minimum evolution cones
For each bifurcating tree topology τ, the BME cone of τ is the set of all choices of pairwise distances D = (Dij) for which τ minimizes the dot-product D · Wτ . The edges of the BME polytope emanating from the vertex Wτ determine the facets (flat sides) of the BME cone of τ. The facets of the BME polytope that contain Wτ determine the extreme rays of the BME cone of τ. (This is a perfect example of duality.) BME cones are convex. Thus the BME method is convex: If the BME method outputs tree topology τ for two inputs D, D’, then BME will also output τ on the input (D + D’)/2.

Comparing NJ and BME cones
Neighbor-joining (NJ) method: This is the most popular distance based method. It computes a tree from all pair-wise distances obtained which are easily obtained. (Saito and Nei (1987), Studier and Keppler (1988)) by picking cherries. Fact: The NJ algorithm is a greedy algorithm to find the BME tree (Gascuel and Steel (2006)). From this point of view, NJ is “optimal” whenever the NJ algorithm outputs the tree which minimizes the BME criterion. Question: Given a tree T with any particular order of picking its pairs of leaves, is there a dissimilarity map such that the NJ Algorithm returns the BME tree T?

Neighbor joining: Fast and consistent
Neighbor joining is a popular algorithm for phylogenetic reconstruction. Input is pairwise distances D = (Dij), presumed to arise as a perturbed tree metric. Output is a tree topology which induces a tree metric that is hopefully close to D. Intuition: Find two nodes which are ’close,’ and join them as siblings (a ‘cherry’) in the tree. Quantitatively neighbor joining joins nodes a, b which have minimal Q-value: Qab=Dab− (Σk Dak+ Σk Dbk) / (n−2)

Neighbor joining: Fast and consistent
Nodes a, b are then replaced by a single new node z which is the root of the cherry (a, b), and distances Dzk are defined as Dzk = Dak + Dbk − 2Dab. Then neighbor joining is applied recursively on the remaining nodes, until a tree is obtained. Using Q matrix instead of the original distance matrix compensates for short internal edges. In fact neighbor joining based on Q is consistent: Given a tree metric D = DT as input, NJ will correctly output tree T .

Neighbor joining cones
Notice that all elements in Q are linear in the distances, so picking a cherry (a, b) in the tree means that the distances satisfy linear inequalities. Also, after picking cherry (a, b) and replacing it with a new node z, the new distances Dzk are linear in the old distances: Dzk = Dak + Dbk − 2Dab. Thus NJ will output a particular tree topology T, and pick cherries in a particular order, iff the original distances Dij satisfy certain linear inequalities. The inequalitiesdefine a cone (apex 0) in Rn(n-1)/2, which we call a NJ cone. So NJ will output a particular tree topology T iff the pairwise distances D ∈ Rn(n-1)/2 lie in a union of NJ cones.

Issues with neighbor joining
Neighbor joining is fast and consistent, but it isn’t based on an evolutionary model. Until recently, it hasn’t been very clear what NJ is optimizing — if anything at all. (NJ is a greedy algorithm of the BME, Gascuel and Steel) Neighbor joining outputs a tree topology T iff the data lies in a union of cones. Unions of cones need not be convex. In fact neighbor joining is not convex: There are distance matrices D, D’, such that NJ produces the same tree T1 when run on input D or D’, but NJ produces a different tree T2 not equal to T1 when run on the input (D + D’)/2

NJ and BME cones Theorem [Haws, Hodge, Y (2010)]
Given a tree T with any number of taxa and any particular order σ of picking its pairs of leaves, the BME cone and every NJ cone associated to a tree $T$ and for any order σ have an intersection of positive measure. This result is particularly important in phylogenetics since this shows that even though the NJ Algorithm is a greedy algorithm, with any order to pick leaf pairs, the NJ Algorithm will return the BME tree for some dissimilarity map.

Faces of BME Polytope A clade of a binary tree T is the subtree given by an internal node and all its decendents. Blue and red boxes are clades, while green is not a clade.

Clade-Faces of the BME Polytope
Theorem [H., Hodge, Yoshida (2010)] Every disjoint collection of clades C1,C2,…,Ck gives a face of the BME polytope, F(C1,C2,…,Ck) = { T | C1,C2,…,Ck ∈ T } We can now describe a large class of faces of the BME polytope. Note: Clade-face is a smaller dimensional BME polytope.

Future work Want to show if two trees are adjacent with a Tree Bisection and Reconnection (TBR) move, then they can be cooptimal in terms of Pauplin’s formula for BME. Is there a combinatorial criterion (or at least sufficient conditions) for when two tree topologies form an edge on the BME polytope? Can this be used as a better way to move through tree space? (Bordewich et al, 2009) showed that the hill climbing algorithm with SPR moves, one can achieve the BME optimal solution with any input dissimilarity map. This shows that BME method is consistent with SPR moves (since TBR moves are generalization of SPR, this implies BME method is consistent with TBR moves). We would like to show if BME method is consistent with NNI moves.

Acknowledgments http://cophylogeny.net/ UK Phylogenomics group:
D. Haws, T. Hodge and R. Yoshida. "Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope” to appear in Bulletin of Mathematical Biology. Available at UK Phylogenomics group: NIGMS NSF

UK Phylogenomic Group Collaborators: - Jerzy W. Jaromczyk (UK)
- Chris Schardl (UK) David Haws (UK) David Weisrock (UK) Eric O’Niell (UK) Peter Huggins (UK) Arny Stromberg (UK) Pat Calie (EKU) Stephen Richter (EKU) Jinze Liu (UK) etc…… NIGMS NSF

“Are you done yet?” Thank you for your attention!

Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida.

Similar presentations

Presentation on theme: "Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida.

Similar presentations

Presentation on theme: "Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida."— Presentation transcript:

Similar presentations

About project

Feedback