Constructing a level-2 phylogenetic network from a dense set of input triplets. Leo van Iersel, Judith Keijsper, Steven Kelk, Leen Stougie

Presentation transcript:

Constructing a level-2 phylogenetic network from a dense set of input triplets. Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1,2). (1) Technische Universiteit Eindhoven (TU/e); (2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam. Web:

Triplet-based methods. Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y (note that zw|x = wz|x), find a tree that is consistent with every triplet: by contracting and deleting edges, the tree must yield each of the triplet subgraphs as a minor. [Figure: the four input triplets, the algorithm, and the solution tree on the leaves w, z, x, y.]
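The consistency requirement can be checked directly on a tree. The following is a minimal sketch, assuming a tree is written as nested tuples with strings as leaves and a triplet ab|c as the tuple (a, b, c); these encodings and the function names are illustrative choices, not part of the slides. A triplet ab|c is consistent with a tree exactly when c lies outside the smallest subtree containing both a and b.

```python
def leaf_set(tree):
    """All leaves below `tree` (a leaf is a string, an inner node a tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))

def smallest_subtree(tree, wanted):
    """Smallest subtree whose leaf set contains every leaf in `wanted`."""
    if not isinstance(tree, str):
        for child in tree:
            if wanted <= leaf_set(child):
                return smallest_subtree(child, wanted)
    return tree

def is_consistent(tree, triplet):
    """True if the rooted triplet ab|c, given as (a, b, c), is displayed by the tree."""
    a, b, c = triplet
    return c not in leaf_set(smallest_subtree(tree, {a, b}))

# The triplets from the slide and the tree ((z, w), (x, y)).
slide_triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
solution = (("z", "w"), ("x", "y"))
all_consistent = all(is_consistent(solution, t) for t in slide_triplets)
```

The check assumes all three leaves of the triplet occur in the tree.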


From trees to networks. The algorithm of Aho et al. (1981) can be used to construct trees from rooted triplets. But what if the algorithm fails, and why might it fail? Possible reason 1: the underlying evolution is tree-like, but the input triplets contain errors. Possible reason 2: the triplets are correct, but the underlying evolution is not tree-like. Biological phenomena such as hybridization, horizontal gene transfer, recombination and gene duplication can lead to evolutionary scenarios that are not tree-like! Response: try to construct phylogenetic networks rather than phylogenetic trees.
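For reference, the tree-building step can be sketched as follows. This is a simplified rendering of the idea behind the algorithm of Aho et al., not a transcription of it: leaves a and b are grouped together whenever a triplet ab|c lies inside the current leaf set, each group becomes a subtree, and the construction fails when only one group remains. The tuple encoding (a, b, c) for ab|c and the name `build` are assumptions made for this sketch.

```python
def build(leaves, triplets):
    """Return a tree (nested tuples) consistent with all triplets, or None if none exists."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find: a and b must stay below the same child of the root for every ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), set()).add(x)
    if len(groups) < 2:
        return None  # no split of the leaves is possible: the algorithm fails

    children = []
    for group in groups.values():
        subtree = build(group, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)

tree = build("wxyz", [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")])
no_tree = build("xyz", [("x", "y", "z"), ("x", "z", "y")])
```

On the four triplets from the earlier slide this yields the tree with subtrees {z, w} and {x, y}; on the input {xy|z, xz|y} it returns None, which is the failure case that motivates networks.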

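From trees to networks (2). For example, suppose the input is {xy|z, xz|y}. No tree is consistent with both triplets, but a network can be. [Figure: the triplets xy|z and xz|y, and a network on the leaves z, x, y that is consistent with both.] (Note that there are cases where a tree is not possible even if there is at most one triplet per 3 species.)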


Level-k phylogenetic networks. A level-k phylogenetic network is a rooted, directed acyclic graph in which every biconnected component (of the underlying undirected graph) contains at most k recombination vertices. [Figure: an example network on the leaves z, x, y, labelling the root (only one!), a leaf vertex, a split vertex and a recombination vertex.]
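The definition can be turned into a small level computation. The sketch below assumes a network is given as a list of directed edges (parent, child) without parallel edges, takes a recombination vertex to be a vertex with indegree at least 2, and finds biconnected components with the standard depth-first search that keeps a stack of edges. The function names and the example networks are illustrative, not taken from the slides.

```python
from collections import defaultdict

def biconnected_components(edges):
    """Vertex sets of the biconnected components of the underlying undirected graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    index, low, stack, components = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in index:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:  # u separates the subtree of v: close a component
                    component = set()
                    while True:
                        edge = stack.pop()
                        component.update(edge)
                        if edge == (u, v):
                            break
                    components.append(component)
            elif index[v] < index[u]:  # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], index[v])

    for u in list(adj):
        if u not in index:
            dfs(u, None)
    return components

def network_level(edges):
    """Largest number of recombination vertices in any biconnected component."""
    indegree = defaultdict(int)
    for _, child in edges:
        indegree[child] += 1
    recombination = {v for v, d in indegree.items() if d >= 2}
    return max((len(c & recombination) for c in biconnected_components(edges)), default=0)

# A network for {xy|z, xz|y}: x hangs below the recombination vertex h.
level_1 = [("r", "a"), ("r", "b"), ("a", "y"), ("a", "h"), ("b", "h"), ("b", "z"), ("h", "x")]
# Two recombination vertices h1, h2 inside one biconnected component.
level_2 = [("r", "a"), ("r", "b"), ("a", "c"), ("a", "h1"), ("b", "h1"), ("b", "h2"),
           ("c", "h2"), ("c", "x"), ("h1", "y"), ("h2", "z")]
tree_edges = [("r", "a"), ("r", "z"), ("a", "x"), ("a", "y")]
```

The recursive search is adequate for small examples; a large network would need an iterative version.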

Level-1 networks. A set of input triplets is dense iff, for every subset of 3 species, there is at least one triplet corresponding to those 3 species. A dense set of input triplets for n species therefore contains between C(n,3) and 3·C(n,3) triplets, that is, on the order of n^3. Jansson & Sung (2006) showed: given a dense set of triplets T for a set L of species, it is possible to determine in polynomial time whether a level-1 phylogenetic network N exists such that all the triplets in T are consistent with N (and if so, to construct such a network). They later showed, together with Nguyen, how to do this in time linear in |T|. They also showed that, in the non-dense case, the problem is NP-hard. But what about level-2 networks, and higher?
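Density is straightforward to test. A minimal sketch, again assuming that a triplet ab|c is encoded as the tuple (a, b, c); the function name is illustrative:

```python
from itertools import combinations

def is_dense(leaves, triplets):
    """True if every 3-subset of `leaves` is covered by at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(subset) in covered for subset in combinations(leaves, 3))

slide_triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
dense = is_dense("wxyz", slide_triplets)
not_dense = is_dense("wxyz", slide_triplets[:3])
```

The four triplets from the earlier slide cover all four 3-subsets of {w, x, y, z}, so that set is dense; dropping any one of them breaks density.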

Here is an example of a level-2 network. Main result: given a dense set of triplets T for a set L of species, it is possible to determine in time O(|T|^3) whether a level-2 phylogenetic network N exists such that all the triplets in T are consistent with N (and if so, to construct such a network).

Algorithm, basic idea. The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. For level-1 and level-2 networks, if there again exists such a clear dichotomy, we iterate on the two subsets. [Figure: a root with two sub-networks hanging below it.]


Algorithm, basic idea. The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. For level-1 networks, if there again exists such a clear dichotomy, we iterate on the two subsets. Otherwise there must exist a network consisting of a backbone with sub-networks hanging from it. [Figure: a backbone network, drawn in blue, with sub-networks attached.] In that case: (1) find the partition of the species (leaves) into the sub-networks; (2) find the blue backbone network; (3) treat each of the partition elements (sub-networks) as a leaf to be hung on the backbone; (4) recurse on the sub-networks.

Algorithm, high-level idea. For level-2 networks the idea is similar: (1) find the partition of the species (leaves) into the sub-networks (there is a complication in level-2); (2) find the blue backbone network (there are more level-2 backbone forms); (3) treat each of the partition elements (sub-networks) as a (meta-)leaf to be hung on the backbone; (4) recurse on the sub-networks.

Definition: inducing new triplet sets from partitions of the leaf set. Suppose we have a partition P = {P_1, P_2, …, P_t} of the leaf set L and a dense set of triplets T on L. Let T’ be a new triplet set on the leaf set {q_1, q_2, …, q_t}, defined as follows: q_i q_j|q_k is in T’ if and only if i, j and k are pairwise distinct and there exists a triplet xy|z in T such that x is in P_i, y is in P_j and z is in P_k. Then we say that T’ is the triplet set induced by the partition P of L. Critically: if T is dense, then T’ is also dense. In some sense this can be perceived as a ‘coarsening’ of the input set.
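The induced triplet set can be computed in one pass over T. In this sketch the meta-leaf q_i is represented simply by the index i of its block in the partition, and a triplet ab|c by the tuple (a, b, c); both conventions, and the function name, are assumptions made for illustration.

```python
def induced_triplets(partition, triplets):
    """Triplet set T' on block indices induced by a partition of the leaf set."""
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for x, y, z in triplets:
        i, j, k = block_of[x], block_of[y], block_of[z]
        if len({i, j, k}) == 3:  # keep only triplets whose leaves lie in three different blocks
            induced.add((min(i, j), max(i, j), k))  # q_i q_j | q_k equals q_j q_i | q_k
    return induced

partition = [{"a", "b"}, {"c"}, {"d"}]
triplets = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "c", "d")]
coarse = induced_triplets(partition, triplets)
```

Here ab|c and ab|d disappear because a and b share a block, while ac|d and bc|d both collapse to the single induced triplet q_0 q_1|q_2.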

Definition: simple level-2 networks. Lemma: there are exactly 4 different backbone networks. [Figure: the four backbone structures.] A simple level-2 network is any network obtained by “hanging leaves” off one of these structures.

A picture description of the simple level-2 algorithm. Here the leaves {a,b,c,d,e,f,g,h} have been ‘hung’ from structure 8a to yield a simple level-2 network.

Level-2 network algorithm. Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf. This leaves 1 recombination vertex, with a caterpillar below it.

Suppose we can correctly ‘guess’ that leaf g hangs directly below a recombination node. If we remove g, and all triplets that contain g, then we know that a level-1 network must be possible on this new set of triplets (because now fewer recombination nodes are needed).
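The removal step is a plain restriction of the triplet set. A minimal sketch, assuming triplets ab|c are encoded as tuples (a, b, c); the function name and the small example set are hypothetical and only illustrate the operation:

```python
def remove_leaves(triplets, removed):
    """Drop every triplet that mentions at least one of the removed leaves."""
    removed = set(removed)
    return {t for t in triplets if removed.isdisjoint(t)}

# A hypothetical triplet set on the leaves x, y, z, g.
triplets = {("x", "y", "z"), ("x", "g", "z"), ("g", "y", "z"), ("x", "y", "g")}
without_g = remove_leaves(triplets, {"g"})
```

The same function covers the later step in which a whole caterpillar set is removed, since `removed` may contain several leaves.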


Caterpillar set. A caterpillar set with respect to a dense triplet set T is the set of leaves of a caterpillar subgraph of a network consistent with T. The empty set is also a caterpillar set. [Figure: a caterpillar.]

Suppose we subsequently guess that the caterpillar with h now hangs below a recombination node in the new network. If we remove the h-caterpillar, and all triplets that contain leaves of it, then we know that a level-0 network (that is, a tree) must be possible on this new set of triplets (because now even fewer recombination nodes are needed).


In such a case the resulting tree is UNIQUE (J&S).

So now we have a tree. We are going to guess how to add the h-caterpillar back in, and then guess how to add leaf g back in.

Adding the h-caterpillar back in.

And finally adding leaf g back in.

Level-2 network algorithm. Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf, which leaves 1 recombination vertex with a caterpillar below it; (3) guess the right “caterpillar set”; (4) remove it and remove the triplets that contain any element of this set; (5) construct the unique tree for the remaining triplets [Jansson & Sung 2006]; (6) insert the caterpillar set and the recombination leaf in the tree in the correct way. For each pair of guesses, try all 4 backbone structures.

Simple level-2 algorithm. Theorem: the simple level-2 network algorithm runs in O(|T|^3) time.

SN-sets to partition the set of leaves. Jansson & Sung introduced SN-sets to partition the set of leaves. SN-sets are special subsets of the leaf set L, and are defined with respect to T. Every set containing just a single leaf is an SN-set. Any other SN-set is obtained by taking the closure of some subset S of L under the following operation: if x, y ∈ S and xz|y ∈ T or yz|x ∈ T, then z ∈ S. The SN-set that is equal to the total leaf set L is called the trivial SN-set. An SN-set that is non-trivial, and is not a strict subset of any other non-trivial SN-set, is called a maximal SN-set. (If the network is a tree, there are 2 maximal SN-sets: the leaves of the left subtree of the root and the leaves of the right subtree.)
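The closure operation can be written as a fixed-point loop. This sketch assumes the tuple encoding (a, b, c) for a triplet ab|c, so the rule reads: if c and exactly one of a, b are already in S, the other one is added. The function name is illustrative.

```python
def sn_set(seed, triplets):
    """Closure of `seed` under: x, y in S and (xz|y or yz|x in T) implies z in S."""
    closed = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            # ab|c with c in S and exactly one of a, b in S pulls in the other one.
            if c in closed and (a in closed) != (b in closed):
                closed.update((a, b))
                changed = True
    return closed

slide_triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
sn_zw = sn_set({"z", "w"}, slide_triplets)
sn_zx = sn_set({"z", "x"}, slide_triplets)
```

For the triplets of the tree ((z, w), (x, y)), the set {z, w} is already closed, whereas starting from {z, x} the closure grows to the whole leaf set, the trivial SN-set.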

Definition: maximal SN-set. Jansson and Sung proved that the maximal SN-sets indeed partition the leaf set L: no two maximal SN-sets overlap, and together they completely cover the set of input leaves. All SN-sets, and all maximal SN-sets, can be found in polynomial time. Jansson & Sung solved the level-1 problem by observing that each maximal SN-set hangs as a ‘meta-leaf’ on the level-1 backbone network: each maximal SN-set can be completely separated from the rest of the network by removing just one edge. In level-2 networks, however, there are maximal SN-sets that can hang under more than one edge!
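One simple polynomial-time way to list the maximal SN-sets is sketched below. It relies on an assumption that should be checked against Jansson & Sung's paper: for a dense triplet set, every SN-set is the closure of a single leaf or of a pair of leaves, so it suffices to close all singletons and pairs. The closure helper repeats the operation from the definition; all names are illustrative and triplets ab|c are tuples (a, b, c).

```python
from itertools import combinations

def sn_set(seed, triplets):
    """Closure of `seed` under the SN operation."""
    closed, changed = set(seed), True
    while changed:
        changed = False
        for a, b, c in triplets:
            if c in closed and (a in closed) != (b in closed):
                closed.update((a, b))
                changed = True
    return closed

def maximal_sn_sets(leaves, triplets):
    """Non-trivial SN-sets that are not strictly contained in another non-trivial SN-set."""
    leaves = frozenset(leaves)
    seeds = [(x,) for x in leaves] + list(combinations(leaves, 2))
    # Assumption (dense case): closures of singletons and pairs give all SN-sets.
    candidates = {frozenset(sn_set(seed, triplets)) for seed in seeds}
    candidates.discard(leaves)  # the trivial SN-set is excluded by definition
    return {s for s in candidates if not any(s < other for other in candidates)}

slide_triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
maximal = maximal_sn_sets("wxyz", slide_triplets)
```

For the tree example this returns the two sets {z, w} and {x, y}, matching the remark that a tree has two maximal SN-sets, one per subtree of the root. The sketch favours clarity over speed and makes no attempt at the faster methods of Jansson, Sung and Nguyen.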

Definition: highest cut-edge. In a phylogenetic network N, a cut-edge (x,y) is an edge whose removal disconnects the underlying undirected graph. A cut-edge (x,y) is said to be a trivial cut-edge iff y is a leaf. A cut-edge (x,y) is said to be highest iff there is no cut-edge (p,q) such that there is a directed path from q to x in N.
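These definitions can be checked by brute force on a small network. The sketch assumes the network is a list of directed edges (parent, child) without parallel edges, tests each edge by deleting it and checking connectivity of the undirected graph, and treats a directed path of length zero (q = x) as a path. Function names and the example network are illustrative.

```python
def cut_edges(edges):
    """Directed edges whose removal disconnects the underlying undirected graph."""
    nodes = {v for edge in edges for v in edge}

    def connected(remaining):
        adj = {v: set() for v in nodes}
        for u, v in remaining:
            adj[u].add(v)
            adj[v].add(u)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        return seen == nodes

    return [e for e in edges if not connected([f for f in edges if f != e])]

def descendants(edges, source):
    """Vertices reachable from `source` by a directed path, including `source` itself."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for p, q in edges:
            if p == u and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) such that q reaches x."""
    cuts = cut_edges(edges)
    return [(x, y) for x, y in cuts
            if not any(x in descendants(edges, q) for _, q in cuts)]

# Recombination vertex h with the subtree {x, y} hanging below it; u and z hang off the cycle.
network = [("r", "a"), ("r", "b"), ("a", "u"), ("a", "h"), ("b", "z"), ("b", "h"),
           ("h", "c"), ("c", "x"), ("c", "y")]
highest = set(highest_cut_edges(network))
```

In the example, (c, x) and (c, y) are cut-edges but not highest, because the cut-edge (h, c) lies above them.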

Fact. Let (x,y) be a highest cut-edge and let L’ be the set of leaves reachable from y. Let L* be a strict subset of L’. Then L* is not a maximal SN-set. Proof: the set of leaves reachable from a highest cut-edge (x,y) is itself an SN-set. Clearly, for any two leaves p, q in L’ and any leaf r outside L’, there cannot be triplets pr|q or qr|p: the edge (x,y) forms a bottleneck. Thus, because T is dense, pq|r must exist. [Figure: the cut-edge (x,y) with L’ below y, and the excluded triplets pr|q and qr|p.] So: each maximal SN-set can be expressed as the union of the leaves reachable from one or more highest cut-edges.

Central Theorem (simplified). Suppose there is a dense triplet set T consistent with some simple level-2 network N. Then there exists a level-2 network N’ (not necessarily simple) such that, with the exception of perhaps one maximal SN-set with respect to T, every maximal SN-set appears below a single cut-edge in N’. The remaining, ‘odd-one-out’ maximal SN-set (if it exists) will be equal to the union of the leaves below two cut-edges. In other words: there exists at most one maximal SN-set which is the union of the leaves below two highest cut-edges, whereas every other maximal SN-set consists of the leaves below one highest cut-edge.

The algorithm. (1) Determine the maximal SN-sets. (2) Guess the right SN-set to be split. (3) Treat the maximal SN-sets and the two split sets as leaves {S_1, S_2, …, S_q}. (4) Adapt T to a new triplet set T’: S_i S_k|S_h ∈ T’ if and only if there exist x ∈ S_i, y ∈ S_k, z ∈ S_h such that xy|z ∈ T. (5) Construct a simple level-2 network for T’. (6) Recursively find the sub-networks for the sets S_1, S_2, …, S_q.

Conclusions & open problems. So we know how to efficiently construct level-2 networks from dense triplet sets. What’s next? Applicability: how useful is it? Initial implementation: programming and fine-tuning. Improving running time: in the spirit of the “SN-tree” of J&S&N. Complexity: what about level-3 and higher? Bounds: worst-case, best-case scenarios. Building all networks. Properties of output networks as a function of the input. Different triplet restrictions. Confidence: how good are the solutions? Exponential-time exact algorithms for NP-hard problems.