Constructing a level-2 phylogenetic network from a dense set of input triplets. Leo van Iersel, Judith Keijsper, Steven Kelk, Leen Stougie


1 Constructing a level-2 phylogenetic network from a dense set of input triplets. Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1,2). (1) Technische Universiteit Eindhoven (TU/e); (2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam. Email: S.M.Kelk@cwi.nl Web: http://homepages.cwi.nl/~kelk

2 Triplet-based methods (1) Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y (note that zw|x = wz|x), find a tree from which each of the triplet subgraphs can be obtained as a minor by contracting and deleting edges. [Figure: the four input triplets are passed to the algorithm, which returns a solution tree on the leaves w, z, x, y.]
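As a minimal sketch of what it means for a tree to be consistent with a triplet, the check below assumes that a tree is written as nested Python tuples with string leaves and that a triplet ab|c is written as the tuple (a, b, c). Both encodings and the function names are illustrative choices, not something taken from the slides. A triplet ab|c is accepted when a and b sit together in a subtree that excludes c.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    leaves = set()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def displays(tree, triplet):
    """True if the rooted tree is consistent with the triplet ab|c."""
    a, b, c = triplet
    if isinstance(tree, str) or not {a, b, c} <= leaf_set(tree):
        return False
    for child in tree:
        below = leaf_set(child)
        if a in below and b in below:
            # a and b share this child subtree: the triplet holds as soon
            # as c lies outside it, otherwise look further down.
            return c not in below or displays(child, triplet)
    # a and b are separated at this vertex, so ab|c is not displayed.
    return False


tree = (("w", "z"), ("x", "y"))
triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(all(displays(tree, t) for t in triplets))
```

With the four triplets of the slide, the tree ((w,z),(x,y)) passes the check for every triplet, while a triplet such as xz|y is rejected.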


5 From trees to networks… The algorithm of Aho et al. (1981) can be used to construct trees from rooted triplets. But…what if the algorithm fails? Why might the algorithm fail? Possible reason 1: The underlying evolution is tree-like, but the input triplets contain errors. Possible reason 2: The triplets are correct, but the underlying evolution is not tree-like. Biological phenomena such as hybridization, horizontal gene transfer, recombination and gene duplication can lead to evolutionary scenarios that are not tree-like! Response: try and construct not phylogenetic trees, but phylogenetic networks
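The following is a hedged sketch in the spirit of the algorithm of Aho et al.: it uses the usual auxiliary-graph formulation, which the slides do not spell out. It assumes a triplet ab|c is encoded as the tuple (a, b, c) and returns a tree as nested frozensets, or None when no tree exists. The function name `build` and the encodings are assumptions made for illustration.

```python
def build(leaves, triplets):
    """Return a rooted tree consistent with all triplets, or None."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the leaves: a triplet ab|c forces a and b together.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means the leaves cannot be split below the root.
    if len(components) == 1:
        return None

    subtrees = []
    for component in components.values():
        subtree = build(component, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return frozenset(subtrees)


print(build("wxyz", [("z", "w", "x"), ("y", "x", "w"),
                     ("x", "y", "z"), ("w", "z", "y")]))
print(build("xyz", [("x", "y", "z"), ("x", "z", "y")]))  # no tree exists
```

The second call corresponds to the failure case discussed on the slide: the triplets xy|z and xz|y glue all three leaves into one component, so no root split is possible and the function returns None.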

6 From trees to networks (2) For example, suppose the input is {xy|z, xz|y}. [Figure: the two triplets xy|z and xz|y, and a network on the leaves z, x, y.] (Note that there are cases when, even if there is at most one triplet per 3 species, a tree is not possible.)


9 Level-k phylogenetic networks A level-k phylogenetic network is a rooted, directed acyclic graph where every biconnected component (in the underlying undirected graph) contains at most k recombination vertices. [Figure: an example network on the leaves z, x, y, with labels: root (only one!), leaf vertex, split vertex, recombination vertex.]

10 Level-1 networks A set of input triplets is dense iff, for every subset of 3 species, there is at least one triplet corresponding to those 3 species. Therefore, a dense set of input triplets for n species contains O(n^3) triplets. Jansson & Sung (2006) showed: given a dense set of triplets T for a set L of species, it is possible to determine in polynomial time whether a level-1 phylogenetic network N exists such that all the triplets in T are consistent with N (and if so, to construct such a network). They later showed, together with Nguyen, how to do this in time linear in |T|. They also showed that, in the non-dense case, the problem is NP-hard. But what about level-2 networks, and higher?
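The density condition above can be checked directly. This is a small sketch that assumes a triplet ab|c is encoded as the tuple (a, b, c); the function name `is_dense` is an illustrative choice.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True if every 3-subset of leaves is covered by at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered for c in combinations(leaves, 3))


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(is_dense("wxyz", T))
print(is_dense("wxyz", T[:3]))
```

For four leaves there are four subsets of size 3, so the four triplets of the earlier example form a dense set, and dropping any one of them breaks density.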

11 Here is an example of a level-2 network. Main result: given a dense set of triplets T for a set L of species, it is possible to determine in time O(|T|^3) whether a level-2 phylogenetic network N exists such that all the triplets in T are consistent with N (and if so, to construct such a network).

12 Algorithm, basic idea The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. For level-1 and level-2 networks, if there again exists such a clear dichotomy, we recurse on the two subsets. [Figure: a root with two sub-networks hanging below it.]


14 Algorithm, basic idea The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. For level-1 networks, if there again exists such a clear dichotomy, we recurse on the two subsets. Otherwise there must exist a network of the form shown in the figure. [Figure: a backbone network, drawn in blue, with sub-networks hanging from it.] In that case: (1) find the partition of the species (leaves) into the sub-networks; (2) find the blue backbone network; (3) treat each of the partition elements (sub-networks) as a leaf to be hung on the backbone; (4) recurse on the sub-networks.

15 Algorithm, high-level idea For level-2 networks the idea is similar: (1) find the partition of the species (leaves) into the sub-networks (there is a complication in level-2); (2) find the blue backbone network (there are more level-2 backbone forms); (3) treat each of the partition elements (sub-networks) as a (meta-)leaf to be hung on the backbone; (4) recurse on the sub-networks.

16 Definition: inducing new triplet sets from partitions of the leaf set Suppose I have a partition P = {P_1, P_2, …, P_t} of the leaf set L, and a dense set of triplets T on the leaf set L. Let T’ be a new triplet set on the leaf set {q_1, q_2, …, q_t}, defined as follows: q_i q_j | q_k is in T’ if and only if i, j and k are pairwise distinct and there exists a triplet xy|z in T such that x is in P_i, y is in P_j and z is in P_k. Then we say that T’ is the triplet set induced by the partition P of L. Critically: if T is dense, then T’ is also dense. In some sense this can be perceived as a ‘coarsening’ of the input set.
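The induced triplet set can be computed in one pass over T. The sketch below assumes a triplet ab|c is encoded as the tuple (a, b, c) and labels the new leaf q_i simply by the index i of its block in the partition list. The function name and the encoding are illustrative assumptions.

```python
def induced_triplets(partition, triplets):
    """Return the triplet set induced on block indices by a leaf partition."""
    index = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for a, b, c in triplets:
        i, j, k = index[a], index[b], index[c]
        # Keep only triplets whose three leaves fall in three different blocks.
        if len({i, j, k}) == 3:
            # ij|k and ji|k are the same triplet, so store the pair sorted.
            induced.add((min(i, j), max(i, j), k))
    return induced


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
P = [{"w", "z"}, {"x"}, {"y"}]
print(induced_triplets(P, T))
```

In this example only yx|w and xy|z have their leaves in three different blocks, and both map to the same induced triplet q_1 q_2 | q_0.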

17 Definition: simple level-2 networks Lemma: there are exactly 4 different backbone networks. [Figure: the four backbone structures.] A simple level-2 network is any network obtained by “hanging leaves” off one of these structures.

18 A picture description of the simple level-2 algorithm Here the leaves {a,b,c,d,e,f,g,h} have been ‘hung’ from structure 8a, to yield a simple level-2 network.

19 Level-2 network algorithm Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf. One recombination vertex is then left, with a caterpillar below it.

20 Suppose we can correctly ‘guess’ that leaf g hangs directly below a recombination node. If we remove g, and all triplets that contain g, then we know that a level-1 network must be possible on this new set of triplets (because now fewer recombination nodes are needed).
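Removing a guessed leaf together with every triplet that mentions it is a simple filter. The sketch below assumes a triplet ab|c is encoded as the tuple (a, b, c); the function name `remove_leaves` is hypothetical. It accepts a set of leaves so that the same helper also covers removing several leaves at once.

```python
def remove_leaves(triplets, removed):
    """Drop every triplet that contains at least one of the removed leaves."""
    removed = set(removed)
    return [t for t in triplets if removed.isdisjoint(t)]


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(remove_leaves(T, {"w"}))
```

Since the correct leaf is not known in advance, an implementation would presumably try each leaf in turn as the guess and keep the first one for which the rest of the construction succeeds.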


22 Level-2 network algorithm Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf, which leaves one recombination vertex with a caterpillar below it; (3) guess the right “caterpillar set”.

23 Caterpillar set A caterpillar set with respect to a dense triplet set T is the set of leaves of a caterpillar subgraph of a network consistent with T. The empty set is also a caterpillar set. [Figure: a caterpillar.]

24 Suppose we subsequently guess that the caterpillar with h now hangs below a recombination node in the new network. If we remove the h-caterpillar, and all triplets that contain leaves of it, then we know that a level-0 network must be possible on this new set of triplets (because now even fewer recombination nodes are needed).

25 Level-2 network algorithm Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf, which leaves one recombination vertex with a caterpillar below it; (3) guess the right “caterpillar set”; (4) remove it and remove the triplets that contain any element of this set; (5) construct the unique tree for the remaining triplets [Jansson & Sung 2006].

26 In such a case the resulting tree is UNIQUE (J&S).

27 So now we have a tree. We are going to guess how to add the h-caterpillar back in, and then guess how to add leaf g back in.

28 Adding the h-caterpillar back in.

29 And finally adding leaf g back in.

30 Level-2 network algorithm Assume some oracle gives us the partition of the leaves into sub-networks. Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: (1) guess the right “recombination leaf”; (2) remove it and remove the triplets that contain this leaf, which leaves one recombination vertex with a caterpillar below it; (3) guess the right “caterpillar set”; (4) remove it and remove the triplets that contain any element of this set; (5) construct the unique tree for the remaining triplets [Jansson & Sung 2006]; (6) insert the caterpillar set and the recombination leaf in the tree in the correct way. For each pair of guesses, try all 4 backbone structures.

31 Simple level-2 algorithm Theorem: the simple level-2 network algorithm runs in O(|T|^3) time.

32 SN-sets to partition the set of leaves Jansson & Sung introduced the SN-set to partition the set of leaves. SN-sets are special subsets of the leaves L, and are defined w.r.t. T. All sets containing just a single leaf are SN-sets. Any other SN-set is a subset of leaves obtained by taking the closure of some subset S of the leaves L w.r.t. the following operation: if x,y ∈ S and xz|y ∈ T or yz|x ∈ T, then z ∈ S. The SN-set that is equal to the total leaf set L is called the trivial SN-set. An SN-set that is non-trivial, and is not a strict subset of any other non-trivial SN-set, is called a maximal SN-set. (If the network is a tree there are 2 maximal SN-sets: one is the set of leaves of the subtree to the right of the root, the other the set of leaves of the subtree to the left.)
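The closure operation above can be sketched as a fixed-point loop. The code assumes a triplet ab|c is encoded as the tuple (a, b, c), so the rule reads: if c and one of a, b are already in S, the other one joins S. The function name `sn_closure` is an illustrative choice, and the loop is written for clarity rather than speed.

```python
def sn_closure(seed, triplets):
    """Close a set of leaves under: x,y in S and xz|y in T implies z in S."""
    closed = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:  # the triplet ab|c
            if c not in closed:
                continue
            if a in closed and b not in closed:
                closed.add(b)
                changed = True
            elif b in closed and a not in closed:
                closed.add(a)
                changed = True
    return frozenset(closed)


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(sorted(sn_closure({"w", "z"}, T)))
print(sorted(sn_closure({"w", "x"}, T)))
```

On the four-triplet example, the pair {w, z} is already closed, whereas starting from {w, x} the closure grows to the whole leaf set, i.e. the trivial SN-set.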

33 Definition: maximal SN-set Jansson and Sung proved that the maximal SN-sets indeed partition the leaf set L. So no two maximal SN-sets overlap, and together they completely cover the set of input leaves. All SN-sets and all maximal SN-sets can be found in polynomial time. Jansson & Sung solved the level-1 problem by observing that each maximal SN-set hangs as a ‘meta-leaf’ on the level-1 backbone network; each maximal SN-set can be completely separated from the rest of the network by removing just one edge. In level-2 networks, however, there are maximal SN-sets that can hang under more than one edge!
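A hedged sketch of how the maximal SN-sets might be enumerated follows. It only considers closures of single leaves and of pairs of leaves; that this is sufficient for a dense triplet set is an assumption of this sketch, not something stated on the slides. Triplets ab|c are encoded as tuples (a, b, c), and all function names are illustrative.

```python
from itertools import combinations


def sn_closure(seed, triplets):
    """Close a set of leaves under: x,y in S and xz|y in T implies z in S."""
    closed = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            if c in closed and (a in closed) != (b in closed):
                closed.update((a, b))
                changed = True
    return frozenset(closed)


def maximal_sn_sets(leaves, triplets):
    """Return the non-trivial SN-sets that no other non-trivial SN-set contains.

    Assumption: candidates are limited to singletons and closures of pairs.
    """
    leaves = set(leaves)
    candidates = {frozenset({x}) for x in leaves}
    for x, y in combinations(sorted(leaves), 2):
        candidates.add(sn_closure({x, y}, triplets))
    nontrivial = {s for s in candidates if s != leaves}
    return {s for s in nontrivial if not any(s < other for other in nontrivial)}


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(maximal_sn_sets("wxyz", T))
```

For the tree-like four-leaf example this yields the two sets {w, z} and {x, y}, matching the remark that a tree has two maximal SN-sets, one per subtree of the root.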

34 Definition: highest cut-edge In a phylogenetic network N, a cut-edge (x,y) is an edge whose removal disconnects the underlying undirected graph. A cut-edge (x,y) is said to be a trivial cut-edge iff y is a leaf. A cut-edge (x,y) is said to be highest iff there is no cut-edge (p,q) such that there is a directed path from q to x in N.
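The two definitions can be illustrated with a deliberately naive sketch: each edge is removed in turn and connectivity is re-tested, which is quadratic but easy to verify. The network is assumed to be given as a list of directed (parent, child) edges; this encoding, the function names, and the example network are all hypothetical.

```python
def cut_edges(edges):
    """Edges whose removal disconnects the underlying undirected graph."""
    nodes = {n for edge in edges for n in edge}

    def connected_without(skip):
        adjacent = {n: [] for n in nodes}
        for i, (u, v) in enumerate(edges):
            if i != skip:
                adjacent[u].append(v)
                adjacent[v].append(u)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for nxt in adjacent[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen) == len(nodes)

    return {edges[i] for i in range(len(edges)) if not connected_without(i)}


def reachable(edges, source):
    """Vertices reachable from source along directed edges (source included)."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        for nxt in children.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) having a directed path q -> x."""
    cuts = cut_edges(edges)
    return {(x, y) for (x, y) in cuts
            if not any(x in reachable(edges, q) for (_, q) in cuts)}


# One recombination vertex c (parents a and b) and a small subtree below d.
network = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c"), ("c", "g"),
           ("a", "x"), ("b", "d"), ("d", "y"), ("d", "z")]
print(sorted(highest_cut_edges(network)))
```

In this example the edges (d,y) and (d,z) are cut-edges but not highest, because the cut-edge (b,d) lies above them.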

35 Fact. Let (x,y) be a highest cut-edge and let L’ be the set of leaves reachable from y. Let L* be a strict subset of L’. Then L* is not a maximal SN-set. Proof: the set of leaves reachable from a highest cut-edge (x,y) is itself an SN-set. Clearly, for any two leaves p,q in L’ and any leaf r outside L’, there cannot be triplets pr|q and qr|p, because the edge (x,y) forms a bottleneck. Thus, since T is dense, pq|r must exist. [Figure: the cut-edge (x,y) with the leaf set L’ below y.] So: each maximal SN-set can be expressed as the union of the leaves reachable from one or more highest cut-edges.

36 Central Theorem (simplified). Suppose there is a dense triplet set T consistent with some simple level-2 network N. Then there exists a level-2 network N’ (not necessarily simple) such that, with the exception of perhaps one maximal SN-set with respect to T, every maximal SN-set appears below a single cut-edge in N’. The remaining, ‘odd-one-out’ maximal SN-set (if it exists) will be equal to the union of the leaves below two cut-edges. In other words: there exists at most one maximal SN-set which is the union of the leaves below two highest cut-edges, whereas all other maximal SN-sets consist of the leaves below one highest cut-edge.

37 The algorithm (1) Determine the maximal SN-sets. (2) Guess the right SN-set to be split. (3) Treat the maximal SN-sets and the two split sets as leaves {S_1, S_2, …, S_q}. (4) Adapt T to a new triplet set T’: S_i S_k | S_h ∈ T’ if and only if there exist x ∈ S_i, y ∈ S_k, z ∈ S_h such that xy|z ∈ T. (5) Construct a simple level-2 network for T’. (6) Recursively find the sub-networks for the sets S_1, S_2, …, S_q.

38 Conclusions & open problems So we know how to efficiently construct level-2 networks from dense triplet sets. What’s next? Applicability: how useful is it? Initial implementation: programming and fine-tuning. Improving running time: in the spirit of the “SN-tree” of J&S&N. Complexity: what about level-3 and higher? Bounds: worst-case, best-case scenarios. Building all networks. Properties of output networks as a function of the input. Different triplet restrictions. Confidence: how good are the solutions? Exponential-time exact algorithms for NP-hard problems.

