Published by Hendri Dharmawijaya. Modified over 6 years ago.
Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships
A. Ben-Dor, B. Chor, D. Graur, R. Ophir, D. Pelleg
Journal of Computational Biology, Vol. 5, 1998, pp. 377-390.
(Elucidation: explanation, clarification.)
Speaker: Chuang-Chieh Lin, National Chung Cheng University
2019/1/14
Computation Theory Lab, CSIE, CCU, Taiwan
Outline
1. Introduction and preliminaries
2. Problem description
3. The dynamic programming algorithm
4. The space complexity and the time complexity
Evolutionary trees
Let S be a set of taxa with |S| = n. An evolutionary tree T on S is an unrooted, leaf-labeled tree such that the leaves of T are bijectively labeled by the taxa in S and each internal node of T has degree 3.
Evolutionary trees
For 4 taxa a, b, c, d, there are 3 possible topologies: [ad|bc], [ab|cd] and [ac|bd]. In [ab|cd], for example, the internal edge separates the pair {a, b} from the pair {c, d}.
[Figure: the three quartet trees on a, b, c, d.]
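The three topologies can be listed mechanically. The sketch below is illustrative only: the function names `quartet_topologies` and `show` are made up for this example, and representing a topology as a set of two unordered pairs is an assumption, not something the slides prescribe.

```python
def quartet_topologies(a, b, c, d):
    """Return the three ways to split {a, b, c, d} into two unordered pairs."""
    return [
        frozenset({frozenset({a, b}), frozenset({c, d})}),
        frozenset({frozenset({a, c}), frozenset({b, d})}),
        frozenset({frozenset({a, d}), frozenset({b, c})}),
    ]


def show(topology):
    """Format a topology in the slides' notation, e.g. [ab|cd]."""
    left, right = sorted("".join(sorted(pair)) for pair in topology)
    return f"[{left}|{right}]"


for topology in quartet_topologies("a", "b", "c", "d"):
    print(show(topology))
```

Because each topology is a set of sets, [ab|cd], [ba|cd] and [cd|ab] all compare equal, which matches the fact that the notation describes a bipartition.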
Evolutionary trees (contd.)
For 5 taxa a, b, c, d, e, how many possible evolutionary trees can we derive? The answer is 5 × 3 = 15: each of the 3 trees on {a, b, c, d} has 5 edges, so there are 5 possible positions at which e can be inserted.
[Figure: a quartet tree on a, b, c, d showing the positions where e can be attached.]
Evolutionary trees (contd.)
For n taxa, how many possible evolutionary trees can we derive? The answer is $(2n-5)!!$. This can be verified by induction on n. For an odd positive integer n, the double factorial is defined as $n!! = n \cdot (n-2) \cdot (n-4) \cdots 3 \cdot 1$. If n = 15, $(2n-5)!!$ is approximately $8 \times 10^{12}$.
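The count can be checked numerically. The following sketch compares the double factorial with the insertion argument used for 5 taxa (a new leaf can be attached to any edge, and an unrooted binary tree with k leaves has 2k − 3 edges). The function names are hypothetical and chosen only for this illustration.

```python
def double_factorial(k):
    """Product k * (k - 2) * ... * 3 * 1 for an odd positive integer k."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_by_insertion(n):
    """Count evolutionary trees on n >= 3 taxa by attaching one leaf at a time."""
    count = 1  # there is exactly one tree on 3 taxa
    for leaves in range(3, n):
        # A tree with `leaves` leaves has 2 * leaves - 3 edges,
        # and the next taxon can be attached to any of them.
        count *= 2 * leaves - 3
    return count


print(count_by_insertion(5))
print(double_factorial(2 * 15 - 5))
```

For n = 15 both functions return 7905853580625, which is the value rounded to about 8 × 10^12 above.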
Let us analyze n!! in another way. For a nonnegative integer m, let n = 2m + 1. Then $(2m+1)!! = (2m+1)(2m-1) \cdots 3 \cdot 1$ is a product of m + 1 odd factors, so $(2m+1)!! = O(m^{m+1})$.
$(2n-5)!! = O(n^{n-2})$
For n taxa, we have $(2n-5)!! = O((n-3)^{n-2}) = O(n^{n-2})$ possible evolutionary trees.
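One way to see this bound, which is not necessarily the argument used on the original slide, is the inequality between the arithmetic and geometric means applied to the m + 1 odd factors of $(2m+1)!!$, whose sum is $(m+1)^2$. Substituting $m = n - 3$ is an assumption made here to connect it with $(2n-5)!!$.

```latex
\begin{align*}
(2m+1)!! &= \prod_{i=0}^{m} (2i+1)
  \le \left( \frac{1}{m+1} \sum_{i=0}^{m} (2i+1) \right)^{m+1}
  = (m+1)^{m+1}, \\
(2n-5)!! &\le (n-2)^{n-2} = O\!\left(n^{n-2}\right)
  \qquad \text{(taking } m = n-3\text{)}.
\end{align*}
```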
Quartet topologies
A set of four taxa is called a quartet. Given an evolutionary tree T and a quartet {a, b, c, d}, the quartet topology of {a, b, c, d} induced by T is obtained by the following procedure.
Step 1: All leaves except a, b, c and d are deleted from the tree, together with the edges incident to them.
[Figure: a tree T with leaves a, b, c, d, e, f, g, and the tree that remains on a, b, c, d.]
Step 2: Internal nodes of degree two are contracted, so that their two neighbors become directly connected. This is repeated until no internal node of degree two is left.
[Figure: the contracted tree on a, b, c, d.]
For simplicity, we denote the quartet topology above by [bc|ad], which is a bipartition of {a, b, c, d}.
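The two steps can be sketched in code. This is a minimal illustration under stated assumptions: the tree is an adjacency mapping, the 7-leaf tree in `EXAMPLE_EDGES` is a hypothetical example (the tree drawn on the slides is not recoverable), and the function names are invented. The sketch also removes internal nodes that are left dangling after Step 1, a case the two steps do not mention explicitly.

```python
def build_adjacency(edges):
    """Build an undirected adjacency mapping from a list of edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def induced_quartet_topology(adjacency, quartet):
    """Return the induced topology as a frozenset of two frozenset pairs."""
    keep = set(quartet)
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}  # work on a copy

    def remove(node):
        for nbr in adj.pop(node):
            adj[nbr].discard(node)

    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if node in keep:
                continue
            if len(adj[node]) <= 1:
                # Step 1: delete leaves outside the quartet (and any internal
                # node left dangling by earlier deletions).
                remove(node)
                changed = True
            elif len(adj[node]) == 2:
                # Step 2: contract a degree-two node by joining its neighbors.
                left, right = adj[node]
                remove(node)
                adj[left].add(right)
                adj[right].add(left)
                changed = True
    # Two internal nodes remain; each is attached to one pair of the quartet.
    return frozenset(
        frozenset(n for n in nbrs if n in keep)
        for node, nbrs in adj.items()
        if node not in keep
    )


# Hypothetical tree: internal nodes u1..u5, leaves a..g.
EXAMPLE_EDGES = [
    ("u1", "b"), ("u1", "c"), ("u1", "u2"),
    ("u2", "e"), ("u2", "u3"),
    ("u3", "a"), ("u3", "u4"),
    ("u4", "d"), ("u4", "u5"),
    ("u5", "f"), ("u5", "g"),
]
tree = build_adjacency(EXAMPLE_EDGES)
print(induced_quartet_topology(tree, ("a", "b", "c", "d")))
```

On this example tree the quartet {a, b, c, d} induces the pairs {b, c} and {a, d}, i.e. [bc|ad].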
Note that each input quartet topology t is accompanied by a positive weight $C_t$.
Problem description
Input: A list of weighted quartet topologies over n taxa.
Output: A binary tree with n leaves such that the total weight of the satisfied quartet topologies is maximized.
This problem was shown to be NP-hard.
Quartet method
The fact that small phylogenies are easier to infer than large ones leads to another approach: the quartet method. First, consider subsets of 4 taxa, one at a time, and infer the phylogenies (i.e., quartet topologies) for these subsets. The next stage combines the multiple quartet topologies into a single phylogeny.
Given a set of quartet topologies Q, how do we determine whether an evolutionary tree T is “good” or “bad”?
Let T be an evolutionary tree and Q a set of quartet topologies. We say that T satisfies a quartet topology $t_q$ of a quartet q if the quartet topology of q induced by T is exactly $t_q$.
[Figure: a tree T with leaves a, b, c, d, e, f, g.]
For example, T satisfies [ab|dg], [ce|fg], [ad|bc], etc.
Score
We denote by S, where $S \subseteq Q$, the set of quartet topologies that are satisfied by T, and let $U = Q \setminus S$. We define the score of the evolutionary tree T as follows.
$$\sum_{s \in S} C_s + \frac{1}{3} \sum_{u \in U} C_u$$
Score (contd.)
The latter term was chosen because there are three possible topologies for every quartet, so it equals the expected increase in the score when one of the three topologies is picked uniformly at random. In a variant of the same method, the latter term is set to zero, so the quartet topologies that are not satisfied by T do not contribute to the score.
$$\sum_{s \in S} C_s + \frac{1}{3} \sum_{u \in U} C_u$$
Score (contd.)
It can easily be shown that $\sum_{q \in Q} C_q$ is an upper bound on the score of any evolutionary tree T.
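A small numerical sketch of the score may help. It assumes, purely for illustration, that Q is given as a mapping from topology labels to weights and that the set of satisfied topologies is already known. The function name `tree_score`, the flag `count_unsatisfied` and the example weights are all hypothetical.

```python
def tree_score(weights, satisfied, count_unsatisfied=True):
    """Score of a tree.

    weights:   dict mapping each topology in Q to its positive weight C_t
    satisfied: the subset S of Q that the tree satisfies
    count_unsatisfied: if False, use the variant in which the 1/3 term is zero
    """
    satisfied_weight = sum(w for t, w in weights.items() if t in satisfied)
    unsatisfied_weight = sum(w for t, w in weights.items() if t not in satisfied)
    if count_unsatisfied:
        return satisfied_weight + unsatisfied_weight / 3
    return satisfied_weight


# Made-up weights over three quartets.
Q = {"[ab|cd]": 3, "[ac|be]": 2, "[ad|ce]": 6}
S = {"[ab|cd]", "[ac|be]"}

print(tree_score(Q, S))                           # with the 1/3 term
print(tree_score(Q, S, count_unsatisfied=False))  # variant without it
print(sum(Q.values()))                            # upper bound on any score
```

With these weights the score is 5 + 6/3 = 7, the variant gives 5, and both stay below the bound 11 obtained by summing all weights.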
Preliminaries for the dynamic programming algorithm
For technical reasons, the following discussion deals with rooted evolutionary trees. For a node v, its left and right children are denoted by $v_l$ and $v_r$, respectively.
[Figure: a node v with its children $v_l$ and $v_r$.]
Preliminaries for the dynamic programming algorithm (contd.)
Given a rooted evolutionary tree T and a node v in it, we denote by T(v) the subtree of T rooted at v.
[Figure: a tree with nodes u, v, w; the subtree T(v) is marked.]
Preliminaries for the dynamic programming algorithm (contd.)
We denote by L(T) the set of leaves (i.e., taxa) of the tree T.
[Figure: a tree with nodes u, v, w; the leaf set L(T(v)) is marked.]
Preliminaries for the dynamic programming algorithm (contd.)
For a pair of nodes u, v, the least common ancestor of u and v, lca(u, v), is defined as an ancestor p of both u and v such that no node in T(p) other than p is an ancestor of both u and v.
Preliminaries for the dynamic programming algorithm (contd.)
[Figure: the lca of a and c, shown in two example trees with leaves a, b, c, d.]
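The definition of the lca can be illustrated for two leaves. The sketch below assumes a rooted binary tree written as nested pairs, with leaves as strings. This representation and the function names are choices made for the example, and the node p is identified with the subtree T(p) that it roots.

```python
def leaves(tree):
    """Leaf set L(T) of a rooted binary tree given as nested pairs."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def lca(tree, u, v):
    """Return T(p), the subtree rooted at p = lca(u, v), for two leaves u, v."""
    node = tree
    while not isinstance(node, str):
        for child in node:
            if {u, v} <= leaves(child):
                node = child  # a lower common ancestor exists, keep descending
                break
        else:
            return node  # no child contains both leaves, so this node is the lca
    return node


example = ((("a", "b"), "c"), "d")
print(lca(example, "a", "c"))
```

Here the lca of a and c is the node whose subtree is (("a", "b"), "c"), while the lca of a and d is the root.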
Preliminaries for the dynamic programming algorithm (contd.)
Definition: Given a quartet topology t = [ab|cd] and an evolutionary tree T, the quartet least common ancestor of t, qlca(t), is defined as a node p that is the lca of two or more pairs of elements from {a, b, c, d}, such that no node in T(p) except p is the lca of two or more pairs of elements from {a, b, c, d}.
Preliminaries for the dynamic programming algorithm (contd.)
[Figure: the qlca for [ab|cd], shown in two example trees with leaves a, b, c, d.]
Another equivalent definition for the quartet least common ancestor
Definition: Given a quartet topology t = [ab|cd] and an evolutionary tree T, the qlca of t is a node p such that
$|L(T(p)) \cap \{a, b, c, d\}| \ge 3$, and
for any child s of p, $|L(T(s)) \cap \{a, b, c, d\}| \le 2$.
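This counting definition translates directly into a top-down search. The sketch assumes the same nested-pair representation of rooted trees as an illustration device. The function names are hypothetical, and the returned value is the subtree rooted at the qlca.

```python
def leaves(tree):
    """Leaf set L(T) of a rooted binary tree given as nested pairs."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def qlca(tree, quartet):
    """Return T(p) for p = qlca of the quartet, or None if fewer than
    three of its taxa occur in the tree."""
    q = set(quartet)
    if len(leaves(tree) & q) < 3:
        return None
    node = tree
    while True:
        # At most one child can hold three or more of the four taxa.
        heavy = [child for child in node if len(leaves(child) & q) >= 3]
        if not heavy:
            return node  # every child holds at most two of the taxa
        node = heavy[0]


example = ((("a", "b"), "c"), ("d", "e"))
print(qlca(example, "abcd"))
print(qlca(example, "abde"))
```

For {a, b, c, d} the search descends into the left subtree, because it already contains three of the taxa. For {a, b, d, e} it stops at the root, where the taxa split two and two.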
Some observations
Every quartet topology t has a unique qlca(t). Given a tree T and a quartet topology t, the subtree rooted at qlca(t) determines whether t is satisfied in the evolutionary tree T.
Let t = [ab|cd] and v = qlca(t). We look at $v_l$, $v_r$, $T(v_l)$ and $T(v_r)$. At least one of these subtrees contains exactly two taxa e, f from {a, b, c, d}. Then t is satisfied iff the pair {e, f} is either {a, b} or {c, d}.
Some observations (contd.)
Given a quartet topology t = [ab|cd] and an evolutionary tree T, let v = qlca(t). Then T satisfies t if and only if at least one of the following holds for $s = v_l$ or $s = v_r$:
$\{a, b\} \subseteq L(T(s))$;
$\{c, d\} \subseteq L(T(s))$.
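The criterion above gives a simple satisfaction test. The sketch assumes rooted trees written as nested pairs and a topology [ab|cd] written as `(("a", "b"), ("c", "d"))`. Both encodings and the function names are assumptions made for this illustration.

```python
def leaves(tree):
    """Leaf set L(T) of a rooted binary tree given as nested pairs."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def satisfies(tree, topology):
    """Check whether the tree satisfies [ab|cd], given as ((a, b), (c, d))."""
    (a, b), (c, d) = topology
    quartet = {a, b, c, d}
    if len(leaves(tree) & quartet) < 3:
        return False  # the quartet has no qlca in this tree
    # Walk down to v = qlca(t): stop when no child holds three or more taxa.
    node = tree
    while True:
        heavy = [child for child in node if len(leaves(child) & quartet) >= 3]
        if not heavy:
            break
        node = heavy[0]
    # t is satisfied iff {a, b} or {c, d} lies entirely below one child of v.
    return any({a, b} <= leaves(s) or {c, d} <= leaves(s) for s in node)


example = ((("a", "b"), "c"), ("d", "e"))
print(satisfies(example, (("a", "b"), ("c", "d"))))
print(satisfies(example, (("a", "c"), ("b", "d"))))
```

On the example tree, [ab|cd] is satisfied because {a, b} sits below one child of the qlca, whereas [ac|bd] is not, since neither of its pairs is contained in a single child.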
The algorithm
We denote by $\mathrm{SAT}_Q(T(v))$ the set of quartet topologies $t \in Q$ such that t is satisfied by T and qlca(t) is a node in T(v). Let $\mathrm{TOP}_Q(T(v)) \subseteq \mathrm{SAT}_Q(T(v))$ be the set of quartet topologies in Q that have v as their qlca and are satisfied by T. We then have
$$\mathrm{SAT}_Q(T(v)) = \mathrm{TOP}_Q(T(v)) \cup \mathrm{SAT}_Q(T(v_l)) \cup \mathrm{SAT}_Q(T(v_r)).$$
The three terms on the right-hand side of this recursive formula are disjoint, so the scores can later be summed without double counting.
The algorithm (contd.)
For a set $A \subseteq Q$ of quartet topologies, let
$$\mathrm{sum}(A) = \sum_{t \in A} C_t$$
denote the sum of their weights. The score of the subtree T(v) (with respect to Q) is defined as
$$\mathrm{score}_Q(T(v)) = \mathrm{sum}(\mathrm{SAT}_Q(T(v))).$$
The algorithm (contd.)
By the above equation, we have
$$\mathrm{score}_Q(T(v)) = \mathrm{sum}(\mathrm{TOP}_Q(T(v))) + \mathrm{score}_Q(T(v_l)) + \mathrm{score}_Q(T(v_r)).$$
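The recursion for the score can be sketched on a fixed rooted tree. The code assumes trees written as nested pairs and Q given as a list of `((a, b), (c, d), weight)` triples standing for weighted topologies [ab|cd]. These encodings and the helper name `top_weight` are hypothetical. Only satisfied topologies are counted, as in the definition of the subtree score.

```python
def leaves(tree):
    """Leaf set L(T) of a rooted binary tree given as nested pairs."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def top_weight(left_leaves, right_leaves, weighted_quartets):
    """Weight of the satisfied topologies whose qlca is the node that has
    these two leaf sets below its children (the sum over TOP_Q)."""
    total = 0
    for (a, b), (c, d), weight in weighted_quartets:
        quartet = {a, b, c, d}
        in_left = len(quartet & left_leaves)
        in_right = len(quartet & right_leaves)
        if in_left + in_right < 3 or in_left > 2 or in_right > 2:
            continue  # this node is not the qlca of the quartet
        if any(p <= left_leaves or p <= right_leaves for p in ({a, b}, {c, d})):
            total += weight
    return total


def score(tree, weighted_quartets):
    """score(T(v)) = sum(TOP at v) + score(T(v_l)) + score(T(v_r))."""
    if isinstance(tree, str):
        return 0  # a single leaf is the qlca of no quartet
    left, right = tree
    return (top_weight(leaves(left), leaves(right), weighted_quartets)
            + score(left, weighted_quartets)
            + score(right, weighted_quartets))


example = ((("a", "b"), "c"), ("d", "e"))
Q = [
    (("a", "b"), ("c", "d"), 1),
    (("a", "b"), ("c", "e"), 1),
    (("a", "b"), ("d", "e"), 1),
    (("a", "c"), ("d", "e"), 1),
    (("b", "c"), ("d", "e"), 1),
    (("a", "c"), ("b", "d"), 7),  # conflicts with the example tree
]
print(score(example, Q))
```

Each topology is counted at exactly one node, its qlca, which is why the three terms can simply be added.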
The algorithm (contd.)
Let S be a set of three or more taxa. Denote by $\mathrm{opt\_score}_Q(S)$ the maximum score with respect to Q among all trees that have S as their set of leaves. We denote by $\mathrm{opt\_tree}_Q(S)$ a tree that attains the maximum score.
The algorithm (contd.)
For every proper partition of S into two subsets $S_1$ and $S_2$, let $T(S_1, S_2)$ denote a tree whose left subtree equals $\mathrm{opt\_tree}_Q(S_1)$ and whose right subtree equals $\mathrm{opt\_tree}_Q(S_2)$. We then have
$$\mathrm{score}_Q(T(S_1, S_2)) = \mathrm{sum}(\mathrm{TOP}_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2).$$
The algorithm (contd.)
This implies that
$$\mathrm{opt\_score}_Q(S) = \max_{S_1 \cup S_2 = S} \left\{ \mathrm{sum}(\mathrm{TOP}_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2) \right\},$$
where the maximum is taken over all proper partitions of S into $S_1$ and $S_2$. By employing the dynamic programming paradigm, we can avoid wasteful repetitions. To do this, we scan the subsets $S \subseteq \{1, 2, \ldots, n\}$ by increasing size of S.
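The recurrence can be turned into a small table-filling program. What follows is a minimal sketch under stated assumptions, not the authors' implementation: Q is a list of `((a, b), (c, d), weight)` triples, trees are returned as nested pairs, subsets of size one and two serve as base cases with score 0, and only satisfied topologies contribute. The names `optimal_tree` and `top_weight` are hypothetical.

```python
from itertools import combinations


def top_weight(s1, s2, weighted_quartets):
    """Weight of the satisfied topologies whose qlca is the root of T(S1, S2)."""
    total = 0
    for (a, b), (c, d), weight in weighted_quartets:
        quartet = {a, b, c, d}
        in1, in2 = len(quartet & s1), len(quartet & s2)
        if in1 + in2 < 3 or in1 > 2 or in2 > 2:
            continue  # the root is not the qlca of this quartet
        if any(p <= s1 or p <= s2 for p in ({a, b}, {c, d})):
            total += weight
    return total


def optimal_tree(taxa, weighted_quartets):
    """Return (opt_score, opt_tree) for the full taxon set."""
    taxa = sorted(taxa)
    # best[S] = (opt_score(S), opt_tree(S)); single leaves score 0.
    best = {frozenset([x]): (0, x) for x in taxa}
    for size in range(2, len(taxa) + 1):  # scan subsets by increasing size
        for subset in combinations(taxa, size):
            s = frozenset(subset)
            first, rest = subset[0], subset[1:]
            best_score, best_tree = -1, None
            # Keep `first` in S1 so each unordered partition is tried once.
            for k in range(len(rest)):
                for extra in combinations(rest, k):
                    s1 = frozenset((first,) + extra)
                    s2 = s - s1
                    candidate = (top_weight(s1, s2, weighted_quartets)
                                 + best[s1][0] + best[s2][0])
                    if candidate > best_score:
                        best_score = candidate
                        best_tree = (best[s1][1], best[s2][1])
            best[s] = (best_score, best_tree)
    return best[frozenset(taxa)]


Q = [
    (("a", "b"), ("c", "d"), 1),
    (("a", "b"), ("c", "e"), 1),
    (("a", "b"), ("d", "e"), 1),
    (("a", "c"), ("d", "e"), 1),
    (("b", "c"), ("d", "e"), 1),
    (("a", "c"), ("b", "d"), 1),  # conflicts with [ab|cd]
]
best_score, best_tree = optimal_tree("abcde", Q)
print(best_score, best_tree)
```

Since all smaller subsets are already in the table when a subset S is processed, each partition costs one pass over the quartet list plus two table lookups. The sketch enumerates every subset explicitly, so it is only practical for a small number of taxa.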
The algorithm (contd.)
For simplicity, the details of implementing the dynamic programming algorithm are omitted.
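The space complexity and the time complexity
The algorithm stores one entry for each subset $S \subseteq \{1, 2, \ldots, n\}$, i.e. $O(2^n)$ entries. For the running time, summing over the subsets of each size k and their bipartitions gives
$$\sum_{k=1}^{n} \binom{n}{k}\, 2^{k}\, O(m) = O(m \cdot 3^{n}),$$
where m is the size of the input quartet topologies.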
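The factor $3^n$ comes from the binomial identity $\sum_k \binom{n}{k} 2^k = 3^n$, which can be checked numerically. The sketch below counts, for every subset size k, the $2^k$ ways of splitting a subset into an ordered pair of parts (empty parts included, so it slightly overcounts proper partitions). The function name is hypothetical.

```python
from math import comb


def partition_evaluations(n):
    """Number of (S, S1) pairs with S1 a subset of S, over all subsets S
    of an n-element taxon set: sum over k of C(n, k) * 2**k."""
    return sum(comb(n, k) * 2**k for k in range(n + 1))


for n in (4, 10, 15):
    print(n, partition_evaluations(n), 3**n)
```

The two printed counts agree for every n, which is consistent with a running time that grows like $3^n$ times the cost of scoring one partition.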
Thank you.
References
[S92] M. Steel. The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 9 (1992), pp. 91-116.