Speaker: Chuang-Chieh Lin National Chung Cheng University

Presentation transcript:

Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships
A. Ben-Dor, B. Chor, D. Graur, R. Ophir, D. Pelleg
Journal of Computational Biology, Vol. 5, 1998, pp. 377–390.
(Elucidation: explanation, clarification)
Speaker: Chuang-Chieh Lin, National Chung Cheng University
2019/1/14

Computation Theory Lab, CSIE, CCU, Taiwan
Outline
Introduction and preliminaries
Problem description
The dynamic programming algorithm
The space complexity and the time complexity

Evolutionary trees
Let S be a set of taxa with |S| = n. An evolutionary tree T on S is an unrooted, leaf-labeled tree such that the leaves of T are bijectively labeled by the taxa in S, and each internal node of T has degree 3.

Evolutionary trees
For 4 taxa a, b, c, d, there are 3 possible topologies: [ad|bc], [ab|cd], and [ac|bd].
[Figure: the three quartet trees on a, b, c, d.]

Evolutionary trees (contd.)
For 5 taxa a, b, c, d, e, how many possible evolutionary trees can we derive? The answer is 5 × 3 = 15: each of the 3 trees on a, b, c, d has 5 edges, so there are 5 possible positions at which e can be inserted.
[Figure: a quartet tree on a, b, c, d showing the insertion positions for e.]

Evolutionary trees (contd.)
For n taxa, how many possible evolutionary trees can we derive? The answer is (2n − 5)!!. This can be verified by induction on n. For an odd positive integer n, the double factorial is defined as n!! = n × (n − 2) × (n − 4) × … × 3 × 1. If n = 15, (2n − 5)!! is approximately 8 × 10^12.
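The inductive argument above can be checked numerically. The following is a minimal sketch, assuming the standard fact that an unrooted binary tree with k leaves has 2k − 3 edges, so the next taxon can be attached in 2k − 3 ways; the function name is illustrative and not from the slides.

```python
def count_unrooted_binary_trees(n):
    """Number of unrooted, leaf-labeled binary trees on n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one tree on 3 taxa
    for k in range(3, n):
        # A tree with k leaves has 2k - 3 edges; taxon k + 1 can be
        # inserted on any one of them.
        count *= 2 * k - 3
    return count


for n in (4, 5, 15):
    print(n, count_unrooted_binary_trees(n))
```

The product 3 × 5 × … × (2n − 5) is exactly (2n − 5)!!, which is why the counts for n = 4 and n = 5 match the 3 and 15 given on the earlier slides.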

Let us analyze n!! in another way. For a nonnegative integer m, let n = 2m + 1. Then we have
$(2m+1)!! = (2m+1) \cdot (2m-1) \cdots 3 \cdot 1 = \prod_{i=0}^{m} (2i+1)$, with $\sum_{i=0}^{m} (2i+1) = (m+1)^2$.
So, by the inequality of arithmetic and geometric means,
$(2m+1)!! \le \left( \frac{(m+1)^2}{m+1} \right)^{m+1} = (m+1)^{m+1} = O(m^{m+1})$.

(2n − 5)!! = O(n^(n−2))
For n taxa, we have (2n − 5)!! = O((n − 3)^(n−2)) = O(n^(n−2)) possible evolutionary trees.

Quartet topologies
A set of four taxa is called a quartet. Given an evolutionary tree T and a quartet {a, b, c, d}, the quartet topology of {a, b, c, d} induced by T is obtained by the following procedure.

Step 1: All leaves except a, b, c and d are deleted from the tree. The edges incident to these leaves are also deleted.
[Figure: a tree T with leaves a, b, c, d, e, f, g, and the same tree with only a, b, c, d remaining.]

Step 2: Internal nodes with degree two are contracted and deleted, so their two adjacent nodes become connected. This process is repeated until no internal nodes of degree two are left.
[Figure: the pruned tree on a, b, c, d before and after contraction.]
For simplicity, we denote the quartet topology above by [bc|ad], which is a kind of bipartition of {a, b, c, d}.

Note that each input quartet topology t is accompanied by a positive weight C_t.
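The two-step procedure (delete the other leaves, then contract degree-two nodes) can be sketched as follows. This is only an illustration under stated assumptions: the tree is given as an edge list, the input is a binary evolutionary tree containing all four taxa, and the function name `induced_quartet` and the 5-taxon example tree are hypothetical, since the tree T from the slide figure is not recoverable from the transcript.

```python
def induced_quartet(edges, quartet):
    """Return the induced topology as a set of two leaf pairs."""
    keep = set(quartet)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if node in keep:
                continue
            nbrs = adj[node]
            if len(nbrs) <= 1:
                # Step 1: delete a leaf outside the quartet and its edge.
                for w in nbrs:
                    adj[w].discard(node)
                del adj[node]
                changed = True
            elif len(nbrs) == 2:
                # Step 2: contract a degree-two node.
                x, y = nbrs
                adj[x].discard(node)
                adj[y].discard(node)
                adj[x].add(y)
                adj[y].add(x)
                del adj[node]
                changed = True

    # Four leaves and two internal nodes remain; leaves that share an
    # internal neighbour form one side of the bipartition.
    sides = {}
    for leaf in keep:
        (parent,) = adj[leaf]
        sides.setdefault(parent, set()).add(leaf)
    return frozenset(frozenset(side) for side in sides.values())


# Hypothetical tree: a, b hang off x; c off y; d, e off z; path x - y - z.
edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"),
         ("y", "z"), ("d", "z"), ("e", "z")]
print(induced_quartet(edges, "abcd"))
```

Deleting a leaf can turn its former neighbour into a new leaf or a degree-two node, so the sketch interleaves both steps until nothing changes; the final result does not depend on the order.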

Problem description
Input: A list of weighted quartet topologies over n taxa.
Output: A binary tree with n leaves such that the total weight of the satisfied quartet topologies is maximized.
This problem was shown to be NP-hard.

Quartet method
The fact that small phylogenies are easier to infer than large ones leads to another approach: the quartet method. First, consider subsets of 4 taxa, one at a time, and infer the phylogenies (i.e., quartet topologies) for these subsets. The next stage combines the multiple quartet topologies into a single phylogeny.

Given a set of quartet topologies Q, how do we determine whether an evolutionary tree T is “good” or “bad”?

Given an evolutionary tree T and a set of quartet topologies Q, we say that T satisfies a quartet topology t_q of a quartet q if the quartet topology of q induced by T is exactly t_q.
[Figure: a tree T with leaves a, b, c, d, e, f, g.]
For example, T satisfies [ab|dg], [ce|fg], [ad|bc], etc.

Score
We denote by S, where S ⊆ Q, the set of quartet topologies that are satisfied by T, and let U = Q − S. We define the score of the evolutionary tree T as follows.
$\sum_{s \in S} C_s + \frac{1}{3} \sum_{u \in U} C_u$

Score (contd.)
$\sum_{s \in S} C_s + \frac{1}{3} \sum_{u \in U} C_u$
The latter term was chosen because there are three possible topologies for every quartet. Therefore this term equals the expected increase. In a variant of the same method, the latter term is zeroed, so the quartet topologies which are not satisfied by T do not contribute to the score.

Score (contd.)
It can be easily derived that $\sum_{q \in Q} C_q$ is an upper bound on the score of any evolutionary tree T.
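A small worked example may make the score and its upper bound concrete. The sketch below assumes the set of satisfied topologies is already known; the function name `tree_score`, the string labels, and the weights are made up for illustration.

```python
from fractions import Fraction


def tree_score(weights, satisfied, unsatisfied_factor=Fraction(1, 3)):
    """Satisfied weight plus a fraction of the unsatisfied weight.

    weights: dict mapping each topology in Q to its weight C_t.
    satisfied: the subset S of topologies satisfied by the tree.
    """
    sat = sum(w for t, w in weights.items() if t in satisfied)
    unsat = sum(w for t, w in weights.items() if t not in satisfied)
    return sat + unsatisfied_factor * unsat


# Hypothetical input: three weighted topologies, two of them satisfied.
weights = {"ab|cd": 3, "ab|ce": 2, "ac|bd": 1}
satisfied = {"ab|cd", "ab|ce"}

full = tree_score(weights, satisfied)        # with the 1/3 term
variant = tree_score(weights, satisfied, 0)  # variant: latter term zeroed
upper = sum(weights.values())                # upper bound on any score
print(full, variant, upper)
```

Because every weight is positive and an unsatisfied topology contributes at most a third of its weight, neither score can exceed the total weight of Q.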

Preliminaries for the dynamic programming algorithm
For technical reasons, the following discussion deals with rooted evolutionary trees. For a node v, its left and right children are denoted by v_l and v_r respectively.
[Figure: a node v with children v_l and v_r.]

Preliminaries for the dynamic programming algorithm (contd.)
Given a rooted evolutionary tree T and a node v in it, we denote by T(v) the subtree of T rooted at v.
[Figure: nodes u, v, w, with the subtree T(v) marked.]

Preliminaries for the dynamic programming algorithm (contd.)
We denote by L(T) the set of leaves (i.e., taxa) of the tree T.
[Figure: nodes u, v, w, with the leaf set L(T(v)) marked.]

Preliminaries for the dynamic programming algorithm (contd.)
For a pair of nodes u, v, the least common ancestor of u and v, lca(u, v), is defined as an ancestor p of both u and v such that no node in T(p) other than p is an ancestor of both u and v.

Preliminaries for the dynamic programming algorithm (contd.)
[Figure: the lca of a and c in two rooted trees with leaves a, b, c, d.]

Preliminaries for the dynamic programming algorithm (contd.)
Definition: Given a quartet topology t = [ab|cd] and an evolutionary tree T, the quartet least common ancestor of t, qlca(t), is defined as a node p that is the lca of two or more pairs of elements from {a, b, c, d}, such that no node in T(p) except p is the lca of two or more pairs of elements from {a, b, c, d}.

Preliminaries for the dynamic programming algorithm (contd.)
[Figure: the qlca for [ab|cd] in two rooted trees with leaves a, b, c, d.]

Another equivalent definition for the quartet least common ancestor
Definition: Given a quartet topology t = [ab|cd] and an evolutionary tree T, the qlca of t is a node p such that
|L(T(p)) ∩ {a, b, c, d}| ≥ 3, and
for any child s of p, |L(T(s)) ∩ {a, b, c, d}| ≤ 2.

Some observations
Every quartet topology t has a unique qlca(t). Given a tree T and a quartet topology t, the subtree rooted at qlca(t) determines whether t is satisfied in the evolutionary tree T.
Let t = [ab|cd] and v = qlca(t). We look at v_l, v_r, T(v_l) and T(v_r). At least one of these subtrees contains exactly two taxa e, f from {a, b, c, d}. Then t is satisfied iff the pair {e, f} is either {a, b} or {c, d}.

Some observations (contd.)
Given a quartet topology t = [ab|cd] and an evolutionary tree T, let v = qlca(t). Then T satisfies t if and only if at least one of the following holds:
{a, b} ⊆ L(T(s)).
{c, d} ⊆ L(T(s)).
where s = v_l or s = v_r.
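The qlca characterization and the satisfaction test above can be sketched together. This is an illustration only: it assumes a rooted binary tree encoded as nested 2-tuples with string leaves, and that the tree contains at least three taxa of the quartet. The helper names `leaves`, `qlca` and `satisfies`, and the example tree, are hypothetical.

```python
def leaves(t):
    """L(T): the set of taxa below a node."""
    if isinstance(t, str):
        return {t}
    left, right = t
    return leaves(left) | leaves(right)


def qlca(t, quartet):
    """Lowest node holding >= 3 quartet taxa whose children hold <= 2 each."""
    q = set(quartet)
    node = t
    while True:
        # A child with >= 3 quartet taxa cannot be a leaf, so descending
        # into it is safe.
        heavy = [child for child in node if len(leaves(child) & q) >= 3]
        if not heavy:
            return node
        node = heavy[0]


def satisfies(t, ab, cd):
    """True iff the tree satisfies the topology [ab|cd]."""
    v = qlca(t, set(ab) | set(cd))
    return any(set(pair) <= leaves(s) for pair in (ab, cd) for s in v)


# Hypothetical rooted tree on five taxa.
tree = ((("a", "b"), "c"), ("d", "e"))
print(qlca(tree, "abcd"))
print(satisfies(tree, "ab", "cd"), satisfies(tree, "ac", "bd"))
```

Recomputing leaf sets at every step keeps the sketch short; an actual implementation would more likely precompute them once per node.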

The algorithm
We denote by SAT_Q(T(v)) the set of quartet topologies t ∈ Q such that t is satisfied by T, and qlca(t) is a node in T(v). Let TOP_Q(T(v)) ⊆ SAT_Q(T(v)) be the set of quartet topologies in Q that have v as their qlca and are satisfied by T. We then have
$SAT_Q(T(v)) = TOP_Q(T(v)) \cup SAT_Q(T(v_l)) \cup SAT_Q(T(v_r))$
Note: the three terms on the right-hand side of this recursive formula are pairwise disjoint, so the scores can safely be summed in what follows.

The algorithm (contd.)
For a set A ⊆ Q of quartet topologies, let
$\mathrm{sum}(A) = \sum_{t \in A} C_t$
denote the sum of their weights. The score of the subtree T(v) (with respect to Q) is defined as
$\mathrm{score}_Q(T(v)) = \mathrm{sum}(SAT_Q(T(v)))$

The algorithm (contd.)
By the above equation, we have
$\mathrm{score}_Q(T(v)) = \mathrm{sum}(TOP_Q(T(v))) + \mathrm{score}_Q(T(v_l)) + \mathrm{score}_Q(T(v_r))$
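The recursion for the subtree score (the weight of the satisfied topologies whose qlca is v, plus the scores of the two child subtrees) can be sketched directly. The sketch assumes the nested-tuple tree encoding and a quartet list of the form ((a, b), (c, d), weight) for [ab|cd]; these encodings, the helper names and the example data are assumptions for illustration, not the authors' implementation.

```python
def leaves(t):
    """L(T): the set of taxa below a node."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])


def top_weight(left, right, quartets):
    """sum(TOP): weight of satisfied topologies whose qlca is this node."""
    total = 0
    for ab, cd, weight in quartets:
        q = set(ab) | set(cd)
        n1, n2 = len(q & left), len(q & right)
        # qlca test: at least 3 quartet taxa below, at most 2 per child.
        if n1 + n2 >= 3 and n1 <= 2 and n2 <= 2:
            if any(set(p) <= side for p in (ab, cd) for side in (left, right)):
                total += weight
    return total


def score(t, quartets):
    """Recursive subtree score: TOP weight plus both child scores."""
    if isinstance(t, str):
        return 0
    left, right = t
    return (top_weight(leaves(left), leaves(right), quartets)
            + score(left, quartets) + score(right, quartets))


# Hypothetical weighted input; [ab|cd] and [ac|bd] conflict.
quartets = [(("a", "b"), ("c", "d"), 3), (("a", "b"), ("c", "e"), 2),
            (("a", "c"), ("d", "e"), 1), (("a", "c"), ("b", "d"), 1)]
print(score(((("a", "b"), "c"), ("d", "e")), quartets))
print(score(((("a", "c"), "b"), ("d", "e")), quartets))
```

Each topology has a unique qlca, so it is counted at most once across the recursion, which is what makes the plain sum valid.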

The algorithm (contd.)
Let S be a set of three or more taxa. Denote by opt_score_Q(S) the maximum score with respect to Q among all trees that have S as their set of leaves. We denote by opt_tree_Q(S) a tree which attains the maximum score.

The algorithm (contd.)
For every proper partition of S into two subsets S1 and S2, let T(S1, S2) denote a tree whose left subtree equals opt_tree_Q(S1) and whose right subtree equals opt_tree_Q(S2). We then have
$\mathrm{score}_Q(T(S_1, S_2)) = \mathrm{sum}(TOP_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2)$

The algorithm (contd.)
This implies that
$\mathrm{opt\_score}_Q(S) = \max_{S_1 \cup S_2 = S} \left( \mathrm{sum}(TOP_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2) \right)$
By employing the dynamic programming paradigm, we can avoid wasteful repetitions. To do this, we scan the subsets S ⊆ {1, 2, …, n} by increasing size of S.
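Since the slides omit the implementation details, the following is only one possible sketch of the subset dynamic program, not the authors' code. It assumes subsets are encoded as bitmasks, quartets are given as ((a, b), (c, d), weight) for [ab|cd], the score is the sum of satisfied weights, and a single taxon has score 0. It visits masks in increasing numeric order rather than by increasing size; this works equally well because every proper subset of a mask has a smaller numeric value. All names are hypothetical.

```python
def best_tree(taxa, quartets):
    """Return (opt_score, opt_tree) for the full taxon set."""
    index = {x: i for i, x in enumerate(taxa)}

    def mask(names):
        m = 0
        for x in names:
            m |= 1 << index[x]
        return m

    coded = [(mask(ab), mask(cd), w) for ab, cd, w in quartets]

    def top_weight(s1, s2):
        # Weight of topologies whose qlca is the root of T(S1, S2)
        # and which that root split satisfies.
        total = 0
        for ab, cd, w in coded:
            q = ab | cd
            n1 = bin(q & s1).count("1")
            n2 = bin(q & s2).count("1")
            if n1 + n2 >= 3 and n1 <= 2 and n2 <= 2:
                if any(p & side == p for p in (ab, cd) for side in (s1, s2)):
                    total += w
        return total

    # Base case: a single taxon is a leaf with score 0.
    opt = {1 << i: (0, x) for x, i in index.items()}
    full = mask(taxa)
    for s in range(1, full + 1):
        if s in opt:
            continue
        low = s & -s  # force the lowest taxon into S1: each split once
        best = None
        sub = (s - 1) & s  # enumerate proper non-empty subsets of s
        while sub:
            if sub & low:
                s1, s2 = sub, s ^ sub
                cand = top_weight(s1, s2) + opt[s1][0] + opt[s2][0]
                if best is None or cand > best[0]:
                    best = (cand, (opt[s1][1], opt[s2][1]))
            sub = (sub - 1) & s
        opt[s] = best
    return opt[full]


# Hypothetical input; [ab|cd] (weight 3) conflicts with [ac|bd] (weight 1).
taxa = ["a", "b", "c", "d", "e"]
quartets = [(("a", "b"), ("c", "d"), 3), (("a", "b"), ("c", "e"), 2),
            (("a", "c"), ("d", "e"), 1), (("a", "c"), ("b", "d"), 1)]
score, tree = best_tree(taxa, quartets)
print(score, tree)
```

The key point the sketch relies on is that sum(TOP) depends only on the split (S1, S2) and not on the internal shape of the two subtrees, so the optimal subtrees can be reused unchanged.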

The algorithm (contd.)
For simplicity, the details of implementing the dynamic programming algorithm are omitted.

The space complexity and the time complexity
$\sum_{k} \binom{n}{k} 2^k \cdot m = O(m \cdot 3^n)$, where m is the size of the input quartet topologies.

Thank you.

References
[S92] M. Steel: The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 9 (1992), pp. 91–116.