Quartet distance between general trees Chris Christiansen Thomas Mailund Christian N.S. Pedersen Martin Randers.

Slides:



Advertisements
Similar presentations
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Advertisements

Fractional Cascading CSE What is Fractional Cascading anyway? An efficient strategy for dealing with iterative searches that achieves optimal.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
University of CreteCS4831 The use of Minimum Spanning Trees in microarray expression data Gkirtzou Ekaterini.
Data Mining Association Analysis: Basic Concepts and Algorithms
5 - 1 Chap 5 The Evolution Trees Evolutionary Tree.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Testing Metric Properties Michal Parnas and Dana Ron.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Extending TreeJuxtaposer Nicholas Chen Maryam Farboodi May 9, 2006.
Phylogenetic trees Sushmita Roy BMI/CS 576
Multiple Sequence Alignment S 1 = AGGTC S 2 = GTTCG S 3 = TGAAC Possible alignment A-TA-T GGGGGG G--G-- TTATTA -TA-TA CCCCCC -G--G- AG-AG- GTTGTT GTGGTG.
Chapter 2 Graph Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Shortest Path. What is Shortest Path In a network, there are often many possible paths from one node in a network to another. It is often desirable (for.
Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome Yaw-Ling Lin 1 Tsan-Sheng.
BINF6201/8201 Molecular phylogenetic methods
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 5 Graph Algorithms.
1 Section 1.4 Graphs and Trees A graph is set of objects called vertices or nodes where some pairs of objects may be connected by edges. (A directed graph.
1 Chapter Elementary Notions and Notations.
Vertices and Edges Introduction to Graphs and Networks Mills College Spring 2012.
Constructing evolutionary trees from rooted triples Bang Ye Wu Dept. of Computer Science and Information Engineering Shu-Te University.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Succinct Data Structures Ian Munro University of Waterloo Joint work with David Benoit, Andrej Brodnik, D, Clark, F. Fich, M. He, J. Horton, A. López-Ortiz,
The bootstrap, consenus-trees, and super-trees Phylogenetics Workhop, August 2006 Barbara Holland.
Constant-Time LCA Retrieval Presentation by Danny Hermelin, String Matching Algorithms Seminar, Haifa University.
A Study on Measuring Distance between Two Trees 阮夙姿 教授 Advisor: 阮夙姿 教授 林陳輝 Presenter : 林陳輝.
On the Scalability of Computing Triplet and Quartet Distances Morten Kragelund Holt Jens Johansen Gerth Stølting Brodal 1 Aarhus University.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Foundation of Computing Systems
Barnes Hut N-body Simulation Martin Burtscher Fall 2009.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Why use phylogenetic networks?
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Fitch-Margoliash Algorithm 1.From the distance matrix find the closest pair, e.g., A & B 2.Treat the rest of the sequences as a single composite sequence.
Computing the all-pairs quartet distance on a set of evolutionary trees Martin Stissing 1 Thomas Mailund 1,2 Christian Nørgaard Storm Pedersen 1 Gerth.
CS552: Computer Graphics Lecture 28: Solid Modeling.
Theory of Computational Complexity Probability and Computing Chapter Hikaru Inada Iwama and Ito lab M1.
1 Minimum Routing Cost Tree Definition –For two nodes u and v on a tree, there is a path between them. –The sum of all edge weights on this path is called.
CSE 554 Lecture 5: Contouring (faster)
Computational Geometry
Trees Chapter 15.
Data Mining Association Analysis: Basic Concepts and Algorithms
Ariel Rosenfeld Bar-Ilan Uni.
Frequent Pattern Mining
Clustering methods Tree building methods for distance-based trees
Binary Tree Applications
TREES General trees Binary trees Binary search trees AVL trees
Multiple Alignment and Phylogenetic Trees
Orthogonal Range Searching and Kd-Trees
CS212: Data Structures and Algorithms
Hierarchical clustering approaches for high-throughput data
Network Notes Ms Allan 2012 AS91260 (2.5) Designed to teach from,
Design and Analysis of Multi-Factored Experiments
CS 581 Tandy Warnow.
Huffman Encoding Huffman code is method for the compression for standard text documents. It makes use of a binary tree to develop codes of varying lengths.
Speaker: Chuang-Chieh Lin National Chung Cheng University
Multiple Sequence Alignment
Copyright ©2012 by Pearson Education, Inc. All rights reserved
Counting II: Recurring Problems And Correspondences
June 12, 2003 (joint work with Umut Acar, and Guy Blelloch)
CS 261 – Data Structures Trees.
Backtracking and Branch-and-Bound
Phylogeny.
Counting II: Recurring Problems And Correspondences
Introduction to Trees Chapter 6 Objectives
Presentation transcript:

Quartet distance between general trees Chris Christiansen Thomas Mailund Christian N.S. Pedersen Martin Randers

Evolutionary trees ● An evolutionary tree describes relationship for a set of species ● Can be constructed using different traits → slightly different trees ● Wish to compare these ● This talk: Quartet distance between trees over same set of n species, S

Quartets and quartet topologies ● A quartet is a sub-set of species of size 4, e.g. {a,b,c,d} ⊆ S ● The quartet topology of a quartet {a,b,c,d} is the induced subtree a b e a b f d c c d

Possible topologies of {a,b,c,d} a b c d a c b d a d c b a b c d Butterfly topologies Star topology

Quartet distance ● The quartet distance between T 1 and T 2 is the number of quartets, {a,b,c,d}, where the topology differ b a e c d c a d e b – ab|cd ≠ ac|bd – ab|ce ≠ ac|be – ab|de = ab|de – ac×ed ≠ ac|de – bc×ed ≠ bc|de – qdist(T 1,T 2 ) = 4

Binary trees and general trees ● Results for binary trees: – Steel and Penny (1993): O(n 3 ) – Bryant et al. (2000): O(n 2 ) – Brodal et al. (2003): O(n log n) – n number of leaves ● For arbitrary degree: – This paper: O(n 3 ) and O(n 2 d 2 ) – n number of leaves – d maximal degree c a d e b b a e c d

A triplet approach ● A triplet based approach: – For each fixed triplet {a,b,c} a b c

A triplet approach ● A triplet based approach: – For each fixed triplet {a,b,c} – And each leaf x in S-{a,b,c} a b c x a b c x a b c x a b c x

Shared topologies per triplet ● A triplet based approach: – For each fixed triplet {a,b,c} – And each leaf x in S-{a,b,c} – Compute shared quartet topologies {a,b,c,x} – The sum of these, for each triplet {a,b,c}, divided by 4 is all shared topologies a b c x

Triplet centers ● Consider the triplet centers, C 1 abc and C 2 abc – The “meeting point” of the paths between a, b, and c a b c C 1 abc a c b C 2 abc a b c – Induced subtrees: T1aT1a T1bT1b T1cT1c T2cT2c a b c T 1 rest

Shared butterfly topologies ● Shared butterfly topologies: a b c a c b a b c a c b a b c a c b x x x x x x

Shared star topologies ● Shared star topologies: a b c a c b x x ● With a bit of algebra, expressible through sizes and intersections of T i a, T i b, T i c, i=1,2

Preprocessing ● As a preprocessing step – Tabulate all |T i x | and |T 1 x ∩ T 2 y | – Time and space O(n 2 ) – Bryant et al. (2000) ● Shared quartet topologies of {a,b,c,x} (fixed a,b,c) computable in O(1)

The triplet algorithm ● For each pair a,b: – Calculate the paths between a and b a b a b

The triplet algorithm ● For each pair a,b: – Calculate the paths between a and b – Tabulate C i abc for each c a b a b c C 1 abc C 2 abc c C 1 [c] := C 1 abc C 2 [c] := C 2 abc

The triplet algorithm ● For each pair a,b: – Calculate the paths between a and b – Tabulate C i abc for each c – Calculate shared(a,b) as

Time complexity ● For each pair a,b: – Calculate the paths between a and b – Tabulate C i abc for each c – Calculate shared(a,b) as Four tree traversals: O(n)

Time complexity ● For each pair a,b: – Calculate the paths between a and b – Tabulate C i abc for each c – Calculate shared(a,b) as Four tree traversals: O(n) n constant time summations: O(n)

Time complexity: O(n 3 ) ● For each pair a,b: – Calculate the paths between a and b – Tabulate C i abc for each c – Calculate shared(a,b) as Four tree traversals: O(n) n constant time summations: O(n) Repeat n 2 times: Total O(n 3 )

Reducing to butterflies ● Notation: – S(T 1,T 2 ) = |one tree has star topology| – shared B (T 1,T 2 ) = |equal butterfly topology| – diff B (T 1,T 2 ) = |different butterfly topology| ● Indirect computation: – qdist(T 1,T 2 ) = S(T 1,T 2 )+diff B (T 1,T 2 ) = B 1 + B 2 -2 ⋅ shared B (T 1,T 2 ) – diff B (T 1,T 2 ) – B i = shared B (T i,T i ), – shared B (T 1,T 2 )and diff B (T 1,T 2 ) computed in a similar way S 1 S 2 B 1 B 2 {a,b,c,d} {a,b,c,e} {a,b,c,f} {x,y,z,w} S(T 1,T 2 ) = B 1 + B ⋅ (shared B (T 1,T 2 )+diff B (T 1,T 2 ))

Claims and butterflies ● Shared butterflies: a b c d T 1 T 2 T 3 ● A claim is an oriented split in three trees – Defines a set of butterflies: ab|cd – Each butterfly claimed by exactly two claims ● A pair of claims share butterflies:

High-degree nodes ● A node of degree d has (d-1)(d-2)/2 claims per ingoing edge ● O(n 2 d 4 ) pairs of claims

Expanding nodes ● Computable in time O(n 2 ) ● O(d) new edges per original node – Still binary, so O(n 2 ) nodes and edges – O(n 2 ) pairs of claims

Too many butterflies ● The new edges introduce new butterflies – ab|cd in original tree – ac|de not in original tree a b c e d

Extended claims old edge new path new or old split ● Each original butterfly claimed by exactly two extended claims – Shared butterflies counted similar to regular claims – O(n 2 d 2 ) pairs of extended claims – Can be enumerated in time O(n 2 d 2 )

Conclusions ● Two algorithms for quartet distance for trees with arbitrary degree ● Complexity: – Time O(n 3 ), space O(n 2 ) – Time O(n 2 d 2 ), space O(n 2 ) ● Java implementation of both at: