Maximal sub-triangulation in pre-processing phylogenetic data. Anne Berry, Alain Sigayret, Christine Sinoquet. Presented by Sunil Manjeri.



Outline
Introduction
Phylogeny Preliminaries
Chordal Graphs Preliminaries
Threshold Family of Graphs
Maintaining a Family of Chordal Graphs
Composition Scheme
Algorithm
References

Introduction: Basics of Phylogeny
The best evidence strongly supports the view that all life currently on Earth is descended from a single common ancestor.
Over roughly the last 3.8 billion years, this single ancestor has split repeatedly into new species.
The evolutionary relationship between these species is referred to as phylogeny.
Phylogenetic trees illustrate the phylogeny of groups of organisms.

Introduction: Basics of Phylogeny
A sample data set and a phylogeny for it:
Taxa: lamprey, shark, salmon, lizard.
Characters: a = paired fins, b = jaws, c = large dermal bones, d = fin rays, e = lungs, f = rasping tongue.
[Table: taxa (rows) against characters a to f (columns).]
[Figure: phylogenetic tree on lamprey, shark, salmon and lizard, with edges labelled by characters.]

Introduction: Basics of Phylogeny
Data for phylogeny
Numerical: distance between objects or species, e.g. distance(man, mouse) = 500, distance(man, chimp) = 100.
Discrete characters: each character has a finite number of states, e.g. number of legs = 1, 2, 4; DNA = {A, C, T, G}.

Introduction: Basics of Phylogeny
Distance method of reconstructing phylogenetic trees
Input: an n x n matrix M where M_ij >= 0 is the distance between objects or species i and j.
Goal: build an edge-weighted tree in which each leaf corresponds to one object of M and the distance measured on the tree between leaves i and j corresponds to M_ij.

M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

Fig. 1: [edge-weighted tree with leaves a, b, c, d, e]

Phylogeny Preliminaries: Additive Matrices
Definitions and properties
A dissimilarity on a finite set X is a function δ: X² → ℝ⁺ such that, for all x, y ∈ X, δ(x, y) = δ(y, x).
A distance is a dissimilarity such that
for all x, y ∈ X, δ(x, y) = 0 if and only if x = y;
for all x, y, z ∈ X, δ(x, y) + δ(y, z) ≥ δ(x, z).
In Fig. 1, let L be the set of leaves representing the taxa. For a, b ∈ L, let d(a, b) denote the length of the ab-path, i.e. the evolutionary distance between a and b. This distance is called an additive distance, and the associated matrix on L x L is called an additive matrix.

M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

Phylogeny Preliminaries: Ordinal Matrices
The set of values of a dissimilarity matrix M can be ordered from 0 (as M[x, x] = 0) to the maximal value. This defines a number of different thresholds θ: 0, 1, …, k in increasing order.
The 6 dissimilarity values are: θ⁻¹(0) = 0, θ⁻¹(1) = 6, θ⁻¹(2) = 8, θ⁻¹(3) = 10, θ⁻¹(4) = 12, θ⁻¹(5) = 16.
The 6 threshold values are: θ(0) = 0, θ(6) = 1, θ(8) = 2, θ(10) = 3, θ(12) = 4, θ(16) = 5.
The ordinal matrix of a dissimilarity matrix is the matrix obtained by replacing each dissimilarity value by its threshold.

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

Ordinal matrix W
W    a    b    c    d    e
a    0    1    4    4    5
b         0    4    4    5
c              0    1    3
d                   0    2
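The passage from M to W can be sketched in a few lines of Python. This is only an illustration: the dictionary layout (pairs written as two-letter strings such as "ab") and the function name `ordinal_matrix` are assumptions, and the entries of M are the ones implied by the ordinal matrix W and the threshold list on the slide.

```python
# Upper triangle of the example dissimilarity matrix M, keyed by pair of taxa.
# The key format ("ab" for the pair {a, b}) is an assumption of this sketch.
M = {
    "ab": 6, "ac": 12, "ad": 12, "ae": 16,
    "bc": 12, "bd": 12, "be": 16,
    "cd": 6, "ce": 10,
    "de": 8,
}

def ordinal_matrix(m):
    """Replace each dissimilarity value by its threshold (rank among distinct values).

    Threshold 0 is reserved for the diagonal value 0, so off-diagonal
    thresholds start at 1. Returns (w, theta_inv), where theta_inv[i] is the
    dissimilarity value associated with threshold i.
    """
    theta_inv = sorted({0} | set(m.values()))
    theta = {value: rank for rank, value in enumerate(theta_inv)}
    w = {pair: theta[value] for pair, value in m.items()}
    return w, theta_inv

W, THETA_INV = ordinal_matrix(M)
print(THETA_INV)  # distinct dissimilarity values in increasing order
print(W)          # threshold of every pair
```

Ranking distinct values keeps ties together, so two pairs with the same dissimilarity always receive the same threshold.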

Phylogeny Preliminaries: Additive Matrices
Characterization 2.1 (from [3]): a distance matrix M on a set of taxa is additive if and only if, for any quadruple {a, b, c, d} of taxa, the two largest of the three sums d(a, b) + d(c, d), d(a, c) + d(b, d) and d(a, d) + d(b, c) are equal.

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

d(a, b) + d(c, d) = 12
d(a, c) + d(b, d) = 24
d(a, d) + d(b, c) = 24
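Characterization 2.1 translates directly into a brute-force test over all quadruples. The sketch below assumes integer dissimilarities (so exact equality is safe) and the same two-letter key layout as an illustration device; `is_additive` and the perturbed matrix are hypothetical names and data, not taken from the slides.

```python
from itertools import combinations

TAXA = ["a", "b", "c", "d", "e"]
# Example matrix M (values implied by the ordinal matrix and threshold list).
M = {
    "ab": 6, "ac": 12, "ad": 12, "ae": 16,
    "bc": 12, "bd": 12, "be": 16,
    "cd": 6, "ce": 10,
    "de": 8,
}

def d(m, x, y):
    """Symmetric lookup in an upper-triangle dictionary keyed by 'xy' strings."""
    if x == y:
        return 0
    return m[min(x, y) + max(x, y)]

def is_additive(taxa, m):
    """Four-point condition: in every quadruple the two largest pair sums are equal."""
    for p, q, r, s in combinations(taxa, 4):
        sums = sorted([
            d(m, p, q) + d(m, r, s),
            d(m, p, r) + d(m, q, s),
            d(m, p, s) + d(m, q, r),
        ])
        if sums[1] != sums[2]:
            return False
    return True

# Hypothetical perturbation: lower d(a, d) and d(b, c) from 12 to 8.
PERTURBED = dict(M, ad=8, bc=8)

print(is_additive(TAXA, M))          # True
print(is_additive(TAXA, PERTURBED))  # False: {a, b, c, d} gives sums 12, 16, 24
```

With measured (floating-point) data, the equality test would need a tolerance; that choice is left open here.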

The Problems
Reconstructing the tree from an additive matrix is easy and can be done in polynomial time.
Experimental data usually do not yield additive matrices, and inferring a phylogeny remains costly and inaccurate.
Instead, examine the ordinal properties of the dissimilarity matrix, i.e. the structure of the thresholds, rather than depending only on the values themselves. This approach seems to be less sensitive to small data variations.
Huson, Nettles and Warnow [2] proved that if the matrix is additive, all the graphs of the threshold family are chordal (triangulated).
Problem: experimental results show that not only do the dissimilarity matrices biologists have to work with fail to be additive, but the corresponding graphs very often fail to be chordal.

Chordal Graphs Preliminaries
A graph G = (V, E) is said to be chordal or triangulated if it contains no chordless cycle on more than 3 vertices.
Characterization 2.3 (from [4]): a graph is chordal if and only if it is the intersection graph of a family of subtrees of a tree.
Graph inclusion: if G = (V, E) is a graph and G' = (V, E') is another graph on the same vertex set, we write G ⊆ G' if and only if E ⊆ E', and G ⊂ G' if and only if E ⊂ E'.
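A simple way to test chordality in code is sketched below. It does not use the subtree characterization above; it relies instead on the classical perfect elimination ordering characterization (a graph is chordal iff its vertices can be removed one by one, each being simplicial, i.e. having a clique as neighbourhood, at the time of removal). The function name and the small test graphs are assumptions made for illustration, and the loop is written for clarity rather than speed.

```python
from itertools import combinations

def is_chordal(vertices, edges):
    """Return True iff the graph admits a perfect elimination ordering."""
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    remaining = set(vertices)
    while remaining:
        # Look for a vertex whose remaining neighbours are pairwise adjacent.
        simplicial = next(
            (v for v in remaining
             if all(y in adj[x] for x, y in combinations(adj[v] & remaining, 2))),
            None,
        )
        if simplicial is None:
            return False  # no simplicial vertex left: a chordless cycle exists
        remaining.remove(simplicial)
    return True

SQUARE = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_chordal("abcd", SQUARE))                 # False: chordless 4-cycle
print(is_chordal("abcd", SQUARE + [("a", "c")]))  # True: the chord ac fixes it
```

Removing any simplicial vertex is safe because every induced subgraph of a chordal graph is chordal and therefore has a simplicial vertex of its own.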

Chordal Graphs Preliminaries: Correcting non-chordal graphs
Methods of correcting a non-chordal graph
Minimal triangulation
Adding an inclusion-minimal set of edges to the graph in order to make it chordal.
For a given graph on n vertices and m edges, a minimal triangulation can be computed in O(nm) time.
Adding edges to a graph of the threshold family means lowering the thresholds of the corresponding edges.
Maximal sub-triangulation
Removing edges rather than adding them to make the graph chordal.
A maximal sub-triangulation can be computed in O(Δm) time, where Δ is the maximum degree in the graph.

Chordal Graphs Preliminaries: Minimal Triangulation
Rose, Tarjan and Lueker gave the following definition of minimal triangulation.
Definition 2.4 (from [5]): if G = (V, E) is a non-chordal graph, a chordal graph H = (V, E + F) is said to be a minimal triangulation of G if, for all F' ⊂ F, the graph (V, E + F') fails to be chordal.
[Figure: graph G and a minimal triangulation H on vertices a, b, c, d, e, f, g.]
F = {bd, af}; F' = {bd} or {af}

Chordal Graphs Preliminaries: Minimal Triangulation
Rose, Tarjan and Lueker also proved that it is enough to check single edges: a triangulation is minimal as soon as removing any one added edge makes the graph non-chordal.
Theorem 2.5 (from [5]): let G = (V, E) be a non-chordal graph and let H = (V, E + F) be a chordal graph; H is a minimal triangulation of G iff, for all f ∈ F, the graph (V, E + (F \ {f})) fails to be chordal.
[Figure: graph G and a minimal triangulation H on vertices a, b, c, d, e, f, g.]
F = {bd, af}; f = bd or af

Chordal Graphs Preliminaries: Minimal Triangulation
The above theorem relies on the following lemma, which ensures that, given two chordal graphs one of which is included in the other, there is an ordering on the edges to be added to the smaller graph which maintains chordality at each edge-addition step.
Lemma 2.6 (from [5]): let G_1 = (V, E_1) be a chordal graph and let G_2 = (V, E_2) be a chordal graph such that G_1 ⊂ G_2. Then ∃ f ∈ E_2 \ E_1 such that G' = (V, E_2 \ {f}) is chordal.
[Figure: chordal graphs G_1 and G_2 on vertices a, b, c, d, e, f, g.]
E_2 \ E_1 = {ce, dg, bf, af, ag}
Proper ordering: ce, dg, bf, af, ag
Improper ordering: ce, dg, ag, af, bf

Chordal Graphs Preliminaries: Maximal sub-triangulation
Definition 2.8: let G = (V, E) be a non-chordal graph and let H = (V, E \ F) be a chordal graph. We say that H is a maximal sub-triangulation of G if, for all F' ⊂ F, the graph (V, (E \ F) + F') fails to be chordal.
[Figure: graph G and a maximal sub-triangulation H on vertices a, b, c, d, e, f, g.]
F = {cb, fb}; F' = {cb} or {fb}

Maintaining Chordality: Threshold Family of Graphs
Given a dissimilarity matrix, we use the associated ordinal matrix to define the corresponding threshold family of graphs.
Let A be a set of taxa, M the dissimilarity matrix, and W the corresponding ordinal matrix, with thresholds 0, 1, …, k. We can define a family of graphs G_0 ⊂ G_1 ⊂ … ⊂ G_k, called the threshold family of graphs associated with W (and thus with M), with G_i = (V, E_i), V = A and ab ∈ E_i iff W[a, b] ≤ i.
The threshold matrix induces a preorder relation ℛ: ab ℛ cd iff W[a, b] ≤ W[c, d].
ℛ defines an ordered partition of the edges of G_k; each class F_i of edges is defined by F_i = E_i \ E_{i-1} = {xy | W[x, y] = i}.
Graph G_i is obtained from graph G_{i-1} by adding the set of edges F_i.

Maintaining Chordality: Threshold Family of Graphs

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

Ordinal matrix W
W    a    b    c    d    e
a    0    1    4    4    5
b         0    4    4    5
c              0    1    3
d                   0    2

G_i = (V, E_i), V = A and ab ∈ E_i iff W[a, b] ≤ i
[Figure: threshold graphs G_0, G_1, G_2, G_3 and G_4 on vertices a, b, c, d, e.]
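The edge sets of the threshold family can be read off W mechanically. The following sketch assumes W is stored as a dictionary keyed by two-letter pair strings; `threshold_family` is an illustrative name rather than something defined in the slides.

```python
# Ordinal matrix W of the example, upper triangle, keyed by pair of taxa.
W = {
    "ab": 1, "ac": 4, "ad": 4, "ae": 5,
    "bc": 4, "bd": 4, "be": 5,
    "cd": 1, "ce": 3,
    "de": 2,
}

def threshold_family(w, k):
    """Return [E_0, ..., E_k], where E_i contains the pairs xy with w[xy] <= i."""
    return [{pair for pair, rank in w.items() if rank <= i} for i in range(k + 1)]

FAMILY = threshold_family(W, 5)
for i, edges in enumerate(FAMILY):
    added = edges - FAMILY[i - 1] if i > 0 else edges
    print(f"G_{i}: {len(edges)} edges, F_{i} = {sorted(added)}")
```

For this W the classes are F_1 = {ab, cd}, F_2 = {de}, F_3 = {ce}, F_4 = {ac, ad, bc, bd} and F_5 = {ae, be}, so G_0 is an independent set and G_5 is the clique on the five taxa.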

Maintaining Chordality: Threshold family of graphs / Chordal graphs
Property 3.4: if M is an additive matrix, then the threshold family of graphs defined by M is a family of chordal graphs.
Proof
o Let T be the phylogeny associated with an additive matrix M.
o Let G_i be the graph corresponding to threshold i ∈ [0…k].
o Add internal nodes to T in order to obtain a tree T' in which there is a node at mid-distance between any pair {a, b} of vertices.
o Consider the family of subtrees of T' defined by: for each leaf x, T'_x is the subtree containing all nodes at distance θ⁻¹(i)/2 or less from x.
o Then G_i is the intersection graph of this family of subtrees, since T'_a and T'_b share a node iff d(a, b) ≤ θ⁻¹(i).
o By virtue of Characterization 2.3 (Gavril's theorem), G_i is chordal.
[Figure: tree with leaves a, b, c, d, e.]

Example: Threshold family of graphs vs. chordal graphs
For i = 1, θ⁻¹(1)/2 = 3.
For i = 2, θ⁻¹(2)/2 = 4.
[Figure: tree T' on leaves a, b, c, d, e with the subtrees T'_1 and T'_2, shown next to the corresponding threshold graphs G_1 and G_2.]

Composition Scheme: An edge-addition composition scheme for chordal graphs
To compute a threshold family of chordal graphs such that each graph G_i is a subgraph of the original graph G_i, we construct a clique G_k from an independent set G_0 by adding at each step an inclusion-maximal set of edges which maintains chordality.
Definition 3.7 (from [6]): a pair {a, b} of non-adjacent vertices is called a 2-pair iff every chordless path from a to b is of length exactly 2.
[Figure: a graph in which {a, b} is a 2-pair.]
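Definition 3.7 quantifies over all chordless paths, but it can be tested with a single graph search. The sketch below uses an equivalent reformulation, stated here as an assumption of the example rather than something taken from the slides: a non-adjacent pair {a, b} is a 2-pair iff deleting the common neighbours of a and b leaves no path from a to b. (A chordless path of length 3 or more cannot pass through a common neighbour, and a shortest path avoiding the common neighbours is chordless and has length at least 3.) The function name and adjacency-dictionary layout are illustrative.

```python
from collections import deque

def is_two_pair(adj, a, b):
    """True iff a, b are non-adjacent and every chordless a-b path has length 2.

    adj maps each vertex to the set of its neighbours.
    """
    if a == b or b in adj[a]:
        return False
    common = adj[a] & adj[b]
    seen = {a}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == b:
                return False  # reached b without using a common neighbour
            if v not in seen and v not in common:
                seen.add(v)
                queue.append(v)
    return True

def make_adj(vertices, edges):
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    return adj

PATH3 = make_adj("axb", [("a", "x"), ("x", "b")])               # a - x - b
PATH4 = make_adj("axyb", [("a", "x"), ("x", "y"), ("y", "b")])  # a - x - y - b
print(is_two_pair(PATH3, "a", "b"))  # True
print(is_two_pair(PATH4, "a", "b"))  # False: chordless path of length 3
```

Two vertices in different connected components form a 2-pair vacuously, which is why the first edges can always be added to the independent set G_0.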

Composition Scheme: An edge-addition composition scheme for chordal graphs
Theorem 3.8: let G_1 be a chordal graph, let {a, b} be a pair of non-adjacent vertices of G_1, and let G_2 be the graph obtained from G_1 by adding edge ab; then G_2 is chordal iff {a, b} is a 2-pair of G_1.
Proof
o Let G_1 be a chordal graph, {a, b} a pair of non-adjacent vertices of G_1, and G_2 the graph obtained from G_1 by adding edge ab.
o Since G_1 is chordal, any chordless cycle on more than 3 vertices in G_2 must use the new edge ab.
o Such a cycle has the form a x_1 x_2 … x_k b a, where μ = a x_1 x_2 … x_k b is a chordless path from a to b in G_1.
o The cycle has more than 3 vertices iff μ is of length greater than 2.
o Hence G_2 fails to be chordal iff G_1 contains a chordless path from a to b of length greater than 2, i.e. iff {a, b} fails to be a 2-pair of G_1.

Composition Scheme: An edge-addition composition scheme for chordal graphs
Property 3.9: let G_1 be a chordal graph and let G_2 be a chordal graph such that G_1 ⊂ G_2. Then G_2 can be obtained from G_1 by repeatedly adding an edge between two vertices forming a 2-pair.
Proof
o Let G_1 be a chordal graph and let G_2 be a chordal graph such that G_1 ⊂ G_2.
o By Lemma 2.6, ∃ xy ∈ E_2 \ E_1 such that (V, E_2 \ {xy}) is chordal.
o By Theorem 3.8, {x, y} is a 2-pair of G_2 \ {xy}.
o Repeat this until we obtain graph G_1. We have constructed (in reverse) a 2-pair edge-addition ordering which enables us to construct G_2 from G_1.
[Figure: chordal graphs G_1 and G_2 on vertices a, b, c, d, e, f, g.]
E_2 \ E_1 = {ce, dg, bf, af, ag}

Composition Scheme
Composition Scheme 3.10: from Theorem 3.8 and Property 3.9, a graph on n vertices is chordal iff it can be constructed by starting with an independent set on n vertices and adding at each step an edge between two vertices forming a 2-pair.

Algorithm: An additive data pre-processing algorithm
Input: a dissimilarity matrix M on n taxa, with thresholds 0, 1, …, k
Output: a dissimilarity matrix M' such that every graph in the threshold family is chordal
Initialization: G_0 is an independent set on n vertices; create an empty FIFO queue Q
begin
  for i = 1 to k-1 do
    assign G_{i-1} to G_i
    compute the set F_i of pairs {a, b} such that M[a, b] = θ⁻¹(i)
    add F_i to the queue Q
    repeat
      scan Q and remove the first pair ab which is a 2-pair of G_i
      add edge ab to graph G_i
      set M'[a, b] to θ⁻¹(i)
    until Q contains no 2-pair of G_i
  give all remaining pairs in Q the value θ⁻¹(k) in M'
  add all remaining pairs in Q (together with the pairs of F_k) to G_{k-1} to form G_k, a clique on n vertices
end
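The algorithm can be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors' implementation: it works on the ordinal matrix rather than on M (so M'[a, b] is recovered as θ⁻¹ of the returned threshold), pairs of the same threshold enter the queue in lexicographic order, and the 2-pair test uses the common-neighbour separation criterion. The names `preprocess`, `is_two_pair` and `W_IN` are illustrative; the input is the ordinal matrix with W[a, d] = W[b, c] = 2 shown with the example slide.

```python
def is_two_pair(adj, a, b):
    """Non-adjacent a, b form a 2-pair iff removing N(a) & N(b) separates them."""
    if b in adj[a]:
        return False
    common = adj[a] & adj[b]
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v == b:
                return False
            if v not in seen and v not in common:
                seen.add(v)
                stack.append(v)
    return True

def preprocess(taxa, w, k):
    """Return corrected thresholds such that every threshold graph is chordal."""
    adj = {v: set() for v in taxa}   # current graph G_i, starting from G_0
    corrected = {}
    pending = []                     # FIFO queue Q of pairs not yet added
    for i in range(1, k):
        # F_i: pairs whose threshold is i (lexicographic order is an assumption).
        pending.extend(sorted(p for p, rank in w.items() if rank == i))
        while True:
            hit = next((p for p in pending if is_two_pair(adj, p[0], p[1])), None)
            if hit is None:
                break                # Q contains no 2-pair of G_i
            pending.remove(hit)
            adj[hit[0]].add(hit[1])
            adj[hit[1]].add(hit[0])
            corrected[hit] = i
    for p in w:                      # leftovers in Q and the pairs of F_k
        corrected.setdefault(p, k)
    return corrected

THETA_INV = [0, 6, 8, 10, 12, 16]
W_IN = {"ab": 1, "ac": 4, "ad": 2, "ae": 5, "bc": 2,
        "bd": 4, "be": 5, "cd": 1, "ce": 3, "de": 2}
W_OUT = preprocess("abcde", W_IN, 5)
CHANGED = {p: r for p, r in W_OUT.items() if r != W_IN[p]}
print(CHANGED)                                      # {'bc': 4}
print({p: THETA_INV[r] for p, r in CHANGED.items()})  # {'bc': 12}
```

Under these assumptions only the pair bc is delayed, from threshold 2 to threshold 4 (value 8 to 12): when bc is first examined, the path b, a, d, c is chordless and of length 3, and bc becomes a 2-pair only after ac has been added. With a different queue order (bc before ad), the delayed pair would be ad instead, so the result depends on how ties within a threshold are ordered.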

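Threshold family of graphs: Example

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8

Ordinal matrix W shown with the example (W[a, d] and W[b, c] are 2 instead of 4)
W    a    b    c    d    e
a    0    1    4    2    5
b         0    2    4    5
c              0    1    3
d                   0    2

Example: consider an incorrect dissimilarity matrix M'.
[Table: incorrect dissimilarity matrix M'.]
Running the algorithm generates a corrected dissimilarity matrix.
[Table: corrected dissimilarity matrix.]
The complexity of running the above algorithm is O(n^5).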

References
[1] Berry A, Sigayret A, Sinoquet C (2005) Maximal sub-triangulation in pre-processing phylogenetic data.
[2] Huson D, Nettles S, Warnow T (1999) Obtaining highly accurate topology estimates of evolutionary trees from very short sequences.
[3] Barthelemy J-P, Guenoche A (1991) Trees and proximity representations.
[4] Gavril F (1974) The intersection graphs of subtrees of trees are exactly the chordal graphs.
[5] Rose D, Tarjan RE, Lueker G (1976) Algorithmic aspects of vertex elimination on graphs.
[6] Hayward R, Hoang C, Maffray F (1989) Optimizing weakly triangulated graphs.