Maximal sub-triangulation in pre-processing phylogenetic data
Anne Berry, Alain Sigayret, Christine Sinoquet
Presented by Sunil Manjeri
Outline
Introduction
Phylogeny Preliminaries
Chordal Graphs Preliminaries
Threshold Family of Graphs
Maintaining a family of chordal graphs
Composition Scheme
Algorithm
References
Introduction: Basics of Phylogeny
The best available evidence strongly supports the view that all life currently on Earth is descended from a single common ancestor.
Over the last 3.8 billion years, that single ancestor has split repeatedly into new species.
The evolutionary relationship between these species is referred to as phylogeny.
Phylogenetic trees illustrate the phylogeny of groups of organisms.
Introduction: Basics of Phylogeny
A sample data set and a phylogeny for it are shown below.
[Table: taxa (lamprey, shark, salmon, lizard) against characters a to f.]
[Figure: phylogenetic tree with leaves lamprey, shark, salmon and lizard, its edges labelled with the characters.]
Characters: a = paired fins, b = jaws, c = large dermal bones, d = fin rays, e = lungs, f = rasping tongue
Introduction: Basics of Phylogeny
Data for phylogeny
Numerical: distance between objects or species
  distance(man, mouse) = 500
  distance(man, chimp) = 100
Discrete characters: each character has a finite number of states
  Number of legs = 1, 2, 4
  DNA = {A, C, T, G}
Introduction: Basics of Phylogeny
Distance method of reconstructing phylogenetic trees
Input: an n x n matrix M where M[i, j] >= 0 is the distance between objects or species i and j.
Goal: build an edge-weighted tree in which each leaf corresponds to one object of M and the distance measured on the tree between leaves i and j equals M[i, j].

M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8
e                        0

Fig. 1: dissimilarity matrix M and the associated tree with leaves a, b, c, d, e.
Phylogeny Preliminaries: Additive Matrices
Definitions and properties
A dissimilarity on a finite set X is a function δ: X^2 -> R+ such that
  for all x, y ∈ X, δ(x, y) = δ(y, x).
A distance is a dissimilarity such that
  for all x, y ∈ X, δ(x, y) = 0 for x = y;
  for all x, y, z ∈ X, δ(x, y) + δ(y, z) ≥ δ(x, z).
In Fig. 1, let L be the set of leaves representing the taxa. For a, b ∈ L, let d(a, b) denote the length of the ab-path, that is, the evolutionary distance between a and b. This distance is called an additive distance, and the associated matrix on L x L is called an additive matrix.
Phylogeny Preliminaries: Ordinal Matrices
The set of values of a dissimilarity matrix M can be ordered from 0 (as M[x, x] = 0) to the maximal value. This defines a number of different thresholds θ: 0, 1, ..., k in increasing order.
The 6 dissimilarity values are: θ⁻¹(0) = 0, θ⁻¹(1) = 6, θ⁻¹(2) = 8, θ⁻¹(3) = 10, θ⁻¹(4) = 12, θ⁻¹(5) = 16
The 6 threshold values are: θ(0) = 0, θ(6) = 1, θ(8) = 2, θ(10) = 3, θ(12) = 4, θ(16) = 5
The ordinal matrix of a dissimilarity matrix is the matrix obtained by replacing each dissimilarity value by its threshold.

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8
e                        0

Ordinal matrix W
W    a    b    c    d    e
a    0    1    4    4    5
b         0    4    4    5
c              0    1    3
d                   0    2
e                        0
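The replacement of values by thresholds can be sketched in a few lines of Python. This is only an illustration: the function name `ordinal_matrix` and the full symmetric list-of-lists layout are assumptions made here, and the entries of M are those implied by the thresholds and the ordinal matrix W given on this slide.

```python
def ordinal_matrix(m):
    """Replace each dissimilarity value by its rank among the distinct values."""
    # theta_inv[i] is the dissimilarity value of threshold i (theta^-1 on the slide).
    theta_inv = sorted({v for row in m for v in row})
    # theta maps a dissimilarity value back to its threshold.
    theta = {v: i for i, v in enumerate(theta_inv)}
    w = [[theta[v] for v in row] for row in m]
    return w, theta_inv

# Example matrix M in full symmetric form, taxa ordered a, b, c, d, e.
M = [
    [0, 6, 12, 12, 16],
    [6, 0, 12, 12, 16],
    [12, 12, 0, 6, 10],
    [12, 12, 6, 0, 8],
    [16, 16, 10, 8, 0],
]

W, theta_inv = ordinal_matrix(M)
for row in W:
    print(row)
```

Sorting the distinct values once and looking each entry up in a dictionary keeps the conversion linear in the number of matrix entries after the sort.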
Phylogeny Preliminaries: Additive Matrices
Characterization 2.1 (from [3]): a distance matrix M on a set of taxa is additive if and only if, for any quadruple {a, b, c, d} of taxa, the two largest of the three sums d(a, b) + d(c, d), d(a, c) + d(b, d) and d(a, d) + d(b, c) are equal.
Example on the dissimilarity matrix M of Fig. 1:
d(a, b) + d(c, d) = 6 + 6 = 12
d(a, c) + d(b, d) = 12 + 12 = 24
d(a, d) + d(b, c) = 12 + 12 = 24
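Characterization 2.1 translates directly into a brute-force test over all quadruples. The sketch below is illustrative only: `is_additive` is a hypothetical name, the matrix is assumed to be symmetric and stored as a full list of lists, and exact equality is used, which suits integer data but would need a tolerance for measured values.

```python
from itertools import combinations

def is_additive(m):
    """Four-point test: for every quadruple, the two largest sums must be equal."""
    n = len(m)
    for a, b, c, d in combinations(range(n), 4):
        sums = sorted((m[a][b] + m[c][d],
                       m[a][c] + m[b][d],
                       m[a][d] + m[b][c]))
        if sums[1] != sums[2]:
            return False
    return True

# Example matrix M in full symmetric form, taxa ordered a, b, c, d, e
# (entries as implied by the thresholds and ordinal matrix of the slides).
M = [
    [0, 6, 12, 12, 16],
    [6, 0, 12, 12, 16],
    [12, 12, 0, 6, 10],
    [12, 12, 6, 0, 8],
    [16, 16, 10, 8, 0],
]

print(is_additive(M))
```

The test examines every quadruple, so it runs in O(n^4) time, which is acceptable for the small matrices used as examples here.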
The Problems
Reconstructing the tree from an additive matrix is easy and can be done in polynomial time.
Experimental data, however, rarely yield additive matrices, and inferring a phylogeny remains costly and inaccurate.
Instead, examine the ordinal properties of the dissimilarity matrix, that is, the structure of the thresholds rather than only the values themselves. This approach seems to be less sensitive to small variations in the data.
Huson, Nettles and Warnow proved in [2] that if the matrix is additive, all the graphs of the threshold family are chordal (triangulated).
Problem: experimental results show that not only do the dissimilarity matrices biologists have to work with fail to be additive, but the corresponding graphs very often fail to be chordal.
Chordal Graphs Preliminaries
A graph G = (V, E) is said to be chordal, or triangulated, if it contains no chordless cycle on more than 3 vertices.
Characterization 2.3 (Gavril, from [4]): a graph is chordal if and only if it is the intersection graph of a family of subtrees of a tree.
Graph inclusion: if G = (V, E) is a graph and G' = (V, E') is another graph on the same vertex set, we write G ⊆ G' if and only if E ⊆ E', and G ⊂ G' if and only if E ⊂ E'.
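For small examples, chordality can be tested without searching for cycles. The sketch below does not use the definition or Gavril's characterization directly; it relies on the classical fact that a graph is chordal exactly when its vertices can be removed one at a time, each being simplicial (its remaining neighbours form a clique) at the moment of removal. The name `is_chordal` and the edge representation are assumptions of this sketch, and the simple loop favours clarity over speed.

```python
def is_chordal(vertices, edges):
    """Return True if the graph admits a simplicial elimination of all vertices."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        for v in list(remaining):
            nbrs = adj[v] & remaining
            # v is simplicial if its remaining neighbours are pairwise adjacent.
            if all(y in adj[x] for x in nbrs for y in nbrs if x != y):
                remaining.remove(v)
                break
        else:
            # No simplicial vertex is left: a chordless cycle on > 3 vertices exists.
            return False
    return True

cycle4 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_chordal("abcd", cycle4))                  # chordless 4-cycle
print(is_chordal("abcd", cycle4 + [("a", "c")]))   # same cycle with a chord
```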
Chordal Graphs Preliminaries: Correcting Non-Chordal Graphs
Methods of correcting a non-chordal graph
Minimal triangulation
  Add an inclusion-minimal set of edges to the graph in order to make it chordal.
  For a graph with n vertices and m edges, a minimal triangulation can be computed in O(nm) time.
  Adding edges to a graph of the threshold family means lowering the thresholds of the corresponding edges.
Maximal sub-triangulation
  Remove edges rather than add them to make the graph chordal.
  A maximal sub-triangulation can be computed in O(Δm) time, where Δ is the maximum degree in the graph.
Chordal Graphs Preliminaries: Minimal Triangulation
Rose, Tarjan and Lueker gave the following definition of a minimal triangulation.
Definition 2.4 (from [5]): if G = (V, E) is a non-chordal graph, a chordal graph H = (V, E + F) is said to be a minimal triangulation of G if, for every F' ⊂ F, the graph (V, E + F') fails to be chordal.
[Figure: graph G and its minimal triangulation H on vertices a to g, with F = {bd, af}; F' = {bd} or F' = {af}.]
Chordal Graphs Preliminaries: Minimal Triangulation
Rose, Tarjan and Lueker also proved that minimality can be checked one edge at a time: it suffices that removing any single added edge yields a non-chordal graph.
Theorem 2.5 (from [5]): let G = (V, E) be a non-chordal graph and let H = (V, E + F) be a chordal graph; H is a minimal triangulation of G iff, for every f ∈ F, the graph (V, E + (F \ {f})) fails to be chordal.
[Figure: the same graphs G and H, with F = {bd, af}; f = bd or f = af.]
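Theorem 2.5 suggests a simple check for minimality: test chordality once per added edge. The sketch below is illustrative; `is_chordal` and `is_minimal_triangulation` are hypothetical names, the chordality test is a plain simplicial-elimination loop chosen for brevity, and the 4-cycle example is made up here because the edges of the graph in the figure are not recoverable. It assumes G is non-chordal and F is disjoint from E.

```python
def is_chordal(vertices, edges):
    """Simplicial-elimination test for chordality (simple, not optimised)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        for v in list(remaining):
            nbrs = adj[v] & remaining
            if all(y in adj[x] for x in nbrs for y in nbrs if x != y):
                remaining.remove(v)
                break
        else:
            return False
    return True

def is_minimal_triangulation(vertices, e, f):
    """Theorem 2.5: (V, E + F) is chordal and dropping any single f breaks chordality."""
    if not is_chordal(vertices, e | f):
        return False
    return all(not is_chordal(vertices, e | (f - {x})) for x in f)

# Hypothetical example: G is a chordless 4-cycle a-b-c-d.
E = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
print(is_minimal_triangulation("abcd", E, {("a", "c")}))              # one chord
print(is_minimal_triangulation("abcd", E, {("a", "c"), ("b", "d")}))  # both chords
```

Checking single edges needs only |F| chordality tests, whereas Definition 2.4 taken literally ranges over all proper subsets of F.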
Chordal Graphs Preliminaries: Minimal Triangulation
The above theorem relies on the following lemma, which ensures that, given two chordal graphs one of which is included in the other, there is an ordering of the edges to be added to the smaller graph that maintains chordality at each edge-addition step.
Lemma 2.6 (from [5]): let G_1 = (V, E_1) and G_2 = (V, E_2) be chordal graphs such that G_1 ⊂ G_2. Then there exists f ∈ E_2 \ E_1 such that G' = (V, E_2 \ {f}) is chordal.
[Figure: chordal graphs G_1 ⊂ G_2 on vertices a to g, with E_2 \ E_1 = {ce, dg, bf, af, ag}.]
Proper ordering: ce, dg, bf, af, ag
Improper ordering: ce, dg, ag, af, bf
Chordal Graphs Preliminaries: Maximal Sub-triangulation
Definition 2.8: let G = (V, E) be a non-chordal graph and let H = (V, E \ F) be a chordal graph. We say that H is a maximal sub-triangulation of G if, for every non-empty F' ⊆ F, the graph (V, (E \ F) + F') fails to be chordal.
[Figure: graph G and a maximal sub-triangulation H on vertices a to g, with F = {cb, fb}; F' = {cb} or F' = {fb}.]
Maintaining Chordality: Threshold Family of Graphs
Given a dissimilarity matrix, we use the associated ordinal matrix to define the corresponding threshold family of graphs.
Let A be a set of taxa, M the dissimilarity matrix, W the corresponding ordinal matrix, and 0, 1, ..., k the thresholds. We define a family of graphs G_0 ⊂ G_1 ⊂ ... ⊂ G_k, called the threshold family of graphs associated with W (and thus with M), by G_i = (V, E_i), V = A and ab ∈ E_i iff W[a, b] ≤ i.
The threshold matrix induces a preorder relation ℛ on pairs: ab ℛ cd iff W[a, b] ≤ W[c, d].
ℛ defines an ordered partition of the edges of G_k; each class F_i of edges is defined by F_i = E_i \ E_{i-1} = {xy | W[x, y] = i}.
Graph G_i is obtained from graph G_{i-1} by adding the set of edges F_i.
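Maintaining Chordality: Threshold Family of Graphs

Dissimilarity matrix M
M    a    b    c    d    e
a    0    6   12   12   16
b         0   12   12   16
c              0    6   10
d                   0    8
e                        0

Ordinal matrix W
W    a    b    c    d    e
a    0    1    4    4    5
b         0    4    4    5
c              0    1    3
d                   0    2
e                        0

G_i = (V, E_i), V = A and ab ∈ E_i iff W[a, b] ≤ i
Edge classes read from W: F_1 = {ab, cd}, F_2 = {de}, F_3 = {ce}, F_4 = {ac, ad, bc, bd}, F_5 = {ae, be}
[Figure: graphs G_0, G_1, G_2, G_3 and G_4 of the threshold family on vertices a, b, c, d, e.]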
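The threshold family can be computed straight from the ordinal matrix. The following Python sketch is an illustration only: `threshold_family` is a hypothetical name, W is assumed to be stored in full symmetric form, and each edge is represented as a pair of taxon labels in the order of the taxa list.

```python
from itertools import combinations

def threshold_family(taxa, w):
    """Return the edge sets E_0, E_1, ..., E_k with ab in E_i iff W[a, b] <= i."""
    k = max(max(row) for row in w)
    index = {t: i for i, t in enumerate(taxa)}
    family = []
    for i in range(k + 1):
        edges = {(x, y) for x, y in combinations(taxa, 2)
                 if w[index[x]][index[y]] <= i}
        family.append(edges)
    return family

# Ordinal matrix W of the example, in full symmetric form (taxa a, b, c, d, e).
W = [
    [0, 1, 4, 4, 5],
    [1, 0, 4, 4, 5],
    [4, 4, 0, 1, 3],
    [4, 4, 1, 0, 2],
    [5, 5, 3, 2, 0],
]

family = threshold_family("abcde", W)
for i, edges in enumerate(family):
    print(i, sorted(edges))
```

Each class F_i can then be recovered as the set difference `family[i] - family[i - 1]`.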
Maintaining Chordality: Threshold Family of Graphs and Chordal Graphs
Property 3.4: if M is an additive matrix, then the threshold family of graphs defined by M is a family of chordal graphs.
Proof
o Let T be the phylogeny associated with an additive matrix M.
o Let G_i be the graph corresponding to threshold i ∈ [0...k].
o Add internal nodes to T in order to obtain a tree T' in which there is a node at mid-distance between any pair {a, b} of vertices.
o Consider the family of subtrees of T' defined as follows: for each leaf x, T'_x is the subtree containing all nodes at distance θ⁻¹(i)/2 or less from x.
o Two subtrees T'_x and T'_y share a node iff d(x, y) ≤ θ⁻¹(i), that is, iff xy ∈ E_i, so G_i is the intersection graph of this family of subtrees.
o By Characterization 2.3 (Gavril's theorem), G_i is chordal.
[Figure: tree T with leaves a, b, c, d, e.]
Example: Threshold Family of Graphs vs. Chordal Graphs
For i = 1, θ⁻¹(1)/2 = 3
For i = 2, θ⁻¹(2)/2 = 4
[Figure: tree T with leaves a, b, c, d, e; the subtree families T'_1 and T'_2; and the corresponding graphs G_1 and G_2.]
Composition Scheme: An edge-addition composition scheme for chordal graphs
To compute a threshold family of chordal graphs in which each graph G_i is a subgraph of the original graph G, we build the clique G_k from the independent set G_0 by adding at each step an inclusion-maximal set of edges that maintains chordality.
Definition 3.7 (from [6]): a pair {a, b} of non-adjacent vertices is called a 2-pair iff every chordless path from a to b is of length exactly 2.
[Figure: two non-adjacent vertices a and b forming a 2-pair.]
Composition Scheme: An edge-addition composition scheme for chordal graphs
Theorem 3.8: let G_1 be a chordal graph, let {a, b} be a pair of non-adjacent vertices of G_1, and let G_2 be the graph obtained from G_1 by adding edge ab; then G_2 is chordal iff {a, b} is a 2-pair of G_1.
Proof
o Let G_1 be a chordal graph, {a, b} a pair of non-adjacent vertices of G_1, and G_2 the graph obtained from G_1 by adding edge ab.
o Since G_1 is chordal, any chordless cycle on more than 3 vertices in G_2 must use the new edge ab.
o Let μ = a x_1 x_2 ... x_k b be a longest chordless path from a to b in G_1.
o In G_2, the cycle a x_1 x_2 ... x_k b a is a chordless cycle on more than 3 vertices iff μ is of length greater than 2, i.e. iff {a, b} fails to be a 2-pair of G_1.
o Hence G_2 is chordal iff {a, b} is a 2-pair of G_1.
[Figure: vertices a and b joined by the added edge ab.]
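A 2-pair can be recognised without enumerating chordless paths. The sketch below uses an equivalent reformulation that is an assumption of this example rather than something stated on the slides: two non-adjacent vertices a and b form a 2-pair exactly when deleting their common neighbours leaves no path from a to b. A pair lying in two different connected components has no chordless path at all and is therefore accepted. The name `is_two_pair` and the adjacency-dictionary layout are hypothetical.

```python
from collections import deque

def is_two_pair(adj, a, b):
    """True if a, b are non-adjacent and every chordless a-b path has length 2."""
    if b in adj[a]:
        return False
    blocked = adj[a] & adj[b]          # common neighbours of a and b
    seen = {a}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == b:
                return False           # a reaches b while avoiding common neighbours
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return True

# Path a - x - b: the only chordless path has length 2.
short_path = {"a": {"x"}, "x": {"a", "b"}, "b": {"x"}}
# Path a - x - y - b: a chordless path of length 3 exists.
long_path = {"a": {"x"}, "x": {"a", "y"}, "y": {"x", "b"}, "b": {"y"}}

print(is_two_pair(short_path, "a", "b"))
print(is_two_pair(long_path, "a", "b"))
```

One breadth-first search per pair keeps the test linear in the size of the graph.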
Composition Scheme: An edge-addition composition scheme for chordal graphs
Property 3.9: let G_1 and G_2 be chordal graphs such that G_1 ⊂ G_2. Then G_2 can be obtained from G_1 by repeatedly adding an edge between two vertices forming a 2-pair.
Proof
o Let G_1 and G_2 be chordal graphs such that G_1 ⊂ G_2.
o By Lemma 2.6, there exists xy ∈ E_2 \ E_1 such that (V, E_2 \ {xy}) is chordal.
o By Theorem 3.8, {x, y} is a 2-pair of (V, E_2 \ {xy}).
o Repeat this until we obtain graph G_1. We have constructed (in reverse) a 2-pair edge-addition ordering which enables us to construct G_2 from G_1.
[Figure: chordal graphs G_1 ⊂ G_2 on vertices a to g, with E_2 \ E_1 = {ce, dg, bf, af, ag}.]
Composition Scheme
Composition Scheme 3.10: it follows from the results above that a graph on n vertices is chordal iff it can be constructed by starting with an independent set on n vertices and adding at each step an edge between two vertices forming a 2-pair.
Algorithm: An additive data pre-processing algorithm
Input: a dissimilarity matrix M on n taxa, with thresholds 0, 1, ..., k
Output: a dissimilarity matrix M' such that every graph in the threshold family is chordal
Initialization: G_0 is an independent set on n vertices; create an empty FIFO queue Q
begin
  for i = 1 to k-1 do
    assign G_{i-1} to G_i;
    compute the set F_i of pairs {a, b} such that M[a, b] = θ⁻¹(i);
    add F_i to the queue Q;
    repeat
      scan Q and remove the first pair ab which is a 2-pair of G_i;
      add edge ab to graph G_i;
      set M'[a, b] to θ⁻¹(i);
    until Q contains no 2-pair of G_i
  give all remaining pairs in Q the value θ⁻¹(k) in M';
  add all remaining pairs in Q to G_{k-1} to form G_k, a clique on n vertices
end
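The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `preprocess` and `is_two_pair` are hypothetical, the matrix is assumed symmetric with a zero diagonal and strictly positive off-diagonal entries, the queue is kept as a plain list because pairs are removed from the middle, and the 2-pair test uses the reformulation that deleting the common neighbours of a and b must separate them.

```python
from collections import deque
from itertools import combinations

def is_two_pair(adj, a, b):
    """Non-adjacent a, b such that removing their common neighbours separates them."""
    if b in adj[a]:
        return False
    blocked = adj[a] & adj[b]
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == b:
                return False
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return True

def preprocess(m):
    """Return M' in which a pair gets value theta^-1(i) when it is added as a 2-pair."""
    n = len(m)
    values = sorted({v for row in m for v in row})   # values[i] = theta^-1(i)
    k = len(values) - 1
    adj = [set() for _ in range(n)]                  # G_0: independent set
    # Every pair starts at theta^-1(k); added pairs are overwritten below.
    m2 = [[0 if i == j else values[k] for j in range(n)] for i in range(n)]
    pending = []                                     # FIFO order of waiting pairs
    for i in range(1, k):
        pending.extend((a, b) for a, b in combinations(range(n), 2)
                       if m[a][b] == values[i])
        progress = True
        while progress:                              # until no 2-pair is left in Q
            progress = False
            for pos, (a, b) in enumerate(pending):
                if is_two_pair(adj, a, b):
                    del pending[pos]
                    adj[a].add(b)
                    adj[b].add(a)
                    m2[a][b] = m2[b][a] = values[i]
                    progress = True
                    break
    return m2

# Additive example matrix of the slides, taxa ordered a, b, c, d, e.
M = [
    [0, 6, 12, 12, 16],
    [6, 0, 12, 12, 16],
    [12, 12, 0, 6, 10],
    [12, 12, 6, 0, 8],
    [16, 16, 10, 8, 0],
]
print(preprocess(M) == M)
```

On an additive matrix such as M, every threshold graph is already chordal, so each pair is accepted at its own threshold and the matrix comes back unchanged.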
Algorithm: Example
Example: consider an incorrect matrix.
[Table: dissimilarity matrix M on taxa a to e; surviving rows: c = (0, 6, 10), d = (0, 8).]

Ordinal matrix W
W    a    b    c    d    e
a    0    1    4    2    5
b         0    2    4    5
c              0    1    3
d                   0    2
e                        0

Running the algorithm generates the following corrected dissimilarity matrix.
[Table: dissimilarity matrix M' on taxa a to e; surviving rows: c = (0, 6, 10), d = (0, 8).]
The complexity of the above algorithm is O(n^5).
References
[1] Berry A, Sigayret A, Sinoquet C (2005) Maximal sub-triangulation in pre-processing phylogenetic data
[2] Huson D, Nettles S, Warnow T (1999) Obtaining highly accurate topology estimates of evolutionary trees from very short sequences
[3] Barthelemy J-P, Guenoche A (1991) Trees and proximity representations
[4] Gavril F (1974) The intersection graphs of subtrees of trees are exactly the chordal graphs
[5] Rose D, Tarjan RE, Lueker G (1976) Algorithmic aspects of vertex elimination on graphs
[6] Hayward R, Hoang C, Maffray F (1989) Optimizing weakly triangulated graphs