Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees Yufeng Wu and Jiayin Wang Department of Computer Science and Engineering University.

1 Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees
Yufeng Wu and Jiayin Wang
Department of Computer Science and Engineering, University of Connecticut
ISBRA 2010

2 Phylogenetic Tree and Hybridization Network
Phylogenetic tree: a rooted, binary tree.
Reticulate evolution: the tree model is no longer sufficient, e.g. for hybrid speciation, horizontal gene transfer, and recombination.
Hybridization network: a directed acyclic graph that displays two phylogenetic trees in a compact way.
Hybridization event: a node with in-degree two or more.
Hybridization Number Problem: compute the minimum number of hybridization events needed to construct a hybridization network that displays the two input trees.
[Figure: input phylogenies T and T' on taxa 1–4 with root ρ, and a hybridization network displaying both; deleting its two yellow edges yields one tree and deleting its two red edges yields the other.]
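A minimal Python sketch of the counting behind this definition, assuming the common convention that a node with in-degree d ≥ 2 accounts for d − 1 hybridization events (so each hybridization node of a binary network counts once). The network is a hypothetical edge list, not the one in the figure, and `hybridization_number` is an illustrative name only.

```python
from collections import Counter


def hybridization_number(edges):
    """Sum (in-degree - 1) over nodes with in-degree >= 2 in a rooted DAG.

    `edges` is a list of (parent, child) pairs. Hypothetical helper for
    illustration, not part of any tool named in the slides.
    """
    indegree = Counter(child for _, child in edges)
    return sum(d - 1 for d in indegree.values() if d >= 2)


# Hypothetical network: node "h" (parent of leaf 3) has two parents, "u" and "w".
# Deleting ("w", "h") leaves (((1, 2), 3), 4); deleting ("u", "h") leaves (((1, 3), 2), 4).
network = [("r", "u"), ("r", 4), ("u", "v"), ("u", "h"), ("v", "w"),
           ("v", 2), ("w", 1), ("w", "h"), ("h", 3)]
print(hybridization_number(network))  # 1
```

A tree (every node has in-degree at most one) gets hybridization number 0 under this rule.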

3 A Related Problem: rSPR Distance Problem
rSPR distance problem: find the minimum number of rooted Subtree Prune and Regraft (rSPR) operations needed to transform T into T'.
rSPR distance of two phylogenies = (number of subtrees in a Maximum Agreement Forest (MAF)) - 1 (Hein et al.; Bordewich et al.)
[Figure: input phylogenies T and T' on taxa 1–4 with root ρ, illustrating one rSPR operation (prune and regraft 3) and two rSPR operations (also prune and regraft 2).]
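To make a single prune-and-regraft step concrete, here is a small Python sketch, assuming rooted binary trees are encoded as nested tuples of distinct leaf labels. Pruning removes a pendant subtree and suppresses its now-unary parent; regrafting subdivides the edge above a target subtree (passing the whole tree as the target attaches at the root edge). The function names and the example pair are hypothetical, not the exact trees in the figure.

```python
# Trees are nested tuples of distinct leaf labels, e.g. (((1, 2), 3), 4).
# Hypothetical helpers for illustration.


def prune(tree, sub):
    """Detach pendant subtree `sub` and suppress its parent, which becomes unary."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if left == sub:
        return right
    if right == sub:
        return left
    return (prune(left, sub), prune(right, sub))


def regraft(tree, sub, target):
    """Reattach `sub` by subdividing the edge directly above subtree `target`."""
    if tree == target:
        return (target, sub)
    if not isinstance(tree, tuple):
        return tree
    return (regraft(tree[0], sub, target), regraft(tree[1], sub, target))


def rspr(tree, sub, target):
    """One rSPR operation: prune `sub`, then regraft it above `target`.

    Assumes `target` is neither `sub` nor inside it.
    """
    return regraft(prune(tree, sub), sub, target)


T = (((1, 2), 3), 4)
print(rspr(T, 3, 1))  # (((1, 3), 2), 4)
```

The rSPR distance is the fewest such calls that turn T into T'; the MAF characterization above avoids searching over operation sequences directly.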

4 Maximum Agreement Forest (MAF)
Agreement forest (AF) of T and T': a set of subtrees such that
– each subtree in the AF has the same topology in T and T'
– the subtrees partition the given taxa
– any two subtrees are vertex-disjoint
Maximum agreement forest: an agreement forest of the two trees in which the number of subtrees is minimized.
[Figure: input phylogenies T and T' on taxa 1–6 with root ρ, an agreement forest, and a maximum agreement forest with 3 subtrees.]
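The three conditions can be checked mechanically. This Python sketch assumes nested-tuple trees on the same taxon set and represents each vertex by its cluster (the set of leaves below it). It relies on the standard fact that a rooted tree is determined by its clusters, so T and T' induce the same topology on a leaf set S exactly when their clusters intersected with S coincide. Vertex-disjointness is checked in both trees. The root leaf ρ of the usual definition can be modelled as an ordinary extra leaf; all function names are hypothetical.

```python
from itertools import combinations


def clusters(tree):
    """Leaf set below every vertex of a nested-tuple tree (one entry per vertex)."""
    out = []

    def walk(t):
        if isinstance(t, tuple):
            c = frozenset().union(*(walk(child) for child in t))
        else:
            c = frozenset([t])
        out.append(c)
        return c

    walk(tree)
    return out


def restricted(cl, s):
    """Clusters of the tree restricted to leaf set s; these determine T|s."""
    return {c & s for c in cl if c & s}


def span(cl, s):
    """Vertices of the minimal subtree connecting the leaves in s."""
    top = min((c for c in cl if s <= c), key=len)  # MRCA of s
    return {c for c in cl if c & s and c <= top}


def is_agreement_forest(t1, t2, forest):
    """Hypothetical checker for the three AF conditions."""
    blocks = [frozenset(b) for b in forest]
    cl1, cl2 = clusters(t1), clusters(t2)
    taxa = max(cl1, key=len)
    # The subtrees must partition the taxa.
    if sum(map(len, blocks)) != len(taxa) or frozenset().union(*blocks) != taxa:
        return False
    # Each subtree must have the same topology in both trees.
    if any(restricted(cl1, b) != restricted(cl2, b) for b in blocks):
        return False
    # Any two subtrees must be vertex-disjoint, in T and in T'.
    return all(not (span(cl, a) & span(cl, b))
               for cl in (cl1, cl2) for a, b in combinations(blocks, 2))


T = ((1, 2), (3, 4))
T2 = ((1, 3), (2, 4))
print(is_agreement_forest(T, T2, [{1, 2}, {3}, {4}]))  # True
print(is_agreement_forest(T, T2, [{1, 2}, {3, 4}]))    # False: paths overlap in T2
```

In the second call both blocks have matching topologies, but their connecting subtrees share the root of T2, so disjointness fails.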

5 Maximum Acyclic Agreement Forest (MAAF)
Graph of an agreement forest, GF(T,T'): nodes correspond to the trees in the AF; there is an edge from Ti to Tj if Ti is ancestral to Tj, i.e., the root of Ti is ancestral to the root of Tj in either T or T'.
When the graph of an AF is acyclic, the AF is said to be acyclic.
Maximum acyclic agreement forest: an acyclic agreement forest with the minimum number of subtrees.
[Figure: input phylogenies T and T' on taxa 1–5; a MAF containing subtrees T12 and T34 together with its graph; and a maximum acyclic agreement forest.]
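A Python sketch of building GF(T,T') and testing it for cycles, assuming the forest is already known to be an agreement forest and trees are nested tuples. The root of each subtree is taken as the MRCA of its leaves (the smallest cluster containing them), and "ancestral" is read as a proper superset of clusters. Cycle detection uses Kahn's topological-sort algorithm. The two trees are an illustrative pair chosen so that {1,2}, {3,4}, {5} is an agreement forest with a cycle; function names are hypothetical.

```python
def clusters(tree):
    """Leaf set below every vertex of a nested-tuple tree (one entry per vertex)."""
    out = []

    def walk(t):
        if isinstance(t, tuple):
            c = frozenset().union(*(walk(child) for child in t))
        else:
            c = frozenset([t])
        out.append(c)
        return c

    walk(tree)
    return out


def forest_graph(trees, forest):
    """Edge i -> j if the root of subtree i is a proper ancestor of the root of j in some tree."""
    blocks = [frozenset(b) for b in forest]
    edges = set()
    for tree in trees:
        cl = clusters(tree)
        # Root of a subtree = MRCA of its leaves = smallest cluster containing them.
        roots = [min((c for c in cl if b <= c), key=len) for b in blocks]
        for i, ri in enumerate(roots):
            for j, rj in enumerate(roots):
                if i != j and rj < ri:  # proper subset of clusters = proper descendant
                    edges.add((i, j))
    return edges


def is_acyclic(n, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == n


T = (1, (2, ((3, 4), 5)))
T2 = (3, (4, ((1, 2), 5)))
cyclic_af = [{1, 2}, {3, 4}, {5}]
acyclic_af = [{1, 2}, {3}, {4}, {5}]
print(is_acyclic(len(cyclic_af), forest_graph([T, T2], cyclic_af)))    # False
print(is_acyclic(len(acyclic_af), forest_graph([T, T2], acyclic_af)))  # True
```

In the first forest, {1,2} sits above {3,4} in T while {3,4} sits above {1,2} in T2, which closes a cycle; splitting {3,4} removes it at the cost of one more subtree.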

6 Hybridization Number and Size of MAAF
Hybridization number of the two original trees = (number of subtrees in a MAAF) - 1 (Baroni et al., 2005)
For example, the maximum acyclic agreement forest here has 3 subtrees, so the hybridization number is 3 - 1 = 2; nodes 3 and 4 are the hybridization events.
[Figure: input phylogenies T and T' on taxa 1–5, their maximum acyclic agreement forest, and the resulting hybridization network; keeping its two yellow edges gives one input tree and keeping its two red edges gives the other.]
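To see this relationship on tiny inputs, the following brute-force Python sketch enumerates every partition of the taxa (plus an extra root leaf standing in for ρ), keeps those that are acyclic agreement forests, and returns the smallest size minus one. It is exponential and meant only as a reference check for very small trees, not as a substitute for the ILP approach. The nested-tuple encoding, the `"rho"` label, and all function names are assumptions of this sketch.

```python
from itertools import combinations, permutations

RHO = "rho"  # hypothetical label for the extra root leaf


def clusters(tree):
    """Leaf set below every vertex of a nested-tuple tree."""
    out = []

    def walk(t):
        if isinstance(t, tuple):
            c = frozenset().union(*(walk(child) for child in t))
        else:
            c = frozenset([t])
        out.append(c)
        return c

    walk(tree)
    return out


def mrca(cl, s):
    return min((c for c in cl if s <= c), key=len)


def is_acyclic_af(cl1, cl2, blocks):
    # 1. Each block induces the same topology in both trees.
    for b in blocks:
        if {c & b for c in cl1 if c & b} != {c & b for c in cl2 if c & b}:
            return False
    successors = {i: set() for i in range(len(blocks))}
    for cl in (cl1, cl2):
        tops = [mrca(cl, b) for b in blocks]
        # 2. The induced subtrees are vertex-disjoint.
        spans = [{c for c in cl if c & b and c <= t} for b, t in zip(blocks, tops)]
        if any(x & y for x, y in combinations(spans, 2)):
            return False
        # 3. Edge i -> j if the root of block i is a proper ancestor of the root of block j.
        for i, j in permutations(range(len(blocks)), 2):
            if tops[j] < tops[i]:
                successors[i].add(j)
    # Kahn's algorithm: acyclic iff every node can be removed.
    indegree = {i: 0 for i in successors}
    for i in successors:
        for j in successors[i]:
            indegree[j] += 1
    ready = [i for i in successors if indegree[i] == 0]
    removed = 0
    while ready:
        i = ready.pop()
        removed += 1
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    return removed == len(blocks)


def partitions(items):
    """All set partitions of a list (exponential; tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]


def hybridization_number(t1, t2):
    """Brute force: size of a smallest acyclic agreement forest, minus one."""
    cl1, cl2 = clusters((t1, RHO)), clusters((t2, RHO))
    taxa = list(max(cl1, key=len))
    return min(len(p) for p in partitions(taxa)
               if is_acyclic_af(cl1, cl2, [frozenset(b) for b in p])) - 1


print(hybridization_number(((1, 2), 3), ((1, 3), 2)))  # 1
```

The all-singletons partition is always an acyclic agreement forest, so the minimum is always defined.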

7 Computation of the Exact Hybridization Number
Previous work: HybridNumber (Bordewich, Semple, et al., 2007).
Our idea: find a minimum collection of edge cuts that breaks the tree into a MAAF.
Our approach: use integer linear programming (ILP) to minimize the number of subtrees.
– Objective: minimize the sum of the C_i, where C_i = 1 if edge e_i is cut
– Subject to 3 groups of constraints ensuring that the resulting forest is an acyclic agreement forest (so an optimal solution gives a MAAF): triple constraints, pathway constraints, and cyclic constraints
Example: {1, 2, 3} is an incompatible triple, giving the triple constraint C_1 + C_2 + C_3 + C_4 + C_5 ≥ 1.
More details on the triple and pathway constraints are in Wu (2009).
[Figure: input phylogenies T and T' on taxa 1–4 with root ρ; edges e1–e5 of T are labeled.]
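As a sketch of where triple constraints come from, this Python code lists every incompatible triple (one whose rooted topology ab|c differs between the trees) together with the edges of T that span it; cutting at least one of those edges gives a constraint of the form "sum of C_e ≥ 1". This is our reading of the triple constraint, not the authors' implementation; edges are named by the cluster below them, trees are nested tuples, ρ is omitted (it could be added as an ordinary leaf), and all names are hypothetical. With the assumed pair T = ((1,2),(3,4)), T' = ((1,3),(2,4)), the triple {1,2,3} is spanned by five edges, matching the five variables in the example above.

```python
from itertools import combinations


def clusters(tree):
    """Leaf set below every vertex of a nested-tuple tree (one entry per vertex)."""
    out = []

    def walk(t):
        if isinstance(t, tuple):
            c = frozenset().union(*(walk(child) for child in t))
        else:
            c = frozenset([t])
        out.append(c)
        return c

    walk(tree)
    return out


def grouped_pair(cl, triple):
    """Return the pair that forms the cherry of the rooted triple (ab|c)."""
    for pair in combinations(triple, 2):
        (other,) = set(triple) - set(pair)
        if any(set(pair) <= c and other not in c for c in cl):
            return frozenset(pair)
    return None


def triple_constraints(t1, t2):
    """(triple, spanning edges of t1) for every incompatible triple.

    Each entry stands for the constraint sum(C_e for e in edges) >= 1.
    """
    cl1, cl2 = clusters(t1), clusters(t2)
    taxa = sorted(max(cl1, key=len), key=str)
    result = []
    for triple in combinations(taxa, 3):
        if grouped_pair(cl1, triple) != grouped_pair(cl2, triple):
            s = frozenset(triple)
            top = min((c for c in cl1 if s <= c), key=len)  # MRCA of the triple in t1
            edges = [c for c in cl1 if c & s and c < top]   # edges below the MRCA
            result.append((triple, edges))
    return result


T = ((1, 2), (3, 4))
T2 = ((1, 3), (2, 4))
for triple, edges in triple_constraints(T, T2):
    names = [f"C_{{{','.join(map(str, sorted(e, key=str)))}}}" for e in edges]
    print(triple, " + ".join(names), ">= 1")
```

An ILP solver would then minimize the sum of all C_e subject to these rows plus the pathway and cyclic constraints.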

8 Difficulty: the Graph of an AF Depends on the AF
Graph of AF and Leaf Pair (LP) graph
Leaf Pair (LP) graph: each node corresponds to a pair of two distinct leaves, lp(i,j). Create an edge from lp(i,j) to lp(p,q) if:
– the path between i and j is disjoint from the path between p and q in both T and T'; and
– lp(i,j) is ancestral to lp(p,q) in either T or T'.
Leaf pair lp(i,j) is ancestral to lp(p,q) if the most recent common ancestor (MRCA) of i and j is ancestral to the MRCA of p and q.
[Figure: input phylogenies T and T' on taxa 1–5 with MRCA(1,2) and MRCA(3,4) marked, and part of the LP graph with nodes (1,2) and (3,4).]
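A direct Python transcription of this definition, as a sketch: paths are compared as vertex sets, each vertex is represented by its cluster, and "ancestral" is read as proper ancestry of the MRCAs. Unlike GF(T,T'), this graph depends only on T and T', not on any forest. The trees are an illustrative pair in which lp(1,2) and lp(3,4) point at each other; the function names are hypothetical.

```python
from itertools import combinations, permutations


def clusters(tree):
    """Leaf set below every vertex of a nested-tuple tree (one entry per vertex)."""
    out = []

    def walk(t):
        if isinstance(t, tuple):
            c = frozenset().union(*(walk(child) for child in t))
        else:
            c = frozenset([t])
        out.append(c)
        return c

    walk(tree)
    return out


def path_and_top(cl, i, j):
    """Vertices on the path between leaves i and j, plus their MRCA."""
    s = frozenset((i, j))
    top = min((c for c in cl if s <= c), key=len)
    return {c for c in cl if c & s and c <= top}, top


def lp_graph(t1, t2):
    """Edges lp(i,j) -> lp(p,q) of the leaf pair graph."""
    cls = (clusters(t1), clusters(t2))
    taxa = sorted(max(cls[0], key=len), key=str)
    info = {pair: [path_and_top(cl, *pair) for cl in cls] for pair in combinations(taxa, 2)}
    edges = set()
    for a, b in permutations(info, 2):
        # The two paths must be disjoint in both trees ...
        disjoint = all(not (pa & pb) for (pa, _), (pb, _) in zip(info[a], info[b]))
        # ... and MRCA(a) must be a proper ancestor of MRCA(b) in at least one tree.
        ancestral = any(tb < ta for (_, ta), (_, tb) in zip(info[a], info[b]))
        if disjoint and ancestral:
            edges.add((a, b))
    return edges


T = (1, (2, ((3, 4), 5)))
T2 = (3, (4, ((1, 2), 5)))
g = lp_graph(T, T2)
print(((1, 2), (3, 4)) in g, ((3, 4), (1, 2)) in g)  # True True
```

The graph has O(n^2) nodes for n taxa, so it can be built once before the ILP is solved.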

9 Acyclicity of the Leaf Pair Graph
Realized leaf pair: a leaf pair whose two leaves are in the same subtree of the AF.
Reduced LP graph: the LP graph for a particular AF F, denoted LP Graph(F), restricted to the leaf pairs realized in F.
Lemma: for an AF F, GF(T,T') is acyclic iff LP Graph(F) is acyclic.
Adding constraints naively, by enumerating all cycles, is impractical in most cases.
[Figure: input phylogenies T and T' on taxa 1–5, a maximum agreement forest with its reduced LP graph on nodes (1,2) and (3,4), and a maximum acyclic agreement forest.]
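A Python sketch of the reduction, assuming the LP graph is given as a set of directed edges between leaf pairs (the fragment below, where lp(1,2) and lp(3,4) point at each other, is hypothetical). A pair is kept only if both of its leaves fall in the same block of the forest, and a depth-first search reports whether the remaining graph has a cycle. Function names are illustrative.

```python
from itertools import combinations


def realized_pairs(forest):
    """Leaf pairs whose two leaves lie in the same subtree of the forest."""
    return {pair for block in forest for pair in combinations(sorted(block), 2)}


def reduced_lp_graph(lp_edges, forest):
    """Keep only LP-graph edges whose endpoints are both realized in the forest."""
    keep = realized_pairs(forest)
    return {(a, b) for a, b in lp_edges if a in keep and b in keep}


def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack closes a cycle."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(v):
        state[v] = "active"
        for w in successors.get(v, []):
            if state.get(w) == "active" or (w not in state and visit(w)):
                return True
        state[v] = "done"
        return False

    return any(v not in state and visit(v) for v in list(successors))


# Hypothetical LP-graph fragment: lp(1,2) and lp(3,4) point at each other.
lp_edges = {((1, 2), (3, 4)), ((3, 4), (1, 2))}
print(has_cycle(reduced_lp_graph(lp_edges, [{1, 2}, {3, 4}, {5}])))   # True
print(has_cycle(reduced_lp_graph(lp_edges, [{1, 2}, {3}, {4}, {5}])))  # False
```

Splitting {3,4} un-realizes lp(3,4), which removes the only cycle in this fragment.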

10 An Easy Way for Acyclic Constraints
First deal with infeasible twin pairs: M_{i,j} + M_{p,q} ≤ 1, where M_{i,j} = 1 if the path between i and j is not cut.
Then enumerate all possible elementary cycles that remain after the infeasible twin pairs are removed; on biological data this appears to give a large reduction.
Example ILP constraint: M_{1,3} + M_{4,5} ≤ 1
[Figure: input phylogenies T and T' on taxa 1–7 and part of their LP graph with nodes (1,2), (1,3), (3,7), (4,5), and (4,6).]
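A Python sketch of this two-stage idea under explicit assumptions: we read an "infeasible twin pair" as a 2-cycle in the LP graph (two leaf pairs pointing at each other), and we assume each longer elementary cycle of k nodes yields the constraint "sum of M over the cycle ≤ k - 1". Longer cycles that contain both nodes of a twin pair are skipped, since the twin constraint already forbids them. The LP-graph fragment and all names are hypothetical, and the cycle enumeration is a plain exponential DFS that roots each cycle at its smallest node.

```python
def elementary_cycles(successors):
    """Enumerate each directed elementary cycle once, starting from its smallest node."""
    cycles = []

    def extend(start, path):
        for w in successors.get(path[-1], []):
            if w == start:
                cycles.append(tuple(path))
            elif w > start and w not in path:
                extend(start, path + [w])

    for s in sorted(successors):
        extend(s, [s])
    return cycles


def m(pair):
    """ILP variable name for leaf pair (i, j), e.g. M_{1,3}."""
    return f"M_{{{pair[0]},{pair[1]}}}"


def acyclicity_constraints(successors):
    cycles = elementary_cycles(successors)
    twins = [c for c in cycles if len(c) == 2]
    # Longer cycles that already contain a twin pair are covered by its constraint.
    longer = [c for c in cycles
              if len(c) > 2 and not any(set(t) <= set(c) for t in twins)]
    lines = [f"{m(a)} + {m(b)} <= 1" for a, b in twins]
    lines += [" + ".join(map(m, c)) + f" <= {len(c) - 1}" for c in longer]
    return lines


# Hypothetical LP-graph fragment (adjacency lists keyed by leaf pair).
lp = {(1, 3): [(4, 5)], (4, 5): [(1, 3), (3, 7)], (3, 7): [(1, 2), (1, 3)],
      (1, 2): [(4, 6)], (4, 6): [(3, 7)]}
for line in acyclicity_constraints(lp):
    print(line)
```

Here the twin pair (1,3)/(4,5) yields one constraint, the cycle through (1,3), (4,5), (3,7) is dropped as redundant, and only the cycle (1,2) → (4,6) → (3,7) needs its own row.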

11 Speed Up by a Divide-and-Conquer Approach
Subtree reduction: replace a pendant subtree that occurs identically in T and T' with a new label. Subtree reduction preserves the hybridization number.
Cluster reduction: replace a cluster common to T and T', say T1 and T'1, with a new label; the remaining parts of the two trees are T2 and T'2. Then h(T,T') = h(T1,T'1) + h(T2,T'2). See Bordewich et al. (2007) for details.
[Figure: input phylogenies T and T' on taxa 1–7; the reductions introduce new labels 8 and 9, and the parts T1, T'1, T2, and T'2 are marked.]
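A Python sketch of subtree reduction, assuming nested-tuple trees. Child order is ignored by comparing canonical forms (children sorted by their `repr`), and each maximal pendant subtree with at least two leaves that appears in both trees is replaced top-down by the same fresh label. The labels `S1`, `S2`, … and all function names are hypothetical.

```python
def canon(t):
    """Canonical form of a subtree: children sorted so child order does not matter."""
    if not isinstance(t, tuple):
        return t
    return tuple(sorted((canon(c) for c in t), key=repr))


def internal_forms(t, acc):
    """Collect the canonical form of every internal (non-leaf) subtree."""
    if isinstance(t, tuple):
        acc.add(canon(t))
        for c in t:
            internal_forms(c, acc)
    return acc


def subtree_reduce(t1, t2):
    """Replace maximal common pendant subtrees with shared new labels."""
    common = internal_forms(t1, set()) & internal_forms(t2, set())
    labels = {}

    def replace(t):
        if isinstance(t, tuple):
            key = canon(t)
            if key in common:  # top-down, so the first match is maximal
                return labels.setdefault(key, f"S{len(labels) + 1}")
            return tuple(replace(c) for c in t)
        return t

    return replace(t1), replace(t2)


T = (((1, 2), 3), (4, 5))
T2 = (((2, 1), 4), (3, 5))
print(subtree_reduce(T, T2))  # ((('S1', 3), (4, 5)), (('S1', 4), (3, 5)))
```

Because the same label is used in both trees, the reduced pair has fewer taxa but, by the claim above, the same hybridization number.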

12 Results on Simulation Datasets
Simulation datasets are from Beiko and Hamilton (2006). Each pair of phylogenies has 100 leaves and was generated by applying 10 rSPR operations to one tree.
HybridNumber is another software tool that computes the exact hybridization number. The version of HybridNumber used here was downloaded in Oct. 2009; a later version appears faster, but is still very slow on the EEEP data.
[Figure: running time (s) on the simulation datasets.]

13 Results on Biological Datasets
Tree pairs are from a grass (Poaceae) dataset from the Grass Phylogeny Working Group (2001). Our results were obtained with CPLEX.
[Table: Pair, #Taxa, #Hybridization, SPRDist (CPLEX) running time, HybridNumber running time for 15 tree pairs: 140145s3s 2361310s3s 334127s6s 41991s 5461951s667s 62140s1s 72173s1s 81431s 93081s 10261314s16s 111271s 12291480s4h2716s 131010s1s 143115115s7h776s 15 81s2s]
The later version of HybridNumber has roughly the same running time as ours, but is still less scalable.

14 Acknowledgment
Research is supported by the National Science Foundation [IIS-0803440] and the Research Foundation of the University of Connecticut.

