An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees Yufeng Wu Dept. of Computer Science & Engineering University.


An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees
Yufeng Wu, Dept. of Computer Science & Engineering, University of Connecticut, USA
RECOMB

Hybridization Networks
Gene trees describe the phylogenetic history of individual genes. They are inferred from gene sequences, are assumed to be binary and rooted, and can have different topologies at different genes. Reticulate evolution, such as hybrid speciation or horizontal gene transfer, is one explanation for this discordance.
Gene A: 1: TCG, 2: TCA, 3: CGG, 4: CCG
Gene B: 1: AGC, 2: TGT, 3: AAC, 4: AGT
A hybridization network is a directed acyclic graph that displays each gene tree. Reticulation events are the nodes with in-degree two or more. (Figure: keeping the two red edges or the two black edges of the network recovers one of the two gene trees, T_A and T_B.)
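A network displays a tree if some choice of one incoming edge at every reticulation node (a "switching"), followed by deleting unused edges and suppressing unary nodes, yields that tree. Because a rooted phylogenetic tree is determined by its set of clusters (leaf sets below its nodes), the check can be done by comparing cluster sets. The sketch below is a brute-force illustration, assuming hypothetical input formats: a network as a `{node: [children]}` dict and a tree as nested tuples. The helper names are ours, and the loop is exponential in the number of reticulation nodes, so it only suits toy inputs.

```python
from itertools import product


def leaves_below(u, children, kept, memo):
    """Leaves reachable from u when each reticulation keeps only one parent edge."""
    if u not in memo:
        if u not in children:  # a leaf
            memo[u] = frozenset([u])
        else:
            below = set()
            for v in children[u]:
                # Non-reticulation nodes have no entry in `kept`, so their edge is always used.
                if kept.get(v, u) == u:
                    below |= leaves_below(v, children, kept, memo)
            memo[u] = frozenset(below)
    return memo[u]


def displayed_cluster_sets(children):
    """Yield the cluster set of the tree obtained from each switching."""
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    retic = [v for v, ps in parents.items() if len(ps) >= 2]
    nodes = set(children) | set(parents)
    for choice in product(*(parents[v] for v in retic)):
        kept = dict(zip(retic, choice))  # reticulation node -> the parent it keeps
        memo = {}
        # Empty leaf sets come from dead ends; duplicates from unary nodes collapse in the set.
        yield {c for c in (leaves_below(u, children, kept, memo) for u in nodes) if c}


def tree_clusters(tree):
    """Clusters of a rooted tree written as nested tuples, e.g. ((1, 2), 3)."""
    out = set()

    def walk(node):
        if isinstance(node, tuple):
            c = frozenset().union(*(walk(child) for child in node))
        else:
            c = frozenset([node])
        out.add(c)
        return c

    walk(tree)
    return out


def displays(children, tree):
    target = tree_clusters(tree)
    return any(cs == target for cs in displayed_cluster_sets(children))


# Hypothetical network: leaf 2 hangs below reticulation node "h",
# whose two parents are the root "r" and the node "p".
net = {"r": ["y", "h"], "y": ["p", 3], "p": [1, "h"], "h": [2]}
print(displays(net, ((1, 2), 3)), displays(net, ((1, 3), 2)), displays(net, ((2, 3), 1)))
# → True True False
```

Keeping the edge from "p" gives ((1,2),3); keeping the edge from "r" gives ((1,3),2); no switching gives ((2,3),1).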

The Minimum Hybridization Network Problem
Given a set G of K gene trees, reconstruct a hybridization network that displays each gene tree with the minimum number of reticulation events, Rmin(G). The problem is NP-complete, even for K = 2. Most current approaches are exact methods for the K = 2 case (see Semple et al.), impose topological constraints (e.g., galled networks; see Huson et al.), or only work on small-scale topologies. (Figure: a network N displaying trees T1, T2 and T3 with 2 reticulation events, which is the minimum.)
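To make "number of reticulation events" concrete, the sketch below counts them for a network given as a hypothetical `{node: [children]}` dict. It assumes the usual convention that a node with in-degree d ≥ 2 contributes d − 1 events; in a binary network that is just the number of reticulation nodes. Rmin(G) is then the minimum of this count over all networks that display every tree in G.

```python
from collections import Counter


def reticulation_count(children):
    """Sum of (in-degree - 1) over all nodes with in-degree >= 2 (assumed convention)."""
    indegree = Counter(v for kids in children.values() for v in kids)
    return sum(d - 1 for d in indegree.values() if d >= 2)


# Hypothetical network with two reticulation nodes, "h" and "g", each of in-degree 2.
net = {"r": ["a", "b"], "a": [1, "h", "g"], "b": ["h", "g", 4], "h": [2], "g": [3]}
print(reticulation_count(net))  # → 2
```

Under this convention, a single node with three incoming edges also counts as two events.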

What if K  3? R min (G) LB(G) < < UB(G) The lower and upper bounds approach (Wu, 2010) for Rmin(G): ( G: K gene trees) If LB(G)=UB(G), then R min (G) = LB(G) = UB(G) 4 Problem: if LB(G) < UB(G), then do not know the exact value of Rmin This talk: the first exact algorithm for constructing the most parsimonious hybridization network for the K  3 case. The K  3 case is much harder than the two tree case

Backward-in-Time View and Ancestral Configurations
Moving backward in time, two kinds of events occur: (1) two lineages coalesce into one lineage; (2) one lineage reticulates into two lineages. An ancestral configuration (AC) is the set of lineages in the network that are alive at time t, so a hybridization network is a series of ACs. The search for ACs is guided by the input trees. (Figure: three input trees and a hybridization network drawn against time, marked T0 to T5, with coalescence and reticulation events. The successive ACs are {1,2,3,4,5}, {1,3,4,5,a,b}, {3,4,5,a,c}, {3,5,a,c,d,e}, {3,5,a,d,f}, {5,a,d,g}, {5,g,h}, {5,i} and {j}, where a to j label ancestral lineages.)
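Since consecutive ACs differ by exactly one event, the series of ACs from this slide can be replayed mechanically: two lineages replaced by one is a coalescence, one lineage replaced by two is a reticulation. The sketch below does this, with lineage labels stored as strings (our choice, so they sort uniformly). The helper name `classify_steps` is hypothetical.

```python
def classify_steps(configs):
    """Label each transition between consecutive ancestral configurations."""
    events = []
    for prev, cur in zip(configs, configs[1:]):
        gone, new = prev - cur, cur - prev
        if len(gone) == 2 and len(new) == 1:
            events.append(("coalescence", tuple(sorted(gone)), tuple(new)))
        elif len(gone) == 1 and len(new) == 2:
            events.append(("reticulation", tuple(gone), tuple(sorted(new))))
        else:
            raise ValueError(f"not a single event: {sorted(prev)} -> {sorted(cur)}")
    return events


# The ACs listed on the slide, from the present (leaves 1-5) back to the root lineage j.
configs = [set("12345"), set("1345ab"), set("345ac"), set("35acde"),
           set("35adf"), set("5adg"), set("5gh"), set("5i"), set("j")]
events = classify_steps(configs)
n_retic = sum(kind == "reticulation" for kind, _, _ in events)
print(n_retic, len(events) - n_retic)  # → 2 6
```

The two reticulations are 2 → a, b and 4 → d, e; the six coalescences reduce the six lineages present after them to the single root lineage j.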

Lineages in an AC Display Input Subtrees
Each lineage in the network displays one or more input subtrees. Which reticulation edge should be followed? A lineage is represented by the set of subtrees it displays. Moving further backward in time, the displayed input subtrees grow larger, and the search is done when a single lineage displays each complete input tree. (Figure: input trees T1, T2 and T3 with labeled subtrees, and network lineages a, b and c; one lineage, written {1, …}, displays subtree 1 together with another labeled input subtree.)
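A minimal sketch of this lineage representation, under assumptions that are ours rather than the slide's: each lineage stores, for every input tree, at most one displayed subtree (a frozenset of leaves, or None). When two lineages coalesce, a tree's subtree is kept if only one lineage displays one, two sibling subtrees merge into their parent, and any other pair makes the coalescence invalid. The names `parent_map`, `coalesce` and `leaf` are hypothetical. The run uses the trees ((1,2),3) and ((1,3),2) and shows one lineage growing until it displays both complete trees.

```python
def parent_map(tree):
    """Map every non-root cluster of a nested-tuple tree to its parent cluster."""
    parent = {}

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        kids = [walk(child) for child in node]
        here = frozenset().union(*kids)
        for kid in kids:
            parent[kid] = here
        return here

    root = walk(tree)
    return parent, root


def coalesce(x, y, parents):
    """Coalesce two lineages tree by tree; return None if some tree forbids it."""
    merged = []
    for a, b, parent in zip(x, y, parents):
        if a is None or b is None:
            merged.append(a if b is None else b)  # at most one side displays a subtree
        elif a in parent and parent.get(a) == parent.get(b):
            merged.append(parent[a])  # sibling subtrees join into their parent
        else:
            return None
    return tuple(merged)


trees = [((1, 2), 3), ((1, 3), 2)]
info = [parent_map(t) for t in trees]
parents = [p for p, _ in info]
complete = tuple(root for _, root in info)


def leaf(i):
    """Initial lineage for leaf i: it displays the one-leaf subtree {i} in every tree."""
    return tuple(frozenset([i]) for _ in trees)


print(coalesce(leaf(1), leaf(2), parents))  # None: 1 and 2 are not siblings in the second tree
# Suppose a reticulation has split lineage 2 into one copy per tree:
left, right = (frozenset([2]), None), (None, frozenset([2]))
x = coalesce(leaf(1), left, parents)
x = coalesce(x, leaf(3), parents)
x = coalesce(x, right, parents)
print(x == complete)  # True: one lineage now displays both complete input trees
```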

Search for Optimal ACs
High-level idea: a breadth-first style search for optimal ancestral configurations, guided by the input trees T1, T2 and T3.
Level 0: the initial AC (1),(2),(3),(4),(5).
Level 1: the ACs found by one reticulation from the initial AC, namely (1),(2),(3),(3),(4),(5); (1),(2),(2),(3),(4),(5); (1),(1),(2),(3),(4),(5); (1),(2),(3),(4),(4),(5); and (1),(2),(3),(4),(5),(5). Then the ACs found from these by one or more coalescences, such as (1, …),(2),(3),(4),(5); (1),(2, …),(3),(4),(5); (1),(2),(3, …),(4),(5); and (1),(2),(3),(4, …),(5).
Level 2, and so on: level k contains all ACs reachable from the initial AC with k reticulations (and any number of coalescences).
The search stops when it reaches a final configuration that displays each complete input tree.
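The level-by-level search can be sketched as a brute-force toy, using an assumed lineage model: each lineage holds one displayed subtree (or None) per input tree, and coalescence merges sibling subtrees. Here a reticulation splits one lineage into two copies and sends each input tree's subtree to exactly one copy. That is our modeling choice, not the PIRN implementation. All function names are hypothetical, the search does no pruning, and its cost grows exponentially, so it only suits a handful of taxa.

```python
from itertools import combinations


def parent_map(tree):
    """Return ({non-root cluster: parent cluster}, root cluster) for a nested-tuple tree."""
    parent = {}

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        kids = [walk(child) for child in node]
        here = frozenset().union(*kids)
        for kid in kids:
            parent[kid] = here
        return here

    root = walk(tree)
    return parent, root


def coalesce(x, y, parents):
    """Merge two lineages tree by tree; None if some tree forbids the merge."""
    merged = []
    for a, b, parent in zip(x, y, parents):
        if a is None or b is None:
            merged.append(a if b is None else b)
        elif a in parent and parent.get(a) == parent.get(b):
            merged.append(parent[a])  # sibling subtrees join into their parent
        else:
            return None
    return tuple(merged)


def reticulations(lin):
    """Yield every split of one lineage into two copies that each carry some tree."""
    carried = [i for i, c in enumerate(lin) if c is not None]
    for r in range(len(carried) - 1):
        for extra in combinations(carried[1:], r):
            left = {carried[0], *extra}  # fixing carried[0] avoids mirror-image duplicates
            yield (tuple(c if i in left else None for i, c in enumerate(lin)),
                   tuple(None if i in left else c for i, c in enumerate(lin)))


def closure(acs, parents):
    """All ACs reachable from `acs` by zero or more coalescences."""
    seen, stack = set(acs), list(acs)
    while stack:
        ac = stack.pop()
        for x, y in combinations(ac, 2):
            z = coalesce(x, y, parents)
            if z is not None:
                nxt = (ac - {x, y}) | {z}
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def min_reticulations(trees, max_k=4):
    """Smallest k whose level contains the final AC, or None if none within max_k."""
    info = [parent_map(t) for t in trees]
    parents = [p for p, _ in info]
    final = frozenset([tuple(root for _, root in info)])
    # Level 0 starts from one lineage per leaf that displays the leaf in every tree.
    # An AC can be a frozenset because, in this model, no two lineages are identical.
    start = frozenset(tuple(frozenset([leaf]) for _ in trees) for leaf in info[0][1])
    level = closure([start], parents)
    for k in range(max_k + 1):
        if final in level:
            return k
        nxt = set()
        for ac in level:
            for lin in ac:
                for a, b in reticulations(lin):
                    nxt.add((ac - {lin}) | {a, b})
        level = closure(nxt, parents)
    return None


print(min_reticulations([((1, 2), 3), ((1, 2), 3)]))  # → 0
print(min_reticulations([((1, 2), 3), ((1, 3), 2)]))  # → 1
```

Each level is closed under coalescence before the next reticulation is applied, which mirrors "k reticulations and any number of coalescences" on the slide.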

Issues in Searching for Optimal ACs
The configuration search algorithm gives an optimal network, but efficiency is a problem because the space of ACs is huge. For an AC with n lineages, e.g. n = 30, there are up to 465 new ACs reachable by one reticulation or one coalescence, so the search is infeasible even for data of moderate size. The key to making the AC search feasible for relatively large data is pruning infeasible ACs: sometimes a coalescence leads to an AC that is incompatible with the input trees. The approach works when Rmin is relatively small.
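The figure of 465 follows from counting single events, assuming one reticulation per lineage: a coalescence chooses an unordered pair of the n lineages, and a reticulation chooses one lineage.

```latex
\binom{n}{2} + n = \frac{n(n+1)}{2}, \qquad \binom{30}{2} + 30 = 435 + 30 = 465
```

So the number of one-event successors grows quadratically with the number of lineages, before any coalescences are chained.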

Techniques for Pruning ACs
An AC is incompatible if some input subtrees can no longer be displayed. A leaf under a lineage is covered; an AC is incompatible if some leaf is not covered by any lineage. There are stronger rules (see the paper). (Figure: input trees T1 and T2 on leaves a, b and c. Coalescing lineages 1 and 2 gives a compatible AC, and coalescing 3 and 4 gives an incompatible one; after reticulating lineage 3, coalescing 3 and 4 gives a compatible AC.)
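The basic coverage rule can be sketched as follows, assuming a hypothetical representation in which each lineage maps an input-tree index to the set of leaves under the subtree(s) it displays in that tree. This only implements the simple rule stated on the slide, not the stronger rules from the paper; the name `is_compatible` is ours.

```python
def is_compatible(ac, leaves, num_trees):
    """Basic coverage test: in every input tree, every leaf must be covered by some lineage.

    ac: list of lineages; each lineage maps a tree index to the set of leaves
    under the subtree(s) it displays in that tree (hypothetical format).
    """
    for t in range(num_trees):
        covered = set()
        for lineage in ac:
            covered |= lineage.get(t, set())
        if not set(leaves) <= covered:
            return False  # some leaf of tree t is no longer covered, so prune this AC
    return True


ok = [{0: {1, 2}, 1: {1}}, {0: {3}, 1: {2, 3}}]
bad = [{0: {1, 2}, 1: {1}}, {0: {3}, 1: {3}}]  # leaf 2 is uncovered in tree 1
print(is_compatible(ok, {1, 2, 3}, 2), is_compatible(bad, {1, 2, 3}, 2))  # → True False
```

The check is cheap (linear in the total size of the displayed leaf sets), which is what makes it useful as a filter applied to every candidate AC.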

Implementation and Simulation
Simulation data are from Wu (2010): simulate a hybridization network N backward in time for n species, then randomly select K trees embedded in N. Evaluation criterion: compare with the original lower and upper bound approach. Do the bounds give the optimal network?
The algorithm is implemented in a downloadable open-source software tool:
An exact method, PIRNc, which can find the exact Rmin when Rmin is relatively small (say, 5 or less).
A heuristic method for larger data, PIRNch, which searches a smaller space of ACs with a greedy approach.

Performance of the Exact Method: PIRNc
Only datasets with Rmin … 4 are used, 100 datasets in total. The number of taxa is fixed to 10, and K, the number of gene trees, is between 3 and 5. "PIRNc better" is the percentage of datasets on which PIRNc finds the optimal Rmin but the bounds approach does not. LB: existing lower bound method. UB: existing upper bound method. PIRNc always finds the optimal solution (if run to the end). (Figure: number of datasets on which PIRNc is better.)

Performance of the Heuristic Method: PIRNch
PIRNc becomes slow as Rmin increases, so PIRNch is used as a heuristic for larger data: taxa number 30, 40 or …, with … datasets each. "PIRNch < UB" is the number of datasets, among 100, on which PIRNch gives a value below the upper bound. PIRNch outperforms the original lower bound/upper bound approach on larger datasets.

Acknowledgement
More information available at:
Research supported by the US National Science Foundation.