Iterative-DCM3: A Fast Algorithmic Technique for Reconstructing Large Phylogenetic Trees Usman Roshan and Tandy Warnow U. of Texas at Austin Bernard Moret.


Iterative-DCM3: A Fast Algorithmic Technique for Reconstructing Large Phylogenetic Trees Usman Roshan and Tandy Warnow U. of Texas at Austin Bernard Moret and Tiffani Williams U. of New Mexico

Large-Scale Phylogeny Reconstruction Challenge: New techniques are needed that can quickly find optimal maximum parsimony (MP) and maximum likelihood (ML) trees, especially on large datasets.

Speeding up MP/ML heuristics
[Figure, labeled "fake study" on the slide: MP score of best trees plotted against time, contrasting the performance of current heuristics with the desired performance.]

This talk
–A new technique, based upon a particular divide-and-conquer strategy (DCM3), for speeding up heuristics for MP
–A comparison against current MP heuristics on real datasets
–Future research

DCM3 Decompositions
Input: a set S of sequences and a guide tree T.
1. Compute the “short subtree” graph G(S,T), based upon T.
2. Find a clique separator in the graph G(S,T), and form subproblems.
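The second step can be illustrated with a minimal sketch. It assumes the usual disk-covering convention that, once a separator X is removed from the graph, each remaining connected component A_i yields one subproblem A_i ∪ X. The construction of the short subtree graph and the search for the clique separator are not shown; `dcm_subproblems`, `toy_graph`, and `toy_separator` are hypothetical names and data chosen for illustration, not part of the authors' implementation.

```python
from collections import deque


def dcm_subproblems(graph, separator):
    """Return the subproblems A_i | X, where the A_i are the connected
    components left after removing the separator X from the graph.

    graph: dict mapping each taxon to the set of its neighbours.
    separator: iterable of taxa forming the separator X (assumed given).
    """
    separator = set(separator)
    remaining = set(graph) - separator
    subproblems = []
    while remaining:
        start = min(remaining)  # deterministic choice of a start node
        remaining.discard(start)
        component = {start}
        queue = deque([start])
        # Breadth-first search restricted to non-separator nodes.
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour in remaining:
                    remaining.discard(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        # Every subproblem shares the separator taxa.
        subproblems.append(component | separator)
    return subproblems


# Hypothetical toy graph: {x, y} is a clique that separates {a, b} from {c, d}.
toy_graph = {
    "a": {"b", "x", "y"},
    "b": {"a", "x", "y"},
    "x": {"a", "b", "y", "c", "d"},
    "y": {"a", "b", "x", "c", "d"},
    "c": {"x", "y", "d"},
    "d": {"x", "y", "c"},
}
toy_separator = {"x", "y"}
toy_subproblems = dcm_subproblems(toy_graph, toy_separator)
```

Because the separator taxa appear in every subproblem, the subtrees computed on the subproblems overlap, which is what allows them to be merged afterwards.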

New technique: Iterative DCM3
Repeat:
1. Apply TBR-based local search until a local optimum is reached.
2. Obtain a DCM3 decomposition based upon the local optimum (the “guide tree”).
3. Apply the base method to the subproblems, and merge the subtrees using the Strict Consensus Merger.
4. Randomly refine the tree.
Variants we have examined: I-DCM3(TBR) and I-DCM3(Ratchet).
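The control flow of the four steps above can be sketched as a generic loop. This is only a skeleton under stated assumptions: every step is passed in as a callable, the name `iterative_dcm3` and its parameters are hypothetical, and the toy callables at the bottom are placeholders (a "tree" is represented by a single integer standing for its MP score) rather than real TBR search, DCM3 decomposition, or Strict Consensus Merger code. Tracking the best tree seen so far is also an assumption of this sketch.

```python
def iterative_dcm3(start_tree, score, local_search, decompose,
                   solve_and_merge, refine, iterations):
    """Skeleton of the iterative loop; lower score means a better tree."""
    best = start_tree
    current = start_tree
    for _ in range(iterations):
        guide = local_search(current)          # step 1: search to a local optimum
        subproblems = decompose(guide)         # step 2: decompose using the guide tree
        merged = solve_and_merge(subproblems)  # step 3: base method, then merge
        current = refine(merged)               # step 4: random refinement
        # Keep the best tree seen so far (an assumption of this sketch).
        for candidate in (guide, current):
            if score(candidate) < score(best):
                best = candidate
    return best


# Placeholder callables: a "tree" is just an integer MP score here.
toy_result = iterative_dcm3(
    start_tree=100,
    score=lambda tree: tree,
    local_search=lambda tree: tree - 2,        # pretend local search saves 2 steps
    decompose=lambda tree: [tree, tree],       # pretend there are two subproblems
    solve_and_merge=lambda subs: min(subs) - 1,  # pretend merging saves 1 step
    refine=lambda tree: tree,                  # nothing to refine in the toy
    iterations=3,
)
```

Passing the steps as callables mirrors the slide's distinction between variants: swapping the base method is what separates I-DCM3(TBR) from I-DCM3(Ratchet).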

Comparison of MP heuristics
Datasets (all datasets have uninformative sites removed):
–429 Eukaryotes rDNA (Lipscomb et al.)
–576 Metazoa DNA (Goloboff)
–500 rbcL DNA (Rice et al.)
–567 rbcL, atpB, and 18S DNA (three-gene; Soltis et al.)
–854 rbcL DNA (Goloboff)
–921 Avian Cytochrome DNA (birds; Johnson)
–2000 Eukaryotes sRNA (Gutell et al.)
–2594 rbcL DNA (Kallersjo et al.)

Comparison of MP heuristics
Methods: TBR search, Ratchet, I-DCM3(TBR), I-DCM3(Ratchet)
Datasets: Biological data
Experimental methodology:
–On each dataset we ran 10 trials of each method, each trial lasting 24 hours.
–We then plotted the average best MP score after fixed time intervals.
Implementation: Ratchet was implemented using PAUP* 4.0, and we implemented I-DCM3 ourselves in C++. We ran our experiments on Linux Pentium machines.
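The averaging described in the methodology can be sketched as follows. The sketch assumes each trial is logged as a list of (hours elapsed, MP score) pairs recorded whenever a tree is found; the function names, the log format, and all numbers below are made up for illustration and are not results from the study.

```python
def best_score_by(trial_log, checkpoint):
    """Best (lowest) MP score a trial has found up to `checkpoint` hours,
    or None if the trial has logged nothing by then."""
    scores = [score for hours, score in trial_log if hours <= checkpoint]
    return min(scores) if scores else None


def average_best_scores(trial_logs, checkpoints):
    """Average, over trials, of the best score found by each checkpoint."""
    averages = []
    for checkpoint in checkpoints:
        bests = [best_score_by(log, checkpoint) for log in trial_logs]
        bests = [b for b in bests if b is not None]
        averages.append(sum(bests) / len(bests) if bests else None)
    return averages


# Made-up logs for two trials of one method: (hours elapsed, MP score).
toy_logs = [
    [(1, 1010), (5, 1004), (20, 1000)],
    [(2, 1012), (6, 1002), (22, 1001)],
]
toy_checkpoints = [4, 12, 24]
toy_averages = average_best_scores(toy_logs, toy_checkpoints)
```

Plotting `toy_averages` against `toy_checkpoints` for each method would give one curve per method, which is the kind of comparison the slide describes.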

2000 Eukaryotes sRNA (Gutell et al.) [results plot]

2594 rbcL DNA (Kallersjo et al.) [results plot]

Conclusions
–I-DCM3(Ratchet) finds the best known trees faster than Ratchet.
–On larger trees, the improvement of I-DCM3(Ratchet) over Ratchet is more pronounced.
–Out of 10 trials on the two largest datasets, the best I-DCM3(Ratchet) tree is 9 and 7 steps better than the best Ratchet tree.

Future work
–Use recursive I-DCM3 for analyzing very large datasets
–Biological analysis of real datasets
–Use I-DCM3 for boosting ML heuristics