
New Approaches for Inferring the Tree of Life


1 New Approaches for Inferring the Tree of Life
Tandy Warnow Associate Professor Department of Computer Sciences Graduate Program in Ecology, Evolution, and Behavior Co-Director The Center for Computational Biology and Bioinformatics The University of Texas at Austin

2 Packard Proposal 1996
I observed that DNA and RNA sequences are low in phylogenetic signal, as currently analyzed, and I proposed to seek out and model new sources of significant phylogenetic signal, and then develop efficient algorithms to extract that signal, so that the inference of evolutionary history could be made with greater accuracy.

3 What I did instead
Developed methods for use with biomolecular sequences that recover the true tree with high probability from polynomial-length sequences.
(Last two years): Developed methods for reconstructing phylogenies from gene order and content within whole genomes.
(Last year): Started looking at inferring non-tree models of evolution.

4 DNA Sequence Evolution
[Figure: a tree drawn against a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today). The ancestral sequence AAGACTT evolves by substitutions through intermediate sequences such as AAGGCCT and TGGACTT into the present-day sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, and AGCGCTT.]

5 Major Phylogenetic Reconstruction Methods
Polynomial-time distance-based methods (neighbor joining is perhaps the most popular)
NP-hard sequence-based methods (Maximum Parsimony, Maximum Likelihood) that can take years on real datasets
Heated debates over the relative performance of these methods
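As a rough illustration of what a polynomial-time distance-based method does, here is a minimal sketch of the neighbor joining agglomeration step. It is not taken from the talk: the function name, the dictionary-based distance matrix, and the five-taxon distances are all assumptions made for the example, and branch lengths are deliberately not computed.

```python
def neighbor_joining(names, dist):
    """Return the NJ topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance (hypothetical input format).
    The final tuple has three children, i.e. an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # r[a] is the sum of distances from a to every other active node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        pairs = ((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:])
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = (a, b)
        rest = [c for c in nodes if c != a and c != b]
        # Distance from the new internal node to each remaining node.
        for c in rest:
            d[frozenset((new, c))] = 0.5 * (
                d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
            )
        nodes = rest + [new]
    return tuple(nodes)


# Made-up additive distances for a tree with cherries (A, B) and (D, E).
raw = {
    ("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 6, ("A", "E"): 10,
    ("B", "C"): 8, ("B", "D"): 7, ("B", "E"): 11,
    ("C", "D"): 7, ("C", "E"): 11, ("D", "E"): 6,
}
dist = {frozenset(k): v for k, v in raw.items()}
tree = neighbor_joining(["A", "B", "C", "D", "E"], dist)
print(tree)
```

Each iteration scans all pairs, so this naive version runs in cubic time in the number of taxa, which is the sense in which the method is polynomial.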

6 Quantifying Error
FN: false negative (missing edge)
FP: false positive (incorrect edge)
[Figure: example tree comparison with a 50% error rate]
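The FN and FP rates can be sketched in code by treating each internal edge as a bipartition of the taxa. The representation below (one side of each bipartition as a set, normalized against a reference taxon) and the six-taxon example trees are assumptions chosen for illustration, not the talk's own implementation.

```python
def normalize(split, taxa):
    """Return the side of the bipartition that excludes the reference taxon."""
    taxa = frozenset(taxa)
    side = frozenset(split)
    reference = min(taxa)
    return taxa - side if reference in side else side


def error_rates(true_edges, inferred_edges, taxa):
    """Return (FN rate, FP rate) for two non-empty lists of internal edges.

    FN rate: fraction of true-tree edges missing from the inferred tree.
    FP rate: fraction of inferred-tree edges absent from the true tree.
    """
    true_set = {normalize(s, taxa) for s in true_edges}
    inferred_set = {normalize(s, taxa) for s in inferred_edges}
    fn = len(true_set - inferred_set) / len(true_set)
    fp = len(inferred_set - true_set) / len(inferred_set)
    return fn, fp


taxa = "ABCDEF"
# Hypothetical true tree ((A,B),C),(D,(E,F)) and an inferred tree that misplaces one edge.
true_edges = [{"A", "B"}, {"A", "B", "C"}, {"E", "F"}]
inferred_edges = [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}]
fn, fp = error_rates(true_edges, inferred_edges, taxa)
print(fn, fp)
```

When both trees are fully resolved they have the same number of internal edges, so the two rates coincide; they differ only when one tree contains unresolved nodes.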

7 Absolute fast convergence vs. exponential convergence

8 DCM-Boosting [Warnow et al. 2001]
DCM+SQS is a two-phase procedure which reduces the sequence length requirement of methods.
[Diagram: an exponentially converging method is passed through DCM and then SQS, yielding an absolute fast converging method.]
We modify the second phase to improve the empirical performance, replacing SQS with ML (maximum likelihood) or MP (maximum parsimony).

9 DCMNJ+ML vs. other methods on a fixed model tree
500-taxon rbcL tree; K2P+ model (=2, =1); avg. branch length = 0.278.
Relative performance is typical in our studies.

10 Comparison of methods on random trees as a function of number of taxa
K2P+ model (=2, =1) Avg. branch length = 0.05 Seq. length = 1000

11 Summary
These are the first polynomial-time methods that improve upon NJ (with respect to topological accuracy) and are never worse than NJ.
The advantage obtained with DCMNJ+MP and DCMNJ+ML increases with the number of taxa.
In practice these new methods are slower than NJ (minutes vs. seconds), but still much faster than MP and ML (which can take days).
Conjecture: DCMNJ+ML is AFC (absolute fast converging).

12 II. Whole-Genome Phylogeny
[Figure: two panels, each labeled A through F, with additional labels X, Y, Z, W]

13 Genomes As Signed Permutations
1 – or –3 5 –1 etc.

14 Genomes Evolve by Rearrangements
Inversion: –8 –7 –6 – Transposition: Inverted Transposition: –8 –7 –6 –
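The three rearrangement operations can be sketched on a genome stored as a list of signed integers. The function names, the index conventions, and the ten-gene example genome are assumptions made for this illustration; the genome is treated as linear here for simplicity.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] to just after genome[j:k] (requires i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Like transposition, but the moved segment is also reversed and sign-flipped."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


genome = list(range(1, 11))  # hypothetical genome 1 2 ... 10
print(inversion(genome, 5, 8))
print(transposition(genome, 5, 8, 10))
print(inverted_transposition(genome, 5, 8, 10))
```

In this example the segment 6 7 8 becomes -8 -7 -6 under inversion, moves behind 9 10 under transposition, and does both under inverted transposition.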

15 Genome Rearrangement Has A Huge State Space
DNA sequences: 4 states per site
Signed circular genomes with n genes: $2^{n-1}(n-1)!$ states, 1 site
Circular genomes (1 site) with 37 genes: approximately $2.56 \times 10^{52}$ states; with 120 genes: approximately $3.70 \times 10^{232}$ states
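To give a sense of the size of this state space, the sketch below counts signed circular genomes using the standard formula 2^(n-1) (n-1)!, which fixes the position and orientation of one gene and then counts the orders and signs of the rest. The function name is an assumption for the example.

```python
from math import factorial


def signed_circular_genomes(n):
    """Number of distinct signed circular gene orders on n genes.

    One gene is fixed in position and orientation; the remaining n - 1
    genes can be ordered in (n - 1)! ways with 2 ** (n - 1) sign choices.
    """
    return 2 ** (n - 1) * factorial(n - 1)


for n in (37, 120):
    print(n, f"{signed_circular_genomes(n):.2e}")
```

Compare this with a DNA site, which has only 4 states: the whole genome acts as a single character with an astronomically large number of states.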

16 Our Approaches
Statistically-based genomic distance estimators so that NJ analyses are more accurate, recovering 90% of the edges even for datasets close to saturation.
Improved bounds for tree length.
GRAPPA: high-performance implementation for the maximum parsimony problems for rearranged genomes, achieving up to 200,000-fold speedup.
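The slide does not say which raw measurement the distance estimators correct. As an assumption, the sketch below computes the breakpoint distance between two signed circular genomes, one common raw measure that tends to underestimate the true number of rearrangements as genomes approach saturation. The function names and example genomes are hypothetical.

```python
def adjacencies(genome):
    """Set of gene adjacencies of a signed circular genome, in both reading directions."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that do not occur in g2."""
    adj2 = adjacencies(g2)
    n = len(g1)
    return sum((g1[i], g1[(i + 1) % n]) not in adj2 for i in range(n))


g1 = list(range(1, 11))
g2 = [1, 2, 3, 4, 5, -8, -7, -6, 9, 10]  # g1 after one inversion of genes 6..8
print(breakpoint_distance(g1, g2))
```

A single inversion creates at most two breakpoints, so the raw count grows more slowly than the true number of events once rearrangements start to overlap, which is the gap a corrected estimator aims to close.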

17 Accuracy of Neighbor Joining Using Distance Estimators
120 genes; inversion-only evolution (other models of evolution show the same relative performance); 10, 20, 40, 80, and 160 genomes.

18 Consensus of 216 MP Trees for the Campanulaceae dataset
[Figure: tree over Trachelium, Campanula, Adenophora, Symphandra, Legousia, Asyneuma, Triodanis, Wahlenbergia, Merciera, Codonopsis, Cyananthus, Platycodon, and Tobacco]
Strict consensus of 216 trees; 6 out of 10 internal edges recovered.

19 Future Work
New focus on Rare Genomic Changes
New data
New models
New methods
New techniques for large-scale analyses
Divide-and-conquer methods
Non-tree models
Visualization of large trees and large sets of trees

20 Acknowledgements
Funding: The David and Lucile Packard Foundation, The National Science Foundation, and Paul Angello
Collaborators: Robert Jansen (U. Texas); Bernard Moret, David Bader, Mi-Yan (U. New Mexico); Daniel Huson (Celera); Katherine St. John (CUNY); Linda Raubeson (Central Washington U.); Luay Nakhleh, Usman Roshan, Jerry Sun, Li-San Wang, Stacia Wyman (Phylolab, U. Texas)

21 Phylolab, U. Texas Please visit us at

