Algorithmic research in phylogeny reconstruction Tandy Warnow The University of Texas at Austin.


1 Algorithmic research in phylogeny reconstruction Tandy Warnow The University of Texas at Austin

2 Phylogeny [Figure: phylogenetic tree of Orangutan, Gorilla, Chimpanzee, and Human. From the Tree of Life Website, University of Arizona]

3 Reconstructing the “Tree” of Life. Handling large datasets: millions of species. NSF funds many projects towards this goal, under the Assembling the Tree of Life (ATOL) program.

4 Current projects: heuristics for NP-hard optimization problems for phylogeny reconstruction; “phylogenetic” multiple sequence alignment; detecting and reconstructing horizontal gene transfer and hybridization; constructing phylogenies on languages. Graph theory, combinatorial optimization, and probabilistic analysis are fundamental to algorithm development in this area, but all methods are also extensively tested in simulation and on real data. Collaborations with biologists or linguists are essential.

5 DNA Sequence Evolution [Figure: an ancestral sequence (AAGACTT) evolving down a tree into present-day sequences; time axis: -3 mil yrs, -2 mil yrs, -1 mil yrs, today.]

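6 Phylogeny Problem [Figure: five sequences (TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT) observed at the taxa U, V, W, X, Y, and the unknown tree on U, V, W, X, Y to be reconstructed.]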

7 Solving NP-hard problems exactly is … unlikely. The number of (unrooted) binary trees on n leaves is (2n-5)!!. If each tree on 1000 taxa could be analyzed in 0.001 seconds, an exhaustive search for the best tree would take roughly 10^2847 millennia.

#leaves   #trees
4         3
5         15
6         105
7         945
8         10395
9         135135
10        2027025
20        about 2.2 x 10^20
100       about 1.7 x 10^182
1000      about 1.9 x 10^2860
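The (2n-5)!! count on this slide can be checked directly. The following is a minimal Python sketch; the function names `num_unrooted_binary_trees` and `sci` are illustrative and not from the talk.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Return (2n-5)!!, the number of unrooted binary trees on n labeled leaves."""
    if n < 3:
        raise ValueError("the formula applies to n >= 3 leaves")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (an empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def sci(value: int) -> str:
    """Format a large positive integer as 'd.d x 10^e' (truncated, not rounded)."""
    digits = str(value)
    return f"{digits[0]}.{digits[1]} x 10^{len(digits) - 1}"


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8, 9, 10):
        print(n, num_unrooted_binary_trees(n))
    for n in (20, 100, 1000):
        print(n, sci(num_unrooted_binary_trees(n)))
```

Python integers have arbitrary precision, so the exact count for 1000 leaves can be computed even though no search could ever enumerate that many trees.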

8 Approaches for “solving” hard optimization problems (like maximum parsimony)
1. Hill-climbing heuristics (which can get stuck in local optima)
2. Randomized algorithms for getting out of local optima
3. Approximation algorithms (give bounds on what is possible)
[Figure: cost plotted over the space of phylogenetic trees, with a local optimum and the global optimum marked.]
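The first two approaches can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not a phylogenetic search: the "tree space" is a one-dimensional list of made-up costs, neighbours are adjacent indices, and the names `hill_climb` and `hill_climb_with_restarts` are hypothetical. Real heuristics move between trees by rearrangement operations rather than along a line.

```python
import random


def hill_climb(costs, start):
    """Move to the cheapest adjacent index until no neighbour is strictly cheaper."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(costs)]
        best = min(neighbours, key=lambda i: costs[i])
        if costs[best] >= costs[current]:
            return current  # local optimum: no neighbour improves the cost
        current = best


def hill_climb_with_restarts(costs, starts):
    """Run hill climbing from each start and keep the best local optimum found."""
    return min((hill_climb(costs, s) for s in starts), key=lambda i: costs[i])


if __name__ == "__main__":
    # Toy landscape (made-up numbers): local optimum at index 1, global at index 5.
    costs = [5, 3, 4, 6, 2, 1, 2, 7]
    print("single run from index 0 stops at", hill_climb(costs, 0))

    rng = random.Random(0)  # fixed seed so the random restarts are reproducible
    starts = [rng.randrange(len(costs)) for _ in range(5)]
    print("best of restarts from", starts, "is", hill_climb_with_restarts(costs, starts))
```

A single run started at index 0 stops at the local optimum, while restarting from other points gives the search a chance to reach the global optimum; neither variant guarantees it.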

9 Problems with current techniques for MP. Shown here is the performance of a heuristic maximum parsimony analysis on a real dataset of almost 14,000 sequences. (“Optimal” here means best score to date, using any method for any amount of time.) Acceptable error is below 0.01%. [Figure: Performance of TNT with time]

10 Performance of NJ, a popular polynomial time method [Nakhleh et al. ISMB 2001]. Simulation study based upon fixed edge lengths, K2P model of evolution, sequence lengths fixed to 1000 nucleotides. Error rates reflect proportion of incorrect edges in inferred trees. [Figure: error rate of NJ (0 to 0.8) versus number of taxa (0 to 1600).]
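One common way to count "incorrect edges" is to compare the leaf bipartitions (splits) induced by the internal edges of the true and inferred trees. The sketch below assumes that reading and reports the fraction of true-tree splits missing from the inferred tree. The exact measure used in the cited study is not stated on the slide, and the names `canonical_split` and `missing_edge_rate` are hypothetical.

```python
def canonical_split(side, leaves):
    """Represent a bipartition by the side that does not contain a fixed reference leaf."""
    side = frozenset(side)
    reference = min(leaves)
    return frozenset(leaves) - side if reference in side else side


def missing_edge_rate(true_splits, inferred_splits, leaves):
    """Fraction of the true tree's internal edges (splits) absent from the inferred tree."""
    true_set = {canonical_split(s, leaves) for s in true_splits}
    inferred_set = {canonical_split(s, leaves) for s in inferred_splits}
    if not true_set:
        raise ValueError("the true tree has no internal edges to compare")
    return len(true_set - inferred_set) / len(true_set)


if __name__ == "__main__":
    leaves = "UVWXY"
    # Assumed example trees: true tree ((U,V),W,(X,Y)), inferred tree ((U,W),V,(X,Y)).
    true_splits = ["UV", "XY"]
    inferred_splits = ["UW", "XY"]
    print(missing_edge_rate(true_splits, inferred_splits, leaves))
```

Each split is stored as the side that excludes a fixed reference leaf, so an edge is recognized no matter which of its two sides is supplied.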

11 DCMs (Disk-Covering Methods). DCMs for polynomial time methods improve topological accuracy (empirical observation) and have provable theoretical guarantees under Markov models of evolution. DCMs for hard optimization problems reduce the running time needed to achieve good levels of accuracy (empirical observation).

12 DCMs: Divide-and-conquer for improving phylogeny reconstruction

13 “Boosting” phylogeny reconstruction methods. DCMs “boost” the performance of phylogeny reconstruction methods. [Figure: a DCM combined with a base method M yields the boosted method DCM-M.]

14 Iterative-DCM3 [Figure: a tree T is passed through DCM3 and the base method to produce a new tree T’.]

15 Rec-I-DCM3 significantly improves performance [Figure: comparison of TNT to Rec-I-DCM3(TNT) on one large dataset; curves labeled “Current best techniques” and “DCM boosted version of best techniques”.]

16 DCM1-boosting distance-based methods [Nakhleh et al. ISMB 2001]. DCM1-boosting makes distance-based methods more accurate. Theoretical guarantees that DCM1-NJ converges to the true tree from polynomial length sequences. [Figure: error rates of NJ and DCM1-NJ (0 to 0.8) versus number of taxa (0 to 1600).]

17 General comments. Just about everything in phylogeny is NP-hard. Graph theory, probability, and optimization are the basic tools for algorithmic advances. Algorithms are tested on both real and simulated data. Collaborations with domain experts (biologists or linguists) are essential to success. (At UT, we have wonderful biologists to work with, and all my students collaborate with them.)

18 For more information: send me email to make an appointment, check my webpage for tutorials on the subject, and see http://www.phylo.org and http://www.cs.utexas.edu/~tandy.

