Download presentation
Presentation is loading. Please wait.
Published byScarlett Copeland Modified over 9 years ago
1
Computing the Tree of Life The University of Texas at Austin Department of Computer Sciences Tandy Warnow
2
Phylogeny From the Tree of the Life Website, University of Arizona Orangutan GorillaChimpanzee Human
3
DNA Sequence Evolution AAGACTT TGGACTTAAGGCCT -3 mil yrs -2 mil yrs -1 mil yrs today AGGGCATTAGCCCTAGCACTT AAGGCCTTGGACTT TAGCCCATAGACTTAGCGCTTAGCACAAAGGGCAT TAGCCCTAGCACTT AAGACTT TGGACTTAAGGCCT AGGGCATTAGCCCTAGCACTT AAGGCCTTGGACTT TAGCCCATAGACTT AGCGCTT AGCACAAAGGGCAT TAGCCCTAGCACTT
4
Molecular Phylogenetics TAGCCCATAGACTTTGCACAATGCGCTTAGGGCAT UVWXY U VW X Y (Tree is unrooted)
5
Phylogeny Orangutan GorillaChimpanzee Human From the Tree of the Life Website, University of Arizona
6
Evolution informs about everything in biology Big genome sequencing projects just produce data -- so what? Evolutionary history relates all organisms and genes, and helps us understand and predict –interactions between genes (genetic networks) –drug design –predicting functions of genes –influenza vaccine development –origins and spread of disease –origins and migrations of humans
7
Evolutionary trees and the pharmaceutical industry Big genome sequencing projects just produce data -- so what? Evolutionary history relates all organisms and genes, and evolutionary trees are used to make important biological discoveries. The pharmaceutical industry uses phylogenies for many applications, such as the development of influenza vaccine! Inaccuracies in the phylogenies lead to inaccurate predictions (e.g., vaccines that don’t work, drugs that don’t have the required properties). Current software isn’t accurate enough, or fast enough! This means $$$!
8
NSF’s program for Assembling the Tree of Life “The Tree of Life has proven useful in many fields, such as: choosing experimental systems for biological research, determining which genes are common to many kinds of organisms and which are unique, tracking the origin and spread of emerging diseases and their vectors, bio-prospecting for pharmaceutical and agrochemical products, Developing databases for genetic information, and evaluating risk factors for species conservation and ecosystem restoration.”
9
Computational challenges for Assembling the Tree of Life (NSF) 8 million species for the Tree of Life -- cannot currently analyze more than a few hundred (and even this takes years) We need new methods for inferring large phylogenies - hard optimization problems! We need new software for visualizing large trees We need new database technology
10
We are world leaders in research in Computational Phylogenetics “DCM-boosting” for phylogeny reconstruction - improves accuracy and speeds up heuristics for NP-hard problems (Warnow, UT-Austin) GRAPPA -- software for whole genome phylogeny (Moret, UNM) Visualization of large trees, and sets of trees (Amenta, UC Davis) Phylogenetic databases (Miranker)
11
DCM-boosting improves methods [Nakhleh et al. ISMB 2001] Random trees K2P+Gamma model Sequence length=1000 Average branch length=0.05 DCM-NJ+SQS NJ DCM-NJ+MP HGT-FP 0 40080016001200 # leaves 0 0.2 0.4 0.6 0.8 Error Rate
12
(Figure) [Nakhleh et al. ISMB 2001] Random trees K2P+Gamma model Sequence length=1000 Average branch length=0.05 DCM*-NJ NJ DCM-NJ + MP HGT-FP 0 40080016001200 No. Taxa 0 0.2 0.4 0.6 0.8 Avg. RF
13
DCM-boosting phylogenetic reconstruction methods [Nakhleh et al. ISMB 2001] DCM-boosting makes fast methods more accurate DCM-boosting speeds- up heuristics for hard optimization problems NJ DCM-NJ 0 40080016001200 No. Taxa 0 0.2 0.4 0.6 0.8 Error Rate
14
Whole-Genome Phylogenetics A B C D E F X Y Z W A B C D E F
15
Benchmark gene order dataset: Campanulaceae 12 genomes + 1 outgroup (Tobacco), 105 gene segments NP-hard optimization problems: breakpoint and inversion phylogenies 1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
16
Benchmark gene order dataset: Campanulaceae 12 genomes + 1 outgroup (Tobacco), 105 gene segments NP-hard optimization problems: breakpoint and inversion phylogenies 1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.) 2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
17
Benchmark gene order dataset: Campanulaceae 12 genomes + 1 outgroup (Tobacco), 105 gene segments NP-hard optimization problems: breakpoint and inversion phylogenies 1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.) 2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor) 2003: Using latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
18
GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms) http://www.cs.unm.edu/~moret/GRAPPA/ Heuristics for NP-hard optimization problems Fast polynomial time distance-based methods Contributors: U. New Mexico,U. Texas at Austin, Universitá di Bologna, Italy Fastest and most accurate software for whole genome phylogeny worldwide
19
Opportunities New phylogenetic reconstruction software can improve pharmaceutical R&D (making more accurate solutions achievable in hours or days, rather than months or years) Software for researchers is available as free (open source), but users need the latest tools now, with proper interfaces -- business opportunity.
20
Participants and Funding University of Texas Computer Scientists: Warnow, Dhillon, Hunt, and Miranker University of Texas biologists: Jansen, Linder, and Hillis Other institutions: UNM, UC Davis, Central Washington, CUNY, JGI Funding: Three NSF ITR grants, NSF Biocomplexity, David and Lucile Packard Foundation
21
Phylolab, U. Texas Please visit us at http://www.cs.utexas.edu/users/phylo/
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.