Tandy Warnow The University of Illinois

Presentation transcript:

New Scalable Coalescent-Based Species Tree Estimation Methods: BBCA, ASTRAL, and ASTRID
Tandy Warnow, The University of Illinois

BBCA, ASTRAL, and ASTRID
BBCA is a simple way of making *BEAST scalable to large numbers of genes (but it does not address large numbers of species).
ASTRAL and ASTRID are summary methods that are statistically consistent in the presence of ILS and that run in polynomial time.
Both can analyze very large datasets (1000 species and 1000 genes, or more) with high accuracy.
Their relative accuracy depends on the model condition: sometimes ASTRAL is better, sometimes ASTRID is better.

Main competing approaches
[Figure: sequence alignments for gene 1, gene 2, ..., gene k over the same species, analyzed in one of two ways: "Concatenation", or "Analyze separately" followed by a "Summary Method".]
Note: supertree methods take overlapping trees and produce a tree, and the whole process of first generating small trees and then applying a supertree method is often referred to as the “supertree approach”.

Incomplete Lineage Sorting (ILS) is a dominant cause of gene tree heterogeneity

*BEAST
Heled and Drummond, MBE 2010
Input: a set of multiple sequence alignments for a collection of genes
Technique: uses MCMC to co-estimate gene trees and species trees
Highly accurate
Limited in practice to small numbers of genes and species, due to convergence issues

BBCA: improving *BEAST
Zimmermann, Mirarab, and Warnow, BMC Genomics 2014:
Randomly partition genes into bins of at most 25 genes
Apply *BEAST to each bin, and take the gene trees it computes
Apply a favored summary method to the gene trees
Matches accuracy of *BEAST
Improves scalability to large numbers of genes
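The three BBCA steps can be sketched as a small pipeline. This is only an illustration of the control flow, not the authors' implementation: `random_bins`, `bbca`, `coestimate_gene_trees`, and `summary_method` are hypothetical names, and the two callables stand in for a *BEAST run on one bin and for whichever summary method is preferred.

```python
import random


def random_bins(genes, max_bin_size=25, seed=None):
    """Randomly partition `genes` into bins of at most `max_bin_size` genes."""
    rng = random.Random(seed)
    shuffled = list(genes)
    rng.shuffle(shuffled)
    return [shuffled[i:i + max_bin_size]
            for i in range(0, len(shuffled), max_bin_size)]


def bbca(genes, coestimate_gene_trees, summary_method,
         max_bin_size=25, seed=None):
    """Sketch of BBCA: bin the genes, co-estimate per bin, then summarize.

    coestimate_gene_trees(bin) -> list of gene trees (hypothetical stand-in
        for running *BEAST on the alignments in one bin).
    summary_method(gene_trees) -> species tree (hypothetical stand-in for a
        summary method of choice).
    """
    gene_trees = []
    for gene_bin in random_bins(genes, max_bin_size, seed):
        # Keep only the gene trees from each bin; the per-bin species
        # tree estimate is not used.
        gene_trees.extend(coestimate_gene_trees(gene_bin))
    return summary_method(gene_trees)
```

Because each bin is analyzed independently, the per-bin runs could in principle be executed in parallel; the sketch runs them sequentially for simplicity.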

ASTRAL
Mirarab and Warnow, Bioinformatics 2014
https://github.com/smirarab/ASTRAL
Tutorial in Species Tree Workshop

ASTRID
ASTRID: Accurate species trees using internode distances, Vachaspati and Warnow, RECOMB-CG 2015 and BMC Genomics 2015
Algorithmic design: computes a matrix of average leaf-to-leaf topological distances, and then computes a tree from that matrix using FastME (which is more accurate than neighbor joining, and faster, too).
Related to NJst (Liu and Yu, 2010), which computes the same matrix but then computes the tree using neighbor joining (NJ).
Statistically consistent under the MSC
Runs in O(kn^2 + n^3) time, where there are k gene trees and n species
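The first ASTRID step, building the matrix of average leaf-to-leaf topological distances, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASTRID code: trees are written as nested tuples with string leaf names, the distance between two leaves is taken to be the number of edges on the path between them in the tree as written (an unrooted tree can be written with a top-level node of degree three), and each pair is averaged over the gene trees that contain both leaves. The function names are hypothetical, and the tree-building step (FastME or neighbor joining) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def leaf_depths(tree, pair_dist):
    """Return {leaf: number of edges from this node down to the leaf}.

    As a side effect, record in `pair_dist` the path length for every pair
    of leaves whose path passes through this node as its topmost node.
    """
    if isinstance(tree, str):          # a leaf
        return {tree: 0}
    child_maps = []
    for child in tree:
        depths = leaf_depths(child, pair_dist)
        child_maps.append({leaf: d + 1 for leaf, d in depths.items()})
    # Each pair of leaves is handled exactly once, at the node where their
    # paths meet, so one tree costs O(n^2) for n leaves.
    for left, right in combinations(child_maps, 2):
        for a, da in left.items():
            for b, db in right.items():
                pair_dist[frozenset((a, b))] = da + db
    merged = {}
    for child_map in child_maps:
        merged.update(child_map)
    return merged


def average_distance_matrix(gene_trees):
    """Average the leaf-to-leaf distances over all gene trees."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        pair_dist = {}
        leaf_depths(tree, pair_dist)
        for pair, dist in pair_dist.items():
            totals[pair] += dist
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two quartet gene trees on the same four species: AB|CD and AC|BD.
trees = [(("A", "B"), "C", "D"), (("A", "C"), "B", "D")]
matrix = average_distance_matrix(trees)
```

Processing k gene trees this way takes O(kn^2) time, which matches the first term of the running time stated on the slide; the n^3 term would come from the distance-based tree construction that follows.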


Acknowledgments
Software
ASTRAL: Available at https://github.com/smirarab
ASTRID: Available at https://github.com/pranjalv123
Others at http://tandy.cs.illinois.edu/software.html
NSF grant DBI-1461364 (joint with Noah Rosenberg at Stanford and Luay Nakhleh at Rice): http://tandy.cs.illinois.edu/PhylogenomicsProject.html
NSF graduate fellowship to Pranjal Vachaspati
HHMI graduate fellowship to Siavash Mirarab
Papers available at http://tandy.cs.illinois.edu/papers.html