Ultra-large alignments using Ensembles of HMMs Nam-phuong Nguyen Institute for Genomic Biology University of Illinois at Urbana-Champaign.

Presentation transcript:

UPP: Ultra-large alignments using Phylogeny-aware Profiles
Objective: Estimate accurate alignments on large datasets, which may be evolutionarily divergent and contain fragmentary sequences.
Nguyen N., Mirarab S., Kumar K., and Warnow T. RECOMB 2015.

UPP Algorithmic Strategy

RNASim: alignment error
Note: All methods were given 24 hours on a 12-core machine. MAFFT fails to complete on 200K sequences. Clustal-Omega completes only on the 10K dataset.
1 Million RNASim: UPP(Fast) generated an alignment in 12 days, compared to 15 days for PASTA. UPP(Fast) produced a better alignment (5.7% lower error), but PASTA produced a better tree (1.5% lower error).

Running Time
Wall-clock time used (in hours) given 12 processors

Ensemble of HMMs
Use a collection of HMMs instead of a single HMM to represent a backbone alignment.
Improves alignment accuracy, which can lead to better downstream analyses:
– Phylogenetic placement (SEPP; PSB 2012)
– Taxonomic identification (TIPP; Bioinformatics 2014)
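
A toy sketch of the ensemble idea, not the UPP implementation. It assumes ungapped, equal-length sequences and uses simple per-column frequency profiles as a stand-in for profile HMMs. The backbone subsets (`clade_a`, `clade_b`), the sequences, and all function names are hypothetical. It shows why a query can score better against a profile built from one subset of the backbone than against a single profile built from the whole backbone.

```python
import math
from collections import Counter

ALPHABET = "ACGU"

def build_profile(rows, pseudocount=1.0):
    """Per-column log-probabilities for aligned, equal-length rows.

    Stand-in for a profile HMM: match states only, no insert/delete states.
    """
    profile = []
    for column in zip(*rows):
        counts = Counter(ch for ch in column if ch in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append(
            {a: math.log((counts[a] + pseudocount) / total) for a in ALPHABET}
        )
    return profile

def score(profile, query):
    """Log-likelihood of an ungapped query (same length as the profile)."""
    return sum(column[ch] for column, ch in zip(profile, query))

def best_profile(ensemble, query):
    """Return (name, score) of the best-scoring profile in the ensemble."""
    scored = ((name, score(p, query)) for name, p in ensemble.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical backbone alignment, already split into two subsets.
backbone = {
    "clade_a": ["ACGUAC", "ACGUAC", "ACGAAC"],
    "clade_b": ["UUGCAU", "UUGCAU", "UAGCAU"],
}

# Ensemble: one profile per subset. Single: one profile for the whole backbone.
ensemble = {name: build_profile(rows) for name, rows in backbone.items()}
single = build_profile([row for rows in backbone.values() for row in rows])

query = "UUGCAU"
best_name, best_score = best_profile(ensemble, query)
single_score = score(single, query)
print(best_name, best_score > single_score)
```

In this toy, the single profile averages over both subsets, so its columns are less specific than those of the subset the query actually resembles.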