Published by Kerry Golden. Modified over 8 years ago.
1
Ultra-large alignments using Ensembles of HMMs
Nam-phuong Nguyen
Institute for Genomic Biology, University of Illinois at Urbana-Champaign
2
UPP: Ultra-large alignments using Phylogeny-aware Profiles
Objective: Estimate accurate alignments on large datasets that may be evolutionarily divergent and contain fragmentary sequences.
Nguyen N., Mirarab S., Kumar K., and Warnow T. RECOMB 2015.
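Because the input may contain fragmentary sequences, a first practical step is to separate putative full-length sequences from fragments. The sketch below is a simple heuristic, assumed here and not taken from the slides: a sequence counts as full-length when its ungapped length lies within a hypothetical `tolerance` (25% by default) of the median length. The function name `split_backbone_candidates` is illustrative only.

```python
from statistics import median

def split_backbone_candidates(seqs, tolerance=0.25):
    """Split sequences into full-length candidates and putative fragments.

    Assumption (hypothetical heuristic): a sequence is full-length when its
    ungapped length is within `tolerance` (a fraction) of the median length.
    """
    lengths = {name: len(s.replace("-", "")) for name, s in seqs.items()}
    mid = median(lengths.values())
    lo, hi = (1 - tolerance) * mid, (1 + tolerance) * mid
    full = [n for n, length in lengths.items() if lo <= length <= hi]
    frags = [n for n, length in lengths.items() if not (lo <= length <= hi)]
    return full, frags

reads = {
    "a": "ACGTACGTAC",    # 10 nt
    "b": "ACGTACGTTT",    # 10 nt
    "c": "ACGTACGTA",     # 9 nt
    "d": "ACG",           # 3 nt, likely a fragment
    "e": "ACGTACGTACGT",  # 12 nt
}
full, frags = split_backbone_candidates(reads)
print(full, frags)  # ['a', 'b', 'c', 'e'] ['d']
```

With a median of 10 nt, the accepted range is 7.5 to 12.5 nt, so only `d` is flagged. A length-only rule is cheap on ultra-large inputs but cannot detect full-length sequences that are simply divergent.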
3
UPP Algorithmic Strategy
4
RNASim: alignment error
Note: All methods were given 24 hours on a 12-core machine. MAFFT fails to complete on 200K sequences, and Clustal-Omega completes only on the 10K dataset.
1 million-sequence RNASim dataset: UPP(Fast) generated an alignment in 12 days, compared to 15 days for PASTA. UPP(Fast) produced a better alignment (5.7% lower error), but PASTA produced a better tree (1.5% lower error).
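The slide does not define "alignment error". A common convention for comparing an estimated alignment to a reference, assumed here, is the average of the SP-FN rate (true homologous residue pairs missing from the estimate) and the SP-FP rate (estimated pairs absent from the reference). The minimal sketch below computes both from two alignments stored as name-to-gapped-string dictionaries; the function names are hypothetical.

```python
from itertools import combinations

def homologies(aln):
    """Set of homologous residue pairs implied by an alignment.

    `aln` maps sequence name -> gapped string (all rows the same length).
    A pair ((nameA, i), (nameB, j)) means residue i of A and residue j of B
    share a column; residue indices ignore gaps.
    """
    names = sorted(aln)
    counters = {n: 0 for n in names}  # next ungapped residue index per sequence
    pairs = set()
    for col in range(len(aln[names[0]])):
        present = []
        for n in names:
            if aln[n][col] != "-":
                present.append((n, counters[n]))
                counters[n] += 1
        pairs.update(combinations(present, 2))  # names sorted -> canonical order
    return pairs

def alignment_error(true_aln, est_aln):
    """Return (SP-FN, SP-FP, mean of the two) for an estimated alignment."""
    t, e = homologies(true_aln), homologies(est_aln)
    if not t or not e:
        raise ValueError("both alignments must imply at least one homology")
    shared = len(t & e)
    spfn = 1 - shared / len(t)  # reference homologies the estimate missed
    spfp = 1 - shared / len(e)  # estimated homologies not in the reference
    return spfn, spfp, (spfn + spfp) / 2

true_aln = {"A": "AC-GT", "B": "ACTGT"}
est_aln = {"A": "ACG-T", "B": "ACTGT"}
print(alignment_error(true_aln, est_aln))  # (0.25, 0.25, 0.25)
```

Here the estimate misplaces one of the four true pairs (the `G` of A), giving 25% on both rates. Pair-based scores like these need no tree, which is why alignment error and tree error can disagree, as in the 1-million-sequence comparison above.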
5
Running Time
Wall-clock time used (in hours), given 12 processors
6
Ensemble of HMMs
Use a collection of HMMs instead of a single HMM to represent a backbone alignment.
Improves alignment accuracy, which can lead to better downstream analyses:
– Phylogenetic placement (SEPP; PSB 2012)
– Taxonomic identification (TIPP; Bioinformatics 2014)
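One natural way to use an ensemble is to build one model per subset of the backbone, score each query against every model, and keep the best-scoring one. The sketch below illustrates only that selection step, under a loud simplification: each "HMM" is replaced by an ungapped position-specific log-odds profile (real profile HMMs also model insertions and deletions). Subsets, pseudocounts, and all function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def build_profile(rows, pseudocount=1.0, background=0.25):
    """Log-odds profile from equal-length ungapped rows (stand-in for an HMM)."""
    profile = []
    for col in zip(*rows):
        total = len(col) + pseudocount * len(ALPHABET)
        profile.append({
            c: math.log(((col.count(c) + pseudocount) / total) / background)
            for c in ALPHABET
        })
    return profile

def best_placement_score(profile, query):
    """Best ungapped score, sliding the shorter of profile/query along the longer."""
    w, q = len(profile), len(query)
    best = -math.inf
    if q >= w:
        for start in range(q - w + 1):
            best = max(best, sum(profile[k][query[start + k]] for k in range(w)))
    else:  # fragmentary query: place it inside the profile
        for start in range(w - q + 1):
            best = max(best, sum(profile[start + k][query[k]] for k in range(q)))
    return best

def assign_to_ensemble(ensemble, query):
    """Score the query against every profile; return (best index, all scores)."""
    scores = [best_placement_score(p, query) for p in ensemble]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Two hypothetical backbone subsets, each summarized by its own profile.
subsets = [
    ["ACGTAC", "ACGTAA", "ACGTAC"],
    ["TTGCAG", "TTGCTG", "TAGCAG"],
]
ensemble = [build_profile(rows) for rows in subsets]
best, scores = assign_to_ensemble(ensemble, "GTAC")  # a fragment
print(best, [round(s, 2) for s in scores])
```

The fragment `GTAC` matches the conserved tail of the first subset and scores poorly against the second, so it is assigned to model 0. A single profile over both subsets would average away exactly the columns that make this distinction, which is the intuition behind representing a divergent backbone with several local models.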