James A. Foster and Luke Sheneman
1 October 2008
Initiative for Bioinformatics and Evolutionary Studies (IBEST)
Guide Trees and Progressive Multiple Sequence Alignment

Multiple Sequence Alignment
- Abstract representation of sequence homology
- Homologous molecular characters (nucleotides/residues) organized in columns
- Gaps (-) represent sequence indels

Multiple Sequence Alignment
- Many bioinformatics analyses depend on MSA
- First step in inferring phylogenetic trees
  - MSA technique is at least as important as inference method and model parameters (Morrison & Ellis, 1997)
- Structural and functional sequence analyses

Progressive Alignment
- Idea: align “closely related” sequences first, two at a time, with “optimal” subalignments (dynamic programming)
- Problem: once a gap, always a gap
- Advantage: fast
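To make the role of the guide tree concrete, the sketch below shows only how a guide tree fixes the order of pairwise merges in a progressive alignment. It is an illustration under assumptions, not the procedure of any particular aligner: the tree is assumed to be given as nested 2-tuples with sequence names at the leaves, and `merge_order` is a hypothetical helper. No actual alignment is performed.

```python
def merge_order(tree):
    """Return the merges a progressive aligner would perform, in order.

    `tree` is a guide tree written as nested 2-tuples with sequence names
    (str) at the leaves. Each merge is a pair (left_leaves, right_leaves):
    the two groups of sequences whose subalignments are aligned together.
    """
    merges = []

    def visit(node):
        if isinstance(node, str):      # leaf: a single sequence
            return (node,)
        left, right = node
        left_leaves = visit(left)      # finish both subtrees first ...
        right_leaves = visit(right)
        merges.append((left_leaves, right_leaves))  # ... then merge them
        return left_leaves + right_leaves

    visit(tree)
    return merges


guide_tree = ((("A", "B"), "C"), ("D", "E"))
for left, right in merge_order(guide_tree):
    print("align", left, "with", right)
```

Because each merge works on subalignments that are already frozen, a gap introduced when A and B are aligned is carried into every later merge. This is the "once a gap, always a gap" problem, and it is why the merge order set by the guide tree could plausibly affect the final alignment.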

Guide Trees and Alignment Quality
- How important is it to find “good” guide trees?
- How much time should be spent looking for “better” guide trees?

Hypothesis
- Guide trees that are closer to the true phylogeny lead to better sequence alignments
  - Guide trees that are further from the true tree produce less accurate alignments
  - The effect is measurable
  - The correlation is significant

Previous Work
- Folk wisdom, intuition: it matters, a lot
  - Basis for Clustal, and most other progressive MSA (pMSA) implementations
- Nelesen et al. (PSB ’08): doesn’t matter, much
  - No strong correlation
  - No large effect
- Edgar (2004): bad trees are sometimes better
  - UPGMA guide trees are ultrametric but outperform NJ

Experimental Design: strategy
- For both natural data and simulation data, with reliable alignments and phylogenies:
- Explore the space of possible guide trees, moving outward from the “true tree”
  - Use each tree as a guide tree, perform pMSA
  - Compare quality of resulting alignment with known optimal value

Experimental Design: Naturally Evolved Case

Experimental Design: Degrading Guide Trees
- Random Nearest Neighbor Interchange (NNI)
  - Swaps two neighboring subtrees across an internal branch
- Random Tree Bisect/Reconnect (TBR)
  - Randomly bisect the tree
  - Randomly reconnect the two resulting subtrees
- Images: hyphy.org
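The NNI degradation step can be sketched as a short random walk over tree topologies. This is a minimal illustration under assumptions and not the authors' code: trees are assumed to be rooted, binary, and written as nested 2-tuples, and `nni_neighbors`, `degrade`, and `leaves` are hypothetical names. TBR is not shown.

```python
import random


def nni_neighbors(tree):
    """Yield every tree that is one NNI away from `tree`.

    Trees are nested 2-tuples with str leaves. An NNI across the branch
    joining a node to an internal child swaps one of that child's
    subtrees with the child's sibling.
    """
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(left, str):      # branch to the left child
        a, b = left
        yield ((a, right), b)          # swap b with the sibling
        yield ((right, b), a)          # swap a with the sibling
    if not isinstance(right, str):     # branch to the right child
        a, b = right
        yield (a, (left, b))           # swap a with the sibling
        yield (b, (a, left))           # swap b with the sibling
    # NNIs that happen deeper inside either subtree.
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def degrade(tree, steps, seed=0):
    """Apply `steps` random NNIs to `tree` and return the result."""
    rng = random.Random(seed)
    for _ in range(steps):
        neighbors = list(nni_neighbors(tree))
        if not neighbors:              # trees with under 3 leaves have none
            break
        tree = rng.choice(neighbors)
    return tree


def leaves(tree):
    """Return the leaf names of `tree` from left to right."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


true_tree = ((("A", "B"), "C"), ("D", "E"))
print(degrade(true_tree, steps=3, seed=1))
```

A random walk can backtrack, so the number of steps taken is only an upper bound on the true NNI distance from the starting tree.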

TreeBASE (“natural”) Input Datasets

Experimental Design: Simulated Evolution Case

Conclusions
- Statistically significant correlation between guide tree quality and alignment quality
  - Independent of tree transformation operator
  - Independent of alignment distance metric
- But very small absolute change in quality
- Non-linear / logarithmic
  - Largest alignment quality effect 5-10 steps from phylogeny
- The lesson: improving an already very good guide tree helps; otherwise improvement helps, but only a little

Acknowledgements
- Dr. Luke Sheneman (mostly his slides!)
- Faculty, staff, and students of BCB
- Jason Evans
- Darin Rokyta
- Funding sources:
  - NIH P20 RR16454
  - NIH NCRR 1P20 RR16448
  - NSF EPS

Experimental Design: metrics
- Â = pmsa(S, T)
  - where S is the set of input sequences
  - where T is the guide tree
  - (hidden parameters: pairwise algorithm, tie breaking strategy)
- A_Q = CompareAlignments(A*, Â)
  - QSCORE(A*, Â) -> TC-error, SP-error
  - Nelesen had a nicer metric: error of estimated phylogeny
- T_dist = TreeDistance(T*, T)
  - Upper bound estimate of edit distance via NNI or TBR
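As a rough illustration of what SP-error and TC-error measure, the sketch below compares an estimated alignment against a reference alignment. It is not QSCORE's implementation and uses simplified definitions that are assumptions here. SP-error is taken as the fraction of residue pairs aligned in the reference that the estimate fails to align. TC-error is the fraction of reference columns (with at least two residues) not reproduced exactly. Alignments are assumed to be lists of equal-length strings with rows in the same order, and the reference is assumed to contain at least one aligned pair.

```python
from itertools import combinations


def columns(alignment):
    """Describe each column as a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)    # next residue index in each row
    cols = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column."""
    pairs = set()
    for col in columns(alignment):
        present = [(i, r) for i, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_error(reference, estimate):
    """Fraction of reference residue pairs missing from the estimate."""
    ref_pairs = aligned_pairs(reference)
    return 1.0 - len(ref_pairs & aligned_pairs(estimate)) / len(ref_pairs)


def tc_error(reference, estimate):
    """Fraction of reference columns not reproduced exactly in the estimate."""
    ref_cols = [c for c in columns(reference)
                if sum(r is not None for r in c) >= 2]
    est_cols = set(columns(estimate))
    return 1.0 - sum(c in est_cols for c in ref_cols) / len(ref_cols)


reference = ["AC-GT", "ACAGT"]   # plays the role of A*
estimate = ["ACG-T", "ACAGT"]    # plays the role of the estimated alignment
print(sp_error(reference, estimate), tc_error(reference, estimate))
```

With only two sequences every scored column holds exactly one residue pair, so the two errors coincide. They diverge on larger alignments, where a single misplaced residue spoils a whole column for TC but only some of its pairs for SP.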

Alternative Scoring metric
- Idea: “quality” of an alignment is distance from the phylogeny it produces to the “true” phylogeny
- A_Q = KTreeDist(ML_est(A*), ML_est(Â))
  - ML_est(A): max likelihood estimate of the phylogeny behind MSA A (we used RAxML)
  - KTreeDist(T1, T2): scales T2 to T1, measures Branch Length Distance (Soria-Carrasco et al. 2007; Kuhner & Felsenstein 1994)
- Data sets: from L1 sequences in mammals, bats, humans, hand aligned A*
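The scaling step behind a K-scaled branch length distance can be sketched as a least-squares fit. This is a minimal illustration and not the Ktreedist program itself. Each tree is assumed to be already reduced to a dictionary from a split (a frozenset of the leaf names on one side of a branch) to that branch's length, with a missing split counting as length 0. `k_score` is a hypothetical name. The scale factor K minimizes the distance between T1 and K·T2, which gives K = Σ b1·b2 / Σ b2².

```python
import math


def k_score(reference, comparison):
    """Return (K, distance) after scaling `comparison` to `reference`.

    Both trees are dicts mapping a split (frozenset of leaf names) to a
    branch length. The distance is the branch length distance between the
    reference and the K-scaled comparison tree.
    """
    splits = set(reference) | set(comparison)
    ref = [reference.get(s, 0.0) for s in splits]
    cmp_lengths = [comparison.get(s, 0.0) for s in splits]
    # Least-squares scale factor: minimizes sum((r - K * c) ** 2).
    k = (sum(r * c for r, c in zip(ref, cmp_lengths))
         / sum(c * c for c in cmp_lengths))
    dist = math.sqrt(sum((r - k * c) ** 2 for r, c in zip(ref, cmp_lengths)))
    return k, dist


t1 = {frozenset({"A"}): 0.1, frozenset({"B"}): 0.2, frozenset({"A", "B"}): 0.3}
# Same topology, every branch twice as long: scaling should remove the difference.
t2 = {split: 2 * length for split, length in t1.items()}
print(k_score(t1, t2))
```

Scaling first means two trees that differ only in overall evolutionary rate score as nearly identical, so the remaining distance reflects differences in topology and relative branch lengths.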

All methods are pretty good