C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Lecture 6 – 07/01/08 Multiple sequence alignment 2 Sequence analysis 2007 Optimizing.

Slides:



Advertisements
Similar presentations
Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group
Advertisements

Alignments Why do Alignments?. Detecting Selection Evolution of Drug Resistance in HIV.
1 Profile Hidden Markov Models For Protein Structure Prediction Colin Cherry
COFFEE: an objective function for multiple sequence alignments
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Molecular Evolution Revised 29/12/06
1 “INTRODUCTION TO BIOINFORMATICS” “SPRING 2005” “Dr. N AYDIN” Lecture 4 Multiple Sequence Alignment Doç. Dr. Nizamettin AYDIN
Heuristic alignment algorithms and cost matrices
Sequence analysis lecture 6 Sequence analysis course Lecture 6 Multiple sequence alignment 2 of 3 Multiple alignment methods.
Sequence analysis course Lecture 7 Multiple sequence alignment 3 of 3 Optimizing progressive multiple alignment methods.
1 Protein Multiple Alignment by Konstantin Davydov.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Multiple alignment June 29, 2007 Learning objectives- Review sequence alignment answer and answer questions you may have. Understand how the E value may.
What you should know by now Concepts: Pairwise alignment Global, semi-global and local alignment Dynamic programming Sequence similarity (Sum-of-Pairs)
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
Multiple alignment: heuristics
Multiple sequence alignment
Sequence Alignment III CIS 667 February 10, 2004.
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 23rd, 2014.
Multiple Sequence Alignments
CECS Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka Lecture 3: Multiple Sequence Alignment Eric C. Rouchka,
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Lecture 6 – 16/11/06 Multiple sequence alignment 1 Sequence analysis 2006 Multiple.
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
Introduction to Bioinformatics From Pairwise to Multiple Alignment.
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple sequence alignment
Biology 4900 Biocomputing.
Multiple Sequence Alignment
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 24th, 2013.
Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, ,1995)
Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due.
Multiple sequence alignment (MSA) Usean sekvenssin rinnastus Petri Törönen Help contributed by: Liisa Holm & Ari Löytynoja.
Protein Sequence Alignment and Database Searching.
Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Sequence Alignment Only things that are homologous should be compared in a phylogenetic analysis Homologous – sharing a common ancestor This is true for.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight.
Multiple sequence alignment
Copyright OpenHelix. No use or reproduction without express written consent1.
MUSCLE An Attractive MSA Application. Overview Some background on the MUSCLE software. The innovations and improvements of MUSCLE. The MUSCLE algorithm.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Techniques for Protein Sequence Alignment and Database Searching (part2) G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Medical Natural Sciences Year 2: Introduction to Bioinformatics Lecture 9: Multiple sequence alignment (III) Centre for Integrative Bioinformatics VU.
Burkhard Morgenstern Institut für Mikrobiologie und Genetik Grundlagen der Bioinformatik Multiples Sequenzalignment Juni 2007.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
MGM workshop. 19 Oct 2010 Some frequently-used Bioinformatics Tools Konstantinos Mavrommatis Prokaryotic Superprogram.
1 Multiple Sequence Alignment(MSA). 2 Multiple Alignment Number of sequences >2 Global alignment Seek an alignment that maximizes score.
Introduction to bioinformatics Lecture 7 Multiple sequence alignment (1)
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
T-COFFEE, a novel method for combining biological information Cédric Notredame.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Biology 224 Instructor: Tom Peavy October 18 & 20, Multiple Sequence.
Multiple Sequence Alignment Dr. Urmila Kulkarni-Kale Bioinformatics Centre University of Pune
Topic 3: MSA Iterative Algorithms in Multiple Sequence Alignment Prepared By: 1. Chan Wei Luen 2. Lim Chee Chong 3. Poon Wei Koot 4. Xu Jin Mei 5. Yuan.
Pairwise alignment Now we know how to do it: How do we get a multiple alignment (three or more sequences)? Multiple alignment: much greater combinatorial.
Multiple sequence alignment (msa)
The ideal approach is simultaneous alignment and tree estimation.
Multiple Sequence Alignment
Introduction to bioinformatics 2007 Lecture 9
Introduction to bioinformatics Lecture 8
Presentation transcript:

C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Lecture 6 – 07/01/08 Multiple sequence alignment 2 Sequence analysis 2007 Optimizing progressive multiple alignment methods

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [2] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Pair-wise alignment quality versus sequence identity Vogt et al., JMB 249, ,1995

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [3] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Do you remember the steps of the progressive alignment algorithm ?

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [4] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Remember the greediness of the progressive alignment algorithm? d root W C V V A Y W W C V V A Y F W C M V L Y W W C M V L Y F W C V V A Y W W C V V A Y F W C M V L Y W W C M V L Y F W C V V A Y W W C V V A Y F W C M V L Y W W C M V L Y F W C I V A L Y W ?

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [5] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Strategies for progressive alignment optimization Heuristics Profile pre-processing Secondary structure-guided alignment Globalised local alignment Matrix extension Objective: try to avoid (early) errors

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [6] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Clustal, ClustalW, ClustalX Probably most well-known progressive alignment program Neighbour Joining (NJ) algorithm (Saitou and Nei, 1984) to construct guide tree NJ does not require that all lineages have diverged by equal amounts Sequence blocks are represented by profiles Individual sequences are additionally weighted according to the branch lengths in the NJ tree Higgins et al. 1994

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [7] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Clustal, ClustalW, ClustalX Further carefully crafted heuristics include: local gap penalties automatic selection of the amino acid substitution matrix automatic gap penalty adjustment mechanism to delay alignment of sequences that appear to be distant at the time they are considered. Clustal (W/X) does not allow iteration (Hogeweg and Hesper, 1984; Corpet, 1988, Gotoh, 1996; Heringa, 1999, 2002) Still limited by “greedy” nature of progressive algorithm

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [8] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 ClustalW web-interface

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [9] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Protein structure hierarchical levels (a) Primary structure —Ala—Glu—Val — Thr—Asp—Pro—Gly— (b) Secondary structure (c) Tertiary structure (fold)(d) Quartenary structure (oligomers) α helix β sheet

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [10] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Flavodoxin fold: helix-beta-helix mainly bacterial electron-transfer proteins (transport systems) alpha-helical layers sandwiching 5 stranded parallel beta-sheet

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [11] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Flavodoxin cheY NJ tree

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [12] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 ClustalW Flavodoxin-cheY

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [13] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 T-COFFEE Performs progressive alignment as in Clustal But there are many differences … Integrating different pair-wise alignment techniques (NW, SW,..) Combining different multiple alignment methods (consensus multiple alignment) Combining sequence alignment methods with structural alignment techniques Plug in user knowledge Notredame, Higgins and Heringa 2000

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [14] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 T-COFFEE Step1: Global alignment: Clustal Local alignment: Lalign (pick top ten) Step 2: Put collection of local & global in a library Step 3: Extend the library by aligning each pairwise alignment with a possible third sequence (Search matrix or library extension) Notredame, Higgins and Heringa 2000

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [15] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Search matrix extension

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [16] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Search matrix extension

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [17] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Search matrix extension

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [18] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 So for T-COFFEE … Original pairwise alignments are refined based on the consistency with other “intermediate” sequences Distance matrix and guide tree are built based on the refined alignments

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [19] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 The T-COFFEE effect Although a direst alignment might choose one path, using information from the other sequences (T-COFFEE) finds an alternate and usually better one Minimize errors in the early stages of alignment assembly Extra computation time for the calculation of the consistency scores

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [20] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 but … T-COFFEE (v1.23) Flavodoxin-cheY

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [21] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 T-COFFEE web-interface

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [22] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis D-COFFEE Computes structural based alignments Structures related to the sequences are retrieved More accurate … but for many (many) proteins we do not have the structure!

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [23] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 PRALINE Again a progressive alignment method Like in T-COFFEE the idea is minimise error- propagation by including prior knowledge about the other sequences when making pairwise alignments PRALINE does this by making pre-profiles: the sequences to be aligned are represented by pre-profiles instead of single sequences Heringa 1999

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [24] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Pre-profile generation

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [25] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 PRALINE Flavodoxin-cheY Pre-processing (cut-off  1500)

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [26] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Using structural information Structure is more conserved than sequence CASP (Critical Assessment of Techniques for Protein Structure Prediction) Tertiary structure prediction: de novo prediction is not accurate at all ! (especially for larger folds)

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [27] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Using secondary structure 10 years SS prediction method development: Accuracy ± 5% 10 years MSA method development: Accuracy can be ± 40% Amino acid patterns Secondary structure patterns Super-secondary structure patterns Alternate matrices with associated gap penalties according to region

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [28] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 How to combine ss and aa info Dynamic programming search matrix Amino acid substitution matrices MDAGSTVILCFV HHHCCCEEEEEE M D A A S T I L C G S H H H H C C E E E C C C H E HC E Default

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [29] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 In terms of scoring … So how would you score a profile using this extra information? Same formula as in lecture 6, but you can use sec. Structure-specific substitution scores in various combinations. Where does it fit in? Very important: structure is always more conserved than sequence so it can help with the insertion(or not) of gaps.

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [30] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 SS-PRALINE Sequences to be aligned Predict secondary structure or DSSP HHHHCCEEECCCEEECCHH HHHCCCCEECCCEEHHH HHHHHHHHHHHHHCCCEEEE CCCCCCEECCCEEEECCHH HHHHHCCEEEECCCEECCC Align sequences using secondary structure Secondary structure Multiple alignment

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [31] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Using predicted secondary structure

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [32] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 PRALINE web-interface

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [33] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Multiple alignment methods  Multi-dimensional dynamic programming  Progressive alignment  Iterative alignment correct for problems with progressive alignment by repeatedly realigning subgroups of sequence From here on refer to the PRALINE paper for help (further reading section online)

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [34] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Iterative alignment Idea: an optimal might be found by repeatedly modifying existing suboptimal solutions Start with producing low-quality alignment and gradually improve by iterative refinement Continue until no more improvements in the alignment scores can be achieved

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [35] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Iteration fates Convergence Limit cycle Divergence

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [36] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Consistency scoring

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [37] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Consistency iteration Pre - - profiles Multiple alignment positional consistency scores

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [38] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Pre-profile update iteration Pre - - profiles Multiple alignment

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [39] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Is the initial SS prediction good enough?

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [40] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 SS iteration

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [41] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 MUSCLE Edgar 2004

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [42] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 PRALINE and MUSCLE method PRALINE and MUSCLE use almost the same formalism to compare two profiles: MUSCLE: PRALINE: The difference is the position of the log in the above equations: Edgar calls the Muscle scoring scheme “Log-expectation score (LE)”

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [43] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 So what do we do ? A single shot for a good alignment without thinking: MUSCLE, T-COFFEE (maybe POA) If you want to experiment with making alignments for a given sequence set: PRALINE Profile pre-processing Iteration Secondary structure-induced alignment Globalised local alignment There is no single method that always generates the best alignment Therefore best is to use more than one method: e.g. include Dialign2 (local)

CENTRFORINTEGRATIVE BIOINFORMATICSVU E [44] 07/01/2008 – Multiple sequence alignment 2 – Sequence analysis 2007 Summary Weighting schemes simulating simultaneous multiple alignment Profile pre-processing (global/local) Matrix extension (well balanced scheme) Using additional information secondary structure driven alignment Schemes strike balance between speed and sensitivity