Lab 4.1: Multiple Sequence Alignment
Jennifer Gardy, Molecular Biology & Biochemistry, Simon Fraser University



Goals
Learn the basics of multiple sequence alignments (MSAs) and the Clustal program
Understand how alignment settings can significantly affect an alignment
Complete questions 1 & 2 in the phylogeny assignment

Outline
MSAs:
  – Purpose
  – Automated alignment considerations
  – Clustal’s alignment strategy
  – Manual editing
Research Question
ClustalX with default parameters
Varying alignment settings
Deleting sequences/regions of sequences

MSAs: A Quick Review
Why perform an MSA?
  – Visualize trends between homologous sequences
      – Shared regions of homology
      – Regions unique to a sequence within a family
      – Structural/functional motif
  – As the first step in a phylogenetic analysis
  – Useful for improving accuracy of structure predictions
How does one perform an MSA?
  – By hand: too hard!
  – Automated alignment: Fast, but doesn’t necessarily produce the “correct” alignment
Best approach = Automated alignment with manual editing

Automated alignment
Technical considerations:
  – Select sequences carefully
      – Homologous over length, no unrelated sequences
      – The algorithm will align everything you give it!
  – Use an appropriate objective function (OF)
      – Most common = simple sum-of-pairs w/ gap penalties
      – Not evolutionarily ideal, but shown to perform well
  – Computational intensity
      – No current methods guarantee full optimization
      – 3 categories of heuristics:
          – Exact: close to optimal, can only use small number of sequences and sum-of-pairs OF
          – Progressive: most common, adds sequences to an alignment one-by-one, fast, no great potential for optimization
          – Iterative: produces an alignment, refines it through a series of cycles until no more improvements can be made
“Recent progress in MSAs: a survey.” C. Notredame. Pharmacogenomics. PMID:
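A minimal sketch of a sum-of-pairs objective function, to make the idea concrete. The match, mismatch and gap values are assumptions chosen for illustration, and the flat per-column gap score is a simplification. Real tools, including Clustal, use substitution matrices and separate gap opening/extension penalties.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters from the same alignment column.

    The default values are illustrative assumptions, not Clustal's.
    """
    if a == "-" and b == "-":
        return 0  # two gaps facing each other carry no information
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["ELVIS", "LIVES", "EVILS"]))
```

Because every pair of rows is scored in every column, the work grows with the square of the number of sequences, which is one reason exact optimization is only practical for small inputs.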

Clustal
One of the most common MSA tools
Uses sum-of-pairs with gaps OF
Progressive alignment strategy:
  – Sequences used to make guide tree
  – Least dissimilar 2 seqs aligned, make consensus
  – Next closest seq aligned to consensus

Manual Editing
“Human-assisted quasi-optimization”:
  – Fine adjustment of particular columns
      – May incorporate specific knowledge about sequences
  – Removal of gappy bits
      – Important for phylogenetic analysis
  – Removal of parts of sequences or whole sequences
      – Non-homologous regions
      – Sequence included by error
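Removing "gappy bits" can also be done programmatically. The sketch below drops every column whose gap fraction exceeds a threshold. The function name and the 0.5 default are assumptions for illustration, not settings taken from ClustalX.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Return the alignment without columns that are mostly gaps.

    alignment: list of equal-length strings, '-' marks a gap.
    max_gap_fraction: hypothetical cutoff; columns above it are removed.
    """
    n_rows = len(alignment)
    kept = [
        column
        for column in zip(*alignment)
        if column.count("-") / n_rows <= max_gap_fraction
    ]
    if not kept:
        return ["" for _ in alignment]
    # Transpose the kept columns back into rows.
    return ["".join(chars) for chars in zip(*kept)]


if __name__ == "__main__":
    for row in drop_gappy_columns(["MK-LV", "MKALV", "M--LV"]):
        print(row)
```

A stricter threshold removes more columns, which trades alignment length for confidence in the remaining positions.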

Research Question: Background
[Figure labels: Bacterial cell; Peptidoglycan; Peptidoglycan-associated Lipoproteins (PAL proteins)]
What part of the PAL protein is involved in peptidoglycan binding?

Research Question: Strategy
Used 1 PAL protein you identified to search NCBI databases for more PAL family proteins
Found 4 more proteins from different bacteria
Do all 5 sequences contain a domain that may be involved in peptidoglycan binding?
Where in these proteins is this domain located?
Which residues in particular would you potentially target for further laboratory study for their possible role in PG binding?
Next Step = Multiple Sequence Alignment

Starting up ClustalX
Day 4 website > PALproteins.txt
Start ClustalX: $ clustalx
[Screenshot labels: Name Window; Sequence Window]
File:
  – Load sequences
Edit:
  – Remove all gaps
Alignment:
  – Do complete alignment
  – Alignment parameters
Trees:
  – Bootstrapped NJ
  – Output format options

Starting up ClustalX
File > Load sequences > PALproteins.txt
Examine the sequences:
  – How are unaligned sequences displayed?
  – Do the sequences look similar to each other?

PAL Proteins in ClustalX
Left-aligned, in order of input
Default colouring (identity) – see help file for details
Conservation score graph
One long sequence

Let’s Do An Alignment!
Alignment > Do complete alignment
Generates an .aln file

Examine Your Alignment
  – Is there a difference in the order of the sequences?
  – Could the order of the input sequences affect your alignment?
  – What effect does the large N-terminal domain have on your alignment?
  – What effect will increasing the gap penalty have on your alignment? Decreasing it?

Sequence Order
Order has changed, & input order affects alignment:
  – Clustal’s “pairwise” strategy generates similarity values for each pair of sequences
      – The most similar pair is selected to build a consensus
      – The consensus is re-compared to the other sequences and new similarity values are generated
      – Lather, rinse, repeat
  – BUT… if two sequences have equal similarity values, Clustal orders them based on the order they were inputted in!
Let’s see that in pictorial form…
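The tie-breaking effect described here can be sketched in a few lines. This only illustrates the slide's claim and is not Clustal's actual code. The sequence names and all dissimilarity values are hypothetical. The key detail is the strict `<` comparison, which makes the first pair encountered in input order win a tie.

```python
from itertools import combinations


def closest_pair(names, dissimilarity):
    """Return the least dissimilar pair, scanning pairs in input order.

    names: sequence names in the order they were loaded.
    dissimilarity: dict mapping frozenset({a, b}) to a number.
    A strict '<' means the earliest pair keeps its place on a tie.
    """
    best, best_value = None, float("inf")
    for a, b in combinations(names, 2):
        value = dissimilarity[frozenset((a, b))]
        if value < best_value:
            best, best_value = (a, b), value
    return best, best_value


# Hypothetical values chosen so that B/C and B/D tie.
D = {
    frozenset("AB"): 0.7, frozenset("AC"): 0.8, frozenset("AD"): 0.45,
    frozenset("BC"): 0.2, frozenset("BD"): 0.2, frozenset("CD"): 1.0,
}

if __name__ == "__main__":
    print(closest_pair(["A", "B", "C", "D"], D))  # B and C are merged first
    print(closest_pair(["A", "B", "D", "C"], D))  # B and D are merged first
```

Swapping the input order of C and D changes which pair is merged first, even though the dissimilarity values are identical.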

Sequence Order
[Figure: dissimilarity matrices for sequences A, B, C and D, before and after merging B and C. Legible entries: A/B = .7, A/C = .8, B/C = .2; after merging, BC/A = .75, BC/D = .6, A/D = .45]
BC and BD both show the lowest dissimilarity index
However the BC and BD consensus sequences can be quite different:
  B = ELVIS
  C = LIVES
  D = EVILS
  BC = ELVIS + LIVES = --V-S
  BD = ELVIS + EVILS = E---S
Affects further similarity calculation
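The two consensus strings on this slide can be reproduced with a strict position-by-position comparison. This is a minimal sketch of that toy rule only. The function name is an assumption, and Clustal's real profile alignment is considerably more involved.

```python
def strict_consensus(seq_a, seq_b):
    """Keep a character where both sequences agree, '-' elsewhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be the same length")
    return "".join(a if a == b else "-" for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    B, C, D = "ELVIS", "LIVES", "EVILS"
    print("BC =", strict_consensus(B, C))
    print("BD =", strict_consensus(B, D))
```

Whichever pair is merged first, the remaining sequences are then compared against that consensus, so the two choices lead to different downstream similarity values.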

Unusually Long Sequences
Including 1 much longer sequence may affect the alignment:
  – Evolutionarily, it indicates an insertion or deletion event
  – Not part of the homologous region(s)
  – Program will attempt to align it anyway
  – N-terminal aligned regions are unreliable

Gap Penalties
Shift-click each sequence name to select
Edit > Remove all gaps
Alignment > Alignment parameters > Multiple alignment parameters
Try a Gap Opening Penalty of 1, then 30
Answer Question 1 in the phylogeny assignment
Important: Every time you make a new alignment, a new .aln file will be created. If you do not change the filename, the previous file will be overwritten.
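To build intuition for why the gap opening penalty matters, the sketch below costs the gaps in a single aligned row under a simple affine scheme. The formula (opening penalty plus extension penalty times run length), the function name and the 0.2 extension value are assumptions for illustration. Clustal additionally adjusts its penalties by position and residue, which this ignores.

```python
import re


def gap_cost(row, gap_open, gap_extend):
    """Total affine cost of all gap runs in one aligned row.

    Each run of '-' costs gap_open + gap_extend * run_length
    (an assumed convention, simplified relative to Clustal).
    """
    runs = re.findall(r"-+", row)
    return sum(gap_open + gap_extend * len(run) for run in runs)


if __name__ == "__main__":
    one_long_gap = "AB---CD"      # a single run of three gaps
    three_short_gaps = "A-B-C-D"  # three runs of one gap each
    for gap_open in (1, 30):
        print(
            gap_open,
            gap_cost(one_long_gap, gap_open, 0.2),
            gap_cost(three_short_gaps, gap_open, 0.2),
        )
```

Both rows contain three gap characters, so they differ only in how many gaps are opened. The scattered version costs two extra opening penalties, so raising the opening penalty from 1 to 30 pushes an aligner toward fewer, longer gaps.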

The Effect of Removing Sequences
Open PALproteins.txt in an editor
Delete CmPAL and YpLIP, save the file
Load this file in ClustalX
Do an alignment with the default parameters
Print this alignment, answer Question 2
  – What effect did removing the sequences have on your alignment?
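The same deletion can be scripted instead of done in an editor. This sketch assumes PALproteins.txt is in FASTA format with the sequence name as the first word of each header line, which the slides do not state. The example records and residues below are placeholders, not the real PAL sequences.

```python
def drop_records(fasta_text, unwanted):
    """Return FASTA text without the records named in `unwanted`.

    Assumes each record starts with '>' followed by its name.
    """
    kept_lines = []
    keep = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name not in unwanted
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


if __name__ == "__main__":
    # Placeholder input; only CmPAL and YpLIP are names from the lab.
    example = ">SeqOne demo\nMKT\nLLA\n>CmPAL\nMQQ\n>YpLIP\nMRR\n>SeqTwo\nMSS\n"
    print(drop_records(example, {"CmPAL", "YpLIP"}), end="")
```

Writing the result to a new filename keeps the original five-sequence file available for comparison.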

The Effect of Removing Sequences
Increased N-terminal alignment
What might this indicate?
Signal peptide
Not a meaningful homologous sequence
Best to remove such regions:
  – Signal peptides
  – Other domains

Remainder of Lab Time
Finish your assignment questions
  – Q1: Effect of changing gap penalties (have your team try out different values)
  – Q2: Annotated printout
Begin the MSA for Module 3 of the Integrated Assignment (Section 3.2, Task 1)
  – Need to have completed Module 2
  – You have PLENTY of time for the IA and if you’d like to save it for later, that’s OK!!!
Use Clustal to check out your favourite gene/protein family
Try web-based Clustal:
  –