Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due.

Slides:



Advertisements
Similar presentations
Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
Measuring the degree of similarity: PAM and blosum Matrix
Phylogenetic reconstruction
DNA sequences alignment measurement
Molecular Evolution Revised 29/12/06
Multiple alignment: heuristics. Consider aligning the following 4 protein sequences S1 = AQPILLLV S2 = ALRLL S3 = AKILLL S4 = CPPVLILV Next consider the.
Heuristic alignment algorithms and cost matrices
. Class 5: Multiple Sequence Alignment. Multiple sequence alignment VTISCTGSSSNIGAG-NHVKWYQQLPG VTISCTGTSSNIGS--ITVNWYQQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Multiple alignment June 29, 2007 Learning objectives- Review sequence alignment answer and answer questions you may have. Understand how the E value may.
Introduction to Bioinformatics Algorithms Sequence Alignment.
What you should know by now Concepts: Pairwise alignment Global, semi-global and local alignment Dynamic programming Sequence similarity (Sum-of-Pairs)
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures CLUSTAL W Algorithm Lecturer:
Sequence Analysis Tools
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
Sequence similarity.
Multiple sequence alignments and motif discovery Tutorial 5.
Multiple alignment: heuristics
Multiple sequence alignment
BLAST and Multiple Sequence Alignment
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Multiple Sequence Alignments
Introduction to Bioinformatics From Pairwise to Multiple Alignment.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple sequence alignment
Biology 4900 Biocomputing.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 24th, 2013.
Multiple sequence alignment (MSA) Usean sekvenssin rinnastus Petri Törönen Help contributed by: Liisa Holm & Ari Löytynoja.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Eidhammer et al. Protein Bioinformatics Chapter 4 1 Multiple Global Sequence Alignment and Phylogenetic trees Inge Jonassen and Ingvar Eidhammer.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Chapter 8 Molecular Phylogenetics: Measuring Evolution.
Construction of Substitution Matrices
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Sequence Alignment Only things that are homologous should be compared in a phylogenetic analysis Homologous – sharing a common ancestor This is true for.
Multiple Sequence Alignment. How to score a MSA? Very commonly: Sum of Pairs = SP Compute the pairwise score of all pairs of sequences and sum them. Gap.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight.
Basic terms:  Similarity - measurable quantity. Similarity- applied to proteins using concept of conservative substitutions Similarity- applied to proteins.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Burkhard Morgenstern Institut für Mikrobiologie und Genetik Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen WS 2006/2007.
Construction of Substitution matrices
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
1 Multiple Sequence Alignment(MSA). 2 Multiple Alignment Number of sequences >2 Global alignment Seek an alignment that maximizes score.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
Sequence Alignment. Assignment Read Lesk, Problem: Given two sequences R and S of length n, how many alignments of R and S are possible? If you.
BIOINFORMATICS Ayesha M. Khan Spring Lec-6.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
BLAST and Psi-BLAST and MSA Nov. 1, 2012 Workshop-Use BLAST2 to determine local sequence similarities. Homework #6 due Nov 8 Chapter 5, Problem 8 Chapter.
Multiple Sequence Alignment Dr. Urmila Kulkarni-Kale Bioinformatics Centre University of Pune
Multiple sequence alignment (msa)
Multiple Sequence Alignment
Presentation transcript:

Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due May 26

Multiple Sequence Alignment Collection of three or more amino acid (or nucleic acid) sequences partially or completely aligned. Aligned residues tend to occupy corresponding positions in the 3-D structure of each aligned protein.

General steps to multiple alignment. Create Alignment Edit the alignment to ensure that regions of functional or structural similarity are preserved Phylogenetic Analysis Structure Analysis Find conserved motifs to deduce function Design of PCR primers USED FOR:

Practical use of MSA Helps to place protein into a group of related proteins. It will provide insight into function, structure and evolution. Helps to detect homologs Identifies sequencing errors Identifies important regulatory regions in the promoters of genes.

Clustal W (Thompson et al., 1994) CLUSTAL=Cluster alignment The underlying concept is that groups of sequences are phylogenetically related. If they can be aligned, then one can construct a phylogenetic tree. Phylogenetic tree-a tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor.

Flowchart of computation steps in Clustal W (Thompson et al., 1994) Pairwise alignment: calculation of distance matrix Creation of unrooted neighbor-joining tree Rooted NJ tree (guide tree) and calculation of sequence weights Progressive alignment following the guide tree

Preliminary pairwise alignments Compare each pair of sequences. A - B.87 - C A B C Each number represents the number of exact matches divided by the sequence length (ignoring gaps). Thus, the higher the number the more closely related the two sequences are. In this matrix, sequence A is 87% identical to sequence B Different sequences

Step 1-Calculation of Distance Matrix Use the Distance Matrix to create a Guide Tree to determine the “order” of the sequences. I = D = 1 – (I) D = Difference score # of identical aa’s in pairwise global alignment total number of aa’s in shortest sequence Hbb-Hu1- Hbb-Ho2.17- Hba-Hu Hba-Ho Myg-Ph Gib-Pe Lgb-Lu

Step 2-Create an unrooted NJ tree Hba-Ho Hba-Hu Hbb-Ho Hbb-Hu Myg-Ph Gib-Pe Lgb-Lu

Step 3-Create Rooted NJ Tree Weight Alignment Order of alignment: 1 Hba-Hu vs Hba-Ho 2 Hbb-Hu vs Hbb-Ho 3 A vs B 4 Myg-Ph vs C 5 Gib-Pe vs D 6 Lgh-Lu vs E

Step 4-Progressive alignment

Scoring during progressive alignment

Rules for alignment Short stretches of 5 hydrophilic residues often indicate loop or random coil regions (not essential for structure) and therefore gap penalties are reduced reduced for such stretches. Gap penalties for closely related sequences are lowered compared to more distantly related sequences (“once a gap always a gap” rule). It is thought that those gaps occur in regions that do not disrupt the structure or function. Alignments of proteins of known structure show that proteins gaps do not occur more frequently than every eight residues. Therefore penalties for gaps increase when required at 8 residues or less for alignment. This gives a lower alignment score in that region. A gap weight is assigned after each aa according the frequency that such a gap naturally occurs after that aa in nature

Amino acid weight matrices As we know, there are many scoring matrices that one can use depending on the relatedness of the aligned proteins. As the alignment proceeds to longer branches the aa scoring matrices are changed to more divergent scoring matrices. The length of the branch is used to determine which matrix to use and contributes to the alignment score.

Example of Sequence Alignment using Clustal W Asterisk represents identity : represents high similarity. represents low similarity

Multiple Alignment Considerations Quality of guide tree. It would be good to have a set of closely related sequences in the alignment to set the pattern for more divergent sequences. If the initial alignments have a problem, the problem is magnified in subsequent steps. CLUSTAL W is best when aligning sequences that are related to each other over their entire lengths Do not use when there are variable N- and C- terminal regions If protein is enriched for G,P,S,N,Q,E,K,R then these residues should be removed from gap penalty list. (what types of residues are these?) Reference: