
©CMBI 2001

Alignment
Most alignment programs create an alignment that represents what happened during evolution at the DNA level. To carry information over from a well-studied sequence to a newly determined one, however, we need an alignment that represents the protein structures as they are today.

The amino acids
Most of the information that enters the alignment procedure comes from the physico-chemical properties of the amino acids. Example: which is the better alignment of CPISRTLFRCW against CPISRTWASIFRCW, the left or the right one?

Left:             Right:
CPISRTWASIFRCW    CPISRTWASIFRCW
CPISRT---LFRCW    CPISRTL---FRCW
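The sketch below makes the question concrete. It scores both alignments with a handful of BLOSUM62 entries, covering only the pairs that occur here, plus an affine gap penalty. The gap values (open 10, extend 1) are an assumption and do not come from the slide. Both alignments contain exactly one gap of length 3, so the gap cost cancels in the comparison. The only difference is L opposite I (BLOSUM62: +2) versus L opposite W (BLOSUM62: -2).

```python
# Excerpt of BLOSUM62: only the residue pairs that occur in the two alignments.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9, ("P", "P"): 7, ("I", "I"): 4, ("S", "S"): 4,
    ("R", "R"): 5, ("T", "T"): 5, ("F", "F"): 6, ("W", "W"): 11,
    ("L", "I"): 2, ("L", "W"): -2,
}
GAP_OPEN, GAP_EXTEND = 10, 1  # assumed affine gap penalties, not from the slide


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def alignment_score(top, bottom):
    """Sum substitution scores; charge GAP_OPEN for a gap's first column, GAP_EXTEND after."""
    score, in_gap = 0, False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score -= GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            score += pair_score(a, b)
            in_gap = False
    return score


TOP = "CPISRTWASIFRCW"
LEFT = "CPISRT---LFRCW"   # L opposite I
RIGHT = "CPISRTL---FRCW"  # L opposite W

print(alignment_score(TOP, LEFT), alignment_score(TOP, RIGHT))  # 55 51
```

Under these assumptions the left alignment wins by 4 points. L and I are both aliphatic hydrophobic residues, while W is large and aromatic.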

A difficult alignment problem
AYAYAYAYSY
LGLPLPLPLP

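A difficult alignment problem solved
AYAYAYAYSY
AGAPAPAPSP
LGLPLPLPLP
The intermediate sequence shares A and S with the first sequence and G and P with the last one, so it shows how the first and last sequences should be aligned.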
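Counting identical positions makes this explicit. The sketch below is minimal and assumes that the three sequences are compared position by position without gaps, as they are written on the slide.

```python
def identities(a, b):
    """Number of positions where two equal-length, ungapped sequences agree."""
    return sum(x == y for x, y in zip(a, b))


FIRST = "AYAYAYAYSY"
MIDDLE = "AGAPAPAPSP"
LAST = "LGLPLPLPLP"

print(identities(FIRST, LAST))    # 0: no direct evidence
print(identities(FIRST, MIDDLE))  # 5: A at 1, 3, 5, 7 and S at 9
print(identities(MIDDLE, LAST))   # 5: G at 2 and P at 4, 6, 8, 10
```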

Alignment order
MIESAYTDSW     -MIESAYTDSW
QFEKSYVTDY     QFEKSYVTDY-

Alignment order
MIESAYTDSW     -MIESAYTDSW
QFEKSYVTDY     QFEKSYVTDY-
QWERTYASNF     QWERTYASNF-
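One way to quantify the effect of the third sequence is a sum-of-pairs count of identical residues over all pairs of sequences. This simplified score is chosen for illustration only: it counts identities, gap columns score zero, and there is no gap penalty. It is not the scoring that CLUSTAL uses.

```python
from itertools import combinations


def pair_identities(a, b):
    """Identical, non-gap residues in two rows of an alignment."""
    return sum(x == y and x != "-" for x, y in zip(a, b))


def sum_of_pairs(alignment):
    """Total identities over all pairs of rows."""
    return sum(pair_identities(a, b) for a, b in combinations(alignment, 2))


UNSHIFTED = ["MIESAYTDSW", "QFEKSYVTDY", "QWERTYASNF"]
SHIFTED = ["-MIESAYTDSW", "QFEKSYVTDY-", "QWERTYASNF-"]

print(sum_of_pairs(UNSHIFTED[:2]), sum_of_pairs(SHIFTED[:2]))  # 2 3
print(sum_of_pairs(UNSHIFTED), sum_of_pairs(SHIFTED))          # 7 6
```

With only two sequences, the shifted alignment has more identities (3 versus 2). With the third sequence added, the unshifted alignment wins (7 versus 6), because QWERTYASNF supports the E and Y columns.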

Alignment order
Conclusion: align the sequences that are most similar to each other first. That way you "build up information" while making the alignments that are most likely to be correct.

Alignment order
To know which sequences are most similar to each other, you first need to compute all pairwise alignments. This is what CLUSTAL does.
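The all-against-all step can be sketched with a score-only Needleman-Wunsch global alignment. The sketch assumes simple identity scoring (match +1, mismatch -1) and a linear gap penalty of -1. CLUSTAL itself uses substitution matrices and affine gap penalties, and derives distances from the pairwise alignments before building the guide tree.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def all_pairwise_scores(seqs):
    """Score every pair of named sequences: the first stage of progressive alignment."""
    return {(x, y): nw_score(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


seqs = {"s1": "MIESAYTDSW", "s2": "QFEKSYVTDY", "s3": "QWERTYASNF"}
for pair, score in all_pairwise_scores(seqs).items():
    print(pair, score)
```

The highest-scoring pair is the one that would be aligned first.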

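Step 1: [figure: guide tree; sequences D and E are joined]

Step 2: [figure: guide tree; D and E joined, then A and B joined]

Step 3: [figure: guide tree; sequence C is added, leaf order D, E, C, A, B]

Step 4: [figure: guide tree over D, E, C, A, B]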
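These steps build a guide tree by repeatedly joining the closest sequences or groups of sequences. The sketch below uses UPGMA (average linkage), which is simpler than the neighbor-joining method CLUSTAL W uses for its guide tree. The distances are hypothetical. They were chosen so that the merge order matches one reading of the figures: D with E, then A with B, then C with D-E, then everything.

```python
from itertools import combinations

# Hypothetical distances (smaller = more similar), chosen for illustration only.
DIST = {
    frozenset("AB"): 0.2, frozenset("AC"): 0.6, frozenset("AD"): 0.7,
    frozenset("AE"): 0.7, frozenset("BC"): 0.6, frozenset("BD"): 0.7,
    frozenset("BE"): 0.7, frozenset("CD"): 0.4, frozenset("CE"): 0.4,
    frozenset("DE"): 0.1,
}


def average_distance(c1, c2):
    """UPGMA distance: mean over all sequence pairs between two clusters."""
    return sum(DIST[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))


def upgma_merge_order(names):
    """Repeatedly join the two closest clusters; return the joins in order."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


for step, (c1, c2) in enumerate(upgma_merge_order("ABCDE"), start=1):
    print(f"Step {step}: join {''.join(c1)} with {''.join(c2)}")
```

The printed order is D with E, A with B, C with DE, and finally AB with CDE. In progressive alignment, each join is an alignment of two sequences or two sub-alignments.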

Other algorithms
Multiple sequence alignment can also be done by iterative "profile" alignment:
A) Make an alignment of a few sequences that align well.
B) Align all sequences using the profile derived from this alignment.

What is a profile?
Normally, we use a PAM-like substitution matrix to determine the score for each possible pair of residues in an alignment. This assumes that a given pair, for example I aligned with E, is equally good at every position. But it is not.

What is a profile?
QWERTYIPASEF
QWEKSFIPGSEY
NWERTMVPVSEM
QFEKTYLPSSEY
NFIKTLMPATEF
QYIRSLIPAGEM
NYIQSLIPSTEL
QFIRSLFPSSEI

At the position marked 1 on the slide, E and I are both OK.
At the position marked 2, I is OK, but E surely is not.
At the position marked 3, E is OK, but I surely is not.

What is a profile?
The knowledge about which residue types are acceptable at a given position can be expressed in a profile. For each position, a profile holds 20 scores, one for each residue type, and sometimes also two values for gap opening and gap elongation.
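The sketch below is a minimal version of such a profile, built from the eight sequences on the previous slide. Assumptions:

- Scores are log2-odds against a uniform background frequency of 1/20.
- A pseudocount of 1 per residue type gives unseen residues a finite negative score.
- The per-position gap values are left out.

Real profile methods use better background frequencies and pseudocount schemes. The columns inspected (3, 7 and 11) are a guess at which positions the slide marked 1, 2 and 3, based on the comments there.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALIGNMENT = [
    "QWERTYIPASEF", "QWEKSFIPGSEY", "NWERTMVPVSEM", "QFEKTYLPSSEY",
    "NFIKTLMPATEF", "QYIRSLIPAGEM", "NYIQSLIPSTEL", "QFIRSLFPSSEI",
]


def build_profile(alignment, pseudocount=1.0):
    """One dict of 20 log2-odds scores per column (uniform background assumed)."""
    background = 1 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def profile_score(profile, seq):
    """Score an ungapped sequence against the profile, column by column."""
    return sum(column[aa] for column, aa in zip(profile, seq))


profile = build_profile(ALIGNMENT)
for pos in (3, 7, 11):  # assumed to be the slide's marked positions 1, 2, 3
    col = profile[pos - 1]
    print(pos, round(col["E"], 2), round(col["I"], 2))
```

Unlike a PAM-like matrix, the same residue gets a different score in each column. E and I score equally in column 3. Only I scores positively in column 7, and only E scores positively in column 11.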

Back to other algorithms
Multiple sequence alignment can also be done by iterative "profile" alignment:
A) Make an alignment of a few sequences that align well.
B) Align all sequences using the profile derived from this alignment.

Conserved, variable, or in-between
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
On the slide, gray marks conserved positions, black marks variable positions, and green marks correlated mutations.
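The three categories can be assigned automatically. The sketch below applies one assumed rule, which is not stated on the slide:

- A column is conserved if it holds a single residue type.
- A column is correlated if it has more than one residue type, not all different, and it splits the sequences into exactly the same groups as some other column.
- Any other column is variable.

```python
from collections import Counter, defaultdict

ALIGNMENT = ["QWERTYASDFGRGH", "QWERTYASDTHRPM", "QWERTNMKDFGRKC", "QWERTNMKDTHRVW"]


def grouping(column):
    """Partition of sequence indices induced by the residues in one column."""
    groups = defaultdict(set)
    for index, residue in enumerate(column):
        groups[residue].add(index)
    return frozenset(frozenset(g) for g in groups.values())


def classify_columns(alignment):
    columns = list(zip(*alignment))
    groupings = [grouping(col) for col in columns]
    shared = Counter(groupings)  # how many columns induce each grouping
    labels = []
    for col, part in zip(columns, groupings):
        n_types = len(set(col))
        if n_types == 1:
            labels.append("conserved")
        elif n_types < len(col) and shared[part] > 1:
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


for pos, label in enumerate(classify_columns(ALIGNMENT), start=1):
    print(pos, label)
```

With this rule, columns 1 to 5, 9 and 12 are conserved, and columns 13 and 14 are variable. Columns 6 to 8 are correlated, since they group sequences 1 and 2 against 3 and 4. Columns 10 and 11 are also correlated, since they group 1 and 3 against 2 and 4.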

Correlated mutations determine the tree shape
1 AGASDFDFGHKM
2 AGASDFDFRRRL
3 AGLPDFMNGHSI
4 AGLPDFMNRRRV
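Counting which grouping each column supports shows why. Four sequences can be split into two pairs in only three ways. For each split, the sketch counts the columns whose residues divide the sequences in exactly that way. This is a parsimony-style count chosen for illustration, not a full tree-building method.

```python
from collections import defaultdict

SEQS = {1: "AGASDFDFGHKM", 2: "AGASDFDFRRRL", 3: "AGLPDFMNGHSI", 4: "AGLPDFMNRRRV"}
SPLITS = {"12|34": ({1, 2}, {3, 4}), "13|24": ({1, 3}, {2, 4}), "14|23": ({1, 4}, {2, 3})}


def split_support(seqs):
    """For each two-pair split, list the columns (1-based) that divide the sequences that way."""
    support = {name: [] for name in SPLITS}
    length = len(next(iter(seqs.values())))
    for pos in range(length):
        groups = defaultdict(set)
        for name, seq in seqs.items():
            groups[seq[pos]].add(name)
        partition = {frozenset(g) for g in groups.values()}
        for split, (left, right) in SPLITS.items():
            if partition == {frozenset(left), frozenset(right)}:
                support[split].append(pos + 1)
    return support


print(split_support(SEQS))
# {'12|34': [3, 4, 7, 8], '13|24': [9, 10], '14|23': []}
```

Four columns support grouping 1 with 2 and 3 with 4, while only two support grouping 1 with 3 and 2 with 4. The tree therefore joins 1 with 2 and 3 with 4.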

Correlation = information
Sequences 1, 2 and 5 bind calcium; sequences 3 and 4 do not. Which residues bind calcium?
1 ASDFNTDEKLRTTYI
2 ASDFSTDEKLKTTYI
3 LSFFTTDTKLATIYI
4 LSHFLTDLKLATIYI
5 ASDFTTDEKLALTYI
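One simple way to approach the question is to list the columns where all calcium binders share one residue that none of the non-binders has. This filter is an assumption made for illustration. With only five sequences, it flags candidates rather than proving anything.

```python
SEQS = {
    1: "ASDFNTDEKLRTTYI", 2: "ASDFSTDEKLKTTYI", 3: "LSFFTTDTKLATIYI",
    4: "LSHFLTDLKLATIYI", 5: "ASDFTTDEKLALTYI",
}
BINDERS = {1, 2, 5}


def discriminating_columns(seqs, binders):
    """Columns (1-based) where all binders share one residue that no non-binder has."""
    hits = []
    length = len(next(iter(seqs.values())))
    for pos in range(length):
        bind = {seq[pos] for name, seq in seqs.items() if name in binders}
        other = {seq[pos] for name, seq in seqs.items() if name not in binders}
        if len(bind) == 1 and not bind & other:
            hits.append((pos + 1, bind.pop()))
    return hits


print(discriminating_columns(SEQS, BINDERS))
# [(1, 'A'), (3, 'D'), (8, 'E'), (13, 'T')]
```

Of the flagged residues, D and E are acidic, the kind of residue that typically coordinates calcium. That makes them the more plausible candidates.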