Protein Sequence Alignment 1 July 2002- DPJ David Philip Judge.

Slides:



Advertisements
Similar presentations
Global Sequence Alignment by Dynamic Programming.
Advertisements

Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Measuring the degree of similarity: PAM and blosum Matrix
DNA sequences alignment measurement
Lecture 8 Alignment of pairs of sequence Local and global alignment
Introduction to Bioinformatics
Sequence alignment & Substitution matrices By Thomas Nordahl & Morten Nielsen.
Sequence Similarity Searching Class 4 March 2010.
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al.
Sequence analysis course
Sequence Alignment.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Scoring Matrices June 19, 2008 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Scoring Matrices June 22, 2006 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
BNFO 240 Usman Roshan. Last time Traceback for alignment How to select the gap penalties? Benchmark alignments –Structural superimposition –BAliBASE.
Introduction to bioinformatics
Sequence similarity.
Sequence alignment & Substitution matrices By Thomas Nordahl & Morten Nielsen.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
. Computational Genomics Lecture #3a (revised 24/3/09) This class has been edited from Nir Friedman’s lecture which is available at
Introduction to Bioinformatics Algorithms Sequence Alignment.
1-month Practical Course Genome Analysis Lecture 3: Residue exchange matrices Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
BLOSUM Information Resources Algorithms in Computational Biology Spring 2006 Created by Itai Sharon.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Sequence Alignments Revisited
Alignment IV BLOSUM Matrices. 2 BLOSUM matrices Blocks Substitution Matrix. Scores for each position are obtained frequencies of substitutions in blocks.
Alignment III PAM Matrices. 2 PAM250 scoring matrix.
Multiple Sequence Alignments
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Roadmap The topics:  basic concepts of molecular biology  more on Perl  overview of the field  biological databases and database searching  sequence.
Information theoretic interpretation of PAM matrices Sorin Istrail and Derek Aguiar.
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Brandon Andrews.  Longest Common Subsequences  Global Sequence Alignment  Scoring Alignments  Local Sequence Alignment  Alignment with Gap Penalties.
An Introduction to Bioinformatics
Substitution Numbers and Scoring Matrices
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Evolution and Scoring Rules Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-7) x (total length of all gaps) Example Score = 5 x (# matches)
Amino Acid Scoring Matrices Jason Davis. Overview Protein synthesis/evolution Protein synthesis/evolution Computational sequence alignment Computational.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Comp. Genomics Recitation 3 The statistics of database searching.
Construction of Substitution Matrices
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Tutorial 4 Substitution matrices and PSI-BLAST 1.
Pairwise Sequence Analysis-III
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Construction of Substitution matrices
Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
©CMBI 2009 Transfer of information The main topic of this course is transfer of information. In the protein world that leads to the questions: 1)From which.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Pairwise Sequence Alignment and Database Searching
Tutorial 3 – Protein Scoring Matrices PAM & BLOSUM
Alignment IV BLOSUM Matrices
Presentation transcript:

Protein Sequence Alignment 1 July DPJ David Philip Judge

SQRGGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL KDRGAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGL AKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV N Comparing pairs of sequences Objectives: To align two sequences in a biologically logical fashion. To generate a score representing the “quality” of the alignment. Assumption: The sequences being compared are homologous. {so differences between the sequences are exclusively due to evolutionary processes} {Sadly, the computed score has little absolute meaning}

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

P S T W Y V B Z X A R N D C Q E G H I L K M F P S T W V B Z X Y A N D C Q E G H I L K M F Dayhoff PAM 250 Matrix R

The PAM matrices are computed from aligned families of proteins. Original protein Aligned current proteins

P S T W Y V B Z X A R N D C Q E G H I L K M F P S T W V B Z X Y A N D C Q E G H I L K M F Dayhoff PAM 250 Matrix R

F F F F F F F F F F Y Y Y Y Y F Y Y Y Y F F Y Y Y Y F Y F Y Y Y F F Original protein Aligned current proteins Y F F Y Original protein Aligned current proteins The High F Y score implies: F -> Y substitutions are common + Y -> F substitutions are common Where F is conserved Where Y is conserved

P S T W Y V B Z X A R N D C Q E G H I L K M F P S T W V B Z X Y A N D C Q E G H I L K M F Dayhoff PAM 250 Matrix R

F F F F F F F F F F Y Y Y Y Y F Y Y Y Y F F Y Y Y Y F Y F Y Y Y F F Original protein Aligned current proteins Y F F Y Original protein Aligned current proteins The High W W score implies: Any substitutions are uncommon Where W is conserved W W W W W W W W W W W W W W W W W W W

P S T W Y V B Z X A R N D C Q E G H I L K M F P S T W V B Z X Y A N D C Q E G H I L K M F Dayhoff PAM 250 Matrix R

A8.3 C1.7 D5.3 E6.2 F3.9 G7.2 H2.2 I5.2 K5.7 L9.0 M2.4 N4.4 P5.1 Q4.0 R5.7 S6.9 T5.8 V6.6 W1.3 Y3.2 AMINO ACIDS % A8.3 W1.3 Alanine is very commonTryptophan is relatively rare Typical Amino Acid composition { according to Argos and McCaldon}

PAM – Point Accepted Mutation PAM – is a measure of evolutionary distance An evolutionary distance of 1 PAM indicates the probability of a residue mutating during a distance in which 1 point mutation was accepted per 100 residues.

Observed %Evolutionary DifferenceDistance (PAMs) Relationship between Observed difference and PAM value.

The BLOSUM matrices are also computed from aligned families of proteins. Original protein Aligned current proteins The BLOSUM scoring matrices

Regions of high conservation, stored in the BLOCKS database

ABEERNALEDLAGERDFWGALSTOUTWRARWATERA Insertions/Deletions => GAPS ACBEERGYALEDILAGERAFGSTOUTFAWATERM

Insertions/Deletions => GAPS ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG-STOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG--STOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFAWATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFA-WATERM

Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFA--WATERM

?????? – but adds 12 to the score ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM G W = -7 G G = +5 Insertions/Deletions => GAPS

?????? – but adds 12 to the score A-BEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM ?????? – but adds 4 to the score G W = -7 G G = +5 A C = -2 A A = +2 Insertions/Deletions => GAPSThe need for penalising GAPs.

LOCAL and GLOBAL implementations of pairwise alignment LOCAL Alignment

LOCAL and GLOBAL implementations of pairwise alignment LOCAL Alignment GCG : bestfit Staden : spin Emboss: water/matcher GLOBAL Alignment GCG : gap Staden : spin Emboss : needle/stretcher