Comparing Protein Sequences: Tutorial 4



Comparing Protein Sequences
Substitution matrices
– PAM: Point Accepted Mutations
– BLOSUM: Blocks Substitution Matrix
Advanced comparison tools
– PSI-BLAST
– PHI-BLAST

Substitution Matrix
Scoring matrix S
– 20x20 for protein alignment (one row and one column per amino acid)
– S_ij is the gain or penalty for substituting amino acid j by amino acid i (i: row, j: column)
– Based on the likelihood that this substitution is found in nature
– Computed differently in PAM and BLOSUM
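As a rough illustration of how a substitution matrix is used, the sketch below scores an ungapped alignment by summing one matrix entry per aligned pair. The scores in `TOY_SCORES` are hypothetical values chosen for the example, not entries from any PAM or BLOSUM matrix, and the function names are illustrative.

```python
# Toy substitution scores for four residues (hypothetical values, not taken
# from PAM or BLOSUM). The matrix is symmetric, so each pair is stored once
# and looked up in either order.
TOY_SCORES = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("W", "W"): 11,
    ("L", "I"): 2,    # similar residues: positive score
    ("A", "W"): -3,   # dissimilar residues: penalty
    ("A", "L"): -1, ("A", "I"): -1, ("L", "W"): -2, ("I", "W"): -3,
}


def substitution_score(a, b):
    """Return the score for aligning residue a with residue b."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def score_ungapped(seq1, seq2):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(substitution_score(a, b) for a, b in zip(seq1, seq2))


print(score_ungapped("ALW", "AIW"))  # conservative L/I substitution
print(score_ungapped("ALW", "WLA"))  # two A/W mismatches
```

With these toy values the alignment containing the conservative L/I substitution scores higher than the one with two A/W mismatches, which is the behaviour a real matrix is built to produce.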

Computing the probability of mutation (M_ij)
PAM: Point Accepted Mutations
– Based on closely related proteins (X% divergence)
– Matrices for comparing divergent proteins are computed by extrapolation
BLOSUM: Blocks Substitution Matrix
– Based on conserved blocks in which sequences at least X% identical are clustered and counted as one
– Matrices for divergent proteins are derived by choosing an appropriate X%

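PAM-1
Captures mutation rates between closely related proteins
– 1% divergence
– M_ij = #(A → B) / #A (for amino acids A and B: the number of observed A → B substitutions divided by the number of occurrences of A)
Problematic when comparing distantly related proteins
– The 1% divergence does not capture rarer mutations
– PAM250 is theoretical (obtained by extrapolation)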
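The ratio above and the extrapolation to PAM250 can be sketched on a toy three-letter alphabet. The counts below are hypothetical and chosen so that each residue changes 1% of the time. Extrapolation is done here by repeated multiplication of the PAM-1 matrix with itself. This is a simplified sketch: the real PAM construction also normalises for amino acid composition.

```python
def mutation_matrix(substitution_counts, occurrences):
    """Build M with M[a][b] = #(a -> b) / #a; the diagonal is 'no change'."""
    letters = sorted(occurrences)
    matrix = {}
    for a in letters:
        row = {b: substitution_counts.get((a, b), 0) / occurrences[a]
               for b in letters if b != a}
        row[a] = 1.0 - sum(row.values())
        matrix[a] = row
    return matrix


def multiply(m1, m2):
    """Plain matrix product of two row-stochastic matrices stored as dicts."""
    letters = sorted(m1)
    return {a: {b: sum(m1[a][k] * m2[k][b] for k in letters) for b in letters}
            for a in letters}


def extrapolate(pam1, n):
    """Return the n-th power of pam1, i.e. n successive PAM-1 steps."""
    result = pam1
    for _ in range(n - 1):
        result = multiply(result, pam1)
    return result


# Hypothetical counts: every residue is substituted in 10 of 1000 occurrences.
occurrences = {"A": 1000, "B": 1000, "C": 1000}
substitution_counts = {
    ("A", "B"): 6, ("A", "C"): 4,
    ("B", "A"): 6, ("B", "C"): 4,
    ("C", "A"): 4, ("C", "B"): 6,
}

pam1 = mutation_matrix(substitution_counts, occurrences)
pam250 = extrapolate(pam1, 250)
print(pam1["A"])
print(pam250["A"])
```

After 250 steps the probability that a residue is unchanged is far below the 0.99 of the toy PAM-1, which is why the higher-numbered matrices suit more divergent sequences.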

[Figure: PAM-1]

BLOSUM62
Captures mutation rates between divergent proteins
Why is BLOSUM62 called BLOSUM62? Within each block, sequences that shared at least 62% identity with ANY other member of their cluster were grouped together, averaged, and represented as one sequence.
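The clustering step can be sketched as follows, assuming an ungapped block of equal-length sequences and single-linkage clustering (a sequence joins a cluster if it is at least 62% identical to any member). The four sequences are hypothetical, and `cluster_block` and `sequence_weights` are illustrative names rather than functions from an existing package.

```python
def percent_identity(s1, s2):
    """Percentage of identical positions between two equal-length sequences."""
    matches = sum(1 for a, b in zip(s1, s2) if a == b)
    return 100.0 * matches / len(s1)


def cluster_block(block, threshold=62.0):
    """Single-linkage clustering of the sequences in one block."""
    clusters = []
    for seq in block:
        linked, kept = [], []
        for cluster in clusters:
            if any(percent_identity(seq, m) >= threshold for m in cluster):
                linked.append(cluster)
            else:
                kept.append(cluster)
        # Merge the new sequence with every cluster it is linked to.
        merged = [seq] + [m for cluster in linked for m in cluster]
        clusters = kept + [merged]
    return clusters


def sequence_weights(clusters):
    """Each cluster counts as one sequence, shared equally by its members."""
    return {seq: 1.0 / len(c) for c in clusters for seq in c}


# Hypothetical block of four aligned segments.
block = ["LIDEADKTLL", "LIDEADKTLM", "LIDEADRSIM", "MVNQWDRSCA"]
clusters = cluster_block(block)
weights = sequence_weights(clusters)
print(clusters)
print(weights)
```

In this toy block the third sequence is only 60% identical to the first but 70% identical to the second, so all three end up in one cluster; that is the effect of the "ANY other member" rule.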

BLOSUM matrices are designed to give a better measure of the differences between two proteins, specifically for more distantly related proteins. Similar amino acids receive a high score.
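One way to see why similar amino acids score high is the log-odds form in which BLOSUM scores are usually expressed: the observed frequency of a pair is compared with the frequency expected by chance. The sketch below assumes half-bit units (a scale factor of 2 on a base-2 logarithm) and uses hypothetical frequencies, with `observed_pair_freq` taken as the frequency of the ordered pair.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected by chance)."""
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Hypothetical frequencies: expected pair frequency is 0.1 * 0.05 = 0.005.
favoured = log_odds_score(0.02, 0.1, 0.05)        # seen 4x more often than chance
disfavoured = log_odds_score(0.00125, 0.1, 0.05)  # seen 4x less often than chance
print(favoured, disfavoured)
```

Pairs that co-occur in alignments more often than chance get a positive score, and pairs that co-occur less often get a negative one.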

PAM & BLOSUM
– PAM matrices are based on global alignments of closely related proteins.
– PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1.
– BLOSUM matrices are based on local alignments.
– BLOSUM62 is calculated from blocks in which sequences with at least 62% identity are clustered and counted as a single sequence.
– All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.

Use Recommendations
Approximate equivalences, from closely related to highly divergent:
PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)
Query length   Matrix     Gap costs
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1
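The query-length recommendation can be written as a small lookup. This is a sketch only: the <35 and >85 rows come from the slide, while the 35-50 and 50-85 boundaries for PAM70 and BLOSUM80, the second gap cost value for those rows, and the handling of lengths that fall exactly on a boundary are assumptions that should be checked against the current NCBI BLAST guidance. `recommend_matrix` is a hypothetical helper name.

```python
def recommend_matrix(query_length):
    """Return (matrix name, (gap open, gap extend)) for a protein query length."""
    if query_length < 35:
        return ("PAM30", (9, 1))
    if query_length <= 50:    # assumed boundary
        return ("PAM70", (10, 1))
    if query_length <= 85:    # assumed boundary
        return ("BLOSUM80", (10, 1))
    return ("BLOSUM62", (11, 1))


for length in (20, 40, 70, 160):
    print(length, recommend_matrix(length))
```

Short queries get a matrix tuned for closely related sequences because a short alignment needs a high score per position to stand out from chance matches.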

Example
Query: >ADRM1_HUMAN (a glycosylated plasma membrane protein that promotes cell adhesion)
Database: nr (human)
BLAST program: BLASTP
Matrices: PAM30, BLOSUM45

PAM30 vs. BLOSUM45
What difference do we observe?
– With BLOSUM45 we found both closely related and divergent sequences.
– With PAM30 we found only closely related sequences.

PAM30 vs. BLOSUM45
With BLOSUM45 we can discover interesting relationships between proteins, for example Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens.

Using different scoring matrices can produce slightly different alignments. [Figure panels: alignment with PAM30; alignment with BLOSUM45]

The same pair of sequences can be aligned in several ways, especially when using a matrix for highly divergent sequences (BLOSUM45).

PSI-BLAST: Position-Specific Iterative BLAST
We will analyze the following uncharacterized archaeal protein:
>gi| |sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVI
DEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNK
MENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIM
GSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS

– E-value threshold for the initial BLAST search (default: 10)
– E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
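The role of the two thresholds can be sketched as a filter over a hit list: the first decides which hits are reported, the second decides which reported hits feed the profile for the next iteration. The hit identifiers and E-values below are hypothetical, and treating a hit exactly at a threshold as passing is an assumption of this sketch.

```python
REPORT_THRESHOLD = 10        # default E-value threshold for the initial search
INCLUSION_THRESHOLD = 0.005  # default E-value threshold for inclusion in iterations


def split_hits(hits, report=REPORT_THRESHOLD, inclusion=INCLUSION_THRESHOLD):
    """Return (reported hits, hits included in the next iteration)."""
    reported = [hit for hit in hits if hit[1] <= report]
    included = [hit for hit in reported if hit[1] <= inclusion]
    return reported, included


# Hypothetical (identifier, E-value) pairs.
hits = [("hit_a", 1e-40), ("hit_b", 0.001), ("hit_c", 0.5), ("hit_d", 25.0)]
reported, included = split_hits(hits)
print([name for name, _ in reported])
print([name for name, _ in included])
```

Here `hit_c` would be shown in the results but would not contribute to the next iteration, and `hit_d` would not be reported at all.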

– The query itself
– Orthologous sequences in two other archaeal species
– Other homologous sequences

Is MJ0577 a filament protein?
Is MJ0577 a cationic amino acid transporter?
Is MJ0577 a universal stress protein?

PHI-BLAST: Pattern-Hit Initiated BLAST

Pattern symbols
[ ] = groups the amino acids that can occur at a given position
( ) = gives the number of times a residue (or group of residues) is repeated
-   = separates positions
x   = any amino acid

Making a pattern
[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
…LIDEADKTT…
…IMDEADEFL…
…LLDEADKCL…
…ILDEADRIL…
…VVDEADNFI…
…LVDEADKGI…
…LMDEADEFL…
…MLDEADRSI…
…LIDEADKML…
…MLDEADNWI…
…LVDEADRFL…
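To check a pattern like the one above against candidate fragments, it can be translated into a regular expression. The converter below is a minimal sketch that handles only the symbols used in this tutorial (single residues, `[ ]` groups, `x`, and a fixed `( )` repeat count); `pattern_to_regex` is an illustrative name, not a function from an existing package.

```python
import re


def pattern_to_regex(pattern):
    """Translate a dash-separated pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, repeat = match.groups()
        piece = "." if residues == "x" else residues  # x matches any residue
        if repeat is not None:
            piece += "{" + repeat + "}"
        pieces.append(piece)
    return "".join(pieces)


regex = pattern_to_regex("[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]")
print(regex)

fragments = ["LIDEADKTT", "IMDEADEFL", "LLDEADKCL", "ILDEADRIL"]
for fragment in fragments:
    print(fragment, bool(re.fullmatch(regex, fragment)))
```

Run over the eleven fragments listed above, this reports a match for every fragment except LIDEADKTT as transcribed, whose last residue is T rather than L or I.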

Example
>gi| |sp|P0A9P6|DEAD_ECOLI Cold-shock DEAD box protein A (ATP-dependent RNA helicase deaD)
MAEFETTFADLGLKAPILEALNDLGYEKPSPIQAECIPHLLNGRDVLGMAQTGSGKTAAFSLPLLQNLDP
ELKAPQILVLAPTRELAVQVAEAMTDFSKHMRGVNVVALYGGQRYDVQLRALRQGPQIVVGTPGRLLDHL
KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI
QSSVTTRPDISQSYWTVWGMRKNEALVRFLEAEDFDAAIIFVRTKNATLEVAEALERNGYNSAALNGDMN
QALREQTLERLKDGRLDILIATDVAARGLDVERISLVVNYDIPMDSESYVHRIGRTGRAGRAGRALLFVE
NRERRLLRNIERTMKLTIPEVELPNAELLGKRRLEKFAAKVQQQLESSDLDQYRALLSKIQPTAEGEELD
LETLAAALLKMAQGERTLIVPPDAPMRPKREFRDRDDRGPRDRNDRGPRGDREDRPRRERRDVGDMQLYR
IEVGRDDGVEVRHIVGAIANEGDISSRYIGNIKLFASHSTIELPKGMPGEVLQHFTRTRILNKPMNMQLL
GDAQPHTGGERRGGGRGFGGERREGGRNFSGERREGGRGDGRRFSGERREGRAPRRDDSTGRRRFGGDA
The DEAD box pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
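A quick way to see where the DEAD box pattern occurs is to scan the sequence with the equivalent regular expression. The sketch below uses only one stretch of the DEAD_ECOLI sequence, so the reported position is relative to that fragment and not to the full protein. The regular expression is a hand translation of the pattern, and `find_pattern` is an illustrative name.

```python
import re

# Hand-written regular expression for [LIVM](2)-D-E-A-D-[RKEN]-x-[LI].
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD[RKEN].[LI]")

# One stretch of the DEAD_ECOLI sequence shown above.
fragment = ("KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQ"
            "TALFSATMPEAIRRITRRFMKEPQEVRI")


def find_pattern(sequence, regex):
    """Return (1-based start position, matched text) for each match."""
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


print(find_pattern(fragment, DEAD_BOX))  # [(14, 'VLDEADEML')]
```

PHI-BLAST uses a pattern occurrence like this one as the anchor for its search, reporting only database sequences that contain the pattern and are similar to the query around it.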