Tutorial 4: Substitution matrices and PSI-BLAST

Presentation transcript:

Tutorial 4: Substitution matrices and PSI-BLAST

Agenda
Why study distant homologies?
Substitution matrices
– PAM: Point Accepted Mutations
– BLOSUM: Blocks Substitution Matrix
PSI-BLAST
Cool story of the day: Why should we care about cellular fusion in worms?

How proteins evolve
Throughout evolution, proteins change. Some change more than others, and different regions of a protein change at different rates.

Why study distant homologies?
When we study a new organism, we may find many unknown sequences that we would like to characterize, and we might not find any close homologs for them. Substitution matrices model different evolutionary distances. PSI-BLAST makes it possible to find more distant relationships between proteins.

Amino acids were not born equal
Both substitution matrices and PSI-BLAST are designed to model the process by which amino acids (AAs) mutate.

Substitution Matrix
A scoring matrix S of size 20x20. S[i,j] is the gain or penalty for substituting AA j with AA i (i: row, j: column).
– Based on the likelihood that this substitution is found in nature
– Computed differently in PAM and BLOSUM
Each matrix is tailored to a particular evolutionary distance.
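To make the role of S concrete, here is a minimal Python sketch that scores an ungapped alignment by summing matrix entries. The four-letter table TOY_SCORES and the helper names are illustrative assumptions, not a real PAM or BLOSUM file.

```python
# Toy symmetric substitution scores for a 4-letter subset of amino acids.
# Values are illustrative assumptions, not loaded from a PAM/BLOSUM file.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "L"): -1, ("A", "I"): -1, ("A", "W"): -3,
    ("L", "L"): 4, ("L", "I"): 2, ("L", "W"): -2,
    ("I", "I"): 4, ("I", "W"): -3,
    ("W", "W"): 11,
}


def substitution_score(a, b, scores=TOY_SCORES):
    """Return S[a,b]; the table is symmetric, so try both key orders."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def score_ungapped(seq1, seq2, scores=TOY_SCORES):
    """Sum the substitution scores over the aligned positions (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(substitution_score(a, b, scores) for a, b in zip(seq1, seq2))


# Identical residues earn the diagonal score; I aligned with L still scores +2.
print(score_ungapped("ALIW", "ALLW"))  # 4 + 4 + 2 + 11 = 21
```

A conservative substitution such as I/L keeps a positive score, while a rare identity such as W/W is rewarded more than a common one such as A/A.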

Computing the probability of mutation (M[i,j])
PAM: Point Accepted Mutations
– Based on a small set of closely related proteins
– Other than PAM1, the matrices are theoretical.
BLOSUM: Blocks Substitution Matrix
– Based on a wider database of proteins that includes protein families with conserved regions
– The matrices are empirical.

PAM
Based on a small set of closely related proteins. PAM1 captures mutation rates between close proteins: proteins with 1% divergence, i.e. one accepted point mutation per 100 residues. It is problematic for comparing distant proteins, because the 1% divergence does not capture more sporadic mutations.

PAM-X
To handle more distant proteins, PAM1 is multiplied by itself: PAM-X is PAM1 raised to the power X. This models the evolutionary accumulation of mutations. The higher the matrix number, the better suited it is to finding distant homologies. Other than PAM1, the matrices are theoretical.
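The self-multiplication can be sketched in a few lines of Python. The 3-letter alphabet and the probabilities in TOY_PAM1 are made-up assumptions chosen so that each residue has a 1% chance of changing. Real PAM matrices cover 20 amino acids and are then converted to log-odds scores, which this sketch leaves out.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical mutation probabilities for a 3-letter alphabet.
# Row r, column c: probability that residue r is replaced by residue c
# after one PAM unit; every residue stays unchanged with probability 0.99.
TOY_PAM1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

Each row still sums to 1 after 250 steps, but the diagonal has dropped well below 0.99. That drop is the "accumulation of mutations" the slide refers to.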

BLOSUM
Scores for each position are derived from the observed frequencies of substitutions in blocks of local alignments of related proteins. BLOSUM62 is computed from blocks in which sequences sharing more than 62% identity are clustered and counted as one, so the counted pairs share at most 62% identity.
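As a rough illustration of how observed substitution frequencies become scores, the Python sketch below computes log-odds scores in half-bit units, 2·log2(observed / expected), from pair counts. The two-letter alphabet and the counts are invented for the example, and the function name is hypothetical. Zero counts are not handled.

```python
import math
from collections import defaultdict


def log_odds_scores(pair_counts):
    """Turn unordered aligned-pair counts into rounded half-bit log-odds scores.

    pair_counts maps (a, b) with a <= b to the number of times the two
    residues were seen aligned in the same column.
    """
    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = defaultdict(float)
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # Expected pair frequency if residues were paired at random.
        expected = background[a] * background[b]
        if a != b:
            expected *= 2
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Invented counts for a two-letter alphabet, for illustration only.
toy_counts = {("A", "A"): 60, ("A", "B"): 20, ("B", "B"): 20}
print(log_odds_scores(toy_counts))
```

Pairs seen more often than chance predicts get positive scores, and pairs seen less often get negative ones.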

BLOSUM-X
[Figure: blocks from the BLOCKS DB at 50% similarity and at 32% similarity, producing Substitution Matrix A and Substitution Matrix B.]

PAM vs. BLOSUM

PAM | BLOSUM
Based on global alignments of closely related proteins. | Based on local alignments.
PAM1 is calculated from comparisons of sequences with no more than 1% divergence. | BLOSUM62 is calculated from comparisons of sequences with no more than 62% identity in the blocks.
Other PAM matrices are extrapolated from PAM1. | All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.

BLOSUM matrices are the substitution matrices in common use.

Use Recommendations

Closely related
PAM100 ~ BLOSUM90
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45
Highly divergent

Query length | Matrix | Gap costs
<35 | PAM30 | 9,1
35-50 | PAM70 | 10,1
50-85 | BLOSUM80 | 10,1
>85 | BLOSUM62 | 11,1
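The query-length recommendations can be written as a small lookup. This Python sketch assumes the commonly published NCBI ranges (<35, 35-50, 50-85, >85). Which side the boundary values 50 and 85 fall on is also an assumption, and recommend_matrix is a hypothetical helper, not part of any BLAST toolkit.

```python
def recommend_matrix(query_length):
    """Return (matrix name, (gap open, gap extend)) for a protein query length.

    Boundaries at 50 and 85 are assigned to the shorter range by assumption.
    """
    if query_length < 35:
        return ("PAM30", (9, 1))
    if query_length <= 50:
        return ("PAM70", (10, 1))
    if query_length <= 85:
        return ("BLOSUM80", (10, 1))
    return ("BLOSUM62", (11, 1))


for length in (20, 40, 70, 300):
    print(length, recommend_matrix(length))
```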

Example
Query: an uncharacterized (hypothetical) protein
Database: nr
BLAST program: BLASTP
Matrices: PAM30 / PAM250, BLOSUM45 / BLOSUM90


PSI-BLAST
Position-Specific Iterated BLAST. Designed to find more distant homologs than standard BLAST can.

PSI-BLAST Steps
1. Search a query against a protein database.
2. Construct a specialized multiple sequence alignment from the top results.
3. Create a position-specific scoring matrix (PSSM) from the alignment.
4. Use the PSSM as the query against the database.
5. Estimate the statistical significance (E-values) of the hits.
Repeat steps 2-5 iteratively.
[Figure: iteration loop linking Query, Search, Protein DB, Results and PSSM.]
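The PSSM step can be sketched as follows. This simplified Python version builds per-column log-odds scores from an ungapped alignment, using a flat pseudocount and a uniform background. Both are assumptions made for brevity, and the aligned hits are invented. PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts, so treat this only as an illustration of the idea.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build one {residue: log-odds score} dict per alignment column."""
    length = len(aligned_seqs[0])
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned_seqs]
        denom = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (column.count(aa) + pseudocount) / denom
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_with_pssm(pssm, seq):
    """Score a sequence of the same length as the PSSM, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Invented ungapped hits standing in for the top results of a search.
hits = ["MKVL", "MKIL", "MRVL"]
pssm = build_pssm(hits)
print(round(score_with_pssm(pssm, "MKVL"), 2))
print(round(score_with_pssm(pssm, "WWWW"), 2))
```

Unlike a single 20x20 matrix, the same residue can score differently at each position, which is what lets later iterations pick up more distant relatives.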

Example
We will use the sequence of an uncharacterized (hypothetical) protein:

[Figure: PSI-BLAST search form. Threshold for the initial BLAST search (default: 10); threshold for inclusion in PSI-BLAST iterations (default: 0.005).]
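The two thresholds play different roles, which this short Python sketch tries to show. The hit names and E-values are invented, and treating both cutoffs as "less than or equal" is an assumption.

```python
REPORT_THRESHOLD = 10.0      # default threshold for the initial BLAST search
INCLUSION_THRESHOLD = 0.005  # default threshold for inclusion in the PSSM


def split_hits(hits):
    """Return (reported hits, names of hits that feed the next PSSM)."""
    reported = [(name, e) for name, e in hits if e <= REPORT_THRESHOLD]
    included = [name for name, e in reported if e <= INCLUSION_THRESHOLD]
    return reported, included


# Hypothetical (name, E-value) pairs for illustration only.
example_hits = [("hit_a", 1e-40), ("hit_b", 0.001), ("hit_c", 0.8), ("hit_d", 25.0)]
reported, included = split_hits(example_hits)
print(reported)  # hit_d is above 10 and is not reported
print(included)  # only hit_a and hit_b contribute to the next iteration
```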

The results are all hypothetical proteins.


Cool story of the day: Why should we care about cellular fusion in worms?

Cellular fusion
In cellular fusion, two cells unite to form one cell. Examples:
– Fertilization
– Muscle cells are composed of rows of fused cells.
– The placenta is made up of powerful multinucleated cells that are actually numerous individual cells that have fused.
– The lenses of the eyes are formed of rows of fused cells.
– Cellular fusion also occurs in bones.
Fusion processes are also involved in cancer, viral infections and stem cells.
news-items/ elegans/news-item-en.htm

Cellular fusion in C. elegans
Beni Podbilewicz
The exact way fusion takes place is still not completely clear and is the focus of work in Prof. Podbilewicz's lab. The worm suits cell fusion research because intensive cell-cell fusion takes place in its skin and can be followed easily. The researchers identified the protein responsible for the worm's fusion activity: the EFF-1 protein. They showed that in mutant worms the skin cells do not fuse and begin to migrate through the body.


“...we identified fusion family (FF) proteins within and beyond nematodes, and divergent members from the human parasitic nematode Trichinella spiralis and the chordate Branchiostoma floridae could also fuse mammalian cells…”