9/6/07 BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST. BCB 444/544 Lab 3: BLAST, Scoring Matrices & Alignment Statistics (Sept 6).



Exhaustive vs Heuristic Methods

Exhaustive: tests every possible solution
- guaranteed to give the best answer (identifies the optimal solution)
- can be very time/space intensive!
- e.g., Dynamic Programming, as in the Smith-Waterman algorithm

Heuristic: does NOT test every possibility
- no guarantee that the answer is best (but often can identify the optimal solution)
- sacrifices accuracy (potentially) for speed
- uses "rules of thumb" or "shortcuts"
- e.g., BLAST & FASTA
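To make the "exhaustive" side concrete, here is a minimal Python sketch of Smith-Waterman local alignment scoring. It fills every cell of an (n+1) x (m+1) table, which is why the method is optimal but costly for database searches. The match/mismatch/gap values and the two test sequences are illustrative assumptions, not values from the lab.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are hypothetical; a linear gap cost is assumed.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The 0 lets an alignment restart anywhere: that is what makes it local
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("MATWL", "MATPL"))
```

Every pair of positions is examined once, so the running time grows with the product of the two sequence lengths. BLAST avoids most of that work by only examining regions around word matches.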

Today's Lab: focus on BLAST (Basic Local Alignment Search Tool)

STEPS:
1. Create a list of every possible "word" (e.g., 3-11 letters) from the query sequence
2. Search the database to identify sequences that contain matching words
3. Score the match of each word with the sequence, using a substitution matrix
4. Extend the match (seed) in both directions, while calculating the alignment score at each step
5. Continue extension until the score drops below a threshold (due to mismatches)
6. The resulting contiguous aligned segment pair (no gaps) is called a High Scoring Segment Pair (HSP)
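The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: it seeds only on exact word matches, scores with a hypothetical +5/-4 match/mismatch scheme instead of a substitution matrix, and stops extending once the running score falls more than `drop_off` below the best score seen so far. All function names, parameters and sequences are assumptions made for the example.

```python
def words(seq, w):
    """Step 1: every overlapping word of length w, with its start position."""
    return [(seq[i:i + w], i) for i in range(len(seq) - w + 1)]


def score(a, b, match=5, mismatch=-4):
    """Step 3 stand-in: hypothetical match/mismatch scores."""
    return match if a == b else mismatch


def extend_one_way(query, subject, q, s, step, drop_off):
    """Steps 4-5: walk from (q, s) in direction step (+1 or -1).

    Returns (best score gain, number of positions kept).
    """
    best = running = 0
    kept = length = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += score(query[q], subject[s])
        length += 1
        if running > best:
            best, kept = running, length
        elif best - running > drop_off:
            break  # score has dropped too far; stop extending
        q += step
        s += step
    return best, kept


def find_hsps(query, subject, w=3, drop_off=8):
    """Return HSPs as (query_start, subject_start, length, score), best first."""
    hsps = set()
    for word, q in words(query, w):
        s = subject.find(word)              # Step 2: exact word hits only
        while s != -1:
            seed = sum(score(query[q + k], subject[s + k]) for k in range(w))
            r_gain, r_len = extend_one_way(query, subject, q + w, s + w, +1, drop_off)
            l_gain, l_len = extend_one_way(query, subject, q - 1, s - 1, -1, drop_off)
            hsps.add((q - l_len, s - l_len, w + l_len + r_len, seed + l_gain + r_gain))
            s = subject.find(word, s + 1)
    return sorted(hsps, key=lambda h: -h[3])  # Step 6: ungapped HSPs


print(find_hsps("GATTACAGG", "CCGATTTCAGGCC"))
```

In this example several different seed words lie on the same diagonal and all extend to the same ungapped segment, so the set collapses them into a single HSP.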

Today's Lab: focus on BLAST (Basic Local Alignment Search Tool)

Results?
- Original version of BLAST: list of HSPs = Maximal Segment Pairs (MSPs)
- More recent, improved version of BLAST: allows gaps (Gapped Alignment)
- How? Allows the score to drop below the threshold (but only temporarily)

BLAST - a few details

- Developed by Stephen Altschul and colleagues at NCBI in 1990
- Word length? Typically 3 aa for protein sequences, 11 nt for DNA sequences
- Substitution matrix? Default is BLOSUM62; can be changed under Algorithm Parameters (choose other BLOSUM or PAM matrices)
- Stop Extension Threshold? Typically 22 for proteins, 20 for DNA

BLAST - a few more details

BLAST is a family of programs with several "variants":
- BLASTN -
- BLASTP -
- BLASTX -
- TBLASTN -
- TBLASTX -

Statistical Significance?
- E-value: E = m × n × P
  - m = total number of residues in the database
  - n = number of residues in the query sequence
  - P = probability that an HSP is the result of random chance
  - the lower the E-value, the less likely the match results from random chance, and thus the higher its significance
- Bit Score: S' is normalized, to account for differences in sequence length & size of database
- Low Complexity Masking: removes repeats that confound scoring
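As a small worked example of the E-value formula above, the following Python sketch simply multiplies the three quantities. The database size, query length and probability are made-up numbers chosen for illustration.

```python
def e_value(m, n, p):
    """E = m x n x P, using the definitions from the slide.

    m: total number of residues in the database
    n: number of residues in the query
    p: probability that an HSP arises by chance
    """
    return m * n * p


# Hypothetical search: 10 million database residues, 250-residue query
small_db = e_value(10_000_000, 250, 1e-12)
# The same hit against a database twice as large
large_db = e_value(20_000_000, 250, 1e-12)

print(small_db, large_db)
```

The comparison illustrates why an E-value always depends on the database searched: the same alignment becomes proportionally less significant as the search space grows.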

"Scoring" or "Substitution" Matrices

2 major types for amino acids: PAM & BLOSUM
- PAM = Point Accepted Mutation: relies on an "evolutionary model" based on observed differences in alignments of closely related proteins
- BLOSUM = BLOck SUbstitution Matrix: based on % aa substitutions observed in blocks of conserved sequences within evolutionarily divergent proteins

PAM Matrix

PAM = Point Accepted Mutation
- relies on an "evolutionary model" based on observed differences in closely related proteins
- the model includes a defined rate for each type of sequence change
- suffix number (n) reflects the amount of "time" passed: the substitutions expected after n accepted point mutations per 100 amino acids
- PAM1: for less divergent sequences (shorter time)
- PAM250: for more divergent sequences (longer time)
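One way to see how the suffix number acts as "time" is that a PAMn mutation probability matrix is obtained by multiplying the PAM1 matrix by itself n times. The Python sketch below shows this with a toy two-letter alphabet and a made-up 1% change rate; the real matrices are 20 x 20 and are further converted to log-odds scores, which is not shown here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each letter has a 1% chance of changing per unit of time
# (hypothetical values for a 2-letter alphabet, not real amino acid data)
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam250 = mat_pow(toy_pam1, 250)
print(toy_pam250[0])
```

After 250 steps the chance of seeing the original letter has fallen from 0.99 to roughly one half in this toy model, which mirrors why higher-numbered PAM matrices suit more divergent sequences.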

BLOSUM Matrix

BLOSUM = BLOck SUbstitution Matrix
- based on % aa substitutions observed in blocks of conserved sequences within evolutionarily divergent proteins
- doesn't rely on a specific evolutionary model
- suffix number (n) reflects expected similarity: average % aa identity in the MSA from which the matrix was generated
- BLOSUM45: for more divergent sequences
- BLOSUM62: for less divergent sequences

BLOSUM62 Substitution Matrix

- s(a,b) corresponds to the score of aligning character a with character b
- Match scores are often calculated based on the frequency of mutations in very similar sequences (more details later)
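The role of s(a,b) can be illustrated with a short Python sketch that sums per-position scores over an ungapped alignment. Only a handful of entries are included; they are believed to match BLOSUM62 but are quoted here purely for illustration and should be checked against the full published matrix. The function names and sequences are assumptions made for the example.

```python
# Small excerpt of substitution scores (illustrative; verify against the
# full BLOSUM62 matrix before relying on them)
SCORES = {
    ("M", "M"): 5, ("A", "A"): 4, ("T", "T"): 5,
    ("W", "W"): 11, ("L", "L"): 4, ("P", "P"): 7,
    ("W", "P"): -4,
}


def s(a, b):
    """Score of aligning residue a with residue b (the matrix is symmetric)."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def ungapped_score(x, y):
    """Sum s(a, b) over each aligned pair of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(s(a, b) for a, b in zip(x, y))


print(ungapped_score("MATWL", "MATWL"), ungapped_score("MATWL", "MATPL"))
```

Note that identical residues do not all score the same: a conserved tryptophan (W) contributes far more than a conserved alanine (A), so replacing W with P costs both the lost match and the mismatch penalty.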


Affine Gap Penalty Functions

Affine Gap Penalties = Differential Gap Penalties
- used to reflect the cost difference between opening a gap and extending an existing gap
- Total Gap Penalty is a linear function of gap length: $W = \gamma + \delta \times (k - 1)$, where
  - $\gamma$ = gap opening penalty
  - $\delta$ = gap extension penalty
  - $k$ = length of gap
- Sometimes a Constant Gap Penalty is used, but it is usually less realistic than the Affine Gap Penalty
- Alignment with affine gap penalties can also be solved in O(nm) time using DP
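A short Python sketch of the affine gap penalty formula, with a per-position (linear) penalty alongside for comparison. The penalty values 10 and 1 are hypothetical and chosen only to make the difference visible.

```python
def affine_gap_penalty(k, gap_open, gap_extend):
    """W = gap_open + gap_extend * (k - 1) for a gap of length k."""
    if k <= 0:
        return 0  # no gap, no penalty
    return gap_open + gap_extend * (k - 1)


def linear_gap_penalty(k, per_position):
    """Comparison case: every gap position costs the same."""
    return per_position * k


# Hypothetical penalties: opening costs 10, each extension costs 1
for k in (1, 2, 5):
    print(k, affine_gap_penalty(k, 10, 1), linear_gap_penalty(k, 10))
```

With these values one gap of length 5 costs 14, while five separate gaps of length 1 cost 50, so the affine scheme favors a few long gaps over many short ones.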