BLAST: What it does and what it means. Steven Slater. Adapted from www.pitt.edu/~mcs2/teaching/biocomp/ppt/BLAST_Sp10.ppt.

Presentation transcript:


Why Search Sequence Databases? Sequence databases like GenBank contain all public sequences and any annotations of them. Searching these databases permits you to find any genes related to your Gene of Interest (GOI), and potentially to assign it a function. This is a routine, but highly sophisticated, tool used daily by genome scientists.

Search programs are sequence alignment programs. They try to find the best alignment between your probe sequence and every target sequence in the database. Finding optimal alignments is computationally very resource-intensive. It is usually not necessary to find optimal alignments, particularly for large databases. Alignments are ranked and only the top scores are reported.

Practical database search methods incorporate shortcuts. The fastest sequence database searching programs use heuristic algorithms. Heuristic = "(Computing) proceeding to a solution by trial and error or by rules that are only loosely defined." (Oxford English Dictionary). The basic concept is to break the search and alignment process down into several steps. At each step, only the best-scoring subset is retained for further analysis.
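The stepwise filtering described above can be illustrated with a toy seed-and-extend search. This is only a sketch under simplifying assumptions, not BLAST's actual implementation: it uses exact word hits only, extends to the right only, and applies a simple +1/-1 score. The names `build_word_index`, `extend_ungapped` and `search`, and the parameters `k`, `x_drop` and `keep`, are hypothetical.

```python
from collections import defaultdict


def build_word_index(target, k):
    """Map every k-letter word in the target to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def extend_ungapped(query, target, q_pos, t_pos, k, match=1, mismatch=-1, x_drop=3):
    """Extend an exact word hit to the right without gaps.

    Extension stops once the running score has fallen x_drop below the best
    score seen so far. Returns the best score reached.
    """
    score = best = k * match
    i, j = q_pos + k, t_pos + k
    while i < len(query) and j < len(target):
        score += match if query[i] == target[j] else mismatch
        if score > best:
            best = score
        elif best - score >= x_drop:
            break
        i += 1
        j += 1
    return best


def search(query, targets, k=4, keep=2):
    """Step 1: find word hits. Step 2: extend them. Step 3: keep only the top scores."""
    hits = []
    for name, target in targets.items():
        index = build_word_index(target, k)
        best = 0
        for q_pos in range(len(query) - k + 1):
            word = query[q_pos:q_pos + k]
            for t_pos in index.get(word, []):
                best = max(best, extend_ungapped(query, target, q_pos, t_pos, k))
        if best > 0:  # targets with no word hit are dropped at the first step
            hits.append((best, name))
    hits.sort(reverse=True)
    return hits[:keep]


if __name__ == "__main__":
    database = {"a": "TTACGTACGTTT", "b": "GGGGACGTGGGG", "c": "TTTTTTTT"}
    print(search("ACGTACGT", database))
```

Targets without any shared word are never aligned at all, which is where most of the speed comes from, and also why weak similarities can be missed.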

Heuristic programs find approximate alignments. They are less sensitive than "dynamic programming" algorithms such as Smith-Waterman for detecting weak similarity. In practice, they run much faster and are usually adequate. The BLAST program developed by Stephen Altschul and coworkers at the NCBI is the most widely used heuristic program. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res Sep 1;25(17):
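For comparison, the dynamic programming approach mentioned above can be sketched in a few lines. This is a minimal Smith-Waterman scorer that returns only the best local alignment score, assuming a simple match/mismatch score and a linear gap penalty. The parameter values are illustrative assumptions, not the settings of any particular program.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Every cell of an (len(a)+1) x (len(b)+1) table is filled, so the cost
    grows with the product of the two sequence lengths.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Because the whole table is computed for every query/target pair, running this against every sequence in a large database is what makes the exact method slow compared with a heuristic.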

BLAST is a collection of five programs for different combinations of query and database sequences

Program   Query            Database
BLASTN    DNA              DNA
BLASTP    protein          protein
BLASTX    translated DNA   protein
TBLASTN   protein          translated DNA
TBLASTX   translated DNA   translated DNA
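The five query/database combinations can also be written as a small lookup, which makes it easy to see which program fits a given search. The function name `pick_program` and the string labels are hypothetical and chosen only for illustration.

```python
# (query type, database type) -> program, following the five combinations above
PROGRAMS = {
    ("dna", "dna"): "BLASTN",
    ("protein", "protein"): "BLASTP",
    ("translated dna", "protein"): "BLASTX",
    ("protein", "translated dna"): "TBLASTN",
    ("translated dna", "translated dna"): "TBLASTX",
}


def pick_program(query, database):
    """Return the program name for a query/database pair, or None if there is none."""
    return PROGRAMS.get((query.lower(), database.lower()))


if __name__ == "__main__":
    print(pick_program("protein", "translated DNA"))
```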

How does BLAST Quantify Alignment Quality? It uses a scoring matrix to judge the quality of each aligned pair of residues. The most commonly used matrix is designated BLOSUM62. The BLOSUM matrices are calculated from real protein alignments: each entry compares how often a given pair of residues is actually observed aligned with how often that pairing would be expected to occur by chance. W/blosum62.htm
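The idea behind a matrix entry can be sketched as a log-odds calculation. The sketch assumes the score is twice the base-2 logarithm of observed over expected frequency, rounded to an integer (half-bit units, as used for BLOSUM62). The frequencies in the example are made up for illustration and are not real BLOSUM data.

```python
import math


def log_odds_score(observed, expected):
    """Score one residue pair from its observed and chance-expected frequencies.

    Positive: the pair is aligned more often than chance predicts.
    Negative: the pair is aligned less often than chance predicts.
    """
    return round(2 * math.log2(observed / expected))


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(log_odds_score(observed=0.04, expected=0.01))   # pair seen 4x more often than chance
    print(log_odds_score(observed=0.005, expected=0.01))  # pair seen half as often as chance
```

The raw score of an ungapped alignment is then the sum of these entries over all aligned positions.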

Why BLAST is great: It is very fast and can be used to search extremely large databases. It is sufficiently sensitive and selective for most purposes. It is robust: the default parameters can usually be used.

BLAST scores are reported in two columns: as raw values, based on the specific scoring matrix employed, and as bits, which are matrix-independent normalized values (bigger = better). Significance is represented by E values (smaller = better).
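The relationship between the two columns can be sketched with the standard Karlin-Altschul normalization, S' = (lambda * S - ln K) / ln 2. The values lambda = 0.267 and K = 0.041 used below are commonly quoted for gapped BLOSUM62 searches, but they are assumptions here; real values depend on the matrix and gap costs and are printed in the BLAST report.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a bit score.

    lam and k are the statistical parameters of the scoring system;
    dividing them out is what makes bit scores comparable across matrices.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Assumed parameters, for illustration only.
    print(round(bit_score(100, lam=0.267, k=0.041), 1))
```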

Typical BLAST Output Sorted by E value

The EXPECT (E) threshold is used to control score reporting. A match will only be reported if its E value falls below the threshold set. The default value for E is 10, which means that 10 matches with scores this high are expected to be found by chance. Lower EXPECT thresholds are more stringent, and report fewer matches.
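A rough sketch of how the threshold acts on a hit list, assuming the standard approximation E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. Edge-effect corrections to the lengths are ignored, and the hit names and scores are invented for illustration.

```python
def expect_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


def report_hits(hits, query_len, db_len, threshold=10.0):
    """Keep hits whose E value falls below the threshold, sorted by E value.

    `hits` is a list of (name, bit_score) pairs.
    """
    scored = [(expect_value(bits, query_len, db_len), name) for name, bits in hits]
    return sorted((e, name) for e, name in scored if e < threshold)


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [("strong", 40.0), ("weak", 20.0), ("borderline", 24.0)]
    for e, name in report_hits(hits, query_len=100, db_len=1_000_000):
        print(f"{name}\t{e:.2g}")
```

Note that E grows with the size of the database, so the same alignment becomes less significant when it is found in a larger search space.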

Interpreting BLAST scores: Score interpretation is based on context. What is the question? What else do you know about the sequences? Scoring is highly dependent on probe length. Exact matches will usually have the highest scores (and lowest E values), but short exact matches may score lower than longer partial matches.

Interpreting BLAST scores: Short exact matches are expected to occur at random. Partial matches over the entire length of a query are stronger evidence for homology than are short exact matches.
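A back-of-the-envelope calculation shows why short exact matches carry little weight. The sketch assumes a random DNA database in which all four bases are equally likely and positions are independent, which real genomes only approximate. The function name `expected_random_hits` is hypothetical.

```python
def expected_random_hits(word_len, db_len, alphabet_size=4):
    """Expected number of exact chance occurrences of one word in a random database."""
    positions = max(db_len - word_len + 1, 0)
    return positions * alphabet_size ** (-word_len)


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly human-genome scale, for illustration
    for k in (10, 15, 20, 30):
        print(k, f"{expected_random_hits(k, genome):.3g}")
```

Under these assumptions a 10-base word is expected thousands of times by chance in a database of this size, while a 20-base word is expected far less than once.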

Translated BLAST Searches: Translations use all 6 reading frames. They are computationally intensive: tblastx searches can be very slow with some large databases. You must specify the genetic code.
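What "all 6 frames" means can be sketched as follows: three reading frames on the given strand and three on its reverse complement. The codon table below is the standard genetic code; passing a different table stands in for specifying an alternate genetic code. The function names and the frame labels (+1 to -3) are illustrative choices, and unknown codons are rendered as X by assumption.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, table=STANDARD_CODE):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna, table=STANDARD_CODE):
    """Return the translations of all six reading frames, keyed +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:], table)
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCTGATTT").items():
        print(frame, protein)
```

Each DNA sequence therefore yields six protein sequences to compare, which is one reason a tblastx search, translating both query and database, is so much slower than the untranslated programs.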

Alternate Genetic Codes

Translated BLAST Searches

Taxonomy Reports

BLAST Genomes

Align 2 Sequences with BLAST

BLAST from ORF Finder

Primer BLAST