BLAST: Basic Local Alignment Search Tool Urmila Kulkarni-Kale Bioinformatics Centre University of Pune.

Slides:



Advertisements
Similar presentations
Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Advertisements

SCHOOL OF COMPUTING ANDREW MAXWELL 9/11/2013 SEQUENCE ALIGNMENT AND COMPARISON BETWEEN BLAST AND BWA-MEM.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
Last lecture summary.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 2: “Homology” Searches and Sequence Alignments.
Searching Sequence Databases
BLAST Basic Local Alignment Search Tool. BLAST החכה BLAST (Basic Local Alignment Search Tool) allows rapid sequence comparison of a query sequence [[רצף.
Pairwise Sequence Alignment Part 2. Outline Global alignments-continuation Local versus Global BLAST algorithms Evaluating significance of alignments.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Similar Sequence Similar Function Charles Yan Spring 2006.
BLAST.
Chapter 2 Sequence databases A list of the databases’ uniform resource locators (URLs) discussed in this section is in Box 2.1.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Rationale for searching sequence databases June 22, 2005 Writing Topics due today Writing projects due July 8 Learning objectives- Review of Smith-Waterman.
Sequence alignment, E-value & Extreme value distribution
BLAST Basic Local Alignment Search Tool. BLAST החכה BLAST (Basic Local Alignment Search Tool) allows rapid sequence comparison of a query sequence [[רצף.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Access to sequences: GenBank – a place to start and then some more... Links: embl nucleotide archive
© Wiley Publishing All Rights Reserved. Searching Sequence Databases.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
An Introduction to Bioinformatics
BLAST What it does and what it means Steven Slater Adapted from pt.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
BLAST : Basic local alignment search tool B L A S T !
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration.
Local alignment, BLAST and Psi-BLAST October 25, 2012 Local alignment Quiz 2 Learning objectives-Learn the basics of BLAST and Psi-BLAST Workshop-Use BLAST2.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
What is BLAST? BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases.
Last lecture summary. Window size? Stringency? Color mapping? Frame shifts?
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
1 P6a Extra Discussion Slides Part 1. 2 Section A.
NCBI resources II: web-based tools and ftp resources Yanbin Yin Fall 2014 Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Rationale for searching sequence databases June 25, 2003 Writing projects due July 11 Learning objectives- FASTA and BLAST programs. Psi-Blast Workshop-Use.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Construction of Substitution matrices
Blast 2.0 Details The Filter Option: –process of hiding regions of (nucleic acid or amino acid) sequence having characteristics.
David Wishart February 18th, 2004 Lecture 3 BLAST (c) 2004 CGDN.
Sequence Search Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
MGM workshop. 19 Oct 2010 Some frequently-used Bioinformatics Tools Konstantinos Mavrommatis Prokaryotic Superprogram.
What is BLAST? Basic BLAST search What is BLAST?
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder. 2.Start a CMD window,
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
Using BLAST To Teach ‘E-value-tionary’ Concepts Cheryl A. Kerfeld 1, 2 and Kathleen M. Scott 3 1.Department of Energy-Joint Genome Institute, Walnut Creek,
Lab 3.2: Database Similarity Searching “The BLAST Buffet” Stephanie Minnema University of Calgary.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
What is BLAST? Basic BLAST search What is BLAST?
Blast Basic Local Alignment Search Tool
Basics of BLAST Basic BLAST Search - What is BLAST?
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Identifying templates for protein modeling:
Genome Center of Wisconsin, UW-Madison
Bioinformatics and BLAST
Sequence alignment, Part 2
Basic Local Alignment Search Tool
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool
BLAST Slides adapted & edited from a set by
Sequence alignment, E-value & Extreme value distribution
BLAST Slides adapted & edited from a set by
Presentation transcript:

BLAST: Basic Local Alignment Search Tool Urmila Kulkarni-Kale Bioinformatics Centre University of Pune

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 2 Sequence based searching –To compare a sequence against the sequence database –To locate similar sequences Similarity may extent to entire length Similarity may be restricted to local regions (domains)

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 3 Steps in sequence-based database searching Identify the query sequence –Protein/nucleic acid Select an algorithm/tool –FASTA / BLAST Select the database –Protein or nucleic acid sequence database –One or all databases Fire the query –On-line / Off-line Analyse the results –Statistically significant vs chance findings

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 4 DNA vs. Protein searches Comparing DNA sequences: –More diverged –significantly more random matches –No choice of scoring matrices (Unitary matrix) Comparing protein sequences –Less diverged than the DNA encoding them. –Significantly less random hits –A wide choice of sensitive matrices like PAM and BLOSUM

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 5 Database Searching Programs FASTA BLAST BLITZ Smith & Waterman algorithm Identify local similarity

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 6 BLAST Algorithm

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 7 BLAST Algorithm

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 8 BLAST Algorithm

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 9 Protein databases for BLAST 1: Default; 2: thru rpsblast pages

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 10 Nucleotide databases for BLAST

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 11 BLAST family of programs Blastp: compares an amino acid query sequence against a protein sequence database Blastn: compares a nucleotide query sequence against a nucleotide sequence database Blastx: compares a nucleotide query sequence translated in all reading frames against a protein sequence database Tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames Tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 12 How to run Input: sequence in FASTA format, Bare sequence, GenBank/ GenPept sequence format, copy & paste OR upload as a file OR Identifiers: accession, accession.version or gi's Sequence range: Specific to protein blast; domain search

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 13 Options for advanced BLAST Limit the BLAST search to the result of an Entrez query against the database chosen mask off segments of the query sequence that have low compositional complexit Filtering is only applied to the query sequence and not to database sequences Carried out using SEG and DUST programs masks Human repeats. Ex: LINE's, SINE's, plus retroviral repeasts Format input sequence to mask certain regions the statistical significance threshold for reporting matches against database

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 14 BLAST Statistics: significance of E-value Quantification of similarity –% identity & Similarity score to rank database sequences Statistics –E-value indicates the number of different alignments with score >= S expected to occur by chance in a database search –Lower the E-value higher is the significance of score –P-value indicates if such an alignment can be expected from a chance alone Chance: can mean the comparison of (a)real but non-homologous sequences (True negatives) (b) real sequences that are shuffled to preserve compositional properties (c) sequences that are generated randomly

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 15 Expect value E() –Number of hits expected to be found by chance with a such score. –E() does not represent a measure of similarity between two sequences. –As close to 0 as possible

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 16 More about E-value The number of hits one can "expect" to see just by chance Lower the E-value, or the closer it is to "0" the more "significant" the match It decreases exponentially with the Score (S) assigned to a match between two sequences. For example: E value of 1 assigned to a hit can be interpreted as in a database of the current size one might expect to see 1 match with a similar score simply by chance. Note: Searches with short sequences have relatively high E- value meaning shorter sequences have a high probability of occurring in the database purely by chance.

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 17 Test case: proteinprotein > gi| |Enoyl-Acyl-Carrier Protein Reductase [Chlamydia trachomatis] MLKIDLTGKIAFIAGIGDDNGYGWGIAKMLAEAGATILVGTWVPIYKIFSQSLELGKFNASRELSNGELL TFAKIYPMDASFDTPEDIPQEILENKRYKDLSGYTVSEVVEQVKKHFGHIDILVHSLANSPEIAKPLLDT SRKGYLAALSTSSYSFISLLSHFGPIMNAGASTISLTYLASMRAVPGYGGGMNAAKAALESDTKVLAWEA GRRWGVRVNTISAGPLASRAGKAIGFIERMVDYYQDWAPLPSPMEAEQVGAAAAFLVSPLASAITGETLY VDHGANVMGIGPEMFPKD The output The first hit

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 18 How plant genes were acquired by human parasites? Acanthamoeba, a free-living protozoan found in fresh water or soil, but which may occur as a human pathogen. Perhaps Acanthamoeba was the original host for Chlamydia, and served as a vector to transfer its Chlamydia parasite to humans. 16s RNA analyses shows that it is more related to plants Thus, Chamydia might have acquired plant genes from Acanthamoeba

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 19 What have we seen? A bacterial protein involved in fatty acid metabolism shows similarity with Plant proteins The similarity with plant proteins is more than the proteins from other bacteria or the host – human. Could it be a case of horizontal gene transfer?

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 20 Searching databases When searching a database, we take a query sequence and use an algorithm (program) for the search. Every pair compared yields a few scores. Larger bit/opt scores usually indicate a higher degree of similarity. Smaller the E/P values: higher confidence A typical db search will yield a huge number of scores to be analyzed.

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 21 db searching Normally, each database search yields 2 groups of scores: genuinely related (True) and unrelated sequences (False positives), with some overlap between them. A good search method should completely separate between the 2 score groups. In practice no search method succeeds in total separation.

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 22 Sensitivity vs Specificity True Positives True Negatives False Positive: True negative but selected by program as positives False Negative: True positive but missed by program and indicated as negative Sensitivity: –Ability to detect True positive matches –Most sensitive search finds all true positives –But will also have a few false positives (as low as possible) Specificity: –Ability to reject True negative matches –But will also reject True positives (false negatives)

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 23 Sensitivity (Sn) & Specificity (Sp) Calculation Sn = TP/ (TP+FN) Sp = TP/ (TP+FP) Where –TP: True Positives –FP: False Positive –FN: False Negative

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 24 Presenting your results Document –Name and version no of software and database –Reference/URL Include statistical results that support an inference –% identity, P-value, E-value

Sept 26, 2008© UKK, Bioinformatics Centre, University of Pune 25 More BLAST case studies Visit Coffee NCBI