Slide 1 EE3J2 Data Mining Lecture 20 Sequence Analysis 2: BLAST Algorithm Ali Al-Shahib.

Presentation transcript:

Slide 1 EE3J2 Data Mining Lecture 20 Sequence Analysis 2: BLAST Algorithm Ali Al-Shahib

Slide 2 Protein Structures

Slide 3 Protein Structures

Slide 4 Background
• One method of finding the function of an unknown protein (X) is to compare its sequence with the sequence of a known protein (Y).
• If the sequences match to a sufficient degree, we can infer that X is likely to have a similar function to Y.
• Protein sequences are written with 20 different types of ‘letters’. In biology these are known as amino acids.

Slide 5 BLAST
• One method of performing this sequence comparison is called the Basic Local Alignment Search Tool (BLAST).
• Developed in 1990 and extended in 1997 (S. Altschul et al.).
• A heuristic method for performing local alignments by searching for high-scoring segment pairs (HSPs).
• An HSP consists of two sequence fragments of arbitrary but equal length whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score. This cutoff is distinct from the neighbourhood score threshold T, which applies to the short words used to seed the search.

Slide 6 Why Is BLAST Useful?
Hereditary non-polyposis colon cancer gene sequence
A. Lindblom et al. (1993) Nat Genet 5:279

Slide 7 BLASTing a Sequence

Slide 8 BLAST Results
Hereditary non-polyposis colon cancer DNA mismatch repair protein

Slide 9 Another Example
Database: swissprot: 86,593 sequences; 31,411,157 total letters

                                                                Score     E
Sequences producing significant alignments:                     (bits)  Value
SW:HBB_HUMAN  P02023  HEMOGLOBIN BETA CHAIN. (human)                306   2e-83
SW:HBB_GORGO  P02024  HEMOGLOBIN BETA CHAIN. (gorilla)              305   4e-83
SW:HBB2_PANLE P18988  HEMOGLOBIN BETA-2 CHAIN. (lion)               302   3e-82
SW:HBB_HYLLA  P02025  HEMOGLOBIN BETA CHAIN. (gibbon)               300   8e-82
SW:HBB_PREEN  P02032  HEMOGLOBIN BETA CHAIN. (Hanuman langur)       298   5e-81
SW:HBB_COLPO  P19885  HEMOGLOBIN BETA CHAIN. (Colobus)              295   3e-80
SW:HBB_CERAE  P02028  HEMOGLOBIN BETA CHAIN. (Green monkey)         295   3e-80
SW:HBB_MACFU  P02027  HEMOGLOBIN BETA CHAIN. (Japanese macaque)     293   2e-79
SW:HBB_CALAR  P18985  HEMOGLOBIN BETA CHAIN. (Marmoset)             292   2e-79
SW:HBB_ATEGE  P02034  HEMOGLOBIN BETA CHAIN. (Spider monkey)        292   2e-79
SW:HBB_MANSP  P08259  HEMOGLOBIN BETA CHAIN. (Mandrill)             291   4e-79
…
SW:HBB1_RAT   P02091  HEMOGLOBIN BETA CHAIN. (Rat)                  255   4e-68
SW:HBB_ERIEU  P02059  HEMOGLOBIN BETA CHAIN. (Hedgehog)             252   2e-67
SW:HBB_PANPO  P04244  HEMOGLOBIN BETA CHAIN. (Leopard)              251   5e-67
SW:HBB_BISBO  P09422  HEMOGLOBIN BETA CHAIN. (Bison)                251   5e-67

Slide 10 BLAST Parameters
• Identities: number and percentage of exact residue matches
• Positives: number and percentage of identical or similar residue matches
• Gaps: number and percentage of gap positions introduced
• Score: summed HSP score (S)
• Bit score: a normalized score (S')
• Expect (E): expected number of HSPs with at least this score arising by chance
• P: probability of getting a score > X by chance
• T: minimum word (k-tuple) score, i.e. the neighbourhood score threshold
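To make the relationship between the raw score S, the bit score S' and the E-value concrete, here is a minimal Python sketch based on the standard Karlin-Altschul formulas: S' = (λS − ln K) / ln 2, E = m·n·2^(−S'), and P = 1 − e^(−E). The function names are hypothetical, and the λ and K values in the usage lines are placeholders chosen for illustration only; real values depend on the scoring matrix and gap penalties, and BLAST itself applies length corrections that are not modelled here.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw HSP score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance HSPs scoring at least `bits`.

    query_len (m) and db_len (n) are plain lengths here; this sketch
    ignores the effective-length corrections used by real BLAST.
    """
    return query_len * db_len * 2.0 ** (-bits)


def p_value(e_value):
    """Probability of seeing at least one chance HSP with this score."""
    return -math.expm1(-e_value)


# Illustrative call only: lam and k are placeholder values (assumptions).
s_prime = bit_score(raw_score=100, lam=0.3, k=0.1)
e = expect_value(s_prime, query_len=21, db_len=31_411_157)
print(s_prime, e, p_value(e))
```

For very small E the P-value and the E-value are nearly identical, which is why BLAST reports mostly quote E alone.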

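Slide 11 Different Flavours of BLAST
• BLASTP: protein query against a protein database
• BLASTN: nucleotide (DNA/RNA) query against a nucleotide database such as GenBank
• BLASTX: nucleotide query, translated in all six reading frames, against a protein database
• TBLASTN: protein query against a nucleotide database (e.g. GenBank) translated in all six reading frames
• TBLASTX: six-frame translation of a nucleotide query against a six-frame translation of a nucleotide database
• PSI-BLAST: protein ‘profile’ query against a protein database
• PHI-BLAST: protein pattern query against a protein database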
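Several of these variants rely on a six-frame translation: the nucleotide sequence is read in three frames on the forward strand and three on the reverse complement. The following is a small sketch of that idea using the standard genetic code; the function names are hypothetical, it accepts only unambiguous A/C/G/T input, and it is not how any particular BLAST program implements the step.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frames("ATGGCC"))
```

Each of the six translations can then be compared with protein sequences in the same way as an ordinary protein query.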

Slide 12 BLAST Algorithm Source: NCBI

Slide 13 A Question
Question: Given the protein sequence SLAALLNKCKTPQGQRLVNQW and the word length L = 3, explain how the BLAST algorithm is used to find the highest-scoring alignments between this query and the sequences in a database.

Slide 14 Answer: Explaining the BLAST Algorithm
1. The query sequence is split into overlapping words of a defined length. A list of all words of length L = 3 in the query protein sequence is made, starting with positions 1-3, then 2-4, and so on.
Our query sequence: SLAALLNKCKTPQGQRLVNQW
Words: SLA, LAA, AAL, ALL, LLN, LNK, NKC, KCK, CKT, KTP, TPQ, PQG, QGQ, GQR, QRL, RLV, LVN, VNQ, NQW (19 words from 21 residues)
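The word-splitting step can be sketched in a few lines of Python. The function name `split_into_words` is a hypothetical helper introduced for illustration; it simply slides a window of length L along the query one position at a time.

```python
def split_into_words(sequence, word_length=3):
    """Return every overlapping word of `word_length` letters, in order."""
    return [
        sequence[i:i + word_length]
        for i in range(len(sequence) - word_length + 1)
    ]


query = "SLAALLNKCKTPQGQRLVNQW"
words = split_into_words(query, word_length=3)
print(len(words), words)
```

A sequence of n residues yields n − L + 1 words, so the 21-residue query gives 19 three-letter words.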

Slide 15 BLAST Algorithm (continued)
2. Define a threshold alignment score T (the neighbourhood score threshold).
3. Find all word pairs of length L with score ≥ T, e.g. find all w such that S(w, PQG) ≥ T. In other words, each word in the query sequence is scored against every possible combination of three amino acids. This is done using a scoring matrix (e.g. BLOSUM62).
Note: There are 20 x 20 x 20 = 8,000 possible three-letter words, so each query word has 8,000 possible match scores.

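Slide 16 BLAST Algorithm (continued)
Neighbourhood words to PQG (BLOSUM62 scores):
PQG 18
PEG 15
PRG 14
PKG 14
PNG 13
PDG 13
PHG 13
PMG 13
PSG 13
---- Neighbourhood score threshold (T = 13) ----
PQA 12
PQN 12
Words scoring at or above T are the neighbourhood words; PQA and PQN fall below the threshold and are discarded.
Note: This procedure is repeated for each three-letter word in the query sequence.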
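The neighbourhood search for a single query word can be sketched as a brute-force scan over all 8,000 three-letter words. To keep the example self-contained, the code below embeds only the three BLOSUM62 rows needed for the word PQG (the rows for P, Q and G); the helper names are hypothetical, and real implementations use the full matrix and faster enumeration.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Excerpt of BLOSUM62: rows for P, Q and G only, columns in AMINO_ACIDS order.
BLOSUM62_ROWS = {
    "P": [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
}


def word_score(query_word, candidate):
    """Sum the substitution scores position by position."""
    return sum(
        BLOSUM62_ROWS[q][AMINO_ACIDS.index(c)]
        for q, c in zip(query_word, candidate)
    )


def neighbourhood(query_word, threshold):
    """All words scoring >= threshold against query_word, best first."""
    scored = []
    for letters in product(AMINO_ACIDS, repeat=len(query_word)):
        candidate = "".join(letters)
        score = word_score(query_word, candidate)
        if score >= threshold:
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


for word, score in neighbourhood("PQG", threshold=13):
    print(word, score)
```

Calling `neighbourhood("PQG", threshold=15)` instead returns a much shorter list, which illustrates how raising T shrinks the set of words that must be looked up in the database.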

Slide 17 BLAST Algorithm (continued)
4. Now search the database for all ‘hits’: sequences with exact matches to each neighbourhood word w.
5. Extend the alignment of each ‘hit’ in both directions while the score increases, producing high-scoring segment pairs (HSPs), which are locally optimal ungapped alignments.
6. Return sequences with HSPs whose scores are statistically significantly higher than a threshold Smax. Smax is obtained empirically from random sequences.
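Steps 4 and 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the BLAST implementation: it looks up exact matches of the query's own words only (no neighbourhood words), uses a hypothetical toy scoring function (+2 for identical residues, −1 otherwise) instead of BLOSUM62, and stops extending once the running score falls more than `x_drop` below the best score seen, which is a tolerance added here in place of the stricter "while score increases" rule.

```python
from collections import defaultdict


def toy_score(a, b):
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


def find_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [
        (i, j)
        for i in range(len(query) - k + 1)
        for j in index.get(query[i:i + k], [])
    ]


def extend_one_way(query, subject, q, s, step, best, x_drop, score):
    """Walk in one direction; return (best score, residues kept)."""
    running, kept, walked = best, 0, 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += score(query[q], subject[s])
        walked += 1
        if running > best:
            best, kept = running, walked
        elif best - running > x_drop:
            break
        q += step
        s += step
    return best, kept


def extend_hit(query, subject, i, j, k=3, x_drop=3, score=toy_score):
    """Ungapped extension of a word hit.

    Returns (score, query_start, subject_start, length) of the segment pair.
    """
    best = sum(score(query[i + t], subject[j + t]) for t in range(k))
    best, right = extend_one_way(query, subject, i + k, j + k, 1, best, x_drop, score)
    best, left = extend_one_way(query, subject, i - 1, j - 1, -1, best, x_drop, score)
    return best, i - left, j - left, left + k + right


query, subject = "MKTPQGQRL", "WWTPQGQRWW"
for i, j in find_hits(query, subject):
    print((i, j), extend_hit(query, subject, i, j))
```

Several neighbouring hits typically extend to the same segment pair, so a fuller implementation would merge duplicates before applying the significance test in step 6.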

Slide 18 BLAST Algorithm (continued)
So…
Query:  SLAALLNKCKTPQGQRLVNQW
        +LA++L+   TP G R++ +W
Sbjct:  TLASVLDCTVTPMGSRMLKRW
High-scoring segment pair (a letter marks an identical residue, ‘+’ a similar one)

Slide 19 BLAST Algorithm (continued)
7. Varying the threshold alignment score T
• Search time decreases as T is increased, because fewer word pairs are found.
• Sensitivity of the search also decreases as T is increased, because word pairs are overlooked and homologous (or similar) sequences may be discarded.
Note: Both the alignment score Smax and its associated statistical significance are required to assess whether homology is suggested.

Slide 20 Conclusions
• Protein Sequences
• BLAST Algorithm