Basic Local Alignment Search Tool (BLAST)


Basic Local Alignment Search Tool (BLAST) Katie Moreland

Overview
- Sequence Alignment
- Dynamic Programming
- BLAST tutorial
- Example execution of BLAST
- References

Sequence Alignment
- In bioinformatics, a sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. (http://wikipedia.org)
- Example alignment:
    G A A T T C A G T T A
    G G A - T C - G - - A

Sequence Alignment Cont…
Motivations:
- Similar primary structure in proteins often implies similar form and function
- Similar short sequences can lead to motif finding (e.g., promoter regions)
- Similarities between gene regions can be used for phylogenetic classification

Sequence Similarity
- Alignments are not unique
- Need a way to compare alignments to find the optimal one
- The optimal alignment is the alignment that maximizes the overall score (may not be unique)
- Three possibilities when aligning a character from each string (perfect match, mismatch, indel):
  - Align the two characters
    - Perfect match: C over C
    - Mismatch: C over G
  - Insertion/deletion (indel)
    - Gap in 1st string (S): - over C
    - Gap in 2nd string (T): C over -

Sequence Similarity Cont…
- Simple metric:
    σ(x,x) = 1 (match)
    σ(x,y) = -1 (mismatch, x ≠ y)
    σ(x,-) = σ(-,x) = -1 (indel)
- In practice it is useful to define a substitution matrix such as PAM250 to take the probabilities of particular mutations into account, e.g., a mutation to a chemically similar amino acid costs less than a mutation to a dissimilar amino acid
- Cost of indels depends on the application
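As a small illustration of the simple metric, the sketch below scores an alignment that has already been written out as two equal-length rows. The function names `sigma` and `alignment_score` are made up for this example and do not come from the slides.

```python
def sigma(x: str, y: str) -> int:
    """Simple metric from the slide: +1 match, -1 mismatch, -1 indel."""
    if x == "-" or y == "-":
        return -1  # indel: a residue aligned against a gap
    return 1 if x == y else -1


def alignment_score(row_s: str, row_t: str) -> int:
    """Sum sigma over the columns of an already-aligned pair of rows."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have equal length")
    return sum(sigma(x, y) for x, y in zip(row_s, row_t))


# The example alignment from the Sequence Alignment slide:
# 6 matches, 1 mismatch, 4 indels.
print(alignment_score("GAATTCAGTTA", "GGA-TC-G--A"))  # prints 1
```

Swapping `sigma` for a lookup into a substitution matrix would give the matrix-based scoring the slide mentions, without changing `alignment_score`.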

Intro to Dynamic Programming
- Used to reduce the time complexity of algorithms with certain properties
- Characteristics of problems suited to dynamic programming:
  - Overlapping subproblems (otherwise plain recursion/divide and conquer is enough)
  - Optimal substructure: an optimal solution is built from optimal solutions to its subproblems (e.g., shortest path)

Intro to Dynamic Programming Cont…
Two types of alignment:
- Global (Needleman-Wunsch)
  - Attempts to align every residue in the sequences
  - Most useful when sequences are similar in size and sequence
- Local (Smith-Waterman)
  - Finds an alignment for parts of the two strings
  - Most useful for dissimilar sequences that share regions of similarity or contain similar motifs

Needleman-Wunsch Algorithm
- Input: two strings, S and T
- Construct a matrix with |S|+1 rows and |T|+1 columns
- Label each row with a symbol from S and each column with a symbol from T, except for the first row and first column, which represent an initial gap
- Beginning at the upper left corner:
  - Move diagonally to represent aligning the two characters from the strings
  - Move right to represent inserting a space in S
  - Move down to represent inserting a space in T
  - Update a cell when newScore > oldScore (include an arrow to show which cell we came from)
- The optimal alignment score is in the bottom right corner of the matrix
- Backtrack along the arrows to find an optimal alignment

Needleman-Wunsch Algorithm
- Sequences to align:
    S: GCTC
    T: CGTTC
- Simple scoring function:
    σ(x,x) = 2 (match)
    σ(x,y) = -1 (mismatch)
    σ(x,-) = σ(-,x) = -1 (indel)
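The matrix fill and backtracking can be sketched in a few lines of Python. This is a minimal version under the scoring function given here (match 2, mismatch -1, indel -1), not a reference implementation. The function name and the tie-breaking order (diagonal first, then down, then right) are choices made for this example, and other optimal alignments with the same score may exist.

```python
def needleman_wunsch(s, t, match=2, mismatch=-1, indel=-1):
    """Global alignment: returns (score, aligned_s, aligned_t)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * indel
    for j in range(1, cols):
        score[0][j] = j * indel

    def pair(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + pair(i, j)  # align s[i-1] with t[j-1]
            down = score[i - 1][j] + indel           # s[i-1] against a gap in T
            right = score[i][j - 1] + indel          # t[j-1] against a gap in S
            score[i][j] = max(diag, down, right)

    # Backtrack from the bottom right corner to the top left corner.
    i, j = len(s), len(t)
    out_s, out_t = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_s.append(s[i - 1]); out_t.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            out_s.append(s[i - 1]); out_t.append("-")
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_s)), "".join(reversed(out_t))


print(needleman_wunsch("GCTC", "CGTTC"))  # prints (4, '-GCTC', 'CGTTC')
```

The score of 4 in the bottom right cell comes from one indel (-1), three matches (+6), and one mismatch (-1).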

Tracing Needleman-Wunsch [figure sequence: step-by-step fill of the alignment matrix for S = GCTC and T = CGTTC; cell values recoverable from the slides: -1, -2, -3, -4, -5, +1, +2, +4]

Modifications for Local Alignment
- Allow the algorithm to restart whenever it is advantageous to do so (start the algorithm from any position in S or T)
  - If newScore < 0, set the score for cell (i,j) to 0
- The optimal score is now the maximum value over all cells of the matrix (stop at any position in S or T)
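Both modifications amount to small changes in the matrix fill: clamp every cell at 0, and track the maximum over all cells instead of reading the bottom right corner. The sketch below computes only the optimal local score and leaves out traceback. The function name and the default scores (match 2, mismatch -1, indel -1) are assumptions made for this example.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, indel=-1):
    """Optimal local alignment score, keeping only two matrix rows in memory."""
    best = 0
    prev = [0] * (len(t) + 1)  # row 0: a local alignment may start anywhere
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Restart (score 0) whenever every extension would be negative.
            cur[j] = max(0, diag, prev[j] + indel, cur[j - 1] + indel)
            best = max(best, cur[j])  # the alignment may stop at any cell
        prev = cur
    return best


print(smith_waterman_score("GCTC", "CGTTC"))  # prints 5
```

For GCTC and CGTTC the best local alignment pairs GCTC with GTTC (three matches, one mismatch), which scores 5 because the leading C of T is simply left out.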

Other Modifications
- Use a gap penalty function to distinguish one large gap from many gaps of size 1
- Biological motivations (e.g., mutations, cDNA matching)
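One common choice of gap penalty function is an affine cost, where opening a gap is expensive and extending it is cheap. The sketch below compares it with a linear cost. The function names, the parameter values, and the convention that a gap of length k costs open + extend * k are assumptions for illustration. Some texts instead charge open + extend * (k - 1).

```python
def linear_gap_cost(length: int, per_position: int = 1) -> int:
    """Every gap position costs the same, so gap grouping does not matter."""
    return per_position * length


def affine_gap_cost(length: int, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Opening a gap is charged once; each gap position adds an extension cost."""
    return 0 if length == 0 else gap_open + gap_extend * length


# One gap of length 10 versus ten separate gaps of length 1.
print(linear_gap_cost(10), 10 * linear_gap_cost(1))  # prints 10 10
print(affine_gap_cost(10), 10 * affine_gap_cost(1))  # prints 21 120
```

Under the affine cost a single long gap is far cheaper than many scattered gaps, which is the behavior wanted when one mutation event inserts or deletes a whole stretch of sequence.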

BLAST
- Basic Local Alignment Search Tool
- http://www.ncbi.nlm.nih.gov/BLAST/
- Features:
  - Finds regions of local similarity between sequences
  - Heuristic approach achieves efficiency (important when searching entire databases of sequences)
  - Computes statistical significance of matches
- Uses:
  - Infer evolutionary/functional relationships
  - Identify members of gene families

BLAST Algorithm
Three stages:
1. Find hotspots: exact matches of word length W in the two sequences being considered (idea: good alignments will share regions of similarity, so find those first)
2. Extend hotspots in both directions using ungapped alignment to increase the alignment score; pass high-scoring sequences to stage 3
3. Perform gapped alignment between the two sequences using a variation of the Smith-Waterman algorithm
Only statistically significant alignments are displayed to the user.
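The first two stages can be illustrated with a toy sketch. It is not the actual BLAST implementation: the function names, the +1/-1 scores, and the simple drop-off rule are assumptions chosen to keep the example short, and the real program adds refinements that are not covered here.

```python
from collections import defaultdict


def find_hotspots(query, subject, w):
    """Stage 1: (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, qpos, spos, w, match=1, mismatch=-1, drop=2):
    """Stage 2: extend a hotspot left and right without gaps.

    Extension in a direction stops once the running score falls `drop`
    below the best score seen in that direction.
    Returns (score, query_start, query_end_exclusive).
    """
    def walk(qi, si, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(qpos + w, spos + w, 1)
    left_score, left_len = walk(qpos - 1, spos - 1, -1)
    total = w * match + left_score + right_score
    return total, qpos - left_len, qpos + w + right_len


hits = find_hotspots("ACGTACGT", "TTACGTTT", 4)
print(hits)                                              # prints [(3, 1), (0, 2), (4, 2)]
print(extend_ungapped("ACGTACGT", "TTACGTTT", 3, 1, 4))  # prints (5, 3, 8)
```

Indexing the query words in a dictionary is what makes stage 1 fast: each word of the subject is looked up once instead of being compared against every query position.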

BLAST Input
- FASTA format:
    >gi|532319|pir|TVFV2E|TVFV2E envelope protein
    ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
    NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
    TVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWCHFPS
    NWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPE
    TANLWFNCHGEFFYCKMDWFLNYLNNLTVDADHNECKNTS
    GTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKKTYAPPRE
    GHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
    KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXXXXXXX
    XXXXXXXXXXXVQSQHLLAGILQQQKNL
    LAAVEAQQQMLKLTIWGVK
- Accession/GI number
  - Found using GenBank
  - In the FASTA example, the GI number is 532319
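To make the format concrete, here is a minimal FASTA reader in plain Python. It assumes a header line that starts with `>` followed by one or more sequence lines. The helper names `parse_fasta` and `gi_number` are made up for this example, and the sample text uses only the first two sequence lines of the record shown on the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gi_number(header):
    """Extract the GI number from a 'gi|number|...' style header, if present."""
    fields = header.split("|")
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None


sample = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
"""
header, sequence = parse_fasta(sample)[0]
print(gi_number(header))  # prints 532319
```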

BLAST Input

BLAST Options
- Select program: blastp, blastn, etc.
- Select database(s) to search
  - nr is the default; it contains GenBank, PDB, SwissProt, and others
- Gapped/ungapped alignment
- Search within a certain organism

BLAST Options Cont…
- Filtering on/off
  - On by default; locates low-complexity regions in a sequence and masks them before performing an alignment
  - Low-complexity region: a region with highly biased amino acid composition
- E value threshold
  - Default = 10; represents the number of hits one can expect to find by chance when searching the database
- Substitution matrix
  - Default: BLOSUM62
  - Assigns a score to each aligned pair of residues based on how often that substitution is observed
  - Other matrices are supported, including PAM matrices
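The E value is usually related to the raw alignment score S through the Karlin-Altschul formula E = K * m * n * e^(-λS), where m and n are the query and database lengths. The formula is not stated in the slides. The sketch below evaluates it. The function names are made up, and the values of λ, K, and the lengths are illustrative placeholders rather than parameters taken from the slides or from a particular BLAST run.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def passes_threshold(e_value, threshold=10.0):
    """A hit is reported when its E value does not exceed the threshold."""
    return e_value <= threshold


# Illustrative placeholder parameters (assumptions, not real search statistics).
LAM, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 1_000_000

for score in (30, 60, 120):
    e = expect_value(score, QUERY_LEN, DB_LEN, LAM, K)
    print(score, f"{e:.3g}", passes_threshold(e))
```

Because the score sits in the exponent, a modest increase in alignment score lowers the E value by orders of magnitude, while doubling the database size only doubles it.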

BLAST Options

Advanced BLAST Options
  -G  Cost to open a gap [Integer], default = 11
  -E  Cost to extend a gap [Integer], default = 1
  -e  Expectation value (E) [Real], default = 10.0
  -W  Word size; default is 11 for blastn, 3 for other programs
  -v  Number of one-line descriptions (V) [Integer], default = 100
  -b  Number of alignments to show (B) [Integer]
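These option letters look like those of the legacy `blastall` command-line program. Assuming that program, the sketch below assembles an argument list from the options on this slide. The `-p` (program), `-d` (database), and `-i` (query file) flags are not on the slide and are assumed from legacy `blastall`. The file name `query.fasta` and the helper name are hypothetical. The list is only built and printed, not executed.

```python
def blastall_args(program, database, query_file,
                  gap_open=11, gap_extend=1, evalue=10.0,
                  word_size=3, descriptions=100):
    """Build an argument list for a legacy blastall run (assumed flag names)."""
    return [
        "blastall",
        "-p", program,            # assumed: which BLAST program to run
        "-d", database,           # assumed: database to search
        "-i", query_file,         # assumed: FASTA query file
        "-G", str(gap_open),      # cost to open a gap
        "-E", str(gap_extend),    # cost to extend a gap
        "-e", str(evalue),        # expectation value threshold
        "-W", str(word_size),     # word size
        "-v", str(descriptions),  # number of one-line descriptions
    ]


print(" ".join(blastall_args("blastp", "nr", "query.fasta")))
```

Keeping the arguments as a list rather than one string avoids quoting problems if the list is later passed to `subprocess.run`.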

BLAST Output
- Request ID
- Query Information
- Database Information
- Taxonomy Reports Link
- Graphical Display of alignments
- Description of significant alignments
- Pairwise alignments

BLAST Output Cont…

Taxonomy Reports
- Lineage Report
  - Hierarchical tree structure representing how many hits occurred in each group, 'focused' on the organism which yielded the strongest BLAST hit
- Organism Report
  - Groups hits by species
- Taxonomy Report
  - Summary of relationships between organisms in the BLAST hit list

Graphical Display of Alignments
- Displays the top 100 sequence alignments for a search by default
- Thick red bar at top represents the query sequence; numbers correspond to amino acid residues
- Hits are represented by colored bars; mouse over a bar to view the definition and score in the text box, click to go to the pairwise alignment
- Bar color represents alignment similarity score
- The color key above the query sequence gives the range of similarity scores for each color

Graphical Display of Alignments

Description of Significant Alignments
- Listed in order of decreasing significance
- Default number displayed = 100

Pairwise Alignments

BLAST Demonstration
- Query:
    >gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
    MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
- http://www.ncbi.nlm.nih.gov/BLAST/

References
1. Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol 215(3):403-10, 1990.
2. BLAST Tutorials
   http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
   http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/blast.shtml
3. http://wikipedia.org
4. Hatzivassiloglou, V. http://www.hlt.utdallas.edu/%7Evh/Courses/Fall06/Lectures/Alignment%20part%203.ppt