Summer Bioinformatics Workshop 2008
BLAST
Chi-Cheng Lin, Ph.D., Professor
Department of Computer Science
Winona State University – Rochester Center

BLAST
Introduction
– What is BLAST?
– Query Sequence in FASTA Format
– What does BLAST tell you?
Choices
– BLAST Programs: Which One to Use?
– Commonly Used BLAST Programs
– BLAST Databases: Which One to Search?
Understanding the Output
Database Search with BLAST
BLAST Steps – How It Works
Acknowledgement: The presentation includes adaptations from NCBI’s Introduction to Molecular Biology Information Resources Modules

What is BLAST?
Basic Local Alignment Search Tool
The Google™ of bioinformatics
– query is a DNA or protein sequence, not a text term
– character string comparison against all the sequences in the target database
– rigorous statistics used to identify statistically significant matches

Query Sequence in FASTA Format
FASTA definition line ("def line"): begins with a >, followed by some text that briefly describes the query sequence, all on a single line
Sequence lines: up to 80 nucleotide bases or amino acids per line
Example:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK

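To make the FASTA layout concrete, here is a minimal Python sketch that splits FASTA text into a def line and a sequence. The function name `parse_fasta` is made up for this illustration and is not part of BLAST or any NCBI tool; only the first two sequence lines of the slide's example are used.

```python
def parse_fasta(text):
    """Return a list of (def_line, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new def line closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined; the line breaks carry no meaning.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""

for def_line, sequence in parse_fasta(example):
    print(def_line)
    print(len(sequence), "residues")
```

The def line comes back without its leading >, and the sequence comes back as one string no matter how it was wrapped in the file.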
What does BLAST tell you?
putative identity and function of your query sequence
helps to direct experimental design to prove the function
find similar sequences in model organisms (e.g., yeast, C. elegans, mouse), which can be used to further study the gene
compare complete genomes against each other to identify similarities and differences among organisms

BLAST Programs: Which One to Use?
Depends on:
– what type of query sequence you have (nucleotide or protein)
– what type of database you will search against (nucleotide or protein)
Most commonly used BLAST programs
– blastn
– blastp
– blastx

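The choice can be written as a small lookup on the two questions above. This is only a sketch covering the three programs named on this slide; the function name `choose_blast_program` and the type labels "nucleotide" and "protein" are assumptions made for illustration.

```python
def choose_blast_program(query_type, database_type):
    """Pick one of the three programs from this slide, given query and database types."""
    table = {
        ("nucleotide", "nucleotide"): "blastn",  # nucleic acids against nucleic acids
        ("protein", "protein"): "blastp",        # protein query against protein database
        ("nucleotide", "protein"): "blastx",     # translated nucleotide query against protein database
    }
    try:
        return table[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"combination not covered by this slide: {query_type} query, {database_type} database"
        ) from None


print(choose_blast_program("nucleotide", "protein"))
```

Combinations outside the three listed programs raise an error here on purpose, since the slide does not cover them.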
Commonly Used BLAST Programs
BLASTN
– Nucleic acids against nucleic acids
BLASTP
– Protein query against protein database
– usually better to use than nucleotide-nucleotide BLAST
– ...but... if we don't have a protein query sequence, what are our options?
BLASTX
– Translated nucleic acids against protein database
– one way to do a protein BLAST search if you have a nucleotide query sequence
– the BLAST program does the translating for you, in all 6 reading frames

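The six reading frames are the three codon offsets on the given strand plus the three on its reverse complement. The sketch below shows the idea in Python using the standard genetic code; it is an illustration, not BLASTX's actual implementation. The frame labels (+1 to +3, -1 to -3) and the use of X for unrecognized codons are choices made here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognized codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the translation of all six reading frames, keyed by frame label."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frames("ATGGCC").items():
    print(label, protein)
```

Each of the six translations is then treated as a protein query, which is why a nucleotide sequence can be searched against a protein database.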
Request ID: RID
An RID is like a ticket number that allows you to retrieve your search results and format them in many different ways over the next 24 hours.
If you've saved RIDs from your recent searches, you can enter the RIDs directly using the "Retrieve results with a Request ID" page, which is accessible from the bottom of the BLAST home page.

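Results can also be requested by RID from a script. The sketch below only builds the request URL and does not contact NCBI. It assumes the NCBI BLAST URL API (the `Blast.cgi` endpoint with `CMD=Get`, `RID`, and `FORMAT_TYPE` parameters), which the slides do not mention; check the current NCBI documentation before relying on it. The RID shown is a made-up placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; not taken from the slides.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def results_url(rid, format_type="Text"):
    """Build a URL asking for the results of a previously submitted search."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_URL + "?" + urlencode(params)


# "ABC123" is a hypothetical RID used only to show the URL shape.
print(results_url("ABC123"))
```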
Search Results: Understanding the Output
Reference to BLAST paper
Reminders about your specific query
– RID
– query sequence reminder (contains the information from your FASTA def line)
– what database you searched against
Graphical summary
– shows where the hits aligned to your query
– colors indicate score range
– mouse over a colored bar to see info about that hit
Text summary (GI numbers and def lines)
– GI links to complete record in Entrez
– Score links to pairwise alignment between your query sequence and the hit
Pairwise alignments
BLAST statistics for your search

Database Search w/ BLAST
Used most often!

Database Search w/ BLAST
Select a BLAST program
Insert your query sequence
Click “BLAST” near the bottom of the web page
In general, if you select blastn, select “Others” as your Database to search.

Database Search w/ BLAST
RID and search status will appear

Database Search w/ BLAST
Wait for your result (patiently …)

Database Search w/ BLAST
Interpret the result
– Graphic result
– Black lines are the sequences that matched least well; red lines are the sequences that matched best. In the example below there are no red lines, so the purple lines are the best matches available.
Source of the image:

Database Search w/ BLAST
BLAST result
– Matching sequences w/ bit-score & E-value
– Hyperlinks to database entry for sequence
Example
– Note that 3e-188 means 3 × 10^-188

BLAST – Statistical Evaluation
E Value
– The number of different alignments with scores equivalent to or better than the alignment score that are expected to occur in a database search by chance.
– The lower the E value, the more significant the score.

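To see why a higher score gives a lower E value, it helps to plug numbers into the usual relation between bit score and E value, E = m × n × 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. This formula is not on the slides; it is added here as background. The sketch uses raw lengths, whereas BLAST itself applies corrected ("effective") lengths, so its reported values will differ. The function name and the example lengths are made up for illustration.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E value: the search space size scaled down by 2 to the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against a database of 1e9 residues.
for score in (30, 40, 50, 100):
    print(score, expected_hits(score, 300, 1_000_000_000))
```

Each additional bit of score halves the expected number of chance alignments, and a larger database raises it in proportion.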
BLAST Steps – How It Works
1. Seeding - Prepare a list of short, fixed-length segments (words) from the query
2. Searching - Find highly similar or exact matches for each word
3. Extension - Extend each match to (potentially) a longer match
4. Evaluation - Evaluate the results using E values

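The four steps can be sketched as a toy Python program. It is heavily simplified and is not the real BLAST algorithm: words must match exactly, extension stops at the first mismatch (BLAST tolerates mismatches and stops when the score drops too far), and hits are ranked by length as a stand-in for E values. All function names are made up for this illustration.

```python
from collections import defaultdict


def seed_words(query, w):
    """Step 1 (seeding): every length-w word in the query, with its offset."""
    return [(query[i:i + w], i) for i in range(len(query) - w + 1)]


def index_subject(subject, w):
    """Map each length-w word in the subject to the offsets where it occurs."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return index


def extend(query, subject, qi, sj, w):
    """Step 3 (extension): grow a word hit left and right while characters agree."""
    start_q, start_s = qi, sj
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = qi + w, sj + w
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return start_q, start_s, end_q - start_q  # (query start, subject start, length)


def toy_blast(query, subject, w=3):
    index = index_subject(subject, w)
    hits = set()  # different seeds often extend to the same match
    for word, qi in seed_words(query, w):
        for sj in index.get(word, []):  # Step 2 (searching): exact word matches only
            hits.add(extend(query, subject, qi, sj, w))
    # Step 4 (evaluation) stand-in: longest match first instead of E values.
    return sorted(hits, key=lambda hit: -hit[2])


print(toy_blast("GATTACA", "TTGATTACAGG"))
```

Indexing the words first means each query word is looked up directly rather than compared against every position, which is the reason seeding makes the search fast.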