1
How to Install and Use a Standalone BLAST (Basic Local Alignment Search Tool) Server Doug Davis Plant Science Division Univ. of Missouri 6/26/06
2
Lab Premise
Bioinformatics research is typically web-based.
Access to necessary URLs may be hampered by the need for administrator permissions.
Solution: Standalone BLAST (you will be provided a CD containing all necessary files at the lab’s conclusion).
3
Lab Goals
See where BLAST fits into the larger scheme of bioinformatics.
Demonstrate installation of a standalone BLAST server on a Windows XP PC (should also work on a Windows 2000 PC).
Gain initial familiarity with the available standalone BLAST parameters.
4
Bioinformatics Defined
Study of biological questions using computers in place of traditional labware (e.g. test tubes, pH meters, electrophoretic equipment).
Dependent on databases containing molecular data generated over many decades.
Millions of sequences are in these databases; best of all, tools like BLAST can search such large databases very rapidly.
5
What Is BLAST?
BLAST is a program that searches for similarities among molecular sequences; it works with both nucleic acids and proteins.
It performs local (as opposed to global) alignments using a special set of scoring matrices.
It calculates the statistical significance of any matches it finds, which lets you evaluate the degree of similarity.
It is a very powerful tool for characterizing unknown sequences by aligning them to known sequences.
6
The usual way BLAST is employed…
Requires an active internet connection to visit the websites where molecular databases reside.
You have a lot of flexibility working over the web (many different databases and informatics tools can be rapidly accessed).
You specify a target database to be searched using the website’s BLAST server.
You upload the query sequences (the sequences you want to learn more about) to a web-BLAST server; the BLAST alignment algorithm then compares them to all sequences in the specified target database.
7
BLAST Session Setup
Target database sequences: this database contains many sequences which are already characterized; these are the “knowns”.
Query sequence(s): these are sequences you want to know more about. Consider them the “unknowns”.
BLAST program: if BLAST detects a significant match between query sequences and database sequences, this suggests some meaningful relationship between the aligned sequences.
8
Here’s how the BLAST session looks in
“Command Prompt” (this is the program you will use in Windows to run BLAST):
9
Here’s the “Hit Table” Output from a BLAST
Session- the Hit Table format is a stripped-down BLAST output
10
Hit Table Format of BLAST Output
The output report fields are outlined here:
# BLASTN [May ]
# Query: 5221 sequences
# Database: maize_genes.txt
# Fields: Query, Subject, %ID, AlignLngth, Mismatch, Gaps, Qry_start, Qry_end, Subj_start, Subj_end, e-val, bit_score
CK TC CF TC e CF TC CK TC e CF TC e
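The 12 fields listed above can be read into named records with a few lines of Python. This is a minimal sketch, assuming the hit table is tab-delimited and that comment lines start with "#". The identifiers QUERY_1 and SUBJ_A in the sample row are hypothetical placeholders; the numeric values are taken from the alignment shown on slide 12.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One row of the hit table, in the field order given in the '# Fields' line."""
    query: str
    subject: str
    pct_identity: float
    align_length: int
    mismatches: int
    gaps: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    e_value: float
    bit_score: float


def parse_hit_table(text: str) -> List[Hit]:
    """Parse tab-delimited hit-table text, skipping comment and blank lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header/comment lines carry no hit data
        cols = line.split("\t")
        if len(cols) != 12:
            continue  # ignore rows that do not have all 12 fields
        hits.append(
            Hit(
                cols[0],
                cols[1],
                float(cols[2]),
                *(int(c) for c in cols[3:10]),  # seven integer fields
                float(cols[10]),
                float(cols[11]),
            )
        )
    return hits


# Sample input: hypothetical IDs, numbers from the slide 12 alignment.
SAMPLE = "\n".join(
    [
        "# Fields: Query, Subject, %ID, AlignLngth, Mismatch, Gaps, "
        "Qry_start, Qry_end, Subj_start, Subj_end, e-val, bit_score",
        "\t".join(
            ["QUERY_1", "SUBJ_A", "91.67", "24", "2", "0",
             "531", "554", "519", "542", "0.34", "32.2"]
        ),
    ]
)

if __name__ == "__main__":
    for hit in parse_hit_table(SAMPLE):
        print(hit.query, hit.subject, hit.e_value, hit.bit_score)
```

Once the rows are records, filtering by e-value or sorting by bit score is a one-line list comprehension or `sorted` call.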
11
BLAST Report Field Explanations
mismatches- number of nucleotides that don’t match over the length of the aligned portion.
gaps- positions where the alignment algorithm inserts a gap into one sequence so that the flanking regions line up; gaps represent insertions or deletions in one sequence relative to the other.
e-value- the number of alignments scoring at least this well that would be expected by chance in a database of the size searched; lower values indicate more significant matches, and the value is strongly influenced by the size of the database searched.
bit score- a normalized alignment score (high scores indicate strong alignments); unlike the e-value, it is unaffected by the size of the database searched.
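The relationship between the last two fields can be made concrete with the standard formula E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The sketch below is an approximation under that formula only: real BLAST uses edge-corrected "effective" lengths, so the reported values will differ somewhat. The function name and the example lengths are hypothetical.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: E = m * n * 2**(-S'), with S' the bit score.

    Uses raw lengths; BLAST itself uses effective (edge-corrected) lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same bit score, database twice as large: the E-value doubles
    # while the bit score itself is unchanged.
    small = expected_chance_hits(10, 32, 32)
    large = expected_chance_hits(10, 32, 64)
    print(small, large)
```

This is why the same alignment can look significant against a small custom database and unremarkable against a very large one.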
12
Default BLAST Output: Graphical Alignment
of Query Sequence to Subject Sequence in the Target Database (nucleotide-nucleotide)
Query= gi| |gb|CK |CK zmrsub1_0B a11.s4 zmrsub1 Zea mays cDNA 3', mRNA sequence (609 letters)

                                                     Score    E
Sequences producing significant alignments:          (bits)  Value
TC UP|Q9LLI2_MAIZE (Q9LLI2) Cellulose synthase-8, complete

>TC UP|Q9LLI2_MAIZE (Q9LLI2) Cellulose synthase-8, complete
Length = 3931
Score = 32.2 bits (16), Expect = 0.34
Identities = 22/24 (91%)
Strand = Plus / Plus

Query: 531 cgaggcggaggacgccgtcgacga 554
           ||||| |||||||| |||||||||
Sbjct: 519 cgaggaggaggacggcgtcgacga 542
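The "Identities" line and the row of "|" characters in the report above can be recomputed directly from the two aligned strings. The following is a small sketch using the query and subject segments exactly as they appear in the report; the function names are hypothetical.

```python
QUERY = "cgaggcggaggacgccgtcgacga"   # query positions 531-554
SBJCT = "cgaggaggaggacggcgtcgacga"   # subject positions 519-542


def identities(a: str, b: str):
    """Return (matching positions, alignment length) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a)


def match_line(a: str, b: str) -> str:
    """Build the '|' line BLAST prints between the two sequences."""
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


if __name__ == "__main__":
    matches, length = identities(QUERY, SBJCT)
    print(f"Identities = {matches}/{length} ({100 * matches // length}%)")
    print(QUERY)
    print(match_line(QUERY, SBJCT))
    print(SBJCT)
```

Truncating 22/24 to a whole percentage gives the 91% shown in the report, and the two blanks in the match line mark the two mismatched positions.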
13
How Does BLAST Make the Alignments?
[Figure: Smith-Waterman scoring matrix for the sequences COELACANTH and PELICAN]
Answer: Local alignment is based on the “Smith-Waterman Algorithm”. The local alignment produced by this algorithm is:
ELACAN
ELICAN
14
How to Calculate Smith-Waterman Matrix Values
Matches are assigned a value of +1, mismatches are -1, and gaps (where there is no character to match in one of the sequences) are also assigned a value of -1.
Calculate the match score: the score in the preceding diagonal cell plus the match/mismatch value (+1 if the two characters match, -1 if they do not).
Calculate the horizontal gap score: the score in the cell to the left plus the gap penalty.
Calculate the vertical gap score: the score in the cell above plus the gap penalty.
Each cell takes the maximum of these three scores, but never less than 0.
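The scoring rules above can be sketched in Python as follows. This is a minimal illustration with the slide's values (+1 match, -1 mismatch, -1 gap), not the heuristic that BLAST itself runs. The input sequences COELACANTH and PELICAN are an assumption, inferred from the ELACAN / ELICAN alignment shown on slide 13.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay 0: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # never below zero
                H[i - 1][j - 1] + s,  # diagonal: match or mismatch
                H[i - 1][j] + gap,    # vertical: gap in b
                H[i][j - 1] + gap,    # horizontal: gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a 0 cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("COELACANTH", "PELICAN")
    print(score)   # 4
    print(top)     # ELACAN
    print(bottom)  # ELICAN
```

The alignment scores 4 because five positions match (+5) and one, A against I, does not (-1).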
15
What Types of Questions Can BLAST Be Used to Answer?
Find genes in a genomic sequence.
Predict a protein’s function.
Predict the 3-D structure of a protein.
Identify members of gene/protein families.
16
Why install a Standalone Copy of BLAST?
You don’t need administrator permissions to run it.
Easier to control the output format (you aren’t stuck with what the website decides you should have).
More user control (easier to construct custom BLAST queries).
17
Flow of Events in a BLAST Session
Create a file that contains the query sequences.
Create a blank file that will receive the BLAST output.
Format the target database (protein or nucleic acid).
Submit the BLAST job using the command prompt.
Review the BLAST output; formulate a new hypothesis.
18
BLAST Installation Details: Part 1
Insert the provided CD and locate the file named “ncbi.ini” (this file contains the path to the BLAST\data subfolder).
Click the “Start” button on your desktop, then click on “My Computer”, then click on the C:\ drive.
Open the WINDOWS, WINNT, or WINDOWS NT folder (whichever one exists on your PC) and drag the ncbi.ini file into it.
19
BLAST Installation Details: Part 2
Go to C:\Program Files.
Drag the BLAST folder on your CD into the C:\Program Files folder; be careful not to place it inside another folder that resides in C:\Program Files.
Open the BLAST folder and click the file named “blast ia32-win32” to install the BLAST application.
20
BLAST Installation Details: Part 3
Drag the .txt file “maize_genes” from the CD into the “C:\Program Files\BLAST\data” folder.
Create and save a blank text (.txt) file named “query_seqs” in the “C:\Program Files\BLAST\data” folder.
Open the .txt file named “Install_Lab_seqs” from the CD and copy the contents; paste these into the file “query_seqs”, then save the file.
Create and save a .txt file named “output” in the “C:\Program Files\BLAST\data” folder; this file will receive the BLAST output.
21
BLAST Installation Details: Part 4
Move the following files from the “C:\Program Files\BLAST\bin” folder into the “C:\Program Files\BLAST\data” folder: “formatdb”, “blastall”, “blastclust”, and “megablast” (these are the “executable” files you will need to make BLAST run).
Click Start, select “All Programs”, then select “Accessories”; click the “Command Prompt” icon to open a “command line” session.
22
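Get Ready to BLAST
At the command prompt, first change to the data folder by typing: cd "C:\Program Files\BLAST\data"
Then type the following: formatdb -i maize_genes.txt -p F -o F
This command formats the target database, maize_genes.txt, so that it can be searched by BLAST. Type the options with plain hyphens; “-p F” tells formatdb that the database contains nucleotide rather than protein sequences.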
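A quick way to confirm that formatting worked is to look for the index files formatdb writes next to the database. The sketch below assumes that a nucleotide database produces files ending in .nhr, .nin, and .nsq; treat that list as an assumption to check against your own data folder, since very large databases may be split into numbered volumes. The function name is hypothetical.

```python
from pathlib import Path

# Assumption: suffixes formatdb adds for a nucleotide (-p F) database.
NUCLEOTIDE_INDEX_SUFFIXES = (".nhr", ".nin", ".nsq")


def missing_index_files(database: str, folder: str = ".") -> list:
    """Return the expected index file names that are not present in folder."""
    directory = Path(folder)
    expected = [database + suffix for suffix in NUCLEOTIDE_INDEX_SUFFIXES]
    return [name for name in expected if not (directory / name).exists()]


if __name__ == "__main__":
    missing = missing_index_files("maize_genes.txt", r"C:\Program Files\BLAST\data")
    if missing:
        print("formatdb output not found:", ", ".join(missing))
    else:
        print("Database looks formatted and ready to search.")
```

An empty list means all three files are in place; any names returned point to a formatdb run that failed or was made in a different folder.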
23
Using Standalone BLAST
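At the command prompt, type the following (the text up to and including “>” is the prompt itself; type only what follows it):
C:\Program Files\BLAST\data>megablast -i query_seqs.txt -d maize_genes.txt -o output.txt -F "m D" -D 3
Here -i names the query file, -d the formatted target database, and -o the file that receives the output.
Press the Enter key and BLAST will start processing the command.
When the program terminates (you will get a new command prompt), open the output.txt file to inspect the results.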
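The same search can be launched from a script, which is handy when it has to be repeated for many query files. This is a sketch only: it reuses the options from the command above unchanged and assumes the megablast executable sits in the data folder (as set up in Part 4) or on the system PATH. The function names are hypothetical.

```python
import shutil
import subprocess


def megablast_command(query: str, database: str, output: str) -> list:
    """Build the argument list for the megablast call shown on the slide."""
    return [
        "megablast",
        "-i", query,     # query sequences
        "-d", database,  # formatted target database
        "-o", output,    # file that receives the results
        "-F", "m D",     # filter setting, copied from the slide
        "-D", "3",       # output format setting, copied from the slide
    ]


def run_megablast(query: str, database: str, output: str, workdir: str) -> None:
    """Run megablast inside workdir; raise if the executable cannot be found."""
    exe = shutil.which("megablast", path=workdir) or shutil.which("megablast")
    if exe is None:
        raise FileNotFoundError("megablast executable not found")
    cmd = megablast_command(query, database, output)
    cmd[0] = exe
    subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_megablast(
        "query_seqs.txt",
        "maize_genes.txt",
        "output.txt",
        r"C:\Program Files\BLAST\data",
    )
```

Passing the arguments as a list avoids quoting problems with the space in "m D" and in the "Program Files" path.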
24
Different Types of BLAST
There are 6 types of BLAST available:
megaBLAST: very rapid (~12-fold faster than BLASTN), DNA query against DNA databases
BLASTN: same set-up as megaBLAST; slower, but with more options for query construction
BLASTP: protein query used to search a protein database
BLASTX: DNA query, translated, used to search a protein database
TBLASTN: protein query used to search a translated DNA database
TBLASTX: DNA query translated in all 6 frames versus a translated DNA database
We’ll look more at these this afternoon.
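The list above amounts to a lookup from query type and database type to program name. A small sketch of that lookup follows; the table contents come from the slide, while the dictionary layout and function name are hypothetical.

```python
# (query type, database type, what gets translated before comparison)
BLAST_PROGRAMS = {
    "megablast": ("DNA", "DNA", "nothing"),
    "blastn": ("DNA", "DNA", "nothing"),
    "blastp": ("protein", "protein", "nothing"),
    "blastx": ("DNA", "protein", "query"),
    "tblastn": ("protein", "DNA", "database"),
    "tblastx": ("DNA", "DNA", "query and database"),
}


def programs_for(query_type: str, db_type: str) -> list:
    """List the BLAST programs that accept this query/database combination."""
    return [
        name
        for name, (q, d, _translated) in BLAST_PROGRAMS.items()
        if q == query_type and d == db_type
    ]


if __name__ == "__main__":
    print(programs_for("DNA", "DNA"))      # ['megablast', 'blastn', 'tblastx']
    print(programs_for("DNA", "protein"))  # ['blastx']
```

For a DNA query against a DNA database there are three candidates, and the choice among them is the speed versus flexibility trade-off described above.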