BLAST
Similarity and Homology
Similarity is a measure of “sameness”. It is expressed as a percentage and implies nothing about why two things are alike; it is simply a measure of the observed likeness.
Homology is an evolutionary term describing a relationship by descent from a common ancestor. Homologous things are often similar, but not always: the flipper of a whale and your arm are homologous yet look quite different, whereas the actin sequences of humans and chickens are homologous and also highly similar.
Homology is NEVER expressed as a percentage; the things being compared are either related or they are not. Similarity is not homology: two sequences may be some percentage similar, but they are either homologous or not.
Similarity and Homology
Sequence homology can be reliably inferred from statistically significant similarity over a majority of the sequence length.
Non-homology CANNOT be inferred from non-similarity, because non-similar things can still share a common ancestor.
Homologous proteins share common structures, but not necessarily common sequence or function.
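To make the idea of similarity as a percentage concrete, here is a minimal sketch that computes percent identity between two sequences that have already been aligned. The function name `percent_identity` and the convention of counting gapped columns in the denominator are assumptions made for illustration; BLAST's own report may count columns differently. Note that the number it returns says nothing about homology.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Both inputs must already be aligned (same length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up toy alignment: 3 of 4 columns are identical.
    print(percent_identity("ACGT", "ACGA"))
```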
What is BLAST?
Basic Local Alignment Search Tool
It is a sequence database search program.
It tries to match a query sequence against each sequence in a target database.
It produces local alignments: only a portion of each sequence is aligned.
It uses statistical theory to estimate whether a match might have occurred by chance.
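The statistical step can be illustrated with the Karlin-Altschul expectation, E = K * m * n * exp(-lambda * S), which estimates how many alignments scoring at least S would be expected by chance. The sketch below is a simplification: the default values of `lam` and `k` are illustrative assumptions (values of this size are commonly quoted for ungapped BLOSUM62 scoring), and the length corrections that a real BLAST run applies are ignored.

```python
import math


def bit_score(raw_score: float, lam: float = 0.318, k: float = 0.134) -> float:
    """Convert a raw alignment score to a normalized bit score.

    lam and k are assumed, illustrative statistical parameters.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float = 0.318, k: float = 0.134) -> float:
    """Expected number of chance alignments with score >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Hypothetical search: 250-residue query against 10 million residues.
    for score in (30, 60, 120):
        print(score, expect_value(score, 250, 10_000_000))
```

A small E-value means the match is unlikely to be a chance hit; a larger database or a longer query raises E for the same score.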
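BLAST programs: query type versus database type
Program   Query                                 Database
blastn    Nucleotide sequence                   Nucleotide DB
blastp    Protein sequence                      Protein DB
blastx    Nucleotide, translated in 6 frames    Protein DB
tblastn   Protein sequence                      Nucleotide DB, translated in 6 frames
tblastx   Nucleotide, translated in 6 frames    Nucleotide DB, translated in 6 frames
A translated DB contains the amino acid sequences obtained by translating its nucleotide entries.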
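Translation "in 6 frames" means reading the nucleotide sequence from three start offsets on the forward strand and three on the reverse complement. The following is a minimal sketch using the standard genetic code; the function names are hypothetical, and codons containing ambiguous bases are simply shown as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```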
BLAST at NCBI
NCBI BLAST Databases

Peptide Sequence Databases
nr: All non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF
refseq: Protein sequences from NCBI's Reference Sequence Project
swissprot: Last major release of the SWISS-PROT protein sequence database (no updates)
pat: Proteins from the Patent division of GenPept
pdb: Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank
month: All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days
env_nr: Protein sequences from environmental samples

Nucleotide Sequence Databases
nr: All GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS phases 0, 1, 2, EST, GSS, STS, PAT, WGS). No longer "non-redundant".
refseq_rna: RNA entries from NCBI's Reference Sequence project
refseq_genomic: Genomic entries from NCBI's Reference Sequence project
est: Database of GenBank + EMBL + DDBJ sequences from EST Divisions
est_human: Human subset of est
est_mouse: Mouse subset of est
est_others: Non-mouse, non-human subset of est
gss: Genome Survey Sequence; includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences
htgs: Unfinished High Throughput Genomic Sequences, phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr)
pat: Nucleotides from the Patent division of GenBank
pdb: Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank
month: All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days
dbsts: Database of GenBank + EMBL + DDBJ sequences from STS Divisions
chromosome: Complete genomes and chromosomes from the NCBI Reference Sequence project
wgs: Whole genome shotgun sequence entries
env_nt: Nucleotide sequences from environmental samples
Graphical Overview
The graphical overview shows the database hits aligned underneath the query sequence (top red bar). This slide also shows information about the query and the database searched, as well as a link to TaxBlast.
Using a filter (SEG) on a query.
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
What do you need for running BLAST?
A BLASTable (formatted) database that can be queried
A query sequence
Query parameters
Making your own BLAST DB
Any file of FASTA-formatted sequences can be turned into a BLAST DB. How you do this depends on which BLAST variant you are using.
NCBI BLAST, protein DB: formatdb -p T -i myseqfile
NCBI BLAST, nucleotide DB: formatdb -p F -i myseqfile
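Before formatting a database it can help to check that the input really is valid FASTA. The sketch below reads FASTA records from an iterable of lines and builds the corresponding `formatdb` argument list without running it. The helper names `read_fasta` and `formatdb_command` are hypothetical; only the `-p` and `-i` options shown on the slide are used.

```python
def read_fasta(lines):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word after '>' on each header line.
    """
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header with no identifier")
            name = header[0]
            if name in records:
                raise ValueError(f"duplicate identifier: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def formatdb_command(fasta_path, protein):
    """Build the formatdb argument list: -p T for protein, -p F for nucleotide."""
    return ["formatdb", "-p", "T" if protein else "F", "-i", fasta_path]


if __name__ == "__main__":
    # Made-up example records.
    example = [">seq1 example protein", "MKVLAA", "GGHW", ">seq2", "MRVL"]
    print(read_fasta(example))
    print(" ".join(formatdb_command("myseqfile", protein=True)))
```

The argument list could be passed to `subprocess.run` on a machine where `formatdb` is installed.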
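Command line BLAST
blastall -p blastp -d formatteddb -i myseq -o myseq.blastp
Here -p selects the BLAST program, -d names the formatted database, -i is the query file, and -o is the output file.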
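Results from a command-line search are often post-processed by a script. The sketch below assumes the search was run with the additional option `-m 8`, which is not part of the command shown above and makes `blastall` write one tab-separated line per hit with 12 columns. The `Hit` type, the function name, and the sample lines are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One line of 12-column tabular BLAST output (assumed -m 8 layout)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hit lines, skipping blanks and '#' comments."""
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hits.append(Hit(
            fields[0], fields[1], float(fields[2]),
            *(int(value) for value in fields[3:10]),
            float(fields[10]), float(fields[11]),
        ))
    return hits


if __name__ == "__main__":
    # Invented sample output, not from a real search.
    sample = [
        "myseq\tsubjA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t390",
        "myseq\tsubjB\t35.20\t88\t52\t3\t10\t95\t40\t127\t0.5\t31.2",
    ]
    significant = [h for h in parse_tabular(sample) if h.evalue <= 1e-5]
    print([h.subject_id for h in significant])
```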
PSI-BLAST
PSI stands for Position-Specific Iterated. This search method makes use of a profile, which is a position-specific accounting of which amino acid residues are found in a family of aligned homologous proteins.
PSI-BLAST accepts a protein sequence as input and first conducts a normal BLAST search to identify homologues in the database. A profile is constructed from the spectrum of sequences found among the initially identified homologues. This profile is then used as the search key to identify more distant relatives.
The process is iterated, each time refining the profile to include the newly found members. Ideally, the process converges on a stable set of homologous sequences.
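The notion of a profile can be sketched as a table of per-column residue frequencies. The code below is a heavily simplified illustration, not PSI-BLAST's actual method: it uses a flat pseudocount and a uniform background of 1/20, whereas the real program weights sequences and derives pseudocounts from a substitution matrix. All names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one {residue: frequency} dict per alignment column."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        # Gaps and non-standard residues are left out of the counts.
        counts = Counter(seq[col] for seq in aligned if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def score_against_profile(profile, sequence, background=1 / 20):
    """Sum of per-position log-odds (bits) of the sequence under the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(
        math.log2(column.get(residue, background) / background)
        for column, residue in zip(profile, sequence)
    )


if __name__ == "__main__":
    family = ["MKVL", "MKIL", "MRVL"]  # made-up aligned family
    profile = build_profile(family)
    print(score_against_profile(profile, "MKVL"))
    print(score_against_profile(profile, "WWWW"))
```

In an iterated search, sequences that score well against the profile would be added to the alignment and the profile rebuilt for the next round.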
PHI-BLAST
Pattern Hit Initiated BLAST
PHI-BLAST expects as input a protein query sequence and a pattern contained in that sequence. It searches the specified database for other protein sequences that also contain the input pattern and have significant similarity to the query sequence in the vicinity of the pattern occurrences.
PHI-BLAST is integrated with Position-Specific Iterated BLAST (PSI-BLAST), so the results of a PHI-BLAST query can be used to initiate one or more rounds of PSI-BLAST searching. Filling in the "regular expression" box on the PSI-BLAST page executes a PHI-BLAST search. PHI-BLAST enforces the presence of a motif in addition to the usual PSI-BLAST criteria for matching.
An example of a regular expression is W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P. This means a W, followed by 9 to 11 residues of any kind, followed by one of the residues V, F, or Y, and so on.
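The pattern notation above follows PROSITE-style conventions, which differ slightly from the regular expressions used in programming languages. As a rough sketch of how such a pattern reads, the hypothetical helper below converts it to a Python regular expression; it covers only the elements used in the example plus `{...}` exclusions, and the test sequence is invented.

```python
import re

# One pattern element: x, a single residue, [allowed], or {excluded},
# optionally followed by a repeat count such as (2) or (9,11).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    pattern = "W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P"
    regex = prosite_to_regex(pattern)
    print(regex)
    # Invented sequence constructed to contain the motif.
    sequence = "MK" + "W" + "A" * 10 + "VF" + "A" * 6 + "GSW" + "AA" + "P" + "LL"
    print(bool(re.search(regex, sequence)))
```

A match from `re.search` only shows that the motif is present; PHI-BLAST additionally requires significant similarity around the motif.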
BLAST Assignment
Tutorial: http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml
After reading the tutorial, go to the basic BLAST page, input a sequence, and run BLAST.
Go to the advanced BLAST page and use the same input sequence; change the parameters and see whether the output changes.
Go to the PSI-BLAST tutorial page, follow the tutorial, and proceed to a PHI-BLAST search.
Reference: BLAST, by Ian Korf and M. Yandell, O’Reilly Publishing.