A Field Guide part 2 National Center for Biotechnology Information February 14, 2006 UT-Health Science Center
GenBank Records Header The Flatfile Format Feature Table Sequence
A Typical GenBank Record LOCUS NM_019570 4279 bp mRNA linear INV 28-OCT-2004 DEFINITION Mus musculus REV1-like(S. cerevisiae)(Rev1l),mRNA ACCESSION NM_019570 VERSION NM_019570.3 GI:50811869 KEYWORDS . A Typical GenBank Record = Title
GenBank Record: Feature Table
GenBank Record: Feature Table, con’t. GenPept identifier
GenBank Record: sequence skip
Indexing for Nucleotide UID 59958365 Field Indexed Terms [primary accession] NM_001012399 [title] Bos taurus hemochromatosis (hfe), mRNA. [organism] Bos taurus [sequence length] 1168 [modification date] 2005/02/19 [properties] biomol mrna gbdiv mam srcdb refseq [accn] [orgn] [mdat] [prop]
Global Entrez Search: HFE
Entrez Nucleotide: HFE 137 records [Title] Not HFE
Smarter Query hfe[title] AND human[orgn] 42 records Curated HFE splice variants (11 total)
hfe[title] AND human[orgn] (con’t) Primary data
Preview/Index Gateway to Advanced Searches
Preview/Index
Preview/Index: Properties, srcdb
Preview/Index: Properties, srcdb …AND srcdb refseq[Properties]
Preview/Index: Properties, srcdb …AND srcdb ddbj/embl/genbank[Properties]
Database Queries #1 hfe 137 #2 hfe[title] AND human[orgn] 42 #3 #2 AND srcdb refseq[prop] 11 #4 #2 AND srcdb ddbj/embl/genbank[prop] 31 #5 #4 AND gbdiv pri[prop] 29 #4 #4 AND gbdiv est[prop] 2 Primate division gbdiv pri[prop] EST division gbdiv est[prop]
Molecule Queries #1 hfe 116 #2 hfe[title] AND human[orgn] 42 #3 #2 AND biomol mrna[prop] 29 #4 #2 AND biomol genomic[prop] 13 Genomic DNA biomol genomic[prop] cDNA biomol mrna[prop]
srcdb refseq reviewed[prop] AND transcript[title] AND variant[title] More Queries… Fields are database-specific Entrez Nucleotide Reviewed RefSeqs with transcript variants: srcdb refseq reviewed[prop] AND transcript[title] AND variant[title]
More Queries… Fields are database-specific Entrez Nucleotide Reviewed RefSeqs with transcript variants: srcdb refseq reviewed[prop] AND transcript[title] AND variant[title] Topoisomerase genes from Archaea: topoisomerase[gene name] AND archaea[organism] Entrez Gene Genes on human chromosome 2 with OMIM links 2[chromosome] AND human[organism] AND “gene omim”[filter] Membrane proteins linked to cancer: “integral to plasma membrane”[gene ontology] AND cancer[dis]
Other Entrez Databases UniGene: rat clusters that have at least one mRNA rat[organism] NOT 0[mrna count] SNP: uniquely mapped microsatellites on human chr2 microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn] UniSTS: markers on the Genethon map of human chromosome 12 Genethon[Map Name] AND human[organism] AND 12[chromosome] Structure: structures of bacterial kinases with resolutions below 2 Å bacteria[organism] AND kinase AND 000.00:002.00[resolution]
Genome Resources Genomic Biology Genomic Biology UniGene E-PCR Map Viewer Trace Archive
Genomic Biology
Gen Biol: Gen Resources
Map Viewer – Genome Annotation Updates
Gen Biol: Gen Resources
Genome Projects: microb
Genome Projects: microb 13 Eukaryotic Genome Sequencing Projects Selected: Complete – 0, Assembly – 2, In Progress - 11
Genome Projects: microb 13 Eukaryotic Genome Sequencing Projects Selected: Complete – 0, Assembly – 2, In Progress - 11
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Gen Biol: Gen Resources
Genome Resources UniGene Genomic Biology E-PCR Map Viewer Trace Archive
Gene-oriented clusters of expressed sequences UniGene Gene-oriented clusters of expressed sequences Automatic clustering using MegaBlast Each cluster represents a unique gene Informed by genome hits Information on tissue types and map locations Useful for gene discovery and selection of mapping reagents
A Cluster of ESTs query 5’ EST hits 3’ EST hits
UniGene Collections
UniGene Collections Species UniGene
UniGene Hs build 188
UniGene Cluster Hs.95351 Lipase, hormone-sensitive (LIPE)
UniGene Cluster Hs.95351
UniGene Cluster Hs.95351: expression
UniGene Cluster Hs.95351: seqs
Get Sequences web page ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/
Genome Resources E-PCR Genomic Biology UniGene Map Viewer Trace Archive
E-PCR Genomic sequence here
Options
Results
reverse e-pcr
reverse e-pcr
reverse e-pcr
LY6G6D: lymphocyte antigen 6 complex, locus G6D reverse e-pcr Gene STS LY6G6D: lymphocyte antigen 6 complex, locus G6D
Genome Resources Map Viewer Genomic Biology UniGene E-PCR Trace Archive
List View
Human MapViewer adar
MapViewer: Human ADAR
3’ UTR 5’ UTR MV Hs ADAR
Maps & Options Maps & Options --Sequence maps-- --Cytogenetic maps-- Ab initio Assembly Repeats BES_Clone Clone NCI_Clone Contig Component CpG island dbSNP haplotype Fosmid GenBank_DNA Gene Phenotype SAGE_Tag STS TCAG_RNA Transcript (RNA) Hs_UniGene Hs_EST --Cytogenetic maps-- Ideogram FISH Clone Gene_Cytogenetic Mitelman Breakpoint Morbid/Disease --Genetic Maps-- deCODE Genethon Marshfield --RH maps-- GeneMap99-G3 GeneMap99-GB4 NCBI RH Standford-G3 TNG Whitehead-RH Whitehead-YAC Mm_UniGene Mm_EST Rn_UniGene Rn_EST Ssc_UniGene Ssc_EST Bt_UniGene Bt_EST Gga_UniGene Gga_EST Variation = SNP
Component MapViewer Gene UniGene Repeats Customize the annotation
Phenotype Variation Gene Customize the annotation
Maps & Options Maps & Options DR52/53 = alternate MHC haplotypes on chr6; hsc_tcag = alternate chr7 assembly.
Chimp ADAR Human ADAR Mouse ADAR Customize the annotation
Genome Resources Trace Archive Genomic Biology UniGene E-PCR Map Viewer Trace Archive
Trace Archive Page
Ciona savignyi Traces
Trace Archive BLAST Page Potential access to sequences NOT yet in GenBank
Basic Local Alignment Search Tool
BLAST Web Searches, 2005 200,000
Precomputed BLAST Services Nucleotide or protein: Related Sequences BLAST link: BLink Transcript clusters: UniGene Protein homologs: HomoloGene
Link to Related Sequences
Related Sequences Most similar Least similar
BLink (BLAST Link)
BLink Output Best hits 3D structures CDD-Search
Why Is BLAST So Popular? Fast Local alignments - heuristic approach based on Smith Waterman Local alignments Statistical significance - Expect value Versatile - blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast - www, standalone, and network clients
Global vs Local Alignment Seq 1 Seq 2 Global alignment Seq 1 Seq 2 Local alignment
Global vs Local Alignment Seq1: WHEREISWALTERNOW (16aa) Seq2: HEWASHEREBUTNOWISHERE (21aa) Global Seq1: 1 W--HEREISWALTERNOW 16 W HERE Seq2: 1 HEWASHEREBUTNOWISHERE 21 Local Seq1: 1 W--HERE 5 Seq1: 1 W--HERE 5 W HERE W HERE Seq2: 3 WASHERE 9 Seq2: 15 WISHERE 21
How BLAST Works Make lookup table of “words” for query Scan database for hits Extend alignment both directions Ungapped extensions of hits (initial HSPs) Gapped extensions (no traceback) Gapped extensions (traceback - alignment details) -X X dropoff value for gapped alignment (in bits) (zero invokes default behavior) blastn 30, megablast 20, tblastx 0, all others 15 [Integer] default = 0 =X2 (in output) -y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior) blastn 20, megablast 10, all others 7 [Real] default = 0.0 =X1 (in output) -Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior) blastn/megablast 50, tblastx 0, all others 25 [Integer] =X3 (in output)
Protein Words GTQITVEDLFYNIATRRKALKN GTQ TQI QIT ITV TVE VED EDL DLF Query: Word size = 3 (default) Word size can only be 2 or 3 GTQ TQI QIT ITV TVE VED EDL DLF ... Make a lookup table of words Neighborhood Words VTV, LTV, VSV, etc. VTV 12 LTV 11 VSV 8 Neighborhood score threshold
Highest score – current score BLASTP Summary Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV… example query words HFL 18 HFV 15 HFS 14 HWL 13 NFL 13 DFL 12 HWV 10 etc … YLS 15 YLT 12 YVS 12 YIT 10 Neighborhood words Neighborhood score threshold T (-f) =11 YLS HFL Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333 Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47 +E YA YL K F+ L +SP+ +DVNVHP+K V +++ I Drop-off score = Highest score – current score -X X dropoff value for gapped alignment (in bits) blastn 30, megablast 20, tblastx 0, all others 15
BLASTP Summary High-scoring pair (HSP) Final HSP example query words Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV… example query words HFL 18 HFV 15 HFS 14 HWL 13 NFL 13 DFL 12 HWV 10 etc … YLS 15 YLT 12 YVS 12 YIT 10 Neighborhood words Neighborhood score threshold T (-f) =11 YLS HFL Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333 Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47 +E YA YL K F+ L +SP+ +DVNVHP+K V +++ I High-scoring pair (HSP) Gapped extension with trace back Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV… 50 +E YA YL K F+YLSL +SP+ +DVNVHP+K VHFL+++ I + + Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI… 337 Final HSP
Scoring Systems - Nucleotides Identity matrix A G C T A +1 –3 –3 -3 G –3 +1 –3 -3 C –3 –3 +1 -3 T –3 –3 –3 +1 [ -r 1 -q -3 ] The default values for M and N, respectively 5 and -4, having a ratio of 1.25, correspond to about 47 nucleic acid PAMs, or about 58 amino acid PAMs; an M:N ratio of 1 corresponds to 30 nucleic acid PAMs or 38 amino acid PAMs. At higher than about 40 nucleic acid PAMs, or 50 amino acid PAMs, better sensitivity at detecting similarities between coding regions is expected by performing comparisons at the amino acid level (States et al., 1991). CAGGTAGCAAGCTTGCATGTCA || |||||||||||| ||||| raw score = 19-9 = 10 CACGTAGCAAGCTTG-GTGTCA
Scoring Systems - Proteins Position Independent Matrices PAM Matrices (Percent Accepted Mutation) Derived from observation; small dataset of alignments Implicit model of evolution All calculated from PAM1 PAM250 widely used BLOSUM Matrices (BLOck SUbstitution Matrices) Derived from observation; large dataset of highly conserved blocks Each matrix derived separately from blocks with a defined percent identity cutoff BLOSUM62 - default matrix for BLAST Position Specific Score Matrices (PSSMs) PSI- and RPS-BLAST
BLOSUM62 Negative for less likely substitutions D -2 -2 1 6 C 0 -3 -3 -3 9 Q -1 1 0 0 -3 5 E -1 0 0 2 -4 2 5 G 0 -2 0 -1 -3 -2 -2 6 H -2 0 1 -1 -3 0 0 -2 8 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1 A R N D C Q E G H I L K M F P S T W Y V X D F Negative for less likely substitutions Y F Positive for more likely substitutions
Position-Specific Score Matrix Serine/Threonine protein kinases catalytic loop 1 7 4 PSSM scores 5 DAF-1 abnormal DAuer Formation DAF-1, cell-surface receptor daf-1 [Caenorhabditis elegans] gi|25145039|ref|NP_499868.2
Position-Specific Score Matrix A R N D C Q E G H I L K M F P S T W Y V 435 K -1 0 0 -1 -2 3 0 3 0 -2 -2 1 -1 -1 -1 -1 -1 -1 -1 -2 436 E 0 1 0 2 -1 0 2 -1 0 -1 -1 0 0 0 -1 0 0 -1 -1 -1 437 S 0 0 -1 0 1 1 0 1 1 0 -1 0 0 0 2 0 -1 -1 0 -1 438 N -1 0 -1 -1 1 0 -1 3 3 -1 -1 1 -1 0 0 -1 -1 1 1 -1 439 K -2 1 1 -1 -2 0 -1 -2 -2 -1 -2 5 1 -2 -2 -1 -1 -2 -2 -1 440 P -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1 0 -3 7 -1 -2 -3 -1 -1 441 A 3 -2 1 -2 0 -1 0 1 -2 -2 -2 0 -1 -2 3 1 0 -3 -3 0 442 M -3 -4 -4 -4 -3 -4 -4 -5 -4 7 0 -4 1 0 -4 -4 -2 -4 -1 2 443 A 4 -4 -4 -4 0 -4 -4 -3 -4 4 -1 -4 -2 -3 -4 -1 -2 -4 -3 4 444 H -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5 0 -5 445 R -4 8 -3 -4 0 -1 -2 -3 -2 -5 -4 0 -3 -2 -4 -3 -3 0 -4 -5 446 D -4 -4 -1 8 -6 -2 0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5 447 I -4 -5 -6 -6 -3 -4 -5 -6 -5 3 5 -5 1 1 -5 -5 -3 -4 -3 1 448 K 0 0 1 -3 -5 -1 -1 -3 -3 -5 -5 7 -4 -5 -3 -1 -2 -5 -4 -4 449 S 0 -3 -2 -3 0 -2 -2 -3 -3 -4 -4 -2 -4 -5 2 6 2 -5 -4 -4 450 K 0 3 0 1 -5 0 0 -4 -1 -4 -3 4 -3 -2 2 1 -1 -5 -4 -4 451 N -4 -3 8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5 452 I -3 -5 -5 -6 0 -5 -5 -6 -5 6 2 -5 2 -2 -5 -4 -3 -5 -3 3 453 M -4 -4 -6 -6 -3 -4 -5 -6 -5 0 6 -5 1 0 -5 -4 -3 -4 -3 0 454 V -3 -3 -5 -6 -3 -4 -5 -6 -5 3 3 -4 2 -2 -5 -4 -3 -5 -3 5 455 K -2 1 1 4 -5 0 -1 -2 1 -4 -2 4 -3 -2 -3 0 -1 -5 -2 -3 456 N 1 1 3 0 -4 -1 1 0 -3 -4 -4 3 -2 -5 -2 2 -2 -5 -4 -4 457 D -3 -2 5 5 -1 -1 1 -1 0 -5 -4 0 -2 -5 -1 0 -2 -6 -4 -5 458 L -3 -1 0 -3 0 -3 -2 3 -4 -2 3 0 1 1 -2 -2 -3 5 -1 -3 catalytic loop cell-surface receptor daf-1 [Caenorhabditis elegans] ; gi|25145039|ref|NP_499868.2; -j3 vs nr >blastpgp -i NP_499868.2 -d nr -j 3 -Q NP_499868.pssm
Local Alignment Statistics Expect Value E = number of database hits you expect to find by chance, ≥ S (applies to ungapped alignments) E = Kmne-S or E = mn2-S’ K = scale for search space = scale for scoring system S’ = bitscore = (S - lnK)/ln2 Lambda -- the expected increase in reliability of an alignment associated with a unit increase in alignment score. E (range 0 to infinity) can be extrapolated to an expectation over the entire database search, by converting the pairwise expectation to a probability (range 0-1) and multiplying the result by the ratio of the entire database size to the length of the database sequence: E_database = (1 - exp(-E)) D / d where D is the size of the database; d is the length of the matching database sequence; and the quantity (1 - exp(-E)) is the probability, P, corresponding to the expectation E for the pairwise comparison. As E approaches 0, E and P approach equality. Due to inaccuracy in the statistical methods as applied in the BLAST programs, whenever E and P are less than about 0.05, the two values can be practically treated as equal. Effective length: computed according to: Length_eff = MAX( Length_real - Lambda S / H , 1) where H is the relative entropy of the target and background residue frequencies (Karlin and Altschul, 1990). H : the information expected to be obtained from each pair of aligned residues in a real alignment that distinguishes the alignment from a random one. More info: The Statistics of Sequence Similarity Scores
An Alignment BLAST Cannot Make 1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG || | || || || | || || || || | ||| |||||| | | || | ||| | 1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG 61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT | || || || ||| || | |||||| || | |||||| ||||| | | 61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT 121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC |||| || ||||| || || | | |||| || ||| 121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC Reason: no contiguous exact match of 7 bp.
An Alignment BLAST Can Make Score = 290 bits (741), Expect = 7e-77 Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%) Frame = +3 Solution: compare protein sequences; BLASTX BLAST 2 Sequences (blastx) output:
Other BLAST Algorithms Megablast Discontiguous Megablast PSI-BLAST PHI-BLAST
Megablast: NCBI’s Genome Annotator Long alignments of similar DNA sequences Greedy algorithm Concatenation of query sequences Faster than blastn; less sensitive
MegaBLAST & Word Size Trade-off: sensitivity vs speed blastn megablast 2 3 blastp 8 28 megablast 7 11 blastn minimum default WORD SIZE Consider for each search
Discontiguous Megablast Uses discontiguous word matches Better for cross-species comparisons
Templates for Discontiguous Words W = 11, t = 16, coding: 1101101101101101 W = 11, t = 16, non-coding: 1110010110110111 W = 12, t = 16, coding: 1111101101101101 W = 12, t = 16, non-coding: 1110110110110111 W = 11, t = 18, coding: 101101100101101101 W = 11, t = 18, non-coding: 111010010110010111 W = 12, t = 18, coding: 101101101101101101 W = 12, t = 18, non-coding: 111010110010110111 W = 11, t = 21, coding: 100101100101100101101 W = 11, t = 21, non-coding: 111010010100010010111 W = 12, t = 21, coding: 100101101101100101101 W = 12, t = 21, non-coding: 111010010110010010111 W = word size; # matches in template t = template length Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics March, 2002; 18(3):440-5
Discontiguous (Cross-species) MegaBLAST
Discontiguous Word Options Non-affine gap scores: gap open penalty = 0; gap extension penalty = match/2 - mismatch (mismatch < 0). Non-affine gapping parameters tend to yield alignments with more gaps, but the gap lengths are shorter. The latter formula is applied automatically if -G and -E command line parameters are both set to 0.
Disco. Megablast Example . . . Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp) /note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2 Database: nr (nt), Mammalia[orgn] MegaBLAST = “No significant similarity found.” Discontiguous megaBLAST = numerous hits . . .
Ex: Discontiguous MegaBLAST
Ex: BLASTN
PSI-BLAST Position-specific Iterated BLAST Example: Confirming relationships of purine nucleotide metabolism proteins
PSI-BLAST E value cutoff for PSSM >gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK 0.005 E value cutoff for PSSM
RESULTS: Initial BLASTP Same results as protein-protein BLAST; different format
Results of First PSSM Search Other purine nucleotide metabolizing enzymes not found by ordinary BLAST
Tenth PSSM Search: Convergence Just below threshold, another nucleotide metabolism enzyme Check to add to PSSM
PHI-BLAST [GA]xxxxGK[ST] >gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4 MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK [GA]xxxxGK[ST]
What’s New?
BLAST Databases nr = nr Nucleotide refseq_rna = NM_*, XM_* refseq_genomic = NC_*, NG_* env_nt environmental sample[filter], e.g., 16S rRNA Protein refseq = NP_*, XP_* env_nr nr = nr
New Formatter Select lower case Select red
BLAST Output: Alignments & Filter low complexity sequence filtered
BLAST Output: CDS Feature
Advanced Options Limit to Organism Example Entrez Queries all[filter] NOT ma Example Entrez Queries all[Filter] NOT mammalia[Organism] ray finned fishes[Organism] srcdb refseq[Properties] Nucleotide only: biomol mrna[Properties] biomol genomic[Properties] OtherAdvanced –e 10000 expect value -v 2000 descriptions -b 2000 alignments -e 10000 -v 2000
Genome BLAST
Genome BLAST via Map Viewer
Example: Human Genome BLAST TGCCTCCTTTGGTGAAGGTGACACATCATGTGACCTCTTCAGTGACCACTCTACGGTGTCGGGCCTTGAACTACTACCCCCAGAAC ATCACCATGAAGTGGCTGAAGGATAAGCAGCCAATGGATGCCAAGGAGTTCGAACCTAAAGACGTATTGCCCAATGGGGATGGGAC CTACCAGGGCTGGATAACCTTGGCTGTACCCCCTGGGGAAGAGC Human EST
Human Genome BLAST: Results
Human Genome BLAST: MapViewer Entrez Gene
Example: Mapping Oligos Onto a Genome ? >forward CCATGGCGACCCTGGAAAAGC >reverse CAGCAGCGGCTGTGCCTGCGG ? ?
Map Oligos Onto Genome forward primer reverse primer -W 7 –e 1000 >CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG forward primer reverse primer -W 7 –e 1000
Genome BLAST Results
Primer Alignments reverse primer forward primer
MapViewer
MapViewer
Sequence View (sv) forward reverse
Service Addresses BLAST blast-help@ncbi.nlm.nih.gov General Help info@ncbi.nlm.nih.gov Wayne Matten matten@ncbi.nlm.nih.gov