NCBI FieldGuide NCBI Molecular Biology Resources Part 2 November 2008 Peter Cooper
NCBI Resources: Part 2 (Genomic Resources, NCBI BLAST)
Genome Resources
Complete Genomes (including draft assemblies), Oct 2008
Organisms:
–Viruses (2,187)
–Archaea (60)
–Bacteria (1,284)
–Eukaryotes (191)
Organelles:
–Mitochondria (1,537)
–Plastids (147)
Higher Eukaryotic Genomes, Oct 2008 (153 total species)
Animals (78)
–Placozoa (1)
–Cnidaria (2)
–Nematodes (7)
–Mollusks (1)
–Arthropods (23): Insects (21), Crustaceans (1), Arachnids (1)
–Echinoderms (1)
–Chordates (42)
Fungi (67)
–Ascomycetes (58)
–Basidiomycetes (9)
Land Plants
–Angiosperms (7)
–Mosses (1)
Entrez query: metazoa[organism] OR dikarya[organism] OR streptophyta[organism]
Genome Resources: All Genomes
Eukaryotic Genomes Only
Microbial Genomes Only: COGs and Protein Clusters
Selected Eukaryotic Genomes
NM_000249: Genome Links
Map Viewer: MLH1
–Customizable
–Maps shown: NCBI Assembly, EST Hits, Gene Annotations, Models, Transcripts
–Download data and sequences
Maps and Options
Mapped Variations
Synteny: Mammalian Genomes, Albumin Gene Family
The New HomoloGene
–No longer UniGene based
–Protein similarities first
–Guided by taxonomic tree
–Includes orthologs and paralogs
[Figure: gene duplication of an early globin gene into an A-chain gene and a B-chain gene; frog, chick, and mouse A and B genes, labeled as orthologs and paralogs]
Finding Homologs: HomoloGene Gene Provides Neighboring Function Gene
HomoloGene Cluster
Expanded Coverage: UniGene Fathead Minnow MLH1
HomoloGene Downloader: Protein, mRNA, Genomic
Microbial Genomes
E. coli mutL Gene Record
Entrez Genomes View
New Sequence Viewer (All Genomes)
Incipient Genome Browser
COGs Analysis: E. coli K12 Genome
Protein Clusters (Update for COGs): Genomic order
Sequence Similarity Searching: Basic Local Alignment Search Tool
The Flavors of BLAST
Position-independent scoring
–Standard BLAST: traditional contiguous word hit; nucleotide, protein, and translations
–Megablast: can use discontiguous words; nucleotide only; optimized for large batch searches
Position-dependent scoring
–PSI-BLAST: constructs PSSMs automatically; searches protein database with PSSMs
–RPS-BLAST: searches a database of PSSMs; basis of the Conserved Domain Database
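The difference between position-independent and position-dependent scoring can be shown with a toy example. This is a minimal sketch, not BLAST code: the alphabet is reduced to two residues, and the scores in `toy_matrix` and `toy_pssm` are invented for illustration (they are not BLOSUM, PAM, or PSI-BLAST values).

```python
# Toy comparison of the two scoring styles named on the slide.
# All scores here are made up for illustration only.

def matrix_score(query, subject, matrix):
    """Position-independent: a residue pair scores the same in every column."""
    return sum(matrix[(q, s)] for q, s in zip(query, subject))


def pssm_score(subject, pssm):
    """Position-dependent: every column carries its own score per residue."""
    return sum(column[s] for column, s in zip(pssm, subject))


# One score per residue pair, regardless of position (hypothetical values).
toy_matrix = {
    ("A", "A"): 4, ("A", "G"): 0,
    ("G", "A"): 0, ("G", "G"): 6,
}

# One dict per alignment column (hypothetical values).
toy_pssm = [
    {"A": 5, "G": -2},   # column 1 strongly prefers A
    {"A": 1, "G": 1},    # column 2 tolerates either residue
    {"A": -3, "G": 7},   # column 3 is a conserved G
]

print(matrix_score("AAG", "AGG", toy_matrix))  # 4 + 0 + 6 = 10
print(pssm_score("AGG", toy_pssm))             # 5 + 1 + 7 = 13
```

With the matrix, an A/G mismatch costs the same wherever it falls; with the PSSM, the same substitution is penalized heavily in a conserved column and barely at all in a tolerant one.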
Basic BLAST: Databases
BLAST Databases: Non-redundant protein
Services: blastp, blastx
nr (non-redundant protein sequences)
–GenBank CDS translations
–NP_, XP_ RefSeqs
–Outside Protein: PIR, Swiss-Prot, PRF
–PDB (sequences from structures)
pat: protein patents
env_nr: environmental samples
Nucleotide Databases: Human and Mouse
–Human and mouse genomic and transcript databases are now the default
–Separate sections in output for mRNA and genomic
–Direct links to Map Viewer for genomic sequences
–Services: Megablast, blastn
Nucleotide Databases: Traditional (services: blastn, tblastn, tblastx)
Nucleotide Databases: Traditional
–nr (nt): traditional GenBank; NM_ and XM_ RefSeqs
–refseq_rna
–refseq_genomic: NC_ RefSeqs
–dbest: EST division
–est_human, mouse, others
–htgs: HTG division
–gss: GSS division
–wgs: whole genome shotgun
–env_nt: environmental samples
Databases are mostly non-overlapping
WWW BLAST
WWW BLAST Interface
The BLAST homepage New URL:
Universal Form: Protein
Universal Form: Nucleotide (speed versus sensitivity)
Limiting Database: Organism (organism autocomplete)
Limiting Database: Entrez Query
–all[filter] NOT mammals[organism]
–gene_in_mitochondrion[Properties]
–2006:2007 [Modification Date]
Nucleotide
–biomol_mrna[Properties]
–biomol_genomic[Properties]
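Several of the Entrez limits above can be combined into a single query string with Boolean operators. The helper below is a minimal sketch: `combine_entrez_terms` is a hypothetical name, and the particular combination shown is only an illustration built from terms on the slide, not a query taken from the presentation.

```python
def combine_entrez_terms(terms, operator="AND"):
    """Join Entrez terms with one Boolean operator, parenthesising each term."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {operator!r}")
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [term.strip() for term in terms if term.strip()]
    return f" {operator} ".join(f"({term})" for term in cleaned)


# Terms copied from the slide; the pairing is illustrative only.
limit = combine_entrez_terms([
    "all[filter] NOT mammals[organism]",
    "biomol_mrna[Properties]",
])
print(limit)
```

Parenthesising each term keeps a term that already contains its own operator, such as `all[filter] NOT mammals[organism]`, from being regrouped when it is joined with others.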
Algorithm parameters: Protein
–Expand
–Adjust to set stringency
–May limit results
–Default statistics adjustment for compositional bias
–Off now by default; conflicts with comp-based stats
Automatic Short Sequence Adjustment (Nucleotide and Protein)
–e-value
–Word Size: 2
–Matrix: PAM30
–Comp Stats: Off
–Low Comp Filter: Off
Algorithm parameters: Nucleotide (blastn)
–Masks species-specific interspersed repeats; essential for genomic query sequences
–Prevents starting alignment in masked region; allows extensions through masked regions
–Masks low-complexity (LC) sequence (simple repeats)
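The masking behavior described above (no alignment may start in a masked region, but an extension may pass through one) can be sketched as follows. This is a toy model under stated assumptions: masked bases are represented as lowercase letters, the function names are hypothetical, and the extension is a plain exact-match run rather than BLAST's actual scored extension.

```python
def seed_positions(query, word_size):
    """Start offsets of words that contain no masked (lowercase) bases."""
    return [
        i for i in range(len(query) - word_size + 1)
        if query[i:i + word_size].isupper()
    ]


def count_matching_run(query, subject, q_start, s_start):
    """Extend rightwards while bases match, ignoring case so that
    masked bases can still be aligned during extension."""
    n = 0
    while (q_start + n < len(query)
           and s_start + n < len(subject)
           and query[q_start + n].upper() == subject[s_start + n].upper()):
        n += 1
    return n


query = "ACGTacgtACGT"     # the middle four bases are masked
subject = "ACGTACGTACGT"

print(seed_positions(query, 4))                  # [0, 8]
print(count_matching_run(query, subject, 0, 0))  # 12
```

Only the two unmasked words can seed an alignment, yet the run started from the first seed extends across the masked stretch to cover the whole query.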
BLAST Formatting Options
Formatting Page (Now on Results)
Alignment View
–Pairwise
–Pairwise with dots for identities
–Query-anchored with dots for identities
–Query-anchored with letters for identities
–Flat query-anchored with dots for identities
–Flat query-anchored with letters for identities
Download Options (Now on Results)
–Structured Formats
–Saved Settings: reusable on Web; portable to Standalone
–PSSM: reusable on Web; portable to Standalone
–Standalone formatter (future)
Structured formats: XML and ASN.1
XML (excerpt): 1 gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 (MutL protein homolog 1)
ASN.1 (excerpt, truncated): Seq-annot ::= { desc { user { type str "Hist Seqalign", data { { label str "Hist Seqalign", data bool TRUE } } }, user { type str "Blast Type", data { { label id 0, data int 0 } } }, user { type str "BLAST database title", data { { label str "Non-redundant SwissProt
The Hit Table
# BLASTP (Aug )
# Query: gi| |ref|NP_ | MutL protein homolog 1 [Homo sapiens]
# Database: swissprot
# Fields: query id, subject ids, % identity, % positives, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 80 hits found
ref|NP_ ||gi| gi| |sp|P38920|MLH1_YEAST e
ref|NP_ ||gi| gi| |sp|Q9P7W6|MLH1_SCHPO e
ref|NP_ ||gi| gi| |sp|Q8RA70|MUTL_THETN e
ref|NP_ ||gi| gi| |sp|Q8KAX3|MUTL_CHLTE e
ref|NP_ ||gi| gi|127552|sp|P |MUTL_ECOLI e
ref|NP_ ||gi| gi| |sp|Q8FAK9|MUTL_ECOL e
ref|NP_ ||gi| gi| |sp|Q8XDN4|MUTL_ECO e
ref|NP_ ||gi| gi| |sp|Q72PF7|MUTL_LEPIC e
ref|NP_ ||gi| gi| |sp|P57886|MUTL_PASMU e
ref|NP_ ||gi| gi| |sp|P44494|MUTL_HAEIN e
ref|NP_ ||gi| gi| |sp|Q8ZIW4|MUTL_YERPE e
ref|NP_ ||gi| gi| |sp|Q9JYT2|MUTL_NEIMB e
ref|NP_ ||gi| gi| |sp|Q9KAC1|MUTL_BACHD e
ref|NP_ ||gi| gi| |sp|Q87L05|MUTL_VIBPA e
ref|NP_ ||gi| gi| |sp|Q9JTS2|MUTL_NEIMA e
ref|NP_ ||gi| gi| |sp|Q6GHD9|MUTL_STAAR e
ref|NP_ ||gi| gi| |sp|Q8NWX9|MUTL_STAAW e
ref|NP_ ||gi| gi| |sp|Q5HGD5|MUTL_STAAC e
ref|NP_ ||gi| gi| |sp|P65492|MUTL_STAAN e
ref|NP_ ||gi| gi| |sp|Q9KV13|MUTL_VIBCH e
ref|NP_ ||gi| gi|127553|sp|P14161|MUTL_SALTY e
ref|NP_ ||gi| gi| |sp|Q9CDL1|MUTL_LACLA e
ref|NP_ ||gi| gi| |sp|Q7MH01|MUTL_VIBVY e
ref|NP_ ||gi| gi| |sp|Q8Z187|MUTL_SALTI e
ref|NP_ ||gi| gi| |sp|Q8DCV0|MUTL_VIBVU e
ref|NP_ ||gi| gi| |sp|Q5E2C6|MUTL_VIBF e
ref|NP_ ||gi| gi| |sp|Q88DD1|MUTL_PSEPK e
Also available in comma-separated format for Excel
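A hit table with the thirteen fields listed above is straightforward to read programmatically. The following is a minimal sketch that assumes tab-separated columns and `#` comment lines; the function name `parse_hit_table` is hypothetical, and the two sample rows use invented identifiers and values, not hits from the slide.

```python
# Column names follow the "# Fields:" line on the slide.
FIELDS = [
    "query_id", "subject_ids", "pct_identity", "pct_positives",
    "alignment_length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]
TEXT_FIELDS = {"query_id", "subject_ids"}
FLOAT_FIELDS = {"pct_identity", "pct_positives", "evalue", "bit_score"}


def parse_hit_table(text):
    """Parse tab-separated hit-table text into a list of dicts."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' header comments.
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} columns, got {len(values)}"
            )
        hit = {}
        for name, value in zip(FIELDS, values):
            if name in TEXT_FIELDS:
                hit[name] = value
            elif name in FLOAT_FIELDS:
                hit[name] = float(value)
            else:
                hit[name] = int(value)
        hits.append(hit)
    return hits


# Invented sample data for illustration only.
sample = (
    "# 2 hits found\n"
    "query_1\tsubject_A\t45.50\t62.00\t200\t100\t9\t1\t200\t5\t204\t1e-50\t190.5\n"
    "query_1\tsubject_B\t30.00\t48.25\t180\t120\t6\t10\t189\t3\t182\t2e-20\t95.1\n"
)
hits = parse_hit_table(sample)
best = min(hits, key=lambda h: h["evalue"])
print(best["subject_ids"], best["bit_score"])
```

Converting the numeric columns on the way in makes it easy to sort or filter hits by E-value or percent identity afterwards.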
PSSMs: Restart PSI-BLAST (ASCII encoded: Web only; ASN.1 ScoreMat: portable)
BLAST TreeView: Black bear mt genome vs. RefSeq Genomic
Distance Tree: Carnivore Mitochondrial Genome (bears, walrus, fur seal, sea lions, true seals, dogs, mongooses, cats, red panda, weasels, raccoon)
Genome and Specialized BLAST
Genome BLAST pages
Map Viewer Homepage
Poplar Genome BLAST
tblastn Genome BLAST Results: protein-nucleotide alignments; exons and genes mixed
Genomic Context of BLAST Hits
Hits in Map Viewer
Specialized BLAST Pages
BLAST URL API
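The URL API works by submitting a search with one request and fetching the results with a second request that names the returned request ID (RID). The sketch below only builds the two URLs and makes no network calls. The endpoint in `BASE_URL` is an assumption (the slide does not give it), the helper names are hypothetical, and the program and database in the example call are illustrative choices; `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `RID`, and `FORMAT_TYPE` are parameter names from the BLAST URL API.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the current BLAST documentation before use.
BASE_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_url(query, program="blastp", database="swissprot"):
    """Build the URL that submits a search (CMD=Put)."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return BASE_URL + "?" + urlencode(params)


def get_url(rid, format_type="Text"):
    """Build the URL that retrieves results for a request ID (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BASE_URL + "?" + urlencode(params)


# NM_000249 is the accession used earlier in the slides.
print(put_url("NM_000249", program="blastn", database="nt"))
print(get_url("EXAMPLE_RID"))  # placeholder RID, not a real request
```

A real client would send the first URL, read the RID from the response, wait, and then poll the second URL until the search has finished.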
BLAST: standalone, clients, databases
ftp> open ftp.ncbi.nih.gov
ftp> cd blast
Standalone BLAST
C toolkit BLAST (BLAST):
E:\Blast\bin>blastall -i purf.fsa -d swissprot -p blastp
C++ toolkit BLAST (BLAST+):
E:\blast+\bin>blastp -query purf.fsa -db swissprot
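The two command lines above differ mainly in how the program and options are named: `blastall` selects the program with `-p`, while BLAST+ has one executable per program and longer option names. The sketch below translates simple `blastall` invocations into their BLAST+ form. The function name is hypothetical, only four common options are mapped, and anything else is rejected rather than guessed.

```python
# blastall option -> BLAST+ option (a deliberately small subset).
OPTION_MAP = {
    "-i": "-query",
    "-d": "-db",
    "-e": "-evalue",
    "-o": "-out",
}


def blastall_to_blastplus(argv):
    """Translate a simple blastall argument list to BLAST+ arguments."""
    if not argv or argv[0] != "blastall":
        raise ValueError("expected a command starting with 'blastall'")
    args = argv[1:]
    if len(args) % 2:
        raise ValueError("expected flag/value pairs")
    program = None
    translated = []
    for flag, value in zip(args[::2], args[1::2]):
        if flag == "-p":
            # In BLAST+ the program name becomes the executable name.
            program = value
        elif flag in OPTION_MAP:
            translated += [OPTION_MAP[flag], value]
        else:
            raise ValueError(f"unsupported option: {flag}")
    if program is None:
        raise ValueError("missing -p <program>")
    return [program] + translated


old = "blastall -i purf.fsa -d swissprot -p blastp".split()
print(" ".join(blastall_to_blastplus(old)))
# blastp -query purf.fsa -db swissprot
```

Rejecting unknown options keeps the sketch honest: several `blastall` flags have no one-to-one BLAST+ equivalent, so a silent pass-through would produce commands that fail or behave differently.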
Service Addresses General Help BLAST Telephone support: