NCBI FieldGuide NCBI Molecular Biology Resources Part 2 November 2008 Peter Cooper.


1 NCBI FieldGuide NCBI Molecular Biology Resources Part 2 November 2008 Peter Cooper

2 NCBI FieldGuide Genomic Resources NCBI BLAST NCBI Resources: Part 2

3 NCBI FieldGuide Genome Resources

4 NCBI FieldGuide Complete Genomes including draft assemblies, Oct 2008 Organisms: Viruses (2,187) Archaea (60) Bacteria (1,284) Eukaryotes (191) Organelles: Mitochondria (1,537) Plastids (147)

5 NCBI FieldGuide Higher Eukaryotic Genomes Oct 2008
Animals (78)
–Placozoa (1)
–Cnidaria (2)
–Nematodes (7)
–Mollusks (1)
–Arthropods (23): Insects (21), Crustaceans (1), Arachnids (1)
–Echinoderms (1)
–Chordates (42)
Fungi (67)
–Ascomycetes (58)
–Basidiomycetes (9)
Land Plants
–Angiosperms (7)
–Mosses (1)
Entrez query: metazoa[organism] OR dikarya[organism] OR streptophyta[organism]
153 Total species
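The Entrez query on this slide can also be assembled programmatically. The following is a minimal sketch, assuming NCBI's E-utilities esearch endpoint and the database name "genome" (neither appears on the slide). It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities esearch (not named on the slide).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_organisms(taxa):
    """Join taxon names into one Entrez query using the [organism] field."""
    return " OR ".join(f"{taxon}[organism]" for taxon in taxa)


def esearch_url(db, term):
    """Build (but do not send) an esearch URL for a database and query term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = or_organisms(["metazoa", "dikarya", "streptophyta"])
url = esearch_url("genome", term)  # "genome" is an assumed database name
print(term)
print(url)
```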

6 NCBI FieldGuide Genome Resources: All Genomes

7 NCBI FieldGuide Eukaryotic Genomes Only

8 NCBI FieldGuide Microbial Genomes Only: COGs and Protein Clusters

9 NCBI FieldGuide Selected Eukaryotic Genomes

10 NCBI FieldGuide NM_000249: Genome Links

11 NCBI FieldGuide Map Viewer: MLH1 Customizable NCBI Assembly EST Hits Gene Annotations Models Transcripts Download data and sequences

12 NCBI FieldGuide Maps and Options

13 NCBI FieldGuide Mapped Variations

14 NCBI FieldGuide Synteny: Mammalian Genomes Albumin Gene Family

15 NCBI FieldGuide The New HomoloGene
No longer UniGene based
Protein similarities first
Guided by taxonomic tree
Includes orthologs and paralogs
[Figure: gene duplication of an early globin gene into an A-chain gene and a B-chain gene; frog, chick and mouse A and B genes, with orthologs and paralogs labeled]

16 NCBI FieldGuide Finding Homologs: HomoloGene Gene Provides Neighboring Function Gene

17 HomoloGene Cluster

18 NCBI FieldGuide Expanded Coverage: UniGene Fathead Minnow MLH1

19 HomoloGene Downloader Protein mRNA Genomic

20 NCBI FieldGuide Microbial Genomes

21 NCBI FieldGuide E. coli mutL Gene Record

22 NCBI FieldGuide Entrez Genomes View

23 NCBI FieldGuide New Sequence Viewer (All Genomes)

24 NCBI FieldGuide Incipient Genome Browser

25 NCBI FieldGuide COGs Analysis E. coli K12 Genome

26 NCBI FieldGuide Protein Clusters (Update for COGs) Genomic order

27 NCBI FieldGuide Sequence Similarity Searching Basic Local Alignment Search Tool

28 NCBI FieldGuide The Flavors of BLAST
Position independent scoring
–Standard BLAST: traditional contiguous word hit; nucleotide, protein and translations
–Megablast: can use discontiguous words; nucleotide only; optimized for large batch searches
Position dependent scoring
–PSI-BLAST: constructs PSSMs automatically; searches protein database with PSSMs
–RPS BLAST: searches a database of PSSMs; basis of conserved domain database
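The difference between the two scoring styles on this slide can be sketched in a few lines. This is a toy illustration only: the two-letter alphabet, the substitution scores and the PSSM columns are invented for the example and are not taken from any real matrix or from BLAST itself.

```python
# Toy scores only: invented for illustration, not from BLOSUM62, PAM30,
# or any real PSSM.
SUBSTITUTION = {("A", "A"): 4, ("A", "G"): 0, ("G", "A"): 0, ("G", "G"): 6}


def score_with_matrix(query, subject):
    """Position independent: a residue pair scores the same at every column."""
    return sum(SUBSTITUTION[(q, s)] for q, s in zip(query, subject))


# One column of scores per query position: the same residue can score
# differently depending on where it occurs.
PSSM = [
    {"A": 5, "G": -2},  # position 1 strongly prefers A
    {"A": 1, "G": 1},   # position 2 is tolerant
    {"A": -3, "G": 7},  # position 3 strongly prefers G
]


def score_with_pssm(pssm, subject):
    """Position dependent: look up each subject residue in its own column."""
    return sum(column[s] for column, s in zip(pssm, subject))


print(score_with_matrix("AAG", "AGG"))  # 4 + 0 + 6 = 10
print(score_with_pssm(PSSM, "AGG"))     # 5 + 1 + 7 = 13
```

With the PSSM, a G scores -2 in the first column and 7 in the third, whereas the substitution table gives a G-G pair the same score at every position.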

29 NCBI FieldGuide Basic BLAST: Databases

30 NCBI FieldGuide BLAST Databases: Non-redundant protein
nr (non-redundant protein sequences)
–GenBank CDS translations
–NP_, XP_ RefSeqs
–Outside Protein: PIR, Swiss-Prot, PRF
–PDB (sequences from structures)
pat: protein patents
env_nr: environmental samples
Services: blastp, blastx

31 NCBI FieldGuide Nucleotide Databases: Human and Mouse Human and mouse genomic and transcript now default Separate sections in output for mRNA and genomic Direct links to Map Viewer for genomic sequences Megablast, blastn service

32 NCBI FieldGuide Nucleotide Databases: Traditional Services blastn tblastn tblastx

33 NCBI FieldGuide Nucleotide Databases: Traditional
nr (nt)
–Traditional GenBank
–NM_ and XM_ RefSeqs
refseq_rna
refseq_genomic
–NC_ RefSeqs
dbest
–EST Division
est_human, mouse, others
htgs
–HTG division
gss
–GSS division
wgs
–whole genome shotgun
env_nt
–environmental samples
Databases are mostly non-overlapping

34 NCBI FieldGuide WWW BLAST

35 NCBI FieldGuide WWW BLAST Interface

36 NCBI FieldGuide The BLAST homepage New URL: http://blast.ncbi.nlm.nih.gov/

37 NCBI FieldGuide Universal Form: Protein

38 NCBI FieldGuide Universal Form: Nucleotide Speed Sensitivity More Less More

39 NCBI FieldGuide Limiting Database: Organism Organism autocomplete

40 NCBI FieldGuide Limiting Database: Entrez Query
all[filter] NOT mammals[organism]
gene_in_mitochondrion[Properties]
2006:2007 [Modification Date]
Nucleotide: biomol_mrna[Properties], biomol_genomic[Properties]

41 NCBI FieldGuide Algorithm parameters: Protein
Adjust to set stringency
May limit results
Default statistics adjustment for compositional bias
Off now by default. Conflicts with comp-based stats
Expand

42 NCBI FieldGuide Automatic Short Sequence Adjustment
e-value: 20000
Word Size: 2
Matrix: PAM30
Comp Stats: Off
Low Comp Filter: Off
Nucleotide and Protein
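The adjustment listed on this slide amounts to swapping one parameter set for another when the query is short. The sketch below illustrates the idea for a protein query. The short-query values come from the slide; the general-purpose defaults and the length cutoff of 30 residues are assumptions made for the example, and the function name is hypothetical.

```python
# Values listed on the slide for short queries.
SHORT_QUERY_SETTINGS = {
    "evalue": 20000,
    "word_size": 2,
    "matrix": "PAM30",
    "comp_based_stats": False,
    "low_complexity_filter": False,
}

# Assumed general-purpose protein settings, shown only for contrast.
ASSUMED_DEFAULTS = {
    "evalue": 10,
    "word_size": 3,
    "matrix": "BLOSUM62",
    "comp_based_stats": True,
    "low_complexity_filter": False,
}

# Hypothetical cutoff: the slide does not say what counts as "short".
SHORT_QUERY_CUTOFF = 30


def settings_for(query, cutoff=SHORT_QUERY_CUTOFF):
    """Return search settings, switching to the short-query set when needed."""
    residues = "".join(query.split())  # ignore whitespace and line breaks
    settings = dict(ASSUMED_DEFAULTS)
    if len(residues) < cutoff:
        settings.update(SHORT_QUERY_SETTINGS)
    return settings


print(settings_for("MKTAYIAK"))
print(settings_for("M" * 100))
```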

43 NCBI FieldGuide Algorithm parameters: Nucleotide blastn
Masks species-specific interspersed repeats. Essential for genomic query sequences
Prevents starting alignment in masked region. Allows extensions through masked regions
Masks LC sequence (simple repeats)

44 NCBI FieldGuide BLAST Formatting Options

45 NCBI FieldGuide Formatting Page (Now on Results)
Alignment View
–Pairwise
–Pairwise with dots for identities
–Query-anchored with dots for identities
–Query-anchored with letters for identities
–Flat query-anchored with dots for identities
–Flat query-anchored with letters for identities

46 NCBI FieldGuide Download Options (Now on Results)
Structured Formats
Saved Settings: Reusable on Web, Portable to Standalone
PSSM: Reusable on Web, Portable to Standalone
Standalone formatter (future)

47 NCBI FieldGuide Structured formats: XML and ASN.1
XML
1 gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 (MutL protein homolog 1) P40692 756 1 1568.9 4061 0 1 756 1 756 0 756
ASN.1
Seq-annot ::= {
  desc {
    user {
      type str "Hist Seqalign",
      data { { label str "Hist Seqalign", data bool TRUE } }
    },
    user {
      type str "Blast Type",
      data { { label id 0, data int 0 } }
    },
    user {
      type str "BLAST database title",
      data { { label str "Non-redundant SwissProt

48 NCBI FieldGuide The Hit Table
# BLASTP 2.2.17 (Aug-26-2007)
# Query: gi|4557757|ref|NP_000240.1| MutL protein homolog 1 [Homo sapiens]
# Database: swissprot
# Fields: query id, subject ids, % identity, % positives, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 80 hits found
ref|NP_000240.1||gi|4557757 gi|1709056|sp|P38920|MLH1_YEAST 36.68 56.91 796 426 18 8 756 5 769 7e-138 491
ref|NP_000240.1||gi|4557757 gi|48474996|sp|Q9P7W6|MLH1_SCHPO 37.24 54.04 768 371 16 8 756 9 684 8e-122 437
ref|NP_000240.1||gi|4557757 gi|25090753|sp|Q8RA70|MUTL_THETN 37.44 54.62 390 231 7 8 394 4 383 5e-59 229
ref|NP_000240.1||gi|4557757 gi|25090732|sp|Q8KAX3|MUTL_CHLTE 35.95 54.05 370 229 5 8 375 4 367 5e-55 215
ref|NP_000240.1||gi|4557757 gi|127552|sp|P23367.2|MUTL_ECOLI 35.99 58.11 339 202 7 8 334 3 338 8e-55 214
ref|NP_000240.1||gi|4557757 gi|29427778|sp|Q8FAK9|MUTL_ECOL6 35.99 58.11 339 202 7 8 334 3 338 1e-54 214
ref|NP_000240.1||gi|4557757 gi|20455084|sp|Q8XDN4|MUTL_ECO57 35.99 58.11 339 202 7 8 334 3 338 1e-54 214
ref|NP_000240.1||gi|4557757 gi|59798328|sp|Q72PF7|MUTL_LEPIC 36.27 55.20 375 221 8 6 375 2 363 3e-54 213
ref|NP_000240.1||gi|4557757 gi|13431695|sp|P57886|MUTL_PASMU 35.48 58.94 341 213 6 8 345 3 339 4e-54 212
ref|NP_000240.1||gi|4557757 gi|1171080|sp|P44494|MUTL_HAEIN 35.74 59.87 319 198 6 8 323 3 317 5e-54 212
ref|NP_000240.1||gi|4557757 gi|20455102|sp|Q8ZIW4|MUTL_YERPE 36.01 58.63 336 207 6 8 339 3 334 6e-54 212
ref|NP_000240.1||gi|4557757 gi|20455152|sp|Q9JYT2|MUTL_NEIMB 33.96 55.35 374 224 8 8 376 4 359 2e-53 210
ref|NP_000240.1||gi|4557757 gi|20139217|sp|Q9KAC1|MUTL_BACHD 35.39 55.90 356 214 6 8 362 4 344 2e-53 209
ref|NP_000240.1||gi|4557757 gi|31076794|sp|Q87L05|MUTL_VIBPA 35.33 58.38 334 210 5 8 338 3 333 3e-53 209
ref|NP_000240.1||gi|4557757 gi|20455150|sp|Q9JTS2|MUTL_NEIMA 36.94 58.28 314 183 5 8 316 4 307 5e-53 209
ref|NP_000240.1||gi|4557757 gi|56749233|sp|Q6GHD9|MUTL_STAAR 38.28 58.46 337 193 7 6 335 2 330 1e-52 207
ref|NP_000240.1||gi|4557757 gi|25090739|sp|Q8NWX9|MUTL_STAAW 38.28 58.46 337 193 7 6 335 2 330 1e-52 207
ref|NP_000240.1||gi|4557757 gi|71151979|sp|Q5HGD5|MUTL_STAAC 38.28 58.46 337 193 7 6 335 2 330 1e-52 207
ref|NP_000240.1||gi|4557757 gi|54037875|sp|P65492|MUTL_STAAN 38.28 58.46 337 193 7 6 335 2 330 2e-52 207
ref|NP_000240.1||gi|4557757 gi|20043258|sp|Q9KV13|MUTL_VIBCH 35.74 58.56 333 204 6 8 335 3 330 2e-52 207
ref|NP_000240.1||gi|4557757 gi|127553|sp|P14161|MUTL_SALTY 35.10 56.93 339 205 7 8 334 3 338 3e-52 206
ref|NP_000240.1||gi|4557757 gi|20455140|sp|Q9CDL1|MUTL_LACLA 36.31 56.55 336 196 5 6 334 2 326 4e-52 206
ref|NP_000240.1||gi|4557757 gi|61214242|sp|Q7MH01|MUTL_VIBVY 34.63 58.51 335 213 5 8 339 3 334 4e-52 206
ref|NP_000240.1||gi|4557757 gi|20455099|sp|Q8Z187|MUTL_SALTI 35.10 56.93 339 205 7 8 334 3 338 4e-52 206
ref|NP_000240.1||gi|4557757 gi|31076809|sp|Q8DCV0|MUTL_VIBVU 34.63 58.51 335 213 5 8 339 3 334 6e-52 205
ref|NP_000240.1||gi|4557757 gi|71648717|sp|Q5E2C6|MUTL_VIBF1 36.71 59.81 316 186 6 8 316 3 311 1e-51 204
ref|NP_000240.1||gi|4557757 gi|37999611|sp|Q88DD1|MUTL_PSEPK 30.34 48.97 435 278 7 8 419 7 439 2e-51 203
Also available in comma separated format for Excel
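A hit table in this layout is straightforward to read with a script. The following is a minimal sketch that assumes whitespace-separated columns in the field order given by the "# Fields:" comment on the slide. The class and function names are illustrative and not part of any NCBI toolkit.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of the hit table, in the order given by the '# Fields:' line."""
    query_id: str
    subject_ids: str
    pct_identity: float
    pct_positives: float
    alignment_length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_hit_table(text):
    """Parse hit-table text, skipping blank lines and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[3]),
                        *map(int, f[4:11]), float(f[11]), float(f[12])))
    return hits


# The first two rows from the slide.
SAMPLE = """\
# BLASTP 2.2.17 (Aug-26-2007)
# Database: swissprot
ref|NP_000240.1||gi|4557757 gi|1709056|sp|P38920|MLH1_YEAST 36.68 56.91 796 426 18 8 756 5 769 7e-138 491
ref|NP_000240.1||gi|4557757 gi|48474996|sp|Q9P7W6|MLH1_SCHPO 37.24 54.04 768 371 16 8 756 9 684 8e-122 437
"""

hits = parse_hit_table(SAMPLE)
for hit in hits:
    print(hit.subject_ids, hit.pct_identity, hit.evalue)
```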

49 NCBI FieldGuide PSSMs: Restart PSI-BLAST ASCII encoded, Web only ASN.1 ScoreMat, Portable

50 NCBI FieldGuide BLAST TreeView Black bear mt genome vs. RefSeq Genomic

51 NCBI FieldGuide Distance Tree Carnivore Mitochondrial Genome bears walrus fur seal sea lions true seals dogs mongooses cats red panda weasels raccoon

52 NCBI FieldGuide Genome and Specialized BLAST

53 NCBI FieldGuide Nucleotide Databases: Human and Mouse Human and mouse genomic and transcript now default Separate sections in output for mRNA and genomic Direct links to Map Viewer for genomic sequences Megablast, blastn service

54 NCBI FieldGuide Genome BLAST pages

55 NCBI FieldGuide Map Viewer Homepage

56 NCBI FieldGuide Poplar Genome BLAST

57 NCBI FieldGuide tblastn Genome BLAST Results Protein-nucleotide alignments Exons and genes mixed

58 NCBI FieldGuide Genomic Context of BLAST Hits

59 NCBI FieldGuide Hits in Map Viewer

60 NCBI FieldGuide Specialized BLAST Pages

61 NCBI FieldGuide BLAST URL API http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
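As a rough illustration of how the URL API is used, the sketch below builds a submission (CMD=Put) URL and a retrieval (CMD=Get) URL. The endpoint address is an assumption, since the slide gives only the documentation URL, and the request ID shown is a placeholder. Nothing is sent to NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint; the slide gives only the documentation URL.
BLAST_CGI = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_url(query, program, database, entrez_query=None):
    """Build a CMD=Put URL that would submit a search."""
    params = {"CMD": "Put", "PROGRAM": program,
              "DATABASE": database, "QUERY": query}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query  # optional Entrez limit
    return BLAST_CGI + "?" + urlencode(params)


def get_url(rid, format_type="XML"):
    """Build a CMD=Get URL that would fetch results for a request ID."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_CGI + "?" + urlencode(params)


submit = put_url("NP_000240", "blastp", "swissprot",
                 entrez_query="all[filter] NOT mammals[organism]")
fetch = get_url("EXAMPLE-RID")  # placeholder, not a real request ID
print(submit)
print(fetch)
```

In practice the request ID is read from the response to the Put request and then polled with Get until the search finishes.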

62 NCBI FieldGuide BLAST: standalone, clients, databases
ftp> open ftp.ncbi.nih.gov.
ftp> cd blast

63 NCBI FieldGuide Standalone BLAST
C toolkit BLAST (BLAST): E:\Blast\bin>blastall -i purf.fsa -d swissprot -p blastp
C++ toolkit BLAST (BLAST+): E:\blast+\bin>blastp -query purf.fsa -db swissprot
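The two command lines on this slide differ in a regular way: the program chosen with -p becomes the executable name, and the short flags become long option names. The helper below is a hypothetical sketch of that translation, not an NCBI tool. Only -i, -d and -p appear on the slide; the -e and -o mappings are added as an assumption.

```python
# Legacy blastall flag -> BLAST+ option name.
FLAG_MAP = {
    "-i": "-query",   # from the slide
    "-d": "-db",      # from the slide
    "-e": "-evalue",  # assumption, not on the slide
    "-o": "-out",     # assumption, not on the slide
}


def to_blast_plus(legacy_argv):
    """Translate a simple blastall argument list into a BLAST+ one."""
    if not legacy_argv or legacy_argv[0] != "blastall":
        raise ValueError("expected a blastall command line")
    args = legacy_argv[1:]
    if len(args) % 2:
        raise ValueError("expected flag/value pairs")
    program = None
    converted = []
    for flag, value in zip(args[0::2], args[1::2]):
        if flag == "-p":
            program = value  # becomes the executable name in BLAST+
        elif flag in FLAG_MAP:
            converted += [FLAG_MAP[flag], value]
        else:
            raise ValueError(f"no mapping known for {flag}")
    if program is None:
        raise ValueError("blastall needs -p to choose the program")
    return [program] + converted


old = ["blastall", "-i", "purf.fsa", "-d", "swissprot", "-p", "blastp"]
print(" ".join(to_blast_plus(old)))
```

The helper covers only the flag-value pairs listed in its table and raises an error for anything else, so an unmapped option is never silently dropped.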

64 NCBI FieldGuide Service Addresses
General Help: info@ncbi.nlm.nih.gov
BLAST: blast-help@ncbi.nlm.nih.gov
Telephone support: 301-496-2475

