NCBI Field Guide NCBI Molecular Biology Resources November 2008 NCBI Databases.

1 NCBI Field Guide NCBI Molecular Biology Resources November 2008 NCBI Databases

2 NCBI Field Guide The National Center for Biotechnology Information (Bethesda, MD) Created in 1988 as a part of the National Library of Medicine at NIH –Establish public databases –Research in computational biology –Develop software tools for sequence analysis –Disseminate biomedical information

3 NCBI Field Guide Web Access: www.ncbi.nlm.nih.gov

4 NCBI Field Guide NCBI Databases and Services GenBank primary sequence database Free public access to biomedical literature –PubMed free Medline (3 million searches per day) –PubMed Central full text online access Entrez integrated molecular and literature databases BLAST highest volume sequence search service (100 – 200 K searches per day) VAST structure similarity searches Software and Databases

5 NCBI Field Guide Types of Databases Primary Databases –Original submissions by experimentalists –Content controlled by the submitter Examples: GenBank, SNP, GEO Derivative Databases –Built from primary data –Content controlled by third party (NCBI) Examples: RefSeq, TPA, RefSNP, UniGene, NCBI Protein, Structure, Conserved Domain

6 NCBI Field Guide NCBI Nucleotide Sequences Primary: GenBank / EMBL / DDBJ 149,949,987. Derivative: RefSeq 3,457,825; Third Party Annotation 6,378; PDB 9,021. Total 153,423,040

7 NCBI Field Guide What is GenBank? NCBI’s Primary Sequence Database Nucleotide only sequence database Archival in nature –Historical –Reflective of submitter point of view (subjective) –Redundant GenBank Data –Direct submissions (traditional records) –Batch submissions (EST, GSS, STS) –ftp accounts (genome data) Three collaborating databases –GenBank –DNA Database of Japan (DDBJ) –European Molecular Biology Laboratory (EMBL) Database

8 NCBI Field Guide International Sequence Database Collaboration GenBank (NCBI, NIH; accessed through Entrez), EMBL (EBI; accessed through SRS), DDBJ (CIB, NIG; accessed through getentry). Each database receives submissions and updates.

9 NCBI Field Guide GenBank: NCBI’s Primary Sequence Database Release 168, October 2008. GenBank Release: 96,400,790 records, 97,381,682,336 bases. Whole Genome Shotgun: 46,108,952 records, 136,085,973,423 bases. Total: 142,509,742 records, 233,467,655,759 bases. ftp.ncbi.nih.gov/genbank/ –full release every two months –incremental updates daily –available only via ftp

10 NCBI Field Guide The Growth of GenBank November 2008 Doubling time 12-14 months GenBank Release: 97 billion bases WGS: 136 billion bases
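
The doubling time quoted on the slide above can be turned into a rough projection. The sketch below assumes simple exponential growth, N(t) = N0 * 2^(t / T); the function name and the two-year horizon are illustrative choices, not figures from the presentation.

```python
def projected_bases(current_bases: float, months_ahead: float, doubling_months: float) -> float:
    """Extrapolate a base count assuming it doubles every `doubling_months` months."""
    return current_bases * 2 ** (months_ahead / doubling_months)


# 97 billion bases is the GenBank release size quoted on the slide.
release_bases = 97e9

# Try both ends of the quoted 12-14 month doubling range.
for doubling in (12, 14):
    projected = projected_bases(release_bases, 24, doubling)
    print(f"doubling every {doubling} months: about {projected / 1e9:.0f} billion bases after 24 months")
```

This is only an extrapolation of the stated trend; actual release sizes depend on submissions.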

11 NCBI Field Guide Organization of GenBank: Traditional Divisions Records are divided into 18 Divisions: 12 Traditional, 6 Bulk. Traditional Divisions: Direct Submissions (Sequin and BankIt); Accurate; Well characterized. PRI Primate; PLN Plant and Fungal; BCT Bacterial and Archaeal; INV Invertebrate; ROD Rodent; VRL Viral; VRT Other Vertebrate; MAM Mammalian; PHG Phage; SYN Synthetic (cloning vectors); ENV Environmental Samples; UNA Unannotated. Entrez query: gbdiv_xxx[Properties]

12 NCBI Field Guide Organization of GenBank: Bulk Divisions Records are divided into 18 Divisions: 12 Traditional, 6 Bulk. Bulk Divisions: Batch Submission (Email and FTP); Inaccurate; Poorly characterized. EST Expressed Sequence Tag; GSS Genome Survey Sequence; HTG High Throughput Genomic; STS Sequence Tagged Site; HTC High Throughput cDNA; PAT Patent. Entrez query: gbdiv_xxx[Properties]
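
The `gbdiv_xxx[Properties]` pattern on the two division slides can be sketched as a small helper that substitutes a division code for `xxx`. The function name and the validation step are assumptions added for illustration; the division codes are the ones listed on the slides.

```python
# Division codes as listed on the two "Organization of GenBank" slides.
TRADITIONAL = ["PRI", "PLN", "BCT", "INV", "ROD", "VRL", "VRT", "MAM", "PHG", "SYN", "ENV", "UNA"]
BULK = ["EST", "GSS", "HTG", "STS", "HTC", "PAT"]


def division_query(code: str) -> str:
    """Return an Entrez term that restricts a search to one GenBank division."""
    code = code.upper()
    if code not in TRADITIONAL + BULK:
        raise ValueError(f"unknown GenBank division: {code}")
    # The slides write the pattern in lower case: gbdiv_xxx[Properties].
    return f"gbdiv_{code.lower()}[Properties]"


print(division_query("PLN"))
print(division_query("est"))
```

A term built this way can be combined with other restrictions using AND, as in the Entrez examples later in the deck.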

13 NCBI Field Guide A Traditional GenBank Record: The Flatfile Format (three parts: Header, Feature Table, Sequence)
LOCUS       AF124527     2540 bp    mRNA    linear   PLN 29-JAN-2004
DEFINITION  Prunus persica ethylene receptor (ETR1) mRNA, complete cds.
ACCESSION   AF124527
VERSION     AF124527.1  GI:6841074
KEYWORDS    .
SOURCE      Prunus persica (peach)
  ORGANISM  Prunus persica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; eurosids I; Rosales; Rosaceae; Amygdaloideae; Prunus.
REFERENCE   1  (bases 1 to 2540)
  AUTHORS   Bassett,C.L., Artlip,T.S. and Callahan,A.M.
  TITLE     Characterization of the peach homologue of the ethylene receptor,
            PpETR1, reveals some unusual features regarding transcript
            processing
  JOURNAL   Planta 215 (4), 679-688 (2002)
  PUBMED    12172852
REFERENCE   2  (bases 1 to 2540)
  AUTHORS   Bassett,C.B., Artlip,T.S. and Nickerson,M.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-JAN-1999) Appalachian Fruit Research Station,
            USDA-ARS, 45 Wiltshire Road, Kearneysville, WV 25430, USA
FEATURES             Location/Qualifiers
     source          1..2540
                     /organism="Prunus persica"
                     /mol_type="mRNA"
                     /cultivar="Loring"
                     /db_xref="taxon:3760"
                     /dev_stage="III B/C fruit"
     gene            1..2540
                     /gene="ETR1"
     CDS             269..2485
                     /gene="ETR1"
                     /codon_start=1
                     /product="ethylene receptor"
                     /protein_id="AAF28893.1"
                     /db_xref="GI:6841075"
                     /translation="MEACNCIEPQWPADELLMKYQYISDFFIALAYFSIPLELIYFVK
                     KSAVFPYRWVLVQFGAFIVLCGATHLINLWTFSMHSRTVAIVMTTAKVLTAVVSCATA
                     LMLVHIIPDLLSVKTRELFLKNKAAELDREMGLIRTQEETGRHVRMLTHEIRSTLDRH
                     TILKTTLVELGRTLALEECALWMPTRTGLELQLSYTLRQQNPVGYTVPIHLPVINQVF
                     SSNRALKISPNSPVARMRPLAGKHMPGEVVAVRVPLLHLSNFQINDWPELSTKRYALM
                     VLMLPSDSARQWHVHELELVEVVADQVAVALSHAAILEESMRARDLLMEQNIALDLAR
                     REAETAIRARNDFLAVMNHEMRTPMHAIIALSSLLQETELTPEQRLMVETILKSSHLL
                     ATLINDVLDLSRLEDGSLQLEIATFNLHSVFREVHNLIKPVASVKKLSVSLNLAADLP
                     VQAVGDEKRLMQIVLNVVGNAVKFSKEGSISITAFVAKSESLRDFRAPEFFPAQSDNH
                     FYLRVQVKDSGSGINPQDIPKLFTKFAQTQSLATRNSGGSGLGLAICKRFVNLMEGHI
                     WIESEGPGKGCTAIFIVKLGFAERSNESKLPFLTKVQANHVQTNFPGLKVLVMDDNGS
                     VTKGLLVHLGCDVTTVSSIDEFLHVISQEHKVVFMDVCMPGIDGYELAVRIHEKFTKR
                     HERPVLVALTGNIDKMTKENCMRVGMDGVILKPVSVDKMRSVLSELLEHRVLFEAM"
ORIGIN
        1 gcacgagggc tcaccgagcg agctagctct tcaggagtca aggcttctgg gtgaggggaa
       61 gaagaagaag cttctttgat gtgttggggt gccaatctaa agaggaagaa gaaggcctct
      121 aatgtattga ggtcggctgt ctgggctgcc gatctgtgtt gaatggatag tttggtagag
      181 atgcttcaac gacatagggt ggctgaaaag ggtttgaaga aagtgaagga ggaaaccaag...
     2401 tatactgaaa cctgtctcag ttgataaaat gaggagtgtt ttatcagaac tgttggagca
     2461 tcgagtttta tttgaggcta tgtaagatat aggaaaattg ttctagtgaa ggaaagattt
     2521 aaatggaaaa aaaaaaaaaa
//
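
As a rough illustration of how the header and feature table in the record above can be read by a program, the sketch below splits the LOCUS line into named fields and derives the protein length from the CDS location. Both function names are hypothetical, and the parsing assumes the whitespace-separated field order seen in this one example and a simple `start..end` location whose last codon is a stop codon; real records can be more varied.

```python
def parse_locus(line: str) -> dict:
    """Split a LOCUS line laid out like the AF124527 example into named fields."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("line does not match the layout assumed here")
    return {
        "name": fields[1],
        "length": int(fields[2]),
        "mol_type": fields[4],
        "topology": fields[5],
        "division": fields[6],
        "date": fields[7],
    }


def cds_protein_length(location: str) -> int:
    """Residue count for a simple 'start..end' CDS whose final codon is a stop."""
    start_text, separator, end_text = location.partition("..")
    if not separator:
        raise ValueError("expected a location of the form start..end")
    nucleotides = int(end_text) - int(start_text) + 1
    if nucleotides % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return nucleotides // 3 - 1  # drop the stop codon


locus = parse_locus("LOCUS AF124527 2540 bp mRNA linear PLN 29-JAN-2004")
print(locus["name"], locus["length"], locus["division"])
print(cds_protein_length("269..2485"))
```

For the peach ETR1 record, 269..2485 spans 2217 nucleotides, that is 739 codons, which matches the 738 residues in the /translation qualifier plus one stop codon.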

14 NCBI Field Guide Traditional GenBank Record ACCESSION U07418 VERSION U07418.1 GI:466461 Accession: stable, reportable, universal. Version: tracks changes in sequence. GI number: NCBI internal use. Well annotated; the sequence is the data
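
The accession.version identifier on the slide above separates the stable accession from a counter that tracks sequence changes. A minimal sketch of splitting the two, assuming the identifier always has the form `ACCESSION.N` (the function name is illustrative):

```python
def split_version(identifier: str) -> tuple[str, int]:
    """Split 'U07418.1' into the stable accession and the sequence version."""
    accession, separator, version = identifier.partition(".")
    if not separator or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


accession, version = split_version("U07418.1")
print(accession, version)
```

Citing the bare accession refers to the record as a whole, while the versioned form pins down one specific sequence.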

15 NCBI Field Guide Bulk Divisions Expressed Sequence Tag –1st pass single read cDNA Genome Survey Sequence –1st pass single read gDNA High Throughput Genomic –incomplete sequences of genomic clones Sequence Tagged Site –PCR-based mapping reagents Batch Submission and htg (email and ftp) Inaccurate Poorly Characterized

16 NCBI Field Guide GenBank Bulk Sequence: EST poorly characterized

17 NCBI Field Guide Expressed Sequence Tags in Entrez Total 59 million records: Human 8.1 million; Mouse 4.9 million; Pig 2.2 million; Maize 2.0 million; Arabidopsis 1.5 million; Cow 1.5 million; Zebrafish 1.4 million; Soybean 1.4 million; Xenopus tropicalis 1.3 million; Rice 1.2 million; Ciona intestinalis 1.2 million; Wheat 1.0 million; Rat 1.0 million

18 NCBI Field Guide Whole Genome Shotgun Projects ftp.ncbi.nih.gov/genbank/wgs/ >900 Projects >800 Taxa –585 Bacteria –8 Archaea –17 metagenomes –255 eukaryotes 86 fungi 89 animals 7 flowering plants

19 NCBI Field Guide Mammalian WGS Now 50 species, including… Duck-billed platypus, Nine-banded armadillo, Northern tree shrew, Domestic rabbit, Pika, Guinea pig, Mouse, Rat, Thirteen-lined ground squirrel, Small-eared galago, Mouse lemur, Orangutan, Human, Chimpanzee, Gorilla, Rhesus macaque, Tenrec, African elephant, Dog, Cat, Horse, European hedgehog, Eurasian shrew, Little brown bat, Cow, Gray short-tailed opossum

20 NCBI Field Guide Plant WGS

21 NCBI Field Guide Derivative Databases

22 NCBI Field Guide Entrez Protein: Derivative Database Data Source GenPept Sequences 16,076,221 RefSeq 6,035,597 Third Party Annotation 6,034 Swiss Prot 399,8106 PIR 21,703 PRF 12,079 PDB 123,996 Total 18,971,426 BLAST nr total (no patents, 1 million; no env_nr, 6 million) 7,269,299

23 NCBI Field Guide GenPept: GenBank CDS translations
FEATURES             Location/Qualifiers
     source          1..2484
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="3"
                     /map="3p22-p23"
     gene            1..2484
                     /gene="MLH1"
     CDS             22..2292
                     /gene="MLH1"
                     /note="homolog of S. cerevisiae PMS1 (Swiss-Prot Accession
                     Number P14242), S. cerevisiae MLH1 (GenBank Accession
                     Number U07187), E. coli MUTL (Swiss-Prot Accession Number
                     P23367), Salmonella typhimurium MUTL (Swiss-Prot Accession
                     Number P14161) and Streptococcus pneumoniae (Swiss-Prot
                     Accession Number P14160)"
                     /codon_start=1
                     /product="DNA mismatch repair protein homolog"
                     /protein_id="AAC50285.1"
                     /db_xref="GI:463989"
                     /translation="MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS
                     TSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGE
                     ALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
                     TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRS
>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...

24 NCBI Field Guide Redundant Proteins (GenPept, NCBI RefSeq, Swiss-Prot, PRF, etc.: 20 proteins)
>gi|741682|prf||2007430A DNA mismatch repair protei...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
>gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
>gi|4557757|ref|NP_000240.1| MutL protein homolog 1...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
>gi|13905126|gb|AAH06850.1| MutL protein homolog 1...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
>gi|1079787|gb|AAA82079.1| DNA mismatch repair prot...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
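
The redundancy shown on this slide, one protein sequence filed under several identifiers, can be made explicit by grouping records on their sequence. The sketch below parses the `gi|number|db|accession|...` deflines used on the slide and collects the source databases for each distinct sequence. The function names are illustrative, and the ten-residue sequences are a shortened stand-in, because the slide itself only shows truncated sequences.

```python
from collections import defaultdict


def parse_defline(defline: str) -> dict:
    """Parse a FASTA defline of the form '>gi|463989|gb|AAC50285.1| title'."""
    if not defline.startswith(">gi|"):
        raise ValueError("expected a defline starting with '>gi|'")
    identifier, _, title = defline[1:].partition(" ")
    parts = identifier.split("|")
    return {
        "gi": int(parts[1]),
        "db": parts[2],
        # Empty slots (as in 'prf||2007430A') are dropped.
        "ids": [part for part in parts[3:] if part],
        "title": title,
    }


def group_identical(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each distinct sequence to the source databases that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for defline, sequence in records:
        groups[sequence].append(parse_defline(defline)["db"])
    return dict(groups)


# Deflines are from the slide; the sequences are a shortened stand-in.
records = [
    (">gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair...", "MSFVAGVIRR"),
    (">gi|463989|gb|AAC50285.1| DNA mismatch repair prote...", "MSFVAGVIRR"),
    (">gi|4557757|ref|NP_000240.1| MutL protein homolog 1...", "MSFVAGVIRR"),
]
print(group_identical(records))
```

Collapsing each group to a single representative is one simple way to build a non-redundant set; how NCBI builds its own non-redundant collections is not described on the slide.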

25 NCBI Field Guide Protein Sequences from Structures
>gi|5542073|pdb|1B63|A Chain A, Mutl Complexed With Adpnp
SHMPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDIDIERGGAKLIRIRDNGCGIKKDEL
ALALARHATSKIASLDDLEAIISLGFRGEALASISSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAA
HPVGTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQK
ERRLGAICGTAFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQACED
KLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQ

26 NCBI Field Guide RefSeq: NCBI’s Derivative Sequence Database Curated transcripts and proteins –reviewed –human, mouse, rat, fruit fly, zebrafish, Arabidopsis, microbial genomes (proteins), and more Model transcripts and proteins Assembled Genomic Regions (contigs) –human genome –mouse genome –rat genome Chromosome records –human genome –microbial –organelle ftp://ftp.ncbi.nih.gov/refseq/release/ srcdb_refseq[Properties] –chicken –honeybee –sea urchin

27 NCBI Field Guide Genomes: Two Paths NCBI Eukaryotic Genomes –Since 1999 –Map Viewer –UniGene –HomoloGene –Contigs, Transcripts and Proteins Microbial Genomes Outside Eukaryotic Genomes (Plants, Fungi) –Since 1993 –Comparative Proteomics Clusters of Orthologous Groups (COGs) Protein Clusters –Chromosomes and Proteins

28 NCBI Field Guide Selected RefSeq Accession Numbers mRNAs and Proteins NM_123456 Curated mRNA NP_123456 Curated Protein NR_123456 Curated non-coding RNA XM_123456 Predicted mRNA XP_123456 Predicted Protein XR_123456 Predicted non-coding RNA Gene Records NG_123456 Reference Genomic Sequence Chromosome NC_123455 Microbial replicons, organelle genomes, human chromosomes Assemblies NT_123456 Contig NW_123456 WGS Supercontig
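
The prefix table on the slide above lends itself to a simple lookup. The sketch below maps the two-letter prefix of an accession to the description given on the slide; the function name is illustrative, and prefixes not listed on the slide are treated as unknown rather than guessed.

```python
# Prefix meanings exactly as listed on the "Selected RefSeq Accession Numbers" slide.
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated Protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted Protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference Genomic Sequence",
    "NC": "Chromosome (microbial replicons, organelle genomes, human chromosomes)",
    "NT": "Contig",
    "NW": "WGS Supercontig",
}


def describe_refseq(accession: str):
    """Return the slide's description for a RefSeq accession, or None if unrecognized."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        return None  # No underscore, so not a RefSeq-style accession.
    return REFSEQ_PREFIXES.get(prefix.upper())


for accession in ("NM_000249", "NP_000240.1", "NC_000003", "U07343"):
    print(accession, "->", describe_refseq(accession))
```

The underscore is what distinguishes these accessions from GenBank accessions such as U07343, which is why the lookup returns None for that input.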

29 NCBI Field Guide Two Paths to RefSeq NC_003075 Arabidopsis MLH1 Sequences Genomic Annotations NM_116983 CAB78038 Protein Transcript AJ270058 AJ270060 AL161471 AL161472  AL161595 AL161596 Human MLH1 Sequences mRNA U07343 AU127758 BC006850 NM_000249 Genomic. AC006583 AC011816. NT_022517 (36974983..37032341) NC_000003 (37009983..37067341) NCBI Annotated Genomes and Selected Model Organisms Submitted Genomes and Annotation
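
The two human MLH1 intervals on this slide, NT_022517 (36974983..37032341) and NC_000003 (37009983..37067341), describe the same region in contig and chromosome coordinates. A small worked check follows, assuming the contig sits on the chromosome in the same orientation with a constant offset; the equal span lengths are consistent with that, but the slide does not state it.

```python
def span_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1


contig_span = (36974983, 37032341)      # NT_022517 coordinates from the slide
chromosome_span = (37009983, 37067341)  # NC_000003 coordinates from the slide

# Assumed constant shift between the two coordinate systems.
offset = chromosome_span[0] - contig_span[0]


def contig_to_chromosome(position: int) -> int:
    """Convert a contig position to a chromosome position under the assumed offset."""
    return position + offset


print(span_length(*contig_span), span_length(*chromosome_span))
print(offset)
print(contig_to_chromosome(contig_span[1]) == chromosome_span[1])
```

Both intervals are 57,359 bases long and differ by a shift of 35,000, so the end coordinates map onto each other as well.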

30 NCBI Field Guide GenBank to RefSeq: NCBI Organisms

31 NCBI Field Guide RefSeqs: Annotation Reagents Genomic DNA (NC, NT, NW) Model mRNA (XM) (XR) Curated mRNA (NM) (NR) Model protein (XP) Curated Protein (NP) Scanning.... = ? GenBank Sequences RefSeq

32 NCBI Field Guide RefSeq Benefits Non-redundancy Explicitly linked nucleotide and protein sequences Updates to reflect current sequence data and biology Data validation Format consistency Distinct accession series Stewardship by NCBI staff and collaborators

33 NCBI Field Guide Mouse Assembly RefSeq Contig BAC Other GenBank RefSeq Transcript UniGene Transcript

34 NCBI Field Guide Expressed Sequences UniGene GEO

35 NCBI Field Guide NCBI Expressed Sequences 62,282,583 mRNA sequences 60,705,055 GenBank (58,955,534 EST Division) 1,575,789 Reference Sequences

36 NCBI Field Guide What is UniGene? A gene-oriented view of sequence entries MegaBlast-based automated sequence clustering Now informed by genome hits Nonredundant set of gene-oriented clusters Each cluster a unique gene Information on tissue types and map locations Includes known genes and uncharacterized ESTs Useful for gene discovery and selection of mapping reagents

37 NCBI Field Guide EST hits: Human mRNA Thrombin mRNA 5’ EST hits 3’ EST hits

38 NCBI Field Guide Chordates Plants Invertebrates Fungi et al. UniGene

39 NCBI Field Guide Gene Catalog: Fathead Minnow MLH1Cluster Uncharacterized ESTs

40 NCBI Field Guide Associating Sequences: Human Thrombin

41 NCBI Field Guide Expression Data

42 NCBI Field Guide Other NCBI Databases Structure: imported structures (PDB) Cn3D viewer, NCBI curation CDD: conserved domain database Protein families (COGs and KOGs) Single domains (PFAM, SMART, CD) dbSNP: nucleotide polymorphism Gene: gene records Unifies LocusLink and Microbial Genomes HomoloGene: neighboring function for Gene

43 NCBI Field Guide MMDB: Molecular Modeling Data Base Derived from experimentally determined PDB records Value added to PDB records including: –Addition of explicit chemical graph information –Validation (secondary structure elements) –Inclusion of Taxonomy, Citation –Conversion to ASN.1 data description language Structure neighbors determined by Vector Alignment Search Tool (VAST)

44 NCBI Field Guide Cn3D 4.1: Bacillus thuringiensis Toxin

45 NCBI Field Guide VAST: Structure Neighbors Vector Alignment Search Tool For each protein chain, locate SSEs (secondary structure elements) and represent them as individual vectors, then align the vectors. [Figure: SSE vectors 1 to 6 of human IL-4; IL-4 and leptin vectors aligned]
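
To make the idea of representing secondary structure elements as vectors concrete, the sketch below builds a vector from the start and end coordinates of an element and measures the angle between two such vectors. This is only an illustration of the representation, not VAST's actual algorithm, and the coordinates are invented for the example.

```python
import math

Point = tuple[float, float, float]


def sse_vector(start: Point, end: Point) -> Point:
    """Vector pointing from the first to the last residue of an SSE."""
    return (end[0] - start[0], end[1] - start[1], end[2] - start[2])


def angle_between(u: Point, v: Point) -> float:
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Hypothetical helix endpoints (in angstroms), not taken from any real structure.
helix_a = sse_vector((0.0, 0.0, 0.0), (15.0, 0.0, 0.0))
helix_b = sse_vector((2.0, 1.0, 0.0), (2.0, 13.0, 0.0))
print(angle_between(helix_a, helix_b))
```

Comparing sets of such vectors between two chains is far cheaper than comparing every atom, which is the practical appeal of the vector representation.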

46 NCBI Field Guide Protein Domains Structural Domain –Discrete independently folding unit of a protein Conserved Domain (sequence-based) –Protein region with recognizable position-specific pattern of sequence conservation Sequence-based domains often roughly correspond to structural domains Domains often have distinct, identifiable functions

47 NCBI Field Guide NCBI’s Conserved Domain Database PSI-BLAST-based score matrices Searchable with RPS-BLAST Sources –SMART –PFAM –COGs –NCBI curated domains structure informed alignments

48 NCBI Field Guide Src Domains Four 3d domains Three conserved domains

49 NCBI Field Guide Structure vs Conserved Domain SH2 SH3 TyrKC SH2 Conserved phosphotyrosine binding residues

50 NCBI Field Guide NCBI’s SNP Database Primary Database and Derivative (RefSNP) Single Nucleotide Polymorphism Repeat polymorphisms Insertion-Deletion Polymorphisms 29 Species Over 46 million submissions (submitted SNPs) Over 26 million reference SNPs

51 NCBI Field Guide The Gene Database Gene Centered Information Unifies NCBI-annotated and Submitted Genomes 4.6 million records for 5,588 taxa: Human 40,286; Chimpanzee 31,570; Mouse 61,928; Rat 37,087; Dog 20,190; Cow 26,600; Chicken 19,936; Zebrafish 37,460; Sea Urchin 30,412; Mosquito 12,936; Fruit Fly 22,722; C. elegans 21,185; Fungi 355,726; Green Plants 145,845; Archaea 120,103; Bacteria 2,685,548

52 NCBI Field Guide NCBI Molecular Biology Resources November 2008 Using Entrez

53 NCBI Field Guide WWW Access Entrez & BLAST

54 NCBI Field Guide Gene Homologene Entrez: Database Integration PubMed abstracts Nucleotide sequences Protein sequences 3-D Structure Word weight VAST BLAST Hard Link Neighbors Related Sequences Neighbors Related Sequences BLink Domains Neighbors Related Structures

55 NCBI Field Guide The Links Menu: Access to Neighbors and Links SNP GEO Gene PubMed Protein

56 NCBI Field Guide The Links Menu: Access to Neighbors and Links Neighbors: BLAST Link (pre-computed BLAST) Neighbors: pre-computed CDD search

57 NCBI Field Guide The Links Menu: Access to Neighbors and Links Neighbors Hard Links

58 NCBI Field Guide Database Searching with Entrez –Using limits and field restriction to find human MutL homolog –Linking and neighboring with MutL –Mapping SNPs onto structure

59 NCBI Field Guide Global NCBI (Entrez) Search colon cancer

60 NCBI Field Guide Global Entrez Search Results

61 NCBI Field Guide OMIM: Human Disease Genes Conserved Domain

62 NCBI Field Guide Nucleotide Sequences Nucleotide database now three parts EST: expressed sequence tags GSS: genome survey sequences Nucleotide: everything else

63 NCBI Field Guide Core Nucleotide Results with Gene Preview More relevant results Taxonomy Filters

64 NCBI Field Guide Advanced Search Options Tabs Taxonomy filter

65 NCBI Field Guide More Precise Nucleotides Search colon cancer[Title] AND nonpolyposis[Title] AND human[Organism] AND biomol_mrna[Properties] AND srcdb_refseq[Properties]

66 NCBI Field Guide Useful Field Restrictions [Title]: Definition line in GenBank / GenPept format, shown in Summary format. Example: glyceraldehyde 3 phosphate dehydrogenase[Title] [Organism]: NCBI’s taxonomy, the organizing system for molecular databases. Examples: mouse[organism]; green plants[organism]; Streptomyces coelicolor[organism] [Properties]: molecule type, location, database source. Examples: biomol_mrna[properties]; biomol_genomic[properties]; gene_in_mitochondrion[properties]; srcdb_pdb[properties] [Filter]: subsets of data, Entrez links. Examples: all[filter]; nucleotide_mapview[filter]; nucleotide_omim[filter]
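
Field-restricted queries like the ones on this slide and the "More Precise Nucleotides Search" slide are plain strings of `term[Field]` pieces joined with AND. A minimal sketch of assembling one follows; the helper names `field` and `all_of` are made up for this example and are not part of any NCBI tool.

```python
def field(term: str, name: str) -> str:
    """Attach an Entrez field restriction to a term, e.g. human[Organism]."""
    return f"{term}[{name}]"


def all_of(*terms: str) -> str:
    """Combine restricted terms so that every one of them must match."""
    return " AND ".join(terms)


# Rebuild the query shown on the "More Precise Nucleotides Search" slide.
query = all_of(
    field("colon cancer", "Title"),
    field("nonpolyposis", "Title"),
    field("human", "Organism"),
    field("biomol_mrna", "Properties"),
    field("srcdb_refseq", "Properties"),
)
print(query)
```

Building the string from parts makes it easy to swap one restriction, for example the organism, without retyping the rest.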

67 NCBI Field Guide Entrez Tip: Start Searches in Gene UniGene Other Entrez DBs BLink Homologene: Gene Neighbors

68 NCBI Field Guide Gene Results nonpolyposis colon cancer AND human[Organism]

69 NCBI Field Guide Precise Results MLH1[Gene Name] AND Human[Organism] NCBI Taxonomy
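
The slides run this query through the Entrez web page. The same term could also be sent to NCBI's E-utilities ESearch service, which the presentation does not cover; the sketch below only builds the request URL and makes no network call. The `retmax` default is an arbitrary choice for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# ESearch endpoint of NCBI E-utilities (not mentioned in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez database and query term."""
    # urlencode escapes the brackets and spaces in the Entrez term.
    return f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


url = esearch_url("gene", "MLH1[Gene Name] AND Human[Organism]")
print(url)

# Round-trip check: decoding the URL gives back the original term.
print(parse_qs(urlsplit(url).query)["term"][0])
```

Keeping URL construction separate from the request itself makes the query easy to inspect before anything is sent.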

70 NCBI Field Guide Organism Field: NCBI’s Taxonomy All molecular databases

71 NCBI Field Guide MLH1 Gene Record

72 NCBI Field Guide MLH1 Gene Record: Interactions and GO

73 NCBI Field Guide MLH1 Sequences

74 NCBI Field Guide MLH1 Gene Record: Sequences

75 NCBI Field Guide MLH1:Links to Sequence

76 NCBI Field Guide Gene Table: Genomic Sequences

77 NCBI Field Guide Finding Protein Homologs

78 NCBI Field Guide BLink: BLAST Link Gene Protein

79 NCBI Field Guide BLink: BLAST Link (Best Hits) Redundant Proteins BLAST Tomato homolog

80 NCBI Field Guide Finding Polymorphisms Gene Links Protein Links

81 NCBI Field Guide GeneView: Variations Human MLH1 ATPase domain

82 NCBI Field Guide MLH1 Structure Model and Mapping Polymorphisms

83 NCBI Field Guide Related Structures: Structure Model

84 NCBI Field Guide Sequence Similar Structures Conserved Domain Conserved Domain Link to Structure Link to Alignment

85 NCBI Field Guide E. coli MutL Structure Cn3D viewer Conserved Domain

86 NCBI Field Guide Alignment Based Model: Mapping Polymorphisms Ile - Val Mg2+ binding site

87 NCBI Field Guide Better Model: Conserved Domain Protein Related Structures Gene

88 NCBI Field Guide Better Model: Conserved Domain Mg2+ binding site Ile – Val Position 32

