Genome Related Biological Databases
Contents
DNA sequence databases, protein databases, gene prediction, accession numbers, NCBI website, Ensembl website
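Nucleotide databases
GenBank: housed at NCBI, the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/Genbank/)
EMBL: housed at EBI, the European Bioinformatics Institute (www.ebi.ac.uk/embl/)
DDBJ: housed in Japan (www.ddbj.nig.ac.jp/Welcome-e.html)
The underlying raw DNA sequences are identical in all three, because the databases exchange data daily as members of the International Nucleotide Sequence Database Collaboration (INSDC).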
More than 100,000 species are represented in GenBank. Number of species by group:
all species: 196,538
viruses: 5,214
bacteria: 14,258
archaea: 500
eukaryota: 171,843
NCBI nucleotide databases
GenBank: individual submissions; bulk submissions (genome centers); high-throughput sequencing (DNA); expressed sequence tags (mRNA)
RefSeq: a curated subset of GenBank providing a single "reference" sequence per locus/molecule
Protein databases
NCBI: RefSeq and Protein
EBI: Swiss-Prot, PIR and TrEMBL, combined into UniProt
Protein entries are either translated from nucleotide sequences or curated; UniProt combines both kinds.
UniProt versus GenBank and RefSeq
UniProt: produced by SIB, EBI and Georgetown University; protein data only; curated in Swiss-Prot, not in TrEMBL
GenBank/RefSeq: produced by INSDC and NCBI; protein and nucleotide data; curated in RefSeq, not in GenBank
Accession numbers
An accession number is a label that unambiguously identifies a sequence. Examples covering protein, DNA and RNA records, all for retinol-binding protein (RBP4):
X02775: GenBank genomic DNA sequence
NT_030059: genomic contig
rs7079946: dbSNP (single nucleotide polymorphism)
RBP4: HUGO gene name
N91759.1: an expressed sequence tag (1 of 170)
NM_006744: RefSeq DNA sequence (from a transcript)
NP_007635: RefSeq protein
AAC02945: GenBank protein
Q28369: UniProt protein
1KT7: Protein Data Bank structure record
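Because each database uses its own accession format, the record type can often be guessed from the identifier alone. The following is a minimal sketch, assuming simplified regular expressions: the function name `classify_accession` and the pattern list are illustrative, not an official API. The UniProt pattern follows the format UniProt documents; the others cover only the formats in the examples above. Patterns can overlap (one letter plus five digits is valid for both UniProt and GenBank), so the first match wins.

```python
import re

# Illustrative, simplified patterns (assumption: real accession formats have
# more variants). Order matters: the first matching pattern wins.
PATTERNS = [
    (re.compile(r"rs\d+", re.IGNORECASE), "dbSNP variant"),
    (re.compile(r"N[CGMRTW]_\d+"), "RefSeq nucleotide"),
    (re.compile(r"[NXY]P_\d+"), "RefSeq protein"),
    # UniProt accession format as documented by UniProt.
    (re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
                r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"), "UniProt protein"),
    (re.compile(r"[A-Z]{3}\d{5}"), "GenBank protein"),
    (re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}"), "GenBank/EMBL/DDBJ nucleotide"),
    (re.compile(r"[0-9][A-Z0-9]{3}"), "PDB structure"),
]


def classify_accession(accession: str) -> str:
    """Guess the record type of an accession, ignoring any version suffix."""
    base = accession.strip().split(".")[0]  # "N91759.1" -> "N91759"
    for pattern, label in PATTERNS:
        if pattern.fullmatch(base):
            return label
    return "unrecognised (possibly a gene symbol)"


for acc in ["X02775", "NT_030059", "rs7079946", "RBP4", "N91759.1",
            "NM_006744", "NP_007635", "AAC02945", "Q28369", "1KT7"]:
    print(f"{acc:10} {classify_accession(acc)}")
```

Gene symbols such as RBP4 have no fixed format, so the sketch deliberately falls through to "unrecognised" rather than guessing.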
From sequence to genes
Gene prediction follows two approaches:
Extrinsic: search for genes based on observed mRNA/protein sequences (e.g. UniGene)
Ab initio: predict genes from the genomic sequence alone, using signals such as promoter sequences, polyadenylation (poly(A)) signals, CpG islands and splice sites
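As one example of an ab initio signal, CpG islands (GC-rich stretches that often mark promoters) can be flagged with a sliding window. The sketch below assumes thresholds along the lines of the commonly cited Gardiner-Garden and Frommer criteria (200 bp window, GC content of at least 50%, observed/expected CpG ratio of at least 0.6); the function names are hypothetical, and real gene finders combine many such signals statistically rather than applying one rule.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    # Expected CpG count is c * g / n, so observed/expected = cpg * n / (c * g).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def find_cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Slide a fixed window along seq and merge passing windows into
    half-open (start, end) intervals."""
    seq = seq.upper()
    islands: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        if gc_fraction >= min_gc and obs_exp >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], end)  # extend overlapping island
            else:
                islands.append((start, end))
    return islands


example = "A" * 300 + "CG" * 150 + "A" * 300
print(find_cpg_islands(example))  # [(200, 700)]
```

The reported island is wider than the CG-rich core because any window that is at least half CG-rich passes; real tools refine island boundaries afterwards.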
UniGene
UniGene predicts genes based on ESTs.
An EST (expressed sequence tag) is a DNA sequence, about 500 base pairs long, corresponding to mRNA from an expressed gene; ESTs are sequenced from a cDNA library.
ESTs from many cDNA libraries are clustered to predict distinct genes.
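The clustering step can be illustrated with a toy version that joins any two ESTs sharing an exact k-mer, using a union-find structure. This is an assumption-laden sketch, not UniGene's actual method (which relied on sequence alignment and many quality filters), and it ignores reverse-complement matches. The names `cluster_ests` and `k` are hypothetical. The size of each resulting cluster is the number of ESTs supporting a predicted gene.

```python
def cluster_ests(ests: dict[str, str], k: int = 20) -> list[set[str]]:
    """Group ESTs that share at least one exact k-mer (toy single-linkage)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: dict[str, str] = {}  # k-mer -> first EST that contained it
    for name, seq in ests.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    clusters: dict[str, set[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters (most EST support) first.
    return sorted(clusters.values(), key=len, reverse=True)


ests = {"est1": "AAAAACCCCC", "est2": "CCCCCGGGGG", "est3": "TTTTTTTTTT"}
for cluster in cluster_ests(ests, k=5):
    print(len(cluster), sorted(cluster))
```

With `k=5`, est1 and est2 share the 5-mer CCCCC and form a cluster of size 2, while est3 stays alone in a cluster of size 1.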
EST clusters
A gene with 1 associated EST has a cluster size of 1; a gene with 10 associated ESTs has a cluster size of 10.
Genes supported by large EST clusters are likely to be real genes.
Gene databases
Ensembl (EBI): automatic annotation from mRNA and protein sequences; curated annotation through the Vega project
Entrez Gene (NCBI): links RefSeq sequences to external annotations
Web sites for biological databases
NCBI: www.ncbi.nlm.nih.gov
EBI: www.ebi.ac.uk
Ensembl: www.ensembl.org (hosted at EBI)
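Both NCBI and Ensembl can also be queried programmatically. The sketch below only builds request URLs for NCBI's E-utilities (`esearch.fcgi`, `efetch.fcgi`) and the Ensembl REST `lookup/id` endpoint. The helper names are hypothetical, and the code assumes the standard public base URLs. Actually fetching them (for example with `urllib.request.urlopen`) needs network access, and NCBI asks clients to limit their request rate.

```python
from urllib.parse import quote, urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
ENSEMBL_REST = "https://rest.ensembl.org"


def esearch_url(term: str, db: str = "gene") -> str:
    """URL that searches an Entrez database and returns matching UIDs."""
    return f"{EUTILS}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """URL that downloads a record (FASTA text by default)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def ensembl_lookup_url(stable_id: str) -> str:
    """URL that returns JSON metadata for an Ensembl stable ID."""
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?content-type=application/json"


print(esearch_url("RBP4[sym] AND human[orgn]"))
print(efetch_url("NM_006744"))
print(ensembl_lookup_url("ENSG00000157764"))
```

Building URLs separately from fetching keeps the example testable offline. `urlencode` takes care of escaping the brackets and spaces in Entrez search terms.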
NCBI website
PubMed
Ensembl website
Ensembl structure
Gene: ENSG…
Transcript: ENST…
Protein: ENSP…
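Because the feature type is encoded in the stable ID prefix, IDs can be decoded mechanically. A minimal sketch, assuming the common layout `ENS` + optional three-letter species code (absent for human) + feature letter + 11 digits + optional `.version`. The exon letter `E` and the function name `parse_ensembl_id` go beyond the slide, and some species use other prefixes.

```python
import re

# Assumed layout: ENS + optional species code + feature letter + 11 digits + .version
ENSEMBL_ID = re.compile(
    r"ENS(?P<species>[A-Z]{3})?(?P<kind>[GTPE])(?P<number>\d{11})(?:\.(?P<version>\d+))?"
)
FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def parse_ensembl_id(stable_id: str) -> dict:
    """Split an Ensembl stable ID into feature type, species code and version."""
    match = ENSEMBL_ID.fullmatch(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl stable ID: {stable_id!r}")
    version = match["version"]
    return {
        "feature": FEATURES[match["kind"]],
        "species_code": match["species"],  # None for human IDs
        "number": match["number"],
        "version": int(version) if version else None,
    }


print(parse_ensembl_id("ENSG00000157764"))
print(parse_ensembl_id("ENSMUST00000000001.2"))
```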
Ensembl search
OTTHUMGXXX (curated)
ENSGXXXX (predicted)
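The two ID styles shown here can be told apart by prefix. A small sketch, assuming Vega human IDs follow the pattern `OTTHUM` + feature letter + 11 digits and human Ensembl IDs the pattern `ENS` + feature letter + 11 digits. The labels simply mirror the curated/predicted split above, and the function name `annotation_source` is hypothetical.

```python
import re

CURATED = re.compile(r"OTTHUM[GTP]\d{11}(?:\.\d+)?")  # Vega (manual) IDs, assumed format
PREDICTED = re.compile(r"ENS[GTP]\d{11}(?:\.\d+)?")   # human Ensembl (automatic) IDs


def annotation_source(stable_id: str) -> str:
    """Label a human gene/transcript/protein ID as curated, predicted or unknown."""
    if CURATED.fullmatch(stable_id):
        return "curated (Vega)"
    if PREDICTED.fullmatch(stable_id):
        return "predicted (Ensembl)"
    return "unknown"


for sid in ["OTTHUMG00000000001", "ENSG00000157764", "NM_006744"]:
    print(sid, "->", annotation_source(sid))
```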
Vega gene page
Ensembl gene page
Ensembl transcript page
Ensembl protein page