Class European Resources Protein Focused
Protein Databases EBI – European Bioinformatics Institute
What is the difference between dealing with nucleotide DBs and protein DBs?
Protein information
Name & description; gene encoded from; organism; function (only one?); enzyme?; ligands?; PTMs?; interactions?; biological processes; structure; sequence; localization; more...
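To make the list above concrete, here is a minimal sketch of how one protein record could be held in code. The class name `ProteinEntry`, its field names, and the example values are all hypothetical and chosen for illustration; they do not reflect the schema of any real database. Function is stored as a list because, as the slide asks, a protein may have more than one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinEntry:
    """Hypothetical container for the kinds of information listed on the slide."""
    name: str
    description: str
    gene: str                      # gene the protein is encoded from
    organism: str
    sequence: str
    functions: List[str] = field(default_factory=list)   # possibly more than one
    is_enzyme: Optional[bool] = None                      # None means "not known"
    ligands: List[str] = field(default_factory=list)
    ptms: List[str] = field(default_factory=list)         # post-translational modifications
    interactions: List[str] = field(default_factory=list)
    biological_processes: List[str] = field(default_factory=list)
    structures: List[str] = field(default_factory=list)   # e.g. structure identifiers
    localization: List[str] = field(default_factory=list)


# Placeholder values only; this is not a real protein record.
entry = ProteinEntry(
    name="EXAMPLE_PROTEIN",
    description="Made-up entry for illustration",
    gene="GENE1",
    organism="Homo sapiens",
    sequence="MKT",
    functions=["function A", "function B"],
)
```

Keeping the optional fields as empty lists (or `None`) reflects that many entries will be missing some of this information.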
Protein DB - short history (pre-UniProt)
Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI.
TrEMBL: created at the EBI in 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot. It was introduced to deal with the increased data flow from genome projects.
UniProt consortium members: PIR, EBI, SIB
The three-layered approach
The UniProt Archive (UniParc): UniProtKB + all other protein sequences publicly available (completeness).
The UniProt Reference Clusters (UniRef): non-redundant views of UniProtKB + selected UniParc sets (speed).
The UniProt Knowledgebase (UniProtKB): central database of annotated protein sequences and functional information; UniProtKB/Swiss-Prot + UniProtKB/TrEMBL.
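One way to picture how reference clusters give a smaller, non-redundant view of a sequence collection is a greedy clustering around representatives. The sketch below is a toy under stated assumptions: the function names are hypothetical, the sequences are made up, and the similarity measure is Python's `difflib` ratio used as a stand-in, not the sequence identity computation that UniRef actually uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in measure (assumption): difflib's ratio, not a real alignment identity.
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(sequences, threshold):
    """Group sequences around representatives; `sequences` maps ID -> sequence."""
    clusters = {}  # representative ID -> list of member IDs
    # Visit longer sequences first so that they become the representatives.
    for seq_id, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        for rep_id in clusters:
            if similarity(sequences[rep_id], seq) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Made-up toy sequences: A and B are identical, C differs in one residue.
toy = {"A": "MKTAYIAKQR", "B": "MKTAYIAKQR", "C": "MKTAYIAKQK", "D": "GGGGGGGGGG"}
strict = greedy_cluster(toy, 1.0)   # only identical sequences merge
loose = greedy_cluster(toy, 0.8)    # near-identical sequences merge too
```

Searching only the representatives instead of every member is what makes the clustered view faster to query; lowering the threshold gives fewer, broader clusters.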
Protein DBs
Swiss-Prot – manually annotated.
TrEMBL – translated EMBL, automatically annotated.
UniProtKB – the UniProt Knowledgebase.
UniParc – the UniProt Archive.
PIR – Protein Information Resource.
UniRef – the UniProt Reference Clusters.
PDB – Protein Data Bank (structure).
PRIDE – resource for experimental proteomics (not in this class).
Databases growth
Protein DBs
Swiss-Prot – manually annotated ~100, ~400,000
TrEMBL – translated EMBL, automatically annotated.
Protein Names
Different DBs – different accessions
Accessions                                            DB
P12345                                                TrEMBL
MAPK_HUMAN                                            Swiss-Prot (to be changed..)
NP_, XP_                                              RefSeq
UniRef100_P99999, UniRef90_P99999, UniRef50_P99999    UniRef
ENSP                                                  Ensembl
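Because each database uses its own identifier style, a quick pattern check can often tell where an identifier comes from. The sketch below is an illustration only: the regular expressions are simplified assumptions derived from the examples on the slide, not official format definitions, and `guess_source` is a hypothetical helper name. Note that the `P12345` style and the `NAME_SPECIES` style are reported here simply as UniProtKB identifiers.

```python
import re

# Simplified, assumed patterns based on the example identifiers on the slide.
# Order matters: the more specific prefixes are tested first.
PATTERNS = [
    (re.compile(r"^UniRef(100|90|50)_\w+$"), "UniRef"),
    (re.compile(r"^[NX]P_\d+(\.\d+)?$"), "RefSeq"),
    (re.compile(r"^ENSP\d+(\.\d+)?$"), "Ensembl"),
    (re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$"), "UniProtKB entry name"),
    (re.compile(r"^[A-Z][0-9][A-Z0-9]{3}[0-9]$"), "UniProtKB accession"),
]


def guess_source(identifier):
    """Return a best-guess database label for an identifier, or 'unknown'."""
    for pattern, database in PATTERNS:
        if pattern.match(identifier):
            return database
    return "unknown"


for example in ["P12345", "MAPK_HUMAN", "NP_000537", "UniRef90_P99999", "ENSP00000269305"]:
    print(example, "->", guess_source(example))
```

A pattern match only suggests the source; the identifier still has to be looked up in that database to confirm it exists.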
Principles
More in UniProt
UniProt: the Universal Protein Resource for protein sequences; a complete annotated protein sequence database.
UniProt Archive: a non-redundant archive of protein sequences extracted from public databases; it contains only protein sequences.
UniProt/UniRef: features clustering of similar sequences to yield a representative subset of sequences. This produces very fast search times.
UniProt/UniMES: a repository specifically developed for metagenomic and environmental data.
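The idea of a non-redundant archive can be sketched in a few lines: each distinct sequence is stored once, and every database that supplied it is recorded against that single copy. The class `ToyArchive`, its methods, and the use of a SHA-256 checksum as the key are assumptions made for this illustration; they are not a description of how the UniProt Archive is implemented.

```python
import hashlib


class ToyArchive:
    """Hypothetical non-redundant store: one record per distinct sequence."""

    def __init__(self):
        self._records = {}  # checksum -> (sequence, set of source labels)

    def add(self, sequence, source):
        """Store the sequence once and remember which source supplied it."""
        seq = sequence.strip().upper()
        key = hashlib.sha256(seq.encode("ascii")).hexdigest()
        record = self._records.setdefault(key, (seq, set()))
        record[1].add(source)
        return key

    def sources(self, key):
        return sorted(self._records[key][1])

    def __len__(self):
        return len(self._records)


archive = ToyArchive()
# Made-up sequences; the same sequence arrives from two different sources.
key_a = archive.add("MKTAYIAKQR", "Swiss-Prot")
key_b = archive.add("mktayiakqr", "RefSeq")
archive.add("GGGGGGGGGG", "TrEMBL")
```

Here `key_a` and `key_b` are equal, so the archive holds two records rather than three, and the shared record lists both sources.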
How is it built?
What’s in UniProt?
EBI interface
PIR – Protein Information Resource
Protein Family Classification System; Integrated Protein Knowledgebase; Integrated Protein Literature, Information and Knowledge
END
If you got lost… (class exercise): some more slides…
EB-eye search
NCBI - Entrez