Published by Lucy Gaines. Modified over 9 years ago.
Central hub for biological data
UniProtKB/Swiss-Prot is a central hub for biological data: over 120 databases are cross-referenced (EMBL/DDBJ/GenBank, PDB, 2D-PAGE, OMIM, TAIR, FlyBase, InterPro, PROSITE, etc.).

Avoiding redundancy and improving sequence reliability
To avoid redundancy and improve sequence reliability, all protein sequences encoded by a given gene are merged into a single entry (on average, one human entry carries more than 6 cross-references to EMBL). Differences found between merged entries are documented. Evidence for protein existence is provided.

Sources of data
Our main sources of data are publications (~1’900 journals cited), external scientific expertise and high-performance bioinformatics tools.

Swiss-Prot (release 55.5, June 2008): 389’046 entries / 11’419 species
  Bacteria/Archaea: 777 proteomes
  Homo sapiens: 19’804 entries
  Other mammals: 42’674 entries
  Plants: 22’919 entries
  Viruses: 12’283 entries

TrEMBL (release 38.5, June 2008): 5’906’286 entries / 165’662 species

Together, Swiss-Prot and TrEMBL give access to all publicly available protein sequences. Once an entry is in Swiss-Prot, it is no longer in TrEMBL.

[Figure: Highlights of a UniProtKB/Swiss-Prot entry in the UniProt view format]

UniProtKB/Swiss-Prot is the manually annotated section of the UniProt Knowledgebase. Manual annotation consists of a critical review of experimentally proven or predicted data about each protein, including the protein sequence. Data are continuously updated by an expert team of biologists.

Biological events which generate protein diversity
Special emphasis is placed on the annotation of biological events which generate protein diversity but are not always predictable at the genomic level. Alternative products (alternative splicing, RNA editing, etc.) and post-translational modifications are extensively annotated. In mammals, polymorphisms (SAPs) and strain differences are also integrated.
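The release figures quoted above allow a quick arithmetic check of how small the manually reviewed section is relative to the whole knowledgebase. The sketch below is only illustrative: it assumes the two sections are disjoint (as the poster states, an entry in Swiss-Prot is no longer in TrEMBL), and the function name `uniprotkb_summary` is hypothetical, not part of any UniProt tool.

```python
# Entry counts quoted on the poster (June 2008 releases).
SWISS_PROT_ENTRIES = 389_046   # UniProtKB/Swiss-Prot, release 55.5
TREMBL_ENTRIES = 5_906_286     # UniProtKB/TrEMBL, release 38.5


def uniprotkb_summary(reviewed: int, unreviewed: int) -> dict:
    """Combine the two sections, assuming no entry appears in both."""
    total = reviewed + unreviewed
    return {
        "total": total,
        "reviewed_percent": round(100 * reviewed / total, 2),
    }


summary = uniprotkb_summary(SWISS_PROT_ENTRIES, TREMBL_ENTRIES)
print(summary)
```

Under that assumption, the reviewed section accounts for roughly 6% of all UniProtKB entries in these releases.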
UniProtKB/Swiss-Prot: the manually annotated section of the UniProt Knowledgebase
UniProtKB/Swiss-Prot provides a link between protein sequences and state-of-the-art knowledge.

UniProt Consortium: Swiss Institute of Bioinformatics, European Bioinformatics Institute, Protein Information Resource
www.uniprot.org
We need your feedback! help@uniprot.org

Annotation priorities
Complete microbial proteomes; plastid-encoded proteins; human and mammalian orthologous proteins; plant proteins (A. thaliana and rice); fungal proteomes; proteomes of representative subsets of viral strains; toxins and anti-microbial peptides; Drosophila, zebrafish, Xenopus and C. elegans proteomes …

[Diagram: GenBank/DDBJ/EMBL, Ensembl and other protein resources -> UniProt Knowledgebase (UniProtKB)]
  UniProtKB/TrEMBL: unreviewed protein sequences; automatic annotation
  UniProtKB/Swiss-Prot: reviewed protein sequences; manual annotation (sequence accuracy, no redundancy, high-quality annotation, numerous cross-references …)
UniParc
Gives access to archived protein sequences found in publicly accessible databases (UniProtKB, PIR, EMBL, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, patent offices, etc.). UniParc allows the tracking of a protein sequence and its integration into various databases.

UniRef
Three collections of sequence clusters (UniRef100, UniRef90, UniRef50) based on UniProtKB and selected UniParc records. Clustering is done across species.
  One UniRef100 entry groups identical sequences (including fragments).
  One UniRef90 entry groups sequences that are at least 90% identical -> database size reduction of ~40%.
  One UniRef50 entry groups sequences that are at least 50% identical -> database size reduction of ~65%.
UniRef is useful for comprehensive BLAST similarity searches because it provides sets of representative sequences.
Use with caution: also contains pseudogenes, incorrect CDS predictions, etc.

UniProt Knowledgebase (UniProtKB)
Gives access to publicly available protein sequences with a maximum of biological information. UniProtKB is composed of two sections: UniProtKB/TrEMBL and UniProtKB/Swiss-Prot.

UniProtKB/TrEMBL: unreviewed protein sequences, computer-annotated entries. 5’906’286 entries (Rel. 38.5, June 2008).
Available protein sequences are automatically integrated into TrEMBL with:
  merging of 100% identical sequences derived from the same organism;
  protein family and domain attribution (InterPro);
  automated annotation.

UniProtKB/Swiss-Prot: reviewed protein sequences, manually annotated entries. 389’046 entries (Rel. 55.5, June 2008).
TrEMBL sequences are manually integrated into Swiss-Prot.
This process involves:
  Merging of all variant sequences derived from the same gene in a single species (polymorphisms, alternative splicing, RNA editing, etc.): low redundancy and high accuracy of the protein sequence;
  Integration of biological and medical data derived from publications, external expertise and high-performance bioinformatics tools: high-quality manual annotation;
  Addition of cross-references to relevant databases (links to about 100 databases are available): central hub for biological data.

UniProt: The Universal Protein Resource

One UniParc entry groups identical sequences across species. Each entry contains a protein sequence, taxonomic data and cross-references to source databases.

Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)
Protein Information Resource (PIR)

UniProt is mainly supported by the National Institutes of Health (NIH) grant 2 U01 HG02712-04. Additional support for the EBI's involvement in UniProt comes from the European Commission (EC)'s FELICS grant (021902RII3) and from the NIH grant 1R01HG02273-01. UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the NIH grants and contracts HHSN266200400061C, NCI-caBIG and 1R01GM080646-01, and by the National Science Foundation (NSF) grant IIS-0430743.

UniMES: UniProt Metagenomic and Environmental Sequences
Currently the database contains only data from the Global Ocean Sampling Expedition (GOS). UniMES is released in FASTA format, together with a file of UniMES matches to InterPro methods.

The UniProt Consortium
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. UniProt provides four databases, each optimized for different uses: UniProtKB, UniRef, UniParc and UniMES.
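The two exact-match grouping rules mentioned on this poster (UniParc groups identical sequences across species; TrEMBL merges 100% identical sequences only when they come from the same organism) can be contrasted with a small sketch. This is a toy illustration under stated assumptions, not the actual UniProt pipeline: the `Record` type, the `group_identical` function and the example identifiers and sequences are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    source_id: str   # hypothetical identifier from a source database
    organism: str
    sequence: str


def group_identical(records, same_organism_only):
    """Group records whose sequences are 100% identical.

    same_organism_only=False mimics the UniParc-style rule (across species);
    same_organism_only=True mimics the TrEMBL-style rule (same organism only).
    """
    groups = defaultdict(list)
    for rec in records:
        seq = rec.sequence.upper()
        key = (seq, rec.organism) if same_organism_only else (seq,)
        groups[key].append(rec.source_id)
    # dicts preserve insertion order, so groups appear in first-seen order
    return list(groups.values())


# Made-up example data, for illustration only.
records = [
    Record("A", "Homo sapiens", "MKTAYIAK"),
    Record("B", "Homo sapiens", "MKTAYIAK"),
    Record("C", "Mus musculus", "MKTAYIAK"),
    Record("D", "Mus musculus", "MKTAYLAK"),
]

print(group_identical(records, same_organism_only=False))  # [['A', 'B', 'C'], ['D']]
print(group_identical(records, same_organism_only=True))   # [['A', 'B'], ['C'], ['D']]
```

The only difference between the two rules in this sketch is whether the organism is part of the grouping key. The Swiss-Prot merge of variant sequences from the same gene is a manual curation step and is not modelled here.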
UniProt is produced by SIB, EBI and PIR.

[Diagram: the UniProt databases]
  Sources: EMBL/GenBank/DDBJ, Ensembl, VEGA, RefSeq, other protein resources
  UniParc: sequence archive
  UniProtKB: protein sequence knowledgebase
    UniProtKB/TrEMBL: unreviewed; automated annotation
    UniProtKB/Swiss-Prot: reviewed; expert manual annotation
  UniRef: sequence clusters
  UniMES: metagenomic sequences

Contact: help@uniprot.org
Web site: www.uniprot.org