UniProt: Universal Protein Resource


1 UniProt: Universal Protein Resource
Central Resource of Protein Sequence and Function
International Consortium
  PIR at GUMC
  European Bioinformatics Institute
  Swiss Institute of Bioinformatics
Unifies the PIR-PSD, Swiss-Prot, and TrEMBL Protein Sequence Databases

2 UniProt Databases
UniParc: Comprehensive Sequence Archive with Sequence History
UniRef: Non-redundant Reference Databases for Sequence Search
UniProtKB: Knowledgebase with Full Classification and Functional Annotation
3-yr, $15M

3 UniProt Archive (UniParc)
An archive for tracking protein sequences
Comprehensive: All published protein sequences
Non-Redundant: Identical sequence strings are merged
Traceable: Versioned, with an ‘Active’ or ‘Obsolete’ status tag
Concise: No annotation of function, species, tissue, etc.
5 million unique entries from 13 million source-database entries
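The merge-and-track behaviour described on this slide can be sketched in a few lines. This is a minimal illustration, not UniParc's actual implementation: the class names, the `ARC` identifier format, and the rule that a re-submitted record retires its earlier version are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRef:
    """Cross-reference from an archive entry back to a source-database record."""
    database: str
    accession: str
    version: int = 1
    active: bool = True  # False plays the role of the 'Obsolete' status tag


@dataclass
class ArchiveEntry:
    """One unique sequence string plus every source record that carried it."""
    archive_id: str  # hypothetical identifier format, for illustration only
    sequence: str
    refs: list = field(default_factory=list)


class SequenceArchive:
    """Toy non-redundant archive keyed by the exact sequence string."""

    def __init__(self):
        self._by_sequence = {}

    def __len__(self):
        return len(self._by_sequence)

    def add(self, database, accession, sequence):
        # Non-redundant: identical sequence strings share one entry.
        entry = self._by_sequence.get(sequence)
        if entry is None:
            entry = ArchiveEntry(f"ARC{len(self._by_sequence) + 1:06d}", sequence)
            self._by_sequence[sequence] = entry

        # Nothing to do if this source record already points here and is active.
        for ref in entry.refs:
            if (ref.database, ref.accession) == (database, accession) and ref.active:
                return entry

        # Traceable: retire older versions of the same source record
        # (assumed rule) and attach a new, active version to this entry.
        version = 1
        for other in self._by_sequence.values():
            for ref in other.refs:
                if (ref.database, ref.accession) == (database, accession):
                    version = max(version, ref.version + 1)
                    ref.active = False
        entry.refs.append(SourceRef(database, accession, version))
        return entry
```

Adding the same sequence from two source databases yields one entry with two cross-references, which is how 13 million source records can collapse into 5 million unique entries. No function, species, or tissue fields are stored, matching the "Concise" point.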

4 UniProt Reference Clusters (UniRef)
Non-Redundant Reference Clusters for Sequence Searching
UniRef100 for Comprehensive Sequence Similarity Search
  100% sequence identity from all species, merging sub-fragments
  Derived from UniProtKB, with splice variants as separate entries
  Additional UniParc sources (e.g. Ensembl, IPI, EMBL_WGS)
Diagram labels: Sub-fragments; Splice variants; WGS (whole genome shotgun), UniParc since Sep 2004
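The 100% identity rule with sub-fragment merging can be illustrated as follows. This is a sketch under stated assumptions rather than the UniRef pipeline: `cluster_100` is a hypothetical name, a sub-fragment is taken to mean an exact substring of a longer sequence, and the longest sequence in each cluster is taken as its representative.

```python
def cluster_100(records):
    """Group (identifier, sequence) pairs at 100% identity.

    Identical sequences and exact sub-fragments are merged into the cluster
    of the longest sequence that contains them (assumed merging rule).
    """
    # Longest first, so a containing sequence is always seen before its fragments.
    ordered = sorted(records, key=lambda r: (-len(r[1]), r[0]))
    clusters = []
    for identifier, sequence in ordered:
        for cluster in clusters:
            # Exact substring test: true for identical sequences and sub-fragments.
            if sequence in cluster["sequence"]:
                cluster["members"].append(identifier)
                break
        else:
            clusters.append(
                {"representative": identifier, "sequence": sequence, "members": [identifier]}
            )
    return clusters
```

Because splice variants are kept as separate entries, each isoform would be passed in as its own record. A fragment that occurs in several longer sequences joins the first matching cluster here, which is a simplification of the sketch.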

5 UniProt Reference Clusters (UniRef)
UniRef90/50 for Faster Searches using Reduced Data Sets
  UniRef90: 90% sequence identity (35% reduction from UniRef100)
  UniRef50: 50% sequence identity (65% reduction)
  Representative Sequence for each cluster
Chart: Database Size, Release 4.4 (03/29/05)
Note: WGS (whole genome shotgun), UniParc since Sep 2004
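The idea of collapsing sequences onto a representative at a chosen identity threshold can be sketched with a greedy pass. Everything here is an assumption for illustration: the slide does not name the clustering algorithm, and the `identity` function below is a toy ungapped comparison rather than a real alignment-based measure.

```python
def identity(a, b):
    """Toy identity: matching positions divided by the longer length.

    Compares residues position by position with no gaps; a real pipeline
    would use an alignment instead.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(records, threshold):
    """Assign each (identifier, sequence) to the first representative it matches."""
    # Longest first, so representatives tend to be the longest sequences.
    ordered = sorted(records, key=lambda r: (-len(r[1]), r[0]))
    clusters = []
    for identifier, sequence in ordered:
        for cluster in clusters:
            if identity(sequence, cluster["sequence"]) >= threshold:
                cluster["members"].append(identifier)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters.append(
                {"representative": identifier, "sequence": sequence, "members": [identifier]}
            )
    return clusters
```

Lowering the threshold from 0.9 to 0.5 lets more sequences join existing clusters, so fewer representatives remain to be searched. That is the trade-off behind the reductions quoted on the slide.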

6 UniProt Knowledgebase (UniProtKB)
Objective: Stable, Comprehensive, Fully Classified, Richly and Accurately Annotated
Describe in a single record all protein products derived from a certain gene in a given species
Information Content
  Isoform Presentation: Alternatively Spliced Forms, Proteolytic Cleavage, and Post-Translational Modification (each with FTid)
  Nomenclature: Gene/Protein Names (Nomenclature Committees)
  Family Classification and Domain Identification: InterPro and PIRSF
  Functional Annotation: Function, Functional Site, Developmental Stage, Catalytic Activity, Modification, Regulation, Induction, Pathway, Tissue Specificity, Subcellular Location, Disease, Process
Callouts: VARSPLIC; derived by alternative splicing, proteolytic cleavage, and post-translational modification; own identifiers
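The "single record per gene and species" design can be pictured as a small data structure. This is only a sketch: the class and field names, and the `FT_0001` identifier format standing in for the FTid, are hypothetical and do not reflect the actual UniProtKB entry format.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One protein product or site described within a record."""
    ft_id: str        # stand-in for the FTid; the format is invented here
    kind: str         # e.g. "splice variant", "cleavage product", "modification"
    description: str


@dataclass
class KnowledgebaseRecord:
    """All protein products of one gene in one species, kept in one record."""
    gene: str
    species: str
    protein_name: str
    sequence: str
    features: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)  # topic -> free text

    def add_feature(self, kind, description):
        # Each product gets its own identifier inside the shared record.
        feature = Feature(f"FT_{len(self.features) + 1:04d}", kind, description)
        self.features.append(feature)
        return feature

    def products(self, kind):
        """Return the features of a given kind, in insertion order."""
        return [f for f in self.features if f.kind == kind]
```

Keeping isoforms and cleavage products as identified features of one record, rather than as separate records, is what lets the knowledgebase stay non-redundant at the gene level while still addressing each product individually.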

7 UniProtKB Report (I)

8 UniProtKB Report (II)

