UniProt: Universal Protein Resource

UniProt: Universal Protein Resource
Central Resource of Protein Sequence and Function
International Consortium: PIR at GUMC, European Bioinformatics Institute, Swiss Institute of Bioinformatics
Unifies the PIR-PSD, Swiss-Prot, and TrEMBL Protein Sequence Databases
http://www.uniprot.org

UniProt Databases
UniParc: Comprehensive Sequence Archive with Sequence History
UniRef: Non-redundant Reference Databases for Sequence Search
UniProtKB: Knowledgebase with Full Classification and Functional Annotation
(3-yr, $15M)

UniProt Archive (UniParc)
An archive for tracking protein sequences
Comprehensive: All published protein sequences
Non-Redundant: Identical sequence strings are merged into one entry
Traceable: Versioned, with an 'Active' or 'Obsolete' status tag
Concise: No annotation of function, species, tissue, etc.
5 million unique entries from 13 million source-database entries
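The UniParc slide describes three properties: identical sequence strings are merged, each source record is tracked, and records are marked 'Active' or 'Obsolete' rather than deleted. The following minimal sketch shows how those properties fit together. The class names, the MD5 digest used as the lookup key, and the `UPI` numbering scheme are illustrative assumptions and do not represent UniParc's actual implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    """One archive record: a unique sequence string plus its source cross-references."""
    upi: str                                    # hypothetical identifier, e.g. "UPI0000000001"
    sequence: str
    xrefs: dict = field(default_factory=dict)   # source entry id -> "active" | "obsolete"


class SequenceArchive:
    """Toy non-redundant archive: identical sequences share a single entry."""

    def __init__(self):
        self._by_digest = {}    # digest of the normalised sequence -> ArchiveEntry
        self._counter = 0

    @staticmethod
    def _normalise(sequence: str) -> str:
        # Drop whitespace and ignore case so trivially different strings merge.
        return "".join(sequence.split()).upper()

    def add(self, source_id: str, sequence: str) -> ArchiveEntry:
        seq = self._normalise(sequence)
        key = hashlib.md5(seq.encode("ascii")).hexdigest()
        entry = self._by_digest.get(key)
        if entry is None:
            # First time this exact sequence is seen: create a new archive record.
            self._counter += 1
            entry = ArchiveEntry(f"UPI{self._counter:010X}", seq)
            self._by_digest[key] = entry
        # Record (or re-activate) the cross-reference to the source database entry.
        entry.xrefs[source_id] = "active"
        return entry

    def retire(self, source_id: str) -> None:
        # Sequences are never deleted; only the source cross-reference becomes obsolete.
        for entry in self._by_digest.values():
            if source_id in entry.xrefs:
                entry.xrefs[source_id] = "obsolete"

    def __len__(self) -> int:
        return len(self._by_digest)
```

In this sketch, submitting the same sequence from two source databases produces a single entry that has two cross-references. Retiring a source entry changes only its status tag, so the sequence history is preserved.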

UniProt Reference Clusters (UniRef)
Non-Redundant Reference Clusters for Sequence Searching
UniRef100 for Comprehensive Sequence Similarity Search: sequences with 100% identity from all species are merged, including sub-fragments
Derived from UniProtKB (splice variants as separate entries) and additional UniParc sources (e.g. Ensembl, IPI, EMBL_WGS)
[Figure labels: Sub-fragments; Splice variants; WGS (whole genome shotgun) in UniParc since Sep 2004]
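According to the slide, UniRef100 merges sequences that are identical and also sequences that are sub-fragments of a longer sequence. The following sketch illustrates that merging rule with a longest-first pass and a plain substring test. In this sketch, the longest sequence serves as the cluster representative. That choice is an assumption for illustration only, because the production pipeline applies its own representative-selection rules.

```python
def build_uniref100_like(entries: dict) -> dict:
    """Cluster entries whose sequences are identical to, or contained in, a longer member.

    entries: mapping of entry id -> sequence string.
    Returns a mapping of representative id -> list of member ids (representative first).
    """
    # Longest first (ties broken by id), so a fragment is visited after its parent.
    order = sorted(entries, key=lambda k: (-len(entries[k]), k))
    clusters = {}
    for entry_id in order:
        seq = entries[entry_id]
        for rep_id in clusters:
            # Substring test covers both identical strings and contiguous sub-fragments.
            if seq in entries[rep_id]:
                clusters[rep_id].append(entry_id)
                break
        else:
            # No existing representative contains this sequence: start a new cluster.
            clusters[entry_id] = [entry_id]
    return clusters
```

If a fragment occurs in more than one representative, the code places it in the first matching cluster. A real pipeline needs an explicit tie-breaking rule for that case.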

UniProt Reference Clusters (UniRef)
UniRef90/50 for Faster Searches using Reduced Data Sets
UniRef90: clustered at 90% sequence identity (35% size reduction from UniRef100)
UniRef50: clustered at 50% sequence identity (65% reduction)
A representative sequence is chosen for each cluster
[Figure: Database size, Release 4.4 (03/29/05); WGS (whole genome shotgun) in UniParc since Sep 2004]
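UniRef90 and UniRef50 group sequences at an identity threshold and keep one representative per cluster. The size reduction is therefore one minus the ratio of clusters to input entries. The following sketch uses greedy, longest-first clustering in the spirit of CD-HIT. As a simplifying assumption, identity is approximated as the length of the longest common subsequence divided by the length of the shorter sequence. The sketch does not use a proper scored alignment, so it is not the production UniRef algorithm. All function names are hypothetical.

```python
def lcs_identity(a: str, b: str) -> float:
    """Approximate identity: longest-common-subsequence length / shorter length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            # Standard LCS recurrence, keeping only the previous DP row.
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / min(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float) -> dict:
    """Assign each sequence (longest first) to the first representative at >= threshold."""
    order = sorted(seqs, key=lambda k: (-len(seqs[k]), k))
    clusters = {}
    for sid in order:
        for rep in clusters:
            if lcs_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]   # becomes the representative of a new cluster
    return clusters


def reduction(n_input: int, n_clusters: int) -> float:
    """Fractional size reduction, e.g. 0.35 when 100 entries collapse to 65 clusters."""
    return 1 - n_clusters / n_input
```

The relationship to the slide's figures works as follows. A 35% reduction means that UniRef90 holds roughly 65 clusters for every 100 UniRef100 entries, so `reduction(100, 65)` returns about 0.35. With the same reasoning, a 65% reduction leaves roughly 35 UniRef50 clusters per 100 entries.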

UniProt Knowledgebase (UniProtKB)
Objective: Stable, Comprehensive, Fully Classified, Richly and Accurately Annotated
Describe in a single record all protein products derived from a certain gene in a given species
Information Content:
Isoform Presentation: Alternatively Spliced Forms, Proteolytic Cleavage, and Post-Translational Modification (each with FTid)
Nomenclature: Gene/Protein Names (Nomenclature Committees)
Family Classification and Domain Identification: InterPro and PIRSF
Functional Annotation: Function, Functional Site, Developmental Stage, Catalytic Activity, Modification, Regulation, Induction, Pathway, Tissue Specificity, Subcellular Location, Disease, Process
[Figure: VARSPLIC; forms derived by alternative splicing, proteolytic cleavage, and post-translational modification, each with its own identifiers]
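The UniProtKB slide says that one record covers all protein products of a gene in a species, and that isoforms are stored as features with their own identifiers. The following sketch models that idea. Splice-variant features on a canonical sequence use 1-based inclusive coordinates, and each isoform is rebuilt by applying its features. The class names, the example accession, and the `VSP_` feature identifiers are illustrative assumptions. The sketch also assumes that isoform 1 is the canonical sequence.

```python
from dataclasses import dataclass, field


@dataclass
class VarSeq:
    """A splice-variant feature on the canonical sequence (1-based, inclusive)."""
    ftid: str          # feature identifier, illustrative value such as "VSP_1"
    start: int
    end: int
    replacement: str   # "" means the segment is missing in the isoform


@dataclass
class Entry:
    """One record per gene per species, holding every isoform of that gene."""
    accession: str
    gene: str
    species: str
    canonical: str
    varseqs: dict = field(default_factory=dict)    # ftid -> VarSeq
    isoforms: dict = field(default_factory=dict)   # isoform number -> list of ftids

    def isoform_id(self, number: int) -> str:
        # Isoform identifiers are formed as accession-number, e.g. "P12345-2".
        return f"{self.accession}-{number}"

    def isoform_sequence(self, number: int) -> str:
        seq = self.canonical
        events = [self.varseqs[f] for f in self.isoforms.get(number, [])]
        # Apply from the C-terminal end so earlier coordinates stay valid.
        for ev in sorted(events, key=lambda e: e.start, reverse=True):
            seq = seq[: ev.start - 1] + ev.replacement + seq[ev.end:]
        return seq


# Example usage with made-up data (accession and sequence are illustrative only).
entry = Entry("P12345", "GENEX", "Homo sapiens", "MKTAYIAKQR")
entry.varseqs["VSP_1"] = VarSeq("VSP_1", 3, 5, "")    # residues 3-5 missing
entry.isoforms[2] = ["VSP_1"]
```

In this sketch, each isoform is derived from shared feature records rather than copied. That design keeps all products of the gene in one entry, which matches the slide's objective.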

UniProtKB Report (I)

UniProtKB Report (II) http://www.pir.uniprot.org/cgi-bin/upEntry?id=PH4H_HUMAN