Class 3 (2009): European Resources, Protein Focused
Protein Databases: EBI – European Bioinformatics Institute

What is the difference between dealing with nucleotide DBs and protein DBs?

Protein information
- Name & description
- Gene it is encoded from
- Organism
- Function (only one?)
- Enzyme?
- Ligands?
- PTMs?
- Interactions?
- Biological processes
- Structure
- Sequence
- Localization
- More...
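The kinds of information listed above can be pictured as one record per protein. The sketch below is a hypothetical Python data structure; the class name and field names are invented for illustration and do not reproduce the schema of Swiss-Prot, UniProtKB or any other database. Function is a list because a protein may have more than one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinEntry:
    """Hypothetical record for the information a protein DB may hold."""
    name: str
    description: str
    gene: str                       # gene the protein is encoded from
    organism: str
    sequence: str                   # amino acid sequence, one-letter code
    functions: List[str] = field(default_factory=list)   # possibly several
    ec_number: Optional[str] = None  # set only when the protein is an enzyme
    ligands: List[str] = field(default_factory=list)
    ptms: List[str] = field(default_factory=list)         # post-translational modifications
    interactions: List[str] = field(default_factory=list)
    biological_processes: List[str] = field(default_factory=list)
    structure_ids: List[str] = field(default_factory=list)  # e.g. PDB identifiers
    localization: List[str] = field(default_factory=list)

    def is_enzyme(self) -> bool:
        # In this sketch, "enzyme?" is answered by the presence of an EC number.
        return self.ec_number is not None
```

Each list field starts empty per instance (`default_factory`), so records never share annotation lists by accident.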

Protein DB: short history (pre-UniProt)
Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI.
TrEMBL: created at the EBI in 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot. It was introduced to deal with the increased data flow from genome projects.

UniProt consortium: PIR, EBI, SIB

The three-layered approach
- The UniProt Archive (UniParc): UniProtKB + all other publicly available protein sequences (completeness).
- The UniProt Reference Clusters (UniRef): non-redundant views of UniProtKB + selected UniParc sets (speed).
- The UniProt Knowledgebase (UniProtKB): central database of annotated protein sequences and functional information; UniProtKB/Swiss-Prot + UniProtKB/TrEMBL.
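To make the idea of a non-redundant archive concrete, here is a minimal sketch that stores each distinct sequence once and keeps a list of every source that submitted it. It is an assumption-laden toy, not the UniParc pipeline: it keys on a SHA-256 digest of the normalised sequence, the `UPI…`-style identifiers are assigned by a simple counter purely to imitate the look of UniParc IDs, and the source references in the usage example are made up.

```python
import hashlib
from typing import Dict, List, Tuple


def build_archive(records: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Collapse (source_ref, sequence) pairs into one entry per unique sequence."""
    by_digest: Dict[str, dict] = {}
    for source_ref, sequence in records:
        # Normalise case and stray spaces so identical sequences compare equal.
        seq = sequence.upper().replace(" ", "")
        digest = hashlib.sha256(seq.encode("ascii")).hexdigest()
        entry = by_digest.get(digest)
        if entry is None:
            # Hypothetical identifier scheme: "UPI" + 10 hex digits from a counter.
            entry = {"id": f"UPI{len(by_digest) + 1:010X}", "sequence": seq, "xrefs": []}
            by_digest[digest] = entry
        # Every source keeps a cross-reference, even when the sequence is a duplicate.
        entry["xrefs"].append(source_ref)
    return {e["id"]: e for e in by_digest.values()}


archive = build_archive([
    ("EMBL:A1", "MKTAYIAK"),      # made-up source references
    ("RefSeq:NP_1", "mktayiak"),  # same sequence, different source
    ("PDB:1ABC", "MKV"),
])
```

Here the first two records collapse into `UPI0000000001` with two cross-references, and the third becomes `UPI0000000002`.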

Protein DBs
- Swiss-Prot: manually annotated.
- TrEMBL: translated EMBL, automatically annotated.
- UniProtKB: the UniProt Knowledgebase.
- UniParc: the UniProt Archive.
- PIR: Protein Information Resource.
- UniRef: the UniProt Reference Clusters.
- PDB: Protein Data Bank (structure).
- PRIDE: resource for experimental proteomics (not covered in this class).

Databases growth

Protein DBs Swiss-Prot - manually annotated ~100, ~400,000

TrEMBL – translated EMBL, automatically annotated.

Protein Names
Different DBs use different accessions:

Accession                                              DB
P12345                                                 TrEMBL
MAPK_HUMAN                                             Swiss-Prot (to be changed...)
NP_, XP_                                               RefSeq
UniRef100_P99999, UniRef90_P99999, UniRef50_P99999     UniRef
ENSP                                                   Ensembl
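Because each database uses its own identifier shape, the database an identifier comes from can often be guessed from its pattern. The sketch below classifies identifiers using the prefixes in the table above. Assumptions: the UniProtKB accession regex follows the format UniProt documents (shared by Swiss-Prot and TrEMBL entries); the entry-name pattern is a loose approximation of the `MNEMONIC_SPECIES` style; only the RefSeq prefixes `NP_`/`XP_` and the human Ensembl prefix `ENSP` from the slide are handled. The function name `classify_identifier` is hypothetical.

```python
import re
from typing import Optional

# Ordered (pattern, label) pairs; the first match wins, so more specific
# patterns (UniRef, RefSeq, Ensembl) are tried before the generic ones.
_PATTERNS = [
    (re.compile(r"^UniRef(100|90|50)_\S+$"), "UniRef"),
    (re.compile(r"^(NP|XP)_\d+(\.\d+)?$"), "RefSeq"),
    (re.compile(r"^ENSP\d+(\.\d+)?$"), "Ensembl"),
    # UniProtKB accession (6 or 10 characters).
    (re.compile(r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"),
     "UniProtKB accession"),
    # Loose approximation of an entry name such as MAPK_HUMAN.
    (re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$"), "UniProtKB entry name"),
]


def classify_identifier(identifier: str) -> Optional[str]:
    """Return a best-guess source database for an identifier, or None."""
    ident = identifier.strip()
    for pattern, label in _PATTERNS:
        if pattern.match(ident):
            return label
    return None
```

Pattern guessing is only a heuristic; resolving an identifier reliably still means looking it up in the database it came from.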

Principles

More in UniProt
- UniProt: the Universal Protein Resource for protein sequences; a complete annotated protein sequence database.
- UniProt Archive (UniParc): a non-redundant archive of protein sequences extracted from public databases; it contains only protein sequences.
- UniProt/UniRef: clusters similar sequences to yield a representative subset of sequences, which makes searches very fast.
- UniProt/UniMES: a repository specifically developed for metagenomic and environmental data.
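The UniRef idea of clustering similar sequences around a representative can be illustrated with a toy greedy clustering. This is a simplified stand-in, not the UniRef method: identity here is ungapped, position-by-position identity over the shorter sequence, and the longest unassigned sequence seeds each new cluster. The real UniRef pipeline uses a dedicated clustering tool with proper alignment and extra criteria such as length overlap. Thresholds of 1.0, 0.9 and 0.5 only loosely mirror the UniRef100/90/50 naming; the function names and example sequences are hypothetical.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence (no gaps; toy measure)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative ID to its members; longest sequences seed clusters first."""
    order = sorted(seqs, key=lambda k: (-len(seqs[k]), k))
    clusters: Dict[str, List[str]] = {}
    for seq_id in order:
        for rep_id, members in clusters.items():
            if ungapped_identity(seqs[rep_id], seqs[seq_id]) >= threshold:
                members.append(seq_id)  # joins the first matching cluster
                break
        else:
            clusters[seq_id] = [seq_id]  # no match: becomes a new representative
    return clusters


seqs = {"A": "MKTAYIAKQR", "B": "MKTAYIAKQK", "C": "MKTAYLLLLL", "D": "GGGGGGGGGG"}
```

Lowering the threshold merges more sequences into fewer clusters: with these sequences, 1.0 gives four clusters, 0.9 gives three (B joins A) and 0.5 gives two (B and C join A). Searching only the representatives is what makes a clustered set faster to search.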

How is it built?

What’s in UniProt?

EBI interface

PIR – Protein Information Resource
- Protein Family Classification System
- Integrated Protein Knowledgebase
- Integrated Protein Literature, Information and Knowledge

END
If you got lost... (class exercise): some more slides follow.

EB-eye search

NCBI - Entrez