Fortaleza 31.VII.2006 UniProtKB: Questions and answers UniProtKB/Swiss-Prot: Questions, Answers and a few Tips.

Everything you always wanted to know about UniProtKB/Swiss-Prot… and others were not afraid to ask!

Two main contact points:

Some have problems finding a protein…

Trouble finding a protein… IgG protein from Lama pacas
I cannot find the IgG protein from Lama pacas in your server.
“Lama pacas” = Lama guanicoe pacos (Alpaca) (Lama pacos)
- 40 entries in UniProtKB (5 Swiss-Prot, 35 TrEMBL), but no IgG;
- 98 entries in the EMBL database, no IgG.
In addition:
- Ig are not annotated in UniProtKB/Swiss-Prot (currently many Ig sequences are stored only in UniParc);
- Lama pacos is not an annotation priority.

UniProtKB/Swiss-Prot annotation priorities (see poster SP106)

Model-organism oriented annotation
1. Complete microbial proteomes and plastid-encoded proteins (HAMAP) (SP131 & 132)
2. Human proteins and their orthologs in other mammals (HPI) (SP129)
3. Plant proteins (A. thaliana and rice) (PPAP) (SP133)
4. Fungal proteomes (FPAP) (SP134)
5. Proteomes of representative subsets of viral strains (SP135)
6. Toxins and anti-microbial peptides (ToxProt) (SP139)
7. Drosophila proteome (SP137)
8. C. elegans proteome (SP138)
9. Xenopus proteome (SP136)
…

Priorities shared by all organisms
1. Post-translational modifications (PTMs) (SP126)
2. 3D structures (SP128)
3. Protein-protein interactions (SP)
…

Trouble finding a protein…
Dear Folks, I cannot find an entry for human apolipoprotein B100 in Swiss-Prot/TrEMBL. Am I doing something wrong?

In the annotation process, we try to add all synonyms found for a given protein/gene in the literature and other databases. In the future, our search engines will cope with dashes, Roman/Arabic figures, etc.
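As an illustration of what "coping with dashes and Roman/Arabic figures" could mean, here is a minimal sketch of a name normalizer. It is not the UniProt search engine's method: the function name `normalize_name` and the small Roman-numeral table are assumptions made for this example only.

```python
import re

# Hypothetical lookup table: only the Roman numerals I to X are handled here.
_ROMAN_TO_ARABIC = {
    "i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5",
    "vi": "6", "vii": "7", "viii": "8", "ix": "9", "x": "10",
}

def normalize_name(name: str) -> str:
    """Build a search key that ignores case, dashes, spaces and Roman/Arabic spelling."""
    # Split on every character that is not a lowercase letter or a digit,
    # so "B-100", "B 100" and "B100" end up with the same characters.
    tokens = [t for t in re.split(r"[^0-9a-z]+", name.lower()) if t]
    # Replace tokens that are standalone Roman numerals by Arabic digits.
    tokens = [_ROMAN_TO_ARABIC.get(t, t) for t in tokens]
    return "".join(tokens)

# Two spellings of the same protein name give the same key.
same_key = normalize_name("Apolipoprotein B-100") == normalize_name("apolipoprotein B100")
```

A key of this kind only bridges spelling variants. True synonyms still have to be added by annotators, and a single-letter token such as "X" would be misread as a numeral, so a real implementation would need more care.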

Trouble finding a protein… beta-2 adrenoreceptor
I am trying to locate the entry for the human beta-2 adrenoreceptor protein, but I don't seem to get any entries. Can you help me to locate this entry, please?
The missing synonym was added.

Trouble finding a protein… protein gi/
I could not find the information of protein gi/
From the NCBI documentation, it follows that gi numbers are:
1. restricted to GenBank (not agreed upon with EMBL and DDBJ)
2. not stable identifiers
Of note, cross-references to RefSeq will soon be available from UniProtKB.

… eventually they find it!

help #12995
This is my new question: Do all the Swiss-Prot proteins of human and Arabidopsis have CDS nucleotide sequences in the database? What should I do to get them?

From EMBL to UniProtKB/TrEMBL [figure; labels: CDS, Ref.]

In the current UniProt release (8.4, 25-Jul-2006), 8’133 UniProtKB/Swiss-Prot entries have no cross-references to EMBL/GenBank/DDBJ (out of a total of 230’133 entries, i.e. 3.5%).

help #12995: What should I do to get them?

MAPKAKK3 is still a TREMBL entry (since 1996)
I found that the UNIPROT entry for human MAPKAKK3 is still a TREMBL entry (since 1996) and could not be found in SWISSPROT. Is there a specific reason why certain entries do not enter the SWISSPROT section and get a 'correct UNIPROT ID'?
MAPKAKK3 is not a valid gene name; the corresponding TrEMBL entry was not found and could not be annotated. Please use the update request form (or cite accession numbers)!

Is there a specific reason why certain entries do not enter the Swiss-Prot section?

UniProtKB: From TrEMBL to Swiss-Prot
and ~60 superannotators at SIB and EBI, supported by a dedicated programming team

Sequence annotation: questions asked when merging sequences
- Same gene?
- Fragment?
- Sequencing errors?
- Polymorphisms?
- Alternative splicing?
- Alternative initiation?
- Usage of an alternative promoter?
- RNA editing?
- Selenocysteine?
1 gene / 1 species = 1 Swiss-Prot entry -> Annotation and documentation of all the differences

UniProtKB: From TrEMBL to Swiss-Prot
Steps: sequence merge & analysis; annotation and sequence check.
Sources: high performance bioinformatics tools; literature information (>1’700 journals cited); databases and external scientific expertise.
In order to avoid redundancy, once manually annotated and integrated into Swiss-Prot, the entry is deleted from TrEMBL.

Dear Curator, I am the main author of the paper describing two new phosphorylation sites for human growth hormone (P01241) published in Proteomics 4: (2004). One of the two phosphorylation sites, Ser 176, described by us in the paper is not listed on the ExPASy web site. If the curator simply missed the site, please make the necessary update. If Ser 176 was not included in the feature table for other reasons, please let us know.

The reference has been added… and the modifications described.

Searching UniProtKB/Swiss-Prot
I wish to retrieve separately all the bacterial and viral protein sequences with virulence factors, but what I manage to get when I type "virulence" as a keyword are all the protein sequences with virulence as a keyword. Are the sequences I got here only from bacteria and viruses? Do any other organisms have these virulence factors? How could I specify the sequences, based on viral and bacterial virulence factors? I would really appreciate it if you could help me. Thank you.

Currently: Sequence Retrieval System (SRS)
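The request above amounts to combining two criteria: a keyword (KW) and a taxonomic group (OC lineage). The sketch below shows that logic on already-parsed records. It is not SRS query syntax; the record layout, the function name `select` and the entry identifiers are hypothetical placeholders.

```python
# Hypothetical, hand-made records: identifiers are placeholders, not real entries.
entries = [
    {"id": "ENTRY_A", "keywords": ["Virulence"], "lineage": ["Bacteria", "Proteobacteria"]},
    {"id": "ENTRY_B", "keywords": ["Virulence"], "lineage": ["Viruses"]},
    {"id": "ENTRY_C", "keywords": ["Virulence"], "lineage": ["Eukaryota", "Fungi"]},
    {"id": "ENTRY_D", "keywords": ["Transport"], "lineage": ["Bacteria"]},
]

def select(records, keyword, taxon):
    """Return the ids of records that carry `keyword` and belong to `taxon`."""
    keyword, taxon = keyword.lower(), taxon.lower()
    return [
        r["id"]
        for r in records
        if keyword in (k.lower() for k in r["keywords"])
        and taxon in (t.lower() for t in r["lineage"])
    ]

bacterial = select(entries, "virulence", "Bacteria")
viral = select(entries, "virulence", "Viruses")
# Entries with the keyword that are neither bacterial nor viral.
others = [
    r["id"] for r in entries
    if "Virulence" in r["keywords"] and r["id"] not in bacterial + viral
]
```

Running the two selections separately gives the bacterial and viral sets the user asked for, and the `others` list answers the question of whether other organisms carry the same keyword.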

(PR#6943)
Dear Sir/Madame, I have a question concerning the selection of data from the UniProt protein database. I wonder if there are any examples of two or more protein entries which concern exactly the same protein of two or more individuals representing the same species. In other words, I would like to know if each protein of a given species is represented by exactly one amino acid sequence. If there are some proteins of a given species which are represented by more than one amino acid sequence, which line of the entry should I use to group such entries together?

UniProtKB/Swiss-Prot is non-redundant:
One Swiss-Prot entry = all protein products encoded by one gene in one species (including fragments, variations/polymorphisms, splice variants, sequencing errors…)
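For data where this rule has not been applied yet, the "one gene in one species" principle suggests a grouping key built from the gene name (GN line) and the organism (OS line). The sketch below assumes records that have already been parsed into dictionaries; the field names, the case-insensitive gene comparison and the identifiers are assumptions for illustration, not a documented UniProt procedure.

```python
from collections import defaultdict

# Hypothetical, hand-made records: identifiers, gene names and species are
# placeholders, not real database entries.
records = [
    {"id": "ENTRY_1", "gene": "GENE1", "species": "Homo sapiens"},
    {"id": "ENTRY_2", "gene": "gene1", "species": "Homo sapiens"},
    {"id": "ENTRY_3", "gene": "GENE1", "species": "Mus musculus"},
]

def group_by_gene_and_species(entries):
    """Group entry ids under a (gene name, species) key."""
    groups = defaultdict(list)
    for entry in entries:
        # Assumption: gene names are compared case-insensitively.
        key = (entry["gene"].upper(), entry["species"])
        groups[key].append(entry["id"])
    return dict(groups)

groups = group_by_gene_and_species(records)
```

Each resulting group is a candidate for a single merged entry. Deciding whether the sequences really derive from the same gene remains a curation question.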

Increase in complexity: Genome -> Transcriptome -> Proteome
- ~25’000 human genes (with polymorphisms)
- ~100’000 human transcripts
- ~1'000'000 human proteins
Sources of diversity: alternative promoter usage, alternative splicing, mRNA editing, post-translational modifications (PTMs), etc.

- 13 sequences (complete or partial)
- derived from mRNA (n=6) or genomic DNA (n=7)

Multiple alignment of the C-terminus of the available GCR sequences
Annotation of the sequence differences: Alternative splicing? Polymorphism? Disease mutation? Sequencing error (frameshift)? Sequencing error (conflict)? RNA editing?

Where to find the annotation about alternative splicing in UniProtKB/Swiss-Prot?

View « by default » on the ExPASy server
- Identifier & accession nr. (ID, AC, DT)
- Protein and gene names, taxonomy (DE, GN, OC, OS, OG)
- References (RN, RP, RC, RX, RA, RL)
- Comments (CC)
- Cross-references (DR)
- Keywords (KW)
- Sequence description (Feature Table)
- Sequence (SQ)
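The two-letter codes listed above are the line codes of the flat-file format, so an entry can be split into sections by reading the first two characters of each line. The following is a minimal sketch under that assumption. The sample text is an abbreviated, hand-written illustration (only the accession and entry name come from the slides), and `parse_entry` is a hypothetical helper, not an official parser.

```python
from collections import defaultdict

# Abbreviated, hand-written sample in flat-file style; not a complete record.
SAMPLE_ENTRY = """\
ID   GCR_HUMAN
AC   P04150;
OS   Homo sapiens (Human).
KW   Alternative splicing;
//
"""

def parse_entry(text):
    """Collect the content of each line under its two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # "//" terminates an entry
            break
        code, value = line[:2], line[5:].strip()
        # Sequence data lines start with blanks and are skipped in this sketch.
        if code.strip():
            fields[code].append(value)
    return dict(fields)

fields = parse_entry(SAMPLE_ENTRY)
accessions = fields["AC"]
```

Keeping a list per code matters because most line types (references, comments, cross-references) occur many times in a real entry.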

… P04150 (GCR_HUMAN)

All the alternative sequences are available for Blast searches and protein identification tools (on the ExPASy server).

Homo sapiens
Currently in UniProtKB/Swiss-Prot, for Homo sapiens: 14’445 entries (~ as many genes) + 7’975 alternative splicing isoforms -> 22’420 human sequences described, not taking into account other diversity-generating events…

How to download the sequences?

We ask you to send us the example Deacetylase for chitin and its price.
Dear Sirs, We need deacetylase for the following purposes: 1. Deacetylation of fiber obtained from chitin. 2. Chitin deacetylation for obtaining chitosan oligosaccharides. Evidently, these will be different types of deacetylase, because in the case of the fiber a decrease of molecular weight is not allowed, while in the case of chitin deacetylation it is allowable and even desirable for oligomerisation of the product during deacetylation. We ask you to send us the example Deacetylase for chitin and its price.

Could you inform me the price and delivery time?
Dear, At this moment I am looking for: bovine TGF beta1. I saw in web that you have this product with part# P18341. Could you inform me the price and delivery time?

And if bioinformatics is not funded properly, we could start a new business…