1 EMBL Outstation — The European Bioinformatics Institute Added-Value Proteome Databases: SWISS-PROT, TrEMBL, InterPro
2 Large-Scale Characterization of Protein Sequence Data: The Integrative Approach of SWISS-PROT + TrEMBL
3 Times are changing
4 ‘Data Waves’
• Biological sequences
• Mutation
• Metabolism
• Polymorphism
• Signaling
• Expression
• Size
• Complexity
• Integration
5 The Challenge of the Genome Era
• The rapidly growing amount of data lacking experimentally determined biological function increases the need for computational analysis of the data
6 Need for Bioinformatics
7 Bioinformatics: 5 years ago...
• Pharmaceutical companies were not interested
• Life scientists believed that it was an outlet for failed biologists who like to play with computers
• Computer scientists did not even know of its existence
8 Bioinformatics: today...
• Pharmaceutical companies believe that it is a way to streamline the drug discovery process
• Some life scientists believe that it is the solution to all problems in life sciences
• Computer scientists find it most useful as a new way to get grants
9 Bioinformatics: In 5 years...
• Pharmaceutical companies use it routinely as a complement to experimental work
• Life scientists use it efficiently and therefore forget that it exists
• Computer scientists have jumped on another hot subject
10 Bioinformatics
• is a complement to, but no substitute for, experimental research: it can help to plan experiments, but not replace them
• is not cheap
• takes a significant amount of time to be any good
• Quality control is crucial: some garbage in, a lot of garbage out!
11 Materials and Methods
• Materials: biological data
• Methods: a wide range of computational techniques
12 Essential in Bioinformatics: Databases as a tool for computational analysis and data mining (with SWISS-PROT being the gold standard)
13 SWISS-PROT
• is a curated protein sequence data bank established in July 1986 by Amos Bairoch in Geneva and maintained collaboratively with EMBL since June 1987
• contains currently protein sequence entries
14 Essential criteria for a sequence data bank
1. it must be complete with minimal redundancy
2. it must contain as much up-to-date information as possible on each sequence
3. all the information items must be retrievable by computer programs in a consistent manner
4. it should be integrated (cross-referenced) with other sequence-related data banks
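
Criteria 3 and 4 can be illustrated with a minimal sketch, assuming the line-oriented SWISS-PROT flat-file layout in which every line starts with a two-letter line code (ID, AC, DE, DR, ...) and an entry ends with `//`. The entry below is made up for illustration (its identifiers are not real), the DR line layout is simplified, and the function names are hypothetical.

```python
# Made-up toy entry; none of these identifiers exist in any database.
TOY_ENTRY = """\
ID   TOY_EXAMPLE    STANDARD;      PRT;     5 AA.
AC   X00000;
DE   HYPOTHETICAL TOY PROTEIN (MADE-UP EXAMPLE).
DR   EMBL; Y00000; -.
DR   PROSITE; PS00000; -.
//
"""


def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        code, data = line[:2], line[5:].strip()
        if code.strip():  # skip lines without a line code (e.g. sequence data)
            fields.setdefault(code, []).append(data)
    return fields


def cross_references(fields):
    """Return (database, primary identifier) pairs taken from the DR lines."""
    refs = []
    for data in fields.get("DR", []):
        parts = [p.strip() for p in data.rstrip(".").split(";")]
        if len(parts) >= 2:
            refs.append((parts[0], parts[1]))
    return refs


fields = parse_entry(TOY_ENTRY)
print(cross_references(fields))
```

Because every information item sits behind a fixed line code, a program can retrieve any of them the same way for every entry, and the DR lines carry the pointers to other data collections.
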
15 Integration with other databases
• SWISS-PROT entries
• abstracted from > references
• linked by > direct pointers to 30 related or specialized data collections
16 Integration with other databases
• EMBL Nucleotide Sequence Database
• PDB
• Genomic databases (FlyBase, SubtiList, MaizeDB, EcoGene, LISTA, SGD, StyGene)
• 2D-Gel databases (ECO2DBASE, SWISS-2DPAGE, Aarhus/Ghent, YEPD, Harefield)
• Specialized collections (OMIM, PROSITE, ENZYME, GCRDB, Transfac, HSSP)
17 Connections between databases
18 SWISS-PROT Growth
19 Nucleotide sequence database growth
20 The Bottleneck: Annotation
21 Annotation consists of the description of:
• Function(s) of the protein
• Post-translational modification(s)
• Domains and sites
• Secondary structure
• Quaternary structure
• Similarities to other proteins
• Disease(s) associated with deficiency(ies) in the protein
• Sequence conflicts, variants, etc.
22 Annotation sources:
• publications that report new sequence data
• review articles to periodically update the annotation of families or groups of proteins
• external experts
23 TrEMBL
• is a computer-annotated supplement to SWISS-PROT
• consists of entries in SWISS-PROT format
• contains translations of CDS in the EMBL Nucleotide Sequence Database that are not in SWISS-PROT
24 August 1998: SWISS-PROT 36 + TrEMBL 7
• CDS in corresponding EMBL release
• SWISS-PROT entries
• CDS integrated in SWISS-PROT
• the remaining CDS were merged whenever possible to reduce redundancy
25 TrEMBL release 7
• TrEMBL entries
• amino acids
• linked by > direct pointers to 14 related or specialized data collections
26 The Production of TrEMBL
1. translation and entry creation
2. sorting the entries
3. post-processing the SP-TrEMBL entries
27 Translation and entry creation
1. translation of every CDS not yet cross-referenced to SWISS-PROT
2. parsing of information in EMBL entries into TrEMBL entries
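
The two steps above can be sketched as follows. This is a minimal illustration, not the actual TrEMBL production code: it assumes the standard genetic code for every CDS, and the dictionary keys (`protein_id`, `organism`, `cds`) and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def create_entries(cds_features, in_swissprot):
    """Yield a minimal entry for every CDS not yet cross-referenced to SWISS-PROT."""
    for feat in cds_features:
        if feat["protein_id"] in in_swissprot:
            continue  # already covered by a SWISS-PROT entry
        yield {
            "source": feat["protein_id"],
            "organism": feat["organism"],
            "sequence": translate_cds(feat["cds"]),
        }
```

A real pipeline would also have to honour organism-specific genetic codes, reading-frame offsets and partial CDS, all of which this sketch ignores.
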
28 Sorting the entries
• into SP-TrEMBL and REM-TrEMBL
• SP-TrEMBL is split into taxonomic divisions
29 Post-processing
1. reducing redundancy
2. enhancing the information content
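
Redundancy reduction can be sketched, under a strong simplifying assumption, as merging entries that report the identical sequence for the same organism while keeping all of their cross-references. The slides do not state the actual merging criteria, so the key used here and the field names are illustrative only.

```python
def merge_redundant(entries):
    """Merge entries with identical organism and sequence, pooling their cross-references."""
    merged = {}
    for entry in entries:
        key = (entry["organism"], entry["sequence"])
        if key not in merged:
            merged[key] = {
                "organism": entry["organism"],
                "sequence": entry["sequence"],
                "xrefs": [],
            }
        for xref in entry["xrefs"]:
            if xref not in merged[key]["xrefs"]:  # keep each pointer once
                merged[key]["xrefs"].append(xref)
    return list(merged.values())
```

Merging rather than discarding keeps every pointer back to the nucleotide entries from which the duplicates were derived.
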
30 Improving Automatic Annotation
• will streamline flow into TrEMBL
• will bring TrEMBL nearer to SWISS-PROT quality
• will make the transition from TrEMBL to SWISS-PROT easier
31 Demands on a system for automated data analysis and annotation
• Correctness
• Scalability
• Updatability
• Low level of redundant information
• Completeness
• Standardized vocabulary
32 Standardized transfer of annotation from characterized proteins in SWISS-PROT to TrEMBL entries
• TrEMBL entry is reliably recognized by a given method as a member of a certain group of proteins
• corresponding group of proteins in SWISS-PROT shares certain annotation
• common annotation is transferred to the TrEMBL entry and flagged as annotated by similarity
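
The transfer scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: annotation is modelled as plain strings, "shared" is taken to mean present in every characterized member of the group, and the names `Rule`, `common_annotation` and `transfer_annotation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    group: str        # protein group the rule applies to
    annotation: list  # annotation shared by the group's SWISS-PROT members


def common_annotation(members):
    """Annotation lines present in every characterized member of a group."""
    sets = [set(lines) for lines in members]
    return sorted(set.intersection(*sets)) if sets else []


def transfer_annotation(entry, group_hits, rules):
    """Copy the common annotation of each matching group onto the entry, flagged."""
    for rule in rules:
        if rule.group not in group_hits:
            continue
        for line in rule.annotation:
            flagged = f"{line} (by similarity)"
            if flagged not in entry["comments"]:
                entry["comments"].append(flagged)
    return entry
```

Only annotation common to the whole group is transferred, and the flag keeps inferred annotation distinguishable from experimentally supported annotation.
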
33 Environment for Distributed Information Transfer to TrEMBL (EDITtoTrEMBL)
• RuleBase
• Analyzers
• Dispatchers
34 EDITtoTrEMBL
35 EDITtoTrEMBL: RuleBase
• SWISS-PROT as source of annotation: correctness and controlled vocabulary
• Rules can be semi-automatically and/or manually created
• Rules can be updated
36 EDITtoTrEMBL: Analyzers
• Directly implement an algorithm or communicate with external programs
• Query other databases
• Use rules to add information to TrEMBL entries
37 EDITtoTrEMBL: Examples of Analyzers
• sequence analysis tools (PROSITE, PFAM, PRINTS, TM, Coiled Coils, Signal, etc.)
• sequence similarity searching (FASTA, SW, BLAST)
• database scanning/parsing (MGD, FlyBase, ENZYME, etc.)
38 EDITtoTrEMBL: Dispatchers
• Control of annotation flow
• Error checking
• Removal of redundant information
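
The division of labour between analyzers and dispatchers can be sketched as below. The slides do not describe the EDITtoTrEMBL interfaces, so everything here is an assumption made for illustration: analyzers are modelled as plain functions that propose annotation lines, and the dispatcher runs them, records failures and drops duplicate lines. The example analyzer scans for the N-glycosylation pattern N-{P}-[ST]-{P}.

```python
import re


def glycosylation_analyzer(entry):
    """Toy analyzer that implements a pattern scan directly: N-{P}-[ST]-{P}."""
    pattern = re.compile(r"(?=N[^P][ST][^P])")  # lookahead also finds overlapping sites
    return [
        f"Potential N-glycosylation site at position {m.start() + 1}"
        for m in pattern.finditer(entry["sequence"])
    ]


def broken_analyzer(entry):
    """Stand-in for an analyzer whose external program fails."""
    raise RuntimeError("external program unavailable")


def dispatch(entry, analyzers):
    """Run each analyzer, log errors, and add only annotation not already present."""
    seen = set(entry["comments"])
    for analyzer in analyzers:
        try:
            proposed = analyzer(entry)
        except Exception as exc:  # a failing analyzer must not corrupt the entry
            entry.setdefault("errors", []).append(f"{analyzer.__name__}: {exc}")
            continue
        for line in proposed:
            if line not in seen:  # removal of redundant information
                seen.add(line)
                entry["comments"].append(line)
    return entry
```

Keeping error handling and de-duplication in the dispatcher means each analyzer can stay small and can be added or replaced without touching the others.
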
39 Automated post-processing of TrEMBL entries
• redundancy removal: currently affects around 20% of the entries
• improvement of annotation: currently affects around 25% of the entries
40 SWISS-PROT + TrEMBL
• complete and up-to-date protein sequence collection
• minimal redundancy: SP_TR_NRDB
• linked by > direct pointers to 30 related or specialized data collections
• deeper integration between the EMBL Nucleotide Sequence Database and SWISS-PROT + TrEMBL by using PID numbers
41 Integrated Resource of Protein Domains and Functional Sites (InterPro)
• Integration of different pattern recognition methods (PROSITE, PRINTS and PFAM)
• Incorporation of new families and domains into InterPro
• Enhancing the functional annotation of TrEMBL entries
• Enhancing genome annotation
42 The InterPro project participants
• Co-ordinated by EBI (R. Apweiler)
• PROSITE (A. Bairoch, P. Bucher)
• PRINTS (T. Attwood)
• PFAM (R. Durbin, E. Birney, A. Bateman, E. Sonnhammer)
• PRODOM (D. Kahn)
• PRATT (I. Jonassen)
• GENE-IT (J.-J. Codani)
• LION bioscience AG (R. Schneider)
43 : SWISS-PROT ceased to be in the public domain
44 What has changed
• No changes for academic users
• Almost no restrictions on the redistribution of SWISS-PROT by academic servers or software companies
• Commercial users are required to pay yearly subscription fees. These fees will be used to complement the existing grants in order to provide stable long-term funding
45 Credits
SWISS-PROT at EBI
• Rolf Apweiler
• Sergio Contrino
• Wolfgang Fleischmann
• Gill Fraser
• Henning Hermjakob
• Viv Junker
• Alexander Kanapin
• Youla Karavidopoulou
• Evguenia Kriventseva
• Fiona Lang
• Claire O'Donovan
• Michele Magrane
• Maria Jesus Martin
• Nicoletta Mitaritonna
• Steffen Moeller
• Evgeni Zdobnov
Collaborators
• Amos Bairoch
• Jean-Jacques Codani
• Keith Tipton
• Marvin Edelman
• Compugen
• Paracel
• Sue Povey and Julia White
• MGD
• FlyBase
• Neil Rawlings
• Network of > 200 external experts
46 Take-home message:
• Bioinformatics is not essential for biologists, since 2 months in the lab can easily save you an afternoon at the computer