1 EMBL Outstation — The European Bioinformatics Institute Added-Value Proteome Databases: SWISS-PROT, TrEMBL, InterPro.

Presentation transcript:

1 Added-Value Proteome Databases: SWISS-PROT, TrEMBL, InterPro

2 Large-Scale Characterization of Protein Sequence Data: The Integrative Approach of SWISS-PROT + TrEMBL

3 Times are changing

4 'Data Waves'
- Biological sequences
- Mutation
- Metabolism
- Polymorphism
- Signaling
- Expression
- Size
- Complexity
- Integration

5 The Challenge of the Genome Era
- The rapidly growing amount of data lacking experimental determination of biological function increases the need for computational analysis of the data

6 Need for Bioinformatics

7 Bioinformatics: 5 years ago...
- Pharmaceutical companies were not interested
- Life scientists believed that it was an outlet for failed biologists who like to play with computers
- Computer scientists did not even know of its existence

8 Bioinformatics: today...
- Pharmaceutical companies believe that it is a way to streamline the drug discovery process
- Some life scientists believe that it is the solution to all problems in life sciences
- Computer scientists find it most useful as a new way to get grants

9 Bioinformatics: in 5 years...
- Pharmaceutical companies use it routinely as a complement to experimental work
- Life scientists use it efficiently and therefore forget that it exists
- Computer scientists have jumped on another hot subject

10 Bioinformatics
- is a complement to, but no substitute for, experimental research: it can help to plan experiments, but not replace them
- is not cheap
- takes a significant amount of time to be any good
- Quality control is crucial: some garbage in, a lot of garbage out!

11 Materials and Methods
- Materials: biological data
- Methods: a wide range of computational techniques

12 Essential in Bioinformatics: databases as a tool for computational analysis and data mining (with SWISS-PROT being the gold standard)

13 SWISS-PROT
- is a curated protein sequence data bank established in July 1986 by Amos Bairoch in Geneva and maintained collaboratively with EMBL since June 1987
- contains currently protein sequence entries

14 Essential criteria for a sequence data bank
1. it must be complete with minimal redundancy
2. it must contain as much up-to-date information as possible on each sequence
3. all the information items must be retrievable by computer programs in a consistent manner
4. it should be integrated (cross-referenced) with other sequence-related data banks

15 Integration with other databases
- SWISS-PROT entries
- abstracted from > references
- linked by > direct pointers to 30 related or specialized data collections

16 Integration with other databases
- EMBL Nucleotide Sequence Database
- PDB
- Genomic databases (FlyBase, SubtiList, MaizeDB, EcoGene, LISTA, SGD, StyGene)
- 2D-Gel databases (ECO2DBASE, SWISS-2DPAGE, Aarhus/Ghent, YEPD, Harefield)
- Specialized collections (OMIM, PROSITE, ENZYME, GCRDB, Transfac, HSSP)
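
The cross-references listed on this slide are what makes an entry "retrievable by computer programs in a consistent manner". The sketch below shows one way such pointers could be read from a flat-file entry, assuming the usual SWISS-PROT layout in which each cross-reference sits on a `DR` line of the form `DR   DATABASE; IDENTIFIER; ...`. The sample entry, its identifiers, and the helper name `parse_cross_references` are made up for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical entry fragment; all identifiers are placeholders, not real records.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN   STANDARD;   PRT;   120 AA.
AC   P00000;
DR   EMBL; X00000; CAA00000.1; -.
DR   PDB; 1ABC; 01-JAN-98.
DR   PROSITE; PS00000; EXAMPLE_PATTERN; 1.
"""


def parse_cross_references(entry_text):
    """Collect the primary identifier of every DR line, grouped by database."""
    xrefs = defaultdict(list)
    for line in entry_text.splitlines():
        # Only DR lines carry cross-references; skip every other line type.
        if not line.startswith("DR   "):
            continue
        # Drop the line code and the trailing period, then split the fields.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        database, primary_id = fields[0], fields[1]
        xrefs[database].append(primary_id)
    return dict(xrefs)


if __name__ == "__main__":
    for database, ids in parse_cross_references(SAMPLE_ENTRY).items():
        print(database, ids)
```

Because every line type is identified by a fixed two-letter code, the same loop can be extended to other line types without changing its structure.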

17 Connections between databases

18 SWISS-PROT Growth

19 Nucleotide sequence database growth

20 The Bottleneck: Annotation

21 Annotation consists of the description of:
- Function(s) of the protein
- Post-translational modification(s)
- Domains and sites
- Secondary structure
- Quaternary structure
- Similarities to other proteins
- Disease(s) associated with deficiency(ies) in the protein
- Sequence conflicts, variants, etc.

22 Annotation sources:
- publications that report new sequence data
- review articles to periodically update the annotation of families or groups of proteins
- external experts

23 TrEMBL
- is a computer-annotated supplement to SWISS-PROT
- consists of entries in SWISS-PROT format
- contains translations of CDS in the EMBL Nucleotide Sequence Database that are not in SWISS-PROT

24 August 1998: SWISS-PROT 36 + TrEMBL 7
- CDS in corresponding EMBL release
- SWISS-PROT entries
- CDS integrated in SWISS-PROT
- the remaining CDS were merged whenever possible to reduce redundancy

25 TrEMBL release 7
- TrEMBL entries
- amino acids
- linked by > direct pointers to 14 related or specialized data collections

26 The Production of TrEMBL
1. translation and entry creation
2. sorting the entries
3. post-processing the SP-TrEMBL entries

27 Translation and entry creation
1. translation of every CDS not yet cross-referenced to SWISS-PROT
2. parsing of information in EMBL entries into TrEMBL entries
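
The two steps on this slide can be sketched as a short routine. This is an illustration only, not the EBI pipeline: it assumes the standard genetic code for every CDS (real EMBL features may specify a different translation table), and the record layout with the keys `embl_accession`, `product`, `sequence`, and `swissprot_xref` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Unknown or ambiguous codons are reported as "X".
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def create_entries(cds_features):
    """Build one draft entry per CDS that has no SWISS-PROT cross-reference."""
    entries = []
    for feature in cds_features:
        if feature.get("swissprot_xref"):
            continue  # already represented in SWISS-PROT, so skip it
        entries.append({
            "source": feature["embl_accession"],
            "description": feature.get("product", ""),
            "sequence": translate(feature["sequence"]),
        })
    return entries


if __name__ == "__main__":
    # Placeholder records for illustration.
    features = [
        {"embl_accession": "X00001", "product": "example protein",
         "sequence": "ATGGCCTAA"},
        {"embl_accession": "X00002", "sequence": "ATGAAATAA",
         "swissprot_xref": "P00000"},
    ]
    print(create_entries(features))
```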

28 Sorting the entries
- into SP-TrEMBL and REM-TrEMBL
- SP-TrEMBL is split into taxonomic divisions

29 Post-processing
1. reducing redundancy
2. enhancing the information content
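
The slides do not state the criteria used to merge redundant entries. As a sketch of the general idea only, the example below assumes the simplest possible rule: entries from the same organism with an identical sequence are collapsed into one, and their source accessions are pooled. The field names and the function `merge_redundant` are hypothetical.

```python
def merge_redundant(entries):
    """Collapse entries sharing organism and sequence, pooling their sources."""
    merged = {}
    for entry in entries:
        key = (entry["organism"], entry["sequence"])
        if key not in merged:
            merged[key] = {
                "organism": entry["organism"],
                "sequence": entry["sequence"],
                "sources": [],
            }
        # Keep each source accession once, in first-seen order.
        for accession in entry["sources"]:
            if accession not in merged[key]["sources"]:
                merged[key]["sources"].append(accession)
    return list(merged.values())


if __name__ == "__main__":
    # Placeholder records for illustration.
    entries = [
        {"organism": "Mus musculus", "sequence": "MAK", "sources": ["X00001"]},
        {"organism": "Mus musculus", "sequence": "MAK", "sources": ["X00002"]},
        {"organism": "Homo sapiens", "sequence": "MAK", "sources": ["X00003"]},
    ]
    for merged_entry in merge_redundant(entries):
        print(merged_entry)
```

A production system would also need to handle fragments and sequence conflicts, which exact-match grouping cannot detect.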

30 Improving Automatic Annotation
- will streamline flow into TrEMBL
- will bring TrEMBL nearer to SWISS-PROT quality
- will make the transition from TrEMBL to SWISS-PROT easier

31 Demands on a system for automated data analysis and annotation
- Correctness
- Scalability
- Updatability
- Low level of redundant information
- Completeness
- Standardized vocabulary

32 Standardized transfer of annotation from characterized proteins in SWISS-PROT to TrEMBL entries
- A TrEMBL entry is reliably recognized by a given method as a member of a certain group of proteins
- The corresponding group of proteins in SWISS-PROT shares certain annotation
- The common annotation is transferred to the TrEMBL entry and flagged as annotated by similarity
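
The three steps above can be sketched as a rule lookup. The example assumes that group membership has already been decided by some external method and arrives as a set of group identifiers; the `Rule` class, the group identifier `GROUP_0001`, and the entry layout are hypothetical. The "(By similarity)" suffix stands in for the flag mentioned on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Annotation shared by the SWISS-PROT members of one protein group."""
    group: str
    comments: list = field(default_factory=list)
    keywords: list = field(default_factory=list)


def apply_rules(entry, group_hits, rules):
    """Copy the common annotation of every matched group into the entry."""
    for rule in rules:
        if rule.group not in group_hits:
            continue
        for comment in rule.comments:
            # Mark transferred comments so they are not mistaken for
            # experimentally supported statements.
            flagged = f"{comment} (By similarity)"
            if flagged not in entry["comments"]:
                entry["comments"].append(flagged)
        for keyword in rule.keywords:
            if keyword not in entry["keywords"]:
                entry["keywords"].append(keyword)
    return entry


if __name__ == "__main__":
    rules = [Rule("GROUP_0001",
                  comments=["FUNCTION: Example catalytic activity."],
                  keywords=["Hydrolase"])]
    entry = {"comments": [], "keywords": []}
    print(apply_rules(entry, {"GROUP_0001"}, rules))
```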

33 Environment for Distributed Information Transfer to TrEMBL (EDITtoTrEMBL)
- RuleBase
- Analyzers
- Dispatchers

34 EDITtoTrEMBL

35 EDITtoTrEMBL: RuleBase
- SWISS-PROT as source of annotation: correctness and controlled vocabulary
- Rules can be semi-automatically and/or manually created
- Rules can be updated

36 EDITtoTrEMBL: Analyzers
- Directly implement an algorithm or communicate with external programs
- Query other databases
- Use rules to add information to TrEMBL entries

37 EDITtoTrEMBL: Examples of Analyzers
- sequence analysis tools (PROSITE, PFAM, PRINTS, TM, Coiled Coils, Signal, etc.)
- sequence similarity searching (FASTA, SW, BLAST)
- database scanning/parsing (MGD, FlyBase, ENZYME, etc.)

38 EDITtoTrEMBL: Dispatchers
- Control of annotation flow
- Error checking
- Removal of redundant information
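
The slides name the dispatcher's three responsibilities but give no implementation details. The sketch below is one possible reading, not the EDITtoTrEMBL design: analyzers are modelled as plain functions that take an entry and return a list of annotation strings, and the dispatcher runs them in order, records failures instead of aborting, and drops repeated annotations. All names are hypothetical.

```python
def dispatch(entry, analyzers):
    """Run each analyzer on the entry; return (annotations, errors)."""
    annotations = []
    errors = []
    for name, analyzer in analyzers.items():
        try:
            results = analyzer(entry)
        except Exception as exc:
            # Error checking: a failing analyzer must not stop the others.
            errors.append((name, str(exc)))
            continue
        for annotation in results:
            # Removal of redundant information: keep each annotation once.
            if annotation not in annotations:
                annotations.append(annotation)
    return annotations, errors


def motif_analyzer(entry):
    """Toy analyzer: report a made-up motif if it occurs in the sequence."""
    return ["Example motif"] if "GKS" in entry["sequence"] else []


def keyword_analyzer(entry):
    """Toy analyzer that happens to repeat the motif annotation."""
    return ["Example motif", "Example keyword"]


def broken_analyzer(entry):
    """Toy analyzer that always fails, to show the error path."""
    raise RuntimeError("external program unavailable")


if __name__ == "__main__":
    analyzers = {
        "motif": motif_analyzer,
        "keyword": keyword_analyzer,
        "broken": broken_analyzer,
    }
    print(dispatch({"sequence": "MAGKSTL"}, analyzers))
```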

39 Automated post-processing of TrEMBL entries
- redundancy removal: currently affects around 20% of the entries
- improvement of annotation: currently affects around 25% of the entries

40 SWISS-PROT + TrEMBL
- complete and up-to-date protein sequence collection
- minimal redundancy: SP_TR_NRDB
- linked by > direct pointers to 30 related or specialized data collections
- deeper integration between the EMBL Nucleotide Sequence Database and SWISS-PROT + TrEMBL by using PID numbers

41 Integrated resource of Protein domain and functional sites (InterPro)
- Integration of different pattern recognition methods (PROSITE, PRINTS and PFAM)
- Incorporation of new families and domains into InterPro
- Enhancing the functional annotation of TrEMBL entries
- Enhancing genome annotation

42 The InterPro project participants
- Co-ordinated by EBI (R. Apweiler)
- PROSITE (A. Bairoch, P. Bucher)
- PRINTS (T. Attwood)
- PFAM (R. Durbin, E. Birney, A. Bateman, E. Sonnhammer)
- PRODOM (D. Kahn)
- PRATT (I. Jonassen)
- GENE-IT (J.-J. Codani)
- LION bioscience AG (R. Schneider)

43 : SWISS-PROT ceased to be in the public domain

44 What has changed
- No changes for academic users
- Almost no restrictions on the redistribution of SWISS-PROT by academic servers or software companies
- Commercial users are required to pay yearly subscription fees. These fees will be used to complement the existing grants in order to provide stable long-term funding

45 Credits
SWISS-PROT at EBI: Rolf Apweiler, Sergio Contrino, Wolfgang Fleischmann, Gill Fraser, Henning Hermjakob, Viv Junker, Alexander Kanapin, Youla Karavidopoulou, Evguenia Kriventseva, Fiona Lang, Claire O'Donovan, Michele Magrane, Maria Jesus Martin, Nicoletta Mitaritonna, Steffen Moeller, Evgeni Zdobnov
Collaborators: Amos Bairoch, Jean-Jacques Codani, Keith Tipton, Marvin Edelman, Compugen, Paracel, Sue Povey and Julia White, MGD, FlyBase, Neil Rawlings, network of > 200 external experts

46 Take-home message:
- Bioinformatics is not essential for biologists, since 2 months in the lab can easily save you an afternoon at the computer