Protein Annotation Ontology The BioSapiens Virtual Institute for Genome Annotations Janet Thornton & Gabby Reeves AFP/BioSapiens Vienna: July 07.

Presentation transcript:


Outline
- Integrating annotations: why it is so important to think about it.
- Progress made by BioSapiens towards the virtual institute for genome annotations.
- Creating the ontology: ontology rules, software (OBO).
- The Ontology: a brief outline.

The European Virtual Institute for Genome Annotation Funded by the European Commission

BioSapiens Network of Excellence
26 partners in 14 different countries.
The objective of the BioSapiens Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.

BIOSAPIENS
Many tools have been developed for the annotation of proteins, and many make similar predictions. These tools come from a number of different labs in different locations.

BioSapiens Genome Annotation
Annotation areas: DNA Annotation, Proteome Annotation, Functional Annotation.
Topics: gene definition/alternative splicing; regulators and promoters; expression; variation (haplotypes and SNPs); protein families, orthologues; membrane proteins and ligands; 3D protein structure; post-translational modification and localisation; sequence and structure to function; protein-protein complexes; pathways and networks.
How can we provide an integrated view of this information for the biologist?

Functional Grouping of Annotations
69 sources from 19 partner sites, providing approximately 330 annotations. The information is provided but not functionally ordered. Without a defined ontology, accurate interpretation of these annotations is impossible. The servers providing annotations also need sensible IDs to allow adequate identification and administration.

Integrating Annotations
Sequencing projects, structural genomics initiatives, and ever-increasing experimentally based knowledge of biological systems.
1. Additional information needs to be added to already existing entries, e.g. the EMBL/GenBank/DDBJ Third Party Annotation (TPA) pilot study:
- Entries are submitted via the website and marked as TPA entries.
- Entries are checked carefully by curators before being published.

Integrating Annotations
2. Manually curated databases are struggling with the influx of information.
UniProt proposals:
- The adopt-a-protein scheme: a research community in a particular area would be responsible for updating the information.
- Making use of grey matter: using the growing population of retired scientists at home, with broadband accounts and nothing to do.
- Quality and uniformity of curation is an issue (input fields: free text or drop-down menus).
Distributed Annotation System: allows a system of decentralised annotation.

DAS, the distributed annotation system
What it is:
- The distributed annotation system (DAS) is a specification of a client-server system for sharing various types of sequence annotations.
- An annotation is an entity which is anchored to a reference subsequence with a start and a stop position, together with some information about the type and method of annotation, and possibly some other textual information.
- Today, DAS is used for serving positional annotations on genomes and on proteins, and for serving global annotations on genes.
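The description of an annotation above can be sketched as a small data structure. This is only an illustration: the field names, the 1-based inclusive coordinates, and the example values are assumptions made here, not the DAS wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One positional annotation anchored to a reference sequence.

    All field names are hypothetical; they mirror the prose description only.
    """
    reference_id: str            # identifier of the reference sequence
    start: int                   # start position (assumed 1-based)
    stop: int                    # stop position (assumed inclusive)
    annotation_type: str         # what is annotated, e.g. "binding_site"
    method: str                  # how the annotation was produced
    note: Optional[str] = None   # optional free-text information

    def __post_init__(self) -> None:
        # Reject coordinates that cannot describe a subsequence.
        if self.start < 1 or self.stop < self.start:
            raise ValueError("expected 1 <= start <= stop")

    def length(self) -> int:
        """Number of residues or bases covered by the annotation."""
        return self.stop - self.start + 1

    def overlaps(self, other: "Annotation") -> bool:
        """True if both annotations share at least one position on the same reference."""
        return (
            self.reference_id == other.reference_id
            and self.start <= other.stop
            and other.start <= self.stop
        )
```

Two annotations from different servers on the same reference can then be compared by position with `overlaps`, independently of how each server names its annotation types.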

Distributed Annotation System Viewer DAS Protocol

Dasty2 Rafael Jimenez

Spice Andreas Prlic

What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.

Information on metal binding sites from two sources
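Clustering like annotations can be sketched as grouping features by their ontology term, so that what each source says about the same kind of feature sits side by side. The source names, the term `metal_binding_site`, and the coordinates below are hypothetical placeholders.

```python
from collections import defaultdict


def cluster_by_term(annotations):
    """Group (source, term, start, stop) tuples by ontology term.

    Returns a dict mapping each term to the list of (source, start, stop)
    entries reported for it, in input order.
    """
    clusters = defaultdict(list)
    for source, term, start, stop in annotations:
        clusters[term].append((source, start, stop))
    return dict(clusters)


# Hypothetical example: two servers reporting metal binding on the same protein.
example = [
    ("server_a", "metal_binding_site", 10, 10),
    ("server_b", "metal_binding_site", 10, 12),
    ("server_a", "domain", 1, 50),
]
clusters = cluster_by_term(example)
```

This grouping only works if both sources already use the same term for the same concept, which is the motivation for a shared ontology.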

What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).

Duplications in the data.
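One way to spot exact duplications, sketched under the assumption that each annotation is a dictionary with the hypothetical keys `protein`, `term`, `label`, `start`, `stop` and `source`: two records count as duplicates when everything except the source is identical.

```python
from collections import defaultdict


def find_exact_duplicates(annotations):
    """Return {feature_key: sorted list of sources} for features that more
    than one source reports identically."""
    sources_by_key = defaultdict(set)
    for ann in annotations:
        key = (ann["protein"], ann["term"], ann["label"], ann["start"], ann["stop"])
        sources_by_key[key].add(ann["source"])
    return {
        key: sorted(sources)
        for key, sources in sources_by_key.items()
        if len(sources) > 1
    }


# Hypothetical records: the same domain served by two sources, plus one unique feature.
records = [
    {"protein": "PROT_1", "term": "domain", "label": "DOMAIN_X",
     "start": 5, "stop": 120, "source": "source_a"},
    {"protein": "PROT_1", "term": "domain", "label": "DOMAIN_X",
     "start": 5, "stop": 120, "source": "source_b"},
    {"protein": "PROT_1", "term": "domain", "label": "DOMAIN_Y",
     "start": 130, "stop": 200, "source": "source_a"},
]
duplicates = find_exact_duplicates(records)
```

The comparison is deliberately strict: a one-residue difference in coordinates is treated as a distinct feature, not a duplicate.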

What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).
3. Standardise the vocabulary used by each partner. This will allow us to manipulate the data in a more powerful way.

Standardisation of information provided by all DAS servers
Sometimes annotation types on some servers are exactly the same as names on other servers.
[Table with columns: Server, Annotation]
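Standardising vocabulary can be sketched as a lookup from each server's free-form type name to one agreed ontology term. The mapping entries and the target term `metal_binding_site` below are invented for illustration; a real table would be agreed by the partners.

```python
import re

# Hypothetical mapping from normalised server-specific names to one ontology term.
EXAMPLE_MAPPING = {
    "metal": "metal_binding_site",
    "metal binding": "metal_binding_site",
    "metal binding site": "metal_binding_site",
}


def normalise(raw):
    """Lower-case the name and collapse spaces, underscores and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", raw.strip().lower())


def standardise(raw, mapping=EXAMPLE_MAPPING):
    """Return the agreed ontology term for a server-specific type name,
    or None when the name is not in the mapping and needs manual review."""
    return mapping.get(normalise(raw))
```

Returning `None` for unknown names, instead of guessing, keeps unmapped server vocabulary visible so that it can be reviewed by someone who knows the data.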

What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).
3. Standardise the vocabulary used by each partner site. This will allow us to manipulate the data in a more powerful way.
4. Provide evidence for each annotation to give an indication of how the information can be used.

Evidence codes
Each annotation must have at least one evidence code associated with it. Evidence codes can be selected from the Evidence Code Ontology (ECO). It is up to each partner to decide the evidence codes for their own annotations, as each case is very individual.
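The rule that every annotation carries at least one evidence code can be checked mechanically. The sketch below assumes annotations are dictionaries with hypothetical `id` and `evidence` keys, and it only checks that each code has the shape of an ECO identifier (`ECO:` followed by seven digits). It does not verify that the code exists in ECO, and the codes shown are placeholders.

```python
import re

# Shape of an ECO identifier; existence in the ontology is not checked here.
ECO_ID = re.compile(r"ECO:\d{7}")


def missing_evidence(annotations):
    """Return the ids of annotations that have no evidence code,
    or that carry a code not shaped like an ECO identifier."""
    flagged = []
    for ann in annotations:
        codes = ann.get("evidence", [])
        if not codes or not all(ECO_ID.fullmatch(code) for code in codes):
            flagged.append(ann["id"])
    return flagged


# Hypothetical annotations with placeholder codes.
sample = [
    {"id": "ann_1", "evidence": ["ECO:0000001"]},
    {"id": "ann_2", "evidence": []},
    {"id": "ann_3"},
    {"id": "ann_4", "evidence": ["ECO:0000001", "by eye"]},
]
flagged = missing_evidence(sample)
```

Which code is appropriate for a given annotation remains a curatorial decision for each partner; the check only enforces that one is present.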

Designing an Ontology
The provision of a controlled vocabulary which can be shared between data sources.
- Needs the approval of the community.
- The creation of terms and clustering can only be done properly by an expert in the field rather than an expert in ontologies.
- Clear goals are essential: which relationships are necessary, and what should they show?
- Increased complexity becomes laborious and time-intensive.
- Continuous evolution.
- Once agreed, the ontology will be deposited with the SO for maintenance.

Ontology Rules
Terms: computer friendly.
- Phrase spacing: terms do not include white space, e.g. binding_site.
- Case: terms are always in lowercase except where demanded by context, e.g. mRNA.
- Abbreviations: if there is a common abbreviation, it is used for the name of the term, e.g. UTR.
- Symbols: symbols and Greek letters are generally spelled out in full. Full stops, slashes, and hyphens are not allowed; underscores are used instead. Brackets (){}[] are not allowed.
Synonyms: they facilitate searching the ontology.
- Types of synonym: the long version of the words in the abbreviated phrase spelled out; different words that mean the same thing.
- Synonym rules: there is no limit on synonym number; one synonym can be used more than once; synonyms do not have to be computer friendly. They can begin with numbers and include punctuation such as hyphens.
Definitions: each term should have a definition.
- A definition must have a reference to its origin (PubMed, database, website, the person that created it).
- The format of a definition: "a bicycle: has two wheels"; "a tandem: is a bicycle with two saddles and two sets of handlebars". A tandem inherits all the features of a bicycle, so the definition of bicycle cannot state that it has a saddle and a set of handlebars.
Understanding relationships: currently there are 3 types of relationship in SO: is_a, part_of and derived_from.
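The term-naming rules above lend themselves to an automatic check. The following is a minimal sketch, not part of any OBO tooling: the function name, the messages, and the set of case exceptions (`mRNA`, `UTR`, taken from the examples above) are assumptions. It does not cover the rule about spelling out symbols and Greek letters.

```python
import re

# Words allowed to keep upper-case letters "where demanded by context".
# This set is an assumption built from the two examples given in the rules.
CASE_EXCEPTIONS = frozenset({"mRNA", "UTR"})


def check_term_name(name, case_exceptions=CASE_EXCEPTIONS):
    """Return a list of rule violations for a proposed term name (empty if none)."""
    problems = []
    if re.search(r"\s", name):
        problems.append("white space")
    if re.search(r"[./\\-]", name):
        problems.append("full stop, slash or hyphen")
    if re.search(r"[(){}\[\]]", name):
        problems.append("bracket")
    # Split on anything that is not a letter or digit to inspect each word's case.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", name) if w]
    if any(w != w.lower() and w not in case_exceptions for w in words):
        problems.append("unexpected upper case")
    return problems
```

For instance, `check_term_name("binding_site")` returns an empty list, while `check_term_name("Binding site")` reports both the white space and the capital letter. Synonyms are exempt from these rules, so they should not be passed through this check.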

The OBO Editor

The Ontology Still in draft form.

Acknowledgements
Gaby Reeves
Midori Harris (GO), Karen Eilbeck (SO)
Luisa Montecchi, Henning Hermjakob, Eugene Kulesha, Andreas Prlic
Members of UniProt (EBI and SIB): Alan Bridge, Michele Magrane, Clare O'Donovan, and Anne-Lise Veuthey
BioSapiens Workshop held in February: University of Bologna, CNIO, University of Dundee, EBI, ENZIM Hungary, Hebrew University, MPI, Sanger and UCL