Published by Aidan Thornton. Modified over 11 years ago.
1
Protein Annotation Ontology The BioSapiens Virtual Institute for Genome Annotations Janet Thornton & Gabby Reeves AFP/BioSapiens Vienna: July 07
2
Outline
– Integrating annotations: why it is so important to think about it.
– Progress made by BioSapiens towards the virtual institute for genome annotations.
– Creating the ontology: ontology rules, software (OBO).
– The Ontology: a brief outline.
3
The European Virtual Institute for Genome Annotation Funded by the European Commission
4
BioSapiens Network of Excellence 26 partners in 14 different countries The objective of the BIOSAPIENS Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.
5
BIOSAPIENS
Many tools have been developed for the annotation of proteins, and many make similar predictions. These tools come from a number of different labs in different locations.
7
BioSapiens Genome Annotation
Annotation areas: DNA Annotation, Proteome Annotation, Functional Annotation.
Topics covered:
– Gene definition / alternative splicing
– Regulators and promoters
– Expression
– Variation (haplotypes and SNPs)
– Protein families, orthologues
– Membrane proteins and ligands
– 3D protein structure
– Post-translational modification and localisation
– Sequence and structure to function
– Protein-protein complexes
– Pathways and networks
How can we provide an integrated view of this information for the biologist?
8
Functional Grouping of Annotations
69 sources from 19 partner sites, providing approximately 330 annotations. Information is provided but not functionally ordered. Without a defined ontology, accurate interpretation of these annotations is impossible. The servers providing annotations also need sensible IDs to allow adequate identification and administration.
9
Integrating Annotations
Sequencing projects, structural genomics initiatives, and ever-increasing experiment-based knowledge of biological systems.
1. Additional information needs to be added to already existing entries, e.g. the EMBL/GenBank/DDBJ Third Party Annotation pilot study:
– Entries are submitted via the website and marked as TPA entries.
– Entries are checked carefully by curators before being published.
10
Integrating Annotations
2. Manually curated databases are struggling with the influx of information.
UniProt proposals:
– The adopt-a-protein scheme: a research community in a particular area would be responsible for the update of information.
– Making use of grey matter: using the growing population of retired scientists at home, with broadband accounts and nothing to do.
– Quality and uniformity of curation is an issue: input fields can be free text or drop-down menus.
Distributed Annotation System:
– The Distributed Annotation System allows a system of decentralised annotation.
11
DAS, the distributed annotation system
What it is
– The distributed annotation system (DAS) is a specification of a client-server system for sharing various types of sequence annotations.
– An annotation is an entity which is anchored to a reference subsequence with a start and a stop position, together with some information about the type and method of annotation, and possibly some other textual information.
– Today, DAS is used for serving positional annotations on genomes and on proteins, and for serving global annotations on genes.
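The definition of an annotation given above maps naturally onto a small record type. The following is a minimal Python sketch, not the DAS XML format itself: the class name `Annotation`, its field names, and the `overlaps` helper are assumptions made for illustration, and the 1-based inclusive coordinates are likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One positional annotation, following the slide's definition.

    All field names are hypothetical; they are not taken from the DAS
    specification.
    """
    reference: str   # identifier of the reference sequence
    start: int       # start position (assumed 1-based, inclusive)
    stop: int        # stop position (assumed inclusive)
    type: str        # what is annotated, e.g. "metal_binding_site"
    method: str      # how the annotation was produced
    note: str = ""   # optional free-text information

    def __post_init__(self):
        # Reject coordinates that cannot describe a subsequence.
        if self.start < 1 or self.stop < self.start:
            raise ValueError("expected 1 <= start <= stop")

    def overlaps(self, other: "Annotation") -> bool:
        """True if both annotations share a residue on the same reference."""
        return (
            self.reference == other.reference
            and self.start <= other.stop
            and other.start <= self.stop
        )
```

A client that collects such records from several servers can then compare them position by position, which is the starting point for the integration questions raised in the following slides.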
12
Distributed Annotation System Viewer DAS Protocol
13
Dasty2 Rafael Jimenez
14
Spice Andreas Prlic
15
What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
16
Information on metal binding sites from two sources
17
What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).
18
Duplications in the data.
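One way such exact duplications could be flagged, once annotation types share a vocabulary, is to group records by position and type and report any group served by more than one source. This is a minimal sketch under assumptions: the tuple layout, the function name `find_duplicates`, and the example identifiers and coordinates are all hypothetical, not data from the BioSapiens servers.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group annotations by (reference, start, stop, type).

    `records` is an iterable of (source, reference, start, stop, type)
    tuples. Returns a dict mapping each key that appears in more than one
    source to the sorted list of those sources.
    """
    sources_by_key = defaultdict(set)
    for source, reference, start, stop, ann_type in records:
        sources_by_key[(reference, start, stop, ann_type)].add(source)
    return {
        key: sorted(sources)
        for key, sources in sources_by_key.items()
        if len(sources) > 1
    }


# Made-up example: the same domain reported by two sources.
example = [
    ("InterPro", "PROT1", 12, 140, "pfam_domain"),
    ("UniProt", "PROT1", 12, 140, "pfam_domain"),
    ("UniProt", "PROT1", 200, 210, "metal_binding_site"),
]
duplicates = find_duplicates(example)
```

The comparison only works if both sources use the same type name for the same thing, which is the case for standardising the vocabulary.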
19
What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).
3. Standardise the vocabulary used by each partner. This will allow us to manipulate the data in a more powerful way.
20
Standardisation of information provided by all DAS servers. Sometimes annotation types on some servers are exactly the same as names on other servers.
[Table columns: Server, Annotation]
21
What will the ontology do?
1. Cluster like annotations together to aid comparison between sources.
2. Facilitate the identification of exact duplications in the data (e.g. Pfam domains are provided by InterPro and UniProt).
3. Standardise the vocabulary used by each partner site. This will allow us to manipulate the data in a more powerful way.
4. Provide evidence for each annotation to give an indication of how the information can be used.
22
Evidence codes
Each annotation must have at least one evidence code associated with it. Evidence codes can be selected from the Evidence Code Ontology (ECO). It is up to each partner to decide the evidence codes for their own annotations, as each case is very individual.
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=ECO
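The requirement that every annotation carries at least one evidence code can be checked automatically. The sketch below assumes identifiers of the form `ECO:` followed by seven digits and only tests that shape; it does not look the code up in ECO. The function name `missing_evidence`, the input layout, and the identifiers in the example are placeholders chosen for illustration.

```python
import re

# Shape of an ECO identifier (assumption: "ECO:" plus seven digits).
ECO_ID = re.compile(r"ECO:\d{7}")


def missing_evidence(annotations):
    """Return labels of annotations lacking a well-formed evidence code.

    `annotations` maps an annotation label to its list of evidence codes.
    """
    return [
        label
        for label, codes in annotations.items()
        if not any(ECO_ID.fullmatch(code) for code in codes)
    ]


# Placeholder data: the IDs show the expected format only.
example = {
    "annotation_a": ["ECO:0000001"],
    "annotation_b": [],
    "annotation_c": ["experimental"],
}
problems = missing_evidence(example)
```

Which code is appropriate for a given annotation remains a decision for the partner providing it; a check like this can only confirm that one is present.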
23
Designing an Ontology
The provision of a controlled vocabulary which can be shared between data sources.
– Needs approval of the community.
– The creation of terms and clustering can only be done properly by an expert in the field rather than an expert in ontologies.
– Clear goals are essential: what relationships are necessary, and what should they show?
– Increased complexity becomes laborious and time-intensive.
– Continuous evolution.
– Once agreed, the ontology will be deposited with the SO for maintenance.
24
Ontology Rules
Terms: computer friendly.
– Phrase spacing: terms do not include white space, e.g. binding_site.
– Case: terms are always in lowercase except where demanded by context, e.g. mRNA.
– Abbreviations: if there is a common abbreviation, it is used for the name of the term, e.g. UTR.
– Symbols: symbols and Greek letters are generally spelled out in full. Full stops, slashes, and hyphens are not allowed; underscores are used instead. Brackets (){}[] are not allowed.
Synonyms: they facilitate searching the ontology.
– Types of synonym: the long version of an abbreviated phrase with its words spelled out; different words that mean the same thing.
– Synonym rules: there is no limit on the number of synonyms, and one synonym can be used more than once. Synonyms do not have to be computer friendly: they can begin with numbers and include punctuation such as hyphens.
Definitions: each term should have a definition.
– A definition must have a reference to its origin (PubMed, database, website, or the person who created it).
– The format of a definition:
  a bicycle: has two wheels.
  a tandem: is a bicycle with two saddles and two sets of handlebars.
  (A tandem inherits all the features of a bicycle; therefore the definition of bicycle cannot state a saddle and a set of handlebars.)
Understanding relationships: currently there are 3 types of relationship in SO: is_a, part_of and derived_from.
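The term-naming rules above are mechanical enough to check in code. The following is a minimal sketch, not part of OBO-Edit or any SO tooling: the function `naming_problems` and the `CASE_EXCEPTIONS` allow-list are assumptions, and the list holds only the two mixed-case examples given on the slide. Synonyms are deliberately not checked, since they do not have to be computer friendly.

```python
import re

# Terms allowed to keep upper-case letters (hypothetical allow-list,
# seeded with the slide's own examples).
CASE_EXCEPTIONS = {"mRNA", "UTR"}


def naming_problems(term):
    """Return a list of naming-rule violations for a term name."""
    problems = []
    if re.search(r"\s", term):
        problems.append("contains white space")
    if re.search(r"[./-]", term):
        problems.append("contains a full stop, slash, or hyphen")
    if re.search(r"[(){}\[\]]", term):
        problems.append("contains brackets")
    # Check case word by word so that e.g. "mRNA_region" is accepted.
    words = term.split("_")
    if any(w != w.lower() and w not in CASE_EXCEPTIONS for w in words):
        problems.append("contains unexpected upper case")
    return problems
```

For example, `naming_problems("binding_site")` returns an empty list, while `naming_problems("Binding-site")` reports both the hyphen and the unexpected capital letter.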
25
The OBO Editor
26
The Ontology Still in draft form.
27
Acknowledgements
Gaby Reeves
Midori Harris (GO), Karen Eilbeck (SO)
Luisa Montecchi, Henning Hermjakob, Eugene Kulesha, Andreas Prlic
Members of UniProt (EBI and SIB): Alan Bridge, Michele Magrane, Clare O'Donovan, and Anne-Lise Veuthey
BioSapiens Workshop held in February: University of Bologna, CNIO, University of Dundee, EBI, ENZIM Hungary, Hebrew University, MPI, Sanger and UCL