Published by Anissa Emmet. Modified over 10 years ago.
Everything must be logical.
Terms should be in the right place, with good definitions and the correct relationships.
I need very specific terms to describe my sequences. I need these terms yesterday.

DATA SHARING
Annotations come in many varieties: GenBank, SwissProt, CHADO and GFF3, to name a few.
MODs use different storage mechanisms, even if they use the same vocabulary.
MODs may use different *legitimate* ways to describe the same thing. A split-location CDS conveys the same information as an exon/intron structure with start and stop codons.
We need to share SO-compliant annotations between groups.
We need to convert SO-compliant annotations to other formats, for example to submit to GenBank.

What needs to be done
Build upon existing adapters.
Develop mapping files between SO and other formats.

CROSS PRODUCT TERMS
Cross products are terms created by joining together two (or more) concepts from different ontologies or from different aspects of the same ontology. In SO we have located_sequence_feature terms, which can be placed within coordinates on a sequence, and sequence_attribute terms, which cannot exist on their own but describe a property of a feature. An example of a cross product of two of these terms is:

    [Term]
    id: SO:0000280
    name: engineered_gene
    intersection_of: SO:0000704 ! gene
    intersection_of: has_quality SO:0000783 ! engineered

How does this help
Terms can be generated quickly. OBO-Edit can help.
The readable definitions are easy to produce.
The terms are computable; they are defined by rules.

What needs to be done
Many of the terms under sequence attributes were created before cross products became mature, and so are 'place holders' at the moment. These terms must be evaluated, the quality extracted and the cross products made. For example, mRNA_with_frameshift should not be a sequence attribute but a cross product of mRNA with the quality frameshifted.
Parsers and reasoners need to be aware of cross products if they are not already.
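To illustrate what "aware of cross products" could mean for a parser, here is a minimal Python sketch that reads the engineered_gene stanza shown above and separates the genus (the feature term) from its differentiae (relationship and quality pairs). The function name `parse_cross_product` and the returned dictionary layout are assumptions made for this illustration; they are not part of OBO-Edit or any SO tooling.

```python
STANZA = """\
[Term]
id: SO:0000280
name: engineered_gene
intersection_of: SO:0000704 ! gene
intersection_of: has_quality SO:0000783 ! engineered
"""


def parse_cross_product(stanza: str) -> dict:
    """Split an OBO [Term] stanza into its genus and differentiae (hypothetical helper)."""
    term = {"id": None, "name": None, "genus": None, "differentiae": []}
    for raw in stanza.splitlines():
        # Drop the trailing "! human readable name" comment, if any.
        line = raw.split("!", 1)[0].strip()
        if ":" not in line:
            continue  # skips the "[Term]" header and blank lines
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag in ("id", "name"):
            term[tag] = value
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                # A bare term id is the genus (here: gene).
                term["genus"] = parts[0]
            else:
                # "relationship term_id" is a differentia (here: has_quality engineered).
                term["differentiae"].append((parts[0], parts[1]))
    return term


term = parse_cross_product(STANZA)
print(term["genus"], term["differentiae"])
```

A reasoner could then treat any feature annotated as a gene that also has the quality engineered as an instance of engineered_gene, which is what makes the term "defined by rules".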
DISPLAY
Users of the ontology have many reasons to view and browse it:
To look for the correct term to annotate with.
To understand the composition of a concept.
To find the detail of the terminology.
To find the right level for their query.

Search and display functionality already exists in OBO-Edit, the Gene Ontology Consortium editor. This tool, however, provides more functionality than a casual user, or an experienced user with a quick query, is looking for. We have developed miSO as a prototype browsing tool for SO (and any other modest-sized OBO ontology). It is web based and uses JavaScript to provide both a graphical tree view and an autocompleting term finder. Each term is shown with its definition, links to cross references and its parent/child relationships.

Advantages of this approach
Easy-to-use web page
Quick browsing
Up to date with the latest release
(Also displays metadata fields)

Future work
It should display different kinds of relationships in the tree.
It should also search synonyms.
It will need to understand cross product terms and display them correctly.

VALIDATION SERVICES
If MODs are going to make and distribute SO-compliant annotations, we must provide tools and services to aid this process.
Does the GFF3 use the controlled vocabulary? (We have found some examples of MOD GFF3 that do not adhere to the SO controlled vocabulary; for example, the word 'coding' is used instead of CDS.)
Do the assertions made in the annotation match the knowledge in the ontology?
We are proposing a web-based validation service for annotation formats such as GFF3, where the developer can paste in an annotation and view the validation process to ensure that the semantic and syntactic information is correct.

The Sequence Ontology has reached a point in its development where we have an annotating community and a software community dispersed within the model organism groups, and prospective users are now also appearing. This has several implications, and SO is being pulled in different directions. The annotation community demands that new and more specific terms be generated quickly, the software community needs to make use of the knowledge contained in the ontology, and the prospective users want to be able to look and learn. Here we outline approaches to meet these needs.

I am a naïve user
I want to learn more about SO
I want to look at SO
I want to query annotations using SO

I am a skilled user
I want to make my own SO-compliant annotations
I want to query the annotations using SO

Software must understand the meaning of relationships
Software must be able to traverse sequence data labeled with ontology terms

DEFINITION OVERHAUL
The definition should reflect what the sequence is, not what the corresponding molecule is.
There are several problems with some of the textual definitions in SO, especially where we have used an existing definition from another source. For example, definitions should not use terms like RNA or DNA, because a sequence can undergo transformations. We can do things like locate the catalytic sequence of a protein back to its nucleotide sequence, so pinning a region down to one kind of molecule does not make much sense.
We also need to make sure that our definitions do not contradict the relationships in the ontology. Currently the definition of exon restricts it to being a part of mRNA only, but in the ontology it is part of a transcript, so all kinds of transcript may have exons. (Non-coding sequence may also have exons.)
The definition overhaul will involve many of the terms we have been taking for granted, such as exon and transcript.

UNDERSTANDING RELATIONSHIPS
Many of the relationships in SO are transitive. We can follow them and make inferences about things labeled with the terms. But what do the relationships actually mean with regard to the sequence?
We have started to subtype the part_of relationship into two of the six subtypes classified in Winston et al. (1987). In a member_of relationship, the part is not necessarily spatially connected to a cohesive whole. For example, regulatory regions may be found many kilobases away from the gene, or even on another chromosome. A composite_part_of relationship, however, denotes that the part must be enclosed within the whole. This can be used computationally to validate the annotation.
We are also using some spatial relationships (see Egenhofer) to describe the topology between features. Adjacent_to (meets) means that the two features share a boundary (junction).

What needs to be done
The relationships used in SO must be fully documented, and the ontology updated to reflect exactly what we mean by our spatial and containment relationships, to allow reasoning to be more effective.

Winston M, Chaffin R, Herrmann D: A taxonomy of part-whole relations. Cog Sci 1987, 11:417-444.
Egenhofer MJ: A formal definition of binary topological relationships. Lecture Notes Comp Sci 1989, 367:457-472.
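The statement under UNDERSTANDING RELATIONSHIPS that composite_part_of "can be used computationally to validate the annotation" can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the `Feature` class, the function names, the example coordinates, and the choice of 1-based inclusive coordinates (as used in GFF3), under which two features "meet" when one starts at the base immediately after the other ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A located feature; coordinates are assumed 1-based and inclusive."""
    seqid: str
    start: int
    end: int


def is_enclosed(part: Feature, whole: Feature) -> bool:
    """composite_part_of check: the part must lie entirely within the whole."""
    return (
        part.seqid == whole.seqid
        and whole.start <= part.start
        and part.end <= whole.end
    )


def is_adjacent(a: Feature, b: Feature) -> bool:
    """adjacent_to (meets) check: the features share a junction and do not overlap."""
    return a.seqid == b.seqid and (a.end + 1 == b.start or b.end + 1 == a.start)


# Illustrative coordinates only, not taken from any real annotation.
gene = Feature("chr1", 100, 900)
exon = Feature("chr1", 100, 300)
intron = Feature("chr1", 301, 500)
enhancer = Feature("chr2", 50, 80)

print(is_enclosed(exon, gene))      # exon lies inside the gene
print(is_adjacent(exon, intron))    # exon and intron share a junction
print(is_enclosed(enhancer, gene))  # fails containment; acceptable for member_of
```

A validator built this way would flag a failed `is_enclosed` test only for composite_part_of relationships; a regulatory region related to its gene by member_of is allowed to lie elsewhere, even on another chromosome.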