Published by Thomas Lyons. Modified over 9 years ago.
2
GO terms implicitly refer to other terms
–cysteine biosynthesis
–myoblast fusion
–hydrogen ion transporter activity
–snoRNA catabolism
–wing disc pattern formation
–epidermal cell differentiation
–regulation of flower development
–interleukin-18 receptor complex
–B-cell differentiation
–dorsal ectoderm
4
biosynthesis is_a metabolism
5
cysteine is_a serine family amino acid is_a amino acid is_a amine
6
cysteine is_a serine family amino acid is_a amino acid is_a serine
7
Composed terms currently cause problems
–No link to external ontology term
–Redundancy
–Inconsistency
–Extra work
–Annotation bottleneck
–Tangled DAGs and confusing displays that we have no way to disentangle
Solution so far:
–fix errors based on results of term name parsing (Obol)
  reactive, not proactive
8
Solution: actively manage composed terms
Explicit pre-coordination
–Composed terms should now/soon be coordinated using the oboedit plugin
  building block terms are recorded in the ontology along with the composite term
Benefits:
–Correct DAG structure can be inferred from external ontologies
  e.g. make sure GO + CHEBI “align”
–placement & consistency checking automated
–additional work can be automated
  synonyms, text definitions
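The claim that the DAG structure can be inferred from an external ontology can be illustrated with a toy classifier. This is a minimal sketch, not the oboedit plugin or Obol: the dictionaries, the function names `subsumed_by` and `inferred_ancestors`, and the simplified term labels are all assumptions made for illustration. A term is placed under another term when its genus is subsumed by the other's genus and each of the other's differentiae is matched by a more specific filler.

```python
# Toy asserted is_a links (child -> parents). Labels stand in for real
# GO / CHEBI identifiers and are illustrative only.
IS_A = {
    "cysteine": {"serine family amino acid"},
    "serine family amino acid": {"amino acid"},
    "amino acid": {"amine"},
    "biosynthesis": {"metabolism"},
}

# Logical definitions: term -> (genus, {relation: filler}).
LOGICAL_DEFS = {
    "cysteine biosynthesis": ("biosynthesis", {"outputs": "cysteine"}),
    "serine family amino acid biosynthesis": (
        "biosynthesis",
        {"outputs": "serine family amino acid"},
    ),
    "amino acid biosynthesis": ("biosynthesis", {"outputs": "amino acid"}),
}


def subsumed_by(child, parent):
    """True if child equals parent or reaches it through asserted is_a links."""
    if child == parent:
        return True
    return any(subsumed_by(p, parent) for p in IS_A.get(child, ()))


def inferred_ancestors(term):
    """Return every defined term whose logical definition subsumes `term`."""
    genus, diffs = LOGICAL_DEFS[term]
    ancestors = set()
    for other, (o_genus, o_diffs) in LOGICAL_DEFS.items():
        if other == term or not subsumed_by(genus, o_genus):
            continue
        # Every differentia of the candidate ancestor must be matched by a
        # filler that is the same or more specific.
        if all(
            rel in diffs and subsumed_by(diffs[rel], filler)
            for rel, filler in o_diffs.items()
        ):
            ancestors.add(other)
    return ancestors


print(sorted(inferred_ancestors("cysteine biosynthesis")))
```

In this toy data the placement of cysteine biosynthesis follows entirely from the chemical hierarchy, which is the sense in which GO and CHEBI are kept aligned.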
9
How will terms be pre-coordinated by oboedit?
How do we record a definition for a composite term?
–using a logical definition (computational essence)
A logical definition consists of:
–a generic term (aka genus)
–relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
Can be written in natural language as:
–A <generic term> which <discriminating characteristics>
10
Example of pre-coordination
cysteine biosynthesis
generic term:
–biosynthesis
discriminating characteristics:
–outputs cysteine
natural language (Aristotelian style):
–a biosynthesis process which outputs cysteine
11
Example in Obo format
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
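To make the genus/differentia reading of the stanza concrete, here is a minimal sketch that pulls the logical definition out of the `intersection_of` lines. The function name `parse_logical_definition` is hypothetical, and the parser handles only the tag shapes shown on this slide; it is not a full Obo parser.

```python
def parse_logical_definition(stanza):
    """Split an Obo [Term] stanza into (genus, differentiae).

    An intersection_of line with one value is read as the genus; a line
    with two values is read as a (relation, filler) differentia.
    """
    genus = None
    differentiae = []
    for raw in stanza.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing "! comment"
        if not line.startswith("intersection_of:"):
            continue
        parts = line[len("intersection_of:"):].split()
        if len(parts) == 1:
            genus = parts[0]
        elif len(parts) == 2:
            differentiae.append((parts[0], parts[1]))
    return genus, differentiae


STANZA = """\
[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:0009070 ! serine family amino acid biosynthesis
is_a: GO:0006534 ! cysteine metabolism
"""

genus, differentiae = parse_logical_definition(STANZA)
print(genus, differentiae)
```

The plain `is_a` lines are ignored on purpose: they state necessary conditions, while the `intersection_of` lines together form the definition.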
12
Alternate syntax
–used in pheno-syntax
–more compact
–similar to OWL abstract syntax
I use Obo1.2 format or natural language in the rest of this presentation
GO:cysteine_biosynthesis == GO:biosynthesis ∏ outputs(CHEBI:cysteine)
13
This allows us to dynamically untangle
Process axis view (primary is_as, via generic term):
–biological_process
  metabolism
    –biosynthesis
      »cysteine biosynthesis
Process participant axis view:
–amine
  amino acid
    –serine family amino acid
      »cysteine
Combined view
–(same as current tangled diamond lattice)
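The two axis views can be derived mechanically once the genus and the differentia filler are recorded separately. The sketch below assumes, for simplicity, that each term has a single is_a parent on each axis; the dictionaries and the helper names `lineage` and `axis_views` are illustrative and not part of any GO tool.

```python
# Single-parent is_a chains, one per axis (a simplifying assumption).
PROCESS_IS_A = {
    "biosynthesis": "metabolism",
    "metabolism": "biological_process",
}
CHEMICAL_IS_A = {
    "cysteine": "serine family amino acid",
    "serine family amino acid": "amino acid",
    "amino acid": "amine",
}

TERM = {"name": "cysteine biosynthesis", "genus": "biosynthesis", "outputs": "cysteine"}


def lineage(start, is_a):
    """Walk is_a links upward from `start` and return the path root-first."""
    path = [start]
    while path[-1] in is_a:
        path.append(is_a[path[-1]])
    return list(reversed(path))


def axis_views(term):
    """Build the process-axis and participant-axis views for one term."""
    return {
        "process": lineage(term["genus"], PROCESS_IS_A) + [term["name"]],
        "participant": lineage(term["outputs"], CHEMICAL_IS_A),
    }


views = axis_views(TERM)
for axis, path in views.items():
    print(axis, " > ".join(path))
```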
14
Obol demo http://yuri.lbl.gov/amigo/obol
15
Recording the relationship is important
Why not just a simple cross-product?
–e.g. biosynthesis x cysteine
Relationships are important for reasoning and querying
–Consider:
  cysteine biosynthesis from serine
  mRNA export from nucleus during heat stress
Without the relations, the logical definition is not specific enough
–the essence is not captured
Relations should come from RO
–more required
16
Multiple discriminating characteristics are allowed
Cysteine biosynthesis from serine
–Generic term: biosynthesis
–Discriminating characteristics:
  output cysteine
  input serine
[Term]
name: cysteine biosynthesis from serine
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine
17
Composite terms can be nested
[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO:0050789 ! regulation of biological process
intersection_of: regulates GO:0019344 ! cysteine biosynthesis

[Term]
id: GO:0019344
name: cysteine biosynthesis
intersection_of: GO:0009058 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine

YES: regulation^regulates(biosynthesis^outputs(cysteine))
NO: regulation^regulates(biosynthesis)^outputs(cysteine)
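Nesting means the filler of a differentia may itself be a composite term, so the compact expression is obtained by expanding definitions recursively. The following is a minimal sketch under that reading; the `DEFS` table and the `expand` function are hypothetical, and term names are used in place of IDs for readability.

```python
# term -> (genus, [(relation, filler), ...]); names stand in for IDs.
DEFS = {
    "regulation of cysteine biosynthesis": (
        "regulation of biological process",
        [("regulates", "cysteine biosynthesis")],
    ),
    "cysteine biosynthesis": ("biosynthesis", [("outputs", "cysteine")]),
}


def expand(term):
    """Render a term as genus^relation(filler), expanding nested fillers."""
    if term not in DEFS:
        return term  # a building-block term with no logical definition
    genus, differentiae = DEFS[term]
    text = expand(genus)
    for relation, filler in differentiae:
        # The filler is expanded inside the parentheses, so 'outputs'
        # attaches to biosynthesis rather than to the regulation process.
        text += f"^{relation}({expand(filler)})"
    return text


print(expand("regulation of cysteine biosynthesis"))
```

Because the inner definition is expanded inside the parentheses, the result has the nested shape marked YES on the slide, never the flat shape marked NO.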
18
Composite terms can optionally be manufactured in bulk
Generic term: {metabolism, biosynthesis}
Differentia: has_output {serine, cysteine, …}
With caution…
–Sparse vs dense matrices
–not all combinations are types
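Bulk manufacture amounts to taking the product of the generic terms and the fillers, then discarding combinations that are not real types. This is a sketch only: the `manufacture` function, the `is_valid` hook, and the generated naming pattern are assumptions, and the pair rejected in the usage example is arbitrary, chosen purely to show the filter.

```python
from itertools import product

GENERIC_TERMS = ["metabolism", "biosynthesis"]
OUTPUTS = ["serine", "cysteine"]


def manufacture(generics, fillers, is_valid=lambda genus, filler: True):
    """Generate candidate composite terms from a genus x filler matrix.

    `is_valid` lets a curator veto combinations that are not real types,
    which keeps a sparse matrix from being filled in blindly.
    """
    terms = []
    for genus, filler in product(generics, fillers):
        if not is_valid(genus, filler):
            continue
        terms.append(
            {
                "name": f"{filler} {genus}",
                "genus": genus,
                "differentia": ("has_output", filler),
            }
        )
    return terms


everything = manufacture(GENERIC_TERMS, OUTPUTS)
# Arbitrary veto, for illustration only.
curated = manufacture(
    GENERIC_TERMS,
    OUTPUTS,
    is_valid=lambda genus, filler: (genus, filler) != ("metabolism", "serine"),
)
print([t["name"] for t in everything])
print([t["name"] for t in curated])
```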
19
On the importance of necessary and sufficient conditions
Why intersection_of? Why not just make normal links in the GO DAG?
–normal relationships are for necessary conditions only
–we want both necessary and sufficient conditions
  this captures the essence of the term
20
Normal DAG links only capture necessary conditions, not essence
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
–text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor
21
Indistinguishable by DAG
monocyte activation
–is_a immune cell activation
–part_of inflammatory response
–text def: A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor
22
essence captured by genus-differentia
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
23
essence captured by genus-differentia
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
–genus (is_a): cell activation
–differentia: activates CL:macrophage
24
Current status of pre-coordinated terms
SO already contains composite terms
–46 pre-coordinated terms
–A silenced gene is a gene which has the quality of being silenced
GO-BP/CL integration underway
–retrospectively pre-coordinated terms
Obol page has pre-coordinated terms from automatic parsing
–http://www.fruitfly.org/~cjm/obol
25
Pre- vs post-coordinated
Pre-coordination
–terms are in ontology with IDs and computable definitions
–increases complexity of ontology
–complexity can be managed by tools
  e.g. new oboedit features
Post-coordination
–terms are combined in the database
–forces more complexity in database schema and database applications
26
Pre-coordination is useful in moderation
Commonly used terms should be pre-coordinated
–e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
Avoid taking to extremes
–cf. ICD-9
Where do we draw the line?
–ontologies should be built around one or a few axes of classification
  term ‘explosion’ typically gets large when multiple axes are combined
–we can change our minds later
  pre- and post-coordination are commensurable
27
Commensurability
Annotator annotates to
–nucleus^part_of(astrocyte)
Anatomy editor creates new term
–uses oboedit cross-product plugin
–astrocyte_nucleus = nucleus^part_of(astrocyte)
Annotation can be dynamically ‘promoted’ to new term in answer to queries
–various software techniques for achieving this
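One of the simplest techniques for promotion is an index from logical definitions to term IDs: a post-coordinated annotation is looked up by its genus and differentiae, and if a pre-coordinated term with the same definition exists, that term is returned. The sketch below assumes exact matching only (no reasoning over is_a); `key_for`, `register`, and `promote` are hypothetical names.

```python
def key_for(genus, differentiae):
    """Order-independent lookup key for a genus plus its differentiae."""
    return (genus, frozenset(differentiae))


# Index of pre-coordinated terms, keyed by their logical definition.
PRECOORDINATED = {}


def register(term_id, genus, differentiae):
    """Record a newly created pre-coordinated term in the index."""
    PRECOORDINATED[key_for(genus, differentiae)] = term_id


def promote(genus, differentiae):
    """Return the matching pre-coordinated term ID, or None if there is none."""
    return PRECOORDINATED.get(key_for(genus, differentiae))


# The annotator's post-coordinated expression: nucleus^part_of(astrocyte)
annotation = ("nucleus", [("part_of", "astrocyte")])

before = promote(*annotation)  # no such term exists yet
register("astrocyte_nucleus", "nucleus", [("part_of", "astrocyte")])
after = promote(*annotation)   # the same annotation now resolves to the term

print(before, after)
```

The stored annotation never changes; only the answer to the query does, which is why the editor's decision to pre-coordinate can be made later.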
28
Post-coordination in GO annotations
Pre- and post-coordination are compatible and commensurable
We should extend the annotation format to allow denoting more specific classes
–e.g. cholesterol transport in liver
–advanced applications can query this
–standard applications suffer no loss
–extended annotations can be used to help seed new terms in the ontology
This is already being done (MGI, Dicty)
–we just want to capture this in an interoperable way
29
Post-composition in gene association files
New column in GA file format
Gene Product | Term ID | … | Properties
AABC1 | GO:0030301 (cholesterol transport) | … | located_in(MA:liver)
AABC2 | GO:0048663 (neuron fate development) | … | has_participant(FBbt:Y_neuron)
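A consumer of the extended file would need to split the Properties column into relation and filler pairs. The slide shows only single `relation(ID)` values, so the sketch below makes assumptions beyond it: the function name `parse_properties` is hypothetical, and accepting several properties in one cell, with any separator between them, is a guess rather than part of the proposed format.

```python
import re

# relation(filler), e.g. located_in(MA:liver)
PROPERTY_RE = re.compile(r"(\w+)\(([^()]+)\)")


def parse_properties(column):
    """Return the (relation, filler) pairs found in a Properties cell."""
    return [(rel, filler.strip()) for rel, filler in PROPERTY_RE.findall(column)]


rows = [
    ("AABC1", "GO:0030301", "located_in(MA:liver)"),
    ("AABC2", "GO:0048663", "has_participant(FBbt:Y_neuron)"),
]

for product_id, term_id, properties in rows:
    print(product_id, term_id, parse_properties(properties))
```

An application that does not understand the new column can ignore it and still read the Term ID, which is the sense in which standard applications suffer no loss.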
30
Database issues
Chado and GO DB can handle pre- and post-coordination
–in theory anyway
  not yet fully tested
How does it work?
–‘anonymous term’ created for coordinated term
–documentation in chado cvs
  chado/modules/cv/doc/