1
Real-life ontology development: lessons from the Gene Ontology
2
What is GO?
Evolution of GO
Mechanisms of updating GO
Tools for ontology development
Lessons learned
3
Gene Ontology
Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
Applicable to all species
4
Gene Ontology - scope
Three disjoint axes:
– molecular function: molecular role, e.g. catalytic activity, binding
– biological process: broad biological phenomena, e.g. mitosis, growth, digestion
– cellular component: sub-cellular location, e.g. nucleus, ribosome, origin recognition complex
5
Gene Ontology
Directed acyclic graph (DAG)
Terms connected by two transitive relations (edges):
– is_a
– part_of
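A minimal sketch of how such a graph can be held in memory. The edges below are a small hand-picked fragment chosen for illustration, not an extract of the real ontology file, and `EDGES` and `ancestors` are hypothetical names.

```python
from collections import defaultdict

# Illustrative edges only: (child, relation, parent).
# A hand-picked fragment, not taken from the actual GO file.
EDGES = [
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "part_of", "cell"),
    ("cell", "is_a", "cellular component"),
]

# Each term may have several parents, which is what makes this a DAG
# rather than a tree.
parents = defaultdict(list)
for child, relation, parent in EDGES:
    parents[child].append((relation, parent))


def ancestors(term):
    """Return every term reachable by following is_a/part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Because both relations are transitive, something annotated to "nucleolus" in this toy graph is implicitly associated with every term that `ancestors("nucleolus")` returns.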
6
Gene Ontology
Developed by an international consortium
– about 50 members
Editorial office, roughly 4 full-time editors
Many other part-time editors at databases
Multiple changes made each day
– made live immediately
7
Gene Ontology
Main ontology format: OBO flat file
Changes are live immediately
– no releases
Propagated to GO database
– monthly snapshots archived
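For readers who have not seen the flat file, here is a rough sketch of what term stanzas look like and how they might be read. The IDs use a placeholder `EX:` prefix rather than real GO identifiers, and `parse_terms` is a hypothetical helper that handles only the simple tag-value lines shown, not the full format.

```python
# Placeholder stanzas in the general shape of an OBO flat file.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: cellular component

[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001 ! cellular component

[Term]
id: EX:0000003
name: nucleolus
relationship: part_of EX:0000002 ! nucleus
"""


def parse_terms(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms = []
    current = None
    for raw_line in text.splitlines():
        # Drop the trailing "! human-readable name" comment, if any.
        line = raw_line.split(" ! ")[0].strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # some other stanza type; ignored in this sketch
        elif line and current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


terms = parse_terms(SAMPLE_OBO)
```

Each tag maps to a list because tags such as `is_a` can appear more than once in a stanza.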
8
Evolution of GO
Original GO created in 2000
Three databases involved:
– FlyBase (Drosophila)
– MGI (Mouse)
– SGD (S. cerevisiae)
Used immediately
9
Evolution of GO
Later databases:
– TAIR (Arabidopsis)
– TIGR (microbes including prokaryotes)
– SWISS-PROT (several thousand species inc. human)
– PSU (P. falciparum)
Recent additions:
– ZFIN (zebrafish)
– PAMGO (plant pathogens)
10
Evolution of GO
GO development traditionally annotation-driven
– development directed by use
Terms added as new species annotated
Terms added on an as-needed basis
11
Evolution of GO
Resulted in ‘organic’ structure, little formality
Ontological formality added subsequently
– philosophical and logical
12
Growth of GO
13
Modifying the graph: Before:
14
Modifying the graph:
But then I need to annotate VW Beetles, pre-1980
The graph no longer works, because the engine is in the boot
15
Modifying the graph: After:
16
Mechanisms for ontology change
Small incremental changes
Initially all changes to the ontologies made this way
17
Mechanisms for ontology change
Suggested changes initially submitted by email
Moved to an online tracking system when this became unmanageable
18
Requesting changes to GO - curator requests tracker
Web-based tracking system hosted at SourceForge.net
Public
Tracker item for each new request or question
19
Curator requests tracker
20
Mechanisms for ontology change
Problems:
– Larger questions about the higher ontology structure remain unresolved
– Makes some items impossible to close
– No sense of the ‘big picture’
– Large areas of the ontologies missing or incomplete because no annotations
– Massive volume of requests: needed to increase the number of editors
21
Mechanisms for ontology change
Larger-scale changes:
– content meetings
– interest groups
22
Content meetings
Short meetings aimed at developing specific areas of GO ontology content
– proposals refined and discussed before meeting
– small number of people (10-15)
– invited experts
– specific topics
23
Content meetings
Further refinements made by email following the meeting
Changes are made once consensus is reached
Large number of terms typically added (500+)
24
Content meetings
Recent meetings:
– immunology
– interactions between organisms
– CNS development
25
Content meetings
Advantages:
– Allows a lot of detailed work to be done on a very specific area
– Involves external expertise
26
Content meetings
Problems:
– Expensive - everyone has to be in the same location
– Only works for very specific topics
– Long lag time getting terms into ontologies
27
Interest groups
Groups of experts for a specific topic
– e.g. development, cell cycle, plants
Includes GO curators/annotators and external experts
Don’t typically meet face to face
28
Interest groups
Communicate via email, desktop sharing etc.
Transporters area of the ontology recently revised this way
29
Interest groups
Advantages:
– Cheap, no travel required
– Allows a lot of detailed work to be done on a very specific area
– Involves external expertise
30
Interest groups
Disadvantages:
– Harder to reach consensus when not face to face
– Projects tend to drag on
31
Mechanisms for ontology change Systematic changes via small working groups
32
Systematic changes
Projects not directly related to biological content
Systematic changes throughout ontology
Small group of GO consortium members
– meets regularly by desktop sharing, voice over IP
Experts recruited to meetings as needed
33
Systematic changes
Changes either:
– made on a branch of the ontology and merged in later: always have big problems merging the branched file into the main file
– merged directly into the live ontology after the session: fast, but people get angry
34
is_a complete
GO contains both is_a and part_of relations
Typically, graphs were a mixture of incomplete is_a and part_of hierarchies
A result of ‘organic’ evolution of GO
All graphs now have complete is_a paths to root
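The "complete is_a path to root" property can be checked mechanically. The following is a minimal sketch over a hypothetical four-term fragment; `PARENTS`, the term names, and the single `"root"` node are assumptions for illustration, and this is not the check the GO editors actually ran.

```python
# Hypothetical fragment: term -> list of (relation, parent).
PARENTS = {
    "root": [],
    "organelle": [("is_a", "root")],
    "nucleus": [("is_a", "organelle")],
    "nucleolus": [("part_of", "nucleus")],  # no is_a parent: an is_a orphan
}


def has_is_a_path_to_root(term, root="root"):
    """True if the root is reachable from term using is_a edges only."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        # part_of edges are deliberately ignored here.
        stack.extend(p for rel, p in PARENTS.get(current, []) if rel == "is_a")
    return False


# Terms that would have to gain an is_a parent to make the graph is_a complete.
orphans = sorted(t for t in PARENTS if not has_is_a_path_to_root(t))
```

In this toy graph only "nucleolus" is reported, because its sole parent link is part_of.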
35
partial disjointness
Biological process terms organised by granularity:
– cellular process
– multicellular organism process
– multi-organism process
To avoid massive increase in number of paths to root, these terms are disjoint
– no is_a children in common
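A sketch of how the "no is_a children in common" rule could be tested, here applied to all is_a descendants rather than direct children only. The child terms and their placement in `IS_A` are illustrative assumptions, and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical is_a edges: child -> set of is_a parents.
IS_A = {
    "cell division": {"cellular process"},
    "cytokinesis": {"cell division"},
    "digestion": {"multicellular organism process"},
    "symbiosis": {"multi-organism process"},
}

DISJOINT = [
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
]


def is_a_descendants(term):
    """All terms that have an is_a path up to the given term."""
    found = set()
    frontier = {term}
    while frontier:
        children = {c for c, ps in IS_A.items() if ps & frontier}
        frontier = children - found
        found |= frontier
    return found


def disjointness_violations():
    """Return (term_a, term_b, shared descendants) for each overlapping pair."""
    violations = []
    for a, b in combinations(DISJOINT, 2):
        shared = is_a_descendants(a) & is_a_descendants(b)
        if shared:
            violations.append((a, b, shared))
    return violations
```

With the edges above the list of violations is empty; giving any term a second is_a parent under a different granularity branch would make it appear.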
36
sensu
sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings
e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)
37
sensu
Current project to remove the sensu term strings
Replace with strings that represent the true differentiae, e.g.
– cell wall (sensu Bacteria) -> peptidoglycan-based cell wall
– cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall
38
Systematic changes to GO
Advantages:
– Fast
– Efficient
– Small number of people required
39
Systematic changes to GO
Disadvantages:
– Difficult to obtain wider consensus
– Changes sometimes have to be undone
40
Useful tools for ontology development
WebEx
– desktop sharing, can control each other's desktops
wiki
– mainly internal
Skype
– free international calls!
conference calls
– not free
41
Tracking changes to GO
General tracking
– files stored in cvs, all differences trackable (in theory)
– far from ideal: a frequent discussion is whether we should track history or date-stamp terms
42
Tracking changes to GO
Obsolete terms
– formerly stored within the ontology
– in OBO format, made a special kind of deprecated term (tag is_obsolete)
– ‘replaced_by’ and ‘consider’ tags soon to be created to point to live terms
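One way a downstream tool might use these tags, sketched under assumptions: the slide only says the two tags are planned, so the reading here (a single `replaced_by` target is a safe automatic substitute, `consider` targets need human review) is an assumption, as are the placeholder `EX:` IDs and the `resolve` helper.

```python
# Hypothetical term records keyed by placeholder ID.
TERMS = {
    "EX:0000010": {
        "name": "old term",
        "is_obsolete": True,
        "replaced_by": ["EX:0000011"],
    },
    "EX:0000011": {"name": "new term"},
    "EX:0000012": {
        "name": "ambiguous old term",
        "is_obsolete": True,
        "consider": ["EX:0000011", "EX:0000013"],
    },
    "EX:0000013": {"name": "another live term"},
}


def resolve(term_id):
    """Return (live_id, suggestions) for a term ID.

    live_id is the term itself if it is live, or its single replaced_by
    target if it is obsolete. Otherwise live_id is None and any 'consider'
    targets are returned as suggestions for manual review.
    """
    term = TERMS[term_id]
    if not term.get("is_obsolete"):
        return term_id, []
    replacements = term.get("replaced_by", [])
    if len(replacements) == 1:
        return replacements[0], []
    return None, term.get("consider", [])
```

Keeping obsolete terms in the file with pointers like these means old annotations can still be looked up and migrated instead of silently breaking.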
43
Tracking changes to GO
Crediting experts
– traditionally no mechanism for doing this
– now creating abstracts for content meetings and adding a tag to the term
– as yet no mechanism for crediting individuals
44
Useful tools for ontology development
OBO-Edit
– ontology editor originally developed for GO
– can be used for any OBO format ontology
– developed by group of users
45
Useful tools for ontology development
Reasoner integrated into OBO-Edit
– based on OBOL
– detects missing links, redundant links
– soon: misplaced terms, automatic term creation
Validation system
– typographical errors, is_a orphans, duplicate synonyms etc.
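To make "redundant links" concrete: a direct link is redundant when the same parent is already implied through another direct parent. The sketch below illustrates that idea on a hypothetical three-term graph; it is not the OBO-Edit reasoner, and `LINKS`, `reachable` and `redundant_links` are made-up names.

```python
# Hypothetical is_a links: child -> set of direct parents.
LINKS = {
    "nucleolus": {"organelle", "cellular component"},  # second link is implied
    "organelle": {"cellular component"},
    "cellular component": set(),
}


def reachable(start):
    """All ancestors reachable from start via one or more links."""
    seen, stack = set(), list(LINKS.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(LINKS.get(node, ()))
    return seen


def redundant_links():
    """Direct (child, parent) links already implied via another direct parent."""
    result = []
    for child, direct in LINKS.items():
        for parent in direct:
            others = direct - {parent}
            if any(parent in reachable(other) for other in others):
                result.append((child, parent))
    return sorted(result)
```

Here the direct link from "nucleolus" to "cellular component" is flagged, since it already follows from the path through "organelle".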
47
Lessons learned
An ontology doesn’t have to be perfect or complete to be used
For domain ontologies, external experts should be involved
Communication is critical
You will never please everyone