Real-life ontology development: lessons from the Gene Ontology.


1 Real-life ontology development: lessons from the Gene Ontology

2
What is GO?
Evolution of GO
Mechanisms of updating GO
Tools for ontology development
Lessons learned

3 Gene Ontology
Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
Applicable to all species

4 Gene Ontology - scope
Three disjoint axes:
–molecular function: molecular role, e.g. catalytic activity, binding
–biological process: broad biological phenomena, e.g. mitosis, growth, digestion
–cellular component: sub-cellular location, e.g. nucleus, ribosome, origin recognition complex

5 Gene Ontology
Directed acyclic graph (DAG)
Terms connected by two transitive relations (edges):
–is_a
–part_of
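The graph structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the GO project: the edge list is a hypothetical toy example (the term names and links are chosen for illustration only), and `build_parents` and `ancestors` are names invented here. Because both relations are treated as transitive, every term reachable by walking edges upward counts as an ancestor.

```python
from collections import defaultdict

# Hypothetical toy edges as (child, relation, parent); the links are
# illustrative and are not copied from the GO files.
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "part_of", "cell"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular component"),
    ("cell", "is_a", "cellular component"),
]


def build_parents(edges):
    """Index the edge list as child -> [(relation, parent), ...]."""
    parents = defaultdict(list)
    for child, relation, parent in edges:
        parents[child].append((relation, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("nucleolus", build_parents(EDGES))))
```

In a DAG a term may have several parents, which is why the walk records visited terms rather than assuming a single path to the root.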

6 Gene Ontology
Developed by an international consortium
–about 50 members
Editorial office, 4 full-time editors (ish)
Many other part-time editors at databases
Multiple changes made each day
–made live immediately

7 Gene Ontology
Main ontology format: OBO flat file
Changes are live immediately
–no releases
Propagated to GO database
–monthly snapshots archived
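To give a feel for the OBO flat file, the sketch below parses a small hand-written sample. It assumes only the common stanza layout (a `[Term]` header followed by `tag: value` lines, with `!` introducing a trailing comment) and handles just four tags; the `EX:` identifiers are placeholders invented for this example, and `parse_obo` is a hypothetical helper, not part of any GO tool.

```python
# Hand-written sample in OBO flat-file style; the EX: identifiers are
# placeholders, not real GO accessions.
SAMPLE = """\
[Term]
id: EX:0000001
name: cellular component

[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001 ! cellular component

[Term]
id: EX:0000003
name: nucleolus
is_a: EX:0000001 ! cellular component
relationship: part_of EX:0000002 ! nucleus
"""


def parse_obo(text):
    """Parse [Term] stanzas into {id: {"name", "is_a", "part_of"}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop the trailing "! comment"
        if not line:
            continue
        if line.startswith("["):
            # Only [Term] stanzas are kept; other stanza types are skipped.
            current = {"is_a": [], "part_of": []} if line == "[Term]" else None
            continue
        if current is None:
            continue  # header lines or a stanza type we ignore
        tag, _, value = line.partition(": ")
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            relation, _, target = value.partition(" ")
            if relation == "part_of":
                current["part_of"].append(target)
    return terms


if __name__ == "__main__":
    for term_id, term in parse_obo(SAMPLE).items():
        print(term_id, term)
```

A line-oriented format like this is easy to diff, which matters when changes go live immediately and are tracked in version control.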

8 Evolution of GO
Original GO created in 2000
Three databases involved:
–FlyBase (Drosophila)
–MGI (Mouse)
–SGD (S. cerevisiae)
Used immediately

9 Evolution of GO
Later databases:
–TAIR (Arabidopsis)
–TIGR (microbes including prokaryotes)
–SWISS-PROT (several thousand species inc. human)
–PSU (P. falciparum)
Recent additions:
–ZFIN (zebrafish)
–PAMGO (plant pathogens)

10 Evolution of GO
GO development traditionally annotation-driven
–development directed by use
Terms added as new species annotated
Terms added on an as-needed basis

11 Evolution of GO
Resulted in ‘organic’ structure, little formality
Ontological formality added subsequently
–philosophical and logical

12 Growth of GO

13 Modifying the graph: Before:

14 Modifying the graph:
But then I need to annotate VW Beetles, pre-1980
The graph no longer works, because the engine is in the boot

15 Modifying the graph: After:

16 Mechanisms for ontology change
Small incremental changes
Initially all changes to the ontologies made this way

17 Mechanisms for ontology change
Suggested changes initially submitted by email
Moved to an online tracking system when this became unmanageable

18 Requesting changes to GO - curator requests tracker
Web-based tracking system hosted at SourceForge.net
Public
Tracker item for each new request or question

19 Curator requests tracker

20 Mechanisms for ontology change
Problems:
–Larger questions about the higher ontology structure remain unresolved
–Makes some items impossible to close
–No sense of the ‘big picture’
–Large areas of the ontologies missing or incomplete because no annotations
–Massive volume of requests made it necessary to increase the number of editors

21 Mechanisms for ontology change
Larger-scale changes:
–content meetings
–interest groups

22 Content meetings
Short meetings aimed at developing specific areas of GO ontology content
–proposals refined and discussed before meeting
–small number of people (10-15)
–invited experts
–specific topics

23 Content meetings
Further refinements made by email following the meeting
Changes are made once consensus reached
Large number of terms typically added (500+)

24 Content meetings
Recent meetings:
–immunology
–interactions between organisms
–CNS development

25 Content meetings
Advantages
–Allows a lot of detailed work to be done on a very specific area
–Involves external expertise

26 Content meetings
Problems:
–Expensive - everyone has to be in the same location
–Only works for very specific topics
–Long lag time getting terms into ontologies

27 Interest groups
Groups of experts for a specific topic
–e.g. development, cell cycle, plants
Includes GO curators/annotators and external experts
Don’t typically meet face to face

28 Interest groups
Communicate via email, desktop sharing etc.
Transporters area of the ontology recently revised this way

29 Interest groups
Advantages
–Cheap, no travel required
–Allows a lot of detailed work to be done on a very specific area
–Involves external expertise

30 Interest groups
Disadvantages
–Harder to reach consensus when not face to face
–Projects tend to drag on

31 Mechanisms for ontology change Systematic changes via small working groups

32 Systematic changes
Projects not directly related to biological content
Systematic changes throughout ontology
Small group of GO consortium members
–meets regularly by desktop sharing, voice over IP
Experts recruited to meetings as needed

33 Systematic changes
Changes either
–made on a branch of the ontology and merged in later: merging the branched file into the main file always causes big problems
–merged directly into live ontology after session: fast, but people get angry

34 is_a complete
GO contains both is_a and part_of relations
Typically, graphs were a mixture of incomplete is_a and part_of hierarchies
A result of ‘organic’ evolution of GO
All graphs now have complete is_a paths to root
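The "complete is_a paths to root" property can be checked mechanically. The sketch below is one possible check, written for illustration and not taken from any GO tooling: it walks downward from the root over is_a links only and reports every term it never reaches. The term names, the links between them and the function name are assumptions made for this example.

```python
from collections import defaultdict, deque


def terms_without_is_a_path(is_a_parents, roots):
    """Return terms that no chain of is_a links connects to any root."""
    # Reverse the child -> parents map so we can walk down from the roots.
    children = defaultdict(set)
    for term, parents in is_a_parents.items():
        for parent in parents:
            children[parent].add(term)

    reached = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return sorted(set(is_a_parents) - reached)


# Hypothetical is_a links; "nucleolus" is linked only by part_of (not
# shown here), so it and its is_a child have no is_a path to the root.
IS_A = {
    "cellular component": [],
    "organelle": ["cellular component"],
    "nucleus": ["organelle"],
    "nucleolus": [],
    "nucleolar part": ["nucleolus"],
}

if __name__ == "__main__":
    print(terms_without_is_a_path(IS_A, ["cellular component"]))
```

An empty result would mean the graph is is_a complete in the sense used on this slide.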

35 partial disjointness
Biological process terms organised by granularity:
–cellular process
–multicellular organism process
–multi-organism process
To avoid massive increase in number of paths to root, these terms are disjoint
–no is_a children in common
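The disjointness rule ("no is_a children in common") lends itself to a simple automated check. The following is a sketch under stated assumptions: the three grouping terms come from the slide, but the child terms, their placement and the function name are hypothetical, and "example process" is deliberately given two of the disjoint parents to show a violation.

```python
from itertools import combinations

DISJOINT = [
    "cellular process",
    "multicellular organism process",
    "multi-organism process",
]

# Hypothetical direct is_a parents, for illustration only.
IS_A = {
    "cell division": ["cellular process"],
    "digestion": ["multicellular organism process"],
    "example process": ["cellular process", "multicellular organism process"],
}


def shared_is_a_children(is_a_parents, disjoint_terms):
    """Map each pair of disjoint terms to the direct is_a children they share."""
    violations = {}
    for first, second in combinations(sorted(disjoint_terms), 2):
        shared = sorted(
            term
            for term, parents in is_a_parents.items()
            if first in parents and second in parents
        )
        if shared:
            violations[(first, second)] = shared
    return violations


if __name__ == "__main__":
    print(shared_is_a_children(IS_A, DISJOINT))
```

The check only looks at direct is_a children, matching the wording on the slide; whether deeper descendants should also be compared is a design decision this sketch leaves open.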

36 sensu
sensu (meaning ‘in the sense of’) used to disambiguate, by taxonomic group, terms with identical strings but different meanings
e.g. sporulation (sensu Viridiplantae) vs. sporulation (sensu Bacteria)

37 sensu
Current project to remove the sensu term strings
Replace with strings that represent the true differentiae, e.g.
–cell wall (sensu Bacteria) -> peptidoglycan-based cell wall
–cell wall (sensu Fungi) -> chitin- and beta-glucan-containing cell wall
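A rename project like this has two mechanical parts: applying the agreed replacements and listing the sensu terms still waiting for one. The sketch below illustrates both, using only the two replacements given on the slide; the regular expression, the function name and the input list are assumptions made for this example.

```python
import re

# Matches names of the form "<base> (sensu <taxon>)".
SENSU = re.compile(r"^(?P<base>.+?) \(sensu (?P<taxon>[^)]+)\)$")

# The two replacements quoted on the slide.
RENAMES = {
    "cell wall (sensu Bacteria)": "peptidoglycan-based cell wall",
    "cell wall (sensu Fungi)": "chitin- and beta-glucan-containing cell wall",
}


def rename_sensu(names):
    """Apply known renames; return (new_names, sensu_names_still_pending)."""
    renamed, pending = [], []
    for name in names:
        if name in RENAMES:
            renamed.append(RENAMES[name])
        else:
            renamed.append(name)
            if SENSU.match(name):
                pending.append(name)
    return renamed, pending


if __name__ == "__main__":
    names = [
        "cell wall (sensu Bacteria)",
        "sporulation (sensu Viridiplantae)",
        "nucleus",
    ]
    print(rename_sensu(names))
```

The pending list is the useful output for curators: each entry still needs a human to decide what the true differentia is.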

38 Systematic changes to GO
Advantages
–Fast
–Efficient
–Small number of people required

39 Systematic changes to GO
Disadvantages
–Difficult to obtain wider consensus
–Changes sometimes have to be undone

40 Useful tools for ontology development
WebEx
–desktop sharing, can control each other's desktops
wiki
–mainly internal
Skype
–free international calls!
conference calls
–not free

41 Tracking changes to GO
General tracking
–files stored in cvs, all differences trackable (in theory)
–far from ideal: a frequent discussion is whether we should track history and date-stamp terms

42 Tracking changes to GO
Obsolete terms
–formerly stored within the ontology
–in OBO format, made a special kind of deprecated term (tag is_obsolete)
–soon to create ‘replaced_by’ and ‘consider’ tags to point to live terms
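The intended use of the three tags can be illustrated with a small resolver: an annotation pointing at an obsolete term is forwarded along `replaced_by`, while `consider` only yields suggestions for a curator to review. This is a sketch of one plausible reading of the tags; the term records, the `EX:` identifiers and the `resolve` function are all hypothetical.

```python
# Hypothetical term records keyed by placeholder identifiers.
TERMS = {
    "EX:0000010": {"name": "old term", "is_obsolete": True,
                   "replaced_by": "EX:0000011"},
    "EX:0000011": {"name": "new term"},
    "EX:0000012": {"name": "ambiguous old term", "is_obsolete": True,
                   "consider": ["EX:0000011", "EX:0000013"]},
    "EX:0000013": {"name": "another live term"},
}


def resolve(term_id, terms):
    """Return (live_id, suggestions) for a possibly obsolete term.

    live_id is None when no automatic replacement exists; in that case
    suggestions lists the 'consider' candidates for manual review.
    """
    seen = set()
    while term_id not in seen:
        seen.add(term_id)
        term = terms[term_id]
        if not term.get("is_obsolete"):
            return term_id, []
        if "replaced_by" in term:
            term_id = term["replaced_by"]  # follow the chain of replacements
            continue
        return None, list(term.get("consider", []))
    raise ValueError("replaced_by links form a loop")


if __name__ == "__main__":
    print(resolve("EX:0000010", TERMS))
    print(resolve("EX:0000012", TERMS))
```

Keeping the two tags separate lets tools update annotations automatically where the mapping is exact and defer to a person where it is not.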

43 Tracking changes to GO
Crediting experts
–traditionally no mechanism for doing this
–creating abstracts for content meetings, adding tag to term
–as yet no mechanism for crediting individuals

44 Useful tools for ontology development
OBO-Edit
–ontology editor originally developed for GO
–can be used for any OBO format ontology
–developed by group of users

45 Useful tools for ontology development
Reasoner integrated into OBO-Edit
–based on OBOL
–detects missing links, redundant links
–soon: misplaced terms, automatic term creation
Validation system
–typographical errors, is_a orphans, duplicate synonyms etc.
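Two of the checks named on this slide are easy to illustrate in isolation. The sketch below is not OBO-Edit's implementation and makes no claim about how its reasoner works; it simply shows what "redundant link" and "duplicate synonym" can mean in practice. A direct link is treated as redundant when the same parent is already implied through another direct parent. All term names, synonyms and function names are hypothetical.

```python
from collections import Counter


def ancestors(term, parents):
    """Return every term reachable from `term` via parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_links(parents):
    """Find direct links already implied through another direct parent."""
    redundant = []
    for term, direct in parents.items():
        for parent in direct:
            others = [p for p in direct if p != parent]
            if any(parent in ancestors(other, parents) for other in others):
                redundant.append((term, parent))
    return sorted(redundant)


def duplicate_synonyms(synonyms):
    """Find synonyms repeated within a term, ignoring letter case."""
    duplicates = {}
    for term, names in synonyms.items():
        counts = Counter(name.lower() for name in names)
        repeated = sorted(name for name, count in counts.items() if count > 1)
        if repeated:
            duplicates[term] = repeated
    return duplicates


# Hypothetical data for illustration only.
PARENTS = {
    "nucleus": ["organelle", "cellular component"],
    "organelle": ["cellular component"],
    "cellular component": [],
}
SYNONYMS = {
    "nucleus": ["cell nucleus", "Cell nucleus"],
    "ribosome": ["ribosomal complex"],
}

if __name__ == "__main__":
    print(redundant_links(PARENTS))
    print(duplicate_synonyms(SYNONYMS))
```

Checks of this kind are cheap to run on every edit, which suits an ontology whose changes go live immediately.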

46

47 Lessons learned
An ontology doesn’t have to be perfect or complete to be used
For domain ontologies, external experts should be involved
Communication is critical
You will never please everyone

