Formal Principles for Biomedical Ontologies Barry Smith
3 Three levels of ontology
ifomis.de 4 Three levels of ontology
1) formal (top-level) ontology, dealing with categories employed in every domain: object, event, whole, part, instance, class
2) domain ontology, which applies the top-level system to a particular domain: cell, gene, drug, disease, therapy
3) terminology-based ontology, a large, lower-level system: Dupuytren’s disease of palm, nodules with no contracture
ifomis.de 7 Compare:
1) pure mathematics (re-usable theories of structures such as order, set, function, mapping)
2) applied mathematics: applications of these theories = re-using the same definitions, theorems, proofs in new application domains
3) physical chemistry, biophysics, etc. = adding detail
ifomis.de 8 Three levels of biomedical ontology
1) formal (top-level) ontology = medical ontology has nothing like the technology of re-usable definitions, theorems and proofs provided by pure mathematics
2) domain ontology = UMLS Semantic Network, GALEN CORE
3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA ?????
ifomis.de 9 Description Logic, Protégé, and other tools for supporting automatic reasoning do not fill this gap they do not provide theories of classes, functions, processes, etc. rather: successful coding in a DL-framework presupposes that such theories have already been applied in the very construction of the terminology-based ontology
ifomis.de 10 IFOMIS Institute for Formal Ontology and Medical Information Science, mission: use basic principles of philosophical ontology, traditional theories of classification and definition for quality assurance and alignment of biomedical ontologies
ifomis.de 11 Strategy Part 1: Survey of GO Part 2: Provide principles for building biomedical ontologies derived from formal (top-level) ontology, and illustrate how they can help in quality assurance of terminology-based ontologies like GO Part 3: Show how it can be done right
ifomis.de 12 Part One Survey of GO
ifomis.de 13 GO is three ontologies cellular components molecular functions biological processes December 16, 2003: 1372 component terms 7271 function terms 8069 process terms
ifomis.de 14 GO an impressive achievement used by over 20 genome databases and many other groups in academia and industry successful methodology, much imitated now part of OBO (open biological ontologies) consortium Here I focus on problems / errors GO here is just an example
ifomis.de 15 Primary aim of GO not rigorous definition and principled classification but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products
ifomis.de 16 Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges: is-a (epithelial cell differentiation is-a cell differentiation) part-of (axonemal microtubule part-of axoneme)
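The two-edge-type graph described on this slide can be sketched as a small labeled edge list. This is only an illustration: the two edges are the examples quoted on the slide, while the names `EDGES`, `build_graph` and `parents` are hypothetical and not part of any GO tooling.

```python
from collections import defaultdict

# Edges copied from the slide: (child, relation, parent).
EDGES = [
    ("epithelial cell differentiation", "is-a", "cell differentiation"),
    ("axonemal microtubule", "part-of", "axoneme"),
]


def build_graph(edges):
    """Index the parents of each term by (term, relation)."""
    graph = defaultdict(set)
    for child, relation, parent in edges:
        graph[(child, relation)].add(parent)
    return graph


def parents(graph, term, relation):
    """Return the sorted parents of `term` along edges labeled `relation`."""
    return sorted(graph.get((term, relation), set()))


graph = build_graph(EDGES)
print(parents(graph, "epithelial cell differentiation", "is-a"))
print(parents(graph, "axonemal microtubule", "part-of"))
```

Keeping the relation label in the key makes it explicit that is-a and part-of are different kinds of link, which matters for the univocity problems discussed later.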
ifomis.de 17 This graph-theoretic architecture is designed to help humans, who can use the graphs to locate the features and attributes they are addressing in their work and thus to determine the designated terms for these features and attributes within GO’s ‘controlled vocabulary.’
ifomis.de 18 GO’s three ontologies When a gene is identified, three important types of questions need to be addressed: Where is it located in the cell? What functions does it have on the molecular level? And to what biological processes do these functions contribute?
ifomis.de 19 GO’s three ontologies molecular functions cellular constituents biological processes
ifomis.de 20 The Cellular Component Ontology (counterpart of anatomy) consists of terms such as flagellum, chromosome, ferritin, extracellular matrix and virion Cellular components are physical and measurable entities. They are, in the terminology of philosophical ontology, objects or things (independent continuants). They endure self-identically through time while undergoing changes of various sorts Cellular component embraces also the extracellular environment of cells and cells themselves
ifomis.de 21 No organisms GO does not include terms for specific organisms, not even for single-celled organisms
ifomis.de 22 The Molecular Function Ontology molecular function = the action characteristic of a gene product. Actions such as ice nucleation or protein stabilization do not endure but rather occur.
ifomis.de 23 The Molecular Function Ontology Originally included terms such as anti-coagulant (defined as: ‘a substance that retards or prevents coagulation’) and enzyme (defined as: ‘a substance … that catalyzes’) These refer neither to functions nor to actions but rather to components.
ifomis.de 24 The Molecular Activity Ontology Confusion remedied to a degree by policy change of March 2003: ‘All GO molecular function term names [with the exception of the parent term molecular function and of the whole node binding] are to be appended with the word “activity”.’
ifomis.de 25 ‘Function’ = ‘Activity’ Thus the term ‘structural molecule,’ which is defined as meaning: ‘the action of a molecule that contributes to structural integrity,’ is amended to ‘structural molecule activity’
ifomis.de 26 still problems with GO Molecular Function Definitions anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”) enzyme activity (defined as: “a substance that catalyzes”)
ifomis.de 27 … and there are still problems with Molecular Function terms GO: : structural constituent of cell wall
ifomis.de 28 structural constituent of cell wall Definition: The action of a molecule that contributes to the structural integrity of a cell wall. confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology
ifomis.de 29 extracellular matrix structural constituent puparial glue (sensu Diptera) structural constituent of bone structural constituent of chorion (sensu Insecta) structural constituent of chromatin structural constituent of cuticle structural constituent of cytoskeleton structural constituent of epidermis structural constituent of eye lens structural constituent of muscle structural constituent of myelin sheath structural constituent of nuclear pore structural constituent of peritrophic membrane (sensu Insecta) structural constituent of ribosome structural constituent of tooth enamel structural constituent of vitelline membrane (sensu Insecta)
ifomis.de 30 The Biological Process Ontology biological process: ‘A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.’ Examples: glycolysis, death, adult walking behavior response to blue light
ifomis.de 31 Occurrents Both molecular activity and biological process terms refer to what philosophical ontologists call occurrents = entities which do not endure through time but rather unfold themselves in successive temporal phases. Occurrents can be segmented into parts along the temporal dimension. Continuants exist in toto in every instant at which they exist at all.
ifomis.de 32 Molecular functions and biological processes are closely interrelated E.g. the process anti-apoptosis involves the molecular function apoptosis inhibitor activity. How can GO express such relations?
ifomis.de 33 Are they a matter of granularity? ‘A biological process is accomplished via one or more ordered assemblies of molecular functions.’ ??? Molecular activities = building blocks of biological processes ??? So: Functions are parts of processes But no:
ifomis.de 34 GO’s three ontologies are separate No links or edges defined between them molecular functions cellular constituents biological processes
ifomis.de 35 Question: How are we to understand granularity if not in terms of parthood?
ifomis.de 36 Molecular functions renamed ‘activities’, because ‘activity’ unlike ‘process’, connotes agency ? but molecules are not agents hypothesis: the term ‘function’ was used for the molecular function ontology because the activities in question are functional in relation to the pertinent organism.
ifomis.de 37 Functions A function is functional = beneficial to the organism If an organism-part has a function, this is because the functioning of this organism-part is beneficial to the organism The function of the heart is to pump blood Not: the function of the hip is to financially support hip-replacement surgeons
ifomis.de 38 Some processes are functionings E.g. pumping blood
ifomis.de 39 Two sorts of processes 1.Functionings (realizations of functions = beneficial to the organism) 2.Other processes (e.g. the result of external interventions) Cf. difference between physiology and pathology
ifomis.de 40 GO not clear about this distinction transport: The directed movement of substances (such as … ions) into, out of, or within a cell cell growth and/or maintenance: Any process required for the survival and growth of a cell Synonym: cell physiology transport is-a cell growth and/or maintenance but (GO: ) viral intracellular protein transport is-a transport
ifomis.de 41 Why do these problems arise? GO has no clear understanding of the role of temporal relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function)
ifomis.de 42 GO excludes organisms from its scope (they are of the wrong granularity) Yet each process or function requires some bearer or bearers which it is the process or function of. Processes are dependent on their bearers (Theory of dependence vs. independence part of formal ontology) (Theory of continuants vs. occurrents part of formal ontology)
ifomis.de 43 Some formal ontology Components are independent continuants Functions are dependent continuants (the function of an object exists continuously in time, just like the object which has the function; and it exists even when it is not being exercised) Processes are (dependent) occurrents
ifomis.de 44 More generally: Continuants can be divided into independent (objects, things, components) and dependent (features, attributes, conditions, functions, roles, qualities …) All occurrents are dependent entities. Every occurrent is dependent for its existence on one or more continuants. A change is always a change in some continuant object.
ifomis.de 46 Part Two Principles of Biomedical Ontologies and their use in quality assurance of terminology-based ontologies
ifomis.de 47 Principle of Temporal Coherence An ontology should rigorously distinguish continuants from occurrents. (Anatomy is a science of continuants)
ifomis.de 48 Principle of Dependence If an ontology recognizes a dependent entity then it (or a linked ontology) should recognize also the relevant class of bearers Part of our aim here is to lay down principles which can support such linkability
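The Principle of Dependence lends itself to a mechanical check. The sketch below assumes that each dependent entity has been annotated with the class of its bearers; the bearer assignments, the term sets and the function name `missing_bearers` are hypothetical and chosen only for illustration.

```python
def missing_bearers(bearer_of, recognized):
    """Return the dependent entities whose bearer class is not recognized."""
    return sorted(dep for dep, bearer in bearer_of.items() if bearer not in recognized)


# Hypothetical bearer assignments, for illustration only.
bearer_of = {
    "eye pigmentation": "eye",
    "adult walking behavior": "adult",
    "cell differentiation": "cell",
}
# Terms the ontology itself recognizes (assumed).
own_terms = {"cell", "eye pigmentation", "adult walking behavior", "cell differentiation"}
# Stands in for a linked anatomy ontology (assumed).
linked_terms = {"eye"}

print(missing_bearers(bearer_of, own_terms))
print(missing_bearers(bearer_of, own_terms | linked_terms))
```

Adding the linked ontology shrinks the list of unsupported dependent entities, which is the practical point of the "(or a linked ontology)" clause in the principle.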
ifomis.de 49 Linking to external ontologies can also help to link together GO’s own three separate parts
ifomis.de 50 GO’s three ontologies [diagram labels: molecular functions, cellular constituents, biological processes; dependent, independent]
ifomis.de 51 GO’s three ontologies [diagram labels: molecular functions, cellular constituents, organism-level biological processes, cellular processes]
ifomis.de 52 ‘part-of’; ‘is dependent on’ [diagram labels: molecular functions, molecule complexes, cellular processes, cellular constituents, organism-level biological processes, organisms]
ifomis.de 54 [diagram labels: molecule complexes, cellular constituents, molecular functions, cellular functions, organism-level biological functions, organisms, molecular processes, cellular processes, organism-level biological processes]
ifomis.de 55 [diagram as on the previous slide, with the label ‘functionings’ added]
ifomis.de 56 GO must be linked with other, neighboring ontologies GO has: adult walking behavior but not adult GO has: eye pigmentation but not eye GO has: response to blue light but not light (or blue) 94% of words used in GO terms are not GO terms Part of the solution “Medical FactNet” (NLM, 10am tomorrow)
ifomis.de 57 GO taking steps in this direction Linking to a good external ontology of organism types (to solve some of the problems with sensu) It needs to link further to a good external ontology of anatomy, to solve the location problem and to a good external ontology of coarse-grained reality, to solve the adult walking behavior problem Human beings know what ‘walking’ means
ifomis.de 58 Human beings know that adults are older than embryos GO needs to be linked to an ontology of development and in general to resources for reasoning about time and change
ifomis.de 59 but such linkages are possible only if GO itself has a coherent formal architecture
ifomis.de 60 Principle of Univocity univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use UMLS-Semantic Network: ‘organization’ = body plan (anatomy) ‘organization’ = social organization
ifomis.de 61 Polysemy of GO’s part-of – membrane part-of cell, intended to mean “a membrane is a part-of any cell” – flagellum part-of cell, intended to mean “a flagellum is part-of some cells” – replication fork part-of cell cycle, intended to mean: “a replication fork is part-of the nucleoplasm only during certain times of the cell cycle”
ifomis.de 62 Three meanings of ‘part-of ’ ‘part-of’ = ‘can be part of’ (flagellum part-of cell) ‘part-of’ = ‘is sometimes part of’ (replication fork part-of the nucleoplasm) ‘part-of’ = ‘is included as a sublist in’
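The first two readings can be told apart once part-of is stated over instances rather than over classes. The following is a minimal sketch under that assumption; the instance identifiers (`c1`, `m1`, and so on) are hypothetical toy data, not GO entries, and the temporal reading is left out.

```python
# Toy instance data; all identifiers are hypothetical.
INSTANCES = {
    "cell": {"c1", "c2"},
    "membrane": {"m1", "m2"},
    "flagellum": {"f1"},
}
# Instance-level parthood as (part, whole) pairs.
PART_OF = {("m1", "c1"), ("m2", "c2"), ("f1", "c1")}


def every_whole_has_part(part_cls, whole_cls):
    """True if every instance of whole_cls has some instance of part_cls as part."""
    return all(
        any((p, w) in PART_OF for p in INSTANCES[part_cls])
        for w in INSTANCES[whole_cls]
    )


def some_whole_has_part(part_cls, whole_cls):
    """True if at least one instance of whole_cls has an instance of part_cls as part."""
    return any(
        (p, w) in PART_OF
        for p in INSTANCES[part_cls]
        for w in INSTANCES[whole_cls]
    )


print(every_whole_has_part("membrane", "cell"))
print(every_whole_has_part("flagellum", "cell"))
print(some_whole_has_part("flagellum", "cell"))
```

With this toy data the membrane case satisfies the stronger reading and the flagellum case only the weaker one, so a single unqualified ‘part-of’ label hides a real difference.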
ifomis.de 63 THE GOAL IS: not to impose basic principles of classification and definition on biologists – All the principles presented here should be conceived not as iron requirements but rather as rules of thumb – deviation from which is often marked by characteristic families of coding errors
ifomis.de 64 example [GO: ] host cell cytoplasm, defined as: The cytoplasm of a host cell [GO: ] host, defined as: Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection
ifomis.de 65 Why is this an error? because organisms do not fall within the scope of GO An organism is not a cellular component, and it is not a molecular function, and not a biological process, either
ifomis.de 66 host cell cytoplasm part-of host breaks GO’s own granularity constraints
ifomis.de 67 Why univocity? 1.humans are good at disambiguating ambiguous expressions, machines not 2. quality assurance and ontology maintenance 3. GO, SNOMED, etc., are designed to constitute ‘controlled vocabularies’
ifomis.de 68 Quality assurance and ontology maintenance must be automated As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
ifomis.de 69 The purpose of a ‘controlled vocabulary’ = to ensure that the same terms are used by different research groups with the same meanings this has implications also for the syntax of GO terms (= the way terms are compounded together out of other terms)
ifomis.de 70 Univocity and syntax The story of ‘ / ’
ifomis.de 71 / GO: microtubule/kinetochore interaction =df Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex
ifomis.de 72 / GO: ciliary/flagellar motility =df Locomotion due to movement of cilia or flagella.
ifomis.de 73 / GO: negative regulation of chromatin assembly/disassembly =df Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly
ifomis.de 74 / GO: G1/S transition of mitotic cell cycle defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.
ifomis.de 75 / GO: interpretation of nuclear/cytoplasmic to regulate cell growth =df The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.
ifomis.de 76 / GO: hexuronate (glucuronate/galacturonate) porter activity =df Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)
ifomis.de 77 Problems with GO’s compositionality
  / (slash): 286
  : (semi-colon): 177
  , (comma): 1206
  and: 180
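Counts of this kind can be produced with a few lines of code. The sketch below counts how many term names contain each operator, using only term names quoted on the surrounding slides; whether the slide's figures count terms or occurrences is not stated, so counting terms is an assumption, as are the names `OPERATORS` and `count_terms_with_operator`.

```python
import re

# Term names quoted on the surrounding slides.
TERMS = [
    "microtubule/kinetochore interaction",
    "ciliary/flagellar motility",
    "G1/S transition of mitotic cell cycle",
    "cytokinesis, site selection",
    "cell growth and/or maintenance",
    "hydrolase activity, acting on acid anhydrides, "
    "involved in cellular and subcellular movement",
]

# One regular expression per operator; 'and' must match as a whole word.
OPERATORS = {
    "/": re.compile(r"/"),
    ",": re.compile(r","),
    "and": re.compile(r"\band\b"),
}


def count_terms_with_operator(terms):
    """Count the terms (not the occurrences) that contain each operator."""
    return {
        name: sum(1 for term in terms if pattern.search(term))
        for name, pattern in OPERATORS.items()
    }


counts = count_terms_with_operator(TERMS)
print(counts)
```

Note that ‘and/or’ is counted under both ‘/’ and ‘and’, which is itself a small illustration of how hard the operators are to interpret mechanically.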
ifomis.de 78 comma cytokinesis, site selection
ifomis.de 79 plurals biological process physiological processes cellular process cell growth and/or maintenance
ifomis.de 80
  specification: 39
  complex: 563
  formation; forming: 142
  regulator; regulatory; regulated; regulation: 1326
  determination; determinacy: 56
  acting on: 146
  with: 54
  constituting: 35
  from: 141
  constituent; constitutive: 29
  in: 51
  dependent: 182
  via: 164
  sensu: 469
ifomis.de 81 Questions regarding operators How does ‘constituent’ relate to ‘component’ If A within B then is A part-of B or included-in-the-interior-of B ? Does via mean by means of or along the path of ? How is ‘un-’ related to ‘not’ (how is ‘unlocalized’ related to ‘not localized’)
ifomis.de 82 ‘involved in’ term-forming operator (reflection of GO’s limited resources for expressing relations): hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement asymmetric protein localization involved in cell fate commitment cell-cell signaling involved in cell fate commitment protein secretion involved in cell fate commitment
ifomis.de 83 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement This is a term because GO does not have the resources to express ‘is-involved-in’ as a relation between terms note problems with commas
ifomis.de 84 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement is-a hydrolase activity, acting on acid anhydrides
ifomis.de 85 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement is-a hydrolase activity, acting on acid anhydrides is ok: hydrolase activity, acting on anhydrides can but need not be involved in cellular and subcellular movement
ifomis.de 86 ‘involved in’ asymmetric protein localization involved in cell fate commitment is-a cell fate commitment should be a part-of relation (compare: breathing involved in running is a running)
ifomis.de 87 ‘involved in’ cell-cell signaling involved in cell fate commitment is-a cell fate commitment ditto: should be a part-of relation
ifomis.de 88 these, though, are good: asymmetric protein localization involved in cell fate commitment is-a asymmetric protein localization cell-cell signaling involved in cell fate commitment is-a cell-cell signaling
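The pattern in these examples can be checked mechanically: a term of the form ‘X involved in Y’ may reasonably stand in is-a to X, whereas an is-a link to Y is suspect and is better read as part-of. A minimal sketch follows; the function names and the verdict strings are hypothetical.

```python
def split_involved_in(term):
    """Split 'X involved in Y' into (X, Y); return None if the operator is absent."""
    head, sep, tail = term.partition(" involved in ")
    if not sep:
        return None
    # Drop a trailing comma, as in '..., involved in ...'.
    return head.rstrip(", "), tail


def check_is_a(term, parent):
    """Classify an asserted 'term is-a parent' link for an 'involved in' term."""
    parts = split_involved_in(term)
    if parts is None:
        return "no 'involved in' operator"
    x, y = parts
    if parent == x:
        return "ok: is-a"
    if parent == y:
        return "suspect: should be part-of"
    return "unrelated parent"


term = "asymmetric protein localization involved in cell fate commitment"
print(check_is_a(term, "asymmetric protein localization"))
print(check_is_a(term, "cell fate commitment"))
```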
ifomis.de 89 ‘involved in’ protein secretion involved in cell fate commitment synonym of protein secretion are there instances of protein secretion not involved in cell fate commitment? … Problems with GO’s peculiar use of ‘synonym’
ifomis.de 90 Consequences of inconsistent and/or indeterminate use of operators 29.42% of the distinct terms within GO contain one or more polysemous operators but these terms receive only 13.96% of the annotations present within GO Hypothesis: This lower percentage of annotations reflects the fact that poorly defined operators are not well understood by annotators, who thus avoid the corresponding terms
ifomis.de 91 Principle of Compositionality The meanings of compound terms should be determined 1. by the meanings of constituent terms together with 2. the rules governing the syntactic operators
ifomis.de 92 Principle of Objectivity which classes exist is not a function of our biological knowledge. (Terms such as ‘unclassified’ or ‘unknown ligand’ or ‘not otherwise classified as peptides’ do not designate biological natural kinds.)
ifomis.de 93 GO: cellular component unknown
cellular component unknown is-a cellular component
unlocalized is-a cellular component
Holliday junction helicase complex is-a unlocalized
ifomis.de 94 GO’s excuse ‘unlocalized’ is used as a placeholder only but automatic information retrieval systems cannot distinguish it from other, genuine class names formal tools exist which can deal with the addition of knowledge into a classification system without the need to create fake classes (Theory of Granular Partitions)
ifomis.de 95 Principle of Positivity Class names should be positive. Logical complements of classes are not themselves classes. (Terms such as ‘non-mammal’ or ‘non-membrane’ or ‘invertebrate’ do not designate natural kinds.)
ifomis.de 96 Terms such as ‘Veterinary proprietary drug AND/OR biological’ * do not designate natural kinds. (Which biological classes exist is not a matter of logic.) *has 2532 children in SNOMED-CT
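Violations of Objectivity and Positivity often leave traces in the term names themselves, so a rough lexical screen is possible. The sketch below is a heuristic only: the pattern list and the labels `placeholder`, `negative` and `disjunctive` are assumptions made for illustration, and a name such as ‘invertebrate’ passes unflagged because its negative character is not visible in a simple prefix.

```python
import re

# Heuristic patterns; the list is illustrative, not exhaustive.
PATTERNS = {
    "placeholder": re.compile(
        r"\b(unknown|unclassified|unlocalized|not otherwise classified)\b", re.I
    ),
    "negative": re.compile(r"\bnon-\w+", re.I),
    "disjunctive": re.compile(r"\band/or\b", re.I),
}


def flag(name):
    """Return the sorted labels of all patterns that match the term name."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(name))


for name in [
    "cellular component unknown",
    "non-mammal",
    "Veterinary proprietary drug AND/OR biological",
    "invertebrate",
]:
    print(name, flag(name))
```

A screen like this can only nominate candidates for review; deciding whether a term designates a natural kind remains a matter for the curator.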
ifomis.de 97 Principle of Explicitness if a link between two classes holds only under certain specific restrictions, then this restriction should be made explicit in the statement of the corresponding link-axiom cf. GO’s sensu
ifomis.de 98 GO can in practice be used only by trained biologists (with know how) whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved glycosome is part-of cytoplasm only for Kinetoplastidae Computers have no counterpart of such context-dependent know-how
ifomis.de 99 Principle of Single Inheritance no class in a classificatory hierarchy should have more than one parent on the immediate higher level
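Single inheritance is easy to test once the is-a links are available as pairs. The sketch below uses is-a assertions quoted elsewhere in this talk; the names `IS_A` and `multiple_parents` are hypothetical.

```python
from collections import defaultdict

# (child, parent) is-a assertions quoted on other slides of this talk.
IS_A = [
    ("asymmetric protein localization involved in cell fate commitment",
     "cell fate commitment"),
    ("asymmetric protein localization involved in cell fate commitment",
     "asymmetric protein localization"),
    ("epithelial cell differentiation", "cell differentiation"),
]


def multiple_parents(edges):
    """Return every term with more than one is-a parent, with its sorted parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


violations = multiple_parents(IS_A)
for child, ps in violations.items():
    print(child, "->", ps)
```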
ifomis.de 100 Principle of Taxonomic Levels the terms in a classificatory hierarchy should be divided into predetermined levels (analogous to the levels of kingdom, phylum, class, order, etc., in traditional biology). depth in GO’s hierarchies not determinate because of multiple inheritance
ifomis.de 101 Principle of Partonomic Levels Terms in a partonomic hierarchy should be divided into predetermined granularity levels (for example: organism, organ, cell, molecule, etc.) (GO is about to break physiological process into 'cell physiological process' and 'organism physiological process'.) = take granularity seriously
ifomis.de 102 Principle of Exhaustiveness the classes on any given level should exhaust the domain of the classificatory hierarchy.
ifomis.de 103 Single Inheritance + Exhaustiveness = JEPD for: Jointly Exhaustive and Pairwise Disjoint Exhaustiveness often difficult to satisfy in the realm of biological phenomena; but its acceptance as an ideal is presupposed as a goal by every scientist. Single inheritance accepted in all traditional (species-genus) classifications, now under threat because multiple inheritance is a computationally useful device (allows one to avoid certain kinds of combinatory explosion).
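At the level of class extensions, JEPD amounts to two set-theoretic checks. The sketch below assumes a toy domain of four hypothetical instances and treats each class on a level as the set of its instances; nothing here is drawn from GO.

```python
from itertools import combinations


def check_jepd(domain, classes):
    """Return (jointly_exhaustive, pairwise_disjoint) for classes over a domain."""
    covered = set().union(*classes.values())
    exhaustive = covered == set(domain)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(classes.values(), 2))
    return exhaustive, disjoint


# Hypothetical domain and classes on one level of a hierarchy.
domain = {"x1", "x2", "x3", "x4"}
good_level = {"A": {"x1", "x2"}, "B": {"x3", "x4"}}
overlapping = {"A": {"x1", "x2", "x3"}, "B": {"x3", "x4"}}
incomplete = {"A": {"x1"}, "B": {"x3", "x4"}}

print(check_jepd(domain, good_level))
print(check_jepd(domain, overlapping))
print(check_jepd(domain, incomplete))
```

An overlap between sibling classes corresponds to an instance with two parents on the same level, which is how multiple inheritance breaks pairwise disjointness.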
ifomis.de 104 Problems with multiple inheritance [diagram: A is-a_1 B and A is-a_2 C] ‘is-a’ no longer univocal
ifomis.de 105 GO’s ‘is-a’ is pressed into service to mean a variety of different things the resulting ambiguities make the rules for correct coding difficult to communicate to human curators in terms of generally intelligible principles they also serve as obstacles to integration with neighboring ontologies
ifomis.de 106 Problems with multiple inheritance [diagram: A is-a_1 B and A is-a_2 C, with further nodes D and E] ‘sibling’ is no longer determinate Principle of levels is violated
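Why multiple inheritance undermines levels can be seen by computing the depth of a term along every path to the root. The graph below is hypothetical (it is not the one in the slide's diagram): A has two parents, B and C, and C is itself a child of B.

```python
def depths(term, parents):
    """Return the set of path lengths from `term` up to a parentless root."""
    direct = parents.get(term, [])
    if not direct:
        return {0}
    return {d + 1 for parent in direct for d in depths(parent, parents)}


# Hypothetical hierarchy: A is-a B, A is-a C, C is-a B, B is-a root.
parents = {
    "B": ["root"],
    "C": ["B"],
    "A": ["B", "C"],
}

print(depths("C", parents))
print(depths("A", parents))
```

A term with more than one depth cannot be assigned to a single level, and which terms count as its siblings depends on the parent through which it is reached.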
ifomis.de 109 A storage vacuole is not a special kind of vacuole a box used for storage is not a special kind of box
ifomis.de 111 Another term-forming operator lytic vacuole within a protein storage vacuole lytic vacuole within a protein storage vacuole is-a protein storage vacuole time-out within a baseball game is-a baseball game embryo within a uterus is-a uterus
ifomis.de 112 Problems with Location is-located-at / is-located-in and similar relations need to be expressed in GO via some combination of ‘is-a’ and ‘part-of’ … is-a unlocalized is-a site of … within … … in …
ifomis.de 113 Problems with location extrinsic to membrane part-of membrane extrinsic to plasma membrane part-of plasma membrane extrinsic to vacuolar membrane part-of vacuolar membrane
ifomis.de 114 Differentiation and Development development cellular process cell differentiation
ifomis.de 115 Cell differentiation is-a development But according to GO’s own definitions the agent or subject of differentiation is the cell, while the agent or subject of development is the whole organism (again: GO has problems in keeping track of entities on different levels of granularity)
ifomis.de 116 cell differentiation is-a development but: hemocyte differentiation part-of hemocyte development
ifomis.de 117 GO: : garland cell differentiation Definition: Development of garland cells, a small group of nephrocytes which take up waste materials from the hemolymph by endocytosis. (Illustrates GO’s problems with definitions)
ifomis.de 119 Part Three How to do things right so far only scratched the surface: sensu synonyms GO’s definitions GO’s ‘logical relationships’
ifomis.de 120 Principles for GO terms Temporal coherence Dependence Univocity Compositionality Objectivity Positivity Explicitness Taxonomic Levels Partonomic Levels Single Inheritance Exhaustiveness
ifomis.de 121 Should these principles be satisfied? Michael Ashburner: GO’s philosophy from the beginning was ‘just in time’ - that is, we made no great attempt to ‘complete’ the ontologies …. If you try and ‘complete’ an ontology, or worse: try and ‘get it right,’ then you will fail …
ifomis.de 122 Can these principles be satisfied? Compare GO with Foundational Model of Anatomy (FMA)
ifomis.de 123
Principle            GO    FMA
Temporal coherence   No    N/A
Dependence           No    N/A
Univocity            No    Yes
Compositionality     No    Yes
Objectivity          No    Yes
Positivity           No    Yes
Explicitness         No    N/A
Taxonomic Levels     No    Yes
Partonomic Levels    No    Yes
Single Inheritance   No    Yes
Exhaustiveness       No
ifomis.de 124 The End
ifomis.de 125 Is GO an ontology? GO is a controlled vocabulary = (ramshackle) syntactic regimentation but because is-a and part-of are not given uniform readings, this does NOT mean the sort of semantic regimentation which would amount to an ontology in the proper sense of the word
ifomis.de 126 rules for definitions intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined definitions: do not confuse definitions with the communication of new knowledge
ifomis.de 127 Principle of Substitutability in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin GO: : toxin activity Definition: Acts as to cause injury to other living organisms.
ifomis.de 128 substitutability There is toxin activity here There is acts as to cause injury to other living organisms here
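The substitution test on this slide is plain string replacement, sketched below. Code can perform the substitution but cannot judge grammaticality, so that judgment stays with the reader; the second, noun-phrase definition is a hypothetical rewording added only for contrast and is not GO's text.

```python
def substitute(sentence, term, definition):
    """Replace every occurrence of a defined term by its definition."""
    return sentence.replace(term, definition)


sentence = "There is toxin activity here"

# Definition as quoted on the slide (lower-cased, final period dropped).
go_definition = "acts as to cause injury to other living organisms"
# Hypothetical noun-phrase rewording, for contrast only.
reworded = "action that causes injury to other living organisms"

print(substitute(sentence, "toxin activity", go_definition))
print(substitute(sentence, "toxin activity", reworded))
```

The first result is ungrammatical because the definition is a verb phrase standing in for a noun phrase; the second at least survives substitution.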
ifomis.de 129 Defining is-a A is-a B = every instance of A is an instance of B A is-a B = A and B are natural kinds and every instance of A is an instance of B A is-a B = A and B are natural kinds and every instance of A is as a matter of necessity an instance of B
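The first two readings can be contrasted in code if classes are modeled by their sets of instances. This is a sketch under strong assumptions: the instances are hypothetical, the set of natural kinds is simply stipulated, and treating ‘box used for storage’ as not a natural kind is an interpretation of the storage example given earlier in the talk. The third, modal reading cannot be captured by finite extensions at all.

```python
def is_a_extensional(a, b, extension):
    """Reading 1: every instance of a is an instance of b."""
    return extension[a] <= extension[b]


def is_a_natural_kinds(a, b, extension, natural_kinds):
    """Reading 2: a and b are natural kinds and reading 1 holds."""
    return a in natural_kinds and b in natural_kinds and is_a_extensional(a, b, extension)


# Hypothetical instances and a stipulated set of natural kinds.
extension = {
    "box": {"b1", "b2"},
    "box used for storage": {"b1"},
}
natural_kinds = {"box"}

print(is_a_extensional("box used for storage", "box", extension))
print(is_a_natural_kinds("box used for storage", "box", extension, natural_kinds))
```

The two functions disagree on the same data, which shows that the choice among the readings changes which is-a links an ontology may assert.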
ifomis.de 130 Solutions to these problems ‘part_of’ should mean: part_of