A Realism-Based Approach to the Evolution of Biomedical Ontologies
Werner CEUSTERS, Barry SMITH
Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, NY, USA
AMIA 2006, Washington, DC, USA. November 14, 2006
http://www.org.buffalo.edu/RTU
An unfortunate perception of ‘ontology’
The most widespread view holds that an ontology is ‘an explicit specification of the conceptualization of a domain’ (Gruber), a view often complemented with the notion of ‘agreement’.
This view of ontology has unfortunate consequences
- Too much effort goes into the specification business: OWL, DL reasoners, translators and converters, syntax checkers, ...
- Too little effort goes into the faithfulness of the conceptualizations to what they represent.
- Pseudo-separation of language and entities (“absent nipple”).
- Many ‘ontologies’ and ontology-like systems exhibit mistakes of various sorts.
The same erroneous focus carries over to ontology evolution
Ontology versions exhibit differences of the following sorts (Noy et al.):
- Add a subtree
- Delete a subtree
- Move a subtree to a different location
- Move a set of sibling classes to a different location
- Create a new abstraction: move a set of siblings down in a class hierarchy, creating a new superclass
- Delete a class, moving its subclasses to become subclasses of its superclass
- Split a class
- Merge classes
Noy NF, Kunnatur S, Klein M, Musen MA. Tracking changes during ontology evolution. In: Proceedings of the 3rd International Semantic Web Conference (ISWC 2004), Hiroshima, Japan, November 2004.
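Two of these change types can be made concrete on a toy class hierarchy. This is a minimal sketch, assuming the hierarchy is a Python dict mapping each class to its single superclass (a hypothetical representation that ignores multiple inheritance); `Hierarchy`, `delete_class_promote_children` and `create_abstraction` are illustrative names, and the disease classes are made-up examples. Note that neither operation records why the change was made.

```python
from typing import Dict, Iterable, Optional

# Hypothetical representation: class name -> its single superclass (None = root).
Hierarchy = Dict[str, Optional[str]]


def delete_class_promote_children(h: Hierarchy, cls: str) -> Hierarchy:
    """Delete `cls`; its direct subclasses become subclasses of its superclass."""
    if cls not in h:
        raise KeyError(cls)
    parent = h[cls]
    return {c: (parent if p == cls else p) for c, p in h.items() if c != cls}


def create_abstraction(h: Hierarchy, new_cls: str, siblings: Iterable[str]) -> Hierarchy:
    """Insert `new_cls` as a new superclass of a set of sibling classes."""
    if new_cls in h:
        raise ValueError(f"{new_cls} already exists")
    siblings = list(siblings)
    parents = {h[s] for s in siblings}
    if len(parents) != 1:
        raise ValueError("classes to be grouped must be siblings")
    (common_parent,) = parents
    new_h = dict(h)
    new_h[new_cls] = common_parent  # the new class takes the siblings' old place
    for s in siblings:
        new_h[s] = new_cls          # move each sibling one level down
    return new_h


# Illustrative versions of a tiny hierarchy.
v1: Hierarchy = {
    "disease": None,
    "infectious disease": "disease",
    "viral disease": "infectious disease",
    "bacterial disease": "infectious disease",
}
v2 = delete_class_promote_children(v1, "infectious disease")
v3 = create_abstraction(v2, "infectious disease", ["viral disease", "bacterial disease"])
print(v2["viral disease"])  # disease
print(v3 == v1)             # True
```

The two operations are inverses here, which illustrates the point of this slide: a structural diff between versions says what changed, but nothing about whether the change corrects a mistake or tracks a change in reality.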
The remedy: a realist view of the world
- The world exists ‘as it is’ prior to a cognitive agent’s perception thereof.
- Cognitive agents build up ‘in their minds’ cognitive representations of the world.
- To make these representations publicly accessible in some enduring fashion, these agents create representational artifacts that are fixed in some medium.
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Proceedings of KR-MED 2006, November 8, 2006, Baltimore MD, USA.
We should not be in the business of “concept representation”, despite the fact that there is an IMIA WG on this.
Example: the phenotypic gender of a person (in this room)
- Reality of phenotypic gender: Male, Female (other types of phenotypic gender?)
- Cognitive representation: [male], [female]
- In the EHR: “male”, “female”, “unknown”
But beware! These concretizations are NOT supposed to be representations of the cognitive representations. They are representations of that part of reality of which cognitive agents have built a cognitive representation. They are like images taken with a high-quality camera.
They are not (or should not be) like the paintings of Salvador Dalí, with their non-canonical (although nice-looking) anatomy.
Representational artifacts
Ideally built out of representational units and relationships that mirror the entities and their relationships in reality.
| | Non-formalized | Formalized |
|---|---|---|
| Primarily about particulars | Progress notes, discharge letters, medical summaries, maps | Inventories, referent tracking databases |
| Primarily about universals and defined classes | Medical textbooks, scientific theories | Ontologies, terminologies, … |
Some characteristics of representational units
- Each unit is assumed by the creators of the representation to be veridical, i.e., to conform to some relevant portion of reality (POR) as conceived on the best current scientific understanding.
- Several units may correspond to the same POR by presenting different, though still veridical, views or perspectives.
- What is to be represented by the units in a representation depends on the purposes which the representation is designed to serve.
What influences the development of ontologies?
- Level of biomedical reality:
  - persons, diseases, pathological structures and formations, ... do exist as particulars (p, d, ps, pf, ...) and universals (P, D, PS, PF, ...), and are related in specific ways prior to our perception;
  - biomedical reality changes: d’s, p’s, ... come and go; D’s, P’s, ...
- Level of biomedical science and case perception:
  - mirrors reality only partially;
  - evolves over time towards better understanding.
- Level of concretizations:
  - mirrors biomedical science and case perception only partially;
  - editing mistakes;
  - leaving out diseases or pathological behaviours for non-biomedical reasons (e.g., smoking);
  - adding non-pathological behaviour as a disease (e.g., homosexuality);
  - leaving out what is considered not relevant.
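Reality versus beliefs, both in evolution
[Figure: the particular p3 in reality; the representational units IUI-#3, O-#0, O-#1 and O-#2 in belief; arrows denote “refers to”.]
“Refers to” is what constitutes the meaning of representational units. Therefore O-#0, which refers to nothing in reality, is meaningless.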
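A minimal sketch of this point, assuming a plain dictionary stands in for the “refers to” arrows. `REFERS_TO` and `meaningless_units` are hypothetical names; the pairing of O-#1 and O-#2 with the universals U1 and U2 is our assumption for illustration, while the facts that IUI-#3 refers to p3 and that O-#0 refers to nothing follow the slides.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot of the figure: unit -> entity in reality it refers to,
# or None when the unit refers to nothing.
REFERS_TO: Dict[str, Optional[str]] = {
    "IUI-#3": "p3",  # a particular
    "O-#0": None,    # e.g. 'diabolic possession': no such universal exists
    "O-#1": "U1",    # assumed pairing
    "O-#2": "U2",    # assumed pairing
}


def meaningless_units(refers_to: Dict[str, Optional[str]]) -> List[str]:
    """Units whose meaning cannot be grounded because they refer to nothing."""
    return sorted(unit for unit, referent in refers_to.items() if referent is None)


print(meaningless_units(REFERS_TO))  # ['O-#0']
```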
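Milestones in scientific discovery and case assessment
[Figure: reality (R), containing the universals U1 and U2 and the particular p3, set against beliefs (B), containing IUI-#3, O-#0, O-#1 and O-#2.]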
Belief-reality mismatches in ontologies
Total ignorance: a disease already exists but has gone unnoticed thus far.
False belief in the existence of a type, e.g., ‘diabolic possession’.
The coming into existence of a new universal remains unnoticed: ‘AIDS’ existed before it was discovered.
An advance in science: the existence of U1 is acknowledged.
Reality versus beliefs, both in evolution
The existence of John Doe’s aberrant behavior (p3) is acknowledged; however, it is believed to be an instance of what in reality is a fantasy: diabolic possession.
Another advance in science: the false belief O-#0 is rightly abandoned, which makes it necessary to reconsider what p3 must be believed to be an instance of.
Advance in science: “AIDS” is discovered as a new disease entity.
For ‘utilitarian’ reasons, policy makers force smoking to be removed from an ontology as a risk factor for cancer.
An “optimal” ontology (1)
Ontologies, as conceived in realist terms:
- are artifacts created for some purpose (e.g., to serve as a controlled vocabulary, or to provide domain knowledge to a software application);
- are at the same time intended to mirror reality;
- should allow reasoning that is efficient from a computational point of view.
We therefore argue that an optimal ontology should constitute a representation of all and only those portions of reality that are relevant for its purpose.
An “optimal” ontology (2)
Each term in such an ontology would designate (1) a single portion of reality (POR), which is (2) relevant to the purposes of the ontology, and such that (3) the authors of the ontology intended to use this term to designate this POR; and (4) there would be no PORs objectively relevant to these purposes that are not referred to in the ontology.
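The four conditions can be restated as set checks. This sketch assumes, purely for illustration, that we are handed the benchmark nobody actually has: the set of PORs relevant to the purpose, each term's intended POR, and the PORs each term actually designates. `Term` and `optimality_violations` are hypothetical names. Because that benchmark is unavailable in practice, the approach presented here measures improvement between versions instead.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Term:
    label: str
    designates: FrozenSet[str]  # PORs the term actually picks out
    intended: str               # POR the authors meant it to pick out


def optimality_violations(terms: List[Term], relevant: Set[str]) -> List[str]:
    """Return human-readable violations of conditions (1)-(4)."""
    problems: List[str] = []
    covered: Set[str] = set()
    for t in terms:
        if len(t.designates) != 1:  # (1) a single POR
            problems.append(f"{t.label}: designates {len(t.designates)} PORs")
            continue
        (por,) = t.designates
        if por not in relevant:  # (2) relevant to the purpose
            problems.append(f"{t.label}: {por} is not relevant")
        if por != t.intended:  # (3) designates what was intended
            problems.append(f"{t.label}: designates {por}, intended {t.intended}")
        covered.add(por)
    for por in sorted(relevant - covered):  # (4) no relevant POR left out
        problems.append(f"missing term for relevant POR {por}")
    return problems


terms = [Term("male", frozenset({"Male"}), "Male")]
print(optimality_violations(terms, {"Male", "Female"}))
# ['missing term for relevant POR Female']
```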
But things may go wrong …
- Assertion errors: ontology developers may be in error as to what is the case in their target domain.
- Relevance errors: they may be in error as to what is objectively relevant to a given purpose.
- Encoding errors: they may not successfully encode their underlying cognitive representations, so that particular representational units fail to point to the intended PORs.
Key requirement for versioning
Any change in an ontology or data repository should be associated with the reason for that change, so that it can later be assessed what kind of mistake has been made!
Example: the gender of a person (in this room) in the EHR
In John Smith’s EHR: “male” at t1, “female” at t2. What are the possibilities?
- Change in reality: transgender surgery; change in legal self-identification.
- Change in understanding: it was female from the very beginning but was interpreted wrongly.
- Correction of a data entry mistake (it was understood as male, but wrongly transcribed).
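One way to meet this requirement is to make the reason a mandatory field of every change record. The sketch below uses a hypothetical `ChangeReason` enumeration mirroring the three possibilities for John Smith's record; the mapping of each reason to the error types introduced earlier (no error, assertion error, encoding error) is our reading of the example, not a scheme given in the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ChangeReason(Enum):
    CHANGE_IN_REALITY = "change in reality"              # e.g. surgery
    CHANGE_IN_UNDERSTANDING = "change in understanding"  # misinterpreted before
    DATA_ENTRY_CORRECTION = "correction of data entry mistake"


# Assumed mapping: what kind of mistake the previous entry embodied.
MISTAKE_IN_PREVIOUS: Dict[ChangeReason, Optional[str]] = {
    ChangeReason.CHANGE_IN_REALITY: None,  # the old entry was right at the time
    ChangeReason.CHANGE_IN_UNDERSTANDING: "assertion error",
    ChangeReason.DATA_ENTRY_CORRECTION: "encoding error",
}


@dataclass(frozen=True)
class Change:
    unit: str
    old_value: str
    new_value: str
    reason: ChangeReason  # mandatory: no change without a reason

    def previous_mistake(self) -> Optional[str]:
        return MISTAKE_IN_PREVIOUS[self.reason]


c = Change("John Smith: gender", "male", "female", ChangeReason.DATA_ENTRY_CORRECTION)
print(c.previous_mistake())  # encoding error
```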
Ways representational units do or do not refer
- OE: objective existence
- ORV: objective relevance
- BE: belief in existence
- BRV: belief in relevance
- Int.: intended encoding
- Ref.: manner in which the expression refers
- G: typology which results when the factor of external reality is ignored
- E: number of errors when measured against the benchmark of reality
- P/A: presence/absence of term
OE/BE value pairs:
- Y/Y: correct assertion of the existence of a POR;
- Y/N: lack of awareness of a POR, reflecting an assertion error;
- N/N: correct assertion that some putative POR does not exist;
- N/Y: false belief that some putative POR exists.
Ref.: manner in which the expression refers:
- R+: the encoding of the belief is correct;
- R: the encoding is incorrect because the unit does not refer;
- R-: the unit does refer, but to a POR other than the one intended.
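These factors can be combined into a simple error count. The sketch below is a simplification made for illustration: it counts one error per mismatch between OE and BE, one per mismatch between ORV and BRV, and one when a present term is not R+. It does not reproduce the full typology table (types such as P+1 or A+2); `Ref`, `EXISTENCE_CASES` and `unit_errors` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Ref(Enum):
    """Manner in which a unit refers (Ref. column)."""
    R_PLUS = "R+"   # the encoding of the belief is correct
    R = "R"         # incorrect: the unit does not refer at all
    R_MINUS = "R-"  # refers, but to a POR other than the intended one


# OE/BE value pairs as listed above (True = Y, False = N).
EXISTENCE_CASES = {
    (True, True): "correct assertion of the existence of a POR",
    (True, False): "lack of awareness of a POR (assertion error)",
    (False, False): "correct assertion that a putative POR does not exist",
    (False, True): "false belief that a putative POR exists",
}


def unit_errors(oe: bool, orv: bool, be: bool, brv: bool,
                ref: Optional[Ref]) -> int:
    """Simplified error score (0 or negative) of one unit against reality.

    Assumption (ours): one error per existence mismatch, one per relevance
    mismatch, and one if a present term is not R+.
    `ref` is None when no term is present in the ontology.
    """
    encoding_error = ref is not None and ref is not Ref.R_PLUS
    return -((oe != be) + (orv != brv) + encoding_error)


print(EXISTENCE_CASES[(True, False)])
print(unit_errors(oe=True, orv=False, be=True, brv=True, ref=Ref.R))  # -2
```

Under this assumption, the configuration described at time t in the backward belief revision example (POR exists and is not relevant, believed relevant, unit does not refer) scores -2, the value given there.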
Possible evolutions through versions
Example: a relevant entity ceases to exist, but the representation is not updated.
Updating is an active process
Authors assume in good faith that all included representational units are of type P+1, and that all units they are aware of but have not included are of type A+1 or A+2. If they become aware of a mistake, they make a change on the assumption that the change, too, leads to a P+1, A+1 or A+2 case. Thus, at the time of the change, they know what type the previous entry must have been, given their belief about what the current one is and the reason for the change.
This leads to a calculus …
whose aim is NOT to demonstrate how good an individual version of an ontology is, but rather to measure how much it has (hopefully) improved compared with its predecessors.
Principle: recursive belief revision.
Backward belief revision over time
[Figure: reality: a POR exists and is not relevant; beliefs at t about t: -2.]
At time t, the authors of an ontology correctly perceive the existence of some universal, but consider it relevant while it is not, and they make an encoding error such that the representational unit does not refer. There is thus an error of -2 with respect to reality, which of course remains unknown to them.
[Figure adds: beliefs at t+1 about t+1, and at t+1 about t.]
At t+1, they correct the encoding mistake, which forces them to believe that at t the unit-reality configuration was of type P-4 rather than P+1.
[Figure adds the values -1 and -1.]
Although they believe that the current situation is P+1, it is in reality P-6, where it was P-7 before. The real error is now -1, while the perceived error with respect to t is also -1.
At t+2, the authors believe that the posited POR in fact does not exist.
[Figure adds: beliefs at t+2 about t+2, at t+2 about t+1, and at t+2 about t; it shows the values -1, -3 and -5.]
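The values for t and t+1 in this example can be reproduced with a mismatch count applied recursively: a version is scored against reality (the real error, unknown to the authors) and against the authors' later beliefs, which stand in for reality (the perceived error). This is a sketch under the simplifying assumption that every mismatch in existence, relevance or encoding counts as one error; it does not compute type labels such as P-4 or P-6, does not attempt the t+2 values, and the names `State` and `score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    exists: bool     # existence of the POR (in reality, or as believed)
    relevant: bool   # relevance of the POR (in reality, or as believed)
    refers_ok: bool  # the unit refers to the intended POR (R+)


def score(version: State, benchmark: State) -> int:
    """Negative count of mismatches between a version and a benchmark."""
    return -sum(
        getattr(version, f) != getattr(benchmark, f)
        for f in ("exists", "relevant", "refers_ok")
    )


# Reality: the POR exists and is not relevant; correct encoding is the norm.
reality = State(exists=True, relevant=False, refers_ok=True)

# Time t: existence right, relevance wrong, encoding error.
at_t = State(exists=True, relevant=True, refers_ok=False)
# Time t+1: the encoding mistake is corrected; the relevance error remains.
at_t1 = State(exists=True, relevant=True, refers_ok=True)

print(score(at_t, reality))   # -2: real error at t (unknown to the authors)
print(score(at_t1, reality))  # -1: real error at t+1
print(score(at_t, at_t1))     # -1: perceived error, at t+1, about t
print(score(at_t1, at_t1))    # 0: at t+1 the authors see no error in t+1
```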
Can this be implemented?
The manual burden is low: documenting the reason for a change amounts to clicking one radio button. The resulting belief revisions can be computed automatically from the table shown earlier.
Future directions
- Using this approach for sampling large ontologies
- Integration of confidence levels, replacing the Y/N dichotomy for beliefs
- A perhaps more elaborate way of counting errors
- Use as a tool for educational purposes, and to compare the beliefs of various stakeholders, not least concerning relevance (clinicians, biologists, informaticians, …)