Medical Ontologies: An Overview Barry Smith
2 IFOMIS Institute for Formal Ontology and Medical Information Science Faculty of Medicine University of Leipzig
ifomis.de 3 Partners
Laboratory for Applied Ontology, Trento and Rome
Language & Computing nv, Zonnegem, Belgium
Ontology Works, Baltimore
Structural Informatics Group, Department of Biological Structure, University of Washington, Seattle, USA
Cognitive Science Laboratory, Princeton University
ifomis.de 4 Three levels of ontology
1) formal (top-level) ontology: deals with categories employed in every domain, e.g. object, event, whole, part, instance, class
2) domain ontology: applies the top-level system to a particular domain, e.g. cell, gene, drug, disease, therapy
3) terminology-based ontology: a large, lower-level system, e.g. Dupuytren’s disease of palm, nodules with no contracture
ifomis.de 7 IFOMIS Institute for Formal Ontology and Medical Information Science Leipzig philosophers and medical informaticians attempting to build and test a Basic Formal Ontology for applications in biomedical and related domains
ifomis.de 8 IFOMIS use basic principles of philosophical ontology for quality assurance and alignment of biomedical ontologies
ifomis.de 9 Compare:
1) pure mathematics (theories of structures such as order, set, function, mapping), employed in every domain
2) applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains
3) physical chemistry, biophysics, etc. = adding detail
ifomis.de 10 Three levels of ontology
1) formal (top-level) ontology = medical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics
2) domain ontology = UMLS Semantic Network, GALEN CORE
3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA ?????
ifomis.de 11 Strategy
Part 1: Provide an overview of medical ontologies and of the top-level ontologies which they implicitly define
Part 2: Show how principles of classification and definition derived from top-level ontology can help in quality assurance of terminology-based ontologies and in ontology alignment
Part 3: The Gene Ontology
Part 4: Medical Fact Net
ifomis.de 13 UMLS Semantic Network
entity
    physical object
    conceptual entity
event
ifomis.de 15 conceptual entity: Organism Attribute; Finding; Idea or Concept; Occupation or Discipline; Organization; Group; Group Attribute; Intellectual Product; Language
ifomis.de 16 conceptual entity > idea or concept > functional concept > body system
ifomis.de 17 entity
    physical object
    conceptual entity
        idea or concept
            functional concept
                body system
confusion of entity and concept
ifomis.de 18 Functional Concept: Body System is_a Functional Concept. But concepts do not perform functions or have physical parts.
ifomis.de 19 This: is not a concept
ifomis.de 20 The Hydraulic Equation BP = CO*PVR arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR)
ifomis.de 21 Confusion of Ontology and Epistemology On the UMLS Semantic Network's classification, blood pressure is an Organism Function, while cardiac output is a Laboratory or Test Result or a Diagnostic Procedure. BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure.
ifomis.de 22 entities
independent continuants | dependent continuants | occurrents (always dependent)
ORGANISMS               | ROLES                 | PROCESSES
CELLS                   | FUNCTIONS             | HISTORIES
MOLECULES               | CONDITIONS            | LIVES
                        | (diseases)            | (courses of diseases)
ifomis.de 23 entities
independent continuants | dependent continuants | occurrents (always dependent)
ORGANISMS               | ROLES                 | PROCESSES
CELLS                   | FUNCTIONS             | HISTORIES
MOLECULES               | CONDITIONS            | LIVES
                        | (diseases)            | (courses of diseases)
classes | instances
ifomis.de 24 A three-category ontology along these lines accepted by:
DOLCE = first module of Semantic Web Wonderweb Foundational Ontologies Library
BFO = IFOMIS Basic Formal Ontology
L&C LinKBase
UMLS-SN
Gene Ontology
Principles for Building Medical Ontologies
ifomis.de 27 Examples
Don’t confuse entities with concepts
Don’t confuse domain entities with logical or computational structures
Don’t confuse ontology with epistemology
Don’t confuse is_a with has_role
ifomis.de 28 Further Principles
univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use
UMLS-SN: ‘organization’ = body plan; ‘organization’ = social organization
ifomis.de 29 univocity
Gene Ontology:
‘part_of’ = ‘can be part of’ (flagellum part_of cell)
‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm)
‘part_of’ = ‘is included as a sublist in’
ifomis.de 30 don’t forget instances
part_of as a relation between classes vs. part as a relation between instances
A part_of B
1. every instance of A is part of some instance of B
2. every instance of B has some instance of A as part
ifomis.de 31 Part_of as a relation between classes is more problematic than is standardly supposed
testis part_of human being ?
heart part_of human being ?
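The two numbered conditions on A part_of B can be tried out on toy data. What follows is a minimal sketch only: the instance identifiers, the `part` pairs and the helper names `all_some_part_of` and `all_some_has_part` are hypothetical and are not taken from GO, BFO or any other system mentioned here.

```python
# Hypothetical toy data: instance-level "part" facts stored as (part, whole) pairs.
instances = {
    "heart": {"heart_1", "heart_2"},
    "testis": {"testis_1"},
    "human being": {"human_1", "human_2"},
}
part = {
    ("heart_1", "human_1"),
    ("heart_2", "human_2"),
    ("testis_1", "human_1"),
}

def all_some_part_of(a, b):
    """Condition 1: every instance of A is part of some instance of B."""
    return all(any((x, y) in part for y in instances[b]) for x in instances[a])

def all_some_has_part(a, b):
    """Condition 2: every instance of B has some instance of A as part."""
    return all(any((x, y) in part for x in instances[a]) for y in instances[b])

print(all_some_part_of("heart", "human being"))    # True
print(all_some_has_part("heart", "human being"))   # True
print(all_some_part_of("testis", "human being"))   # True
print(all_some_has_part("testis", "human being"))  # False
```

In this toy world the heart satisfies both conditions, while the testis satisfies only the first, which is one way to see why the class-level reading of part_of needs to be stated explicitly.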
ifomis.de 32 objectivity which classes exist is not a function of our biological knowledge. (Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.) GO: aminoadipate-semialdehyde dehydrogenase complex is_a unlocalized
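A check for the objectivity principle could look roughly like this. It is a sketch under stated assumptions: the list `EPISTEMIC_TERMS` and the function `flag_epistemic_parents` are hypothetical names, and the is_a links are simply the examples quoted in these slides, typed in by hand.

```python
# Terms that report the state of our knowledge rather than a biological kind.
EPISTEMIC_TERMS = {"unknown", "unclassified", "unlocalized"}

# is_a links quoted in the slides (entered by hand, not read from a GO file).
is_a_links = [
    ("aminoadipate-semialdehyde dehydrogenase complex", "unlocalized"),
    ("Holliday junction helicase complex", "unlocalized"),
    ("protein storage vacuole", "vacuole (sensu Streptophyta)"),
]

def flag_epistemic_parents(links):
    """Return the is_a links whose parent is an epistemic placeholder term."""
    return [(child, parent) for child, parent in links
            if parent.lower() in EPISTEMIC_TERMS]

flagged = flag_epistemic_parents(is_a_links)
for child, parent in flagged:
    print(f"{child} is_a {parent}")
```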
ifomis.de 33 rules for definitions intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined definitions: do not confuse definitions with the communication of new knowledge
ifomis.de 34 substitutability in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin GO: : toxin activity Definition: Acts as to cause injury to other living organisms.
ifomis.de 35 substitutability There is toxin activity here There is acts as to cause injury to other living organisms here
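The substitution test can be mechanised in a few lines. This is only a sketch: the helper `substitute` is a hypothetical name, and lowercasing the first letter and dropping the final period are assumptions made so that the definition can sit inside a sentence.

```python
import re

def substitute(sentence, term, definition):
    """Replace each whole-phrase occurrence of `term` by `definition`."""
    pattern = r"\b" + re.escape(term) + r"\b"
    # A function is used as replacement so that the definition is inserted literally.
    return re.sub(pattern, lambda match: definition, sentence)

sentence = "There is toxin activity here"
definition = "Acts as to cause injury to other living organisms."

# Assumption: strip the final period and lowercase the first letter.
inline = definition.rstrip(".")
inline = inline[0].lower() + inline[1:]

result = substitute(sentence, "toxin activity", inline)
print(result)  # There is acts as to cause injury to other living organisms here
```

The output is ungrammatical, which is the point: the definition fails the substitutability rule.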
ifomis.de 37 GO: the Gene Ontology 3 large telephone directories of standardized designations for gene functions and products organized into hierarchies via is_a and part_of
ifomis.de 38 GO can in practice be used only by trained biologists (with know-how): whether a GO term truly stands in the is_a or part_of relation depends e.g. on the type of organism involved (glycosome is part_of cytoplasm only for Kinetoplastidae). Computers have no counterpart of such context-dependent know-how.
ifomis.de 39 GO divided into three disjoint term hierarchies the cellular component ontology, e.g. flagellum, chromosome, cell the molecular function ontology, e.g. ice nucleation, binding, protein stabilization the biological process ontology, e.g. glycolysis, death
ifomis.de 40 Primary aim of GO not rigorous definition and principled classification but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products
ifomis.de 41 Thesis 1 With increasing size, GO will be required to increase the degree to which it is a controlled vocabulary which satisfies not merely the needs of human biologists but also the needs of automatic consistency-checking and updating systems
ifomis.de 42 Thesis 2 GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously
ifomis.de 43 GO: the Gene Ontology GO divided into 3 separate hierarchies each organized via is_a and part_of
ifomis.de 44 Problems with is_a A is_a B = every instance of A is an instance of B
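Read over instances, the definition amounts to a subset test. A minimal sketch, assuming invented instance identifiers (`v1` to `v3`) and a hypothetical helper named `is_a`:

```python
# Hypothetical instance sets; the identifiers v1..v3 are invented for illustration.
instances = {
    "vacuole": {"v1", "v2", "v3"},
    "protein storage vacuole": {"v1", "v2"},
}

def is_a(a, b):
    """A is_a B holds when every instance of A is also an instance of B."""
    return instances[a] <= instances[b]

print(is_a("protein storage vacuole", "vacuole"))  # True
print(is_a("vacuole", "protein storage vacuole"))  # False
```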
ifomis.de 45 Problems with is_a
Holliday junction helicase complex is_a unlocalized
protein storage vacuole is_a vacuole (sensu Streptophyta)
R7 differentiation is_a eye photoreceptor differentiation (sensu Drosophila)
ifomis.de 46 Uses of part_of
– membrane part-of cell, intended to mean “a membrane is a part-of any cell”
– flagellum part-of cell, intended to mean “a flagellum is part-of some cells”
– replication fork part-of cell cycle, intended to mean: “a replication fork is part-of the nucleoplasm only during certain times of the cell cycle”
– regulation of sleep part-of sleep, should be corrected to: “regulation of sleep is co-located with and is causally involved with the sleep process”.
ifomis.de 47 Problems with part_of ‘part_of’ = ‘can be part of’ (flagellum part_of cell) ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm) ‘part_of’ = ‘is included as a sublist in’
ifomis.de 48 Problems with GO Molecular Functions
anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”)
enzyme activity (defined as: “a substance that catalyzes”)
structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)
ifomis.de 49 GO: : structural constituent of cell wall Definition: The action of a molecule that contributes to the structural integrity of a cell wall. confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology
ifomis.de 50
extracellular matrix structural constituent +
puparial glue (sensu Diptera)
structural constituent of bone
structural constituent of chorion (sensu Insecta)
structural constituent of chromatin
structural constituent of cuticle +
structural constituent of cytoskeleton
structural constituent of epidermis +
structural constituent of eye lens
structural constituent of muscle
structural constituent of myelin sheath
structural constituent of nuclear pore
structural constituent of peritrophic membrane (sensu Insecta)
structural constituent of ribosome
structural constituent of tooth enamel
structural constituent of vitelline membrane (sensu Insecta)
ifomis.de 51 Why do these problems arise? Because GO has no clear formal understanding of the role of temporal relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function – GO runs these two together)
ifomis.de 52 As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
ifomis.de 53 Problems with GO’s compositionality: sensu, /, :, +, with, from, in, resulting, regulating, regulation of, complex, constituting, constitution
ifomis.de 54 /
GO: microtubule/kinetochore interaction =df Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex.
GO: ciliary/flagellar motility =df Locomotion due to movement of cilia or flagella.
ifomis.de 55 / GO: negative regulation of chromatin assembly/disassembly =df Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly GO: G1/S transition of mitotic cell cycle defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.
ifomis.de 56 / GO: interpretation of nuclear/cytoplasmic to regulate cell growth =df The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.
ifomis.de 57 / GO: hexuronate (glucuronate/galacturonate) porter activity =df Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)
ifomis.de 58 Problems with GO’s consistency
GO: host cell cytoplasm part-of GO: host
host cell cytoplasm =df “The cytoplasm of a host cell.”
host =df “Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection.”
ifomis.de 59 Cellular Component: another problem with ‘host’
It is not a cellular component (and not a molecular function, and not a biological process, either)
GO has: ‘adult walking behavior’ but not ‘adult’ or ‘walking’
GO has: ‘eye pigmentation’ but not ‘eye’
ifomis.de 60 Solution
Link GO to external ontologies:
1. of organism types, to solve the sensu problem
2. of anatomy, to solve the eye problem
3. of coarse medical reality, to solve the adult walking behavior problem (see MFN below)
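The kind of automatic consistency check at issue can be sketched as follows. The sketch assumes, following the argument of the slide, that ‘host’ belongs to none of GO's three hierarchies; the dictionary `hierarchy_of` and the function `inconsistent_links` are hypothetical names, and the data are typed in by hand rather than read from GO.

```python
# Hypothetical assignment of terms to hierarchies. None marks a term that,
# on the slide's argument, fits none of GO's three hierarchies.
hierarchy_of = {
    "host cell cytoplasm": "cellular component",
    "flagellum": "cellular component",
    "cell": "cellular component",
    "host": None,  # an organism, not a cellular component
}

part_of_links = [
    ("flagellum", "cell"),
    ("host cell cytoplasm", "host"),
]

def inconsistent_links(links):
    """Return part_of links whose two terms do not share a hierarchy."""
    return [(child, parent) for child, parent in links
            if hierarchy_of.get(child) != hierarchy_of.get(parent)]

problems = inconsistent_links(part_of_links)
print(problems)  # [('host cell cytoplasm', 'host')]
```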
ifomis.de 61 note that such linkages are possible only if GO itself has a coherent formal architecture
ifomis.de 63 Medical Fact Net
Medical Belief Net (MBN) = large, heterogeneous, open-source corpus of medical sentences in the English language, expressed in the form of grammatically complete statements and assessed by the degree to which they are understandable and assented to by typical non-expert human subjects.
Medical Fact Net (MFN) = subclass of MBN receiving high marks on the scale of correctness from medical experts.
MFN = intersection of non-expert beliefs about medical phenomena and truths validated by medical experts.
ifomis.de 64 Medical Word Net = lexical database extending the Princeton WordNet by all the medical terms encountered in MBN
First in (US) English, then in German
First for adults, then for children …
First for medicine, then for …
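The relation between MBN and MFN can be pictured as a filter over scored sentences. This is a sketch only: the three sentences, both score fields, the thresholds and the function name `select_mfn` are invented for illustration and are not part of the MFN design described here.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    lay_score: float     # hypothetical 0..1: understood and assented to by non-experts
    expert_score: float  # hypothetical 0..1: correctness as rated by medical experts

# Invented example sentences with invented scores.
mbn = [
    Sentence("Chickenpox is contagious.", 0.9, 0.95),
    Sentence("Colds are caused by cold weather.", 0.8, 0.1),
    Sentence("Varicella zoster virus establishes latency in sensory ganglia.", 0.2, 0.98),
]

def select_mfn(corpus, lay_min=0.5, expert_min=0.8):
    """Keep sentences that non-experts assent to and experts rate as correct."""
    return [s for s in corpus
            if s.lay_score >= lay_min and s.expert_score >= expert_min]

mfn = select_mfn(mbn)
print([s.text for s in mfn])  # ['Chickenpox is contagious.']
```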
ifomis.de 65 MBN/MFN/MWN Formal Architecture Semi-automatically generated graph-based parsing of each sentence + formal ontology of all MFN entities and relationships + mapping into the UMLS Metathesaurus.
ifomis.de 66 Evaluation MFN will be integrated into an existing term-search-based on-line consumer health portal in such a way that MFN sentences are used to direct users to information sources. We will then measure the degree to which this results in greater user satisfaction by setting up an experiment in which customers of the portal are randomly assigned to one of two groups: one to which access to MFN is offered, and another for which simple term-searching is used.
ifomis.de 67 Significance Non-expert language of family members, advisors, administrators, nurses, paramedics, lawyers … Research on differences between everyday language and technical language
ifomis.de 68 Mismatches in Doctor-Patient Communication Question Text: My seven-year-old son developed a rash today that I believe to be chickenpox. My concern is that a friend of mine had her 10-day- old baby at my home last evening before we were aware of the illness. […] Is there cause for concern at this point? Answer Text: Chickenpox is the common name for varicella infection. [...] You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. [...]
ifomis.de 69 Non-Expert Language in Online Communication Need to integrate free text and structured data. E-health services need automatic ways to respond to questions in standard forms, and to provide internet-accessible medical knowledge that is both reliable and accessible to the non-expert.
ifomis.de 70 Diagnostic decision support We might associate collections of utterances stored in MBN describing symptoms sourced to single patients with metadata recording subsequent diagnosis. Trained on this corpus, the system could establish patterns of association between specific sequences of utterances and specific diseases; one could then test the degree to which such associations are sufficiently strong as to produce usable automatic diagnosis on the basis of patient inputs.
ifomis.de 71 Medical education/medical literacy Use MBN to evaluate the reliability of the medical knowledge of different non-expert communities. Use MFN to develop tools to support face-to-face education of lay people in the fields of medicine and health care. MBN provides opportunities for a new type of research in the field of consumer health, e.g. on basic kinds in the medical domain à la Eleanor Rosch.
ifomis.de 72 Medical Coverage in WordNet 2.0 WordNet’s coverage of domains like medicine, physics, and geology is very limited. Its coverage of medical terms represents a mixture of folk and expert vocabulary.
ifomis.de 73 MFN: From Words to Facts Do for (non-expert) medicine what Beilstein’s Fact Database does for (expert) Biochemistry. Relation to CYC; relation to FrameNet; Botany Knowledge Base; DARPA’s Rapid Knowledge Formation project.
ifomis.de 74 Sources
Lexical knowledge bases, such as:
a. the relevant general lexical information contained in WordNet
b. lexical knowledge-bases of lay medical vocabulary
c. medical dictionaries and large medical terminology and ontology systems such as the UMLS Specialist Lexicon, the Foundational Model of Anatomy
Statement or fact knowledge bases, such as:
d. open-source linguistic corpora, public health documents, internet resources
e. the relevant example sentences in the FrameNet and WordNet corpora
f. free text sources
g. the results of transforming the content of lexical knowledge bases (especially WordNet) into statements
ifomis.de 75 Generation from lexical databases Treat a database like WordNet or LinKBase as a set of links tLt' between terms (where L ranges over 'is-a', 'part-of', 'is-caused-by', etc.). We form the subset of this set by restricting the values of t and t' to those terms which occur in MWN. Some members of the resulting class of tLt' formulas can then be transformed into English sentences automatically. For example, each t is-a t' formula can be transformed into a sentence of the form 'a t is a type of t''. Other tLt' formulas can be converted by hand into English sentences; for example, "forearm HAS-PARTIAL-MATERIAL-OVERLAP wrist" can be transformed into "the forearm overlaps with the wrist" and "the wrist overlaps with the forearm".
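The generation step can be sketched in a few lines. Apart from the forearm/wrist overlap link quoted above, the links and the MWN vocabulary below are hypothetical, as are the function names `restrict` and `generate`; no real WordNet or LinKBase interface is used.

```python
# Hypothetical links t L t'. Only the forearm/wrist overlap comes from the slide.
links = [
    ("forearm", "part-of", "arm"),
    ("wrist", "is-a", "joint"),
    ("forearm", "HAS-PARTIAL-MATERIAL-OVERLAP", "wrist"),
    ("neutrino", "is-a", "lepton"),
]

# Hypothetical stand-in for the set of terms occurring in MWN.
mwn_terms = {"forearm", "arm", "wrist", "joint"}

def restrict(all_links, vocabulary):
    """Keep only links whose two terms both occur in the vocabulary."""
    return [(t, rel, t2) for t, rel, t2 in all_links
            if t in vocabulary and t2 in vocabulary]

def generate(restricted):
    """Turn is-a links into sentences; set other links aside for hand conversion."""
    sentences, by_hand = [], []
    for t, rel, t2 in restricted:
        if rel == "is-a":
            sentences.append(f"a {t} is a type of {t2}")
        else:
            by_hand.append((t, rel, t2))
    return sentences, by_hand

sentences, by_hand = generate(restrict(links, mwn_terms))
print(sentences)  # ['a wrist is a type of joint']
print(by_hand)
```

Only the is-a template is automated here, since that is the one pattern the text spells out; everything else is returned for manual conversion.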
ifomis.de 76 Problems to be Addressed “generic medical knowledge of (non-expert) adults”
ifomis.de 77 Genericity: Much generic medical knowledge relates to what holds for the most part or in most cases or in a statistically significant fraction of cases (consider: smoking causes cancer).
ifomis.de 78 Medical knowledge is intertwined with knowledge of other domains (things that can be involved in an accident …)
ifomis.de 79 Knowledge Much medical knowledge of experts and non-experts alike takes the form of knowledge of specific cases (Aunt Mary’s arthritis is always worse in the winter). MFN should be a repository of medical knowledge that is generic and context-independent, the counterpart of the theoretical knowledge of the sciences. Note that lexical knowledge of the sort stored in WordNet, too, is both generic and context-independent.
ifomis.de 80 Expertise: a crisp separation of expert and non-expert sentences is impossible (Viagra, anthrax, HIV, Prozac, SARS); experimental design needed to avoid artifacts
ifomis.de 81 Completeness Problem elementary facts: People have two eyes. Babies are born. Arms move. WordNet contains some coverage particularly of elementary facts of the A is type/part of B form in virtue of their specific formal architectures WordNet synsets can be used to generate long lists of elementary facts from single starting points
ifomis.de 82 Six Transform MWN into a large corpus of generic beliefs by turning WordNet on its side; that is we transform a relation such as {t1, …, tn} IS-A {t´1, …, t´m} into n x m sentences of the form: ti IS-A t´k and impose filters
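The n x m expansion is a Cartesian product over the two synsets. A minimal sketch, in which the synsets, the function name `expand` and the example filter are hypothetical and nothing is read from WordNet itself:

```python
from itertools import product

def expand(synset, hypernym_synset, keep=lambda t, h: True):
    """Turn {t1..tn} IS-A {t'1..t'm} into up to n x m sentences, then filter."""
    return [f"{t} IS-A {h}"
            for t, h in product(synset, hypernym_synset)
            if keep(t, h)]

# Hypothetical synsets, listed in a fixed order so the output is reproducible.
synset = ["chickenpox", "varicella"]
hypernyms = ["contagious disease", "communicable disease"]

all_sentences = expand(synset, hypernyms)
print(len(all_sentences))  # 4

# Hypothetical filter: drop the technical synonym to keep lay vocabulary only.
lay_sentences = expand(synset, hypernyms, keep=lambda t, h: t != "varicella")
print(lay_sentences)
```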
ifomis.de 83 A New Kind of Linguistics MFN is part and parcel of recent attempts in the biomedical sciences to confront problems of similar scope in the development of large fact repositories such as KEGG or Swiss-Prot. In its final form it should be consistent with the knowledge that is contained also in other fact repositories, both at the expert and the non-expert level, and serve to integrate them together in a federated database.
ifomis.de 84 “Adult walking behavior” will be freed from its lonely status inside GO
ifomis.de 85 The End