Medical Ontologies: An Overview Barry Smith


1 Medical Ontologies: An Overview Barry Smith http://ifomis.de

2 IFOMIS: Institute for Formal Ontology and Medical Information Science, Faculty of Medicine, University of Leipzig

3 Partners: Laboratory for Applied Ontology, Trento and Rome; Language & Computing nv, Zonnegem, Belgium; Ontology Works, Baltimore; Structural Informatics Group, Department of Biological Structure, University of Washington, Seattle, USA; Cognitive Science Laboratory, Princeton University

4 Three levels of ontology
1) formal (top-level) ontology, dealing with categories employed in every domain: object, event, whole, part, instance, class
2) domain ontology, which applies the top-level system to a particular domain: cell, gene, drug, disease, therapy
3) terminology-based ontology, a large, lower-level system: Dupuytren’s disease of palm, nodules with no contracture

7 IFOMIS: Institute for Formal Ontology and Medical Information Science, Leipzig (http://ifomis.de): philosophers and medical informaticians attempting to build and test a Basic Formal Ontology for applications in biomedical and related domains

8 IFOMIS: use basic principles of philosophical ontology for quality assurance and alignment of biomedical ontologies

9 Compare: 1) pure mathematics (theories of structures such as order, set, function, mapping), employed in every domain 2) applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains 3) physical chemistry, biophysics, etc. = adding detail

10 Three levels of ontology 1) formal (top-level) ontology = medical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics 2) domain ontology = UMLS Semantic Network, GALEN CORE 3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA ?????

11 Strategy. Part 1: Provide an overview of medical ontologies and of the top-level ontologies which they implicitly define. Part 2: Show how principles of classification and definition derived from top-level ontology can help in quality assurance of terminology-based ontologies and in ontology alignment. Part 3: The Gene Ontology. Part 4: Medical Fact Net

13 UMLS Semantic Network: entity (physical object, conceptual entity); event

15 conceptual entity: Organism Attribute, Finding, Idea or Concept, Occupation or Discipline, Organization, Group, Group Attribute, Intellectual Product, Language

16 conceptual entity > idea or concept > functional concept > body system

17 entity (physical object, conceptual entity); conceptual entity > idea or concept > functional concept > body system: confusion of entity and concept

18 Functional Concept: Body system is_a Functional Concept. But: concepts do not perform functions or have physical parts.

19 This [image not preserved] is not a concept

20 The Hydraulic Equation: BP = CO*PVR. Arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).

21 Confusion of Ontology and Epistemology: blood pressure is an Organism Function; cardiac output is a Laboratory or Test Result or Diagnostic Procedure. BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure.

22 entities: independent continuants (ORGANISMS, CELLS, MOLECULES); dependent continuants (ROLES, FUNCTIONS, CONDITIONS (diseases)); occurrents, always dependent (PROCESSES, HISTORIES, LIVES (courses of diseases))

23 entities: independent continuants (ORGANISMS, CELLS, MOLECULES); dependent continuants (ROLES, FUNCTIONS, CONDITIONS (diseases)); occurrents, always dependent (PROCESSES, HISTORIES, LIVES (courses of diseases)). The diagram is additionally labeled: classes, instances

24 A three-category ontology along these lines accepted by DOLCE = first module of Semantic Web Wonderweb Foundational Ontologies Library BFO = IFOMIS Basic Formal Ontology L&C LinKBase UMLS-SN Gene Ontology

26 Principles for Building Medical Ontologies

27 Examples: Don’t confuse entities with concepts. Don’t confuse domain entities with logical or computational structures. Don’t confuse ontology with epistemology. Don’t confuse is_a with has_role.

28 Further Principles. Univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use. UMLS-SN: ‘organization’ = body plan; ‘organization’ = social organization

29 Univocity. Gene Ontology: ‘part_of’ = ‘can be part of’ (flagellum part_of cell); ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm); ‘part_of’ = ‘is included as a sublist in’

30 Don’t forget instances: part_of as a relation between classes vs. part as a relation between instances. A part_of B: 1. every instance of A is part of some instance of B 2. every instance of B has some instance of A as part
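The two class-level readings of A part_of B on slide 30 can be made concrete with a minimal sketch. The instance names, the part_pairs relation and both function names are illustrative assumptions, not taken from the talk or from any ontology library.

```python
# Instance-level parthood is modeled as a set of (part, whole) pairs.
# All instance data below is made up for illustration.

def all_some_part_of(a_instances, b_instances, part_pairs):
    """Reading 1: every instance of A is part of some instance of B."""
    return all(any((a, b) in part_pairs for b in b_instances) for a in a_instances)

def all_some_has_part(a_instances, b_instances, part_pairs):
    """Reading 2: every instance of B has some instance of A as part."""
    return all(any((a, b) in part_pairs for a in a_instances) for b in b_instances)

hearts = {"heart_1", "heart_2"}
testes = {"testis_1"}
humans = {"human_1", "human_2"}
part_pairs = {
    ("heart_1", "human_1"),
    ("heart_2", "human_2"),
    ("testis_1", "human_1"),
}

print(all_some_part_of(testes, humans, part_pairs))   # True
print(all_some_has_part(testes, humans, part_pairs))  # False
print(all_some_has_part(hearts, humans, part_pairs))  # True
```

In this toy data the two readings come apart for testis: every testis is part of some human being, but not every human being has a testis as part.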

31 Part_of as a relation between classes is more problematic than is standardly supposed: testis part_of human being? heart part_of human being?

32 Objectivity: which classes exist is not a function of our biological knowledge. (Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.) GO: aminoadipate-semialdehyde dehydrogenase complex is_a unlocalized

33 Rules for definitions. Intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined. Definitions: do not confuse definitions with the communication of new knowledge.

34 Substitutability: in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin. GO:0015070: toxin activity. Definition: Acts as to cause injury to other living organisms.

35 Substitutability: ‘There is toxin activity here’ becomes ‘There is acts as to cause injury to other living organisms here’
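The substitution test on slides 34 and 35 is mechanical enough to sketch in code. The function below is a hypothetical helper, not part of any GO tooling; it only performs the textual substitution and leaves the judgment of grammaticality to a human reader.

```python
import re

def substitute(sentence, term, definition):
    """Replace each whole-phrase occurrence of `term` with the definition text."""
    body = definition.rstrip(".")
    body = body[0].lower() + body[1:]  # the definition lands mid-sentence
    pattern = r"\b" + re.escape(term) + r"\b"
    # A function is used as the replacement so the definition is inserted literally.
    return re.sub(pattern, lambda match: body, sentence)

result = substitute(
    "There is toxin activity here",
    "toxin activity",
    "Acts as to cause injury to other living organisms.",
)
print(result)
# There is acts as to cause injury to other living organisms here
```

The output is ungrammatical, which is exactly the failure the substitutability rule is meant to expose.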

37 GO: the Gene Ontology. 3 large telephone directories of standardized designations for gene functions and products, organized into hierarchies via is_a and part_of

38 GO can in practice be used only by trained biologists (with know-how): whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved; glycosome is part-of cytoplasm only for Kinetoplastidae. Computers have no counterpart of such context-dependent know-how

39 GO is divided into three disjoint term hierarchies: the cellular component ontology, e.g. flagellum, chromosome, cell; the molecular function ontology, e.g. ice nucleation, binding, protein stabilization; the biological process ontology, e.g. glycolysis, death

40 Primary aim of GO: not rigorous definition and principled classification, but rather providing a practically useful framework for keeping track of the biological annotations that are applied to gene products

41 Thesis 1: With increasing size, GO will be required to increase the degree to which it is a controlled vocabulary which satisfies not merely the needs of human biologists but also the needs of automatic consistency-checking and updating systems

42 Thesis 2: GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously

43 GO: the Gene Ontology. GO is divided into 3 separate hierarchies, each organized via is_a and part_of

44 Problems with is_a: A is_a B = every instance of A is an instance of B
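Read extensionally, the definition on slide 44 says that the instances of A form a subset of the instances of B. A minimal sketch, with made-up instance identifiers and a hypothetical function name:

```python
def is_a_holds(a_instances, b_instances):
    """A is_a B holds when every instance of A is also an instance of B."""
    return set(a_instances) <= set(b_instances)

# Illustrative instance sets; the identifiers are invented.
protein_storage_vacuoles = {"psv_1", "psv_2"}
vacuoles = {"psv_1", "psv_2", "lytic_vacuole_1"}

print(is_a_holds(protein_storage_vacuoles, vacuoles))  # True
print(is_a_holds(vacuoles, protein_storage_vacuoles))  # False
```

A term such as ‘unlocalized’ has no instance set of its own to test against, which is one way to see why an assertion like "X is_a unlocalized" does not fit this definition.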

45 Problems with is_a: Holliday junction helicase complex is_a unlocalized; protein storage vacuole is_a vacuole (sensu Streptophyta); R7 differentiation is_a eye photoreceptor differentiation (sensu Drosophila).

46 Uses of part_of
– membrane part-of cell, intended to mean “a membrane is a part-of any cell”
– flagellum part-of cell, intended to mean “a flagellum is part-of some cells”
– replication fork part-of cell cycle, intended to mean: “a replication fork is part-of the nucleoplasm only during certain times of the cell cycle”
– regulation of sleep part-of sleep, should be corrected to: “regulation of sleep is co-located with and is causally involved with the sleep process”.

47 Problems with part_of: ‘part_of’ = ‘can be part of’ (flagellum part_of cell); ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm); ‘part_of’ = ‘is included as a sublist in’

48 Problems with GO Molecular Functions: anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”); enzyme activity (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)

49 GO:0005199: structural constituent of cell wall. Definition: The action of a molecule that contributes to the structural integrity of a cell wall. This confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology

50 extracellular matrix structural constituent +; puparial glue (sensu Diptera); structural constituent of bone; structural constituent of chorion (sensu Insecta); structural constituent of chromatin; structural constituent of cuticle +; structural constituent of cytoskeleton; structural constituent of epidermis +; structural constituent of eye lens; structural constituent of muscle; structural constituent of myelin sheath; structural constituent of nuclear pore; structural constituent of peritrophic membrane (sensu Insecta); structural constituent of ribosome; structural constituent of tooth enamel; structural constituent of vitelline membrane (sensu Insecta)

51 Why do these problems arise? Because GO has no clear formal understanding of the role of temporal relations in organizing an ontology (and thus also no clear understanding of the difference between a function and the activity which is the realization of a function; GO runs these two together)

52 As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.

53 Problems with GO’s compositionality: sensu / : + with from in resulting regulating regulation of complex constituting constitution

54 / GO:0008608 microtubule/kinetochore interaction =df Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex. GO:0001539 ciliary/flagellar motility =df Locomotion due to movement of cilia or flagella.

55 / GO:0045798 negative regulation of chromatin assembly/disassembly =df Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly. GO:0000082 G1/S transition of mitotic cell cycle =df Progression from G1 phase to S phase of the standard mitotic cell cycle.

56 / GO:0001559 interpretation of nuclear/cytoplasmic to regulate cell growth =df The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.

57 / GO:0015539 hexuronate (glucuronate/galacturonate) porter activity =df Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)

58 Problems with GO’s consistency: GO:0030430 host cell cytoplasm part-of GO:018995 host. host cell cytoplasm =df “The cytoplasm of a host cell.” host =df “Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection.”

59 Cellular Component: another problem with ‘host’. It is not a cellular component (and not a molecular function, and not a biological process, either). GO has ‘adult walking behavior’ but not ‘adult’ or ‘walking’. GO has ‘eye pigmentation’ but not ‘eye’.

60 Solution: link GO to external ontologies: 1. of organism types (to solve the sensu problem) 2. of anatomy (to solve the eye problem) 3. of coarse medical reality (to solve the adult walking behavior problem; see MFN below)

61 Note that such linkages are possible only if GO itself has a coherent formal architecture

63 Medical Fact Net. Medical Belief Net (MBN): a large, heterogeneous, open-source corpus of medical sentences in the English language, expressed in the form of grammatically complete statements and assessed by the degree to which they are understandable and assented to by typical non-expert human subjects. Medical Fact Net (MFN) = the subclass of MBN receiving high marks on the scale of correctness from medical experts. MFN = intersection of non-expert beliefs about medical phenomena and truths validated by medical experts.
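The definition of MFN as an intersection can be sketched as a filter over scored sentences. The record fields, the 0 to 1 scoring scale, the thresholds and all scores below are assumptions made for illustration; the talk does not specify how assent or correctness would be measured.

```python
def medical_fact_net(mbn, assent_min=0.8, correctness_min=0.8):
    """Keep MBN sentences that non-experts assent to AND that experts rate as correct."""
    return [
        s["text"]
        for s in mbn
        if s["assent"] >= assent_min and s["correctness"] >= correctness_min
    ]

# Hypothetical MBN records with invented scores.
mbn = [
    {"text": "People have two eyes.", "assent": 0.99, "correctness": 0.98},
    {"text": "Going outside with wet hair causes colds.", "assent": 0.85, "correctness": 0.10},
    {"text": "Chickenpox is the common name for varicella infection.", "assent": 0.40, "correctness": 0.97},
]

mfn = medical_fact_net(mbn)
print(mfn)  # ['People have two eyes.']
```

Only the sentence that clears both thresholds survives: the second is believed but not validated, the third is validated but not widely understood or assented to.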

64 Medical Word Net = lexical database extending the Princeton WordNet by all the medical terms encountered in MBN. First in (US) English, then in German. First for adults, then for children … First for medicine, then for …

65 MBN/MFN/MWN Formal Architecture: semi-automatically generated graph-based parsing of each sentence + formal ontology of all MFN entities and relationships + mapping into the UMLS Metathesaurus.

66 Evaluation: MFN will be integrated into an existing term-search-based on-line consumer health portal in such a way that MFN sentences are used to direct users to information sources. We will then measure the degree to which this results in greater user satisfaction by setting up an experiment in which customers of the portal are randomly assigned to one of two groups: one to which access to MFN is offered, and another for which simple term-searching is used.

67 Significance: non-expert language of family members, advisors, administrators, nurses, paramedics, lawyers … Research on differences between everyday language and technical language

68 Mismatches in Doctor-Patient Communication. Question Text: My seven-year-old son developed a rash today that I believe to be chickenpox. My concern is that a friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. […] Is there cause for concern at this point? Answer Text: Chickenpox is the common name for varicella infection. [...] You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. [...]

69 Non-Expert Language in Online Communication: need to integrate free text and structured data. E-health services need automatic ways to respond to questions in standard forms, and to provide internet-accessible medical knowledge that is both reliable and accessible to the non-expert.

70 Diagnostic decision support: we might associate collections of utterances stored in MBN describing symptoms sourced to single patients with metadata recording subsequent diagnosis. Trained on this corpus, the system could establish patterns of association between specific sequences of utterances and specific diseases; one could then test the degree to which such associations are sufficiently strong as to produce usable automatic diagnosis on the basis of patient inputs.

71 Medical education/medical literacy: Use MBN to evaluate the reliability of the medical knowledge of different non-expert communities. Use MFN to develop tools to support face-to-face education of lay people in the fields of medicine and health care. MBN provides opportunities for a new type of research in the field of consumer health, e.g. on basic kinds in the medical domain à la Eleanor Rosch

72 Medical Coverage in WordNet 2.0: WordNet’s coverage of domains like medicine, physics, and geology is very limited. Its coverage of medical terms represents a mixture of folk and expert vocabulary.

73 MFN: From Words to Facts. Do for (non-expert) medicine what Beilstein’s Fact Database does for (expert) biochemistry. Relation to CYC; relation to FrameNet; Botany Knowledge Base; DARPA’s Rapid Knowledge Formation project.

74 Sources
Lexical knowledge bases, such as:
a. the relevant general lexical information contained in WordNet
b. lexical knowledge bases of lay medical vocabulary
c. medical dictionaries and large medical terminology and ontology systems such as the UMLS Specialist Lexicon and the Foundational Model of Anatomy
Statement or fact knowledge bases, such as:
d. open-source linguistic corpora, public health documents, internet resources
e. the relevant example sentences in the FrameNet and WordNet corpora
f. free text sources
g. the results of transforming the content of lexical knowledge bases (especially WordNet) into statements

75 Generation from lexical databases: treat a database like WordNet or LinKBase as a set of links tLt' between terms (where L ranges over 'is-a', 'part-of', 'is-caused-by', etc.). We form a subset of this set by restricting the values of t and t' to those terms which occur in MWN. Some members of the resulting class of tLt' formulas can then be transformed into English sentences automatically. For example, each formula of the form t is-a t' can be transformed into a sentence of the form 'a t is a type of t''. Other tLt' formulas can be converted by hand into English sentences; for example, "forearm HAS-PARTIAL-MATERIAL-OVERLAP wrist" can be transformed into "the forearm overlaps with the wrist" and "the wrist overlaps with the forearm".
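The generation procedure on slide 75 can be sketched as follows: restrict the links to MWN terms, render the relations that have a template, and queue the rest for manual conversion. The TEMPLATES table, the function name and the sample links are assumptions for illustration; no real WordNet or LinKBase API is used.

```python
# Hypothetical template table: only 'is-a' has an automatic English rendering here.
TEMPLATES = {"is-a": "a {t} is a type of {t2}"}

def generate_sentences(links, mwn_terms):
    """Return (auto-generated sentences, links left for manual conversion)."""
    sentences, manual = [], []
    for t, rel, t2 in links:
        # Restrict t and t' to terms that occur in MWN.
        if t not in mwn_terms or t2 not in mwn_terms:
            continue
        template = TEMPLATES.get(rel)
        if template is None:
            manual.append((t, rel, t2))
        else:
            sentences.append(template.format(t=t, t2=t2))
    return sentences, manual

# Made-up links tLt' and a made-up MWN vocabulary.
links = [
    ("rash", "is-a", "symptom"),
    ("forearm", "HAS-PARTIAL-MATERIAL-OVERLAP", "wrist"),
    ("quark", "is-a", "particle"),
]
mwn_terms = {"rash", "symptom", "forearm", "wrist"}

sentences, manual = generate_sentences(links, mwn_terms)
print(sentences)  # ['a rash is a type of symptom']
print(manual)     # [('forearm', 'HAS-PARTIAL-MATERIAL-OVERLAP', 'wrist')]
```

The non-medical link is dropped by the MWN restriction, and the overlap relation is left for a human to phrase.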

76 Problems to be Addressed: “generic medical knowledge of (non-expert) adults”

77 Genericity: Much generic medical knowledge relates to what holds for the most part or in most cases or in a statistically significant fraction of cases (consider: smoking causes cancer).

78 Medical knowledge is intertwined with knowledge of other domains (things that can be involved in an accident …)

79 Knowledge: Much medical knowledge of experts and non-experts alike takes the form of knowledge of specific cases (Aunt Mary’s arthritis is always worse in the winter). MFN should be a repository of medical knowledge that is generic and context-independent, the counterpart of the theoretical knowledge of the sciences. Note that lexical knowledge of the sort stored in WordNet, too, is both generic and context-independent.

80 Expertise: a crisp separation of expert and non-expert sentences is impossible (Viagra, anthrax, HIV, Prozac, SARS). Experimental design is needed to avoid artifacts

81 Completeness Problem. Elementary facts: People have two eyes. Babies are born. Arms move. WordNet contains some coverage, particularly of elementary facts of the ‘A is type/part of B’ form. In virtue of their specific formal architectures, WordNet synsets can be used to generate long lists of elementary facts from single starting points

82 Six: Transform MWN into a large corpus of generic beliefs by turning WordNet on its side; that is, we transform a relation such as {t1, …, tn} IS-A {t´1, …, t´m} into n x m sentences of the form ti IS-A t´k, and impose filters
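The "turning WordNet on its side" step is a Cartesian product over two synsets followed by filtering. A minimal sketch, in which the synset contents and the filter are illustrative assumptions (the talk does not say which filters would be imposed):

```python
from itertools import product

def sideways(hyponym_synset, hypernym_synset, keep=lambda t, t2: True):
    """Expand {t1..tn} IS-A {t'1..t'm} into up to n x m sentences, then filter."""
    return [
        f"{t} IS-A {t2}"
        for t, t2 in product(hyponym_synset, hypernym_synset)
        if keep(t, t2)
    ]

# Made-up synsets, written as lists so the output order is deterministic.
hyponyms = ["chickenpox", "varicella"]
hypernyms = ["disease", "illness"]

everything = sideways(hyponyms, hypernyms)
print(len(everything))  # 4

# Hypothetical filter: drop sentences built on a term assumed to be expert vocabulary.
lay_only = sideways(hyponyms, hypernyms, keep=lambda t, t2: t != "varicella")
print(lay_only)  # ['chickenpox IS-A disease', 'chickenpox IS-A illness']
```

With n = 2 and m = 2 the unfiltered expansion yields 4 sentences; the filter then removes the ones judged unsuitable for a non-expert corpus.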

83 A New Kind of Linguistics: MFN is part and parcel of recent attempts in the biomedical sciences to confront problems of similar scope in the development of large fact-repositories such as KEGG or Swiss-Prot. In its final form it should be consistent with the knowledge that is contained also in other fact repositories both at the expert and the non-expert level, and serve to integrate them together in a federated database.

84 “Adult walking behavior” will be freed from its lonely status inside GO

85 The End

