Medical Ontologies: An Overview Barry Smith

2 IFOMIS Institute for Formal Ontology and Medical Information Science Faculty of Medicine University of Leipzig

ifomis.de 3 Partners: Laboratory for Applied Ontology, Trento and Rome; Language & Computing nv, Zonnegem, Belgium; Ontology Works, Baltimore; Structural Informatics Group, Department of Biological Structure, University of Washington, Seattle, USA; Cognitive Science Laboratory, Princeton University

ifomis.de 4 Three levels of ontology: 1) formal (top-level) ontology, dealing with categories employed in every domain: object, event, whole, part, instance, class; 2) domain ontology, which applies the top-level system to a particular domain: cell, gene, drug, disease, therapy; 3) terminology-based ontology, a large, lower-level system: Dupuytren’s disease of palm, nodules with no contracture

ifomis.de 7 IFOMIS Institute for Formal Ontology and Medical Information Science Leipzig philosophers and medical informaticians attempting to build and test a Basic Formal Ontology for applications in biomedical and related domains

ifomis.de 8 IFOMIS use basic principles of philosophical ontology for quality assurance and alignment of biomedical ontologies

ifomis.de 9 Compare: 1) pure mathematics (theories of structures such as order, set, function, mapping), employed in every domain; 2) applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains; 3) physical chemistry, biophysics, etc. = adding detail

ifomis.de 10 Three levels of ontology: 1) formal (top-level) ontology = ????? (medical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics); 2) domain ontology = UMLS Semantic Network, GALEN CORE; 3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA

ifomis.de 11 Strategy. Part 1: Provide an overview of medical ontologies and of the top-level ontologies which they implicitly define. Part 2: Show how principles of classification and definition derived from top-level ontology can help in quality assurance of terminology-based ontologies and in ontology alignment. Part 3: The Gene Ontology. Part 4: Medical Fact Net.

ifomis.de 13 UMLS Semantic Network: the top level divides into entity and event; entity divides into physical object and conceptual entity

ifomis.de 15 conceptual entity: Organism Attribute, Finding, Idea or Concept, Occupation or Discipline, Organization, Group, Group Attribute, Intellectual Product, Language

ifomis.de 16 conceptual entity > idea or concept > functional concept > body system

ifomis.de 17 entity divides into physical object and conceptual entity; conceptual entity > idea or concept > functional concept > body system: confusion of entity and concept

ifomis.de 18 Functional Concept: Body System is_a Functional Concept. But: concepts do not perform functions or have physical parts.

ifomis.de 19 This: is not a concept

ifomis.de 20 The Hydraulic Equation BP = CO*PVR arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR)

ifomis.de 21 Confusion of Ontology and Epistemology: blood pressure is an Organism Function; cardiac output is a Laboratory or Test Result or Diagnostic Procedure. BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure.

ifomis.de 22 entities: independent continuants (ORGANISMS, CELLS, MOLECULES); dependent continuants (ROLES, FUNCTIONS, CONDITIONS, e.g. diseases); occurrents, always dependent (PROCESSES, HISTORIES, LIVES, e.g. courses of diseases)

ifomis.de 23 entities: independent continuants (ORGANISMS, CELLS, MOLECULES); dependent continuants (ROLES, FUNCTIONS, CONDITIONS, e.g. diseases); occurrents, always dependent (PROCESSES, HISTORIES, LIVES, e.g. courses of diseases); with a further division into classes and instances

ifomis.de 24 A three-category ontology along these lines accepted by DOLCE = first module of Semantic Web Wonderweb Foundational Ontologies Library BFO = IFOMIS Basic Formal Ontology L&C LinKBase UMLS-SN Gene Ontology

Principles for Building Medical Ontologies

ifomis.de 27 Examples: Don’t confuse entities with concepts. Don’t confuse domain entities with logical or computational structures. Don’t confuse ontology with epistemology. Don’t confuse is_a with has_role.

ifomis.de 28 Further Principles. Univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use. UMLS-SN: ‘organization’ = body plan; ‘organization’ = social organization

ifomis.de 29 univocity. Gene Ontology: ‘part_of’ = ‘can be part of’ (flagellum part_of cell); ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm); ‘part_of’ = ‘is included as a sublist in’

ifomis.de 30 don’t forget instances: part_of as a relation between classes vs. part as a relation between instances. A part_of B: 1. every instance of A is part of some instance of B; 2. every instance of B has some instance of A as part
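The two numbered conditions on A part_of B can be made concrete with a minimal Python sketch. The instance names and the `part` pairs are invented for illustration and are not drawn from any ontology; the function names are likewise hypothetical.

```python
# Hypothetical instance-level data: which individuals instantiate which class,
# and which individual is part of which other individual.
instances = {
    "testis": {"testis_1", "testis_2"},
    "human being": {"adam", "eve"},
}
part = {("testis_1", "adam"), ("testis_2", "adam")}  # (part, whole) pairs


def every_a_in_some_b(a, b):
    """Condition 1: every instance of A is part of some instance of B."""
    return all(any((x, y) in part for y in instances[b]) for x in instances[a])


def every_b_has_some_a(a, b):
    """Condition 2: every instance of B has some instance of A as part."""
    return all(any((x, y) in part for x in instances[a]) for y in instances[b])


condition_1 = every_a_in_some_b("testis", "human being")
condition_2 = every_b_has_some_a("testis", "human being")
```

On this toy data the first condition holds and the second fails, so a class-level assertion "testis part_of human being" is true or false depending on which condition is meant.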

ifomis.de 31 Part_of as a relation between classes is more problematic than is standardly supposed: testis part_of human being? heart part_of human being?

ifomis.de 32 objectivity: which classes exist is not a function of our biological knowledge. (Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.) GO: aminoadipate-semialdehyde dehydrogenase complex is_a unlocalized
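A check for this principle could be sketched as a simple lint over is_a links. The link list below is an assumption made for illustration: the first pair is the example from the slide, and the second is an invented contrast case. The function name `epistemic_parents` is hypothetical.

```python
# Hypothetical (child, parent) is_a links; only the first pair comes from the slide.
is_a_links = [
    ("aminoadipate-semialdehyde dehydrogenase complex", "unlocalized"),
    ("protein storage vacuole", "vacuole"),
]

# Terms that record a state of knowledge rather than a biological kind.
EPISTEMIC_TERMS = {"unknown", "unclassified", "unlocalized"}


def epistemic_parents(links):
    """Return the links whose parent term is an epistemic placeholder."""
    return [(child, parent) for child, parent in links if parent in EPISTEMIC_TERMS]


flagged = epistemic_parents(is_a_links)
```

The sketch only flags candidate violations by name; deciding how to reclassify a flagged term remains a curatorial task.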

ifomis.de 33 rules for definitions intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined definitions: do not confuse definitions with the communication of new knowledge

ifomis.de 34 substitutability: in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin. GO: : toxin activity Definition: Acts as to cause injury to other living organisms.

ifomis.de 35 substitutability: “There is toxin activity here” becomes “There is acts as to cause injury to other living organisms here”
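The substitution test can be reproduced mechanically. The following is a minimal sketch, assuming a plain string replacement is an adequate stand-in for substitution; the helper name `substitute` is hypothetical, and the code cannot itself judge grammaticality.

```python
def substitute(sentence, term, definition):
    """Replace a defined term by its definition (lower-cased start, no final period)."""
    body = definition[0].lower() + definition[1:].rstrip(".")
    return sentence.replace(term, body)


term = "toxin activity"
go_definition = "Acts as to cause injury to other living organisms."
result = substitute("There is toxin activity here", term, go_definition)
```

The output has to be read by a person (or passed to a parser) to see that the definition fails the test: the result is not a grammatical sentence.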

ifomis.de 37 GO: the Gene Ontology 3 large telephone directories of standardized designations for gene functions and products organized into hierarchies via is_a and part_of

ifomis.de 38 GO can in practice be used only by trained biologists (with know-how): whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved; glycosome is part-of cytoplasm only for Kinetoplastidae. Computers have no counterpart of such context-dependent know-how.

ifomis.de 39 GO divided into three disjoint term hierarchies: the cellular component ontology, e.g. flagellum, chromosome, cell; the molecular function ontology, e.g. ice nucleation, binding, protein stabilization; the biological process ontology, e.g. glycolysis, death

ifomis.de 40 Primary aim of GO not rigorous definition and principled classification but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products

ifomis.de 41 Thesis 1: With increasing size, GO will be required to increase the degree to which it is a controlled vocabulary which satisfies not merely the needs of human biologists but also the needs of automatic consistency-checking and updating systems

ifomis.de 42 Thesis 2 GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously

ifomis.de 43 GO: the Gene Ontology GO divided into 3 separate hierarchies each organized via is_a and part_of

ifomis.de 44 Problems with is_a A is_a B = every instance of A is an instance of B

ifomis.de 45 Problems with is_a: Holliday junction helicase complex is_a unlocalized; protein storage vacuole is_a vacuole (sensu Streptophyta); R7 differentiation is_a eye photoreceptor differentiation (sensu Drosophila).

ifomis.de 46 Uses of part_of – membrane part-of cell, intended to mean “a membrane is a part-of any cell” – flagellum part-of cell, intended to mean “a flagellum is part-of some cells” – replication fork part-of cell cycle, intended to mean: “a replication fork is part-of the nucleoplasm only during certain times of the cell cycle” – regulation of sleep part-of sleep, should be corrected to: “regulation of sleep is co-located with and is causally involved with the sleep process”.

ifomis.de 47 Problems with part_of: ‘part_of’ = ‘can be part of’ (flagellum part_of cell); ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm); ‘part_of’ = ‘is included as a sublist in’

ifomis.de 48 Problems with GO Molecular Functions: anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”); enzyme activity (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)

ifomis.de 49 GO: : structural constituent of cell wall Definition: The action of a molecule that contributes to the structural integrity of a cell wall. confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology

ifomis.de 50 extracellular matrix structural constituent + puparial glue (sensu Diptera) structural constituent of bone structural constituent of chorion (sensu Insecta) structural constituent of chromatin structural constituent of cuticle + structural constituent of cytoskeleton structural constituent of epidermis + structural constituent of eye lens structural constituent of muscle structural constituent of myelin sheath structural constituent of nuclear pore structural constituent of peritrophic membrane (sensu Insecta) structural constituent of ribosome structural constituent of tooth enamel structural constituent of vitelline membrane (sensu Insecta)

ifomis.de 51 Why do these problems arise? Because GO has no clear formal understanding of the role of temporal relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function – GO runs these two together)

ifomis.de 52 As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.

ifomis.de 53 Problems with GO’s compositionality sensu / : + with from in resulting regulating regulation of complex constituting constitution

ifomis.de 54 / GO: microtubule/kinetochore interaction =df Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex. GO: ciliary/flagellar motility =df Locomotion due to movement of cilia or flagella.

ifomis.de 55 / GO: negative regulation of chromatin assembly/disassembly =df Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly GO: G1/S transition of mitotic cell cycle defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.

ifomis.de 56 / GO: interpretation of nuclear/cytoplasmic to regulate cell growth =df The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.

ifomis.de 57 / GO: hexuronate (glucuronate/galacturonate) porter activity =df Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)

ifomis.de 58 Problems with GO’s consistency: GO: host cell cytoplasm part-of GO: host. host cell cytoplasm =df “The cytoplasm of a host cell.” host =df “Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection.”

ifomis.de 59 Cellular Component Another problem with ‘host’ It is not a cellular component (and not a molecular function, and not a biological process, either) GO has: adult walking behavior but not ‘adult’ or ‘walking’ GO has: ‘eye pigmentation’ but not ‘eye’

ifomis.de 60 Solution: Link GO to external ontologies: 1. of organism types (to solve the sensu problem); 2. of anatomy (to solve the eye problem); 3. of coarse medical reality (to solve the adult walking behavior problem; see MFN below)

ifomis.de 61 note that such linkages are possible only if GO itself has a coherent formal architecture

ifomis.de 63 Medical Fact Net. Medical Belief Net (MBN): a large, heterogeneous, open-source corpus of medical sentences in the English language, expressed in the form of grammatically complete statements and assessed by the degree to which they are understandable and assented to by typical non-expert human subjects. Medical Fact Net (MFN) = subclass of MBN receiving high marks on the scale of correctness from medical experts. MFN = intersection of non-expert beliefs about medical phenomena and truths validated by medical experts.

ifomis.de 64 Medical Word Net = lexical database extending the Princeton WordNet by all the medical terms encountered in MBN First in (US) English Then in German First for adults, then for children … First for medicine, then for …

ifomis.de 65 MBN/MFN/MWN Formal Architecture Semi-automatically generated graph-based parsing of each sentence + formal ontology of all MFN entities and relationships + mapping into the UMLS Metathesaurus.

ifomis.de 66 Evaluation: MFN will be integrated into an existing term-search-based on-line consumer health portal in such a way that MFN sentences are used to direct users to information sources. We will then measure the degree to which this results in greater user satisfaction by setting up an experiment in which customers of the portal are randomly assigned to one of two groups: one to which access to MFN is offered, and another for which simple term-searching is used.

ifomis.de 67 Significance Non-expert language of family members, advisors, administrators, nurses, paramedics, lawyers … Research on differences between everyday language and technical language

ifomis.de 68 Mismatches in Doctor-Patient Communication. Question Text: My seven-year-old son developed a rash today that I believe to be chickenpox. My concern is that a friend of mine had her 10-day-old baby at my home last evening before we were aware of the illness. […] Is there cause for concern at this point? Answer Text: Chickenpox is the common name for varicella infection. [...] You are correct in that a person with chickenpox can be contagious for 48 hours before the first vesicle is seen. [...]

ifomis.de 69 Non-Expert Language in Online Communication Need to integrate free text and structured data. E-health services need automatic ways to respond to questions in standard forms, and to provide internet-accessible medical knowledge that is both reliable and accessible to the non-expert.

ifomis.de 70 Diagnostic decision support we might associate collections of utterances stored in MBN describing symptoms sourced to single patients with metadata recording subsequent diagnosis. Trained on this corpus, the system could establish patterns of association between specific sequences of utterances and specific diseases; one could then test the degree to which such associations are sufficiently strong as to produce usable automatic diagnosis on the basis of patient inputs.

ifomis.de 71 Medical education/medical literacy: Use MBN to evaluate the reliability of the medical knowledge of different non-expert communities. Use MFN to develop tools to support face-to-face education of lay people in the fields of medicine and health care. MBN provides opportunities for a new type of research in the field of consumer health, e.g. on basic kinds in the medical domain à la Eleanor Rosch.

ifomis.de 72 Medical Coverage in WordNet 2.0: WordNet’s coverage of domains like medicine, physics, and geology is very limited. Its coverage of medical terms represents a mixture of folk and expert vocabulary.

ifomis.de 73 MFN: From Words to Facts. Do for (non-expert) medicine what Beilstein’s Fact Database does for (expert) biochemistry. Relation to CYC; relation to FrameNet; Botany Knowledge Base; DARPA’s Rapid Knowledge Formation project.

ifomis.de 74 Sources. Lexical knowledge bases, such as: a. the relevant general lexical information contained in WordNet; b. lexical knowledge bases of lay medical vocabulary; c. medical dictionaries and large medical terminology and ontology systems such as the UMLS Specialist Lexicon and the Foundational Model of Anatomy. Statement or fact knowledge bases, such as: d. open-source linguistic corpora, public health documents, internet resources; e. the relevant example sentences in the FrameNet and WordNet corpora; f. free text sources; g. the results of transforming the content of lexical knowledge bases (especially WordNet) into statements

ifomis.de 75 Generation from lexical databases: treat a database like WordNet or LinKBase as a set of links tLt' between terms (where L ranges over 'is-a', 'part-of', 'is-caused-by', etc.). We form the subset of this set by restricting the values of t and t' to those terms which occur in MWN. Some members of the resulting class of tLt' formulas can then be transformed into English sentences automatically. For example, each "t is-a t'" formula can be transformed into a sentence of the form "a t is a type of t'". Other tLt' formulas can be converted by hand into English sentences; for example, "forearm HAS-PARTIAL-MATERIAL-OVERLAP wrist" can be transformed into "the forearm overlaps with the wrist" and "the wrist overlaps with the forearm".
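The restriction and the automatic is-a transformation described on this slide might be sketched as follows. The links, the stand-in MWN vocabulary and the function names are all assumptions made for illustration; only the HAS-PARTIAL-MATERIAL-OVERLAP link comes from the slide.

```python
# Hypothetical links t L t' of the kind a lexical database might hold.
links = [
    ("fever", "is-a", "symptom"),
    ("forearm", "HAS-PARTIAL-MATERIAL-OVERLAP", "wrist"),
    ("quark", "is-a", "particle"),
]
mwn_terms = {"fever", "symptom", "forearm", "wrist"}  # assumed MWN vocabulary


def restrict(all_links, vocabulary):
    """Keep only links whose two terms both occur in the vocabulary."""
    return [(t, rel, t2) for t, rel, t2 in all_links
            if t in vocabulary and t2 in vocabulary]


def is_a_sentences(selected):
    """Turn each 't is-a t2' link into the sentence 'a t is a type of t2'."""
    return [f"a {t} is a type of {t2}" for t, rel, t2 in selected if rel == "is-a"]


sentences = is_a_sentences(restrict(links, mwn_terms))
```

Links with other relations, such as the forearm and wrist example, pass the restriction step but are left for conversion by hand, as the slide describes.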

ifomis.de 76 Problems to be Addressed “generic medical knowledge of (non-expert) adults”

ifomis.de 77 Genericity: Much generic medical knowledge relates to what holds for the most part or in most cases or in a statistically significant fraction of cases (consider: smoking causes cancer).

ifomis.de 78 Medical knowledge is intertwined with knowledge of other domains (things that can be involved in an accident …)

ifomis.de 79 Knowledge: Much medical knowledge of experts and non-experts alike takes the form of knowledge of specific cases (Aunt Mary’s arthritis is always worse in the winter). MFN should be a repository of medical knowledge that is generic and context-independent, the counterpart of the theoretical knowledge of the sciences. Note that lexical knowledge of the sort stored in WordNet, too, is both generic and context-independent.

ifomis.de 80 Expertise: a crisp separation of expert and non-expert sentences is impossible. Viagra, anthrax, HIV, Prozac, SARS → experimental design needed to avoid artifacts

ifomis.de 81 Completeness Problem. Elementary facts: People have two eyes. Babies are born. Arms move. WordNet contains some coverage, particularly of elementary facts of the “A is type/part of B” form, in virtue of their specific formal architectures. WordNet synsets can be used to generate long lists of elementary facts from single starting points.

ifomis.de 82 Six Transform MWN into a large corpus of generic beliefs by turning WordNet on its side; that is we transform a relation such as {t1, …, tn} IS-A {t´1, …, t´m} into n x m sentences of the form: ti IS-A t´k and impose filters
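The n x m expansion can be sketched in a few lines of Python. The synset members are invented for illustration, and the filter shown is an arbitrary stand-in, since the slide does not say which filters are imposed; the function name `expand` is hypothetical.

```python
from itertools import product

# Hypothetical synsets; the member terms are chosen only for illustration.
hyponym_synset = ["chickenpox", "varicella"]   # {t1, ..., tn}
hypernym_synset = ["disease", "illness"]       # {t'1, ..., t'm}


def expand(lower, upper, keep=lambda sentence: True):
    """Expand {t1..tn} IS-A {t'1..t'm} into n x m sentences, then filter them."""
    sentences = [f"{t} IS-A {u}" for t, u in product(lower, upper)]
    return [s for s in sentences if keep(s)]


all_sentences = expand(hyponym_synset, hypernym_synset)
# Stand-in filter: drop sentences that start with the technical term.
lay_sentences = expand(hyponym_synset, hypernym_synset,
                       keep=lambda s: not s.startswith("varicella"))
```

With two terms in each synset the expansion yields four candidate sentences, of which the stand-in filter keeps two.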

ifomis.de 83 A New Kind of Linguistics: MFN is part and parcel of recent attempts in the biomedical sciences to confront problems of similar scope in the development of large fact repositories such as KEGG or Swiss-Prot. In its final form it should be consistent with the knowledge that is contained also in other fact repositories, at both the expert and the non-expert level, and serve to integrate them in a federated database.

ifomis.de 84 “Adult walking behavior” will be freed from its lonely status inside GO

ifomis.de 85 The End