Ontology and Its Applications Barry Smith
2 OVERVIEW Part I: A Brief Overview of Developments in Ontology at the Borderlines of Philosophy and Computation Part II: Ontology and Biomedical Informatics
3 IFOMIS now part of European Centre for Ontological Research, Saarbrücken, Germany
4 Institute for Formal Ontology and Medical Information Science 16 staff 2 medical informaticians 1 neurologist 1 chemist 1 radiologist 2 computer scientists 9 philosophers
5 The problem Different communities of researchers use different and often incompatible concepts / categories in expressing the results of their work
6 Example: Medicine blood is a tissue blood is a body fluid How to integrate competing conceptualizations?
7 Example: Molecular Biology GDB Genome Database of Human Genome Project GenBank National Center for Biotechnology Information, Washington DC
8 What is a gene? GDB: a gene is a DNA fragment that can be transcribed and translated into a protein GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
9 How to integrate competing conceptualizations for example across the granular divide between medicine and molecular biology?
10 Answer: ONTOLOGY! But what does “ontology” mean?
11 Three senses of ‘ontology’ 1.Philosophical sense: Aristotle: an inventory of the types of entities and relations in reality Quine: an inventory of ontological commitments 2.Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain 3.Gene Ontology sense: a controlled vocabulary for database annotation / indexing
12 Two Communities Reference Ontology Community: An ontology is an inventory of the types of entities and relations which exist in a given domain of reality KR Community: an ontology is a consensus representation of the concepts used in a given domain of discourse
13 “Ontology” as used in KR / AI had its roots in Quine’s doctrine of ontological commitment and in the ‘internal metaphysics’ of Carnap/Putnam
14 Quineanism: ontology is the study of the ontological commitments or presuppositions embodied in scientific theories (or in the beliefs of those experts, or in the databases of that company)
15 Quineanism, too, faces the integration problem If an ontology is the set of ontological commitments of a theory how can we cope with questions pertaining to the relations between the objects to which different theories are committed? Quine can tell us what there is but can he tell us how it is related together?
16 The problem of the unity of science The logical positivist solution to this problem addressed a world in which sciences are identified with printed texts What if sciences are identified with information systems or with the contents of websites?
17 The Semantic Web Initiative The Web is a vast edifice of heterogeneous data sources Needs the ability to query and integrate across different and often incompatible conceptual systems
18 How to resolve such incompatibilities and make the various parts of the web interoperable? Enforce conceptual compatibility via standardized taxonomies applied to websites as meta-tags formulated within the framework of a common web language like OWL
19 Tim Berners-Lee: hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’ ‘codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’
20 A new silver bullet
21 Metadata in Web commerce agree on a metadata standard for washing machines as concerns size, price, etc. create machine-readable databases and put them on the net consumers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results
22 Metadata in science agree on metadata standards for molecules (genes, proteins, drugs), clinical phenomena, therapies... create machine-readable databases and put them on the net biomedical researchers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results
23 A world of exhaustive, reliable metadata would be utopia (Cory Doctorow)
24 Problem 1: People lie Cheating in assigning meta-tags can confer benefits to the cheaters Metadata exists in a competitive world. Some people are crooks. Some people are cranks.
25 Semantic Web effort thus far devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography) not to the harder task of developing ontologies (reliable taxonomies, term hierarchies) for the content of such web pages
26 Problem 2: People are lazy Half the pages on Geocities are called “Please title this page”
27 Problem 3: People are stupid The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate Will internet users learn to accurately tag their information with whatever taxonomy and syntax they're supposed to be using?
28 even with correct XML-syntax: Jules Deryck Newco XTC Group Business Manager +32(0) (0) (0) Dendersesteenweg 17
29 errors still abound Jules Deryck Newco XTC Group Business Manager +32(0) (0) (0) Dendersesteenweg Is "Jules" the first name of the person, or of the business card?
30 errors still abound Jules Deryck Newco XTC Group Business Manager +32(0) (0) (0) Dendersesteenweg Aartselaar Belgium Is Jules or Newco the member of XTC Group?
31 errors still abound Jules Deryck Newco XTC Group Business Manager +32(0) (0) (0) Dendersesteenweg Aartselaar Belgium Do the phone numbers and address belong to Jules or to the business?
32 Problem 4: Building good ontologies/standardized taxonomies is very difficult and the constraints imposed by OWL and similar languages make the job even harder
33 Problem 5: Ontology Impedance = semantic mismatch between ontologies. ‘gene’ as used in websites issued by: biotech companies involved in gene patenting; medical researchers interested in the role of genes in predisposition to smoking; insurance companies
34 Problem 6: The Concept Orientation Tom Gruber: An ontology is a specification of a conceptualization Semantic Web: specify Tom’s, and Dick’s, and Harry’s conceptualizations carefully, ensure that all are formulated in a common (XML-based) syntax Presto: conceptualizations will somehow become integrated
35 even a world of exhaustive, reliable metadata would not solve the problem of integration
36 expressing different systems of concepts in a common syntactic environment does not resolve conceptual incompatibilities
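A minimal sketch of this point, assuming two plain Python dictionaries stand in for resources that already share one syntax. The definitions are the GDB and GenBank wordings quoted on the earlier slides; the names `gdb`, `genbank` and `merge_vocabularies` are hypothetical and chosen only for illustration.

```python
# Two vocabularies expressed in the same syntax (term -> definition).
# The definitions are quoted from the slides; everything else is illustrative.
gdb = {
    "gene": "a DNA fragment that can be transcribed and translated into a protein",
}
genbank = {
    "gene": "a DNA region of biological interest with a name and that carries "
            "a genetic trait or phenotype",
}


def merge_vocabularies(a, b):
    """Merge two term->definition maps and collect terms defined differently."""
    merged, conflicts = dict(a), {}
    for term, definition in b.items():
        if term in merged and merged[term] != definition:
            # Same word, same syntax, different meaning: nothing here can
            # decide which definition is the right one.
            conflicts[term] = (merged[term], definition)
        else:
            merged[term] = definition
    return merged, conflicts


merged, conflicts = merge_vocabularies(gdb, genbank)
print(sorted(conflicts))
```

The shared syntax makes the clash detectable, but it offers no basis for choosing between the two definitions.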
37 different conceptualizations
38 need not interconnect at all
39 we cannot make incompatible terminology-systems interconnect just by looking at concepts, or knowledge or language
40 to decide which of a plurality of competing conceptualizations to accept we need some tertium quid
41 we need, in other words, to take the world itself into account
42 Compare the way biologists resolve disagreements as to whether they mean the same thing by different words: by pointing to the objects in their lab
43
44 The Semantic Web is a machine for creating syllogisms (Clay Shirky) Humans are mortal Greeks are human Therefore, Greeks are mortal
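The kind of inference meant here can be sketched in a few lines, assuming premises of the single form "all X are Y" stored as pairs. The function name `entails` is hypothetical; this is an illustration of chained syllogisms, not a description of any Semantic Web reasoner.

```python
def entails(premises, subject, predicate):
    """Return True if 'all subject are predicate' follows by chaining premises.

    Each premise is a pair (x, y) read as 'all x are y'.
    """
    seen, frontier = set(), [subject]
    while frontier:
        current = frontier.pop()
        if current == predicate:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow every premise whose subject is the current term.
        frontier.extend(y for x, y in premises if x == current)
    return False


premises = [("human", "mortal"), ("Greek", "human")]
print(entails(premises, "Greek", "mortal"))
print(entails(premises, "mortal", "Greek"))
```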
45 Lewis Carroll: No interesting poems are unpopular among people of real taste; No modern poetry is free from affectation; All your poems are on the subject of soap-bubbles; No affected poetry is popular among people of real taste; No ancient poetry is on the subject of soap-bubbles. Therefore: All your poems are bad.
46 the promise of the Semantic Web it will improve all the areas of your life where you currently use syllogisms
47 Semantic Web compatibility problems should be solved automatically (by machine) Hence ontologies must be applications running in real time
48 Semantic Web methodology Get syntax right first (Conceptualism; weak expressive resource; weak Description Logics – to ensure computational tractability) and integration of ‘concepts’ will take care of itself but only at the price of Procrustean simplification
49 IFOMIS methodology Get ontology right first (use powerful logic to develop ontology as theory of reality and solve tractability problems later) only thus will we have some hope of genuine integration across different disciplines and data resources
50 Belnap: “it is a good thing logicians were around before computer scientists; if computer scientists had got there first, then we wouldn’t have numbers because arithmetic is undecidable”
51 It is a good thing philosophical ontology was around before Description Logics, because otherwise we would have only hierarchies of concepts together with abstract mathematical models and no universals or instances in reality…
52 Recall: GDB: a gene is a DNA fragment that can be transcribed and translated into a protein GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
53 Ontology ‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ … are ontological terms in the sense of traditional (philosophical) ontology
54 The idea of a reference ontology a theory of the kinds of entities existing in reality and of the relations between them
55 The Reference Ontology Community IFOMIS (Saarbrücken) Laboratories for Applied Ontology (Trento/Rome, Turin) Ontology Works (Baltimore) Department of Biological Structure (Seattle) Medical Ontology Research (Bethesda) The Gene Ontology / Open Biological Ontologies Consortium
56 IFOMIS’s long-term goal Build a robust high-level reference ontology THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY as the basis for an ontologically coherent unification of biomedical knowledge and terminology
57 Two upper-level reference ontologies: BFO (Saarbrücken) – Basic Formal Ontology; DOLCE (Trento/Rome)
58 Aristotle First ontologist
59 Edmund Husserl
60 Formal Ontology term coined by Husserl = the theory of those ontological structures such as part-whole, universal-particular which apply to all domains whatsoever
61 Husserl’s Logical Investigations, 1900/01 –Aristotelian theory of universals and particulars –theory of part and whole –theory of ontological dependence –the theory of boundaries and fusion
62 Formal Ontology contrasted with material or regional ontologies (compare relation between pure and applied mathematics) Husserl’s idea: If we can build a good formal ontology, this should save time and effort in building reference ontologies for each successive material domain
63 In formal ontology as in formal logic, we can grasp the properties of given structures in such a way as to establish in one go the properties of all formally similar structures
64 Compare: 1)pure mathematics (theories of structures such as order, set, function, mapping) employed in every domain 2)applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains 3)physical chemistry, biophysics, etc. = adding detail
65 Three levels of ontology 1)formal (top-level) ontology = biomedical ontology has nothing like the technology of definitions, theorems and proofs provided by pure mathematics 2) domain ontology = UMLS Semantic Network, GO, GALEN CORE 3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA ?????
66
67 The Concept Orientation An ontology is a consensus representation of concepts
68 ‘concept’ runs together: a)meaning shared in common by synonymous terms b)idea shared in common in the minds of those who use these terms c)universal, type, feature or property shared in common by entities in the world
69 There are more word meanings than there are universals / types of entities in reality unicorn devil canceled workshop prevented pregnancy imagined mammal fractured lip...
70 Figure (built up over slides 70 to 73): space of word meanings; space of universals
74 if ontological relations are defined across the whole space of word meanings rather than across the space of universals instantiated in reality then our tools for dealing with such relations are blunted
75 meningitis is_a disease of the nervous system is a statement about universals in reality
76 unicorn is_a one-horned mammal A is_a B =def. ‘A’ is narrower in meaning than ‘B’
77 The linguistic reading of ‘concept’ yields a smudgy view of reality, built out of relations like: ‘synonymous_with’ ‘associated_to’
78 Fruit Orange Vegetable SimilarTo Apfelsine SynonymWith NarrowerThan Goble & Shadbolt
79 The concept-based approach can provide some half-way coherent treatment of is_a relations
80 but it can’t cope at all with relations like: part_of =def. composes, with one or more other physical units, some larger whole; contains =def. is the receptacle for fluids or other substances
81 connected_to =def. directly attached to another physical unit as tendons are connected to muscles. How can a meaning or concept be directly attached to another physical unit as tendons are connected to muscles?
82 An example of the concept orientation Unified Medical Language System (UMLS)
83 UMLS Metathesaurus : 1 million biomedical concepts 2.8 million concept names from more than 100 controlled vocabularies and classifications built by US National Library of Medicine
84 UMLS Source Vocabularies MeSH – Medical Subject Headings … ICD International Classification of Diseases … GO – Gene Ontology … FMA – Foundational Model of Anatomy …
85 To reap the benefits of standardization we need to make ONE SYSTEM out of many different terminologies = UMLS “Semantic Network” nearest thing to an “ontology” in the UMLS
86 UMLS SN described by its authors as “An Upper Level Ontology for the Biomedical Domain” (Compare the Semantic Web initiative)
87 UMLS SN 134 Semantic Types 54 types of edges (relations) yielding a graph containing more than 6,000 edges
88 Fragment of UMLS SN
89
90
91 UMLS SN Top Level: entity, event; under entity: physical object, conceptual entity; under physical object: organism
92 conceptual entity: Organism Attribute, Finding, Idea or Concept, Occupation or Discipline, Organization, Group, Group Attribute, Intellectual Product, Language
93 conceptual entity > idea or concept > functional concept > body system
94 entity: physical object, conceptual entity; conceptual entity > idea or concept > functional concept > body system. Confusion of entity and concept
95 Functional Concept: Body system is_a Functional Concept. but: Concepts do not perform functions or have physical parts.
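The classification criticized here can be traced mechanically. The sketch below assumes a simple child-to-parent map built only from the hierarchy fragments shown on these slides; `PARENT` and `ancestors` are hypothetical names, and this is not the UMLS distribution format.

```python
# Child -> parent links taken from the UMLS SN fragments on the slides.
PARENT = {
    "physical object": "entity",
    "conceptual entity": "entity",
    "idea or concept": "conceptual entity",
    "functional concept": "idea or concept",
    "body system": "functional concept",
}


def ancestors(node):
    """Walk up the is_a chain and return every ancestor, nearest first."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


chain = ancestors("body system")
print(chain)
# A body system has physical parts, yet it never passes through
# 'physical object' on its way up the hierarchy.
print("physical object" in chain)
```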
96 This: is not a concept
97 Confusion of Ontology and Epistemology: Physical Object > Substance > Food, Chemical, Body Substance
98 Confusion of Ontology and Epistemology: Chemical > Chemical Viewed Structurally, Chemical Viewed Functionally
99 Chemical Viewed Structurally, Chemical Viewed Functionally: Inorganic Chemical, Organic Chemical, Enzyme, Biomedical or Dental Material
100 Chemical Viewed Structurally, Chemical Viewed Functionally: Inorganic Chemical, Organic Chemical, Biomedical or Dental Material, Enzyme
101 The Hydraulic Equation BP = CO*PVR arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR)
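A worked instance of the equation, as a minimal sketch. The numerical values and units below are illustrative assumptions, not figures from the slides, and `arterial_pressure` is a hypothetical helper name.

```python
def arterial_pressure(cardiac_output, peripheral_resistance):
    """BP = CO * PVR, with both inputs treated as physical quantities."""
    return cardiac_output * peripheral_resistance


# Illustrative values only (assumed, not from the source):
co = 5.0    # cardiac output in L/min
pvr = 18.0  # peripheral vascular resistance in mmHg*min/L
bp = arterial_pressure(co, pvr)
print(bp)  # pressure in mmHg under the assumed units
```

Both factors have to be quantities of the organism itself for the product to make sense as a pressure.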
102 Confusion of Ontology and Epistemology blood pressure is an Organism Function, cardiac output is a Laboratory or Test Result or Diagnostic Procedure BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure
103 Fragment of UMLS SN
104 UMLS Semantic Network: anatomical abnormality associated_with daily or recreational activity; educational activity associated_with pathologic function; bacterium causes experimental model of disease
105
106 GO: the Gene Ontology 3 large telephone directories of standardized designations for gene functions and products organized into hierarchies via is_a and part_of
107 When a gene is identified three important types of questions need to be addressed: 1. Where is it located in the cell? 2. What functions does it have on the molecular level? 3. To what biological processes do these functions contribute?
108 GO’s three ontologies molecular functions cellular components biological processes
109 GO is three ontologies cellular components molecular functions biological processes December 16, 2003: 1372 component terms 7271 function terms 8069 process terms
110 The Cellular Component Ontology (counterpart of anatomy) flagellum chromosome membrane cell wall nucleus
111 The Molecular Function Ontology ice nucleation protein stabilization kinase activity binding The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity
112 Biological Process Ontology Examples: glycolysis death adult walking behavior response to blue light = occurrents on the level of granularity of cells, organs and whole organisms
113 Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges: is-a (= is a subtype of ) (copulation is-a biological process) part-of (cell wall part-of cell)
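The two-edge graph structure can be sketched as follows, using only the two example links given on the slide. The class `OntologyGraph` and its methods are hypothetical names for illustration and do not reflect GO's actual file format or tooling; the relation labels are written `is_a` and `part_of`.

```python
from collections import defaultdict

RELATIONS = ("is_a", "part_of")


class OntologyGraph:
    """Directed graph whose edges carry one of two labels: is_a or part_of."""

    def __init__(self):
        # (child, relation) -> set of direct parents under that relation
        self._parents = defaultdict(set)

    def add(self, child, relation, parent):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._parents[(child, relation)].add(parent)

    def reachable(self, term, relation):
        """All terms reachable from `term` by following only `relation` edges."""
        found, stack = set(), [term]
        while stack:
            current = stack.pop()
            for parent in self._parents.get((current, relation), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


go = OntologyGraph()
go.add("copulation", "is_a", "biological process")
go.add("cell wall", "part_of", "cell")
print(go.reachable("copulation", "is_a"))
print(go.reachable("cell wall", "part_of"))
```

Keeping the two relations in separate edge sets means a query over is_a never silently follows a part_of link.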
114
115 GO is species-independent an ontology of the unchanging universal building blocks of life (substances and processes) and of the structures they form
116
117 The Gene Ontology is error-prone in part because of its sloppy treatment of relations, e.g. menopause part_of death
118
119 Primary aim of GO not rigorous definition and principled classification but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products
120 Problems with GO Molecular Functions: anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”); enzyme activity (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)
121 GO: structural constituent of cell wall. Definition: The action of a molecule that contributes to the structural integrity of a cell wall. This confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology
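A crude check for this kind of mismatch can be sketched as below, using the three term and definition pairs quoted on the preceding slide. The string heuristics in `kind_named` and `kind_defined` are assumptions made for illustration; they are not GO's own quality-control rules.

```python
# Term -> definition, as quoted on the slides.
DEFINITIONS = {
    "anti-coagulant activity": "a substance that retards or prevents coagulation",
    "enzyme activity": "a substance that catalyzes",
    "structural molecule": "the action of a molecule that contributes to structural integrity",
}


def kind_named(term):
    """Guess what the term's name refers to (assumed heuristic)."""
    return "activity" if term.endswith("activity") else "substance"


def kind_defined(definition):
    """Guess what the definition describes from its opening words (assumed heuristic)."""
    if definition.startswith("a substance"):
        return "substance"
    if definition.startswith("the action"):
        return "activity"
    return "unknown"


# A term is flagged when its name and its definition point to different kinds.
mismatches = [
    term for term, definition in DEFINITIONS.items()
    if kind_named(term) != kind_defined(definition)
]
print(mismatches)
```

All three quoted examples are flagged: two activities are defined as substances, and one substance is defined as an action.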
122
123
124 cars: red cars, Cadillacs, cars with radios
125 Why do these problems arise? Because GO has no clear formal understanding of the role of relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function – GO runs these two together)
126 Thesis GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously
127 Digital Anatomist Foundational Model of Anatomy (Department of Biological Structure, University of Washington, Seattle) The first crack in the wall of the Concept Orientation
128
129 Figure: FMA fragment with part_of and is_a links among Pleural Cavity, Interlobar recess, Mesothelium of Pleura, Pleura (Wall of Sac), Visceral Pleura, Pleural Sac, Parietal Pleura, Mediastinal Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Anatomical Structure, Organ, Serous Sac, Tissue, Organ Part, Organ Subdivision, Organ Component, Organ Cavity Subdivision, Serous Sac Cavity Subdivision
130 Figure: part_of links among Pleural Cavity, Interlobar recess, Mesothelium of Pleura, Pleura (Wall of Sac), Visceral Pleura, Pleural Sac, Parietal Pleura, Mediastinal Pleura, Tissue, Cell, Organelle. Reference Ontology for Anatomy at every level of granularity
131 The Gene Ontology European Bioinformatics Institute,... Open source Transgranular Cross-Species Components, Processes, Functions The second crack in the wall
132 But: No logical structure; viciously circular definitions; poor rules for coding, definitions, treatment of relations, classifications; so highly error-prone
133 New GO / OBO Reform Effort OBO = Open Biological Ontologies
134 OBO Library Gene Ontology MGED Ontology Cell Ontology Disease Ontology Sequence Ontology Fungal Ontology Plant Ontology Mouse Anatomy Ontology Mouse Development Ontology...
135 coupled with Relations Ontology (IFOMIS) suite of relations for biomedical ontology to be submitted to CEN as basis for standardization of biomedical ontologies + alignment of FMA and GALEN
136
137 END