Introduction to Applied and Theoretical Ontology Barry Smith http://ontologist.com
The Challenge of Biomedical Research: Each (clinical, pathological, genetic, proteomic, pharmacological …) information system uses its own terminology and category system. But biomedical research demands the ability to navigate through all such information systems. How can we overcome the incompatibilities which become apparent when data from different sources is combined?
Database and terminology standardization is desperately needed in medical and bioinformatics to enable the huge amounts of existing data to be fused together automatically
Current standard solution: The Unified Medical Language System (UMLS), a metathesaurus of some 100 source vocabularies: SNOMED; ICD-10; MeSH (Medical Subject Headings); Foundational Model of Anatomy; LOINC (Logical Observation Identifiers Names and Codes); Gene Ontology; HL7
UMLS Metathesaurus: > 1,800,000 Concepts; > 10,000,000 Relations; compiled by US National Library of Medicine, Bethesda MD
Problem The Source Vocabularies contain bad coding
MeSH: Organisms; Plants; Plant Components; Plant Components, Aerial; Flowering Tops; Flowers; Pollen. Confusion of mass and count senses of ‘substance’
SNOMED: both_testes is_a testis
Problem The UMLS Source Vocabularies are Mutually Inconsistent
Representation of Blood in SNOMED Blood is_a Tissue
Representation of Blood in MeSH Blood is_a Bodily Fluid
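The two Blood slides above can be turned into a mechanical check. The following is a minimal sketch, assuming the is_a assertions have already been extracted into (source, child, parent) tuples; that layout and the function name `conflicting_parents` are hypothetical and not part of UMLS. Differing parents are only candidates for review, since one parent may subsume the other.

```python
from collections import defaultdict

# Toy is_a assertions copied from the two slides above.
# The (source, child, parent) layout is an assumption of this sketch.
ASSERTIONS = [
    ("SNOMED", "Blood", "Tissue"),
    ("MeSH", "Blood", "Bodily Fluid"),
]

def conflicting_parents(assertions):
    """Return terms whose set of is_a parents differs between sources."""
    parents = defaultdict(lambda: defaultdict(set))  # term -> source -> parents
    for source, child, parent in assertions:
        parents[child][source].add(parent)
    conflicts = {}
    for term, by_source in parents.items():
        # A term is flagged when at least two sources disagree on its parents.
        distinct = {frozenset(p) for p in by_source.values()}
        if len(distinct) > 1:
            conflicts[term] = {s: sorted(p) for s, p in by_source.items()}
    return conflicts

conflicts = conflicting_parents(ASSERTIONS)
```

A check of this kind only locates the disagreement; deciding whether blood is a tissue or a bodily fluid still requires looking at blood itself.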
How to make ONE SYSTEM out of different source terminologies? Through the UMLS Semantic Network: 134 Semantic Types, 55 Links (is_a, part_of, etc.), built by linguists for the sake of your health and well-being …
built by Saussurian linguists AND MANDATED BY THE US FEDERAL GOVERNMENT for the sake of your health and well-being …
UMLS Semantic Network: top level divides into entity and event; entity divides into physical object and conceptual entity; organism falls under physical object
conceptual entity: Organism Attribute; Finding; Idea or Concept; Occupation or Discipline; Organization; Group; Group Attribute; Intellectual Product; Language
Idea or Concept: Functional Concept; Qualitative Concept; Quantitative Concept; Spatial Concept (Body Location or Region; Body Space or Junction; Geographic Area; Molecular Sequence (Amino Acid Sequence; Carbohydrate Sequence; Nucleotide Sequence))
Bad Zwischenahn is an Idea or Concept
UMLS Semantic Network: entity divides into physical object and conceptual entity; under physical object: organism and anatomical structure; anatomical structure includes fully formed anatomical structure, which includes body part, organ or organ component
UMLS Semantic Network: entity divides into physical object and conceptual entity; conceptual entity includes idea or concept, which includes functional concept, which includes body system
Body System: Circulatory System; Nervous System; Immune System; Musculo-Skeletal System; etc.
Your digestive system, according to UMLS, is a conceptual entity
GO: the Gene Ontology. 3 large telephone directories of standardized designations for gene functions and products; designed to cover the whole of biology; model for fungal ontology, plant ontology, drosophila ontology, etc.
GO: the Gene Ontology. GO is organized into 3 hierarchies via is_a and part_of
The intended meaning of part-of, as explained in the GO Usage Guide, is: « can be a part of ». The GO axiom “flagellum part-of cell” means: “a flagellum is part-of some cells”
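How weak the « can be a part of » reading is becomes visible once it is spelled out at the level of instances. The sketch below uses invented toy individuals (`flagellum_1`, `cell_1`, `cell_2`); the three function names are labels chosen for this illustration and are not GO definitions.

```python
# Hypothetical instance-level data: which individual belongs to which class,
# and which individual is part of which.
INSTANCE_OF = {
    "flagellum_1": "flagellum",
    "cell_1": "cell",  # a flagellated cell
    "cell_2": "cell",  # a cell without a flagellum
}
PART_OF = {("flagellum_1", "cell_1")}

def instances(cls):
    """All individuals recorded as instances of the class cls."""
    return {i for i, c in INSTANCE_OF.items() if c == cls}

def can_be_part_of(a, b):
    """Weak reading: at least one A is part of at least one B."""
    return any((x, y) in PART_OF for x in instances(a) for y in instances(b))

def every_a_part_of_some_b(a, b):
    """Stronger reading: every A is part of some B."""
    return all(any((x, y) in PART_OF for y in instances(b)) for x in instances(a))

def every_b_has_some_a(a, b):
    """Stronger reading: every B has some A as part."""
    return all(any((x, y) in PART_OF for x in instances(a)) for y in instances(b))

weak = can_be_part_of("flagellum", "cell")
all_a = every_a_part_of_some_b("flagellum", "cell")
all_b = every_b_has_some_a("flagellum", "cell")
```

On this toy data the weak reading holds, as does "every flagellum is part of some cell", while "every cell has a flagellum" fails. A bare part-of link does not tell a curator which of these readings is meant.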
GO divided into three disjoint term hierarchies: cellular component ontology (flagellum, chromosome, cell); molecular function ontology (ice nucleation, binding, protein stabilization); biological process ontology (glycolysis, death)
GO divided into three disjoint term hierarchies (cellular component ontology, molecular function ontology, biological process ontology) = no is_a and no part_of relations between them. How are functions and processes linked together?
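Disjointness in this sense can be stated as a simple check: no is_a or part_of edge may join terms from different hierarchies. The following is a minimal sketch using the example terms from the slides; the `NAMESPACE` table, the hierarchy labels, and the cross-hierarchy edge are hypothetical and serve only to show what the check would flag.

```python
# Example terms from the slides, mapped to their hierarchy.
# The label strings are an assumption of this sketch.
NAMESPACE = {
    "flagellum": "cellular_component",
    "chromosome": "cellular_component",
    "cell": "cellular_component",
    "ice nucleation": "molecular_function",
    "binding": "molecular_function",
    "protein stabilization": "molecular_function",
    "glycolysis": "biological_process",
    "death": "biological_process",
}

def cross_hierarchy_edges(edges):
    """Return the (child, relation, parent) edges that join two hierarchies."""
    return [(c, r, p) for c, r, p in edges if NAMESPACE[c] != NAMESPACE[p]]

# An edge inside one hierarchy, as in the GO axiom quoted earlier.
within = [("flagellum", "part_of", "cell")]

# A hypothetical edge linking a function to a process, for illustration only.
with_cross_link = within + [("binding", "part_of", "glycolysis")]

ok = cross_hierarchy_edges(within)
flagged = cross_hierarchy_edges(with_cross_link)
```

Because every such cross link is excluded by design, the structure itself offers no place to record how a function relates to the process in which it is exercised.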
Definition of «Function» UMLS Semantic Network: Functional Concept =df A concept which is of interest because it pertains to the carrying out of a process or activity. GO: Molecular Function =df the action characteristic of a gene product.
UMLS brings clarity: In March 2003 all nodes in the Molecular Function ontology (except the root) had ‘activity’ added to their names. Function = activity
Confusion of Function and Activity: if function = activity (= functioning), then how do we deal with dormant/suppressed functions?
How are the ontologies related? Function = “the action characteristic of a gene product.” Process = “phenomenon marked by changes that lead to a particular result, mediated by one or more gene products”
Result: constant coding errors, stemming from a lack of clear principles concerning what basic notions like ‘function’, ‘process’, ‘part’ mean
Examples of GO Molecular Functions: anti-coagulant (defined as: “a substance that retards or prevents coagulation”); enzyme (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”)
Problems with Bioinformatics Terminology Systems: circular definitions; confusion of use and mention; confusion of concepts and objects; confusion of concepts and classes; confusion of terms and objects; confusion of knowledge with what is known; simple stupidity … all of which lead to poor coding
These problems derive: 1. from the drive for rapid population of bioinformatics databases (for funding bodies quantity overwhelms quality; KR mentality: quick and dirty); 2. from ignorance of the basic principles of ontology (and logic); 3. from relativism/reductionism of linguists
The problem: Different communities of medical researchers use different and often incompatible category systems in expressing the results of their work
The solution “ONTOLOGY” Remove “Ontology Impedance” But what does “ontology” mean?
Two alternative readings: (1) Ontologies are oriented around terms or concepts = currently popular IT conception. (2) Ontologies are oriented around the entities in reality = traditional philosophical conception, embraced also by IFOMIS
Ontology as a branch of philosophy seeks to establish the science of the kinds and structures of objects, properties, events, processes and relations in every domain of reality
Ontology a kind of generalized chemistry or zoology (Aristotle’s ontology grew out of biological classification)
Aristotle: world’s first ontologist
World’s first ontology (from Porphyry’s Commentary on Aristotle’s Categories): Porphyry’s Tree, ca. 1514
Linnaean Ontology
Ontology is distinguished from the special sciences: it seeks to study all of the various types of entities existing at all levels of granularity
and to establish how they hang together to form a single whole (‘reality’ or ‘being’)
different concept/terminology systems
need not interconnect at all; for example, they may relate to entities of different granularity
we cannot make incompatible terminology-systems interconnect just by looking at concepts, or knowledge or language
we cannot make incompatible terminology-systems interconnect by staring at the terminology systems themselves
to decide which of a plurality of competing definitions to accept we need some tertium quid
we need, in other words, to take the world itself into account
BFO = basic formal ontology
BFO: ontology is defined not as the ‘standardization’ or ‘specification’ of conceptualizations (not as a branch of knowledge or concept engineering) but as an inventory of the entities existing in reality
The BFO framework will solve the problem of ontological impedance and provide tools for quality-control on the output of computer applications
BFO: not a computer application but a Reference Ontology (something like old-fashioned metaphysics)
Reference Ontology: a theory of a domain of entities in the world
BFO: not just a system of categories but a formal theory with definitions, axioms, theorems, designed to provide the resources for reference ontologies for specific domains, of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force
Proposed solution: distinguish two separate tasks: (1) the task of developing computer applications capable of running in real time; (2) the task of developing an expressively rich framework of a sort which will allow us to resolve incompatibilities between definitions and formulate intuitive and reliable principles for database curation
Reference Ontology: a theory of the tertium quid (called reality) needed to hand-calibrate database/terminology systems
Methodology Get ontology right first (realism; descriptive adequacy; rather powerful logic); solve tractability problems later
The Reference Ontology Community: IFOMIS (Leipzig); Laboratories for Applied Ontology (Trento/Rome, Turin); Foundational Ontology Project (Leeds); Ontology Works (Baltimore); Ontek Corporation (Buffalo/Leeds); Language and Computing (L&C) (Belgium/Philadelphia)
Domains of Current Work: IFOMIS Leipzig: Medicine, Bioinformatics; Laboratories for Applied Ontology Trento/Rome: Ontology of Cognition/Language; Laboratories for Applied Ontology Turin: Law; Foundational Ontology Project: Space, Physics; Ontology Works: Genetics, Molecular Biology; Ontek Corporation: Biological Systematics; Language and Computing: Natural Language Understanding
THE END