VR. Formal Principles for Biomedical Ontologies Barry Smith

3 Three levels of ontology

ifomis.de 4 Three levels of ontology
1) formal (top-level) ontology, dealing with categories employed in every domain: object, event, whole, part, instance, class
2) domain ontology, applies top-level system to a particular domain: cell, gene, drug, disease, therapy
3) terminology-based ontology, large, lower-level system: Dupuytren’s disease of palm, nodules with no contracture

ifomis.de 7 Compare:
1) pure mathematics (re-usable theories of structures such as order, set, function, mapping)
2) applied mathematics, applications of these theories = re-using the same definitions, theorems, proofs in new application domains
3) physical chemistry, biophysics, etc. = adding detail

ifomis.de 8 Three levels of biomedical ontology
1) formal (top-level) ontology = medical ontology has nothing like the technology of re-usable definitions, theorems and proofs provided by pure mathematics
2) domain ontology = UMLS Semantic Network, GALEN CORE
3) terminology-based ontology = UMLS, SNOMED-CT, GALEN, FMA
?????

ifomis.de 9 Description Logic, Protégé, and other tools for supporting automatic reasoning do not fill this gap: they do not provide theories of classes, functions, processes, etc. Rather, successful coding in a DL framework presupposes that such theories have already been applied in the very construction of the terminology-based ontology.

ifomis.de 10 IFOMIS Institute for Formal Ontology and Medical Information Science, mission: use basic principles of philosophical ontology, traditional theories of classification and definition for quality assurance and alignment of biomedical ontologies

ifomis.de 11 Strategy
Part 1: Survey of GO
Part 2: Provide principles for building biomedical ontologies derived from formal (top-level) ontology, and illustrate how they can help in quality assurance of terminology-based ontologies like GO
Part 3: Show how it can be done right

ifomis.de 12 Part One Survey of GO

ifomis.de 13 GO is three ontologies: cellular components, molecular functions, biological processes
December 16, 2003:
1372 component terms
7271 function terms
8069 process terms

ifomis.de 14 GO: an impressive achievement
used by over 20 genome databases and many other groups in academia and industry
successful methodology, much imitated
now part of the OBO (open biological ontologies) consortium
Here I focus on problems / errors
GO here is just an example

ifomis.de 15 Primary aim of GO not rigorous definition and principled classification but rather: providing a practically useful framework for keeping track of the biological annotations that are applied to gene products

ifomis.de 16 Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges: is-a (epithelial cell differentiation is-a cell differentiation) part-of (axonemal microtubule part-of axoneme)
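A minimal sketch of such a two-relation graph, assuming a plain list of typed edges. The two example edges are the ones quoted on the slide; the names `EDGES`, `build_graph` and `parents` are illustrative and are not part of GO or of any GO tool.

```python
from collections import defaultdict

# Typed edges: (child, relation, parent). The two examples come from the slide;
# the data structure itself is an illustrative assumption.
EDGES = [
    ("epithelial cell differentiation", "is-a", "cell differentiation"),
    ("axonemal microtubule", "part-of", "axoneme"),
]

def build_graph(edges):
    """Map each term to a list of (relation, parent) pairs."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        if relation not in ("is-a", "part-of"):
            raise ValueError(f"unsupported relation: {relation}")
        graph[child].append((relation, parent))
    return dict(graph)

def parents(graph, term, relation):
    """Return the parents of `term` reachable by one edge of the given type."""
    return [p for r, p in graph.get(term, []) if r == relation]

GRAPH = build_graph(EDGES)
```

Keeping the relation type on every edge is what lets a later check ask whether a given link should have been is-a or part-of.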

ifomis.de 17 This graph-theoretic architecture is designed to help humans, who can use the graphs to locate the features and attributes they are addressing in their work and thus to determine the designated terms for these features and attributes within GO’s ‘controlled vocabulary.’

ifomis.de 18 GO’s three ontologies When a gene is identified, three important types of questions need to be addressed: Where is it located in the cell? What functions does it have on the molecular level? And to what biological processes do these functions contribute?

ifomis.de 19 GO’s three ontologies molecular functions cellular constituents biological processes

ifomis.de 20 The Cellular Component Ontology (counterpart of anatomy) consists of terms such as flagellum, chromosome, ferritin, extracellular matrix and virion. Cellular components are physical and measurable entities. They are, in the terminology of philosophical ontology, objects or things (independent continuants). They endure self-identically through time while undergoing changes of various sorts. Cellular component also embraces the extracellular environment of cells and cells themselves.

ifomis.de 21 No organisms GO does not include terms for specific organisms, not even for single-celled organisms

ifomis.de 22 The Molecular Function Ontology molecular function = the action characteristic of a gene product. Actions such as ice nucleation or protein stabilization do not endure but rather occur.

ifomis.de 23 The Molecular Function Ontology Originally included terms such as anti-coagulant (defined as: ‘a substance that retards or prevents coagulation’) and enzyme (defined as: ‘a substance … that catalyzes’). These refer neither to functions nor to actions but rather to components.

ifomis.de 24 The Molecular Activity Ontology Confusion remedied to a degree by policy change of March 2003: ‘All GO molecular function term names [with the exception of the parent term molecular function and of the whole node binding] are to be appended with the word “activity”.’

ifomis.de 25 ‘Function’ = ‘Activity’ Thus the term ‘structural molecule,’ which is defined as meaning: ‘the action of a molecule that contributes to structural integrity,’ is amended to ‘structural molecule activity’

ifomis.de 26 Still problems with GO Molecular Function definitions:
anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”)
enzyme activity (defined as: “a substance that catalyzes”)

ifomis.de 27 … and there are still problems with Molecular Function terms GO: : structural constituent of cell wall

ifomis.de 28 structural constituent of cell wall Definition: The action of a molecule that contributes to the structural integrity of a cell wall. confuses actions, which GO includes in its function ontology, with constituents, which GO includes in its cellular component ontology

ifomis.de 29
extracellular matrix structural constituent
puparial glue (sensu Diptera)
structural constituent of bone
structural constituent of chorion (sensu Insecta)
structural constituent of chromatin
structural constituent of cuticle
structural constituent of cytoskeleton
structural constituent of epidermis
structural constituent of eye lens
structural constituent of muscle
structural constituent of myelin sheath
structural constituent of nuclear pore
structural constituent of peritrophic membrane (sensu Insecta)
structural constituent of ribosome
structural constituent of tooth enamel
structural constituent of vitelline membrane (sensu Insecta)

ifomis.de 30 The Biological Process Ontology biological process: ‘A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.’ Examples: glycolysis, death, adult walking behavior, response to blue light

ifomis.de 31 Occurrents Both molecular activity and biological process terms refer to what philosophical ontologists call occurrents = entities which do not endure through time but rather unfold themselves in successive temporal phases. Occurrents can be segmented into parts along the temporal dimension. Continuants exist in toto in every instant at which they exist at all.

ifomis.de 32 Molecular functions and biological processes are closely interrelated E.g. the process anti-apoptosis involves the molecular function apoptosis inhibitor activity. How can GO express such relations?

ifomis.de 33 Are they a matter of granularity? ‘A biological process is accomplished via one or more ordered assemblies of molecular functions.’ ??? Molecular activities = building blocks of biological processes ??? So: Functions are parts of processes. But no:

ifomis.de 34 GO’s three ontologies are separate No links or edges defined between them molecular functions cellular constituents biological processes

ifomis.de 35 Question: How are we to understand granularity if not in terms of parthood?

ifomis.de 36 Molecular functions renamed ‘activities’, because ‘activity’ unlike ‘process’, connotes agency ? but molecules are not agents hypothesis: the term ‘function’ was used for the molecular function ontology because the activities in question are functional in relation to the pertinent organism.

ifomis.de 37 Functions
A function is functional = beneficial to the organism
If an organism-part has a function, this is because the functioning of this organism-part is beneficial to the organism
The function of the heart is to pump blood
Not: the function of the hip is to financially support hip-replacement surgeons

ifomis.de 38 Some processes are functionings E.g. pumping blood

ifomis.de 39 Two sorts of processes
1. Functionings (realizations of functions = beneficial to the organism)
2. Other processes (e.g. the result of external interventions)
Cf. difference between physiology and pathology

ifomis.de 40 GO not clear about this distinction
transport: The directed movement of substances (such as … ions) into, out of, or within a cell
cell growth and/or maintenance: Any process required for the survival and growth of a cell. Synonym: cell physiology
transport is-a cell growth and/or maintenance
but (GO: ) viral intracellular protein transport is-a transport

ifomis.de 41 Why do these problems arise? GO has no clear understanding of the role of temporal relations in organizing an ontology (thus also no clear understanding of the difference between a function and the activity which is the realization of a function)

ifomis.de 42 GO excludes organisms from its scope (they are of the wrong granularity) Yet each process or function requires some bearer or bearers which it is the process or function of. Processes are dependent on their bearers (Theory of dependence vs. independence part of formal ontology) (Theory of continuants vs. occurrents part of formal ontology)

ifomis.de 43 Some formal ontology
Components are independent continuants
Functions are dependent continuants (the function of an object exists continuously in time, just like the object which has the function; and it exists even when it is not being exercised)
Processes are (dependent) occurrents

ifomis.de 44 More generally: Continuants can be divided into independent (objects, things, components) and dependent (features, attributes, conditions, functions, roles, qualities …) All occurrents are dependent entities. Every occurrent is dependent for its existence on one or more continuants. A change is always a change in some continuant object.
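The classification above can be written down as a small data model. This is a sketch only: the enum and the two helper functions are illustrative names, and the mapping of GO's three branches follows the preceding slide (components are independent continuants, functions are dependent continuants, processes are dependent occurrents).

```python
from enum import Enum

class Category(Enum):
    INDEPENDENT_CONTINUANT = "independent continuant"  # objects, things, components
    DEPENDENT_CONTINUANT = "dependent continuant"      # functions, roles, qualities
    OCCURRENT = "occurrent"                            # processes; always dependent

def is_dependent(category):
    """Everything except an independent continuant needs a bearer."""
    return category is not Category.INDEPENDENT_CONTINUANT

def endures_through_time(category):
    """Continuants endure; occurrents unfold in successive temporal phases."""
    return category is not Category.OCCURRENT

# Assumed assignment of GO's three branches, following the slides.
GO_BRANCHES = {
    "cellular component": Category.INDEPENDENT_CONTINUANT,
    "molecular function": Category.DEPENDENT_CONTINUANT,
    "biological process": Category.OCCURRENT,
}
```

With this model, `is_dependent(GO_BRANCHES["biological process"])` is true, which is the formal reason a process term presupposes some bearer.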

ifomis.de 45

ifomis.de 46 Part Two Principles of Biomedical Ontologies and their use in quality assurance of terminology-based ontologies

ifomis.de 47 Principle of Temporal Coherence An ontology should rigorously distinguish continuants from occurrents. (Anatomy is a science of continuants)

ifomis.de 48 Principle of Dependence If an ontology recognizes a dependent entity then it (or a linked ontology) should recognize also the relevant class of bearers Part of our aim here is to lay down principles which can support such linkability
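One way to read this principle as a mechanical check is sketched below. The bearer assignments and the contents of the two class sets are hypothetical toy data chosen for illustration; neither GO nor any linked ontology is being quoted.

```python
def missing_bearers(bearer_of, recognized_classes):
    """Return the dependent terms whose bearer class is recognized nowhere."""
    return sorted(
        term for term, bearer in bearer_of.items()
        if bearer not in recognized_classes
    )

# Hypothetical data: dependent entity -> class of bearers it depends on.
BEARER_OF = {
    "pumping blood": "heart",
    "eye pigmentation": "eye",
}
LOCAL_CLASSES = {"pumping blood", "eye pigmentation"}  # this ontology's own terms
LINKED_CLASSES = {"heart"}                             # a linked anatomy ontology (assumed)

UNSUPPORTED = missing_bearers(BEARER_OF, LOCAL_CLASSES | LINKED_CLASSES)
```

In this toy case `UNSUPPORTED` contains only "eye pigmentation", because no recognized class supplies its bearer.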

ifomis.de 49 Linking to external ontologies can also help to link together GO’s own three separate parts

ifomis.de 50 GO’s three ontologies [Diagram: molecular functions, cellular constituents, biological processes; arrows labeled ‘dependent’ and ‘independent’]

ifomis.de 51 GO’s three ontologies [Diagram: molecular functions, cellular constituents, organism-level biological processes, cellular processes]

ifomis.de 52 ‘part-of’; ‘is dependent on’ [Diagram: molecular functions, molecule complexes, cellular processes, cellular constituents, organism-level biological processes, organisms]

ifomis.de 53 [Diagram: molecular functions, molecule complexes, cellular processes, cellular constituents, organism-level biological processes, organisms]

ifomis.de 54 [Diagram: molecule complexes, cellular constituents, molecular functions, cellular functions, organism-level biological functions, organisms, molecular processes, cellular processes, organism-level biological processes]

ifomis.de 55 [Diagram: molecule complexes, cellular constituents, molecular functions, cellular functions, organism-level biological functions, organisms, molecular processes, cellular processes, organism-level biological processes; label: functionings]

ifomis.de 56 GO must be linked with other, neighboring ontologies
GO has: adult walking behavior but not adult
GO has: eye pigmentation but not eye
GO has: response to blue light but not light (or blue)
94% of words used in GO terms are not GO terms
Part of the solution: “Medical FactNet” (NLM, 10am tomorrow)
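A figure like the 94% above can be approximated by comparing the words used in term names against the term names themselves. The sketch below assumes a simple tokenization and counts distinct words; how the slide's figure was actually computed is not stated. The sample list is a toy and `words_not_terms` is an illustrative name.

```python
import re

def words_not_terms(term_names):
    """Return (missing_words, fraction) for words used in terms but absent as terms."""
    terms = {name.lower() for name in term_names}
    words = set()
    for name in terms:
        # Assumed tokenization: runs of letters, digits and hyphens.
        words.update(re.findall(r"[a-z0-9\-]+", name))
    missing = sorted(w for w in words if w not in terms)
    fraction = len(missing) / len(words) if words else 0.0
    return missing, fraction

# Toy vocabulary: three terms from the slide plus one single-word term.
SAMPLE = [
    "adult walking behavior",
    "eye pigmentation",
    "response to blue light",
    "behavior",
]
MISSING, FRACTION = words_not_terms(SAMPLE)
```

On this toy vocabulary, "adult", "eye" and "light" all come out as words with no corresponding term.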

ifomis.de 57 GO taking steps in this direction
Linking to a good external ontology of organism types (to solve some of the problems with sensu)
It needs to link further to a good external ontology of anatomy, to solve the location problem, and to a good external ontology of coarse-grained reality, to solve the adult walking behavior problem
Human beings know what ‘walking’ means

ifomis.de 58 Human beings know that adults are older than embryos. GO needs to be linked to an ontology of development and in general to resources for reasoning about time and change

ifomis.de 59 but such linkages are possible only if GO itself has a coherent formal architecture

ifomis.de 60 Principle of Univocity
univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use
UMLS-Semantic Network:
‘organization’ = body plan (anatomy)
‘organization’ = social organization

ifomis.de 61 Polysemy of GO’s part-of – membrane part-of cell, intended to mean “a membrane is a part-of any cell” – flagellum part-of cell, intended to mean “a flagellum is part-of some cells” – replication fork part-of cell cycle, intended to mean: “a replication fork is part-of the nucleoplasm only during certain times of the cell cycle”

ifomis.de 62 Three meanings of ‘part-of’
‘part-of’ = ‘can be part of’ (flagellum part-of cell)
‘part-of’ = ‘is sometimes part of’ (replication fork part-of the nucleoplasm)
‘part-of’ = ‘is included as a sublist in’

ifomis.de 63 THE GOAL IS: not to impose basic principles of classification and definition on biologists – All the principles presented here should be conceived not as iron requirements but rather as rules of thumb – deviation from which is often marked by characteristic families of coding errors

ifomis.de 64 example [GO: ] host cell cytoplasm, defined as: The cytoplasm of a host cell [GO: ] host, defined as: Any organism in which another organism, especially a parasite or symbiont, spends part or all of its life cycle and from which it obtains nourishment and/or protection

ifomis.de 65 Why is this an error? because organisms do not fall within the scope of GO An organism is not a cellular component, and it is not a molecular function, and not a biological process, either

ifomis.de 66 host cell cytoplasm part-of host breaks GO’s own granularity constraints

ifomis.de 67 Why univocity?
1. humans are good at disambiguating ambiguous expressions, machines not
2. quality assurance and ontology maintenance
3. GO, SNOMED, etc., are designed to constitute ‘controlled vocabularies’

ifomis.de 68 Quality assurance and ontology maintenance must be automated As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.

ifomis.de 69 The purpose of a ‘controlled vocabulary’ = to ensure that the same terms are used by different research groups with the same meanings this has implications also for the syntax of GO terms (= the way terms are compounded together out of other terms)

ifomis.de 70 Univocity and syntax The story of ‘ / ’

ifomis.de 71 / GO: microtubule/kinetochore interaction =df Physical interaction between microtubules and chromatin via proteins making up the kinetochore complex

ifomis.de 72 / GO: ciliary/flagellar motility =df Locomotion due to movement of cilia or flagella.

ifomis.de 73 / GO: negative regulation of chromatin assembly/disassembly =df Any process that stops, prevents or reduces the rate of chromatin assembly and/or disassembly

ifomis.de 74 / GO: G1/S transition of mitotic cell cycle defined as: Progression from G1 phase to S phase of the standard mitotic cell cycle.

ifomis.de 75 / GO: interpretation of nuclear/cytoplasmic to regulate cell growth =df The process where the size of the nucleus with respect to its cytoplasm signals the cell to grow or stop growing.

ifomis.de 76 / GO: hexuronate (glucuronate/galacturonate) porter activity =df Catalysis of the reaction: hexuronate(out) + cation(out) = hexuronate(in) + cation(in)

ifomis.de 77 Problems with GO’s compositionality
/ (slash): 286
: (semi-colon): 177
, (comma): 1206
and: 180

ifomis.de 78 comma cytokinesis, site selection

ifomis.de 79 plurals
biological process
physiological processes
cellular process
cell growth and/or maintenance

ifomis.de 80
specification: 39
formation; forming: 142
determination; determinacy: 56
with: 54
from: 141
in: 51
via: 164
complex: 563
regulator; regulatory; regulated; regulation: 1326
acting on: 146
constituting: 35
constituent; constitutive: 29
dependent: 182
sensu: 469
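Counts of this kind can be produced by scanning term names for each operator. The sketch below counts the number of terms that contain an operator at least once, which is an assumption about how such figures are tallied. The operator list is a small illustrative subset and the sample names are taken from these slides.

```python
import re
from collections import Counter

# Illustrative subset of the term-forming operators discussed in the slides.
OPERATORS = ["/", ",", "and", "within", "via", "sensu", "involved in"]

def count_operators(term_names, operators=OPERATORS):
    """Count, per operator, how many term names contain it at least once."""
    counts = Counter()
    for name in term_names:
        for op in operators:
            if op.isalpha() or " " in op:
                # Word operators must match whole words, not substrings.
                pattern = r"\b" + re.escape(op) + r"\b"
            else:
                pattern = re.escape(op)
            if re.search(pattern, name):
                counts[op] += 1
    return counts

SAMPLE_TERMS = [
    "ciliary/flagellar motility",
    "cytokinesis, site selection",
    "cell growth and/or maintenance",
    "puparial glue (sensu Diptera)",
    "hydrolase activity, acting on acid anhydrides, "
    "involved in cellular and subcellular movement",
]
COUNTS = count_operators(SAMPLE_TERMS)
```

Note that "and/or" is counted once under "/" and once under "and", which mirrors the ambiguity the slides complain about.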

ifomis.de 81 Questions regarding operators
How does ‘constituent’ relate to ‘component’?
If A within B, then is A part-of B or included-in-the-interior-of B?
Does via mean by means of or along the path of?
How is ‘un-’ related to ‘not’ (how is ‘unlocalized’ related to ‘not localized’)?

ifomis.de 82 ‘involved in’ term-forming operator (reflection of GO’s limited resources for expressing relations):
hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement
asymmetric protein localization involved in cell fate commitment
cell-cell signaling involved in cell fate commitment
protein secretion involved in cell fate commitment

ifomis.de 83 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement This is a term because GO does not have the resources to express ‘is-involved-in’ as a relation between terms note problems with commas

ifomis.de 84 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement is-a hydrolase activity, acting on acid anhydrides

ifomis.de 85 ‘involved in’ hydrolase activity, acting on acid anhydrides, involved in cellular and subcellular movement is-a hydrolase activity, acting on acid anhydrides is ok: hydrolase activity, acting on anhydrides can but need not be involved in cellular and subcellular movement

ifomis.de 86 ‘involved in’ asymmetric protein localization involved in cell fate commitment is-a cell fate commitment should be a part-of relation (compare: breathing involved in running is a running)

ifomis.de 87 ‘involved in’ cell-cell signaling involved in cell fate commitment is-a cell fate commitment ditto: should be a part-of relation

ifomis.de 88 these, though, are good: asymmetric protein localization involved in cell fate commitment is-a asymmetric protein localization cell-cell signaling involved in cell fate commitment is-a cell-cell signaling
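The rule running through the last few slides is that a term of the form "X involved in Y" may stand in is-a to X, while its link to Y should be part-of rather than is-a. A minimal sketch of that check follows; the function name and the returned labels are illustrative.

```python
def check_involved_in(term, relation, parent):
    """Classify one link from an 'X involved in Y' term using the slides' rule."""
    marker = " involved in "
    if marker not in term:
        return "not an 'involved in' term"
    x, y = term.split(marker, 1)
    x = x.rstrip(", ")  # drop the comma that precedes 'involved in' in some names
    if relation == "is-a" and parent == x:
        return "ok"
    if relation == "is-a" and parent == y:
        return "should be part-of"
    if relation == "part-of" and parent == y:
        return "ok"
    return "unchecked"
```

For example, `check_involved_in("cell-cell signaling involved in cell fate commitment", "is-a", "cell fate commitment")` returns "should be part-of", while the same term linked by is-a to "cell-cell signaling" returns "ok".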

ifomis.de 89 ‘involved in’ protein secretion involved in cell fate commitment synonym of protein secretion are there instances of protein secretion not involved in cell fate commitment? … Problems with GO’s peculiar use of ‘synonym’

ifomis.de 90 Consequences of inconsistent and/or indeterminate use of operators
29.42% of the distinct terms within GO contain one or more polysemous operators, but these terms receive only 13.96% of the annotations present within GO
Hypothesis: This lower percentage of annotations reflects the fact that poorly defined operators are not well understood by annotators, who thus avoid the corresponding terms

ifomis.de 91 Principle of Compositionality
The meanings of compound terms should be determined
1. by the meanings of constituent terms, together with
2. the rules governing the syntactic operators

ifomis.de 92 Principle of Objectivity which classes exist is not a function of our biological knowledge. (Terms such as ‘unclassified’ or ‘unknown ligand’ or ‘not otherwise classified as peptides’ do not designate biological natural kinds.)

ifomis.de 93 GO: cellular component unknown
cellular component unknown is-a cellular component
unlocalized is-a cellular component
Holliday junction helicase complex is-a unlocalized

ifomis.de 94 GO’s excuse
‘unlocalized’ is used as a placeholder only
but automatic information retrieval systems cannot distinguish it from other, genuine class names
formal tools exist which can deal with the addition of knowledge into a classification system without the need to create fake classes (Theory of Granular Partitions)

ifomis.de 95 Principle of Positivity Class names should be positive. Logical complements of classes are not themselves classes. (Terms such as ‘non-mammal’ or ‘non-membrane’ or ‘invertebrate’ do not designate natural kinds.)

ifomis.de 96 Terms such as ‘Veterinary proprietary drug AND/OR biological’ * do not designate natural kinds. (Which biological classes exist is not a matter of logic.) *has 2532 children in SNOMED-CT
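The principles of objectivity and positivity lend themselves to a simple name-based lint. The sketch below is a heuristic under stated assumptions: the pattern lists are illustrative and far from complete, and a purely lexical check cannot catch a negative class with a positive-looking name such as ‘invertebrate’.

```python
import re

# Assumed, non-exhaustive patterns; each flags a different kind of suspect name.
EPISTEMIC = re.compile(r"\b(unknown|unclassified|unlocalized|not otherwise classified)\b")
NEGATIVE = re.compile(r"\bnon-\s?\w+")
LOGICAL = re.compile(r"\band/or\b")

def lint_class_name(name):
    """Return the list of principles a class name appears to violate."""
    lowered = name.lower()
    problems = []
    if EPISTEMIC.search(lowered):
        problems.append("objectivity")          # reflects our knowledge, not biology
    if NEGATIVE.search(lowered):
        problems.append("positivity")           # logical complement of a class
    if LOGICAL.search(lowered):
        problems.append("logical combination")  # class built by AND/OR
    return problems
```

Applied to "cellular component unknown" this returns `["objectivity"]`, and applied to "vacuole" it returns an empty list.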

ifomis.de 97 Principle of Explicitness if a link between two classes holds only under certain specific restrictions, then this restriction should be made explicit in the statement of the corresponding link-axiom cf. GO’s sensu

ifomis.de 98 GO can in practice be used only by trained biologists (with know-how)
whether a GO-term truly stands in the is_a relation depends e.g. on the type of organism involved
glycosome is part-of cytoplasm only for Kinetoplastidae
Computers have no counterpart of such context-dependent know-how

ifomis.de 99 Principle of Single Inheritance no class in a classificatory hierarchy should have more than one parent on the immediate higher level

ifomis.de 100 Principle of Taxonomic Levels the terms in a classificatory hierarchy should be divided into predetermined levels (analogous to the levels of kingdom, phylum, class, order, etc., in traditional biology). depth in GO’s hierarchies not determinate because of multiple inheritance

ifomis.de 101 Principle of Partonomic Levels Terms in a partonomic hierarchy should be divided into predetermined granularity levels (for example: organism, organ, cell, molecule, etc.). (GO is about to break physiological process into 'cell physiological process' and 'organism physiological process'.) = take granularity seriously

ifomis.de 102 Principle of Exhaustiveness the classes on any given level should exhaust the domain of the classificatory hierarchy.

ifomis.de 103 Single Inheritance + Exhaustiveness = JEPD, for: Jointly Exhaustive and Pairwise Disjoint
Exhaustiveness is often difficult to satisfy in the realm of biological phenomena; but its acceptance as an ideal is presupposed as a goal by every scientist.
Single inheritance is accepted in all traditional (species-genus) classifications, now under threat because multiple inheritance is a computationally useful device (it allows one to avoid certain kinds of combinatory explosion).
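Both halves of JEPD can be checked mechanically when class extensions are available as sets. The following sketch assumes exactly that; the function names and the toy classes in the usage note are hypothetical.

```python
def multiple_parents(is_a_edges):
    """Return {child: sorted parents} for classes with more than one is-a parent."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

def is_jepd(domain, classes):
    """True if the sets in `classes` are pairwise disjoint and jointly cover `domain`."""
    seen = set()
    for members in classes.values():
        if seen & members:      # overlap: not pairwise disjoint
            return False
        seen |= members
    return seen == set(domain)  # nothing left over: jointly exhaustive
```

For instance, `is_jepd({"a", "b", "c"}, {"X": {"a"}, "Y": {"b", "c"}})` holds, whereas letting X and Y share "b" breaks disjointness and dropping "c" from Y breaks exhaustiveness.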

ifomis.de 104 Problems with multiple inheritance [Diagram: classes A, B and C joined by two links labeled ‘is-a 1’ and ‘is-a 2’] ‘is-a’ no longer univocal

ifomis.de 105 GO’s ‘is-a’ is pressed into service to mean a variety of different things
the resulting ambiguities make the rules for correct coding difficult to communicate to human curators in terms of generally intelligible principles
they also serve as obstacles to integration with neighboring ontologies

ifomis.de 106 Problems with multiple inheritance [Diagram: classes A, B, C, D and E, with links labeled ‘is-a 1’ and ‘is-a 2’] ‘sibling’ is no longer determinate. Principle of levels is violated
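Why levels stop being determinate can be seen by computing every path length from a class up to the root. The hierarchy below is a hypothetical arrangement that reuses the letters A to E; it is not a reconstruction of the slide's diagram. The function assumes an acyclic hierarchy.

```python
def depths(term, parents, root):
    """Return the set of path lengths from `term` up to `root` (acyclic input assumed)."""
    if term == root:
        return {0}
    result = set()
    for parent in parents.get(term, []):
        result |= {d + 1 for d in depths(parent, parents, root)}
    return result

# Hypothetical hierarchy: A has two parents, B and C, which sit at different levels.
PARENTS = {
    "A": ["B", "C"],
    "B": ["E"],
    "C": ["D"],
    "D": ["E"],
}
A_DEPTHS = depths("A", PARENTS, "E")
```

Here A sits at depth 2 along one path and depth 3 along the other, so no single taxonomic level can be assigned to it.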

ifomis.de 107

ifomis.de 108

ifomis.de 109 A storage vacuole is not a special kind of vacuole a box used for storage is not a special kind of box

ifomis.de 110

ifomis.de 111 Another term-forming operator lytic vacuole within a protein storage vacuole lytic vacuole within a protein storage vacuole is-a protein storage vacuole time-out within a baseball game is-a baseball game embryo within a uterus is-a uterus

ifomis.de 112 Problems with Location is-located-at / is-located-in and similar relations need to be expressed in GO via some combination of ‘is-a’ and ‘part-of’ … is-a unlocalized is-a site of … within … … in …

ifomis.de 113 Problems with location extrinsic to membrane part-of membrane extrinsic to plasma membrane part-of plasma membrane extrinsic to vacuolar membrane part-of vacuolar membrane

ifomis.de 114 Differentiation and Development development cellular process cell differentiation

ifomis.de 115 Cell differentiation is-a development But according to GO’s own definitions the agent or subject of differentiation is the cell, while the agent or subject of development is the whole organism (again: GO has problems in keeping track of entities on different levels of granularity)

ifomis.de 116 cell differentiation is-a development but: hemocyte differentiation hemocyte development part-of

ifomis.de 117 GO: : garland cell differentiation Definition: Development of garland cells, a small group of nephrocytes which take up waste materials from the hemolymph by endocytosis. (Illustrates GO’s problems with definitions)

ifomis.de 118

ifomis.de 119 Part Three How to do things right so far only scratched the surface: sensu synonyms GO’s definitions GO’s ‘logical relationships’

ifomis.de 120 Principles for GO terms Temporal coherence Dependence Univocity Compositionality Objectivity Positivity Explicitness Taxonomic Levels Partonomic Levels Single Inheritance Exhaustiveness

ifomis.de 121 Should these principles be satisfied? Michael Ashburner: GO’s philosophy from the beginning was ‘just in time’ - that is, we made no great attempt to ‘complete’ the ontologies …. If you try and ‘complete’ an ontology, or worse: try and ‘get it right,’ then you will fail …

ifomis.de 122 Can these principles be satisfied? Compare GO with Foundational Model of Anatomy (FMA)

ifomis.de 123
Principle            GO   FMA
Temporal coherence   No   N/A
Dependence           No   N/A
Univocity            No   Yes
Compositionality     No   Yes
Objectivity          No   Yes
Positivity           No   Yes
Explicitness         No   N/A
Taxonomic Levels     No   Yes
Partonomic Levels    No   Yes
Single Inheritance   No   Yes
Exhaustiveness       No

ifomis.de 124 The End

ifomis.de 125 Is GO an ontology? GO is a controlled vocabulary = (ramshackle) syntactic regimentation but because is-a and part-of are not given uniform readings, this does NOT mean the sort of semantic regimentation which would amount to an ontology in the proper sense of the word

ifomis.de 126 rules for definitions intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined definitions: do not confuse definitions with the communication of new knowledge

ifomis.de 127 Principle of Substitutability in all so-called extensional contexts a defined term should be substitutable by its definition in such a way that the result is both grammatically correct and has the same truth-value as the sentence with which we begin GO: : toxin activity Definition: Acts as to cause injury to other living organisms.

ifomis.de 128 substitutability There is toxin activity here There is acts as to cause injury to other living organisms here

ifomis.de 129 Defining is-a A is-a B = every instance of A is an instance of B A is-a B = A and B are natural kinds and every instance of A is an instance of B A is-a B = A and B are natural kinds and every instance of A is as a matter of necessity an instance of B
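The three candidate readings of is-a can be written out in first-order (and, for the third, modal) notation. This is one possible formalization, not the slides' own. The symbols are assumptions introduced here: $\mathrm{inst}(x, A)$ for "x is an instance of A", $\mathrm{NK}(A)$ for "A is a natural kind", and $\Box$ for necessity.

```latex
\begin{align*}
\text{(1)}\quad A \;\text{is-a}\; B &\;:=\; \forall x\,\bigl(\mathrm{inst}(x, A) \rightarrow \mathrm{inst}(x, B)\bigr) \\
\text{(2)}\quad A \;\text{is-a}\; B &\;:=\; \mathrm{NK}(A) \wedge \mathrm{NK}(B) \wedge \forall x\,\bigl(\mathrm{inst}(x, A) \rightarrow \mathrm{inst}(x, B)\bigr) \\
\text{(3)}\quad A \;\text{is-a}\; B &\;:=\; \mathrm{NK}(A) \wedge \mathrm{NK}(B) \wedge \forall x\,\Box\bigl(\mathrm{inst}(x, A) \rightarrow \mathrm{inst}(x, B)\bigr)
\end{align*}
```

Each reading is strictly stronger than the one before it: (2) excludes accidental co-extension between arbitrary collections, and (3) additionally excludes inclusions that merely happen to hold.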

ifomis.de 130 Solutions to these problems ‘part_of’ should mean: part_of