The Ontology of the Gene Ontology. Barry Smith, Jennifer Williams, Steffen Schulze-Kremer
2 The Prime Directive: As the right of each sentient species to live in accordance with its normal cultural evolution is considered sacred, no Star Fleet personnel may interfere with the healthy development of alien life and culture. Such interference includes the introduction of superior knowledge, strength, or technology to a world whose society is incapable of handling such advantages wisely.
ifomis.de 3 The Bioinformatics Prime Directive: No computer scientist may interfere with the information resources provided by biologists.
4 The Story of GONG: Computer scientists develop browsers, query interfaces, and tools for statistical analysis or for cross-ontology mapping, all of which take the biological information as something inviolable.
5 IFOMIS: Renegade StarTroop. Institute for Formal Ontology and Medical Information Science, Faculty of Medicine, University of Leipzig. ifomis.de
6 The Gene Statistic The Gene Ontology
7 GO: the Gene Ontology. 3 large telephone directories of standardized designations for gene functions and products; designed to cover the whole of biology; model for fungal ontology, plant ontology, Drosophila ontology, etc.
8 Primary aim of GO: not rigorous definition and principled classification, but rather providing a practically useful framework for keeping track of the biological annotations that are applied to gene products. Thesis: GO can realize its goal more adequately (and avoid many coding errors) by taking ontology (especially the logic of classifications and definitions) seriously.
9 GO: the Gene Ontology. GO is divided into 3 separate hierarchies, each organized via is_a and part_of.
10 Problems with is_a: A is_a B = every instance of A is an instance of B.
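The instance-level reading of is_a given on this slide can be sketched as a subset test. This is only a minimal illustration: the class names, the instance identifiers, and the helper `is_a` are hypothetical and are not taken from GO.

```python
# Hypothetical toy data: each class name maps to the set of its instances.
instances = {
    "vacuole": {"v1", "v2", "v3"},
    "storage vacuole": {"v1", "v2"},
    "helicase complex": {"h1"},
}


def is_a(a: str, b: str) -> bool:
    """True when every instance of class a is also an instance of class b."""
    return instances[a] <= instances[b]


print(is_a("storage vacuole", "vacuole"))   # subset relation holds
print(is_a("helicase complex", "vacuole"))  # no shared instances
```

On this reading, an is_a link is only meaningful when both sides are classes with instances, which is the standard the examples on the next slide are measured against.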
11 Problems with is_a: Holliday junction helicase complex is_a unlocalized protein storage vacuole is_a vacuole (sensu Streptophyta)
12 Problems with part_of: ‘part_of’ = ‘can be part of’ (flagellum part_of cell); ‘part_of’ = ‘is sometimes part of’ (replication fork part_of the nucleoplasm); ‘part_of’ = ‘is included as a sublist in’.
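One way to see why these readings differ is to state two of them over instances. The sketch below is an assumption-laden illustration, not GO's definition of part_of: the instance names, the `part_pairs` facts, and both helper functions are hypothetical.

```python
# Hypothetical toy data; the instance names are made up for illustration.
instances = {
    "replication fork": {"rf1", "rf2"},
    "nucleoplasm": {"n1"},
}
# Instance-level parthood facts as (part, whole) pairs.
part_pairs = {("rf1", "n1")}


def all_some_part_of(a: str, b: str) -> bool:
    """Strict reading: every instance of a is part of some instance of b."""
    return all(
        any((x, y) in part_pairs for y in instances[b]) for x in instances[a]
    )


def some_some_part_of(a: str, b: str) -> bool:
    """Weak reading: at least one instance of a is part of some instance of b."""
    return any((x, y) in part_pairs for x in instances[a] for y in instances[b])


print(all_some_part_of("replication fork", "nucleoplasm"))
print(some_some_part_of("replication fork", "nucleoplasm"))
```

The same class-level statement passes under the weak reading and fails under the strict one, so software cannot reason over part_of links unless one reading is fixed.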
13 GO is divided into three disjoint term hierarchies: cellular component ontology (flagellum, chromosome, cell); molecular function ontology (ice nucleation, binding, protein stabilization); biological process ontology (glycolysis, death).
14 Three separate hierarchies = no is_a and no part_of relations defined between them. PUZZLE: How are the classes in the three separate hierarchies (cellular component ontology, molecular function ontology, biological process ontology) linked together?
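The puzzle can be made concrete with a small reachability check. The sketch below assumes a toy edge list: the term names come from the slides, but the specific edges and the root names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical edges (child, relation, parent); root names are assumptions.
edges = [
    ("flagellum", "part_of", "cell"),
    ("cell", "is_a", "cellular component"),
    ("chromosome", "is_a", "cellular component"),
    ("binding", "is_a", "molecular function"),
    ("glycolysis", "is_a", "biological process"),
]

parents = defaultdict(set)
for child, _relation, parent in edges:
    parents[child].add(parent)


def ancestors(term: str) -> set:
    """Return term plus everything reachable from it via is_a or part_of."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def linked(a: str, b: str) -> bool:
    """True when a and b share at least one ancestor (or are the same term)."""
    return bool(ancestors(a) & ancestors(b))


print(linked("flagellum", "chromosome"))  # same hierarchy
print(linked("binding", "glycolysis"))    # different hierarchies
```

Because no edge crosses a hierarchy boundary, no traversal of is_a and part_of can relate a function term to a process term.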
15 Component: Component is easy to understand. A component is a 3-dimensional entity which endures through time.
16 Process: Process is easy to understand. A process is an occurrent entity = an entity which unfolds itself through time in successive temporal parts.
17 What is a function?
18 Definition of «Function». UMLS Semantic Network: Functional Concept =df a concept which is of interest because it pertains to the carrying out of a process or activity. GO: Molecular Function =df the action characteristic of a gene product.
19 How are the 3 ontologies related? Function = “the action characteristic of a gene product”. Process = “phenomenon marked by changes that lead to a particular result, mediated by one or more gene products”. NO PART-WHOLE RELATIONS BETWEEN FUNCTION AND PROCESS ONTOLOGIES.
20 The True Story about Process and Function: A process is an occurrent entity. A component is a continuant entity.
21 The True Story about Function and Process: A process is an occurrent entity. A component is an independent continuant entity. There are also dependent continuant entities: qualities, roles, dispositions, powers … and functions.
22 The function of your heart is to pump blood. This function endures through time and gets exercised. This function exists even when it is not being exercised. The exercise of a function is a process.
23 Functions exist even when they are not being expressed. Functions exist even when there is no functioning.
24 Constituent-Process-Function: Processes depend on constituents. Processes realize functions. Constituents have functions.
25 Dependent continuants are realized through occurrent processes: the exercise of a function; the performance of a role; the execution of a plan; the application of a therapy; the realization of a disposition; the course of a disease.
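The distinction drawn over the last few slides can be sketched as a small data model in which a function is a lasting entity attached to a bearer and each exercise of it is a separate, time-bounded process. The class names, fields, and the heartbeat interval below are hypothetical choices for illustration, not a proposed schema for GO.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """An occurrent: it unfolds over the half-open interval [start, end)."""
    name: str
    start: float
    end: float


@dataclass
class Function:
    """A dependent continuant: it belongs to a bearer and may go unexercised."""
    name: str
    bearer: str
    realizations: List[Process] = field(default_factory=list)

    def exercised_at(self, t: float) -> bool:
        """True when some realizing process is under way at time t."""
        return any(p.start <= t < p.end for p in self.realizations)


# The heart example from the slides; the time values are made up.
pumping = Function("to pump blood", bearer="heart")
pumping.realizations.append(Process("heartbeat", start=0.0, end=1.0))

print(pumping.exercised_at(0.5))  # a realization is under way
print(pumping.exercised_at(2.0))  # the function exists but is not exercised
```

The `Function` object persists whether or not any `Process` is currently realizing it, which is exactly what is lost if function and activity are treated as the same thing.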
26 GO: “A biological process is accomplished via one or more ordered assemblies of molecular functions.”
27 But no: “GO molecular functions are occurrent rather than continuant. The terminology we've used to date is, I agree, confusing but the activities described in the molecular function ontology are events -- they represent the function as it is exercised rather than the potential to exercise that function.”
28 “The definitions you cite are certainly inconsistent with this at the moment, but this is a temporary situation. … true path violations … do crop up fairly regularly, but are always fixed.”
29 Confusion of Function and Activity: If function = activity (= functioning), how can GO deal with dormant/suppressed functions? How can GO deal with the relation of expression, which involves a function and its exercise?
30 A step towards clarity: In March 2003 (nearly) all nodes in the Molecular Function ontology (except the root) had ‘activity’ added to their names. Function = activity. How does ‘process’ relate to ‘activity’?
31 GO’s answer: “A biological process is accomplished via one or more ordered assemblies of molecular functions.” BUT: there are no part-whole relations across ontologies. Result: constant coding errors resulting from the lack of clear principles about what the basic notions of ‘function’ and ‘process’ mean.
32 Examples of GO Molecular Functions: anti-coagulant activity (defined as: “a substance that retards or prevents coagulation”); enzyme activity (defined as: “a substance that catalyzes”); structural molecule (defined as: “the action of a molecule that contributes to structural integrity”).
33 GO: structural constituent of cell wall. Definition: The action of a molecule that contributes to the structural integrity of a cell wall. This confuses constituents with actions, which GO includes in its function ontology.
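Mismatches of this kind, where a term's name points to one category and its definition to another, are regular enough that a crude lint rule can flag them. The sketch below uses the names and definitions quoted on these slides; the keyword heuristics and all function names are assumptions, and a real check would need far more than string matching.

```python
def name_kind(name: str):
    """Guess the category a term name points to (heuristic, hypothetical)."""
    lowered = name.lower()
    if lowered.endswith("activity"):
        return "activity"
    if "constituent" in lowered or "molecule" in lowered:
        return "entity"
    return None


def definition_kind(definition: str):
    """Guess the category a definition points to from its opening words."""
    lowered = definition.lower()
    if lowered.startswith("the action"):
        return "activity"
    if lowered.startswith("a substance"):
        return "entity"
    return None


def mismatched(name: str, definition: str) -> bool:
    """True when both guesses succeed and they disagree."""
    kinds = (name_kind(name), definition_kind(definition))
    return None not in kinds and kinds[0] != kinds[1]


# Names and definitions as quoted on the slides.
terms = {
    "anti-coagulant activity": "a substance that retards or prevents coagulation",
    "enzyme activity": "a substance that catalyzes",
    "structural molecule": "the action of a molecule that contributes to structural integrity",
    "structural constituent of cell wall": "The action of a molecule that contributes to the structural integrity of a cell wall.",
}

flagged = [name for name, text in terms.items() if mismatched(name, text)]
print(flagged)
```

Every quoted example is flagged: the first two name an activity but define a substance, and the last two name a constituent but define an action.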
34 extracellular matrix structural constituent +; puparial glue (sensu Diptera); structural constituent of bone; structural constituent of chorion (sensu Insecta); structural constituent of chromatin; structural constituent of cuticle +; structural constituent of cytoskeleton; structural constituent of epidermis +; structural constituent of eye lens; structural constituent of muscle; structural constituent of myelin sheath; structural constituent of nuclear pore; structural constituent of peritrophic membrane (sensu Insecta); structural constituent of ribosome; structural constituent of tooth enamel; structural constituent of vitelline membrane (sensu Insecta)
35 Problems caused by the lack of intuitive formal understandings of GO's basic ontological terms: The need for expert knowledge places severe obstacles in the way of using GO as a basis for computer applications, because computers do not have access to expert biological knowledge.
36 As GO increases in size and scope it will “be increasingly difficult to maintain the semantic consistency we desire without software tools that perform consistency checks and controlled updates”. The addition of each new term will require the curator to understand the entire structure of GO in order to avoid redundancy and to ensure that all appropriate linkages are made with other terms.
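As one example of the kind of consistency check such tools might run, the sketch below looks for a cycle among is_a links, which should never occur if is_a means that every instance of the child is an instance of the parent and the classes are distinct. The function `find_cycle` and the sample hierarchies are hypothetical; this is not a description of any existing GO tool.

```python
from typing import Dict, List, Optional


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of child -> parent links as a list, or None."""
    GREY, BLACK = 1, 2  # GREY = on the current path, BLACK = fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            state = color.get(parent)
            if state == GREY:
                # The parent is already on the current path: cycle found.
                return path[path.index(parent):] + [parent]
            if state is None:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if node not in color:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical hierarchies: child term -> list of is_a parents.
acyclic = {"storage vacuole": ["vacuole"], "vacuole": ["cellular component"]}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

A check like this only becomes possible once the relations have a fixed formal meaning, which is the point of the thesis stated earlier in the talk.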
37 Benefits of the GO Approach: 1) Work on populating GO could start immediately, without its authors needing to solve some of the intricate problems which face ontologies when formalized as logical theories. 2) Populating GO does not require the completion of complex protocols of formally determined steps but can be done intuitively by the expert biologist. 3) There are few formal constraints standing in the way of easy incorporation of existing controlled vocabularies from the biological domain.
38 Drawbacks: 1) It is unclear what kinds of reasoning are permissible on the basis of GO’s hierarchies. 2) The rationale of GO’s subclassifications is unclear. 3) No procedures are offered by which GO can be validated. 4) There are insufficient rules for determining how to recognize whether a given concept is or is not present in GO.
39 GO DOES NOT COMPUTE. Solution: Rebuild from scratch before it is too late. MANGO