1
Ontologies in Bioinformatics
Robert Stevens
Department of Computer Science, University of Manchester
2
Introduction
- What is knowledge?
- What is an ontology?
- Relationships between the two communities
- The last decade of bio-ontologies
- The future
3
What is Knowledge?
- Data: uninterpreted signals that reach our senses, e.g. "Michael Ashburner", "Professor", "University of Cambridge", "UK"
- Information: data equipped with meaning, e.g. Name: Michael Ashburner; Job: Professor; Institution: University of Cambridge; Country: UK
- Knowledge: all the information, plus the understanding needed to carry out tasks and to infer new information, e.g. a man; a senior academic; at an ancient, 5-rated university; European; an important figure in biology
"Data are the uninterpreted signals that reach our senses every minute in time by the zillions…Information is data equipped with meaning…Knowledge is the whole body of data and information that people bring to bear to practical use in action, in order to carry out tasks and to create new information." (Schreiber et al. 1998)
4
Things, Symbols & Concepts
Humans require words (or at least symbols) to communicate efficiently. Words can be mapped to things only indirectly: we create symbols that stand for things. The relation between symbols and things has been described in the form of the meaning triangle [Ogden, Richards, 1923]:
- The symbol ("Jaguar") evokes a concept
- The concept refers to a thing (a car or a beast)
- The symbol stands for the thing only through the concept
5
Representing Knowledge
- Language uses symbols and rules (natural language) to communicate knowledge
- Human intelligence is needed to deal with pragmatics; NLP is notoriously difficult
- We need to capture knowledge in a computationally amenable manner
- Ontology: a conceptual model
- Ontology plus lexicon is a terminology
- Primary aims: creating a shared understanding of a domain and of the relationships within that domain; common symbols for the things within a domain; capturing domain knowledge with fidelity and precision
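To make "ontology plus lexicon is a terminology" and the Jaguar triangle concrete, here is a minimal sketch, assuming a toy ontology in which each concept has one is-a parent and a separate lexicon maps terms (symbols) to the concepts they may evoke. All concept IDs and helper names (`add_term`, `senses`, `is_a`, `disambiguate`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mini-ontology: each concept ID maps to its is-a parent (None = root).
# These IDs are illustrative only; they do not come from any published ontology.
ONTOLOGY = {
    "Thing": None,
    "Animal": "Thing",
    "BigCat": "Animal",
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "JaguarCat": "BigCat",
    "JaguarCar": "Car",
}

# Hypothetical lexicon: a lower-cased term maps to every concept it can evoke.
LEXICON = defaultdict(set)


def add_term(term: str, concept: str) -> None:
    """Attach a label to an existing concept; labels are case-insensitive here."""
    if concept not in ONTOLOGY:
        raise KeyError(f"unknown concept: {concept}")
    LEXICON[term.lower()].add(concept)


def senses(term: str) -> list[str]:
    """Every concept a term may stand for, sorted for stable output."""
    return sorted(LEXICON.get(term.lower(), set()))


def is_a(concept: str, ancestor: str) -> bool:
    """Walk up the is-a chain to test whether `ancestor` subsumes `concept`."""
    node = concept
    while node is not None:
        if node == ancestor:
            return True
        node = ONTOLOGY[node]
    return False


def disambiguate(term: str, context: str) -> list[str]:
    """Keep only the senses of `term` that fall under the `context` concept."""
    return [c for c in senses(term) if is_a(c, context)]


add_term("Jaguar", "JaguarCat")
add_term("Jaguar", "JaguarCar")
add_term("Panthera onca", "JaguarCat")

print(senses("Jaguar"))                  # ['JaguarCar', 'JaguarCat']
print(disambiguate("jaguar", "Animal"))  # ['JaguarCat']
```

Keeping the lexicon apart from the concept hierarchy lets one symbol point at several concepts (and several symbols at one concept), while the is-a structure supplies the context needed to choose between them.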
6
Sharing Information, Sharing Meaning
The WWW has made data available:
- Ready publication
- An infrastructure for retrieving and representing documents
- An infrastructure for accessing data
The next step is semantic interoperation:
- Understanding what the data means
- Linking in insightful ways
- Automated support for integration
To share and integrate information you must describe it:
- Metadata: data describing the content and meaning of resources and services. But everyone must speak the same language…
- Terminologies: shared and common vocabularies for search engines, agents, curators, authors and users. But everyone must mean the same thing…
- Ontologies: a shared and common understanding of a domain, essential for search, exchange and discovery. Ontologies are often used as controlled vocabularies.
Ontology, from knowledge representation and philosophy:
- A rigorous and explicit conceptualisation of knowledge
- Linked to words to render the concepts
- A shared controlled vocabulary
Used for:
- Complex and expressive conceptual descriptions of data
- Subject classifications
- Reasoning and inferring new knowledge
- Sharing and exchanging knowledge
Disambiguating content: much biological data is self-described, marked-up text (pre-dating XML), so using ontologies to disambiguate database entries and annotation is accepted as standard practice.
7
What is an Ontology?
- Concepts: units of thought (classes and individuals), e.g. Protein, Gene, DNA, Hexokinase, glycolysis, …
- Terms: labels for concepts, e.g. "Protein", "Gene", …
- Relationships: semantic links between concepts, e.g. is-a-kind-of, is-a, part-of, name-of, …
- A taxonomy (the is-a hierarchy) is the backbone of an ontology
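The three ingredients above can be sketched as typed triples, assuming both is-a and part-of are transitive (the usual reading in bio-ontologies). The triples below reuse concept names from the slide but are illustrative examples, not assertions copied from a curated ontology; `direct_links` and `closure` are hypothetical helper names. Following only is-a links recovers the taxonomy backbone, and following part-of links gives a separate partonomy.

```python
from collections import defaultdict

# Illustrative (subject, relation, object) triples; not from a curated ontology.
TRIPLES = [
    ("Hexokinase", "is-a", "Enzyme"),
    ("Enzyme", "is-a", "Protein"),
    ("glycolysis", "is-a", "carbohydrate metabolism"),
    ("carbohydrate metabolism", "is-a", "metabolism"),
    ("Gene", "part-of", "DNA"),
]


def direct_links(relation: str) -> dict[str, set[str]]:
    """Index the triples of one relationship type as subject -> objects."""
    links = defaultdict(set)
    for subject, rel, obj in TRIPLES:
        if rel == relation:
            links[subject].add(obj)
    return links


def closure(concept: str, relation: str) -> set[str]:
    """Everything reachable from `concept` by following `relation` transitively."""
    links = direct_links(relation)
    seen: set[str] = set()
    stack = [concept]
    while stack:
        for target in links.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(sorted(closure("Hexokinase", "is-a")))  # ['Enzyme', 'Protein']
print(closure("Gene", "part-of"))             # {'DNA'}
```

Keeping the relationship type on every link matters: "Gene part-of DNA" must not be read as "Gene is-a DNA", so each closure follows only one relation.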
8
So What Counts as an Ontology? [Deborah McGuinness, Stanford]
The ontology spectrum, from informal to formal:
- Catalog/ID
- Terms/glossary
- Thesauri
- Informal is-a
- Formal is-a
- Formal instance
- Frames (properties)
- Value restrictions
- General logical constraints
- Disjointness, inverse, part-of
Bio-ontologies placed on the spectrum: Arom, Gene Ontology, TAMBIS, EcoCyc, Mouse Anatomy, PharmGKB
9
The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful. Gottfried Wilhelm Leibniz, New Essays on Human Understanding
10
The Gene Ontology
11
Bio-Ontologies in the Past Decade
- The explicit use of ontologies is fairly recent
- EcoCyc and RiboWeb used frame-based systems to create knowledge bases
- Biology is an area in which the CS community can test its technology: large, complex and dynamic, "a knowledge-based discipline"
- The post-genomic era increases the need for shared understanding: cross-genome comparisons need structured, controlled vocabularies
- Ontologies have moved from a small niche to a much bigger one
- Biologists are building ontologies
12
Uses of Bio-Ontologies
- Controlled vocabularies for annotation
- Describing schemas and their content
- Domain maps
- Query mechanisms
- Resolution of semantic heterogeneity
- Text analysis…
13
The Gene Ontology
- Tutorial and the first Bio-Ontologies meeting at ISMB 1998 in Montreal
- Fly, mouse and yeast got together to develop GO
- First release: some 3,500 terms covering Molecular Function, Biological Process and Cellular Component
- Now some 15,000 terms and growing
- The Gene Ontology Consortium covers some 15 organism databases, plus SWISS-PROT and others
- Synonyms, abbreviations and associations to gene products give access to names, genes, etc.
- A common understanding across a community
14
GO DAG for heparin biosynthesis
- GO::Gene_Ontology (46199)
  - GO::biological_process (30188)
    - GO::cell growth and/or maintenance (20547)
      - GO::metabolism (14693)
        - GO::carbohydrate metabolism (267)
          - GO::aminoglycan metabolism (18)
            - GO::glycosaminoglycan metabolism
              - GO::heparin metabolism (3)
                - GO::heparin biosynthesis (3)

The original DAG for heparin biosynthesis as of July 2002. It shows that heparin biosynthesis has only one parent and that there is only one path to the root concept. All relationships are is-a except the one between Gene_Ontology and biological_process. This is actually a mixed classification, organised both by the chemical being metabolised and by the nature of the process, so the parent term glycosaminoglycan metabolism is more abstract in both process and chemical class. However, there is no link to the biosynthesis tree within the ontology.
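The single-path observation can be checked mechanically. This sketch, assuming the parent links are transcribed from the July 2002 path above, enumerates every path from a term to the root; `PARENTS`, `paths_to_root` and the extra `"biosynthesis"` node are hypothetical names for illustration. Adding that one hypothetical edge into a biosynthesis branch (the link the slide notes is missing) shows how a term in a DAG gains a second route to the root.

```python
# Parent links transcribed from the July 2002 path shown above.
# (The slide notes every link is is-a except Gene_Ontology -> biological_process.)
PARENTS = {
    "heparin biosynthesis": ["heparin metabolism"],
    "heparin metabolism": ["glycosaminoglycan metabolism"],
    "glycosaminoglycan metabolism": ["aminoglycan metabolism"],
    "aminoglycan metabolism": ["carbohydrate metabolism"],
    "carbohydrate metabolism": ["metabolism"],
    "metabolism": ["cell growth and/or maintenance"],
    "cell growth and/or maintenance": ["biological_process"],
    "biological_process": ["Gene_Ontology"],
    "Gene_Ontology": [],
}


def paths_to_root(term: str, parents: dict[str, list[str]]) -> list[list[str]]:
    """Enumerate every path from `term` up to a root (a term with no parents)."""
    if not parents.get(term):
        return [[term]]
    return [[term] + path
            for parent in parents[term]
            for path in paths_to_root(parent, parents)]


# Hypothetical extra edge into a "biosynthesis" branch; NOT part of the 2002 release.
linked = dict(PARENTS)
linked["heparin biosynthesis"] = ["heparin metabolism", "biosynthesis"]
linked["biosynthesis"] = ["metabolism"]

print(len(paths_to_root("heparin biosynthesis", PARENTS)))  # 1
print(len(paths_to_root("heparin biosynthesis", linked)))   # 2
```

Enumerating paths is exponential in the worst case for a heavily multi-parented DAG; for real GO-sized graphs an ancestor-set traversal is usually preferable, but path listing makes the "only one route" point visible.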
15
Open Bio-Ontologies (OBO)
- GO, though large, is narrow
- Sequence Ontology; Chemical Ontology
- Promotes a common ontology format, tools and house style
- The micro-array community gives a further boost, avoiding the mistakes of previous bioinformatics resources
- Ontologies are needed for phenotypes, tissues, anatomies, etc.
16
Two Communities
- Computer scientists: building ontologies, knowledge representation (KR), reasoning
- Biologists: ontology content, domain knowledge
- Together: better ontologies
17
What Are We Saying?
Man is-a Person; Woman is-a Person
- Are all instances of Man instances of Person?
- Can an instance of Person be both a Man and a Woman?
- Can there be any more kinds of Person?
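A bare is-a diagram answers only the first question; the other two need extra axioms, namely disjointness (Man and Woman share no instances) and covering (every Person is a Man or a Woman). The sketch below, using a hypothetical `Taxonomy` class, makes each reading explicit. It is a closed-world validation check, which is an assumption: under OWL's open-world semantics an individual asserted only as Person is not an error, and a reasoner would instead infer that it is a Man or a Woman.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    subclass_of: dict[str, str]                                   # child -> parent (is-a)
    disjoint: list[tuple[str, str]] = field(default_factory=list)
    covering: dict[str, set[str]] = field(default_factory=dict)   # parent -> exhaustive children

    def inferred_types(self, asserted: set[str]) -> set[str]:
        """Question 1: an instance of a class is an instance of all its superclasses."""
        types = set(asserted)
        for cls in asserted:
            while cls in self.subclass_of:
                cls = self.subclass_of[cls]
                types.add(cls)
        return types

    def violations(self, name: str, asserted: set[str]) -> list[str]:
        """Questions 2 and 3: check disjointness and (closed-world) covering."""
        types = self.inferred_types(asserted)
        problems = []
        for a, b in self.disjoint:
            if a in types and b in types:
                problems.append(f"{name} is both {a} and {b}")
        for parent, children in self.covering.items():
            if parent in types and not types & children:
                problems.append(f"{name} is a {parent} but none of {sorted(children)}")
        return problems


people = Taxonomy(
    subclass_of={"Man": "Person", "Woman": "Person"},
    disjoint=[("Man", "Woman")],
    covering={"Person": {"Man", "Woman"}},
)
print(people.violations("sam", {"Man", "Woman"}))  # ['sam is both Man and Woman']
```

Dropping `disjoint` or `covering` from `people` changes the answers to questions 2 and 3, which is the point of the slide: the same diagram can say quite different things depending on which axioms are intended.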
18
This Year’s Meeting
A theme of text analysis and ontology; for the first time, the talks have matched the theme:
- Ontologies and indexing
- Integrating ontologies into NLP systems
- Ontologies in information retrieval
- Developing terminologies
- GO in NLP
- New ontologies
- Semantic similarity
19
Opportunities
- Ontologies to help text analysis
- Text analysis to help build ontologies
- The biology community is steadily building a large number of large domain ontologies
- The CS community can help build computationally amenable ontologies
- Vast quantities of domain knowledge exist in natural-language form in the literature and databanks
- Opportunities for the language and ontology communities