Published by Branden Evans. Modified over 9 years ago.
1
Copyright © 1997 Pangea Systems, Inc. All rights reserved. Building Ontologies
2
Building Ontologies
- There is no field of Ontological Engineering equivalent to Knowledge or Software Engineering
- There are no standard methodologies for building ontologies
- Such a methodology would include:
  - a set of stages that occur when building ontologies
  - guidelines and principles to assist in the different stages
  - an ontology life-cycle which indicates the relationships among stages
- Gruber's guidelines for constructing ontologies are well known
3
The Development Lifecycle
Two kinds of complementary methodologies emerged:
- Stage-based, e.g. TOVE [Uschold96]
- Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]
Most have TWO stages:
1. Informal stage
   - the ontology is sketched out using either natural language descriptions or some diagram technique
2. Formal stage
   - the ontology is encoded in a formal, machine-computable knowledge representation language
An ontology should ideally be communicated to people and unambiguously interpreted by software:
- the informal representation helps the former
- the formal representation helps the latter
4
A Provisional Methodology
- A skeletal methodology and life-cycle for building ontologies
- Inspired by the software engineering V-process model
- The overall process moves through a life-cycle
- The left side charts the processes in building an ontology
- The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology
5
The V-model Methodology
[Diagram: V-model of ontology development, ending in "Ontology in Use"]
- Levels: User Model; Conceptualisation Model; Implementation Model
- Processes: Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation
- Evaluation: coverage, verification, granularity
- Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
- Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
6
The ontology building life-cycle
[Diagram: life-cycle of ontology building]
- Labels: Identify purpose and scope; Knowledge acquisition; Evaluation; Language and representation; Available development tools; Conceptualisation; Integrating existing ontologies; Encoding; Building
7
User Model: Identify purpose and scope
- Decide what applications the ontology will support
  - EcoCyc: pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
  - TAMBIS: retrieval across a broad range of bioinformatics resources
- The use to which an ontology is put affects its content and style
- This in turn affects the re-usability of the ontology
8
User Model: Knowledge Acquisition
- Sources: specialist biologists; standard text books; research papers; other ontologies and database schemas
- Motivating scenarios and informal competency questions (informal questions the ontology must be able to answer)
- Evaluation:
  - Fitness for purpose
  - Coverage and competency
9
Conceptualisation Model: Conceptualisation
- Identify the key concepts, their properties and the relationships that hold between them
  - Which ones are essential?
  - What information will be required by the applications?
- Structure domain knowledge into explicit conceptual models
- Identify natural language terms to refer to such concepts, relations and attributes
- Determine naming conventions
  - Consistent naming for classes and slots
  - EcoCyc:
    - Classes are capitalized, hyphenated, plural
    - Slot names are uppercase
- A quality ontology captures relevant biological distinctions with high fidelity
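A naming convention of this kind can be checked mechanically. The following is a minimal Python sketch, not EcoCyc's own tooling: the two patterns are an assumed reading of "capitalized, hyphenated" class names and "uppercase" slot names, the function name check_names is hypothetical, and plurality is deliberately not checked because it cannot be decided reliably from spelling alone.

```python
import re

# Assumed reading of the convention: every hyphen-separated word of a class
# name starts with an uppercase letter; slot names are entirely uppercase.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*(-[A-Z][A-Za-z0-9]*)*$")
SLOT_NAME = re.compile(r"^[A-Z][A-Z0-9]*(-[A-Z0-9]+)*$")


def check_names(classes, slots):
    """Return (bad_classes, bad_slots): names that break the assumed patterns."""
    bad_classes = [c for c in classes if not CLASS_NAME.match(c)]
    bad_slots = [s for s in slots if not SLOT_NAME.match(s)]
    return bad_classes, bad_slots


# Illustrative names only; "CATALYZES" and "leftSide" are made-up slot names.
bad_classes, bad_slots = check_names(
    ["Protein-Complexes", "Nucleic-Acids", "misc_rna"],
    ["CATALYZES", "leftSide"],
)
```

Running such a check whenever a class or slot is added keeps the naming consistent without relying on reviewers to spot deviations.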
10
Conceptualisation Model: Pitfalls
- Pitfall: Missing ontological elements
  - Missing classes: protein complexes in Swiss-Prot
  - Missing attributes: genetic code identifier
  - Confusing 1:1 with 1:Many, or 1:Many with Many:Many
    - Cofactor as an attribute of reaction
  - Important data is stored within text/comment fields
- Pitfall: Extra ontological elements
- Pitfall: Stop-over: when do I stop elaborating?
- Pitfall: Relevance: do I really need all this detail?
11
Integrating Existing Ontologies
- Reuse or adapt existing ontologies when possible
  - Save time
  - Correctness
  - Facilitate interoperation
- Integration of ontologies
  - Ontologies have to be aligned
  - Hindered by poor documentation and argumentation
  - Hindered by implicit assumptions
  - Shared generic upper level ontologies should make integration easier
12
Encoding: Implementation Toolkit
- Construct the ontology using an ontology-development system
  - Does the data model have the right expressivity?
    - Is it just a taxonomy or are relationships needed?
    - Is multiple parentage needed? Inverse relationships?
    - What types of constraints are needed?
  - Are reasoning services needed?
  - What are the authoring features of the development tool?
  - Can the ontology be exported to a DBMS schema?
  - Can the ontology be exported to an ontology exchange language?
  - Is simultaneous updating by multiple authors needed?
  - What are the size limitations of the development tool?
13
Encoding: Ontology Implementation Pitfalls
- Pitfall: Semantic ambiguity
  - Multiple ways to encode the same information
  - Meaning of class definitions unclear
- Pitfall: Encoding bias
  - Encoding the ontology changes the ontology
14
Encoding: Ontology Implementation Pitfalls
- Pitfall: Redundancy (lack of normalization)
  - Exact same information repeated
  - Presence of computationally derivable information
    - Date of birth and age
    - DNA sequence and reverse complement
  - More effort required for entry and update
  - Partial updates lead to inconsistency
  - OK if redundant information is maintained automatically
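One way to avoid this kind of redundancy is to store only the primary value and compute the derivable one on demand. The Python sketch below illustrates the idea for the DNA sequence example; the DnaSegment record and its field names are hypothetical and are not taken from any of the ontologies discussed here.

```python
from dataclasses import dataclass

# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class DnaSegment:
    """Hypothetical record that stores only the forward strand."""

    sequence: str

    @property
    def reverse_complement(self) -> str:
        # Derived on every access, so it cannot drift out of step with
        # `sequence` after a partial update.
        return self.sequence.translate(_COMPLEMENT)[::-1]


seg = DnaSegment("ATGCCG")
first = seg.reverse_complement   # computed from "ATGCCG"
seg.sequence = "AAAC"            # update the stored value only
second = seg.reverse_complement  # recomputed from "AAAC"
```

Because the reverse complement is never stored, updating the sequence cannot leave a stale copy behind.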
15
Encoding: The Interaction Problem
- The task influences what knowledge is represented and how it is represented
  - Molecular biology: chemical and physical properties of proteins
  - Bioinformatics: accession number, function, gene
  - The underlying perspectives may not be reconcilable
- If an ontology has to serve too many conflicting tasks it can end up compromised (the TaO experience)
16
Evaluate it - A guide for reusability
- Conciseness
  - No redundancy
  - Appropriateness: e.g. modelling protein molecules at atomic resolution when amino acid level would do
- Clarity
- Consistency
- Satisfiability: it doesn’t contradict itself
  - Counter-example: Enzyme is both a protein which catalyses a reaction and one which does not catalyse a reaction
- Commitment
  - Do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
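The satisfiability criterion can be illustrated with a toy check. The Python sketch below is only a minimal illustration under strong assumptions: a class is described by (property, holds) pairs, and it is flagged as unsatisfiable only when the same property is asserted both to hold and not to hold. The function name and the encoding are hypothetical, and a real description logic reasoner detects many contradictions that this does not.

```python
def is_satisfiable(assertions):
    """Toy check: False if any property is asserted with both polarities.

    assertions: iterable of (property, holds) pairs describing one class.
    """
    seen = {}
    for prop, holds in assertions:
        # setdefault records the first polarity seen for this property and
        # returns it on later encounters, so a mismatch means a conflict.
        if seen.setdefault(prop, holds) != holds:
            return False
    return True


enzyme_ok = [("is_a Protein", True), ("catalyses Reaction", True)]
enzyme_bad = enzyme_ok + [("catalyses Reaction", False)]

ok_result = is_satisfiable(enzyme_ok)
bad_result = is_satisfiable(enzyme_bad)
```

The second definition mirrors the Enzyme example: it both catalyses and does not catalyse a reaction, so no instance could ever satisfy it.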
17
Documentation: Make Ontology Understandable!
- Produce clear informal and formal documentation
  - An ontology that cannot be understood will not be reused
  - GenBank feature table
  - NCBI ASN.1 definitions
- There exists a space of alternative ontology design decisions
  - Semantics / Granularity
  - Terminology
- Pitfall: Neglecting to record design rationale
18
Publish the Ontology
- Formal and informal specifications
- Intended domain of application
- Design rationale
- Limitations
- See EcoCyc paper in ISMB-93/Bioinformatics 00
- See TAMBIS paper in Bioinformatics 99
19
Macromolecule Reference Ontology
[Diagram: class hierarchy, listed here level by level]
- SequenceComponent: Gene, Motif, Restriction site, Phosphorylation site
- MacroMolecule: Protein, Nucleic Acid, Lipid
- Peptide, Enzyme
- RNA, DNA
- cDNA, gDNA, mDNA
- mRNA
- Relation: componentOf
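The class names on this slide can be arranged as a small taxonomy and queried for subsumption. The Python sketch below does this with a plain parent map. The parent links are an assumed reading of the diagram, which does not survive in the text (in particular, placing Peptide under Protein is exactly what the discussion slide questions), and the function name is_a is hypothetical.

```python
# Assumed parent links; the flattened diagram does not state them explicitly.
PARENT = {
    "Protein": "MacroMolecule",
    "Nucleic Acid": "MacroMolecule",
    "Lipid": "MacroMolecule",
    "Enzyme": "Protein",
    "Peptide": "Protein",  # assumption, open to debate
    "RNA": "Nucleic Acid",
    "DNA": "Nucleic Acid",
    "mRNA": "RNA",
    "cDNA": "DNA",
    "gDNA": "DNA",
    "mDNA": "DNA",
}


def is_a(cls, ancestor):
    """Return True if `cls` equals `ancestor` or lies below it in PARENT."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)  # None once we pass a root or unknown class
    return False


cdna_is_macromolecule = is_a("cDNA", "MacroMolecule")
gene_is_macromolecule = is_a("Gene", "MacroMolecule")
```

Gene is not reached from MacroMolecule here because, on the slide, sequence components are related to macromolecules by componentOf rather than by the class hierarchy.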
20
Discussion
- What is a macromolecule?
- Where does macromolecule fit into an upper level ontology?
  - Substance?
  - Structure?
- Is lipid a macromolecule?
- If we replace macromolecule with biopolymer, is the placement of lipid legit?
- Is a peptide a protein and therefore a macromolecule? If not, where does it go?
21
Taxonomy and Roles
- Do we want to assert everything in a taxonomy?
- Or do we want to define things in terms of their properties?
  - Enzyme = Protein catalyses Reaction
  - gDNA = DNA hasLocation Chromosomal
  - Sufficient as well as necessary conditions
- What's the relationship between
  - cDNA and EST
  - cDNA and some child of RNA?
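Defining classes by their properties means that membership can be computed rather than asserted. The Python sketch below encodes the two definitions from this slide as necessary and sufficient conditions and classifies an individual against them. The dictionary layout and the function name classify are assumptions made for illustration; a description logic reasoner would do this far more generally.

```python
# Each defined class: (asserted parent, required property, required filler).
DEFINITIONS = {
    "Enzyme": ("Protein", "catalyses", "Reaction"),
    "gDNA": ("DNA", "hasLocation", "Chromosomal"),
}


def classify(kind, properties):
    """Return the defined classes whose conditions the individual meets.

    kind: the asserted primitive class of the individual, e.g. "Protein".
    properties: dict mapping a property name to a set of filler types.
    """
    return sorted(
        name
        for name, (parent, prop, filler) in DEFINITIONS.items()
        if kind == parent and filler in properties.get(prop, set())
    )


catalytic_protein = classify("Protein", {"catalyses": {"Reaction"}})
plain_protein = classify("Protein", {})
chromosomal_dna = classify("DNA", {"hasLocation": {"Chromosomal"}})
```

With this style, nobody has to remember to file a new catalytic protein under Enzyme; it is recognised as one from its properties.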
22
Axioms and constraints
- Not all RNA is translated to protein
- Do we want to say that DNA is translated to protein?
- Do we want to model catalytic RNAs?
- Relationships: what other ones do we need?
  - Genes express proteins
  - Genes express rRNA, tRNA
  - Genes are found on gDNA
  - Genes are found on mDNA
  - Genes have their own components: recursive relationships with partitive semantics
- Reasoning? Instances? Reusable? Clear? Concise?
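A recursive part-of relationship is usually read as transitive: a component of a gene is also a component of whatever the gene is part of. The Python sketch below follows such links upward. The facts in COMPONENT_OF are illustrative only (Exon and Promoter do not appear on the slides), treating "found on" as componentOf is an assumption, and the function name all_wholes is hypothetical.

```python
# Hypothetical part-of facts, one direct whole per part for simplicity.
COMPONENT_OF = {
    "Exon": "Gene",
    "Promoter": "Gene",
    "Gene": "gDNA",
}


def all_wholes(part):
    """Follow componentOf links upward, treating the relation as transitive."""
    wholes = []
    seen = {part}
    while part in COMPONENT_OF:
        part = COMPONENT_OF[part]
        if part in seen:  # guard against accidental cycles in the facts
            break
        seen.add(part)
        wholes.append(part)
    return wholes


exon_wholes = all_wholes("Exon")
gdna_wholes = all_wholes("gDNA")
```

Whether every partitive relationship in the domain should be transitive is itself a modelling decision, which is part of what this slide asks.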
23
Ontological Pitfalls
- Stop-over: when do I stop over-elaborating?
  - Proteins, amino acid residues, side chains, physical chemical properties, …
- Relevance
  - Do we need to mention all the types of nucleic acid?
24
EcoCyc
[Diagram: EcoCyc class hierarchy]
- Classes shown: MacroMolecule; Proteins; Nucleic-Acids; PolyPeptides; Protein-Complexes; RNA; DNA; DNA-Segments; Misc-RNA; Chemicals; Compounds-And-Elements; Compounds; Lipids; Genes
25
Macromolecule in other Ontologies
- Gene Ontology
  - Used to add attributes to gene instances in databases
  - Doesn’t need to talk about molecules or components of molecules
- TAMBIS Ontology
  - Models it in a similar way to our reference macromolecule ontology
  - Because it asks questions of bioinformatics sources