Copyright © 1997 Pangea Systems, Inc. All rights reserved. Building Ontologies.



Copyright © 1998 Pangea Systems, Inc. All rights reserved.

Building Ontologies
- No field of Ontological Engineering equivalent to Knowledge or Software Engineering
- No standard methodologies for building ontologies
- Such a methodology would include:
  - a set of stages that occur when building ontologies
  - guidelines and principles to assist in the different stages
  - an ontology life-cycle which indicates the relationships among stages
- Gruber's guidelines for constructing ontologies are well known

The Development Lifecycle
- Two kinds of complementary methodologies emerged:
  - Stage-based, e.g. TOVE [Uschold96]
  - Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]
- Most have two stages:
  1. Informal stage: the ontology is sketched out using either natural language descriptions or some diagram technique
  2. Formal stage: the ontology is encoded in a formal, machine-computable knowledge representation language
- An ontology should ideally be communicated to people and unambiguously interpreted by software
  - the informal representation helps the former
  - the formal representation helps the latter

A Provisional Methodology
- A skeletal methodology and life-cycle for building ontologies
- Inspired by the software engineering V-process model
- The overall process moves through a life-cycle
- The left side charts the processes in building an ontology
- The right side charts the guidelines, principles and evaluation used to "quality assure" the ontology

The V-model Methodology
[Figure: V-model diagram. Labels recovered from the slide:]
- Processes: Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation
- Evaluation: coverage, verification, granularity
- Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
- Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
- Model levels: User Model; Conceptualisation Model; Implementation Model
- Ontology in Use

The ontology building life-cycle
[Figure: life-cycle diagram. Labels recovered from the slide:]
- Identify purpose and scope
- Knowledge acquisition
- Evaluation
- Language and representation
- Available development tools
- Building: Conceptualisation; Integrating existing ontologies; Encoding

User Model: Identify purpose and scope
- Decide what applications the ontology will support
  - EcoCyc: pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
  - TAMBIS: retrieval across a broad range of bioinformatics resources
- The use to which an ontology is put affects its content and style
- This in turn affects the re-usability of the ontology

User Model: Knowledge Acquisition
- Sources: specialist biologists; standard text books; research papers; other ontologies and database schemas
- Motivating scenarios and informal competency questions (informal questions the ontology must be able to answer)
- Evaluation:
  - Fitness for purpose
  - Coverage and competency

Conceptualisation Model: Conceptualisation
- Identify the key concepts, their properties and the relationships that hold between them
  - Which ones are essential?
  - What information will be required by the applications?
- Structure domain knowledge into explicit conceptual models
- Identify natural language terms to refer to such concepts, relations and attributes
- Determine naming conventions
  - Consistent naming for classes and slots
  - EcoCyc:
    - Classes are capitalized, hyphenated, plural
    - Slot names are uppercase
- A quality ontology captures relevant biological distinctions with high fidelity
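The naming conventions mentioned for EcoCyc can be enforced mechanically. The following is a minimal sketch, not EcoCyc's own tooling: the regular expressions are an assumed reading of "capitalized, hyphenated" and "uppercase", the function names are hypothetical, and the plural rule is left unchecked because names such as RNA or DNA cannot be tested for plurality by pattern alone.

```python
import re

# Assumed patterns: these approximate the conventions named on the slide.
# Class names: capitalized words joined by hyphens, e.g. "Protein-Complexes".
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*(-[A-Z][A-Za-z0-9]*)*$")
# Slot names: uppercase words joined by hyphens, e.g. "LEFT-END-POSITION".
SLOT_NAME = re.compile(r"^[A-Z][A-Z0-9]*(-[A-Z0-9]+)*$")


def is_valid_class_name(name: str) -> bool:
    """Return True if the name looks like a capitalized, hyphenated class name."""
    return CLASS_NAME.match(name) is not None


def is_valid_slot_name(name: str) -> bool:
    """Return True if the name is written entirely in uppercase."""
    return SLOT_NAME.match(name) is not None


def naming_violations(class_names, slot_names):
    """Collect every name that breaks its convention, in input order."""
    bad_classes = [n for n in class_names if not is_valid_class_name(n)]
    bad_slots = [n for n in slot_names if not is_valid_slot_name(n)]
    return bad_classes + bad_slots
```

Running such a check whenever a class or slot is added keeps the convention from eroding as several authors contribute.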

Conceptualisation Model: Pitfalls
- Pitfall: Missing ontological elements
  - Missing classes: Swiss-Prot, protein complexes
  - Missing attributes: genetic code identifier
  - Confusing 1:1 with 1:Many, or 1:Many with Many:Many
    - e.g. cofactor as an attribute of reaction
  - Important data is stored within text/comment fields
- Pitfall: Extra ontological elements
- Pitfall: Stop-over (when do I stop elaborating?)
- Pitfall: Relevance (do I really need all this detail?)

Integrating Existing Ontologies
- Reuse or adapt existing ontologies when possible
  - Save time
  - Correctness
  - Facilitate interoperation
- Integration of ontologies
  - Ontologies have to be aligned
  - Hindered by poor documentation and argumentation
  - Hindered by implicit assumptions
  - Shared generic upper level ontologies should make integration easier

Encoding: Implementation Toolkit
- Construct the ontology using an ontology-development system
  - Does the data model have the right expressivity?
    - Is it just a taxonomy or are relationships needed?
    - Is multiple parentage needed? Inverse relationships?
    - What types of constraints are needed?
  - Are reasoning services needed?
  - What are the authoring features of the development tool?
  - Can the ontology be exported to a DBMS schema?
  - Can the ontology be exported to an ontology exchange language?
  - Is simultaneous updating by multiple authors needed?
  - What are the size limitations of the development tool?

Encoding: Ontology Implementation Pitfalls
- Pitfall: Semantic ambiguity
  - Multiple ways to encode the same information
  - Meaning of class definitions unclear
- Pitfall: Encoding bias
  - Encoding the ontology changes the ontology

Encoding: Ontology Implementation Pitfalls (continued)
- Pitfall: Redundancy (lack of normalization)
  - Exact same information repeated
  - Presence of computationally derivable information
    - Date of birth and age
    - DNA sequence and reverse complement
  - More effort required for entry and update
  - Partial updates lead to inconsistency
  - OK if redundant information is maintained automatically
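The two examples of derivable information on this slide can be illustrated with a short sketch in which only the primary fact is stored and the redundant one is computed on demand. The class and function names here (`DnaSegment`, `age_on`) are hypothetical, and the complement table covers only the unambiguous bases A, C, G and T.

```python
from dataclasses import dataclass
from datetime import date

# Translation table for the four unambiguous DNA bases (an assumption of
# this sketch; ambiguity codes such as N are left untouched by translate()).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class DnaSegment:
    """Stores only the forward sequence; the reverse complement is derived."""
    sequence: str

    @property
    def reverse_complement(self) -> str:
        # Complement each base, then reverse the string.
        return self.sequence.translate(_COMPLEMENT)[::-1]


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not yet been reached.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

Because neither derived value is ever stored, a partial update cannot leave the two out of step, which is the inconsistency the slide warns about.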

Encoding: The Interaction Problem
- The task influences what knowledge is represented and how it is represented
  - Molecular biology: chemical and physical properties of proteins
  - Bioinformatics: accession number, function gene
  - Underlying perspectives mean they may not be reconcilable
- If an ontology has too many conflicting tasks it can end up compromised (the TaO experience)

Evaluate it: A guide for reusability
- Conciseness
  - No redundancy
  - Appropriateness: e.g. protein molecules at atomic resolution when amino acid level would do
- Clarity
- Consistency
- Satisfiability: it doesn't contradict itself
  - e.g. Enzyme is both a protein which catalyses a reaction and one which does not catalyse a reaction
- Commitment
  - Do I have to buy into a load of stuff I don't really need or want just to get the bit I do?
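The satisfiability example (an enzyme that both does and does not catalyse a reaction) can be caught by a very simple check. This is a toy sketch only: it assumes assertions are written as hypothetical `(subject, predicate, object, holds)` tuples and detects direct contradictions, which is far weaker than what a real reasoning system would establish.

```python
def find_contradictions(assertions):
    """Return the statements that are asserted both to hold and not to hold.

    Each assertion is a (subject, predicate, object, holds) tuple, where
    holds is True for a positive statement and False for its negation.
    """
    first_seen = {}
    conflicts = set()
    for subject, predicate, obj, holds in assertions:
        key = (subject, predicate, obj)
        if key in first_seen and first_seen[key] != holds:
            conflicts.add(key)
        # Remember only the first polarity seen for each statement.
        first_seen.setdefault(key, holds)
    return sorted(conflicts)


# Illustrative data mirroring the slide's example.
example = [
    ("Enzyme", "isA", "Protein", True),
    ("Enzyme", "catalyses", "Reaction", True),
    ("Enzyme", "catalyses", "Reaction", False),
]
```

Calling `find_contradictions(example)` flags the `catalyses` statement and leaves the `isA` statement alone.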

Documentation: Make Ontology Understandable!
- Produce clear informal and formal documentation
  - An ontology that cannot be understood will not be reused
  - GenBank feature table
  - NCBI ASN.1 definitions
- There exists a space of alternative ontology design decisions
  - Semantics / Granularity
  - Terminology
- Pitfall: Neglecting to record design rationale

Publish the Ontology
- Formal and informal specifications
- Intended domain of application
- Design rationale
- Limitations
- See EcoCyc paper in ISMB-93/Bioinformatics 00
- See TAMBIS paper in Bioinformatics 99

Macromolecule Reference Ontology
[Figure: class diagram. Labels recovered from the slide:]
- SequenceComponent: Gene, Motif, Restriction site, Phosphorylation site
- MacroMolecule: Protein, Nucleic Acid, Lipid
- Peptide, Enzyme
- RNA, DNA
- cDNA, gDNA, mDNA
- mRNA
- Relationship: componentOf

Discussion
- What is a macromolecule?
- Where does macromolecule fit into an upper level ontology?
  - Substance?
  - Structure?
- Is lipid a macromolecule?
- If we replace macromolecule with biopolymer, is the placement of lipid legitimate?
- Is a peptide a protein and therefore a macromolecule? If not, where does it go?

Taxonomy and Roles
- Do we want to assert everything in a taxonomy?
- Or do we want to define things in terms of their properties?
  - Enzyme = Protein catalyses Reaction
  - gDNA = DNA hasLocation Chromosomal
  - Sufficient as well as necessary conditions
- What's the relationship between
  - cDNA and EST?
  - cDNA and some child of RNA?
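Defining a class by its properties rather than asserting it in the taxonomy can be sketched as follows. This is an illustration under simplifying assumptions, not how any particular description-logic reasoner works: each definition is a hypothetical `(parent, property, filler)` triple treated as necessary and sufficient, and the names `Individual`, `DEFINITIONS` and `inferred_classes` are invented for the example.

```python
from dataclasses import dataclass, field

# Defined classes from the slide, read as: an individual of the parent class
# that has the given property with the given filler belongs to the defined class.
DEFINITIONS = {
    "Enzyme": ("Protein", "catalyses", "Reaction"),
    "gDNA": ("DNA", "hasLocation", "Chromosomal"),
}


@dataclass
class Individual:
    name: str
    asserted_class: str
    # Maps a property name to the set of filler classes it points to.
    properties: dict = field(default_factory=dict)


def inferred_classes(individual: Individual) -> list:
    """Return every defined class whose conditions the individual satisfies."""
    result = []
    for defined, (parent, prop, filler) in DEFINITIONS.items():
        has_parent = individual.asserted_class == parent
        has_filler = filler in individual.properties.get(prop, set())
        if has_parent and has_filler:
            result.append(defined)
    return result
```

With this approach a protein is never asserted to be an enzyme; it is recognised as one as soon as a `catalyses` relationship to a reaction is recorded.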

Axioms and constraints
- Not all RNA is translated to protein
- Do we want to say that DNA is translated to protein?
- Do we want to model catalytic RNAs?
- Relationships: what other ones do we need?
  - Genes express proteins
  - Genes express rRNA, tRNA
  - Genes are found on gDNA
  - Genes are found on mDNA
  - Genes have their own components (recursive relationships with partitive semantics)
- Reasoning? Instances? Reusable? Clear? Concise?
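A recursive relationship with partitive semantics means that the parts of a part are also parts of the whole. A minimal sketch of that closure is shown below; the `has_part` mapping and the component names in it are hypothetical examples, not taken from the slides.

```python
def all_parts(whole, has_part):
    """Collect the direct and indirect parts of `whole`.

    has_part maps each item to a list of its direct parts. The traversal
    tolerates shared parts and cycles by visiting each part only once.
    """
    parts = set()
    stack = [whole]
    while stack:
        current = stack.pop()
        for part in has_part.get(current, ()):
            if part not in parts:
                parts.add(part)
                stack.append(part)
    return parts


# Hypothetical partonomy for illustration only.
example_has_part = {
    "Gene": ["Promoter", "Exon"],
    "Exon": ["Codon"],
}
```

Here `all_parts("Gene", example_has_part)` includes `Codon` even though it is only recorded as a part of `Exon`.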

Ontological Pitfalls
- Stop-over: when do I stop elaborating?
  - Proteins → amino acid residues → side chains → physical chemical properties …
- Relevance
  - Do we need to mention all the types of nucleic acid?

EcoCyc
[Figure: EcoCyc class hierarchy. Labels recovered from the slide:]
- MacroMolecule, Proteins, Nucleic-Acids
- PolyPeptides, Protein-Complexes
- RNA, DNA, DNA-Segments, Misc-RNA
- Chemicals, Compounds-And-Elements, Compounds, Lipids
- Genes

Macromolecule in other Ontologies
- Gene Ontology
  - Used to add attributes to gene instances in databases
  - Doesn't need to talk about molecules or components of molecules
- TAMBIS Ontology
  - Models it in a similar way to our reference macromolecule ontology
  - Because it asks questions of bioinformatics sources