1 How to Build an Ontology Barry Smith

Presentation transcript:

1 How to Build an Ontology Barry Smith

2 Ontology A classification of entities and the relations between them. Ontology is a list of types structured by relations Defined by a scientific field's vocabulary and by the canonical formulations of its theories. Scientific theories consist of generalizations. What I will not be talking about: XML, OWL,..., data(types), information models, file formats...

3 Top-Level GO OBO, OBO Core NCBO FMA NCBC Roadmap Centers NCI EVS NECTAR (National Electronic Clinical Trials and Research) Network

4 Instances are not included in an ontology It is the generalizations that are important (but instances must still be taken into account)

5 Parts list:
A | 515287 | DC3300 Dust Collector Fan
B | 521683 | Gilmer Belt
C | 521682 | Motor Drive Belt

6 Ontology Types Instances

7 Ontology = A Representation of Types

8 Each node of an ontology consists of: preferred term (aka term) term identifier (TUI, aka CUI) synonyms definition, glosses, comments
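
The node components listed on this slide could be sketched as a small record type. This is only an illustration: the class name, field names and the identifier `EX:0000001` are hypothetical and do not come from any particular ontology format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyNode:
    """One node of an ontology, with the components named on the slide."""
    term_id: str                                        # term identifier
    preferred_term: str                                 # preferred term
    synonyms: List[str] = field(default_factory=list)   # synonyms
    definition: str = ""                                # definition
    comments: List[str] = field(default_factory=list)   # glosses, comments


# Hypothetical example node; the identifier is made up for illustration.
node = OntologyNode(
    term_id="EX:0000001",
    preferred_term="plasma membrane",
    synonyms=["cell membrane"],
    definition="a cell part that surrounds the cytoplasm",
)
```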

9 Ontology = A Representation of Types Nodes in an ontology are connected by relations: primarily: is_a (= is subtype of) and part_of designed to support search, reasoning and annotation

10 Rules for formatting terms. Terms are names of types: if you prefix a term with ‘the type ___’, the term should still make sense. Hence: terms should be in the singular. Terms should be lower case. Avoid abbreviations, even when it is clear in context what they mean (‘breast’ for ‘breast tumor’).

11 Motivation: to capture reality Inferences and decisions we make are based upon what we know of reality. An ontology is a computable representation of this underlying bio(techno)logical reality. Enables a computer to reason over the data in (some of) the ways that we do.

12 Biomedical ontology integration / interoperability Will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts What’s really needed is to have well-defined commonly used relationships

13 Concepts Biomedical ontology integration will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts

14 Concepts Concepts are in your head and will change as our understanding changes Ontologies represent types: not concepts, meanings, ideas... Types exist, with their instances, in objective reality – including types of experimental process, design, method,...

15 Most ontologies are execrable But some good ontologies do already exist as far as possible don’t reinvent use the power of combination and collaboration ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies

16 Why do we need rules/standards for good ontology? Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking). Unintuitive rules for classification lead to errors. Intuitive rules facilitate training of curators and annotators. Common rules allow alignment with other ontologies. Logically coherent rules enhance harvesting of content through automatic reasoning systems.

17 Rules on types. Don’t confuse types with concepts. Don’t confuse types with ways of getting to know types. Don’t confuse types with ways of talking about types. Don’t confuse types with data about types.

18 First Rule: Univocity Terms (including those describing relations) should have the same meanings on every occasion of use. In other words, they should refer to the same types in reality

19 Second Rule: Positivity. There are no negative types. Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types. (There are also no conjunctive and disjunctive types: rabbit and nailfile; rabbit or nosewipe)

20 Third Rule: Objectivity Which types exist is not a function of our biological knowledge. Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
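
The second and third rules lend themselves to a simple automated check on term names. The following is a minimal sketch, assuming that a leading `non-` signals a negative term and that the three words quoted on the slide signal a knowledge-dependent term; the function name and the word list are illustrative, and a real curation tool would need a richer test.

```python
import re
from typing import List

# Assumption: a term starting with "non-" is treated as a negative term.
NEGATIVE_PREFIX = re.compile(r"^non-\s*\w+", re.IGNORECASE)
# The three examples quoted on the slide for the Objectivity rule.
EPISTEMIC_WORDS = {"unknown", "unclassified", "unlocalized"}


def lint_term(term: str) -> List[str]:
    """Return the rules (if any) that a term name appears to violate."""
    problems = []
    if NEGATIVE_PREFIX.match(term.strip()):
        problems.append("positivity")
    words = set(re.findall(r"[a-z]+", term.lower()))
    if words & EPISTEMIC_WORDS:
        problems.append("objectivity")
    return problems


print(lint_term("non-mammal"))
print(lint_term("unclassified protein"))
print(lint_term("mammal"))
```

A check like this only flags candidates for a curator to review; it cannot decide whether a term designates a genuine type.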

21 Fourth Rule: Single Inheritance No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level

22 Rule of Single Inheritance no diamonds: C is_a 2 B is_a 1 A
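
A check for the single-inheritance rule can be sketched as follows, assuming the hierarchy is available as a list of (child, parent) is_a pairs. The function name and the toy edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def multiple_parents(is_a_edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each type with more than one is_a parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Toy hierarchy: D has two is_a parents, which closes a diamond under A.
edges = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]
print(multiple_parents(edges))
```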

23 Problems with multiple inheritance B C is_a 1 is_a 2 A ‘is_a’ no longer univocal

24 ‘is_a’ is pressed into service to mean a variety of different things shortfalls from single inheritance are often clues to incorrect entry of terms and relations the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

25 is_a Overloading serves as obstacle to integration with neighboring ontologies The success of ontology alignment demands that ontological relations (is_a, part_of,...) have the same meanings in the different ontologies to be aligned.

26 To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure

27 Current Best Practice: The Foundational Model of Anatomy

28 [Figure: fragment of the FMA hierarchy, with nodes linked by part_of and is_a. Nodes: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Organ Part, Organ Component, Organ Subdivision, Tissue, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

29 Current Best Practice: The Foundational Model of Anatomy Follows formal rules for definitions laid down by Aristotle. When A is_a B, the definition of ‘A’ takes the form: an A =def. a B which... a human being =def. an animal which is rational

30 FMA Example. Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus. Plasma membrane =def. a cell part that surrounds the cytoplasm

31 The FMA regimentation Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs. The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
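
The genus-plus-differentia pattern, and the way a term's position in the hierarchy carries the definitions above it, can be sketched in a few lines. This is an illustration under stated assumptions, not the FMA's own representation: the table layout and function names are hypothetical, and the `cell part` entry is a made-up placeholder (not an FMA definition) added only so that the chain has more than one step.

```python
from typing import Dict, List, Tuple

# term -> (genus, differentia). The first three entries follow the slides;
# the "cell part" entry is a hypothetical placeholder, not an FMA definition.
DEFINITIONS: Dict[str, Tuple[str, str]] = {
    "human being": ("animal", "is rational"),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane "
             "with or without a cell nucleus"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
    "cell part": ("anatomical structure", "is part of a cell"),  # hypothetical
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a crude heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def render(term: str) -> str:
    """Render a definition in the form: an A =def. a B which ..."""
    genus, differentia = DEFINITIONS[term]
    return f"{article(term)} {term} =def. {article(genus)} {genus} which {differentia}"


def genus_chain(term: str) -> List[str]:
    """Follow genus links upward to list every type above the term."""
    chain = []
    while term in DEFINITIONS:
        term = DEFINITIONS[term][0]
        chain.append(term)
    return chain


print(render("human being"))
print(genus_chain("plasma membrane"))
```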

32 GO is now adopting structured definitions, which contain both genus and differentiae. Essence = Genus + Differentiae. neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron

33 Ontology alignment. One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:
Cell Types in GO | Cell Types in the Cell Ontology
cone cell fate commitment | retinal_cone_cell
keratinocyte differentiation | keratinocyte
adipocyte differentiation | fat_cell
dendritic cell activation | dendritic_cell
lymphocyte proliferation | lymphocyte
T-cell homeostasis | T_lymphocyte
garland cell differentiation | garland_cell
heterocyst cell differentiation | heterocyst

34 Alignment of the two ontologies will permit the generation of consistent and complete definitions.
Cell type:
id: CL:
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A ]
is_a: CL:
relationship: develops_from CL:
relationship: develops_from CL:
GO + Cell type = New Definition:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.

35 Other Ontologies to be aligned with GO
Chemical ontologies
– 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
Anatomy ontologies
– metanephros development
GO itself
– mitochondrial inner membrane peptidase activity
→ OBO core

36 eventually to comprehend all of OBO

37 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions

38 Definitions should be intelligible to both machines and humans Machines can cope with the full formal representation Humans need modularity

39 Fifth Rule: Terms and relations should have clear definitions These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality: –actual cells, actual portions of cytoplasm, and so on

40 But Some terms are primitive (cannot be defined) AVOID CIRCULAR DEFINITIONS ! Avoid definitions of the forms: An A is an A which is B (person = person with identity documents) An A is the B of an A (heptolysis = the causes of heptolysis)
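
The first circular form on this slide (the defined term reappearing inside its own definition) can be caught mechanically. A minimal sketch, assuming definitions are plain strings; the function name is hypothetical, and indirect circles that run through several definitions would need a graph search instead.

```python
import re


def is_circular(term: str, definiens: str) -> bool:
    """True if the defined term occurs as a whole phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definiens.lower()) is not None


print(is_circular("person", "person with identity documents"))
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))
```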

41 [Diagram: types and instances. Type labels: substance, organism, animal, mammal, cat, siamese, frog. Other labels: leaf type, instances]

42 Benefits of well-defined relationships If the relations in an ontology are well- defined, then reasoning can cascade from one relational assertion (A R 1 B) to the next (B R 2 C). Find all DNA binding proteins should also find all transcription factor proteins because transcription factor is_a DNA binding protein
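
The query on this slide can be sketched as a traversal over is_a links. This assumes the ontology is given as (child, parent) pairs; the function name is hypothetical, and the second edge is an illustrative addition that is not on the slide.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def subtypes(term: str, is_a_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Collect every type reachable from `term` by following is_a downward."""
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    found: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


edges = [
    ("transcription factor", "DNA binding protein"),               # from the slide
    ("zinc finger transcription factor", "transcription factor"),  # illustrative
]
print(sorted(subtypes("DNA binding protein", edges)))
```

A search for DNA binding proteins would then also retrieve anything annotated to one of the returned subtypes.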

43 What happens when an ontology has no clear definition of A is_a B:
cancer documentation is_a cancer
disease prevention is_a disease
living subject is_a information object representing an animal or complex organism
individual allele is_a act of observation

44 [Figure: fragment of the FMA hierarchy, with nodes linked by part_of and is_a. Nodes: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Organ Part, Organ Component, Organ Subdivision, Tissue, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

45 How to define A is_a B A is_a B =def. all instances of A are as a matter of biological science also instances of B here A and B are names of types in reality

46 How to define A is_a B A is_a B =def. for all a if a instance_of A, then a instance_of B
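
Written in first-order notation, the definition on this slide reads as follows (the symbols are simply a transcription of the slide's wording):

```latex
\[
A \;\mathit{is\_a}\; B \;=_{\mathrm{def}}\;
\forall a \,\bigl( a \;\mathit{instance\_of}\; A \;\rightarrow\; a \;\mathit{instance\_of}\; B \bigr)
\]
```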

47 Kinds of relations Between types: –is_a, part_of,... Between an instance and a type –this explosion instance_of the type explosion Between instances: –Mary’s heart part_of Mary

48 Part_of as a relation between types is more problematic than is standardly supposed heart part_of human being ? human heart part_of human being ? human being has_part human testis ? testis part_of human being ?

49 Definition of part_of as a relation between types A part_of B =Def all instances of A are instance-level parts of some instance of B human testis part_of adult human being
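
The all-some form of this definition can be tested against a set of instance-level assertions. The sketch below assumes instance data is available as plain sets of pairs; the function name and the individuals (`testis_1`, `john`, `mary`, and so on) are hypothetical. Passing the check on sample data does not establish the type-level claim, it only fails to refute it.

```python
from typing import Set, Tuple


def type_level_part_of(a_type: str, b_type: str,
                       instance_of: Set[Tuple[str, str]],
                       instance_part_of: Set[Tuple[str, str]]) -> bool:
    """Check: every instance of a_type is part of SOME instance of b_type.

    Vacuously true when a_type has no recorded instances.
    """
    a_instances = {x for x, t in instance_of if t == a_type}
    b_instances = {x for x, t in instance_of if t == b_type}
    return all(
        any((a, b) in instance_part_of for b in b_instances)
        for a in a_instances
    )


instance_of = {
    ("testis_1", "human testis"),
    ("john", "adult human being"),
    ("mary", "adult human being"),
}
instance_part_of = {("testis_1", "john")}

print(type_level_part_of("human testis", "adult human being",
                         instance_of, instance_part_of))
```

Note that the check is not symmetric: nothing here requires every adult human being to have a testis as part, which is why has_part needs its own definition.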

50 Instance level this nucleus is adjacent to this cytoplasm implies: this cytoplasm is adjacent to this nucleus Type level nucleus adjacent_to cytoplasm Not: cytoplasm adjacent_to nucleus seminal vesicle adjacent_to urinary bladder Not: urinary bladder adjacent_to seminal vesicle

51 Definitions of the all-some form allow cascading inferences If A R 1 B and B R 2 C, then we know that every A stands in R 1 to some B, but we know also that, whichever B this is, it can be plugged into the R 2 relation

52 [Diagram: transformation_of. The same instance c is an instance of C1 at time t1 and of C at time t. Examples: pre-RNA and mature RNA; child and adult]

53 transformation_of A transformation_of B =Def. Every instance of A was at some earlier time an instance of B adult transformation_of child
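
Because this definition mentions time, checking it requires time-stamped instance data. A minimal sketch, assuming facts are recorded as (individual, type, time) triples; the function name, the individual `mary` and the years are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, int]  # (individual, type, time)


def transformation_of(a_type: str, b_type: str, facts: Set[Fact]) -> bool:
    """Check: every instance of a_type was, at some earlier time, an instance of b_type."""
    for individual, typ, time in facts:
        if typ != a_type:
            continue
        was_b_earlier = any(
            other == individual and other_type == b_type and earlier < time
            for other, other_type, earlier in facts
        )
        if not was_b_earlier:
            return False
    return True


facts = {("mary", "child", 1990), ("mary", "adult", 2010)}
print(transformation_of("adult", "child", facts))
print(transformation_of("child", "adult", facts))
```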

54 [Diagram: embryological development. Labels: C, c at t; C1, c at t1]

55 [Diagram: tumor development. Labels: C, c at t; C1, c at t1]

56 [Diagram: derives_from. Axes: time, instances. Labels: C, c at t; C', c' at t; C1, c1 at t1. Example: zygote derives_from ovum; zygote derives_from sperm]

57 One main obstacle to integrating biological and experiment- generated data Most ontologies have no facility for dealing with time and instances

58 EXPO: Experiment Ontology

59 representational style part_of experimental hypothesis experimental actions part_of experimental design

60 tool part_of experimental design (confuses object with specification)

61 hypothesis driven is_a Galilean

62 physical is_a scientific experiment (avoid abbreviations)

63 admin info about experiment is_a scientific experiment

64 where is the top level? objects, processes, characteristics

65 is_a and part_of never cross categorial divides (cf. tripartite organization of GO). If A is_a B, then:
A is an object type iff B is an object type
A is a process type iff B is a process type
A is a characteristic type iff B is a characteristic type
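
This rule can be checked mechanically once each type is tagged with its top-level category. The sketch below is illustrative: the function name is hypothetical, and the category assignments are assumptions made for the example (for instance, treating an experimental design as a characteristic and a tool as an object).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject type, relation, object type)


def crossing_edges(edges: List[Edge], category: Dict[str, str]) -> List[Edge]:
    """Return is_a / part_of edges whose two ends lie in different categories."""
    return [
        (a, rel, b)
        for a, rel, b in edges
        if rel in ("is_a", "part_of") and category[a] != category[b]
    ]


# Assumed category assignments, for illustration only.
category = {
    "nucleus": "object",
    "cell": "object",
    "tool": "object",
    "experimental design": "characteristic",
}
edges = [
    ("nucleus", "part_of", "cell"),
    ("tool", "part_of", "experimental design"),
]
print(crossing_edges(edges, category))
```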

66 Some thoughts on time continuants vs. occurrents objects, characteristics vs. processes time timeline day daytime menstrual cycle high tide

67 What is time?

68 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions Space = the largest spatial region Time = the largest temporal region

69 Relative time, subjective time terms describing (regions of) time in special (qualitative, perspective- dependent, landmark dependent) ways tomorrow, yesterday uptown, downtown phase A trial Wednesday

70 Characteristics are continuants many characteristics have realizations, applications or executions, which are processes plan design method menstrual cycle function

71 GlaxoSmithKline* What we need is “industrial-strength” ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities. We anticipate that pharma’s need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work. *Robin McEntire

72 OBO Relation Ontology “Relations in Biomedical Ontologies”, Genome Biology, Apr relations for continuants behave differently from relations for processes

73 part_of for component types is time-indexed A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t

74 part_of for process types is not time-indexed A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b
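
The contrast between the two definitions is easier to see side by side. In the notation below (chosen here for illustration), the first formula is the time-indexed form for continuant types and the second is the form for process types, which carries no time index:

```latex
\[
A \;\mathit{part\_of}\; B \;=_{\mathrm{def}}\;
\forall a\, \forall t \,\bigl( a \;\mathit{instance\_of}\; A \;\mathit{at}\; t
\;\rightarrow\; \exists b \,( b \;\mathit{instance\_of}\; B \;\wedge\; a \;\mathit{part\_of}\; b \;\mathit{at}\; t ) \bigr)
\]
\[
A \;\mathit{part\_of}\; B \;=_{\mathrm{def}}\;
\forall a \,\bigl( a \;\mathit{instance\_of}\; A
\;\rightarrow\; \exists b \,( b \;\mathit{instance\_of}\; B \;\wedge\; a \;\mathit{part\_of}\; b ) \bigr)
\]
```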

75 Main Upper Level Ontologies CYC Cycorp (Austin, TX) human being = partially tangible thing SUO (Suggested Upper Ontology) IEEE monkey, body covering DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) BFO (Basic Formal Ontology)

76 SUO top level
Entity
  – Physical
      Object
        – SelfConnectedObject
            » Substance
            » CorpuscularObject
            » Food
        – Region
        – Collection
        – Agent
      Process
  – Abstract
      SetOrClass
      Relation
      Quantity
        – Number
        – PhysicalQuantity
      Attribute
      Proposition

77 MIGS Specification Top Levels Organism Phenotype Environment Sample Process Data Process