1
1 How to Build an Ontology Barry Smith http://ontology.buffalo.edu/smith
2
2 Ontology A classification of entities and the relations between them. Ontology is a list of types structured by relations Defined by a scientific field's vocabulary and by the canonical formulations of its theories. Scientific theories consist of generalizations. What I will not be talking about: XML, OWL,..., data(types), information models, file formats...
3
3 Top-Level GO OBO, OBO Core NCBO FMA NCBC Roadmap Centers NCI EVS NECTAR (National Electronic Clinical Trials and Research) Network
4
4 Instances are not included in an ontology It is the generalizations that are important (but instances must still be taken into account)
5
5 A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
6
6 Ontology Types Instances
7
7 Ontology = A Representation of Types
8
8 Each node of an ontology consists of: preferred term (aka term) term identifier (TUI, aka CUI) synonyms definition, glosses, comments
9
9 Ontology = A Representation of Types Nodes in an ontology are connected by relations: primarily: is_a (= is subtype of) and part_of designed to support search, reasoning and annotation
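The node fields and relations listed on the two slides above can be sketched as a small data structure. This is only an illustration: the identifiers (`EX:1` and so on), the field names, and the `cell part part_of cell` link are assumptions made for the example, not taken from any real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    term_id: str                       # term identifier (hypothetical IDs below)
    preferred_term: str                # preferred term (aka term)
    synonyms: list = field(default_factory=list)
    definition: str = ""
    comments: list = field(default_factory=list)


nodes = {
    "EX:1": Node("EX:1", "anatomical structure"),
    "EX:2": Node("EX:2", "cell"),
    "EX:3": Node("EX:3", "cell part"),
    "EX:4": Node("EX:4", "plasma membrane",
                 definition="a cell part that surrounds the cytoplasm"),
}

# Relations as (subject id, relation name, object id) triples.
relations = [
    ("EX:2", "is_a", "EX:1"),
    ("EX:4", "is_a", "EX:3"),
    ("EX:3", "part_of", "EX:2"),       # illustrative assumption
]


def related(term_id, relation):
    """Preferred terms of all nodes that term_id points to via relation."""
    return [nodes[o].preferred_term
            for s, r, o in relations if s == term_id and r == relation]
```

For example, `related("EX:4", "is_a")` looks up the is_a parent of plasma membrane.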
10
10 Rules for formatting terms Terms are names of types: if you prefix a term with ‘the type ___’ the term should still make sense Hence: terms should be in the singular Terms should be lower case Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)
11
11 Motivation: to capture reality Inferences and decisions we make are based upon what we know of reality. An ontology is a computable representation of this underlying bio(techno)logical reality. Enables a computer to reason over the data in (some of) the ways that we do.
12
12 Biomedical ontology integration / interoperability Will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts What’s really needed is to have well-defined commonly used relationships
13
13 Concepts Biomedical ontology integration will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts
14
14 Concepts Concepts are in your head and will change as our understanding changes Ontologies represent types: not concepts, meanings, ideas... Types exist, with their instances, in objective reality – including types of experimental process, design, method,...
15
15 Most ontologies are execrable But some good ontologies do already exist as far as possible don’t reinvent use the power of combination and collaboration ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies
16
16 Why do we need rules/standards for good ontology? Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking): Unintuitive rules for classification lead to errors Intuitive rules facilitate training of curators and annotators Common rules allow alignment with other ontologies Logically coherent rules enhance harvesting of content through automatic reasoning systems
17
17 Rules on types Don’t confuse types with concepts Don’t confuse types with ways of getting to know types Don’t confuse types with ways of talking about types Don’t confuse types with data about types
18
18 First Rule: Univocity Terms (including those describing relations) should have the same meanings on every occasion of use. In other words, they should refer to the same types in reality
19
19 Second Rule: Positivity There are no negative types Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types. (There are also no conjunctive and disjunctive types: rabbit and nailfile; rabbit or nosewipe)
20
20 Third Rule: Objectivity Which types exist is not a function of our biological knowledge. Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
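The second and third rules lend themselves to a simple lexical check on term names. The sketch below is a heuristic only: it flags candidates for a human curator to review, and the function name and messages are invented for this example.

```python
import re

# Words named on the Objectivity slide as not designating natural kinds.
KNOWLEDGE_DEPENDENT = {"unknown", "unclassified", "unlocalized"}


def lint_term(term):
    """Return the rule violations suspected in a term name (heuristic)."""
    words = re.findall(r"[a-z]+", term.lower())
    problems = []
    if term.lower().startswith("non-"):
        problems.append("positivity: negative type")
    if "and" in words or "or" in words:
        problems.append("positivity: conjunctive or disjunctive type")
    if KNOWLEDGE_DEPENDENT & set(words):
        problems.append("objectivity: depends on state of knowledge")
    return problems
```

A term such as ‘plasma membrane’ passes with no findings, while ‘non-mammal’ and ‘rabbit or nosewipe’ are each flagged under the Positivity rule.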
21
21 Fourth Rule: Single Inheritance No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
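A check for the fourth rule can be sketched as follows, assuming the is_a links are available as (child, parent) pairs. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def multiple_inheritance(is_a_edges):
    """Map each type with more than one direct is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


edges = [
    ("cat", "mammal"),
    ("mammal", "animal"),
    ("A", "B"),   # A has two direct parents, B and C
    ("A", "C"),
]
violations = multiple_inheritance(edges)
```

An empty result means every type has at most one immediate is_a parent.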
22
22 Rule of Single Inheritance no diamonds: [Figure: diamond-shaped hierarchy with nodes A, B and C and edges labeled is_a₁ and is_a₂]
23
23 Problems with multiple inheritance [Figure: nodes A, B and C joined by two edges labeled is_a₁ and is_a₂] ‘is_a’ no longer univocal
24
24 ‘is_a’ is pressed into service to mean a variety of different things shortfalls from single inheritance are often clues to incorrect entry of terms and relations the resulting ambiguities make the rules for correct entry difficult to communicate to human curators
25
25 is_a Overloading serves as obstacle to integration with neighboring ontologies The success of ontology alignment demands that ontological relations (is_a, part_of,...) have the same meanings in the different ontologies to be aligned.
26
26 To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure
27
27 Current Best Practice: The Foundational Model of Anatomy
28
28 [Figure: fragment of the FMA hierarchy around the pleura, with edges labeled is_a and part_of. Nodes: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Tissue, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Cavity, Interlobar recess]
29
29 Current Best Practice: The Foundational Model of Anatomy Follows formal rules for definitions laid down by Aristotle. When A is_a B, the definition of ‘A’ takes the form: an A =def. a B which... a human being =def. an animal which is rational
30
30 FMA Example Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane, with or without a cell nucleus Plasma membrane =def. a cell part that surrounds the cytoplasm
31
31 The FMA regimentation Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs. The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
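The way a definition of the form "an A =def. a B which ..." inherits from the terms above it can be sketched as below. The `Term` class and both functions are hypothetical, and the differentia for ‘animal’ is left as a placeholder because the slides do not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    name: str
    genus: Optional[str] = None   # immediate is_a parent; None if primitive
    differentia: str = ""


def article(noun):
    return "an" if noun[0].lower() in "aeiou" else "a"


def define(term):
    """Render a definition in the form: an A =def. a B which ..."""
    return (f"{article(term.name)} {term.name} =def. "
            f"{article(term.genus)} {term.genus} which {term.differentia}")


def unfold(name, terms):
    """Collect (genus, differentia) pairs from a term up to the highest known ancestor."""
    chain = []
    current = terms.get(name)
    while current is not None and current.genus is not None:
        chain.append((current.genus, current.differentia))
        current = terms.get(current.genus)
    return chain


terms = {
    "human being": Term("human being", "animal", "is rational"),
    "animal": Term("animal", "organism", "<differentia of animal>"),  # placeholder
}
```

`define(terms["human being"])` reproduces the example from the slide, and `unfold("human being", terms)` shows the differentiae accumulated from the terms above it.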
32
32 GO is now adopting structured definitions that contain both genus and differentiae Essence = Genus + Differentiae neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron
33
33 Ontology alignment One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:
Cell types in GO | Cell types in the Cell Ontology
cone cell fate commitment | retinal_cone_cell
keratinocyte differentiation | keratinocyte
adipocyte differentiation | fat_cell
dendritic cell activation | dendritic_cell
lymphocyte proliferation | lymphocyte
T-cell homeostasis | T_lymphocyte
garland cell differentiation | garland_cell
heterocyst cell differentiation | heterocyst
34
34 Alignment of the two ontologies will permit the generation of consistent and complete definitions
GO + Cell type = New Definition
Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
New Definition: Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
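To make use of a cell type record like the osteoblast entry above, its tag-value lines first have to be read into a structure. The following is a minimal sketch that handles only simple `tag: value` lines; it is not a full parser for the OBO format, and `parse_stanza` is a name made up for this example.

```python
STANZA = """\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text):
    """Read 'tag: value' lines into a dict of lists (tags may repeat)."""
    record = {}
    for line in text.strip().splitlines():
        tag, _, value = line.partition(": ")
        record.setdefault(tag.strip(), []).append(value.strip())
    return record


record = parse_stanza(STANZA)
# The definition text is the part between the first pair of double quotes.
definition_text = record["def"][0].split('"')[1]
```

The two `develops_from` entries are the links that the new definition draws on when it names the cell types from which an osteoblast develops.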
35
35 Other Ontologies to be aligned with GO Chemical ontologies –3,4-dihydroxy-2-butanone-4-phosphate synthase activity Anatomy ontologies –metanephros development GO itself –mitochondrial inner membrane peptidase activity OBO core
36
36 eventually to comprehend all of OBO
37
37 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions
38
38 Definitions should be intelligible to both machines and humans Machines can cope with the full formal representation Humans need modularity
39
39 Fifth Rule: Terms and relations should have clear definitions These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality: –actual cells, actual portions of cytoplasm, and so on
40
40 But Some terms are primitive (cannot be defined) AVOID CIRCULAR DEFINITIONS ! Avoid definitions of the forms: An A is an A which is B (person = person with identity documents) An A is the B of an A (heptolysis = the causes of heptolysis)
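A first-pass screen for the two circular forms above is to test whether the defined term recurs in its own definition. This is a rough heuristic, sketched under the assumption that terms and definitions are plain strings; it can also flag harmless definitions that merely mention the term as part of a longer phrase, so its output needs human review.

```python
import re


def looks_circular(term, definition):
    """True if the term occurs as a whole word or phrase in its own definition."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None
```

Both bad examples from the slide are caught, while ‘plasma membrane’ defined as ‘a cell part that surrounds the cytoplasm’ passes.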
41
41 [Figure: a hierarchy of types (substance, organism, animal, mammal, cat, siamese, frog) shown above their instances, with a leaf type marked]
42
42 Benefits of well-defined relationships If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R₁ B) to the next (B R₂ C). A query such as ‘Find all DNA binding proteins’ should also find all transcription factor proteins, because transcription factor is_a DNA binding protein
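The DNA binding protein query can be sketched as a walk up the is_a hierarchy. The protein names and the annotation table below are invented for illustration, and the sketch assumes an acyclic hierarchy with a single is_a parent per type.

```python
IS_A = {
    "transcription factor": "DNA binding protein",
    "DNA binding protein": "protein",
}

# Hypothetical annotations: item -> most specific type it is annotated with.
ANNOTATIONS = {
    "protein_X": "transcription factor",
    "protein_Y": "DNA binding protein",
    "protein_Z": "protein",
}


def ancestors_or_self(type_name, is_a):
    """The type itself followed by every type reachable through is_a."""
    chain = [type_name]
    while type_name in is_a:
        type_name = is_a[type_name]
        chain.append(type_name)
    return chain


def find_all(query_type, annotations, is_a):
    """Items annotated with query_type or with any of its subtypes."""
    return sorted(item for item, t in annotations.items()
                  if query_type in ancestors_or_self(t, is_a))
```

Here `find_all("DNA binding protein", ANNOTATIONS, IS_A)` returns the transcription factor as well, because the is_a link is followed automatically.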
43
43 What happens when an ontology has no clear definition of A is_a B: cancer documentation is_a cancer; disease prevention is_a disease; living subject is_a information object representing an animal or complex organism; individual allele is_a act of observation
44
44 [Figure: fragment of the FMA hierarchy around the pleura, with edges labeled is_a and part_of. Nodes: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Tissue, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Cavity, Interlobar recess]
45
45 How to define A is_a B A is_a B =def. all instances of A are as a matter of biological science also instances of B here A and B are names of types in reality
46
46 How to define A is_a B A is_a B =def. for all a if a instance_of A, then a instance_of B
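The instance-level definition of is_a can be tested against a finite set of instance data, as in the sketch below. The instance names are hypothetical. Note the limits: a finite sample can refute an is_a claim but cannot establish it, since the definition speaks of all instances, and the check is vacuously true when A has no recorded instances.

```python
def holds_is_a(a_type, b_type, instance_of):
    """Check 'for all a: if a instance_of A then a instance_of B' on the data.

    instance_of is a set of (instance, type) pairs.
    """
    a_instances = {i for i, t in instance_of if t == a_type}
    return all((i, b_type) in instance_of for i in a_instances)


instance_of = {
    ("tibbles", "cat"), ("tibbles", "mammal"),
    ("rex", "mammal"),                 # a mammal that is not a cat
    ("kermit", "frog"),
}
```

With this data, `cat is_a mammal` survives the check while `mammal is_a cat` is refuted by the instance `rex`.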
47
47 Kinds of relations Between types: –is_a, part_of,... Between an instance and a type –this explosion instance_of the type explosion Between instances: –Mary’s heart part_of Mary
48
48 Part_of as a relation between types is more problematic than is standardly supposed heart part_of human being ? human heart part_of human being ? human being has_part human testis ? testis part_of human being ?
49
49 Definition of part_of as a relation between types A part_of B =Def all instances of A are instance-level parts of some instance of B human testis part_of adult human being
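The all-some reading of part_of explains why ‘heart part_of human being’ fails while ‘human heart part_of human being’ holds. A minimal sketch, with invented instance names and ignoring time:

```python
def holds_part_of(a_type, b_type, instance_of, part_of_inst):
    """Every instance of A must be an instance-level part of some instance of B.

    instance_of: set of (instance, type); part_of_inst: set of (part, whole).
    """
    a_instances = {i for i, t in instance_of if t == a_type}
    b_instances = {i for i, t in instance_of if t == b_type}
    return all(any((a, b) in part_of_inst for b in b_instances)
               for a in a_instances)


instance_of = {
    ("mary's heart", "heart"), ("mary's heart", "human heart"),
    ("rex's heart", "heart"),
    ("mary", "human being"), ("rex", "dog"),
}
part_of_inst = {("mary's heart", "mary"), ("rex's heart", "rex")}
```

The dog's heart is an instance of heart that is part of no human being, so the type-level claim about heart is refuted. The reverse direction (has_part) would need its own all-some check starting from the instances of B.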
50
50 Instance level this nucleus is adjacent to this cytoplasm implies: this cytoplasm is adjacent to this nucleus Type level nucleus adjacent_to cytoplasm Not: cytoplasm adjacent_to nucleus seminal vesicle adjacent_to urinary bladder Not: urinary bladder adjacent_to seminal vesicle
51
51 Definitions of the all-some form allow cascading inferences If A R₁ B and B R₂ C, then we know that every A stands in R₁ to some B, but we know also that, whichever B this is, it can be plugged into the R₂ relation
52
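52 [Figure: transformation_of. Along a time axis, the same instance c is shown at two times, t₁ and t, instantiating the types C₁ and C. Type pairs shown: pre-RNA and mature RNA; adult and child]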
53
53 transformation_of A transformation_of B =Def. Every instance of A was at some earlier time an instance of B adult transformation_of child
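Because transformation_of refers to earlier times, checking it needs time-stamped instance data. The sketch below assumes each instance carries a list of (time, type) records; the names and times are made up for the example.

```python
def holds_transformation_of(a_type, b_type, history):
    """Every instance of A must have been an instance of B at some earlier time.

    history maps an instance to a list of (time, type) records.
    """
    for records in history.values():
        for t, typ in records:
            if typ != a_type:
                continue
            was_b_earlier = any(t1 < t and typ1 == b_type
                                for t1, typ1 in records)
            if not was_b_earlier:
                return False
    return True


history = {
    "mary": [(0, "child"), (20, "adult")],
    "john": [(0, "child")],
}
```

With this data, `adult transformation_of child` holds, while the converse fails because nothing recorded as a child was an adult earlier.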
54
54 embryological development [Figure: instance c shown at times t and t₁, with types C and C₁]
55
55 tumor development [Figure: instance c shown at times t and t₁, with types C and C₁]
56
56 [Figure: derives_from. Instances c of C and c' of C' at time t, and instance c₁ of C₁ at time t₁, arranged along a time axis. Examples: zygote derives_from ovum; zygote derives_from sperm]
57
57 One main obstacle to integrating biological and experiment-generated data Most ontologies have no facility for dealing with time and instances
58
58 EXPO: Experiment Ontology
59
59 representational style part_of experimental hypothesis experimental actions part_of experimental design
60
60 tool part_of experimental design (confuses object with specification)
61
61 hypothesis driven is_a Galilean
62
62 physical is_a scientific experiment (avoid abbreviations)
63
63 admin info about experiment is_a scientific experiment
64
64 where is the top level? objects, processes, characteristics
65
65 is_a and part_of never cross categorial divides (cf. tripartite organization of GO) if A is_a B, then: A is an object type iff B is an object type; A is a process type iff B is a process type; A is a characteristic type iff B is a characteristic type
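The rule that is_a and part_of never cross categorial divides can be checked mechanically once every type is assigned a top-level category. In the sketch below the category assignments are this example's own reading of the EXPO terms quoted earlier, not something EXPO states, and the last edge is purely hypothetical.

```python
# Assumed top-level category for each type (illustrative assignments).
CATEGORY = {
    "tool": "object",
    "experimental actions": "process",
    "experimental design": "characteristic",
    "scientific experiment": "process",
    "physical experiment": "process",
}

EDGES = [
    ("tool", "part_of", "experimental design"),
    ("experimental actions", "part_of", "experimental design"),
    ("physical experiment", "is_a", "scientific experiment"),  # hypothetical edge
]


def crossing_edges(edges, category):
    """Edges whose two ends fall under different top-level categories."""
    return [(s, r, o) for s, r, o in edges if category[s] != category[o]]


violations = crossing_edges(EDGES, CATEGORY)
```

Under these assumed categories the first two edges are reported, and the third is accepted because both ends are process types.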
66
66 Some thoughts on time continuants vs. occurrents objects, characteristics vs. processes time timeline day daytime menstrual cycle high tide
67
67 What is time?
68
68 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions Space = the largest spatial region Time = the largest temporal region
69
69 Relative time, subjective time terms describing (regions of) time in special (qualitative, perspective-dependent, landmark-dependent) ways tomorrow, yesterday uptown, downtown phase A trial Wednesday
70
70 Characteristics are continuants many characteristics have realizations, applications or executions, which are processes plan design method menstrual cycle function
71
71 GlaxoSmithKline* What we need is “industrial-strength” ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities. We anticipate that pharma’s need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work. *Robin McEntire
72
72 OBO Relation Ontology “Relations in Biomedical Ontologies”, Genome Biology, Apr. 2005 relations for continuants behave differently from relations for processes
73
73 part_of for component types is time-indexed A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t
74
74 part_of for process types is not time-indexed A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b
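The difference between the two definitions above shows up in the shape of the data each check needs. In this sketch the time-indexed version additionally requires b to be an instance of B at the same time t, which is an assumption beyond the slide text; all instance names are invented.

```python
def time_indexed_part_of(a_type, b_type, inst_at, part_at):
    """Time-indexed version. inst_at: (instance, type, time); part_at: (part, whole, time)."""
    for a, a_typ, t in inst_at:
        if a_typ != a_type:
            continue
        wholes = {b for b, b_typ, t_b in inst_at
                  if b_typ == b_type and t_b == t}
        if not any((a, b, t) in part_at for b in wholes):
            return False
    return True


def process_part_of(a_type, b_type, inst, part):
    """Untimed version. inst: (instance, type); part: (part, whole)."""
    for a, a_typ in inst:
        if a_typ != a_type:
            continue
        wholes = {b for b, b_typ in inst if b_typ == b_type}
        if not any((a, b) in part for b in wholes):
            return False
    return True


inst_at = {("n1", "cell nucleus", 1), ("c1", "cell", 1),
           ("n1", "cell nucleus", 2), ("c1", "cell", 2)}
part_both_times = {("n1", "c1", 1), ("n1", "c1", 2)}
part_only_first = {("n1", "c1", 1)}
```

With `part_only_first` the time-indexed check fails, because at time 2 the nucleus is recorded as an instance but not as a part of any cell. The untimed version has no time argument at all.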
75
75 Main Upper Level Ontologies CYC Cycorp (Austin, TX) human being = partially tangible thing SUO (Suggested Upper Ontology) IEEE monkey, body covering DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) BFO (Basic Formal Ontology)
76
76 SUO top level
Entity
  Physical
    Object
      SelfConnectedObject
        Substance
        CorpuscularObject
        Food
      Region
      Collection
      Agent
    Process
  Abstract
    SetOrClass
    Relation
    Quantity
      Number
      PhysicalQuantity
    Attribute
    Proposition
77
77 MIGS Specification Top Levels Organism Phenotype Environment Sample Process Data Process