1 How to Build an Ontology Barry Smith

1 1 How to Build an Ontology Barry Smith http://ontology.buffalo.edu/smith

2 2 Ontology A classification of entities and the relations between them. Ontology is a list of types structured by relations Defined by a scientific field's vocabulary and by the canonical formulations of its theories. Scientific theories consist of generalizations. What I will not be talking about: XML, OWL,..., data(types), information models, file formats...

3 3 Top-Level GO OBO, OBO Core NCBO FMA NCBC Roadmap Centers NCI EVS NECTAR (National Electronic Clinical Trials and Research) Network

4 4 Instances are not included in an ontology It is the generalizations that are important (but instances must still be taken into account)

5 5
A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt

6 6 Ontology Types Instances

7 7 Ontology = A Representation of Types

8 8 Each node of an ontology consists of: preferred term (aka term) term identifier (TUI, aka CUI) synonyms definition, glosses, comments
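
The node structure listed on this slide could be held in a small record type. The sketch below is only illustrative: the class name `OntologyNode` and its field names are assumptions, and the sample values are copied from the osteoblast entry shown on slide 34.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyNode:
    """One ontology node; field names are hypothetical, chosen to mirror the slide."""
    term_id: str                 # term identifier
    preferred_term: str          # preferred term (aka term)
    definition: str = ""         # textual definition
    synonyms: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)   # glosses, comments


osteoblast = OntologyNode(
    term_id="CL:0000062",
    preferred_term="osteoblast",
    definition="A bone-forming cell which secretes an extracellular matrix.",
)
print(osteoblast.preferred_term, osteoblast.term_id)
```

Relations such as is_a and part_of are deliberately left out of the record here; they connect nodes and are easier to keep as separate edge lists.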

9 9 Ontology = A Representation of Types Nodes in an ontology are connected by relations: primarily: is_a (= is subtype of) and part_of designed to support search, reasoning and annotation

10 10 Rules for formatting terms Terms are names of types: if you prefix a term with ‘the type ___’ the term should still make sense Hence: terms should be in the singular Terms should be lower case Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)

11 11 Motivation: to capture reality Inferences and decisions we make are based upon what we know of reality. An ontology is a computable representation of this underlying bio(techno)logical reality. Enables a computer to reason over the data in (some of) the ways that we do.

12 12 Biomedical ontology integration / interoperability Will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts What’s really needed is to have well-defined commonly used relationships

13 13 Concepts Biomedical ontology integration will never be achieved through integration of meanings or concepts The problem is precisely that different user communities use different concepts

14 14 Concepts Concepts are in your head and will change as our understanding changes Ontologies represent types: not concepts, meanings, ideas... Types exist, with their instances, in objective reality – including types of experimental process, design, method,...

15 15 Most ontologies are execrable But some good ontologies do already exist as far as possible don’t reinvent use the power of combination and collaboration ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies

16 16 Why do we need rules/standards for good ontology? Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking): Unintuitive rules for classification lead to errors Intuitive rules facilitate training of curators and annotators Common rules allow alignment with other ontologies Logically coherent rules enhance harvesting of content through automatic reasoning systems

17 17 Rules on types Don’t confuse types with concepts Don’t confuse types with ways of getting to know types Don’t confuse types with ways of talking about types Don’t confuse types with data about types

18 18 First Rule: Univocity Terms (including those describing relations) should have the same meanings on every occasion of use. In other words, they should refer to the same types in reality

19 19 Second Rule: Positivity There are no negative types Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types. (There are also no conjunctive and disjunctive types: rabbit and nailfile; rabbit or nosewipe)

20 20 Third Rule: Objectivity Which types exist is not a function of our biological knowledge. Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.

21 21 Fourth Rule: Single Inheritance No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
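
The single-inheritance rule lends itself to a mechanical check. The following is a minimal sketch, assuming the is_a hierarchy is available as (child, parent) pairs; the function name and the sample type names B1, B2 and C are hypothetical.

```python
from collections import defaultdict


def multiple_inheritance_violations(is_a_edges):
    """Return {child: sorted parents} for every type with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical edges: C has two is_a parents, which forms a "diamond" over A.
edges = [("B1", "A"), ("B2", "A"), ("C", "B1"), ("C", "B2")]
violations = multiple_inheritance_violations(edges)
print(violations)
```

A curator could run such a check before release and review each flagged type by hand, since the slides treat shortfalls from single inheritance as clues to incorrectly entered terms rather than as errors to be repaired automatically.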

22 22 Rule of Single Inheritance: no diamonds [Figure: nodes A, B and C joined by edges labeled is_a1 and is_a2]

23 23 Problems with multiple inheritance: ‘is_a’ no longer univocal [Figure: nodes A, B and C joined by edges labeled is_a1 and is_a2]

24 24 ‘is_a’ is pressed into service to mean a variety of different things shortfalls from single inheritance are often clues to incorrect entry of terms and relations the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

25 25 is_a Overloading serves as obstacle to integration with neighboring ontologies The success of ontology alignment demands that ontological relations (is_a, part_of,...) have the same meanings in the different ontologies to be aligned.

26 26 To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure

27 27 Current Best Practice: The Foundational Model of Anatomy

28 28 [Figure: diagram of is_a and part_of links among the types Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Interlobar recess]

29 29 Current Best Practice: The Foundational Model of Anatomy Follows formal rules for definitions laid down by Aristotle. When A is_a B, the definition of ‘A’ takes the form: an A =def. a B which... a human being =def. an animal which is rational

30 30 FMA Example Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus Plasma membrane =def. a cell part that surrounds the cytoplasm

31 31 The FMA regimentation Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs. The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
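
The genus-plus-differentia form, and the way a term's position brings along the terms above it, can be sketched in a few lines. This is an illustration only, not the FMA's actual representation: the table layout and the helper names `article`, `define` and `genus_chain` are assumptions, and terms marked `(None, None)` are simply left undefined in this sketch.

```python
# term -> (genus, differentia); (None, None) marks a term left undefined here.
DEFINITIONS = {
    "animal": (None, None),
    "human being": ("animal", "is rational"),
    "anatomical structure": (None, None),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane "
             "with or without a cell nucleus"),
}


def article(noun):
    """Pick 'a' or 'an' by the first letter (crude, but enough for these terms)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def define(term, definitions=DEFINITIONS):
    """Render 'an A =def. a B which ...' from the stored genus and differentia."""
    genus, differentia = definitions[term]
    if genus is None:
        return None
    return f"{article(term)} {term} =def. {article(genus)} {genus} which {differentia}"


def genus_chain(term, definitions=DEFINITIONS):
    """List the is_a ancestors whose definitions the term inherits, nearest first."""
    chain = []
    genus = definitions[term][0]
    while genus is not None:
        chain.append(genus)
        genus = definitions[genus][0]
    return chain


print(define("human being"))
print(genus_chain("cell"))
```

Because each definition names its immediate genus, the is_a hierarchy can be read straight off the definitions, which is the regimentation the slide describes.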

32 32 GO is now adopting structured definitions that contain both genus and differentiae Essence = Genus + Differentiae neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron

33 33 Ontology alignment One of the current goals of GO is to align Cell Types in GO with Cell Types in the Cell Ontology:
Cell Types in GO                   Cell Types in the Cell Ontology
cone cell fate commitment          retinal_cone_cell
keratinocyte differentiation       keratinocyte
adipocyte differentiation          fat_cell
dendritic cell activation          dendritic_cell
lymphocyte proliferation           lymphocyte
T-cell homeostasis                 T_lymphocyte
garland cell differentiation       garland_cell
heterocyst cell differentiation    heterocyst

34 34 Alignment of the two ontologies will permit the generation of consistent and complete definitions
Cell type:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition:
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.

35 35 Other Ontologies to be aligned with GO Chemical ontologies –3,4-dihydroxy-2-butanone-4-phosphate synthase activity Anatomy ontologies –metanephros development GO itself –mitochondrial inner membrane peptidase activity → OBO core

36 36 eventually to comprehend all of OBO

37 37 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions

38 38 Definitions should be intelligible to both machines and humans Machines can cope with the full formal representation Humans need modularity

39 39 Fifth Rule: Terms and relations should have clear definitions These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality: –actual cells, actual portions of cytoplasm, and so on

40 40 But Some terms are primitive (cannot be defined) AVOID CIRCULAR DEFINITIONS ! Avoid definitions of the forms: An A is an A which is B (person = person with identity documents) An A is the B of an A (heptolysis = the causes of heptolysis)
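
The first circular form (the defined term reappearing inside its own definition) is easy to screen for automatically. The sketch below is an assumption-laden illustration: the function name `is_circular` is hypothetical, and it only catches direct circularity, not circles that run through several definitions.

```python
import re


def is_circular(term, definition):
    """True if the defined term reappears as a whole phrase inside its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# Examples taken from the slides.
print(is_circular("person", "person with identity documents"))
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))
```

A hit is only a prompt for review: a term may legitimately occur inside a longer, different term, so a curator still has to judge each case.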

41 41 siamese mammal cat organism substance types animal instances frog leaf type

42 42 Benefits of well-defined relationships If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C). ‘Find all DNA binding proteins’ should also find all transcription factor proteins because transcription factor is_a DNA binding protein
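
The DNA binding protein query can be sketched as a search that follows is_a links downward before matching annotations. The function names and the annotated items `protein_x`, `protein_y` and `protein_z` are hypothetical; only the is_a edge between transcription factor and DNA binding protein comes from the slide.

```python
def descendants(type_name, is_a_edges):
    """All types reachable from type_name by following is_a edges downward."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, []).append(child)
    found, stack = set(), [type_name]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_all(type_name, is_a_edges, annotations):
    """Items annotated with type_name or with any of its is_a subtypes."""
    wanted = descendants(type_name, is_a_edges) | {type_name}
    return sorted(item for item, t in annotations.items() if t in wanted)


is_a_edges = [("transcription factor", "DNA binding protein")]
annotations = {  # hypothetical annotated items
    "protein_x": "DNA binding protein",
    "protein_y": "transcription factor",
    "protein_z": "kinase",
}
print(find_all("DNA binding protein", is_a_edges, annotations))
```

The query only returns the right answer if is_a means the same thing on every edge, which is why the univocity rule matters for reasoning.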

43 43 What happens when an ontology has no clear definition of A is_a B: cancer documentation is_a cancer disease prevention is_a disease living subject is_a information object representing an animal or complex organism individual allele is_a act of observation

44 44 [Figure: diagram of is_a and part_of links among the types Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Interlobar recess]

45 45 How to define A is_a B A is_a B =def. all instances of A are, as a matter of biological science, also instances of B. Here A and B are names of types in reality

46 46 How to define A is_a B A is_a B =def. for all a if a instance_of A, then a instance_of B
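
The instance-level reading of is_a can be tested against recorded instance data. This is a sketch under stated assumptions: the function name, the data layout (instance mapped to a set of types) and the sample instances are hypothetical. A finite sample can refute an is_a claim but cannot establish that it holds as a matter of biological science.

```python
def is_a_holds(a_type, b_type, instance_of):
    """Check 'for all a, if a instance_of A then a instance_of B' over recorded instances."""
    return all(b_type in types for types in instance_of.values() if a_type in types)


# Hypothetical instance data: instance name -> set of types it instantiates.
instance_of = {
    "tf_1": {"transcription factor", "DNA binding protein"},
    "p_2": {"DNA binding protein"},
}
print(is_a_holds("transcription factor", "DNA binding protein", instance_of))
print(is_a_holds("DNA binding protein", "transcription factor", instance_of))
```

The second call fails because `p_2` is a DNA binding protein that is not a transcription factor, so the relation does not hold in the reverse direction.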

47 47 Kinds of relations Between types: –is_a, part_of,... Between an instance and a type –this explosion instance_of the type explosion Between instances: –Mary’s heart part_of Mary

48 48 Part_of as a relation between types is more problematic than is standardly supposed heart part_of human being ? human heart part_of human being ? human being has_part human testis ? testis part_of human being ?

49 49 Definition of part_of as a relation between types A part_of B =Def all instances of A are instance-level parts of some instance of B human testis part_of adult human being
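
The all-some definition explains why the questions on the previous slide have different answers. The sketch below checks it over a toy data set; the function names, the individuals Mary and John, and the type-level reading of has_part as the mirror-image all-some statement are assumptions made for illustration.

```python
def type_part_of(a_type, b_type, instance_of, instance_part_of):
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    a_instances = [x for x, t in instance_of.items() if t == a_type]
    b_instances = {x for x, t in instance_of.items() if t == b_type}
    return all(
        any((a, b) in instance_part_of for b in b_instances)
        for a in a_instances
    )


def type_has_part(a_type, b_type, instance_of, instance_part_of):
    """A has_part B (assumed reading): every instance of A has some instance of B as part."""
    a_instances = [x for x, t in instance_of.items() if t == a_type]
    b_instances = {x for x, t in instance_of.items() if t == b_type}
    return all(
        any((b, a) in instance_part_of for b in b_instances)
        for a in a_instances
    )


# Hypothetical instances: each individual mapped to one type.
instance_of = {
    "Mary": "human being",
    "John": "human being",
    "John's testis": "human testis",
}
instance_part_of = {("John's testis", "John")}  # (part, whole) pairs

print(type_part_of("human testis", "human being", instance_of, instance_part_of))
print(type_has_part("human being", "human testis", instance_of, instance_part_of))
```

Every recorded testis is part of some human being, so the part_of assertion holds, while the has_part assertion fails because Mary has no testis. Note that both checks are vacuously true when no instances of the first type are recorded.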

50 50 Instance level this nucleus is adjacent to this cytoplasm implies: this cytoplasm is adjacent to this nucleus Type level nucleus adjacent_to cytoplasm Not: cytoplasm adjacent_to nucleus seminal vesicle adjacent_to urinary bladder Not: urinary bladder adjacent_to seminal vesicle

51 51 Definitions of the all-some form allow cascading inferences If A R1 B and B R2 C, then we know that every A stands in R1 to some B, but we know also that, whichever B this is, it can be plugged into the R2 relation

52 52 transformation_of [Figure: the same instance c shown at two times, t and t1, along a time axis, under the types C and C1. Example pairs: pre-RNA and mature RNA; child and adult]

53 53 transformation_of A transformation_of B =Def. Every instance of A was at some earlier time an instance of B adult transformation_of child
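
The transformation_of definition quantifies over times as well as instances, which can be sketched with per-instance timelines. The function name, the data layout and the ages used below are hypothetical; timelines are assumed to be in chronological order.

```python
def transformation_of_holds(a_type, b_type, history):
    """Every instance of A was, at some earlier time, an instance of B.

    history maps each instance to a chronologically ordered list of (time, type).
    """
    for timeline in history.values():
        for i, (_, current_type) in enumerate(timeline):
            if current_type != a_type:
                continue
            earlier_types = [t for _, t in timeline[:i]]
            if b_type not in earlier_types:
                return False
    return True


# Hypothetical timeline: the same individual is a child first and an adult later.
history = {"Mary": [(0, "child"), (18, "adult")]}
print(transformation_of_holds("adult", "child", history))
print(transformation_of_holds("child", "adult", history))
```

The identity of the instance across time is what distinguishes this relation from derives_from, where the earlier and later entities are distinct instances.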

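54 54 embryological development [Figure: instance c shown at t and at t1, under the types C and C1]

55 55 tumor development [Figure: instance c shown at t and at t1, under the types C and C1]

56 56 derives_from [Figure: instances c and c' at t, under the types C and C', and a distinct instance c1 at t1 under the type C1, along a time axis. Example: zygote derives_from ovum; zygote derives_from sperm]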

57 57 One main obstacle to integrating biological and experiment-generated data Most ontologies have no facility for dealing with time and instances

58 58 EXPO: Experiment Ontology

59 59 representational style part_of experimental hypothesis experimental actions part_of experimental design

60 60 tool part_of experimental design (confuses object with specification)

61 61 hypothesis driven is_a Galilean

62 62 physical is_a scientific experiment (avoid abbreviations)

63 63 admin info about experiment is_a scientific experiment

64 64 where is the top level? objects, processes, characteristics

65 65 is_a and part_of never cross categorial divides (cf. tripartite organization of GO) if A is_a B, then: A is an object type iff B is an object type; A is a process type iff B is a process type; A is a characteristic type iff B is a characteristic type
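
The rule against crossing categorial divides can be enforced with a lookup of each type's top-level category. In this sketch the function name and the category assignments are assumptions made for illustration (design is treated as a characteristic, following the later slide on characteristics); the relation assertions are taken from the EXPO examples above.

```python
# Hypothetical top-level category for each type.
CATEGORY = {
    "tool": "object",
    "experimental design": "characteristic",
    "experimental actions": "process",
    "transcription factor": "object",
    "DNA binding protein": "object",
}


def cross_category_edges(edges, category=CATEGORY):
    """Return the (child, relation, parent) assertions whose two ends differ in category."""
    return [
        (child, relation, parent)
        for child, relation, parent in edges
        if category[child] != category[parent]
    ]


edges = [
    ("tool", "part_of", "experimental design"),
    ("experimental actions", "part_of", "experimental design"),
    ("transcription factor", "is_a", "DNA binding protein"),
]
for edge in cross_category_edges(edges):
    print(edge)
```

Both EXPO assertions are flagged, while the is_a link between two object types passes.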

66 66 Some thoughts on time continuants vs. occurrents objects, characteristics vs. processes time timeline day daytime menstrual cycle high tide

67 67 What is time?

68 68 Top Level OBO-UBO continuants: objects, characteristics, spatial regions occurrents: processes, temporal regions, spatio-temporal regions Space = the largest spatial region Time = the largest temporal region

69 69 Relative time, subjective time terms describing (regions of) time in special (qualitative, perspective-dependent, landmark-dependent) ways tomorrow, yesterday uptown, downtown phase A trial Wednesday

70 70 Characteristics are continuants many characteristics have realizations, applications or executions, which are processes plan design method menstrual cycle function

71 71 GlaxoSmithKline* What we need is “industrial-strength” ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities. We anticipate that pharma’s need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work. *Robin McEntire

72 72 OBO Relation Ontology “Relations in Biomedical Ontologies”, Genome Biology, Apr. 2005 relations for continuants behave differently from relations for processes

73 73 part_of for component types is time-indexed A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t
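
The time-indexed definition can be checked over instance data that records both type membership and parthood per time. This is a sketch only: the function name, the data layout and the sample individuals and times are hypothetical.

```python
def part_of_at_all_times(a_type, b_type, instance_of_at, part_of_at):
    """Whenever a is an instance of A at t, a is part of some instance of B at t.

    instance_of_at maps (instance, time) to a set of types;
    part_of_at is a set of (part, whole, time) triples.
    """
    for (a, t), a_types in instance_of_at.items():
        if a_type not in a_types:
            continue
        has_whole = any(
            b_type in instance_of_at.get((whole, t), set())
            for (part, whole, when) in part_of_at
            if part == a and when == t
        )
        if not has_whole:
            return False
    return True


# Hypothetical data: a heart that is part of Mary at t1 but not recorded as such at t2.
instance_of_at = {
    ("heart_1", "t1"): {"human heart"},
    ("heart_1", "t2"): {"human heart"},
    ("Mary", "t1"): {"human being"},
    ("Mary", "t2"): {"human being"},
}
always_attached = {("heart_1", "Mary", "t1"), ("heart_1", "Mary", "t2")}
detached_at_t2 = {("heart_1", "Mary", "t1")}

print(part_of_at_all_times("human heart", "human being", instance_of_at, always_attached))
print(part_of_at_all_times("human heart", "human being", instance_of_at, detached_at_t2))
```

Dropping the time index would hide the second case, which is why continuant parthood needs the extra argument.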

74 74 part_of for process types is not time-indexed A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b

75 75 Main Upper Level Ontologies CYC Cycorp (Austin, TX) human being = partially tangible thing SUO (Suggested Upper Ontology) IEEE monkey, body covering DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) BFO (Basic Formal Ontology)

76 76 SUO top level
Entity
  Physical
    Object
      SelfConnectedObject
        Substance
        CorpuscularObject
        Food
      Region
      Collection
      Agent
    Process
  Abstract
    SetOrClass
    Relation
    Quantity
      Number
      PhysicalQuantity
    Attribute
    Proposition

77 77 MIGS Specification Top Levels Organism Phenotype Environment Sample Process Data Process

