What developers need to know about ontologies? Barry Smith 1.


1 What developers need to know about ontologies? Barry Smith http://ontology.buffalo.edu/smith 1

2 HL7 Watch (blog) Microsoft Healthvault: Allergic Episode is_a Health Record Item, Health Record Item =def. A single piece of data in a health record that is accessible through the HealthVault service 2

3 3 Problem of ensuring sensible cooperation in a massively interdisciplinary community concept type instance model representation data

4 4 What do these mean? ‘conceptual data model’ ‘semantic knowledge model’ ‘reference information model’

5 You’re interested in which genes control heart muscle development 17,536 results 5

6 attacked time control Puparial adhesion Molting cycle hemocyanin Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Immune response Toll regulated genes Amino acid catabolism Lipid metabolism Peptidase activity Protein catabolism Immune response Microarray data shows changed expression of thousands of genes. How will you spot the patterns? 6

7 Lab / pathology data EHR data Clinical trial data Family history data Medical image data Microarray data Model organism data Flow cytometry Mass spec Genotype / SNP data How will you find the data you need? 7

8 −Human −Mouse −Rat −Fish −Yeast −E. coli How will you compare the data? How will you integrate the data? 8

9 The GO Idea MouseEcotope GlyProt DiabetInGene GluChem sphingolipid transporter activity

10 annotation using common ontologies yields integration of databases MouseEcotope GlyProt DiabetInGene GluChem Holliday junction helicase complex
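A minimal sketch of why shared annotation integrates databases: if records in separate databases are tagged with the same ontology term, the term works as a join key. The database names and terms are taken from the slides; the record identifiers and the choice of which database carries which term are invented for illustration.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Record ids and the database-to-term assignments are assumptions.
ANNOTATIONS = [
    ("MouseEcotope", "rec-1", "Holliday junction helicase complex"),
    ("GlyProt", "rec-7", "Holliday junction helicase complex"),
    ("DiabetInGene", "rec-3", "sphingolipid transporter activity"),
    ("GluChem", "rec-9", "Holliday junction helicase complex"),
]


def index_by_term(annotations):
    """Group (database, record id) pairs under the ontology term they share."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def databases_sharing(term, annotations):
    """Return the sorted names of databases holding a record annotated with term."""
    return sorted({db for db, _, t in annotations if t == term})


if __name__ == "__main__":
    print(databases_sharing("Holliday junction helicase complex", ANNOTATIONS))
```

The join only works because every database uses the same term for the same type; with overlapping ontologies, a mapping step would be needed first.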

11 For this to work, ontologies cannot be allowed to proliferate uncontrollably. Rather, we need as far as possible non-overlapping ontology modules (OBO Foundry). How should we build these modules in such a way as to ensure glue-ability of annotations?

12 12 Glue-ability / integration rests on the existence of a common benchmark called ‘reality’: the ontologies we want to glue together are representations of what exists in the world, not of what exists in the heads of different groups of people

13 13 two kinds of annotations

14 14 names of types

15 15 names of instances

16 16 First basic distinction type vs. instance (science text vs. diary) (human being vs. Tom Cruise)

17 17 For ontologies it is generalizations that are important = ontologies are about types, kinds, universals

18 18 Ontology types Instances

19 19 Ontology = A Representation of types

20 20 An ontology is a representation of types. We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories. Experiments relate to what is particular; science describes what is general

21 21 Inventory vs. Catalog Two kinds of representational artifact Very roughly: Databases represent instances Ontologies represent types

22 22 Catalog vs. inventory A 515287 DC3300 Dust Collector Fan; B 521683 Gilmer Belt; C 521682 Motor Drive Belt
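The catalog/inventory contrast can be sketched in code, assuming a catalog lists types of part and an inventory lists particular items that instantiate them. The part names come from the slide; the class names, the `tag` field and the stock entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartType:
    """A catalog entry: a type of part, not any particular physical part."""
    name: str


@dataclass(frozen=True)
class PartInstance:
    """An inventory entry: one particular item that instantiates a PartType."""
    tag: str  # hypothetical label for the individual item
    part_type: PartType


GILMER_BELT = PartType("Gilmer Belt")
MOTOR_DRIVE_BELT = PartType("Motor Drive Belt")

# The catalog represents types; the inventory represents instances (invented data).
CATALOG = [GILMER_BELT, MOTOR_DRIVE_BELT]
INVENTORY = [
    PartInstance("shelf-1/item-1", GILMER_BELT),
    PartInstance("shelf-1/item-2", GILMER_BELT),
    PartInstance("shelf-2/item-1", MOTOR_DRIVE_BELT),
]


def count_in_stock(part_type, inventory):
    """Count the particular items in the inventory that instantiate part_type."""
    return sum(1 for item in inventory if item.part_type == part_type)
```

Adding or removing stock changes the inventory but leaves the catalog untouched, which is the sense in which the ontology side is about what is general.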

23 23 Catalog vs. inventory

24 24 Catalog of types/Types

25 25 siamese mammal cat organism object types animal frog instances

26 26 Ontologies are here

27 27 or here

28 28 ontologies represent general structures in reality (leg)

29 29 Ontologies do not represent concepts in people’s heads

30 30 They represent types in reality

31 31 which provide the benchmark for integration

32 32 Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (Levels 1, 2 and 3)

33 33 what are the kinds of entity?

34 34 First basic distinction type vs. instance (science text vs. diary) (human being vs. Tom Cruise)

35 35 Ontology Types Instances

36 36 Ontology = A Representation of types

37 37 Domain =def a portion of reality that forms the subject-matter of a single science or technology or mode of study or administrative practice...; proteomics HIV epidemiology

38 38 Representation =def an image, idea, map, picture, name or description... of some entity or entities.

39 39 Ontologies are representational artifacts comparable to science texts and subject to the same sorts of constraints (including need for update)

40 40 Representational units =def terms, icons, alphanumeric identifiers... which refer, or are intended to refer, to entities and which are minimal (atoms)

41 41 Composite representation =def representation (1) built out of representational units which (2) form a structure that mirrors, or is intended to mirror, the entities in some domain

42 42 Analogue representations no representational units, no ‘atoms’

43 43 Periodic Table The Periodic Table

44 44 Class =def a maximal collection of particulars determined by a general term (‘cell’, ‘electron’ but also: ‘restaurant in Palo Alto’, ‘Italian’) the class A = the collection of all particulars x for which ‘x is A’ is true

45 45 types vs. their extensions types {a,b,c,...} collections of particulars

46 46 Extension =def The extension of a type A is the class: instance of the type A (it is the class of A’s instances) (the class of all entities to which the term ‘A’ applies)

47 47 Problem The same general term can be used to refer both to types and to collections of particulars. Consider: HIV is an infectious retrovirus HIV is spreading very rapidly through Asia

48 48 types vs. classes types {c,d,e,...} classes

49 49 types vs. classes types ~ defined classes

50 50 types vs. classes types e.g. populations,...

51 51 Defined class =def a class defined by a general term which does not designate a type the class of all diabetic patients in Leipzig on 4 June 1952

52 52 OWL is a good representation of defined classes sibling of Finnish spy member of Abba aged > 50 years pizza with > 4 different toppings
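As a rough illustration (in Python rather than OWL), a defined class can be modelled as the extension of a predicate: the collection of all particulars x for which ‘x is A’ is true. The pizza example is from the slide; the `Pizza` type, the sample pizzas and the function names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pizza:
    name: str
    toppings: frozenset


# Hypothetical particulars; none of these come from the slides.
PIZZAS = [
    Pizza("margherita", frozenset({"tomato", "mozzarella", "basil"})),
    Pizza("capricciosa", frozenset(
        {"tomato", "mozzarella", "ham", "mushroom", "artichoke", "olive"})),
    Pizza("marinara", frozenset({"tomato", "garlic", "oregano"})),
]


def extension(is_member, particulars):
    """Return the class of all particulars x for which is_member(x) holds."""
    return {x for x in particulars if is_member(x)}


def has_more_than_four_toppings(pizza):
    """Membership test for the defined class 'pizza with > 4 different toppings'."""
    return len(pizza.toppings) > 4


many_toppings = extension(has_more_than_four_toppings, PIZZAS)
```

The predicate picks out a class without claiming that ‘pizza with > 4 different toppings’ is a type in reality; that is the distinction the slides draw between types and defined classes.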

53 53 Terminology =def. a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to designate types together with defined classes, with no particular attention to composite representations

54 54 types, classes, concepts types defined classes ‘concepts’ ?

55 55 types < defined classes < ‘concepts’ ‘concepts’ which do not correspond to defined classes: ‘Surgical or other procedure not carried out because of patient's decision’ ‘Congenital absent nipple’ because they do not correspond to anything

56 Gene Ontology: The Very Top cellular component molecular function biological process 56

57 Gene Ontology: The Very Top continuant cellular component molecular function occurrent biological process 57

58 BFO: The Very Top continuant occurrent biological processes independent continuant cellular component dependent continuant molecular function 58

59 Basic Formal Ontology continuant occurrent independent continuant dependent continuant organism 59

60 Basic Formal Ontology continuant occurrent independent continuant dependent continuant anatomical structure 60

61 Continuants continue to exist through time, preserving their identity while undergoing different sorts of changes independent continuants – objects, things,... dependent continuants – qualities, attributes, shapes, potentialities... 61

62 Qualities temperature blood pressure mass... are continuants they exist through time while undergoing changes 62

63 Qualities temperature / blood pressure / mass... are dimensions of variation within the structure of the entity; a quality is something which can change while its bearer remains one and the same 63

64 Qualities temperature / blood pressure / mass... are dimensions of variation within the structure of the entity; a quality is something which can change while its bearer remains one and the same hence only independent continuants may have qualities 64

65 A Chart representing how John’s temperature changes 65

66 John’s temperature, the temperature he has throughout his entire life, cycles through different determinate temperatures from one time to the next. John’s temperature is a physiology variable which, in thus changing, exerts an influence on other physiology variables through time 66

67 BFO: The Very Top continuant independent continuant dependent continuant quality occurrent temperature 67

68 Blinding Flash of the Obvious independent continuant dependent continuant quality temperature types instances organism John John’s temperature 68

69 Blinding Flash of the Obvious independent continuant dependent continuant quality temperature types instances organism John John’s temperature 69

70 Blinding Flash of the Obvious temperature types instances organism John John’s temperature 70 inheres_in

71 temperature types instances John’s temperature 37ºC 37.1ºC 37.5ºC 37.2ºC 37.3ºC 37.4ºC instantiates at t1 instantiates at t2 instantiates at t3 instantiates at t4 instantiates at t5 instantiates at t6
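A small sketch of the idea that one quality instance persists while instantiating different determinate types at different times. The class and method names are hypothetical, and pairing the six temperature values with t1 to t6 in the order they appear on the slide is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Quality:
    """A dependent continuant: one quality instance inhering in one bearer."""
    quality_type: str
    bearer: str
    # Maps a time label to the determinate value instantiated at that time.
    history: dict = field(default_factory=dict)

    def instantiates_at(self, time, determinate):
        """Record which determinate the quality instantiates at the given time."""
        self.history[time] = determinate

    def determinate_at(self, time):
        """Return the determinate instantiated at the given time."""
        return self.history[time]


# One and the same quality instance throughout; only its determinates change.
johns_temperature = Quality(quality_type="temperature", bearer="John")

# Assumed pairing of slide values (degrees Celsius) with times t1..t6.
READINGS = [("t1", 37.0), ("t2", 37.1), ("t3", 37.5),
            ("t4", 37.2), ("t5", 37.3), ("t6", 37.4)]
for time, celsius in READINGS:
    johns_temperature.instantiates_at(time, celsius)
```

A design that created a new object per reading would lose the point of the slide: the readings are successive states of a single continuant, not six separate entities.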

72 human types instances John embryo fetus neonate infant child adult instantiates at t1 instantiates at t2 instantiates at t3 instantiates at t4 instantiates at t5 instantiates at t6

73 the lower level of types does not ‘carry identity’ in OntoClean terms; these types are threshold divisions (hence we do not have sharp boundaries, and we have a certain degree of choice, e.g. in how many subtypes to distinguish, though not in their ordering) 73

74 independent continuant dependent continuant quality temperature types instances organism John John’s temperature 74

75 independent continuant dependent continuant quality temperature organism John John’s temperature occurrent process course of temperature changes John’s temperature history 75

76 independent continuant dependent continuant quality temperature organism John John’s temperature occurrent process life of an organism John’s life 76

77 BFO/GO: The Very Top continuant occurrent biological processes independent continuant cellular component dependent continuant molecular function 77

78 BFO: The Very Top continuant occurrent independent continuant dependent continuant quality function role disposition 78
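The top-level hierarchy can be sketched as a child-to-parent map with an ancestor lookup. This is an illustration only: placing function, role and disposition directly under dependent continuant follows the flat layout of this slide, and the placement of organism and temperature follows the earlier slides; the function names are hypothetical.

```python
# Child -> parent (is_a) links for the fragment of BFO shown on the slides.
PARENT = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "function": "dependent continuant",
    "role": "dependent continuant",
    "disposition": "dependent continuant",
    "organism": "independent continuant",
    "temperature": "quality",
}


def ancestors(term, parent=PARENT):
    """Return the is_a ancestors of term, nearest first."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def is_a(term, candidate, parent=PARENT):
    """True if term is candidate or a (transitive) subtype of candidate."""
    return term == candidate or candidate in ancestors(term, parent)


if __name__ == "__main__":
    print(ancestors("temperature"))
```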

79 Function - of liver: to store glycogen - of birth canal: to enable transport - of eye: to see - of mitochondrion: to produce ATP. Not optional; reflection of physical makeup of bearer; can malfunction 79

80 Role optional: exists because the bearer is in some special natural, social, or institutional set of circumstances in which the bearer does not have to be 80

81 Role - bearers can have more than one role: person as student / as staff member - roles often form systems of mutual dependence: husband / wife, first in queue / last in queue, doctor / patient, host / pathogen 81

82 Role of some chemical compound: to serve as analyte in an experiment; of a dose of penicillin in this human child: to treat a disease; of this bacterium in a primary host: to cause infection 82

83 Qualities are categorical features of reality – you just have them. Functions, roles and dispositions are potential features of reality: they are realizable dependent continuants, realized in certain associated processes 83

84 independent continuant dependent continuant role drug role portion of chemical compound this portion of aspirin role of this portion of aspirin occurrent process process of drug administration John’s taking this portion of aspirin 84

85 independent continuant dependent continuant role drug role portion of chemical compound this portion of aspirin role of this portion of aspirin occurrent process process of drug administration John’s taking this portion of aspirin 85 inheres_in realized_in
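The aspirin diagram can be written out as subject-relation-object triples. The entity labels and the relations `inheres_in` and `realized_in` are from the slide; the `instance_of` relation name, the direction of each link and the helper function are assumptions about how the diagram is meant to be read.

```python
ROLE = "role of this portion of aspirin"
ASPIRIN = "this portion of aspirin"
TAKING = "John's taking this portion of aspirin"

# Instance-to-type links, then instance-to-instance links.
TRIPLES = {
    (ASPIRIN, "instance_of", "portion of chemical compound"),
    (ROLE, "instance_of", "drug role"),
    (TAKING, "instance_of", "process of drug administration"),
    (ROLE, "inheres_in", ASPIRIN),   # the role depends on its bearer
    (ROLE, "realized_in", TAKING),   # the role is realized in a process
}


def objects(subject, relation, triples=TRIPLES):
    """Return every object o such that (subject, relation, o) is asserted."""
    return {o for s, r, o in triples if s == subject and r == relation}


if __name__ == "__main__":
    print(objects(ROLE, "inheres_in"))
    print(objects(ROLE, "realized_in"))
```

Note that the three instances fall under three different top-level categories (independent continuant, dependent continuant, occurrent), and the two relations are what tie them together.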

86 The Open Biomedical Ontologies (OBO) Foundry
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

87 The Road to Convergence All ontologies for each given domain (anatomy, chemistry…) should be part of a single suite of interoperable ontologies; should use a common top-level core; for subdomains with many variants, should follow the strategy of canonical ontologies with extensions; should require acceptance of common, tested guidelines on all subscribing ontology developers 87

88 initial OBO Foundry coverage, ontologies automatically semantically coupled
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Process (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

89 Disposition (Internally-Grounded Realizable Entity) disposition =def. a realizable entity which if it ceases to exist, then its bearer is physically changed, and whose realization occurs when this bearer is in some special physical circumstances, in virtue of the bearer’s physical make-up 89

90 Function A Disposition (Internally-Grounded Realizable Entity) that is designed or selected for 90

91 OGMS Ontology for General Medical Science http://code.google.com/p/ogms 91

92 :. Physical Disorder – independent continuant fiat object part 92

93 Big Picture 93

94 A disease is a disposition rooted in a physical disorder in the organism and realized in pathological processes. etiological process produces disorder bears disposition realized_in pathological process produces abnormal bodily features recognized_as signs & symptoms used_in interpretive process produces diagnosis 94
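The chain on this slide can be sketched as a list of labelled edges that is walked from the etiological process to the diagnosis. The node and relation labels are from the slide; treating the chain as strictly linear, and the `trace` helper itself, are simplifying assumptions.

```python
# (subject, relation, object) edges of the disease model, in slide order.
CHAIN = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition (disease)"),
    ("disposition (disease)", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def trace(start, edges):
    """Follow outgoing edges from start; return (nodes visited, relations used)."""
    successor = {subject: (relation, obj) for subject, relation, obj in edges}
    path, relations = [start], []
    while path[-1] in successor:
        relation, nxt = successor[path[-1]]
        if nxt in path:  # guard against cycles in malformed input
            break
        relations.append(relation)
        path.append(nxt)
    return path, relations


if __name__ == "__main__":
    nodes, rels = trace("etiological process", CHAIN)
    print(" -> ".join(nodes))
```

The worked cases later in the deck (cirrhosis, influenza, Huntington’s disease) fill in the same slots with specific entities, and some of them branch, which a single-successor map like this one would not capture.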

95 Elucidation of Primitive Terms ‘bodily feature’ - an abbreviation for a physical component, a bodily quality, or a bodily process. disposition - an attribute describing the propensity to initiate certain specific sorts of processes when certain conditions are satisfied. clinically abnormal - some bodily feature that –(1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), –(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and –(3) is such that the elevated risk exceeds a certain threshold level.* *Compare: baldness 95

96 Definitions - Foundational Terms Disorder =def. – A causally linked combination of physical components that is clinically abnormal. Pathological Process =def. – A bodily process that is a manifestation of a disorder and is clinically abnormal. Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism. 96

97 Dispositions and Predispositions All diseases are dispositions; not all dispositions are diseases. A predisposition is a disposition. Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X. HNPCC is caused by a –disorder (mutation) in a DNA mismatch repair gene that –disposes to the acquisition of additional mutations from defective DNA repair processes, and thus is a –predisposition to the development of colon cancer. 97

98 Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
–produces Disorder - necrotic liver
–bears Disposition (disease) - cirrhosis
–realized_in Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
–produces Abnormal bodily features
–recognized_as Symptoms - fatigue, anorexia; Signs - jaundice, splenomegaly
Symptoms & Signs used_in Interpretive process
produces Hypothesis - rule out cirrhosis
suggests Laboratory tests
produces Test results - elevated liver enzymes in serum
used_in Interpretive process
produces Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

99 Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
–produces Disorder - viable cells with influenza virus
–bears Disposition (disease) - flu
–realized_in Pathological process - acute inflammation
–produces Abnormal bodily features
–recognized_as Symptoms - weakness, dizziness; Signs - fever
Symptoms & Signs used_in Interpretive process
produces Hypothesis - rule out influenza
suggests Laboratory tests
produces Test results - elevated serum antibody titers
used_in Interpretive process
produces Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).

100 Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
–produces Disorder - chromosome 4 with abnormal mHTT
–bears Disposition (disease) - Huntington’s disease
–realized_in Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
–produces Abnormal bodily features
–recognized_as Symptoms - anxiety, depression; Signs - difficulties in speaking and swallowing
Symptoms & Signs used_in Interpretive process
produces Hypothesis - rule out Huntington’s
suggests Laboratory tests
produces Test results - molecular detection of the HTT gene with >39 CAG repeats
used_in Interpretive process
produces Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

101 HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
–produces Disorder - chromosome 3 with abnormal hMLH1
–bears Disposition (disease) - Lynch syndrome
–realized_in Pathological process - abnormal repair of DNA mismatches
–produces Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
–bears Disposition (disease) - non-polyposis colon cancer
–realized_in Symptoms (including pain)

102 The OBO Foundry Initiative 102

103 A good solution to the data integration problem must be: modular, incremental, bottom-up, evidence-based, revisable; and must incorporate a strategy for motivating potential developers and users 103

104 GO is amazingly successful – but covers only three sorts of biological entities: –cellular components –molecular functions –biological processes and does not provide representations of disease-related phenomena 104

105 The Open Biomedical Ontologies (OBO) Foundry
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

106 OBO Foundry provides: tested guidelines enabling new groups to develop the ontologies they need in ways which counteract forking and dispersion of effort; an incremental bottom-up approach to evidence-based terminology practices in medicine that is rooted in basic biology; automatic web-based linkage between medical terminologies and biological knowledge resources; traffic laws and traffic police 106

107 the strategy: establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement 107

108 The methodology of cross-products: compound terms in ontologies to be defined as cross-products of simpler terms. E.g. elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose. = factoring out of ontologies into discipline-specific modules (orthogonality) 108
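A sketch of the cross-product idea, assuming a compound term is stored as a label plus references to simpler terms in other ontologies rather than as a free-standing entry. The three component terms are the ones named on the slide; the `Term` and `CrossProduct` types are hypothetical, and real term identifiers are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A simple term, identified here by its source ontology and label only."""
    ontology: str
    label: str


@dataclass(frozen=True)
class CrossProduct:
    """A compound term defined by reference to simpler terms."""
    label: str
    components: tuple

    def ontologies(self):
        """Return the sorted names of the ontologies the definition draws on."""
        return sorted({term.ontology for term in self.components})


ELEVATED_BLOOD_GLUCOSE = CrossProduct(
    label="elevated blood glucose",
    components=(
        Term("PATO", "increased concentration"),
        Term("FMA", "blood"),
        Term("ChEBI", "glucose"),
    ),
)

if __name__ == "__main__":
    print(ELEVATED_BLOOD_GLUCOSE.ontologies())
```

Because the compound term holds references rather than copies, a revision to the glucose entry in the chemistry module is picked up everywhere it is used, which is the practical payoff of orthogonal modules.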

109 The methodology of cross-products: enforcing use of common relations in linking terms drawn from Foundry ontologies serves to ensure that the ontologies are maintained and revised in tandem; logically defined relations serve to bind terms in different ontologies together to create a network 109

110 CRITERIA: openness; common formal language; collaborative development; evidence-based maintenance; identifiers; versioning; textual and formal definitions 110

111 Orthogonality = modularity: one ontology for each domain; no need for mappings (which are in any case too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change); everyone knows where to look to find out how to annotate each kind of data 111

112 Ontologies and research groups using BFO and RO –OBO Foundry (60 biomedical ontologies, including GO, OBI, Protein Ontology, Cell Ontology, IDO, …) –National Cancer Institute (BiomedGT) –NIF (NIH Neuroscience Information Framework) –Cleveland Clinic Semantic Database –Siemens –AstraZeneca –EU (ACGT Cancer Ontology, RAPS, …) 112

113 Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network: scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network; users are motivated by the assurance that the ontologies they turn to are maintained by experts 113

114 More benefits of orthogonality: helps those new to ontology to find what they need; to find models of good practice; ensures mutual consistency of ontologies (trivially); and thereby ensures additivity of annotations 114

115 More benefits of orthogonality: it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes; thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness 115

116 More criteria of a successful standard 1. intelligibility to users (consistent use of terms like ‘term’, ‘class’, ‘entity’, ‘object’ …) 2. track record of lessons learned (GO has 10 years of hard user testing) 3. lots of existing users (ontologies are like telephone networks) 116

117 COMMON ARCHITECTURE The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO) including the Relation Ontology (RO) http://ifomis.org/bfo http://www.obofoundry.org/ro/ 117

118 OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)

119 BFO:continuant
continuant
  independent continuant
    portion of material
    object
    fiat object part
    object aggregate
    object boundary
    site
  dependent continuant
    generically dependent continuant
      information artifact
    specifically dependent continuant
      quality
      realizable entity
        function
        role
        disposition
  spatial region
    0D-region
    1D-region
    2D-region
    3D-region

120 BFO:occurrent
occurrent
  processual entity
    process
    fiat process part
    process aggregate
    process boundary
    processual context
  spatiotemporal region
    scattered spatiotemporal region
    connected spatiotemporal region
      spatiotemporal instant
      spatiotemporal interval
  temporal region
    scattered temporal region
    connected temporal region
      temporal instant
      temporal interval

121 Example: The Cell Ontology

122 OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)

