1 Ontologies in Biomedicine: The Good, The Bad and The Ugly Barry Smith

1 http://ncor.us 1 Ontologies in Biomedicine: The Good, The Bad and The Ugly Barry Smith http://ontology.buffalo.edu/smith

2 The Good: Foundational Model of Anatomy (FMA)
Pro:
– Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule
– Powerful treatment of definitions, from which the entire FMA hierarchy is generated; can serve as a basis for formal reasoning
Con:
– Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)

3 Intermediate: GALEN
Pro:
– Allows formal representation of clinical information
– Allows multiple views of relevant detail as needed
– Uses powerful Description Logic (DL)-based formal structure
Con:
– Remains only partially developed
– Contains errors which DLs did not prevent, e.g. Vomitus contains carrot

4 Intermediate: The Gene Ontology
Con:
– Poor formal architecture
– Full of errors, e.g. menopause part_of death
– Poor support for automatic reasoning and error-checking
– Poor treatment of definitions
– Not trans-granular
– No relation to time or instances

5 The Gene Ontology
Pro:
– Open Source
– Cross-Species
– ... has recognized the need for reform, including explicit representation of granular levels

6 Problem of Circularity
GO:0042270: Protection from natural killer cell mediated cytolysis
Definition: The process of protecting a cell from cytolysis by natural killer cells.

7 GO:0019836: hemolysis
Definition: The processes that cause hemolysis
X =def. the Y of X
This is worse than circular.
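
The kind of circularity shown in the two GO definitions above can be flagged mechanically, at least crudely. The following is a minimal sketch, not a GO tool: the function name `term_reuse` and the prefix-matching heuristic are assumptions made for illustration, and the third definition in the usage lines is a made-up non-circular wording used only for contrast.

```python
import re


def _stems(text):
    """Lowercase word tokens, truncated to 5 letters as a crude stemmer.

    The truncation is an assumption of this sketch; it lets
    'protection' match 'protecting'.
    """
    return [w[:5] for w in re.findall(r"[a-z]+", text.lower())]


def term_reuse(term, definition):
    """Fraction of the term's words that reappear in its own definition.

    A value close to 1.0 suggests the definition merely restates the term.
    """
    term_stems = _stems(term)
    if not term_stems:
        return 0.0
    definition_stems = set(_stems(definition))
    reused = sum(1 for s in term_stems if s in definition_stems)
    return reused / len(term_stems)


hemolysis_score = term_reuse("hemolysis", "The processes that cause hemolysis")
protection_score = term_reuse(
    "Protection from natural killer cell mediated cytolysis",
    "The process of protecting a cell from cytolysis by natural killer cells.",
)
# Hypothetical non-circular wording, for contrast only.
contrast_score = term_reuse("hemolysis", "Rupture of red blood cells")
```

A score like this only detects reuse of wording. It cannot tell whether a definition is informative, so it would at best serve as a first filter before human review.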

8 The Bad: Reactome
Pro:
– Rich catalogue of biological processes
Con:
– Incoherent treatment of categories: ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly, CatalystActivity is a sibling of Event.

9 The Bad: National Cancer Institute Thesaurus (NCIT)
Pro:
– Open source; ambitiously broad coverage; DL-based
Con:
– Poor realization of DL formalism
– Full of mistakes (many inherited from its UMLS sources):
  – three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant
  – three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell
  – Normal Cell is_a Microanatomy
See http://ontology.buffalo.edu/medo/NCIT_Smith.html

10 National Cancer Institute Thesaurus
Duratec, Lactobutyrin and Stilbene Aldehyde are classified as: Unclassified Drugs and Chemicals
Pro:
– NCIT, too, has recognized the need for reform (NCIT is part of the OBO library)

11 The Ugly: UMLS Semantic Network
Pros:
– Broad coverage; no multiple inheritance
Cons:
– Incoherent use of ‘conceptual entities’ (e.g. the digestive system as a conceptual part of the organism)
– Full of errors

12 UMLS Semantic Network
Edges in the graph represent merely “possible significant relations”:
– Bacterium causes Experimental Model of Disease
– Experimental Model of Disease affects Fungus
– Experimental Model of Disease is_a Pathologic Function

13 UMLS Semantic Network
Unclear what the nodes of the graph are:
– Drug Delivery Device contains Clinical Drug
– Drug Delivery Device narrower_in_meaning_than Manufactured Object
The use-mention confusion: “Swimming is healthy and has 8 letters”

14 The Ugly: Clinical Terms Version 2 (The Read Codes)
Classifies chemicals into:
– chemicals whose name begins with ‘A’
– chemicals whose name begins with ‘B’
– chemicals whose name begins with ‘C’
– ...

15 The Astonishingly (Criminally?) Ugly: Health Level 7
– HL7 is a UML-based standard for exchange of information between clinical information systems
– It has proved very crumbly as a standard
– The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way

16 HL7-RIM
Animal
Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.
Person
A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents.
LivingSubject
Definition: A subtype of Entity representing an organism or complex animal, alive or not.

17 HL7 RIM: The Problem of Circularity
Person = Person with documents
This has the form: ‘An A is an A which is B’. Definitions of this form:
– are useless in practical terms, since neither we nor the machine can use them to find out what ‘A’ means
– incorporate a vicious infinite regress
– have the effect of making it impossible to refer to A’s which are not Bs, for example to an undocumented person

18 HL7 Logically Incoherent
act = the record of an act
This has the form: An X is the Y of an X
Again, worse than circular.

19 HL7-RIM: Logically Contradictory Definitions
Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen.
Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done.

20 HL7 RIM Ontologically Incoherent
The truth about the real world is constructed through a combination and arbitration of attributed statements... As such, there is no distinction between an activity and its documentation.

21 HL7 Incredibly Successful
– embraced as US federal standard
– central part of $15 billion program to integrate all UK hospital information systems
– made mandatory by Canada Health Infoway
– adopted by Oracle as basis for its EHR support programs

22 HL7 Merchandizing

23 From molecules to diseases
A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data.

24 Good ontologies require:
A coherent upper-level taxonomy distinguishing:
– continuants (cells, molecules, organisms...)
– occurrents (events, processes)
– dependent entities (qualities, functions...)
– independent entities (their bearers)
– universals (types, kinds)
– instances (tokens, individuals)
A coherent relation ontology supporting inference both within and between ontologies.

25 Good ontologies require:
Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats

26 Open Biomedical Ontologies (OBO) Upper Biomedical Ontology (UBO)
root UBO:0000001:top
subclass BFO:continuant:continuant
– subclass BFO:dependent_entity:dependent_entity
subclass UBO:0000023:quality
– subclass UBO:0000026:phenotype
» subclass UBO:0000025:state
– subclass UBO:0000027:disease
» subclass UBO:0000005:function
– subclass GO:0003674:molecular_function
subclass BFO:disposition:disposition
– subclass BFO:independent_entity:independent_entity
subclass UBO:0000002:substance
– subclass UBO:0000019:protein
– subclass GO:0005575:cellular_component
– subclass UBO:0000006:anatomical_entity
» subclass UBO:0000008:gross_anatomical_entity
– subclass UBO:0000007:organism
» subclass UBO:0000015:microbe
» subclass UBO:0000014:plant
» subclass UBO:0000017:animal
subclass BFO:fiat_part_of_substance:fiat_part_of_substance
subclass BFO:boundary_of_substance:boundary_of_substance
subclass BFO:aggregate_of_substances:aggregate_of_substances
subclass BFO:occurrent:occurrent
– subclass BFO:dependent_occurrent:dependent_occurrent
subclass UBO:0000004:process
– subclass GO:0008150:biological_process
subclass BFO:fiat_part_of_process:fiat_part_of_process
– subclass UBO:0000029:life_cycle_stage
subclass BFO:aggregate_of_processes:aggregate_of_processes
– subclass EO:0007359:environment ontology
subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process
– subclass BFO:independent_occurrent:independent_occurrent

27 OBO Relation Ontology (RO)
– Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens)
– Precise formal definitions of relations
– Automatic applicability to time-indexed instance-data, e.g. in the Electronic Health Record
– Consistency with the Relation Ontology is now a criterion for admission to the OBO ontology library
See Genome Biology, Apr. 2006

28 Three types of relations
– between instances: Mary’s heart part_of Mary
– between an instance and a universal: Mary instance_of homo sapiens
– between universals: gastrulation part_of embryonic development

29 A suite of primitive instance-level relations
identical_to, part_of, located_in, adjacent_to, earlier, derives_from, ...

30 A suite of defined relations between universals
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent

31 GALEN: Vomitus contains carrot
– All portions of vomit contain all portions of carrot
– All portions of vomit contain some portion of carrot
– Some portions of vomit contain some portion of carrot
– Some portions of vomit contain all portions of carrot
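
The four candidate readings of "Vomitus contains carrot" differ only in their quantifiers, which a few lines of code can make concrete. This is a sketch over invented toy data: the instance names `v1`, `v2`, `c1`, `c2` and the function name `readings` are hypothetical, and nothing here reflects how GALEN itself is implemented.

```python
def readings(wholes, parts, contains):
    """Evaluate the four quantifier readings of 'W contains P'.

    wholes, parts: sets of instance names (here: portions of vomit/carrot).
    contains: set of (whole, part) pairs holding at the instance level.
    """
    def has(w, p):
        return (w, p) in contains

    return {
        "all-all": all(all(has(w, p) for p in parts) for w in wholes),
        "all-some": all(any(has(w, p) for p in parts) for w in wholes),
        "some-some": any(any(has(w, p) for p in parts) for w in wholes),
        "some-all": any(all(has(w, p) for p in parts) for w in wholes),
    }


# Toy world: two portions of vomit, two portions of carrot,
# and only one portion of vomit contains any carrot.
vomit = {"v1", "v2"}
carrot = {"c1", "c2"}
contains = {("v1", "c1")}

result = readings(vomit, carrot, contains)
```

In this toy world only the some-some reading holds, so an ontology that asserts the relation without fixing the quantifiers leaves it open which of four very different claims is being made.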

32 all-some structure
A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level
Allows automatic ontology integration via cascading reasoning: A R1 B, B R2 C ⇒ A R3 C
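
The all-some definition and the cascading step can be sketched directly over instance data. This is a minimal illustration under stated assumptions: the universals and instance names (`nucleus`, `cell`, `organism`, `n1`, ...) are invented, `all_some` and `transitive_closure` are hypothetical helper names, and instance-level part_of is assumed to be transitive.

```python
def all_some(A, B, instance_of, rel):
    """Class-level relation in all-some form.

    True when every instance of A stands in `rel` to some instance of B.
    Note: vacuously True if A has no instances.
    """
    a_instances = [x for x, u in instance_of.items() if u == A]
    b_instances = {x for x, u in instance_of.items() if u == B}
    return all(any((a, b) in rel for b in b_instances) for a in a_instances)


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# Hypothetical instance data.
instance_of = {
    "n1": "nucleus", "n2": "nucleus",
    "c1": "cell", "c2": "cell",
    "o1": "organism",
}
part_of = {("n1", "c1"), ("n2", "c2"), ("c1", "o1"), ("c2", "o1")}
part_of_closed = transitive_closure(part_of)

nucleus_in_cell = all_some("nucleus", "cell", instance_of, part_of_closed)
cell_in_organism = all_some("cell", "organism", instance_of, part_of_closed)
nucleus_in_organism = all_some("nucleus", "organism", instance_of, part_of_closed)
cell_in_nucleus = all_some("cell", "nucleus", instance_of, part_of_closed)
```

Because both premises hold in all-some form and the instance-level relation is transitive, the conclusion `nucleus part_of organism` follows without being asserted. The reverse direction, `cell part_of nucleus`, does not hold.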

33 adjacent_to
– cell wall adjacent_to cytoplasm
– intron adjacent_to exon
– Golgi apparatus adjacent_to endoplasmic reticulum
– periplasm adjacent_to plasma membrane
– presynaptic membrane adjacent_to synaptic cleft

34 A adjacent_to B
every instance of A stands in the instance-level adjacent_to relation to some instance of B

35 adjacent_to as a relation between universals is not symmetric
nucleus adjacent_to cytoplasm
Not: cytoplasm adjacent_to nucleus
seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle
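
Why a symmetric instance-level relation can yield a non-symmetric relation between universals is easiest to see on a small example. The sketch below assumes invented instances: `cy2` stands for the cytoplasm of a cell that has no nucleus, and the helper name `all_some` is hypothetical.

```python
def all_some(A, B, instance_of, rel):
    """True when every instance of A stands in `rel` to some instance of B."""
    a_instances = [x for x, u in instance_of.items() if u == A]
    b_instances = {x for x, u in instance_of.items() if u == B}
    return all(any((a, b) in rel for b in b_instances) for a in a_instances)


# Hypothetical instances: one nucleus, two portions of cytoplasm.
# cy2 belongs to a cell without a nucleus.
instance_of = {"n1": "nucleus", "cy1": "cytoplasm", "cy2": "cytoplasm"}

# Instance-level adjacency is symmetric, so store both directions.
touching = {("n1", "cy1")}
adjacent = touching | {(b, a) for (a, b) in touching}

nucleus_adjacent_cytoplasm = all_some("nucleus", "cytoplasm", instance_of, adjacent)
cytoplasm_adjacent_nucleus = all_some("cytoplasm", "nucleus", instance_of, adjacent)
```

Every nucleus here is adjacent to some cytoplasm, but one portion of cytoplasm is adjacent to no nucleus. The first class-level assertion therefore holds and its converse fails, even though the underlying instance relation is symmetric.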

36 The Granularity Gulf
– most existing data-sources are of fixed, single granularity
– many (all?) clinical phenomena cross granularities

37 Main obstacle to integrating genetic and EHR data
No facility for dealing with time and instances (particulars, individuals) in current ontologies

38 Key idea
To define ontological relations like part_of or develops_from, it is not enough to look just at universals / classes / types / ‘concepts’: we also need to take account of instances and time.

39 transformation_of
A transformation_of B =def. any instance of A was at some earlier time an instance of B
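
The definition of transformation_of needs time-indexed instance data, which can be sketched as (instance, universal, time) triples. Everything concrete below is an assumption for illustration: the triple layout, the function name, the individual `john` and all the years are invented.

```python
def transformation_of(A, B, facts):
    """A transformation_of B, read off time-indexed instance data.

    facts: set of (instance, universal, time) triples meaning
    'instance instance_of universal at time'.
    True when each instance of A was an instance of B at some earlier time.
    Note: vacuously True if A has no instances.
    """
    a_facts = [(x, t) for (x, u, t) in facts if u == A]
    return all(
        any(y == x and u == B and t1 < t for (y, u, t1) in facts)
        for (x, t) in a_facts
    )


# Hypothetical time-indexed data; years are made up.
facts = {
    ("mary", "child", 1990),
    ("mary", "adult", 2010),
    ("john", "child", 1995),
    ("john", "adult", 2015),
}

adult_from_child = transformation_of("adult", "child", facts)
child_from_adult = transformation_of("child", "adult", facts)
```

The check follows the same individual across time, which is why it cannot be expressed by looking at universals alone.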

40 transformation_of
[Figure: time axis; the same instance c appears as "c at t1" in C1 and as "c at t" in C]
– mature RNA transformation_of pre-RNA
– adult transformation_of child
– carcinomatous colon transformation_of colon

41 transformation_of relations cross both time and granularity
[Figure: "c at t" in C; "c at t1" in C1]

42 Advantages of the methodology of enforcing commonly accepted coherent definitions
– promote quality assurance (better coding)
– guarantee automatic reasoning across ontologies and across data at different granularities
– yield direct connection to times and instances in the EHR

