Ontology, the Semantic Web and the Unification of Medical Knowledge. Barry Smith.

Presentation transcript:

VT

2 Ontology, the Semantic Web and the Unification of Medical Knowledge Barry Smith

3 IFOMIS Institute for Formal Ontology and Medical Information Science

4 The problem Different communities of medical researchers use different and often incompatible category systems in expressing the results of their work

5 Example: Medical Nomenclature UMLS: blood is a tissue MeSH: blood is a body fluid
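
A disagreement of this kind can be found mechanically once both systems are in machine-readable form. A minimal sketch in Python, assuming each terminology is reduced to a hypothetical mapping from term to parent category (the two dictionaries hold only the example from the slide and are not real UMLS or MeSH data structures):

```python
# Hypothetical, heavily simplified views of two terminologies:
# each maps a term to the category it is classified under.
umls = {"blood": "tissue"}
mesh = {"blood": "body fluid"}


def classification_conflicts(a, b):
    """Return terms that both systems contain but classify differently."""
    return {
        term: (a[term], b[term])
        for term in a.keys() & b.keys()
        if a[term] != b[term]
    }


conflicts = classification_conflicts(umls, mesh)
print(conflicts)
```

Detecting the conflict is the easy part. The code cannot say which classification is right.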

6 The solution “ONTOLOGY!” But what does “ontology” mean?

7 Two alternative readings Ontologies are special sorts of terminology systems = currently popular IT conception, with roots in KR Ontologies are special sorts of theories about entities in reality = traditional philosophical conception, embraced by IFOMIS

8 Example: The Gene Ontology (GO) hormone ; GO: %digestive hormone ; GO: %peptide hormone ; GO: %adrenocorticotropin ; GO: %glycopeptide hormone ; GO: %follicle-stimulating hormone ; GO: % = subsumption (lower term is_a higher term)

9 as tree hormone digestive hormone peptide hormone adrenocorticotropin glycopeptide hormone follicle-stimulating hormone
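
The subsumption tree can be sketched as child-to-parent is_a links. This is only an illustration: the nesting below is an assumption, since the indentation of the original slide did not survive in the transcript, and the function names are invented here.

```python
# Child -> parent is_a links. The nesting is an assumption: the
# indentation of the original slide did not survive in the transcript.
IS_A = {
    "digestive hormone": "hormone",
    "peptide hormone": "hormone",
    "adrenocorticotropin": "peptide hormone",
    "glycopeptide hormone": "peptide hormone",
    "follicle-stimulating hormone": "glycopeptide hormone",
}


def ancestors(term):
    """Walk is_a links upward; return every subsuming term, nearest first."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def is_a(lower, higher):
    """True if `higher` subsumes `lower` (reflexive and transitive)."""
    return lower == higher or higher in ancestors(lower)


print(ancestors("follicle-stimulating hormone"))
```

A structure like this supports lookup of designations along the hierarchy and little else, which is the point made on the next slide.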

10 GO is very useful for purposes of standardization in the reporting of genetic information but it is not much more than a telephone directory of standardized designations organized into hierarchies

11 GO can in practice be used only by trained biologists whether a GO-term stands in the subsumption relationship depends on the context in which the term is used (for example on the type of organism)

12 A still more important problem: GDB Genome Database of Human Genome Project GenBank National Center for Biotechnology Information, Washington DC etc.

13 What is a gene? GDB: a gene is a DNA fragment that can be transcribed and translated into a protein GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype GO uses ‘gene’ in its term hierarchy, but it does not tell us which of these definitions is correct

14 GO has no robust formal organization no capability to be aligned with systems which would have the power to use it to reason with genetic information

15 GO deals with basic ontological notions very haphazardly GO’s three main term-hierarchies are: component, function and process But GO confuses functions with structures, and also with executions of functions and has no clear account of the relation between functions and processes

16 IFOMIS: Get basic ontological organization right and problems of formalization (consistency, portability) will become easier to solve later

17 Current orthodoxy focuses instead on issues of representation (XML) and reasoning (Description logics)

18 Description logics decidable logics, thus expressively weaker than first-order predicate logic used for ensuring consistency of definitions of terms and for computing relations of subsumption ontologically neutral (i.e. neutral as between good ontology and ontological nonsense)

19 SNOMED RT (2000) already has description logic definitions but it also has some bad coding, which derives from failure to pay attention to ontological principles: e.g. both testes is_a testis

20 See Workshop today: Werner Ceusters, Barry Smith, Ontology for the Medical Domain, Room E

21 DL is supposed to allow future SNOMED: to reason from data formulated in a structured way; to handle multiple relationship types, in addition to is_a; to take account of context-sensitivity in use of terms

22 The long march of Description Logic Today SNOMED Tomorrow THE WORLD

23 The Semantic Web Initiative The Web is a vast edifice of heterogeneous data sources Needs the ability to query and integrate across different conceptual systems

24 How to resolve such incompatibilities? enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which 1. satisfy the constraints of a description logic (DL) 2. are applied as meta-tags to websites

25 Metadata: the new Silver Bullet agree on a metadata standard for washing machines as concerns size, price, etc. create machine-readable databases and put them on the net; consumers can then query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results

26 A world of exhaustive, reliable metadata would be a utopia.

27 PLAN General problems with the Semantic Web initiative (Partial) solutions to these general problems in the medical domain Problems specific to the medical domain

28 The Semantic Web General problems with the Semantic Web initiative (Partial) solutions to these general problems in the medical domain Problems specific to the medical domain

29 Problem 1: People lie Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners Metadata exists in a competitive world. Some people are crooks. Some people are cranks.

30 Problem 2: People are lazy Half the pages on Geocities are called “Please title this page”

31 Problem 3: People are stupid The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?

32 Problem 4: Multiple descriptions “Requiring everyone to use the same vocabulary denudes the cognitive landscape, enforces homogeneity in ideas.” (Cory Doctorow)

33 Problem 5: Ontology Impedance = semantic mismatch between ontologies being merged This problem recognized in Semantic Web literature: /About/Deliverables/ontoweb-del-7.6-swws1.pdf

34 Solution 1: treat it as (inevitable) ‘impedance’ and learn to find ways to cope with the disturbance which it brings Suggested here: out/Deliverables/ontoweb-del-7.6-swws1.pdf

35 Solution 2: resolve the impedance problem on a case-by-case basis Suppose two databases are put on the web. Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.
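
The case-by-case approach can be sketched as a hand-written column equivalence used to join the two tables. A minimal illustration in Python: the rows, the values and the function name are invented for the example, and only the column names "where" and "zip" come from the slide.

```python
# Two independently published tables (invented sample rows).
friends = [
    {"name": "Alice", "where": "00001"},
    {"name": "Bob", "where": "00002"},
]
places = [
    {"zip": "00001", "city": "Town A"},
    {"zip": "00002", "city": "Town B"},
]


def join_on_equivalence(left, right, left_col, right_col):
    """Join two row lists on columns that someone has declared equivalent."""
    index = {row[right_col]: row for row in right}
    return [
        {**row, **index[row[left_col]]}
        for row in left
        if row[left_col] in index
    ]


# The manually noticed equivalence: friends.where means places.zip.
joined = join_on_equivalence(friends, places, "where", "zip")
print(joined[0])
```

Every further pair of columns needs its own manually supplied equivalence, so the effort grows with the number of sources being combined.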

36 Both solutions fail 1. treating mismatches as ‘impedance’ ignores the problem of error propagation (and is inappropriate in an area like medicine) 2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

37 The Semantic Web General problems with the Semantic Web initiative (Partial) solutions to these general problems in the medical domain Problems specific to the medical domain

38 Solutions in the medical domain Problem 1: People lie Problem 2: People are lazy Problem 3: People are stupid None of these is true in the world of medical informatics

39 Solutions in the medical domain Problem 1: People lie Problem 2: People are lazy Problem 3: People are stupid Achieve quality control via division of labour

40 Division of Labour 1. Clinical activities 2. Structured data representation 3. Software coding (e.g. for NLP)

41 Division of Labour 1. Clinical activities 2. Structured data representation 3. Software coding 4. Ontology building Use 4. to constrain 2. and 3. to achieve better data processing via quality control

42 DL-Division of Labour 1. Clinical activities 2. Structured data representation 3. Software coding 4. Ontology building For DL 4. is a special case of 3.

43 For DL Ontologies are software tools thus limited in their expressive power and in their effectiveness as quality controls

44 IFOMIS idea: distinguish two separate tasks: - the task of developing computer applications capable of running in real time - the task of developing an expressively rich ontology of a sort which will allow sophisticated quality control

45 The Semantic Web General problems with the Semantic Web initiative (Partial) solutions to these general problems in the medical domain Problems specific to, or made more acute within, the medical domain

46 Problem 4: Multiple descriptions Requiring everyone to use the same vocabulary to describe their material is not always medically practicable

47 Clinicians often do not use category systems at all – they use unstructured text from which usable data has to be extracted in a further step Why? Because every case is different, much patient data is context-dependent

48 Problem 5: Ontology Impedance = semantic mismatch between ontologies ‘gene’ used in websites issued by biotech companies involved in gene patenting medical researchers interested in role of genes in predisposition to smoking insurance companies

49 Other problems with DL-based ontologies DL poor when dealing with context-dependent information/usages of terms DL poor when it comes to dealing with information about instances (rather than concepts or classes) also DL poor when it comes to dealing with time

50 SARS is NOT Severe Acute Respiratory Syndrome it is THIS collection of instances of Severe Acute Respiratory Syndrome associated with THIS coronavirus and ITS mutations

51 different terminology systems

52 need not interconnect at all for example they may relate to entities of different granularity

53 we cannot make incompatible terminology-systems interconnect just by looking at concepts, or knowledge or language

54 to decide which of a plurality of competing definitions to accept we need some tertium quid

55 we need, in other words, to take the world itself into account

56 BFO = basic formal ontology

57 BFO ontology not the ‘standardization’ or ‘specification’ of concepts (not a branch of knowledge or concept engineering) but an inventory of the types of entities existing in reality

58 BFO goal: to remove ontological impedance by constraining terminology systems with good ontology

59 BFO not a computer application but a reference ontology (not a reference terminology in the sense of SNOMED)

60 Recall: GDB: a gene is a DNA fragment that can be transcribed and translated into a protein GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

61 Ontology ‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ … are ontological terms in the sense of traditional (philosophical) ontology

62 UMLS has ontological problems, too Idea or Concept Functional Concept Qualitative Concept Quantitative Concept Spatial Concept Body Location or Region Body Space or Junction Geographic Area Molecular Sequence Amino Acid Sequence Carbohydrate Sequence Nucleotide Sequence

63 UMLS has ontological problems, too Idea or Concept Functional Concept Qualitative Concept Quantitative Concept Spatial Concept Body Location or Region Body Space or Junction Geographic Area Molecular Sequence Amino Acid Sequence Carbohydrate Sequence Nucleotide Sequence

64 St. Malo is an Idea or Concept
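
How the St. Malo conclusion follows from the hierarchy can be traced with a short sketch. Two assumptions are made here and are not stated in the transcript: the nesting of the flattened term list, and the classification of St. Malo under Geographic Area.

```python
# Parent -> children. The nesting is an assumption reconstructed from
# the flattened slide text.
TREE = {
    "Idea or Concept": [
        "Functional Concept", "Qualitative Concept",
        "Quantitative Concept", "Spatial Concept",
    ],
    "Spatial Concept": [
        "Body Location or Region", "Body Space or Junction",
        "Geographic Area", "Molecular Sequence",
    ],
    "Molecular Sequence": [
        "Amino Acid Sequence", "Carbohydrate Sequence",
        "Nucleotide Sequence",
    ],
}

# Assumed classification of the example entity.
INSTANCE_OF = {"St. Malo": "Geographic Area"}


def path_to_root(semantic_type):
    """Follow parent links from a type up to the top of the hierarchy."""
    parent_of = {c: p for p, children in TREE.items() for c in children}
    path = [semantic_type]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(" is_a ".join(path_to_root(INSTANCE_OF["St. Malo"])))
```

Because is_a is transitive, anything classified under Geographic Area inherits the top-level category Idea or Concept.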

65 UMLS has ontological problems, too Idea or Concept Functional Concept Qualitative Concept Quantitative Concept Spatial Concept Body Location or Region Body Space or Junction Geographic Area Molecular Sequence Amino Acid Sequence Carbohydrate Sequence Nucleotide Sequence

66 The Reference Ontology Community IFOMIS (Leipzig) Laboratories for Applied Ontology (Trento/Rome, Turin) Foundational Ontology Project (Leeds) Ontology Works (Baltimore) Ontek Corporation (Buffalo/Leeds) Language and Computing (L&C) (Belgium/Philadelphia)

67 Domains of Current Work IFOMIS Leipzig: Medicine, Bioinformatics Laboratories for Applied Ontology Trento/Rome: Ontology of Cognition/Language Turin: Law Foundational Ontology Project: Space, Physics Ontology Works: Genetics, Molecular Biology Ontek Corporation: Biological Systematics Language and Computing: Natural Language Understanding

68 Two basic BFO oppositions Granularity (of molecules, genes, cells, organs, organisms...) SNAP vs. SPAN getting time right of crucial importance for medical informatics

69 SNAP vs. SPAN Two different ways of existing in time: continuing to exist (of organisms, their qualities, roles, functions, conditions) occurring (of processes) SNAP vs. SPAN = Anatomy vs. Physiology

70 SNAP: Entities existing in toto at a time

71 Three kinds of SNAP entities 1. SNAP Independent: Substances, Objects, Things 2. SNAP Dependent: Qualities, Functions, Conditions, Roles 3. SNAP Spatial regions

72 SNAP-Independent

73 SNAP Dependent

74 SNAP-Spatial Region

75 SPAN: Entities occurring in time

76 SPAN Dependent (Processes)

77 SPAN Spatiotemporal Regions

78 Realization (SNAP → SPAN) the execution of a plan the expression of a function the exercise of a role the realization of a disposition the course of a disease the application of a therapy

79 SNAP dependent entities and their SPAN realizations plan function role disposition disease therapy SNAP

80 SNAP dependent entities and their SPAN realizations execution expression exercise realization course application SPAN
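
The pairing of SNAP dependent entities with their SPAN realizations can be written down as a small typed mapping. A sketch only: the class and function names are invented for illustration and are not part of BFO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapDependent:
    """A continuant such as a function or role (illustrative type)."""
    kind: str


@dataclass(frozen=True)
class SpanProcess:
    """An occurrent that realizes a SNAP dependent entity (illustrative type)."""
    kind: str
    realizes: SnapDependent


# SNAP dependent kind -> kind of SPAN process that realizes it,
# taken pairwise from the two slides above.
REALIZATION = {
    "plan": "execution",
    "function": "expression",
    "role": "exercise",
    "disposition": "realization",
    "disease": "course",
    "therapy": "application",
}


def realize(entity):
    """Build the SPAN process that realizes the given SNAP dependent entity."""
    return SpanProcess(kind=REALIZATION[entity.kind], realizes=entity)


print(realize(SnapDependent("disease")).kind)
```

Keeping the two sides as distinct types reflects the slide's point that a disease and the course of a disease are entities of different sorts.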

81 More examples: performance of a symphony projection of a film expression of an emotion utterance of a sentence increase of body temperature spreading of an epidemic extinguishing of a forest fire movement of a tornado

82 BFO = SNAP/SPAN + Theory of Granular Partitions + theory of universals and instances theory of part and whole theory of boundaries theory of functions, powers, qualities, roles theory of environments theory of spatial and spatiotemporal regions

83 MedO: medical domain ontology universals and instances and normativity theory of part and whole and absence theory of boundaries/membranes theory of functions, powers, qualities, roles, (mal)functions, bodily systems theory of environments: inside and outside the organism theory of spatial and spatiotemporal regions: anatomical mereotopology

84 MedO: medical domain ontology theory of granularity relations between molecule ontology gene ontology cell ontology anatomical ontology etc.

85 Theory of Granular Partitions See Workshop: Ontology for the Medical Domain, Room E

86 Testing the BFO/MedO approach collaboration with Language and Computing nv (

87 The Project collaborate with L&C to show how an ontology constructed on the basis of philosophical principles can help in overhauling and validating the large terminology-based medical ontology LinKBase® used by L&C for NLP

88 L&C LinKBase®: world’s largest terminology-based ontology with mappings to UMLS, SNOMED, etc. + LinKFactory®: suite for developing and managing large terminology-based ontologies

89 LinKBase BFO and MedO designed to add better reasoning capacity by tagging LinKBase domain-entities with corresponding BFO/MedO categories by constraining links within LinKBase according to the theory of granular partitions

90 L&C’s long-term goal Transform the mass of unstructured patient records into a gigantic medical experiment

91 IFOMIS’s long-term goal Build a robust high-level BFO-MedO framework THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY which can serve as the basis for an ontologically coherent unification of medical knowledge and terminology

92 END

93 Description Logics allow specifying a terminological hierarchy using a restricted set of first order formulas. They usually have nice computational properties (often decidable and tractable) but the inference services are restricted to classification and subsumption. That means, given formulae describing classes, the classifier associated with a certain description logic will place them inside a hierarchy, and given an instance description, the classifier will determine the most specific classes to which the particular instance belongs.
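
The two inference services named here, subsumption and classification, can be illustrated with a toy model. This is not a description logic reasoner: it assumes each class is just a conjunction of required features, so that subsumption reduces to a subset test, and the class and feature names are invented.

```python
# Toy class definitions: each class is a conjunction of required features.
# Names are invented for illustration.
CLASSES = {
    "hormone": frozenset({"hormone"}),
    "peptide hormone": frozenset({"hormone", "peptide"}),
    "glycopeptide hormone": frozenset({"hormone", "peptide", "glycosylated"}),
}


def subsumes(general, specific):
    """`general` subsumes `specific` if it requires a subset of its features."""
    return CLASSES[general] <= CLASSES[specific]


def most_specific_classes(instance_features):
    """Classes the instance satisfies that subsume no other satisfied class."""
    matching = [c for c, req in CLASSES.items() if req <= instance_features]
    return [
        c for c in matching
        if not any(o != c and subsumes(c, o) for o in matching)
    ]


print(most_specific_classes({"hormone", "peptide"}))
```

The toy is neutral in the sense described on slide 18: it classifies whatever definitions it is given, whether or not they make ontological sense.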

94 Good metadata Google exploits metadata in the form of: number of links pointing at a page – a measure of reliability Observational metadata vs. good human-created metadata vs. marketing hype

95 Two super-categories in DL Concepts (e.g. blood), with Definitions (term strings associated with concepts) Relationships (e.g. is_a) E.g. fetal blood stands in the relation is_a to blood

96 DL thus goes hand in hand with the assumption that ontology deals with ‘simplified models’ Tom Gruber (1993): An ontology should make as few claims as possible about the world being modeled … specifying the weakest theory (allowing the most models) and defining only those terms that are essential to the communication of knowledge consistent with that theory.

97 Semantic Web effort thus far devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography) not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

98 BFO vs. KR In the knowledge engineering world in which information systems ontology has its home terms and definitions come first, – the job is to validate them and reason with them In the BFO world robust ontology (with all its reasoning power) comes first and terms and term-hierarchies must be subjected to the constraints of ontological coherence

99 Problem 4: Metrics influence results Example: software which scores well on convenience scores badly on security Every player in a metadata standards body will want to emphasize their high-scoring axes