Ontology mapping: a way out of the medical tower of Babel?

1 Ontology mapping: a way out of the medical tower of Babel?
Frank van Harmelen Vrije Universiteit Amsterdam The Netherlands Antilles

2 Before we start…
A talk on ontology mappings is a difficult talk to give:
  no consensus in the field
    on the merits of the different approaches
    on classifying the different approaches
  no one can speak with authority on the solution
  this is a personal view, with a sell-by date
  other speakers will entirely disagree (or disapprove)
(picture of the Tower of Babel?)

3 Good overviews of the topic
Knowledge Web D2.2.3: “State of the art on ontology alignment”
Ontology Mapping Survey, talk by Siyamed Seyhmus SINIR
ESWC'05 Tutorial on Schema and Ontology Matching, by Pavel Shvaiko and Jerome Euzenat
KER 2003 paper by Kalfoglou & Schorlemmer
These are all different & incompatible…

4 Ontology mapping: a way out of the medical tower of Babel?

5 The Medical tower of Babel
MeSH: Medical Subject Headings, National Library of Medicine; descriptions
EMTREE: commercial (Elsevier); drugs and diseases; terms, synonyms
UMLS: integrates 100 different vocabularies
SNOMED: concepts; College of American Pathologists
Gene Ontology: terms in molecular biology
NCI Cancer Ontology: 17,000 classes (about 1M definitions)

6 Ontology mapping: a way out of the medical tower of Babel?

7 What are ontologies & what are they used for
[Diagram labels: world, concept, language]
Agree on a conceptualization
No shared understanding
Conceptual and terminological confusion
Make it explicit in some language
Actors: both humans and machines

8 Ontologies come in very different kinds
From lightweight to heavyweight:
  Yahoo topic hierarchy
  Open Directory (general categories)
  Cyc, axioms
From very specific to very general:
  METAR code (weather conditions at air terminals)
  SNOMED (medical concepts)
  Cyc (common sense knowledge)

9 What’s inside an ontology?
terms + specialisation hierarchy
classes + class-hierarchy
instances
slots/values
inheritance (multiple? defaults?)
restrictions on slots (type, cardinality)
properties of slots (symm., trans., …)
relations between classes (disjoint, covers)
reasoning tasks: classification, subsumption
Increasing semantic “weight”: increasing degree of semantics/formality

10 In short (for the duration of this talk)
Ontologies are not definitive descriptions of what exists in the world (= philosophy)
Ontologies are models of the world constructed to facilitate communication
Yes, ontologies exist (because we build them)

11 Ontology mapping: a way out of the medical tower of Babel?

12 Ontology mapping is old & inevitable
db schema integration
federated databases
Ontology mapping is inevitable:
  the ontology language is standardised; don't even try to standardise contents
  compare relational (only structural, not semantics) against ontology (constrain semantics, logical axioms)

13 Ontology mapping is important
database integration, heterogeneous database retrieval (traditional)
catalog matching (e-commerce)
agent communication (theory only)
web service integration (urgent)
P2P information sharing (emerging)
personalisation (emerging)

14 Ontology mapping is now urgent
Ontology mapping has acquired new urgency:
  physical and syntactic integration is ± solved (open world, web)
  automated mappings are now required (P2P)
  shift from off-line to run-time matching
Ontology mapping has new opportunities:
  larger volumes of data
  richer schemas (relational vs. ontology)
  applications where partial mappings work

15 Different aspects of ontology mapping
how to discover a mapping
how to represent a mapping
  subset/equal/disjoint/overlap/is-somehow-related-to
  logical/equational/category-theoretical
  atomic/complex arguments, confidence measure
how to use it
We only talk about “how to discover”
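The representation options listed on this slide (a relation type plus a confidence measure) can be sketched as a small data structure. This is only an illustration: the names `Relation` and `Mapping`, the concept identifiers and the confidence value 0.9 are all assumptions, not taken from any system mentioned in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The five relation types named on the slide.
    SUBSET = "subset"
    EQUAL = "equal"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Mapping:
    source: str          # concept in the first ontology (hypothetical id)
    target: str          # concept in the second ontology (hypothetical id)
    relation: Relation
    confidence: float = 1.0

    def __post_init__(self):
        # A confidence measure only makes sense inside [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Made-up example: an atomic mapping between two concept names.
m = Mapping("onto1:Aorta", "onto2:Artery", Relation.SUBSET, confidence=0.9)
```

Complex (non-atomic) arguments would need `source` and `target` to hold expressions rather than plain names; the sketch covers only the atomic case.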

16 Many experimental systems: (non-exhaustive!)
Prompt (Stanford SMI); Anchor-Prompt (Stanford SMI); Chimaera (Stanford KSL); Rondo (Stanford U./ULeipzig); MoA (ETRI); Cupid (Microsoft research); Glue (U of Washington); FCA-merge (UKarlsruhe); IF-Map; Artemis (UMilano); T-tree (INRIA Rhone-Alpes); S-MATCH (UTrento); Coma (ULeipzig); Buster (UBremen); MULTIKAT (INRIA S.A.); ASCO (INRIA S.A.); OLA (INRIA R.A.); Dogma's Methodology; ArtGen (Stanford U.); Alimo (ITI-CERTH); Bibster (UKarlsruhe); QOM (UKarlsruhe); KILT (INRIA LORRAINE)

17 Different approaches to ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge
Going to review the first three quickly and spend most time on the fourth one

18 Linguistic & structural mappings
normalisation (case, blanks, digits, diacritics)
lemmatization, N-grams, edit distance, Hamming distance
distance = fraction of common parents
elements are similar if their parents/children/siblings are similar
problem: ontologies are semantic objects, and these methods entirely ignore the semantics
(decreasing order of boredom)
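A minimal sketch of the purely linguistic techniques named on this slide: label normalisation, edit distance and N-gram overlap. The function names are illustrative, and the choice of Jaccard overlap for comparing N-gram sets is an assumption; the slide does not say how N-grams are compared.

```python
import unicodedata


def normalise(label):
    """Lower-case, strip diacritics, drop digits and collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_marks.lower() if c.isalpha() or c.isspace())
    return " ".join(kept.split())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def ngrams(s, n=3):
    """Set of all character n-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of the two n-gram sets (0.0 if either is empty)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

None of these functions looks at what a label means, which is exactly the objection raised on the slide.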

19 Different approaches to ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge

20 Matching through shared vocabulary
[Diagram: Q, Low(Q), Up(Q)]
$\mathrm{Low}(Q) \subseteq Q \subseteq \mathrm{Up}(Q)$
Early results with post-doc
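One possible reading of the Low(Q) and Up(Q) labels on this slide is that a concept Q is bounded from below and above by classes of the shared vocabulary. The sketch below follows that reading using plain instance sets; the function name, the set-based representation and the toy data are all assumptions rather than the method actually used in the work reported here.

```python
def approximate(q, shared):
    """Return (Low(Q), Up(Q)) as instance sets built from shared classes.

    q      -- set of instances belonging to the concept Q
    shared -- dict: shared-vocabulary class name -> set of its instances
    """
    q = frozenset(q)
    extensions = [frozenset(ext) for ext in shared.values()]
    # Low(Q): union of every shared class that lies entirely inside Q.
    lower = frozenset().union(*(ext for ext in extensions if ext <= q))
    # Up(Q): intersection of every shared class that contains all of Q.
    uppers = [ext for ext in extensions if ext >= q]
    upper = frozenset.intersection(*uppers) if uppers else None
    return lower, upper


# Toy data, invented for illustration only.
shared = {
    "S1": {"a", "b"},
    "S2": {"a", "b", "c", "d"},
    "S3": {"a", "b", "c", "d", "e"},
    "S4": {"x"},
}
low, up = approximate({"a", "b", "c"}, shared)
```

Here `up` is `None` when no shared class contains Q, which signals that the shared vocabulary offers no upper bound at all.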

21 Matching through shared vocabulary
Used in mapping geospatial databases from German land-registration authorities (small)
Used in mapping bio-medical and genetic thesauri (large)
Early results with post-doc

22 Different approaches to ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge

23 Matching through shared instances
Early results with post-doc

24 Matching through shared instances
Used by Ichise et al. (IJCAI’03) to successfully map parts of Yahoo to parts of Google
  Yahoo = 8402 classes, instances
  Google = 8343 classes, instances
  Only 6000 shared instances
70% - 80% accuracy obtained (!)
Conclusion from the authors: semantics is needed to improve on this ceiling
Early results with post-doc
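The idea behind instance-based matching can be sketched as follows: two classes from different hierarchies are candidates for a match when they classify many of the same instances. The sketch uses Jaccard overlap as a simple stand-in measure; it is not the statistic used by Ichise et al., and the class names, instance ids and the 0.5 threshold are invented for illustration.

```python
def jaccard(a, b):
    """Overlap of two instance sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(onto1, onto2, threshold=0.5):
    """Pair each class of onto1 with its best-overlapping class of onto2.

    onto1, onto2 -- dict: class name -> set of instance ids
    Returns a list of (class1, class2, score) with score >= threshold.
    """
    matches = []
    for c1, inst1 in onto1.items():
        scored = [(jaccard(inst1, inst2), c2) for c2, inst2 in onto2.items()]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            matches.append((c1, best, score))
    return matches


# Toy directories sharing a few instance ids.
dir_a = {"Recreation/Cars": {"u1", "u2", "u3"}, "Science/Biology": {"u7", "u8"}}
dir_b = {"Autos": {"u1", "u2", "u4"}, "Biology": {"u7", "u8"}, "Music": {"u9"}}
result = match_by_instances(dir_a, dir_b)
```

The approach only works for the instances that both hierarchies share, which is why a small shared set limits what it can find.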

25 Different approaches to ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge

26 Matching using shared background knowledge
[Diagram: ontology 1, ontology 2]
Early results with post-doc

27 Ontology mapping using background knowledge Case study 1
Work with Zharko Philips Michel VU AMC PHILIPS

28 Overview of test data
Two terminologies from the intensive care domain:
  OLVG list: list of reasons for ICU admission
  AMC list
DICE hierarchy: additional hierarchical knowledge describing the reasons for ICU admission

29 OLVG list
developed by clinician
3000 reasons for ICU admission
1390 used in first 24 hours of stay
3600 patients since 2000
based on ICD9 + additional material
List of problems for patient admission
Each reason for admission is described with one label
Labels consist of 1.8 words on average
redundancy because of spelling mistakes
implicit hierarchy (e.g. many fractures)

30 AMC list
List of 1460 problems for ICU admission
Each problem is described using 5 aspects from the DICE terminology
DICE: 2500 concepts (5000 terms), 4500 links
  Abnormality (size: 85)
  Action taken (size: 55)
  Body system (size: 13)
  Location (size: 1512)
  Cause (size: 255)
expressed in OWL
allows for subsumption & part-of reasoning

31 Why map AMC list ↔ OLVG list?
allow easy entering of OLVG data
re-use of data in:
  epidemiology
  quality of care assessment
  data-mining (patient prognosis)

32 Linguistic mapping
Compare each pair of concepts
Use labels and synonyms of concepts
Heuristic method to discover equivalence and subclass relations
  example: “Long brain tumor” is more specific than “Long tumor”
First round: compare with complete DICE
  313 suggested matches, around 70% correct
Second round: only compare with “reasons for admission” subtree
  209 suggested matches, around 90% correct
High precision, low recall (“the easy cases”)
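The slide does not spell out the heuristic, so the following is only one plausible sketch of it: treat a label as more specific when it contains all the words of the other label plus extra qualifiers. The function names are illustrative, and synonym handling is left out.

```python
def words(label):
    """Case-insensitive set of the words in a label."""
    return frozenset(label.lower().split())


def compare_labels(a, b):
    """Guess how label a relates to label b from word containment alone."""
    wa, wb = words(a), words(b)
    if wa == wb:
        return "equivalent"
    if wb < wa:
        # a repeats every word of b and adds qualifiers, so a is narrower.
        return "more specific than"
    if wa < wb:
        return "more general than"
    return None  # no suggestion


relation = compare_labels("Long brain tumor", "Long tumor")
```

A heuristic of this kind only fires when the labels literally share words, which fits the slide's remark about high precision and low recall.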

33 Using background knowledge
Use properties of concepts
Use other ontologies to discover relations between properties
? …. ….

34 DICE aspect taxonomies
[Diagram: Semantic match. Given: the DICE aspect taxonomies (Abnormality, Action, Body system, Location, Cause). The OLVG problem list and the DICE problem list are each linked to these taxonomies by lexical match (?). Implicit matching between the two lists: property match.]

35 Semantic match
Taxonomy of body parts:
  Blood vessel is more general than Vein and Artery
  Artery is more general than Aorta
Lexical match: “Aorta thoracalis dissection” has location Aorta
Lexical match: “Dissection of artery” has location Artery
Reasoning: Aorta implies Artery
Location match between the two terms: has more general location
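The reasoning step on this slide can be sketched as a walk up the body-part taxonomy it shows (Blood vessel, Vein, Artery, Aorta). The dictionary encoding and function names are assumptions, and the label "has more specific location" is added by symmetry with the slide's "has more general location".

```python
# Child -> parent links, as drawn on the slide.
PARENT = {"Vein": "Blood vessel", "Artery": "Blood vessel", "Aorta": "Artery"}


def ancestors(term, parent=PARENT):
    """All more general terms of `term`, nearest first."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def location_relation(loc_a, loc_b):
    """Relate two problems through the locations they were lexically matched to."""
    if loc_a == loc_b:
        return "same location"
    if loc_b in ancestors(loc_a):
        return "has more general location"   # loc_b is above loc_a
    if loc_a in ancestors(loc_b):
        return "has more specific location"  # loc_b is below loc_a
    return None


# "Aorta thoracalis dissection" (location Aorta) against
# "Dissection of artery" (location Artery).
rel = location_relation("Aorta", "Artery")
```

The same pattern would apply to the other aspect taxonomies, each contributing its own partial relation between the two problems.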

36 Example: “Heroin intoxication” – “drugs overdose”
Cause taxonomy: Drugs is more general than Heroine
Lexical match (cause): Heroin intoxication has cause Heroine; Drugs overdosis has cause Drugs
Cause match: has more specific cause
Abnormality taxonomy: Intoxicatie is more general than Overdosis
Lexical match (abnormality): Heroin intoxication has abnormality Intoxicatie; Drugs overdosis has abnormality Overdosis
Abnormality match: has more general abnormality

37 Example results
OLVG: Acute respiratory failure / DICE: Asthma cardiale
OLVG: Aspergillus fumigatus / DICE: Aspergilloom
OLVG: duodenum perforation / DICE: Gut perforation
OLVG: HIV / DICE: AIDS
OLVG: Aorta thoracalis dissectie type B / DICE: Dissection of artery
Matched aspects shown on the slide: abnormality; cause; abnormality, cause; cause; location, abnormality

38 Extension: approximate matching
Terms are not precisely defined
Terms are not precisely used
Exact reasoning will not be useful
[Diagram: A, B] $A \subset B$ ?

39 Approximate matching
Translate every class-name into a propositional formula (both DNF and CNF versions)
$A \sqsubseteq B = \left(\bigvee_i A_i \sqsubseteq \bigwedge_k B_k\right) = \bigwedge_{i,k} (A_i \sqsubseteq B_k)$
ignore increasing number of (i,k)-subsumption pairs
varies from classical to trivial
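The decomposition into (i,k)-subsumption pairs can be sketched as follows, assuming A is given in DNF (a list of conjunctions of atoms) and B in CNF (a list of disjunctions of atoms). Two things are assumptions of this sketch, not statements from the slide: a pair is tested with no background axioms, so a conjunction entails a disjunction only if they share an atom, and any `ignore` failing pairs may be dropped, whereas the slide does not say which pairs are chosen.

```python
from itertools import product


def pair_holds(conjunct, clause):
    """Does the conjunction of atoms entail the disjunction of atoms?

    Without background axioms this holds exactly when they share an atom.
    """
    return bool(set(conjunct) & set(clause))


def approx_subsumed(a_dnf, b_cnf, ignore=0):
    """Approximate test of 'A is subsumed by B'.

    a_dnf  -- list of conjuncts A_i, each a set of atoms
    b_cnf  -- list of clauses B_k, each a set of atoms
    ignore -- number of failing (i,k) pairs that may be disregarded
    """
    pairs = list(product(a_dnf, b_cnf))
    failed = sum(not pair_holds(ai, bk) for ai, bk in pairs)
    return failed <= ignore


# Abstract atoms, chosen only to exercise the function.
a = [{"p", "q"}]
b = [{"p", "r"}, {"s"}]
classical = approx_subsumed(a, b, ignore=0)
relaxed = approx_subsumed(a, b, ignore=1)
```

With `ignore=0` every pair must hold, which is the classical test; once `ignore` reaches the total number of pairs the test accepts everything, which is the trivial end of the range mentioned on the slide.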

40 Results (obtained on different domain)

41 Ontology mapping using background knowledge Case study 2
Work with Heiner Stuckenschmidt @ VU

42 Case Study: Map GALEN & Tambis, using UMLS as background knowledge
Select three topics with sufficient overlap:
  Substances
  Structures
  Processes
Define some partial & ad-hoc manual mappings between individual concepts
Represent mappings in C-OWL
Use semantics of C-OWL to verify and complete mappings
  Partial -> complete later
  Ad-hoc -> verify later

43 Case Study
[Diagram: UMLS (medical terminology) is linked by lexical mappings to GALEN (medical ontology) and to Tambis (genetic ontology); verification & derivation on each side; a derived mapping between GALEN and Tambis]
Derived mapping only possible after identity assumption on equal domains
(Animate diagram)

44 Ad hoc mappings: Substances
[Diagram: UMLS, GALEN]
Notice: UMLS has two views vs. GALEN mixed
Notice: mappings high and low in the hierarchy, few in the middle

45 Ad hoc mappings: Substances
[Diagram: UMLS, Tambis]
Notice different grain size: UMLS coarse, Tambis fine

46 Verification of mappings
[Diagram nodes: UMLS:Chemicals = Tambis:Chemical; UMLS:Chemicals_viewed_structurally; UMLS:Chemicals_viewed_functionally; UMLS:enzyme; Tambis:enzyme; edges labelled “?” and “=”]
Either: mapping is wrong, or UMLS classes are non-disjoint

47 Deriving new mappings
[Diagram nodes: UMLS:substance, UMLS:Phenomenon_or_process, UMLS:Chemicals, UMLS:OrganicChemical, Galen:ChemicalSubstance; edges include “=”]

48 Ontology mapping: a way out of the medical tower of Babel?

49 “Conclusions”
Ontology mapping is (still) hard & open
Many different approaches will be required:
  linguistic, structural
  statistical
  semantic
Currently no roadmap theory on what's good for which problems

50 Challenges
roadmap theory
run-time matching
“good-enough” matches
large scale evaluation methodology
hybrid matchers (needs roadmap theory)

51 Ontology mapping: a way out of the medical tower of Babel?

