Semantic Enrichment of Ontology Mappings: Insights and Outlook
Patrick Arnold, Seminar Zingst 2015
Outline
- Introduction
- The STROMA System
- STROMA Strategies
- The SemRep System
- SemRep Quality Augmentation
- Evaluations
- Outlook and Conclusions
Introduction
1. Introduction
Semantic enrichment: determining the relation type of each correspondence within a mapping
- Input: initial mapping
- Output: enriched mapping
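To make input and output concrete, here is a minimal illustrative sketch of a typed correspondence. It is not the actual STROMA data model; the class and field names are assumptions for this example, and the relation types are the ones used throughout this talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Correspondence:
    """One link of a mapping between a source and a target concept."""
    source: str                     # e.g. "cpu"
    target: str                     # e.g. "computer"
    confidence: float = 1.0         # similarity score from the initial match
    relation: Optional[str] = None  # filled in by semantic enrichment:
                                    # "equal", "is-a", "inv. is-a",
                                    # "part-of", "has-a" or "related"

# Input: initial mapping (untyped correspondences)
initial_mapping = [Correspondence("cpu", "computer", 0.8)]

# Output: enriched mapping (same correspondences, now typed)
initial_mapping[0].relation = "part-of"
```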
1. Introduction
Focus: schema/ontology structure and mapping structure
- Concept names, possibly concept paths
- No instance data
- No full ontologies or schemas required
Two general approaches:
- Lexicographic (morphological) analysis
- Background knowledge
1. Introduction – Example 1
Prevent imprecise ontology merging
1. Introduction – Example 2
Recognize incompatible ontologies
1. Introduction – Example 3
Detect the need for transformation functions (only needed for databases)
Example: the source attributes 'First name' and 'Last name' are each part-of the target attribute 'Name', so a concatenate() function is needed to transform the instance data.
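A minimal sketch of what such a transformation function could do when migrating data along two part-of correspondences; the attribute names come from the example above, and concatenate() is modelled here simply as string joining.

```python
def concatenate(*parts: str, sep: str = " ") -> str:
    """Toy stand-in for the concatenate() transformation function."""
    return sep.join(p for p in parts if p)

src_record = {"First name": "Patrick", "Last name": "Arnold"}

# Both source attributes are part-of the target attribute "Name",
# so the instance data must be combined when it is migrated:
trg_record = {"Name": concatenate(src_record["First name"],
                                  src_record["Last name"])}
print(trg_record)   # {'Name': 'Patrick Arnold'}
```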
2. STROMA
2. Semantic Enrichment with STROMA 2.1 Overview
STROMA: mapping enrichment tool to determine the relation types of correspondences
- Iterates through all correspondences
- Comprises several strategies to determine the type
- The type with the highest score becomes the final relation type (see the sketch below)
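A minimal sketch of the "highest score wins" idea: every strategy may vote for a relation type with some score, and the votes are added up. The concrete strategies, weights and scoring used in STROMA differ; the fallback type and the score aggregation here are assumptions for this illustration.

```python
from collections import defaultdict

def enrich(correspondence, strategies):
    """Run every strategy, add up the scores per relation type and
    return the type with the highest total score (illustrative only)."""
    scores = defaultdict(float)
    for strategy in strategies:
        result = strategy(correspondence)   # e.g. ("is-a", 0.8) or None
        if result is not None:
            relation_type, score = result
            scores[relation_type] += score
    if not scores:
        return "equal"                      # assumed fallback type
    return max(scores, key=scores.get)
```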
2. Semantic Enrichment with STROMA 2.2 Architecture / Workflow
(Figure: STROMA architecture and workflow)
2. Semantic Enrichment with STROMA 2.3 Benefits and Drawbacks
Benefits:
- Much faster than classic match approaches
- Completely independent of the initial match system
- Easy to handle
Drawbacks:
- High dependence on the initial mapping ("garbage in, garbage out")
- Initial mappings are biased towards equality correspondences
3. STROMA STRATEGIES
3. STROMA Strategies 3.2 Basic Strategies
Compound Strategy
- Concludes an is-a relation if a compound word matches its head
- Compounds: a very productive means of word formation
- Examples: mountain bike is-a bike; saw tooth ? tooth (the head does not always imply a true is-a relation)
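A small sketch of the compound check for open compounds; STROMA's actual strategy is more elaborate (e.g. closed compounds and further checks), so this is only an illustration of the head-matching idea.

```python
def compound_strategy(source: str, target: str):
    """If one concept is an open compound whose head equals the other
    concept, conclude an is-a relation ('mountain bike' is-a 'bike')."""
    s, t = source.lower().split(), target.lower().split()
    if len(s) > 1 and s[-1] == " ".join(t):   # source is the compound
        return ("is-a", 1.0)
    if len(t) > 1 and t[-1] == " ".join(s):   # target is the compound
        return ("inv. is-a", 1.0)
    return None

print(compound_strategy("Mountain bike", "Bike"))  # ('is-a', 1.0)
print(compound_strategy("Saw tooth", "Tooth"))     # ('is-a', 1.0) - questionable case
```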
3. STROMA Strategies 3.2 Basic Strategies
Background Knowledge Strategy
- Ask a thesaurus or dictionary
- SemRep: integrated repository of lexicographic resources
- Example: the query (cpu, computer, English) against the background knowledge returns the relation 'cpu part-of computer'
3. STROMA Strategies 3.3 Derived Strategies
Itemization Strategy: how to handle itemizations?
- Occur most frequently in product taxonomies
- Examples: 'Laptops and Computers', 'Bikes, Scooters and Motorbikes'
- Approach: remove items from the item sets
- Goal: an empty set, or a set with at most one remaining concept (a sketch of one possible reduction follows)
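The exact reduction and decision rules in STROMA are more involved (e.g. they also use synonym and hyponym knowledge); the sketch below only removes literally shared items and then decides on the remainder, which is an assumed simplification.

```python
import re

def split_items(label: str):
    """'Bikes, Scooters and Motorbikes' -> {'bikes', 'scooters', 'motorbikes'}"""
    return {item.strip().lower()
            for item in re.split(r",|\band\b|/", label) if item.strip()}

def itemization_strategy(source: str, target: str):
    """Reduce both item sets by the items they share, then decide.
    Reduction and decision rules are simplified assumptions."""
    src, trg = split_items(source), split_items(target)
    src_rest, trg_rest = src - trg, trg - src
    if not src_rest and not trg_rest:
        return ("equal", 1.0)        # both sets fully reduced
    if not src_rest:                 # source fully covered by the target items
        return ("is-a", 0.8)
    if not trg_rest:                 # target fully covered by the source items
        return ("inv. is-a", 0.8)
    return None                      # sets could not be reduced enough

print(itemization_strategy("Computers", "Laptops and Computers"))  # ('is-a', 0.8)
```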
3. STROMA Strategies 3.4 Heuristic Strategies
Multiple Linkage
- Conclude an inverse is-a relation if a source node s is connected to several target nodes t1, ..., tn (n ≥ 3)
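A short sketch of this heuristic over a list of (source, target) pairs; the threshold n ≥ 3 is taken from the slide, everything else is illustrative.

```python
from collections import defaultdict

def multiple_linkage(correspondences, n_min: int = 3):
    """If a source concept is linked to at least n_min target concepts,
    type all of its correspondences as inverse is-a (heuristic sketch)."""
    by_source = defaultdict(list)
    for corr in correspondences:          # corr = (source, target)
        by_source[corr[0]].append(corr)
    typed = {}
    for source, corrs in by_source.items():
        if len(corrs) >= n_min:
            for corr in corrs:
                typed[corr] = "inv. is-a"
    return typed
```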
3. STROMA Strategies 3.4 Heuristic Strategies
Word Frequency Comparison
- Compare the ranks of the two concepts in a word frequency table
- Conclude an 'A is-a B' relation if A ranks considerably lower in the table than B, i.e., A is much less frequent than B
- Examples: laptop (#11,028) <is-a> computer (#1,326); vehicle (#2,845) <is-a> car (#243)
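A sketch of the frequency heuristic; the concrete threshold ("considerably lower") and the score are assumptions, and the rank table here is just the two example values from the slide.

```python
def word_frequency_strategy(a: str, b: str, rank: dict, factor: float = 5.0):
    """rank maps a word to its position in a word frequency table
    (1 = most frequent).  If one concept ranks far further down the
    table, assume it is the more specific one.  The factor is an
    assumption for this sketch."""
    ra, rb = rank.get(a.lower()), rank.get(b.lower())
    if ra is None or rb is None:
        return None
    if ra > factor * rb:
        return ("is-a", 0.7)        # a is much rarer -> a is-a b
    if rb > factor * ra:
        return ("inv. is-a", 0.7)   # b is much rarer -> b is-a a
    return None

rank = {"laptop": 11028, "computer": 1326}
print(word_frequency_strategy("laptop", "computer", rank))  # ('is-a', 0.7)
```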
3. STROMA Strategies 3.5 Verification
Comparing concepts alone may lead to false conclusions.
Example 1: False IS-A
3. STROMA Strategies 3.5 Verification
Comparing concepts alone may lead to false conclusions.
Example 2: False EQUAL
3. STROMA Strategies 3.6 Comparison
Strategy Dependencies
3. STROMA Strategies 3.6 Comparison
Strategy-Type Comparison
4. SEMREP SYSTEM
4. SemRep System 4.1 Overview
SemRep: a semantic repository combining different lexicographic resources
- Allows queries across those resources
- Multilingual
- Extensible
- Designed for mapping enrichment
Integrated resources:
- WordNet
- Wikipedia relations
- UMLS (extract)
- OpenThesaurus (German)
4. SemRep System 4.2 Wikipedia Extraction
Extracting semantic concept relations from Wikipedia:
- Parse the definition sentence (first sentence) of each Wikipedia article
- Find the semantic relation pattern(s)
- Extract the concept terms the pattern connects
- Determine the semantic relations
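The real extractor uses a much larger pattern set and linguistic processing to find the head of the defining noun phrase; the following sketch only matches a single crude "X is a Y" pattern and is meant purely as an illustration of the pattern idea.

```python
import re

# Crude pattern: "... is a|an|the <term>" in the definition sentence.
# The real extractor needs the head of the noun phrase (e.g. it must not
# stop at an adjective as in "is an electronic circuit").
DEF_PATTERN = re.compile(r"\b(?:is|are)\s+(?:a|an|the)\s+([A-Za-z-]+)")

def extract_is_a(title: str, first_sentence: str):
    """Return a (title, 'is-a', term) triple from a definition sentence,
    or None if the simple pattern does not match."""
    match = DEF_PATTERN.search(first_sentence)
    if not match:
        return None
    return (title.lower(), "is-a", match.group(1).lower())

print(extract_is_a("Mountain bike",
                   "A mountain bike is a bicycle designed for off-road cycling."))
# -> ('mountain bike', 'is-a', 'bicycle')
```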
4. SemRep System 4.2 Wikipedia Extraction
Benefits:
- The approach can extract millions of valuable semantic relations
Challenges:
- Most Wikipedia articles are about entities, which are irrelevant for the mapping enrichment domain
- Examples: persons, locations, organizations, movies, events, etc.
4. SemRep System 4.3 SemRep Architecture
Implemented as a graph structure
- Concepts: nodes
- Semantic relations: edges
4. SemRep System 4.4 Query Processing
Example query: what is the relation between CPU and PC?
4. SemRep System 4.4 Query Processing
Path types in complex paths: the type of a path A -> B -> C is derived from the edge types x (between A and B) and y (between B and C). Cases considered include homogeneous paths (x = y), a type combined with equal, is-a combined with part-of, and inverse paths; the resulting path types are drawn from equal, is-a, inv. is-a, part-of, has-a, related, or no type (-). A simplified sketch follows.
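The composition rules below are plausible simplifications (homogeneous paths keep their type, 'equal' edges are neutral, mixed or inverse combinations fall back to 'related'); they are assumptions for this sketch, not the exact SemRep composition table.

```python
def combine(x: str, y: str) -> str:
    """Combine the edge types of A-x->B and B-y->C into one path type.
    Simplified, assumed rules -- not the exact SemRep table."""
    if x == "equal":
        return y                 # 'equal' edges are neutral
    if y == "equal":
        return x
    if x == y:
        return x                 # homogeneous path keeps its type
    return "related"             # mixed or inverse combinations: weak signal

def path_type(edge_types):
    """Fold combine() over all edges of a path."""
    result = "equal"
    for t in edge_types:
        result = combine(result, t)
    return result

print(path_type(["equal", "is-a", "is-a"]))   # is-a
print(path_type(["is-a", "inv. is-a"]))       # related
```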
4. SemRep System 4.5 Possibilities of Preprocessing
Preprocessing: Gradual Modifier Removal
- A concept w is an open compound that is not contained in the repository
- Remove its modifiers from left to right
- Iteratively check whether the remaining part w' is contained in the repository
- If so, execute the query using w'
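A short sketch of this preprocessing step; the repository is modelled here as a simple membership test, which is an assumption made for the example.

```python
def gradual_modifier_removal(concept: str, in_repository):
    """Drop modifiers of an open compound from left to right until the
    remainder is contained in the repository (illustrative sketch)."""
    words = concept.lower().split()
    for i in range(len(words)):          # i = number of removed modifiers
        candidate = " ".join(words[i:])
        if in_repository(candidate):
            return candidate             # query SemRep with this term
    return None

repository = {"road bike", "bike"}
print(gradual_modifier_removal("cheap carbon road bike",
                               repository.__contains__))  # 'road bike'
```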
4. SemRep System 4.5 Possibilities of Preprocessing
Preprocessing: Same-Modifier Removal
5. SEMREP QUALITY AUGMENTATION
5. SemRep Quality Augmentation 5.1 Overview
Challenge: quality issues caused by the Wikipedia relations
- Many irrelevant concepts (entities)
- Imprecise or false relations
Concept cleaning:
- Find and remove inappropriate concepts
- Use different filters
5. SemRep Quality Augmentation 5.2 Concept Filtering
Filter 1: Wikipedia Category Filter
- Remove articles that belong to typical entity categories
- Uses more than 400 category patterns to block articles
- Precision: about 99.8 %
- Open question: how to filter 'Leipzig'?
5. SemRep Quality Augmentation 5.2 Concept Filtering
Filter 2: Morphological Analysis
- Check whether each concept is a morphologically valid word of the English language: a sequence of letters (A-Z), dashes and blanks
- Precision: about 100 %
- Example terms: São Paulo, C8H10N4O2, Apple, Casio 9850G, Leipzig, Angela Merkel
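A minimal sketch of this filter as a regular expression over letters, dashes and blanks, applied to the example terms from the slide.

```python
import re

# A concept passes if it consists only of letters A-Z, dashes and blanks.
VALID_WORD = re.compile(r"[A-Za-z]+(?:[ -][A-Za-z]+)*")

def morphologically_valid(concept: str) -> bool:
    return VALID_WORD.fullmatch(concept) is not None

for term in ["São Paulo", "C8H10N4O2", "Apple", "Casio 9850G",
             "Leipzig", "Angela Merkel"]:
    print(term, morphologically_valid(term))
# Filtered out: 'São Paulo' (non A-Z letter), 'C8H10N4O2' and 'Casio 9850G'
# (digits).  'Apple', 'Leipzig' and 'Angela Merkel' pass this test and must
# be caught by the other filters.
```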
5. SemRep Quality Augmentation 5.2 Concept Filtering
Filter 3: Wiktionary Filter
- Remove all concepts that do not appear in Wiktionary
- Wiktionary: a very comprehensive, multilingual dictionary similar to Wikipedia (4,000,000+ entries)
- More restrictive filter
- Precision: about 97 %
- Drawback: many terms from the biological/biomedical domain are blocked
5. SemRep Quality Augmentation 5.2 Concept Filtering
Filter 4: Universal Concept Filter (UCF)
- Remove 'universal' concepts
- Manually created list of approx. 80 concepts
- Examples: abstraction, activity, entity, event, process, thing
- Useful for all resources
5. SemRep Quality Augmentation 5.3 Concept Filtering – Evaluation
Reduction of the Wikipedia data set:

Filter            | #Relations | #Concepts | Remaining Rel. | Remaining Con.
Original          | 12,519,916 | 4,386,119 | 100 %          | 100 %
Category Filter   |  4,693,514 | 1,477,462 | 37.5 %         | 33.7 %
Morph. Analysis   |  2,843,428 | 1,051,170 | 22.8 %         | 24.0 %
Wiktionary Filter |  1,489,577 |   548,685 | 11.9 %         | 12.5 %
UCF               |  1,488,784 |   548,610 | 11.9 %         | 12.5 %
5. SemRep Quality Augmentation 5.4 Relation Filtering
Challenge: how to discover false relations in the Wikipedia data set?
Approach: formulate search queries and check how many results are returned by a search engine
Example question: is 'door' part of a 'house'?
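A sketch of the idea of validating a candidate relation via search hit counts. The helper count_search_hits is a hypothetical placeholder for a search engine API (any real API and its rate limits are outside this sketch), and the phrase patterns and threshold are assumptions.

```python
# Hypothetical helper: number of results a web search engine reports for
# the quoted phrase.  Intentionally left unimplemented here.
def count_search_hits(phrase: str) -> int:
    raise NotImplementedError

PART_OF_PHRASES = ['"{a} of a {b}"', '"{a} of the {b}"', '"{b} has a {a}"']

def looks_like_part_of(a: str, b: str, min_hits: int = 1000) -> bool:
    """Check a candidate relation 'a part-of b' (e.g. door part-of house)
    by counting hits for typical phrasings; the threshold is an assumption."""
    hits = sum(count_search_hits(p.format(a=a, b=b)) for p in PART_OF_PHRASES)
    return hits >= min_hits

# looks_like_part_of("door", "house")  -> would issue 3 search queries
```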
5. SemRep Quality Augmentation 5.4 Relation Filtering
Problems:
- The approach takes about one year: 1.5 million relations, 4 types to be checked, about 10 different expressions per type, and roughly 0.5 seconds per search query (about 60 million queries in total)
- Search engines are very restrictive, e.g., the Google API allows only 100 queries per day
- The approach has a very poor precision: even with Google, only about 65 % of the correct relations are confirmed, so about 35 % of all correct relations are removed
6. EVALUATIONS
6. Evaluations 6.1 Evaluation of Semantic Mappings
How to evaluate a semantic mapping?
- A correspondence can be correct or false
- A correct correspondence can be correctly typed or not
Two scenarios:
- We have a perfect (complete and correct) mapping
- We have a non-perfect mapping: some correspondences are missing, some are false
6. Evaluations 6.1 Evaluation of Semantic Mappings
Using perfect mappings:
- Suitable to gauge the overall quality of type determination
- Here, recall = precision = F-measure (r = p = f)
Using authentic (real) mappings:
- Suitable to gauge the overall process of matching + enrichment
6. Evaluations 6.1 Evaluation of Semantic Mappings
Two ways of measurement:
- Effective recall / precision
- Strict recall / precision
The measures distinguish correspondences contained in both B and M (benchmark and computed mapping), whose type can be correct or false, from correspondences only in M (M \ B, lowering precision) and only in B (B \ M, lowering recall).
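One plausible reading of the two measures, sketched below: the strict variants count a correspondence as correct only if it is in both mappings and carries the right type, while the effective variants judge the type determination only on the shared correspondences. These formulas are assumptions for the sketch, not necessarily the exact definitions used in the evaluation.

```python
def evaluate(benchmark: dict, mapping: dict):
    """benchmark (B) and mapping (M) map (source, target) pairs to relation
    types.  Correspondences only in M hurt precision, correspondences only
    in B hurt recall, and shared correspondences can still carry a wrong
    type.  The strict/effective formulas are an assumed reading."""
    shared        = set(benchmark) & set(mapping)
    typed_correct = {c for c in shared if benchmark[c] == mapping[c]}

    strict_precision = len(typed_correct) / len(mapping)   if mapping   else 0.0
    strict_recall    = len(typed_correct) / len(benchmark) if benchmark else 0.0
    # 'effective' variants ignore pure match errors and judge the type
    # determination only on the correspondences found in both B and M
    effective = len(typed_correct) / len(shared) if shared else 0.0
    return strict_precision, strict_recall, effective
```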
6. Evaluations 6.2 SemRep Overview
Applying all filters, SemRep currently consists of:
- 5.59 million relations
- 2.73 million concepts
- Average node degree: 2.05
- 32.4 % of all nodes have a degree of 1
- Max degree: Person (14,975)

Resource             | Relations | Concepts
WordNet              | 1,881,346 |   119,895
Wikipedia Main       | 1,488,784 |   548,610
Wikipedia Redirects  | 1,472,117 | 2,117,001
Wikipedia Field Ref. |    72,500 |    66,965
OpenThesaurus        |   614,559 |    58,473
UMLS (Excerpt)       |   109,599 |    72,568
6. Evaluations 6.2 SemRep Overview
Distribution of relation types:
- 46 % equal
- 39 % is-a / inv. is-a
- 15 % has-a / part-of
6. Evaluations 6.3 SemRep Time Performance
Almost linear complexity of the SemRep initialization
- SemRep loads about 300,000 relations/s
- At times, slight variations
6. Evaluations 6.3 SemRep Time Performance
Average execution times for one correspondence:
- Depend on the maximum path length
- Depend on the search mode (first path vs. best paths)
- Increase exponentially with the path length

Case    | P = 2 | P = 3  | P = 4
Best    | 12 ms | 158 ms |   990 ms
Average | 29 ms | 231 ms | 2,638 ms
Worst   | 48 ms | 597 ms | 6,637 ms
6. Evaluations 6.4 Quality
F-measures achieved in several benchmarks (perfect mappings):

Benchmark              | STROMA | SemRep | STROMA + SemRep
Web Directories (DE)   | (63.6) | 61.5   | (64.2)
Diseases               | (65.9) | 73.5   | (74.0)
Text Mining Taxonomies | (81.0) | 68.4   | (79.1)
Furniture              | (60.3) | 63.2   | (69.1)
Groceries              | (79.9) | 49.1   | (74.0)
Clothing               | (87.3) | 69.0   | (86.6)
Literature             | (78.9) | 63.4   | (81.7)
Average                | (73.8) | 64.4   | (75.5)
6. Evaluations 6.4 Quality
Insights:
- If no heuristic strategies are used, SemRep always increased the mapping quality
- If heuristic strategies are used, SemRep can either increase or decrease the mapping quality; the average results still show a slight increase
- Under certain circumstances, heuristic strategies can outperform background knowledge
7. Outlook / Future Work
7. Outlook / Future Work
Future work:
- Making STROMA applicable for GOMMA: semantic enrichment of biological mappings, evaluation / comparison
- Making SemRep and STROMA generally available
- Extracting the full UMLS resource to support SemRep in biomedical mappings
Thank You