Semantic Enrichment of Ontology Mappings: Insights and Outlook
Patrick Arnold
Seminar Zingst 2015
Outline
- Introduction
- The STROMA System
- STROMA Strategies
- The SemRep System
- SemRep Quality Augmentation
- Evaluations
- Outlook and Conclusions
1. Introduction
1. Introduction
Semantic enrichment: determining the relation type of the correspondences within a mapping
- Input: initial mapping
- Output: enriched mapping
1. Introduction
Focus: schema/ontology structure and mapping structure
- Concept names, possibly concept paths
- No instance data
- No ontologies or schemas
Two general approaches:
- Lexicographic (morphological) analysis
- Background knowledge
1. Introduction – Example 1: Prevent imprecise ontology merging
1. Introduction – Example 2: Recognize incompatible ontologies
1. Introduction – Example 3: Detect the need for transformation functions (only needed for databases)
Example: "First name" and "Last name" in the source are each part-of "Name" in the target; a concatenate() function is required.
2. The STROMA System
2. Semantic Enrichment with STROMA – 2.1 Overview
STROMA: a mapping enrichment tool that determines the relation types of correspondences
- Iterates through all correspondences
- Comprises several strategies to determine the type
- The type with the highest score becomes the final relation type (see the sketch below)
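A minimal sketch (in Python) of this decision step; the strategy functions, the weight table and the relation type list are illustrative assumptions, not STROMA's actual implementation:

```python
# Hypothetical sketch: each strategy returns (relation_type, score) or None;
# the relation type with the highest accumulated score wins.
RELATION_TYPES = ["equal", "is-a", "inverse is-a", "part-of", "has-a", "related"]

def determine_type(source, target, strategies, weights=None):
    scores = {t: 0.0 for t in RELATION_TYPES}
    for strategy in strategies:
        result = strategy(source, target)          # e.g. ("is-a", 0.8) or None
        if result is None:
            continue
        rel_type, score = result
        weight = (weights or {}).get(strategy.__name__, 1.0)
        scores[rel_type] += weight * score
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_type if best_score > 0 else "undecided"
```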
2. Semantic Enrichment with STROMA – 2.2 Architecture / Workflow
2. Semantic Enrichment with STROMA – 2.3 Benefits and Drawbacks
Benefits:
- Much faster than classic match approaches
- Completely independent of the initial match system
- Easy to handle
Drawbacks:
- High dependence on the initial mapping ("garbage in – garbage out")
- The initial mapping is biased toward equality correspondences
3. STROMA Strategies
3. STROMA Strategies – 3.2 Basic Strategies: Compound Strategy
- Concludes an is-a relation if a compound is matched with its head
- Compounds are a very productive means of word formation
- Examples: "Mountain bike" is-a "Bike"; but "Saw tooth" is-a "Tooth"?
(see the sketch below)
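A minimal sketch of this head-matching check, assuming simple whitespace tokenization (real compound handling is more involved):

```python
def compound_is_a(source: str, target: str) -> bool:
    """Return True if 'source' is an open compound whose head is 'target',
    e.g. 'Mountain bike' -> 'Bike' suggests an is-a relation."""
    s, t = source.lower().split(), target.lower().split()
    return len(s) > len(t) and s[-len(t):] == t

print(compound_is_a("Mountain bike", "Bike"))  # True  -> candidate is-a relation
print(compound_is_a("Saw tooth", "Tooth"))     # True, but semantically questionable
```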
3. STROMA Strategies – 3.2 Basic Strategies: Background Knowledge Strategy
- Ask a thesaurus or dictionary
- SemRep: integrated repository of lexicographic resources
- Example: the query (cpu, computer, English) against the background knowledge returns "cpu part-of computer"
3. STROMA Strategies – 3.3 Derived Strategies: Itemization Strategy
How to handle itemizations? (most frequent in product taxonomies)
- Examples: "Laptops and Computers"; "Bikes, Scooters and Motorbikes"
- Approach: remove items from the item sets
- Goal: an empty set, or a set with at most one remaining concept
(see the sketch below)
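A minimal sketch of this item-removal idea; is_subsumed() is a hypothetical helper (e.g. backed by the compound strategy or background knowledge), and STROMA's exact removal and decision rules may differ:

```python
import re

def split_items(concept):
    """Split an itemization like 'Bikes, Scooters and Motorbikes' into items."""
    return [p.strip().lower() for p in re.split(r",| and ", concept) if p.strip()]

def itemization_relation(source, target, is_subsumed):
    src, trg = split_items(source), split_items(target)
    # Remove every item that is covered by some item on the other side.
    rest_src = [s for s in src if not any(is_subsumed(s, t) for t in trg)]
    rest_trg = [t for t in trg if not any(is_subsumed(t, s) for s in src)]
    if not rest_src and not rest_trg:
        return "equal"
    if not rest_src:
        return "is-a"           # source fully covered -> source is the narrower set
    if not rest_trg:
        return "inverse is-a"   # target fully covered -> source is the broader set
    return "undecided"

is_subsumed = lambda a, b: a == b            # toy check: equality only
print(itemization_relation("Bikes, Scooters and Motorbikes",
                           "Bikes and Scooters", is_subsumed))   # 'inverse is-a'
```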
3. STROMA Strategies – 3.4 Heuristic Strategies: Multiple Linkage
- Concludes an inverse is-a relation if a source node s is connected to several target nodes t1, ..., tn (n ≥ 3)
3. STROMA Strategies – 3.4 Heuristic Strategies: Word Frequency Comparison
- Compare the ranks of the two concepts in a word frequency table
- Conclude "A is-a B" if A ranks considerably lower in the table (i.e., is much less frequent) than B
- Examples: laptop (#11,028) <is-a> computer (#1,326); vehicle (#2,845) <is-a> car (#243)
(see the sketch below)
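A minimal sketch of this rank comparison; the frequency table and the threshold factor are illustrative assumptions:

```python
def frequency_is_a(a, b, rank, factor=3.0):
    """Suggest 'a is-a b' if a is much rarer (numerically higher rank) than b."""
    ra, rb = rank.get(a.lower()), rank.get(b.lower())
    if ra is None or rb is None:
        return None                  # no evidence
    if ra > factor * rb:
        return ("is-a", 0.5)         # a is considerably rarer than b
    if rb > factor * ra:
        return ("inverse is-a", 0.5)
    return None

rank = {"laptop": 11028, "computer": 1326}
print(frequency_is_a("laptop", "computer", rank))   # ('is-a', 0.5)
```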
3. STROMA Strategies – 3.5 Verification
Comparing the concepts alone may lead to false conclusions.
- Example 1: a false IS-A relation
- Example 2: a false EQUAL relation
3. STROMA Strategies – 3.6 Comparison: Strategy Dependencies
3. STROMA Strategies – 3.6 Comparison: Strategy–Type Comparison
4. The SemRep System
4. SemRep System – 4.1 Overview
SemRep: a semantic repository combining different lexicographic resources
- Allows queries across those resources
- Multilingual
- Extensible
- Designed for mapping enrichment
Integrated resources:
- WordNet
- Wikipedia relations
- UMLS (extract)
- OpenThesaurus (German)
4. SemRep System – 4.2 Wikipedia Extraction
Extracting semantic concept relations from Wikipedia:
1. Parse the definition sentence (first sentence) of each Wikipedia article
2. Find the semantic relation pattern(s)
3. Extract the concept terms the pattern connects
4. Determine the semantic relations
(see the sketch below)
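A minimal sketch of pattern-based extraction from a definition sentence; the single "is a/an" pattern is only illustrative, while the real extraction uses a much larger pattern set and proper linguistic processing:

```python
import re

IS_A_PATTERN = re.compile(
    r"^(?P<subject>[A-Z][\w\s-]*?) (?:is|are) (?:a|an|the) (?P<object>[\w\s-]+?)[.,]"
)

def extract_is_a(definition_sentence):
    """Return a (subject, 'is-a', object) triple or None."""
    m = IS_A_PATTERN.search(definition_sentence)
    if m is None:
        return None
    return (m.group("subject").strip(), "is-a", m.group("object").strip())

print(extract_is_a("A laptop is a small personal computer, designed for mobility."))
# ('A laptop', 'is-a', 'small personal computer') -- determiners would still need stripping
```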
4. SemRep System – 4.2 Wikipedia Extraction: Benefits and Challenges
Benefits:
- The approach can extract millions of valuable semantic relations
Challenges:
- Most Wikipedia articles are about entities, which are irrelevant for the mapping enrichment domain
- Examples: persons, locations, organizations, movies, events, etc.
4. SemRep System – 4.3 SemRep Architecture
Implemented as a graph structure:
- Concepts: nodes
- Semantic relations: edges
4. SemRep System – 4.4 Query Processing
Example: what is the relation between CPU and PC?
4. SemRep System – 4.4 Query Processing: Path types in complex paths
[Table: combining the edge types x and y along a path A → B → C; cases include homogeneous path types, type + equal, is-a + part-of and inverse paths, over the types equal, is-a, inv. is-a, part-of, has-a and related]
(see the sketch below)
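A minimal sketch of query processing over the graph from 4.3: search a path between the two query concepts and combine the edge types along it. The sample graph and the combination rules are simplified assumptions, not SemRep's actual composition table:

```python
from collections import deque

graph = {  # concept -> list of (neighbor, relation type)
    "cpu": [("processor", "equal")],
    "processor": [("cpu", "equal"), ("computer", "part-of")],
    "computer": [("pc", "equal")],
    "pc": [("computer", "equal")],
}

def combine(path_type, edge_type):
    if path_type == edge_type or edge_type == "equal":
        return path_type
    if path_type == "equal":
        return edge_type
    return "related"                 # mixed path types: fall back to 'related'

def query(source, target, max_len=4):
    """Breadth-first search for the first path and its combined relation type."""
    queue, visited = deque([(source, "equal", 0)]), {source}
    while queue:
        node, path_type, depth = queue.popleft()
        if node == target:
            return path_type
        if depth == max_len:
            continue
        for neighbor, edge_type in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, combine(path_type, edge_type), depth + 1))
    return None

print(query("cpu", "pc"))  # 'part-of': cpu -equal- processor -part-of- computer -equal- pc
```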
4. SemRep System – 4.5 Preprocessing: Gradual Modifier Removal
- Applied when a concept w is an open compound that is not contained in the repository
- Remove modifiers from left to right
- Iteratively check whether the remaining part w' is contained in the repository
- If this is the case, execute the query using w'
(see the sketch below)
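A minimal sketch of this preprocessing step, assuming 'known' stands for the set of concepts contained in the repository:

```python
def gradual_modifier_removal(concept, known):
    """Drop leading modifiers one by one until the remainder is a known concept."""
    words = concept.lower().split()
    for i in range(len(words)):
        candidate = " ".join(words[i:])
        if candidate in known:
            return candidate
    return None

known = {"bike", "mountain bike"}
print(gradual_modifier_removal("cheap downhill mountain bike", known))  # 'mountain bike'
```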
4. SemRep System – 4.5 Preprocessing: Same-Modifier Removal
5. SemRep Quality Augmentation
5. SemRep Quality Augmentation – 5.1 Overview
Challenge: quality issues caused by the Wikipedia relations
- Many irrelevant concepts (entities)
- Imprecise or false relations
Concept cleaning:
- Find and remove inappropriate concepts
- Use different filters
5. SemRep Quality Augmentation – 5.2 Concept Filtering: Filter 1 – Wikipedia Category Filter
- Removes articles that belong to typical entity categories
- Uses more than 400 extraction patterns to block articles
- Precision: about 99.8 %
- Open question: how to filter 'Leipzig'?
5. SemRep Quality Augmentation – 5.2 Concept Filtering: Filter 2 – Morphological Analysis
- Checks whether each concept is a morphologically valid word of the English language, i.e., a sequence of letters (A-Z), dashes and blanks
- Precision: about 100 %
- Examples: São Paulo, C8H10N4O2 and Casio 9850G are filtered out, whereas Apple, Leipzig and Angela Merkel pass
(see the sketch below)
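A minimal sketch of the morphological filter as described above: keep only concepts consisting of plain letters, dashes and blanks.

```python
import re

VALID_WORD = re.compile(r"^[A-Za-z]+(?:[ -][A-Za-z]+)*$")

def is_morphologically_valid(concept):
    return bool(VALID_WORD.match(concept))

for c in ["Apple", "Angela Merkel", "São Paulo", "C8H10N4O2", "Casio 9850G"]:
    print(c, is_morphologically_valid(c))
# Apple True, Angela Merkel True, São Paulo False, C8H10N4O2 False, Casio 9850G False
```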
5. SemRep Quality Augmentation – 5.2 Concept Filtering: Filter 3 – Wiktionary Filter
- Removes all concepts that do not appear in Wiktionary
- Wiktionary: a very comprehensive, multilingual dictionary, similar to Wikipedia (4,000,000+ entries)
- More restrictive; precision about 97 %
- Drawback: many terms from the biological / biomedical domain are blocked
5. SemRep Quality Augmentation – 5.2 Concept Filtering: Filter 4 – Universal Concept Filter (UCF)
- Removes 'universal' concepts
- Based on a manually created list of approx. 80 concepts
- Examples: abstraction, activity, entity, event, process, thing
- Useful for all resources
5. SemRep Quality Augmentation – 5.3 Concept Filtering – Evaluation
Reduction of the Wikipedia data set (filters applied in sequence):

Filter              #Relations   #Concepts   Remaining Rel.   Remaining Con.
Original            12,519,916   4,386,119   100 %            100 %
Category Filter      4,693,514   1,477,462   37.5 %           33.7 %
Morph. Analysis      2,843,428   1,051,170   22.8 %           24.0 %
Wiktionary Filter    1,489,577     548,685   11.9 %           12.5 %
UCF                  1,488,784     548,610   11.9 %           12.5 %
5. SemRep Quality Augmentation – 5.4 Relation Filtering
- Challenge: how to discover false relations in the Wikipedia data set?
- Approach: formulate search queries and check how many results a search engine returns
- Example question: is "door" part-of "house"?
5. SemRep Quality Augmentation – 5.4 Relation Filtering: Problems
- The approach would take about 1 year (see the calculation below): 1.5 million relations, 4 types to be checked, about 10 different expressions per type, and about 0.5 seconds per search query
- Search engines are very restrictive: the Google API allows only 100 queries/day
- The approach has very poor precision: even with Google, precision is only about 65 % for correct relations, so 35 % of all correct relations are removed
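For reference, the one-year estimate follows directly from the figures above, assuming the queries run sequentially:

1,500,000 relations × 4 types × 10 expressions/type × 0.5 s/query = 30,000,000 s ≈ 347 days ≈ 1 year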
6. Evaluations
6. Evaluations – 6.1 Evaluation of Semantic Mappings
How to evaluate a semantic mapping?
- A correspondence can be correct or false
- A correct correspondence can be correctly typed or not
There are two scenarios:
- A perfect (complete and correct) mapping
- A non-perfect mapping: some correspondences are missing, some correspondences are false
6. Evaluations – 6.1 Evaluation of Semantic Mappings (continued)
- Using perfect mappings: suitable to gauge the overall quality of type determination; here r = p = f (recall, precision and F-measure coincide)
- Using authentic (real) mappings: suitable to gauge the overall process of matching + enrichment
6. Evaluations – 6.1 Evaluation of Semantic Mappings: Two ways of measurement
- Effective recall / precision
- Strict recall / precision
[Table: how correspondences in B ∩ M, M \ B and B \ M with correct or false types affect the scores; correspondences outside the benchmark lower precision, missing correspondences lower recall]
(see the formulas below)
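As a point of reference, the usual definitions relative to a benchmark mapping B and a computed mapping M are sketched below; the exact strict vs. effective variants may count incorrectly typed correspondences differently:

p = |{c ∈ M ∩ B : type(c) correct}| / |M|
r = |{c ∈ M ∩ B : type(c) correct}| / |B|
f = 2 · p · r / (p + r)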
6. Evaluations – 6.2 SemRep Overview
Applying all filters, SemRep currently consists of:
- 5.59 million relations
- 2.73 million concepts
- Average node degree: 2.05
- 32.4 % of all nodes have a degree of 1
- Max degree: "Person" (14,975)

Resource               Relations    Concepts
WordNet                1,881,346      119,895
Wikipedia Main         1,488,784      548,610
Wikipedia Redirects    1,472,117    2,117,001
Wikipedia Field Ref.      72,500       66,965
OpenThesaurus            614,559       58,473
UMLS (Excerpt)           109,599       72,568
6. Evaluations – 6.2 SemRep Overview: Distribution of Relation Types
- 46 % equal
- 39 % is-a / inv. is-a
- 15 % has-a / part-of
6. Evaluations – 6.3 SemRep Time Performance
- Almost linear complexity of the SemRep initialization
- SemRep loads about 300,000 relations/s
- At times, slight variations
6. Evaluations – 6.3 SemRep Time Performance (continued)
Average execution times for one correspondence:
- Depend on the maximum path length P
- Depend on the search mode (first path vs. best paths)
- Increase exponentially with the path length

Case       P = 2    P = 3     P = 4
Best       12 ms    158 ms      990 ms
Average    29 ms    231 ms    2,638 ms
Worst      48 ms    597 ms    6,637 ms
6. Evaluations – 6.4 Quality
F-measures achieved in several benchmarks (perfect mappings):

Benchmark                 STROMA        SemRep   STROMA + SemRep
Web Directories (DE)      63.6 (63.6)   61.5     64.2 (64.2)
Diseases                  65.9 (65.9)   73.5     74.0 (74.0)
Text Mining Taxonomies    18.9 (81.0)   68.4     70.6 (79.1)
Furniture                 60.3 (60.3)   63.2     69.1 (69.1)
Groceries                 29.6 (79.9)   49.1     52.6 (74.0)
Clothing                  56.3 (87.3)   69.0     82.4 (86.6)
Literature                66.2 (78.9)   63.4     78.9 (81.7)
Average                   51.5 (73.8)   64.4     70.3 (75.5)
6. Evaluations – 6.4 Quality: Insights
- If no heuristic strategies are used, SemRep always increased the mapping quality
- If heuristic strategies are used, SemRep can both increase and decrease the mapping quality; the average results show a slight increase, though
- Under certain circumstances, heuristic strategies can outperform background knowledge
7. Outlook / Future Work
7. Outlook / Future Work
- Making STROMA applicable for GOMMA: semantic enrichment of biological mappings, evaluation / comparison
- Making SemRep and STROMA generally available
- Extracting the full UMLS resource to support SemRep in biomedical-specific mappings
Thank You