Semantic Enrichment of Ontology Mappings: Insights and Outlook
Patrick Arnold
Seminar Zingst 2015

Outline
- Introduction
- The STROMA System
- STROMA Strategies
- The SemRep System
- SemRep Quality Augmentation
- Evaluations
- Outlook and Conclusions

1. Introduction

1. Introduction
- Semantic enrichment: determining the relation type of correspondences within a mapping
- Input: initial mapping
- Output: enriched mapping (see the sketch below)
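
As a minimal sketch (hypothetical types, not STROMA's actual data model), enrichment takes type-less correspondences and fills in a relation type:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Correspondence:
    source: str                      # concept (or concept path) in the source ontology
    target: str                      # concept (or concept path) in the target ontology
    rel_type: Optional[str] = None   # filled in by semantic enrichment

# Initial mapping: correspondences without relation types
mapping = [Correspondence("Mountain bike", "Bike"),
           Correspondence("CPU", "Computer")]

# Enriched mapping: each correspondence now carries a type such as
# "equal", "is-a", "inv. is-a", "part-of", "has-a" or "related"
mapping[0].rel_type = "is-a"
mapping[1].rel_type = "part-of"
```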

1. Introduction
- Focus: schema/ontology structure, mapping structure
  - Concept names, possibly concept paths
  - No instance data
  - No full ontologies or schemas
- Two general approaches:
  - Lexicographic (morphological) analysis
  - Background knowledge

1. Introduction – Example 1
- Prevent imprecise ontology merging

1. Introduction – Example 2
- Recognize incompatible ontologies

1. Introduction – Example 3
- Detect the need for transformation functions (only needed for databases)
- [Figure: source concepts "First name" and "Last name" are each part-of the target concept "Name"; a concatenate() transformation resolves the correspondence]

2. STROMA

2. Semantic Enrichment with STROMA
2.1 Overview
- STROMA: mapping enrichment tool that determines the relation types of correspondences
- Iterates through all correspondences
- Comprises several strategies to determine the type
- The type with the highest score becomes the final relation type

2. Semantic Enrichment with STROMA
2.2 Architecture / Workflow

2. Semantic Enrichment with STROMA
2.3 Benefits and Drawbacks
- Benefits:
  - Much faster than classic match approaches
  - Completely independent of the initial match system
  - Easy to handle
- Drawbacks:
  - High dependence on the initial mapping ("garbage in, garbage out")
  - Initial mappings are biased toward equality correspondences

3. STROMA Strategies

3. STROMA Strategies
3.2 Basic Strategies: Compound Strategy
- Concludes an is-a relation if one concept is a compound whose head matches the other concept
- Compounds: a very productive means of word formation
- Examples: "Mountain bike" is-a "Bike"; but "Saw tooth" is-a "Tooth"? (questionable; see the sketch below)
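
A minimal, token-based sketch of this check; the actual strategy also handles closed compounds (e.g. "Mountainbike") and further word-formation cases, which this sketch ignores:

```python
def compound_is_a(source: str, target: str) -> bool:
    """Conclude 'source is-a target' if the target is the head
    (the last token) of the source compound."""
    tokens = source.lower().split()
    return len(tokens) > 1 and tokens[-1] == target.lower()

print(compound_is_a("Mountain bike", "Bike"))  # True: mountain bike is-a bike
print(compound_is_a("Saw tooth", "Tooth"))     # True, although semantically doubtful
```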

3. STROMA Strategies
3.2 Basic Strategies: Background Knowledge Strategy
- Ask a thesaurus or dictionary
- SemRep: integrated repository of lexicographic resources
- [Figure: the query (cpu, computer, English) against the background knowledge returns "cpu part-of computer"]

3. STROMA Strategies
3.3 Derived Strategies: Itemization Strategy
- How to handle itemizations? They occur most frequently in product taxonomies
- Examples: "Laptops and Computers"; "Bikes, Scooters and Motorbikes"
- Approach: remove matching items from the item sets
- Goal: an empty set, or a set with at most one remaining concept (see the sketch below)
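
A simplified sketch of the item-set reduction, assuming exact string matches; the real strategy additionally resolves leftover items via synonyms and compounds:

```python
import re

def itemization_type(source: str, target: str) -> str:
    """Sketch of the itemization strategy: split the item lists, remove
    the items shared by both sides, and derive the type from the rest."""
    split = lambda s: {t.strip().lower() for t in re.split(r",|\band\b", s)}
    src, trg = split(source), split(target)
    src_rest, trg_rest = src - trg, trg - src
    if not src_rest and not trg_rest:
        return "equal"        # both item sets cancel out completely
    if not src_rest:
        return "is-a"         # all source items are covered by the target
    if not trg_rest:
        return "inv. is-a"    # all target items are covered by the source
    return "undecided"        # leftovers require the other strategies

print(itemization_type("Bikes", "Bikes, Scooters and Motorbikes"))  # is-a
```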

3. STROMA Strategies
3.4 Heuristic Strategies: Multiple Linkage
- Conclude an inverse is-a relation if a source node s is connected to several target nodes t1, ..., tn (n ≥ 3)
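
A small sketch of this heuristic over a list of (source, target) correspondences:

```python
from collections import Counter

def multiple_linkage(correspondences, n=3):
    """Sketch: if a source node is linked to at least n target nodes,
    conclude inverse is-a (the source subsumes the targets)."""
    fan_out = Counter(src for src, _ in correspondences)
    return {src: "inv. is-a" for src, count in fan_out.items() if count >= n}

pairs = [("Vehicles", "Car"), ("Vehicles", "Bike"), ("Vehicles", "Truck")]
print(multiple_linkage(pairs))  # {'Vehicles': 'inv. is-a'}
```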

3. STROMA Strategies
3.4 Heuristic Strategies: Word Frequency Comparison
- Compare the ranks of two concepts in a word frequency table
- Conclude A is-a B if A is ranked considerably lower (i.e., is considerably less frequent) than B
- Examples:
  - laptop (#11,028) <is-a> computer (#1,326)
  - vehicle (#2,845) <is-a> car (#243)
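
A sketch of the comparison; the threshold factor is an assumed parameter, not a documented STROMA value:

```python
def frequency_is_a(rank_a: int, rank_b: int, factor: float = 5.0):
    """Sketch: conclude A is-a B when A's rank number is considerably
    larger (A is less frequent) than B's. The factor is an assumption."""
    if rank_a > factor * rank_b:
        return "is-a"
    if rank_b > factor * rank_a:
        return "inv. is-a"
    return None

print(frequency_is_a(11_028, 1_326))  # is-a: laptop is-a computer
print(frequency_is_a(2_845, 243))     # is-a: vehicle is-a car (a failure case)
```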

3. STROMA Strategies
3.5 Verification
- Comparing concepts alone may lead to false conclusions
- Example 1: false IS-A

3. STROMA Strategies
3.5 Verification
- Comparing concepts alone may lead to false conclusions
- Example 2: false EQUAL

3. STROMA Strategies
3.6 Comparison: Strategy Dependencies

3. STROMA Strategies
3.6 Comparison: Strategy-Type Comparison

4. SemRep System

4. SemRep System
4.1 Overview
- SemRep: semantic repository combining different lexicographic resources
- Allows queries across those resources
- Multilingual and extensible
- Designed for mapping enrichment
- Integrated resources: WordNet, Wikipedia relations, UMLS (extract), OpenThesaurus (German)

4. SemRep System
4.2 Wikipedia Extraction
- Extracting semantic concept relations from Wikipedia
- Parse the definition sentence (first sentence) of each Wikipedia article
- Find the semantic relation pattern(s)
- Extract the concept terms the pattern connects
- Determine the semantic relations (see the sketch below)
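
A hedged sketch of pattern-based extraction with two hand-written patterns; the real extractor uses a much larger pattern catalogue and reduces the extracted right-hand term to its head concept:

```python
import re

PATTERNS = [
    (re.compile(r"^(?:the |an? )?(?P<a>[\w\- ]+?) is (?:a |an )?part of "
                r"(?:the |an? )?(?P<b>[\w\- ]+?)[.,]", re.I), "part-of"),
    (re.compile(r"^(?:the |an? )?(?P<a>[\w\- ]+?) is an? "
                r"(?P<b>[\w\- ]+?)[.,]", re.I), "is-a"),
]

def extract_relation(definition_sentence: str):
    """Match an article's first sentence against the relation patterns."""
    for pattern, rel in PATTERNS:
        m = pattern.match(definition_sentence)
        if m:
            return m.group("a"), rel, m.group("b")
    return None

print(extract_relation("A mountain bike is a bicycle designed for off-road cycling."))
# ('mountain bike', 'is-a', 'bicycle designed for off-road cycling')
# A real extractor would further reduce the right-hand side to 'bicycle'.
```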

4. SemRep System
4.2 Wikipedia Extraction
- Benefits: the approach can extract millions of valuable semantic relations
- Challenges: most Wikipedia articles are about entities
  - Irrelevant for the mapping enrichment domain
  - Examples: persons, locations, organizations, movies, events, etc.

4. SemRep System
4.3 SemRep Architecture
- Implemented as a graph structure
- Concepts: nodes
- Semantic relations: edges
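
A minimal sketch of such a graph, storing each relation together with its inverse so that paths can be followed in both directions:

```python
from collections import defaultdict

class SemGraph:
    """Sketch of the repository graph: concepts as nodes, typed
    semantic relations as directed, labelled edges."""
    def __init__(self):
        self.edges = defaultdict(list)   # node -> [(neighbor, rel_type)]

    def add(self, a: str, rel: str, b: str, inverse: str):
        self.edges[a].append((b, rel))
        self.edges[b].append((a, inverse))  # also store the inverse edge

g = SemGraph()
g.add("cpu", "part-of", "computer", inverse="has-a")
g.add("laptop", "is-a", "computer", inverse="inv. is-a")
print(g.edges["computer"])  # [('cpu', 'has-a'), ('laptop', 'inv. is-a')]
```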

4. SemRep System
4.4 Query Processing
- Example: what is the relation between CPU and PC? (see the path-search sketch below)
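
A sketch of query answering as a bounded breadth-first path search; the edge data is an illustrative toy graph, not SemRep's contents:

```python
from collections import deque

# Toy edge list: node -> [(neighbor, relation type)]
EDGES = {
    "cpu": [("processor", "equal")],
    "processor": [("computer", "part-of")],
    "computer": [("pc", "equal")],
}

def find_path(start: str, goal: str, max_len: int = 3):
    """Breadth-first search for a relation path between two concepts."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) < max_len:
            for nxt, rel in EDGES.get(node, []):
                queue.append((nxt, path + [rel]))
    return None

print(find_path("cpu", "pc"))  # ['equal', 'part-of', 'equal']
# Combining these types (next slide) yields: cpu part-of pc
```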

4. SemRep System
4.4 Query Processing
- Path types in complex paths A -x- B -y- C: the relation types x and y along a path must be combined into one overall type
- [Table: combination cases include homogeneous paths (x = y), a type combined with equal, is-a combined with part-of, and inverse paths, over the types equal, is-a, inv. is-a, part-of, has-a, and related]
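
A hedged sketch of a type-combination function; these specific rules are illustrative assumptions rather than SemRep's exact combination table:

```python
from functools import reduce

def combine(t1: str, t2: str) -> str:
    """Combine two relation types along a path (illustrative rules)."""
    if t1 == "equal":
        return t2                  # equality is neutral
    if t2 == "equal":
        return t1
    if t1 == t2:
        return t1                  # a homogeneous path keeps its type
    if {t1, t2} == {"is-a", "part-of"}:
        return "part-of"           # assumed collapse rule for mixed paths
    return "related"               # otherwise only a weak relation remains

print(reduce(combine, ["equal", "part-of", "equal"]))  # part-of
```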

4. SemRep System
4.5 Possibilities of Preprocessing
- Preprocessing: gradual modifier removal
- A concept w is an open compound that is not contained in the repository
- Remove modifiers from left to right
- Iteratively check whether the remaining part w' is contained in the repository
- If so, execute the query using w'
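
A short sketch of gradual modifier removal against a toy repository:

```python
def gradual_modifier_removal(term: str, repository: set):
    """Sketch: drop modifiers from the left until the remainder is a
    known repository concept, then query with that remainder."""
    tokens = term.lower().split()
    for i in range(len(tokens)):
        candidate = " ".join(tokens[i:])
        if candidate in repository:
            return candidate
    return None

repo = {"bike", "racing bike"}
print(gradual_modifier_removal("vintage racing bike", repo))  # 'racing bike'
```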

4. SemRep System
4.5 Possibilities of Preprocessing
- Preprocessing: same-modifier removal

5. SemRep Quality Augmentation

5. SemRep Quality Augmentation
5.1 Overview
- Challenge: quality issues caused by Wikipedia relations
  - Many irrelevant concepts (entities)
  - Imprecise or false relations
- Concept cleaning: find and remove inappropriate concepts using different filters

5. SemRep Quality Augmentation
5.2 Concept Filtering
- Filter 1: Wikipedia category filter
  - Remove articles that are in typical entity categories
  - Uses more than 400 blocking patterns on category names
  - Precision: about 99.8 %
  - Example: how to filter 'Leipzig'? (see the sketch below)
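
A sketch of category-based blocking with a few hypothetical patterns (the real filter uses over 400); 'Leipzig' would be caught via categories like "Cities in Saxony":

```python
# Hypothetical blocking patterns for typical entity categories
BLOCKED_CATEGORY_PATTERNS = ["births", "deaths", "cities in", "people from",
                             "films", "companies", "albums"]

def is_entity_article(categories):
    """Block an article if any of its categories matches a typical
    entity-category pattern (persons, locations, works, ...)."""
    return any(p in c.lower() for c in categories
                              for p in BLOCKED_CATEGORY_PATTERNS)

print(is_entity_article(["Cities in Saxony", "University towns"]))  # True
print(is_entity_article(["Bicycle types"]))                         # False
```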

5. SemRep Quality Augmentation
5.2 Concept Filtering
- Filter 2: morphological analysis
  - Check whether each concept is a morphologically valid word of the English language
  - Valid: sequences of letters (A-Z), dashes and blanks
  - Precision: about 100 %
  - Examples: "São Paulo", "C8H10N4O2" and "Casio 9850G" are blocked; "Apple", "Leipzig" and "Angela Merkel" pass
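
The validity rule from the slide translates into a simple regular expression; note that morphologically valid entity names such as "Leipzig" still pass and must be caught by the other filters:

```python
import re

# Valid: sequences of Latin letters separated by dashes or blanks
VALID = re.compile(r"^[A-Za-z]+(?:[ -][A-Za-z]+)*$")

for term in ["Apple", "Leipzig", "São Paulo", "C8H10N4O2",
             "Casio 9850G", "Angela Merkel"]:
    print(term, bool(VALID.match(term)))
# Only 'Apple', 'Leipzig' and 'Angela Merkel' pass ('ã' and digits fail)
```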

5. SemRep Quality Augmentation
5.2 Concept Filtering
- Filter 3: Wiktionary filter
  - Remove all concepts that do not appear in Wiktionary
  - Wiktionary: very comprehensive, multilingual dictionary, similar to Wikipedia (4,000,000+ entries)
  - More restrictive; precision about 97 %
  - Drawback: many terms from the biological/biomedical domain are blocked

5. SemRep Quality Augmentation
5.2 Concept Filtering
- Filter 4: Universal Concept Filter (UCF)
  - Remove 'universal' concepts
  - Manually created list of approx. 80 concepts
  - Examples: abstraction, activity, entity, event, process, thing
  - Useful for all resources

5. SemRep Quality Augmentation
5.3 Concept Filtering – Evaluation
Reduction of the Wikipedia data set:

                      #Relations   #Concepts   Remaining Rel.   Remaining Con.
  Original            12,519,916   4,386,119   100 %            100 %
  Category Filter      4,693,514   1,477,462   37.5 %           33.7 %
  Morph. Analysis      2,843,428   1,051,170   22.8 %           24.0 %
  Wiktionary Filter    1,489,577     548,685   11.9 %           12.5 %
  UCF                  1,488,784     548,610   11.9 %           12.5 %

5. SemRep Quality Augmentation
5.4 Relation Filtering
- Challenge: how to discover false relations in the Wikipedia data set?
- Approach: formulate search queries and check how many results a search engine returns (see the sketch below)
- Question: is a door part of a house?
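
A sketch of how probe queries could be generated for a candidate relation; the phrasings are assumptions, and no real search API is called here:

```python
# Assumed probe phrasings per relation type; hit counts of these queries
# would then be compared against a threshold to keep or drop a relation.
PROBES = {
    "part-of": ["a {a} is part of a {b}", "the {a} of a {b}"],
    "is-a":    ["a {a} is a {b}", "{b}s such as {a}s"],
}

def probe_queries(a: str, b: str, rel: str):
    return [p.format(a=a, b=b) for p in PROBES[rel]]

print(probe_queries("door", "house", "part-of"))
# ['a door is part of a house', 'the door of a house']
```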

5. SemRep Quality Augmentation
5.4 Relation Filtering
- Problems:
  - The approach would take about 1 year: 1.5 million relations × 4 types to check × about 10 expressions per type × 0.5 s per query ≈ 3 × 10^7 s
  - Search engines are very restrictive: the Google API allows 100 queries/day
  - The approach has very poor precision: even with Google, only about 65 % of correct relations are confirmed, so 35 % of all correct relations are removed

6. Evaluations

6. Evaluations
6.1 Evaluation of Semantic Mappings
- How to evaluate a semantic mapping?
  - A correspondence can be correct or false
  - A correct correspondence can be correctly typed or not
- Two scenarios:
  - A perfect (complete and correct) mapping
  - A non-perfect mapping: some correspondences are missing, some are false

6. Evaluations
6.1 Evaluation of Semantic Mappings
- Using perfect mappings: suitable to gauge the overall quality of type determination
  - Since every correspondence is present and correct, recall, precision and F-measure coincide: r = p = f
- Using authentic (real) mappings: suitable to gauge the overall process of matching + enrichment

6. Evaluations
6.1 Evaluation of Semantic Mappings
- Two ways of measurement: effective recall/precision and strict recall/precision
- [Table: correspondences found in both benchmark and mapping, only in the mapping (M \ B, lowering precision), or only in the benchmark (B \ M, lowering recall), each judged by whether the relation type is correct or false]

6. Evaluations
6.2 SemRep Overview
- Applying all filters, SemRep currently consists of...
  - 5.59 million relations, 2.73 million concepts
  - Average node degree: 2.05; 32.4 % of all nodes have a degree of 1
  - Max degree: Person (14,975)

  Resource               Relations    Concepts
  WordNet                1,881,346     119,895
  Wikipedia Main         1,488,784     548,610
  Wikipedia Redirects    1,472,117   2,117,001
  Wikipedia Field Ref.      72,500      66,965
  OpenThesaurus            614,559      58,473
  UMLS (Excerpt)           109,599      72,568

6. Evaluations
6.2 SemRep Overview
- Distribution of relation types:
  - 46 % equal
  - 39 % is-a / inv. is-a
  - 15 % has-a / part-of

6. Evaluations
6.3 SemRep Time Performance
- Almost linear complexity of SemRep initialization
- SemRep loads about 300,000 relations/s, with slight variations at times

6. Evaluations
6.3 SemRep Time Performance
- Average execution times for one correspondence
- Depend on the maximum path length P and on the search mode (first path vs. best paths)
- Increase exponentially with the path length

            P = 2   P = 3    P = 4
  Best      12 ms   158 ms     990 ms
  Average   29 ms   231 ms   2,638 ms
  Worst     48 ms   597 ms   6,637 ms

6. Evaluations
6.4 Quality
F-measures achieved in several benchmarks (perfect mappings):

                            STROMA        SemRep   STROMA + SemRep
  Web Directories (DE)      63.6 (63.6)   61.5     64.2 (64.2)
  Diseases                  65.9 (65.9)   73.5     74.0 (74.0)
  Text Mining Taxonomies    18.9 (81.0)   68.4     70.6 (79.1)
  Furniture                 60.3 (60.3)   63.2     69.1 (69.1)
  Groceries                 29.6 (79.9)   49.1     52.6 (74.0)
  Clothing                  56.3 (87.3)   69.0     82.4 (86.6)
  Literature                66.2 (78.9)   63.4     78.9 (81.7)
  Average                   51.5 (73.8)   64.4     70.3 (75.5)

6. Evaluations
6.4 Quality
- Insights:
  - If no heuristic strategies are used, SemRep always increased the mapping quality
  - If heuristic strategies are used, SemRep can either increase or decrease the mapping quality; the average results show a slight increase, though
  - Under certain circumstances, heuristic strategies can outperform background knowledge

7. Outlook / Future Work

7. Outlook / Future Work
- Making STROMA applicable for GOMMA
  - Semantic enrichment of biological mappings
  - Evaluation / comparison
- Making SemRep and STROMA generally available
- Extracting the full UMLS resource
  - Supporting SemRep in biomedical-specific mappings

Thank You