Block Matching for Ontologies
Wei Hu and Yuzhong Qu
School of Computer Science and Engineering, Southeast University, P.R. China
2/4/2019 XObjects Group - Southeast University
Outline
- Introduction
- Overview of the Approach
- Relatedness among Domain Entities
- Partitioning for Block Matching
- Evaluation
- Related Work
- Concluding Remarks
Introduction
- Ontology matching
  - Enables interoperability among different but related ontologies
  - In practice, establishes mappings between domain entities
- Block matching
  - The usual relationship cardinality of mappings is 1:1. However, mappings between sets of domain entities are more pervasive.
  - A block is a set of domain entities.
  - A block mapping is a pair of matched blocks from different ontologies.
  - Block matching is the process of discovering block mappings.
Introduction - Examples
- From a microscopic point of view
  - Given two ontologies O1 and O2, O1 contains three domain entities Month, Day, and Year, while O2 contains a single domain entity Date.
  - It is more natural to match the block {Month, Day, Year} in O1 with the block {Date} in O2.
- From a macroscopic point of view
  - Block matching provides a higher-level picture of the correspondences between the main topics of the ontologies.
Introduction (Cont'd.)
- The block matching problem is a special partitioning problem.
  - All the block mappings together compose a partitioning of all the domain entities from the two given ontologies.
  - The partitioning quality: cohesiveness and coupling
  - In addition, the mapping quality is inherently difficult to guarantee.
- At present, most algorithms proposed in the literature aim to find 1:1 mappings.
- One exception: PBM
  - It only copes with mappings between classes, so it is not a general solution.
  - Its mapping quality is not good enough for complicated ontologies.
Introduction – Our Approach
We propose a new partitioning-based approach to address the block matching problem.
- The relatedness measure: virtual documents
  - Novelty: both the mapping quality and the partitioning quality can be guaranteed simultaneously.
- The partitioning algorithm: a hierarchical bisection algorithm
  - Novelty: it provides block mappings at different levels of granularity.
  - Flat partitioning: it extracts the optimal mappings for a given number of block mappings.
Overview of the Approach
Our approach takes two ontologies as input and, after four processing stages, returns block mappings as output.
1. Constructing virtual documents for domain entities
2. Computing relatedness among domain entities
3. Partitioning by a hierarchical bisection algorithm
4. Extracting the optimal block mappings
A Toy Example
[Figure: the two toy ontologies, Onto1 and Onto2]
Step 1 – Construction of Virtual Documents
- A virtual document is a collection of weighted tokens that reflects the intended meaning of a domain entity.
- The virtual document of a domain entity contains not only its local description but also its neighboring information.
  - Local description: for a literal node, a URIref, or a blank node
  - Neighboring information: subject, predicate, and object neighbors
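A minimal sketch of how such a weighted token bag might be assembled. The helper names, the whitespace tokenizer, and the `neighbor_weight` value of 0.5 are illustrative assumptions, not details given in the slides.

```python
from collections import Counter

def local_description(label):
    """Tokenize a label into lowercase words (a stand-in for real preprocessing)."""
    return Counter(label.lower().split())

def virtual_document(entity, labels, neighbors, neighbor_weight=0.5):
    """Build a weighted token bag: the entity's own tokens at full weight,
    plus the tokens of its neighbors at a reduced (assumed) weight."""
    doc = Counter()
    for token, count in local_description(labels[entity]).items():
        doc[token] += 1.0 * count
    for other in neighbors.get(entity, ()):
        for token, count in local_description(labels[other]).items():
            doc[token] += neighbor_weight * count
    return dict(doc)

# Hypothetical fragment of the toy ontology.
labels = {"Report": "report", "Reference": "reference", "Book": "book"}
neighbors = {"Report": ["Reference"], "Reference": ["Report", "Book"]}

vd_report = virtual_document("Report", labels, neighbors)
vd_reference = virtual_document("Reference", labels, neighbors)
```

Here `vd_report` holds "report" at full weight and "reference" at the reduced neighbor weight, so two neighboring entities end up sharing tokens even when their own labels differ.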
Step 2 – Computation of Relatedness
- The similarity between two virtual documents is the cosine of the angle between their corresponding vectors in the Vector Space Model.
- A relatedness matrix is generated by computing the similarity among virtual documents both within each of the two ontologies and across them.
  - Both linguistic and structural relatedness within each of the two ontologies are reflected in W11 and W22.
  - Linguistic relatedness across the ontologies is characterized by W12.
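The computation can be sketched as follows, assuming virtual documents are stored as token-to-weight dictionaries. The block layout of the matrix (entities of the first ontology first, so that W11 is the top-left block, W12 the top-right, and W22 the bottom-right) is an assumption, and the sample documents are hypothetical.

```python
import math

def cosine(doc_a, doc_b):
    """Cosine similarity between two sparse token-weight vectors."""
    dot = sum(w * doc_b.get(t, 0.0) for t, w in doc_a.items())
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def relatedness_matrix(docs1, docs2):
    """Pairwise cosine similarities over the documents of both ontologies."""
    docs = list(docs1) + list(docs2)
    return [[cosine(a, b) for b in docs] for a in docs]

# Hypothetical virtual documents: two entities in onto1, one in onto2.
docs1 = [
    {"report": 1.0, "reference": 0.5},
    {"reference": 1.0, "report": 0.5, "book": 0.5},
]
docs2 = [{"entry": 1.0, "book": 0.5}]

W = relatedness_matrix(docs1, docs2)
```

In this example, `W[0][1]` belongs to W11, while `W[1][2]` belongs to W12 and is positive only because both documents contain the token "book".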
Illustration by the Toy Example
- VD(onto1:Report)
  - Local description = "report"
  - Des(onto1:Reference) = "reference"
- VD(onto1:Reference)
  - Local description = "reference"
  - Des(onto1:Report) = "report", Des(onto1:Book), Des(onto1:hasInstitution)
- VD(onto2:Entry)
  - Local description = "entry"
  - Des(onto2:Article), Des(onto2:Book), Des(onto2:hasInstitution)
- The relatedness between onto1:Report and onto1:Reference is revealed in the Vector Space Model through the shared words "report" and "reference", which each entity obtains from its neighboring relationship with the other.
- The relatedness between onto1:Reference and onto2:Entry is revealed through the shared words "book" and "institution".
Step 3 – The Hierarchical Bisection Algorithm
- The min-max cut (Mcut) function is adopted as the criterion function.
- Why a hierarchical algorithm?
  - It makes the partitioning of a given domain easy to depict.
  - There may be several correct answers.
- Overview of our partitioning algorithm
  - Input: a relatedness matrix W
  - It recursively bisects a matrix into two submatrices by finding the minimum Mcut.
  - Output: a dendrogram consisting of layers of block mappings
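A minimal sketch of the recursion, using the standard min-max cut definition Mcut(A, B) = cut(A, B)/W(A) + cut(A, B)/W(B), where cut(A, B) sums the weights between the two parts and W(A) sums the weights inside A. The slides do not say how the minimum is found; the exhaustive search below is an assumption that is only practical for very small matrices, and the sample matrix is hypothetical.

```python
from itertools import combinations

def mcut(W, part_a, part_b):
    """Min-max cut value of a bipartition of node indices."""
    cut = sum(W[i][j] for i in part_a for j in part_b)
    within_a = sum(W[i][j] for i in part_a for j in part_a)
    within_b = sum(W[i][j] for i in part_b for j in part_b)
    if within_a == 0 or within_b == 0:
        return float("inf")
    return cut / within_a + cut / within_b

def best_bisection(W, nodes):
    """Try every bipartition (exponential cost) and keep the one with minimum Mcut."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    # `first` always goes to part A; at least one node is left for part B.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            part_a = [first, *extra]
            part_b = [n for n in rest if n not in extra]
            score = mcut(W, part_a, part_b)
            if best is None or score < best[0]:
                best = (score, part_a, part_b)
    return best

def bisect_hierarchy(W, nodes):
    """Recursively bisect; the nested tuples form a simple dendrogram."""
    nodes = sorted(nodes)
    if len(nodes) < 2:
        return nodes
    _, part_a, part_b = best_bisection(W, nodes)
    return (bisect_hierarchy(W, part_a), bisect_hierarchy(W, part_b))

# Hypothetical relatedness matrix with two tightly related pairs of entities.
W = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
dendrogram = bisect_hierarchy(W, range(4))
```

Because the matrix covers the entities of both ontologies, each part produced by a bisection can contain entities from either side, which is what lets a part be read as a block mapping.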
Step 4 – Extraction of the Optimal Block Mappings
- Obtain a flat partitioning with a given number p of block mappings, selected from the dendrogram according to an objective function g.
[Equation: the selection rule and the definition of the objective function g]
Illustration by the Toy Example (Cont'd.)
- The dendrogram for onto1 and onto2 is shown below.
- If 3 block mappings are extracted, the selected ones are those marked √.
[Figure: dendrogram for onto1 and onto2, with three block mappings marked √]
Illustration by the Toy Example (Cont'd.)
Evaluation – Experimental Methodology
- We implemented our approach in Java as a tool called BMO.
- BMO focuses on the domain entities at the conceptual level.
- We evaluate the performance of BMO in three experiments:
  - The mapping quality of BMO
  - The partitioning quality of BMO
  - A comparison of BMO with PBM on both the mapping quality and the partitioning quality
Evaluation – Case Study
Two pairs of ontologies: Russia12 and TourismAB
- Russia12
  - Russia1: 151 classes and 76 properties
  - Russia2: 162 classes and 81 properties
  - 85 reference alignments (1:1)
- TourismAB
  - TourismA: 340 classes and 97 properties
  - TourismB: 474 classes and 100 properties
  - 226 reference alignments (1:1)
Evaluation – Evaluation Metrics
- The mapping quality: observe how the correctness varies with the number of block mappings.
- Rationale: the higher the quality of the block mappings, the more reference alignments can be found within them.
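One way to read this rationale, sketched under an assumption: correctness is taken here as the fraction of 1:1 reference alignments whose two entities fall inside the same block mapping. The slides do not give the exact formula, and the sample data is hypothetical.

```python
def correctness(block_mappings, reference_alignments):
    """Fraction of reference alignments (e1, e2) covered by some block mapping,
    i.e. e1 lies in the first block and e2 in the second block of one mapping."""
    if not reference_alignments:
        return 0.0
    found = sum(
        any(e1 in b1 and e2 in b2 for b1, b2 in block_mappings)
        for e1, e2 in reference_alignments
    )
    return found / len(reference_alignments)

# Hypothetical block mappings and reference alignments.
block_mappings = [
    ({"Month", "Day", "Year"}, {"Date"}),
    ({"Reference", "Report"}, {"Entry"}),
]
reference = [("Month", "Date"), ("Reference", "Entry"), ("Report", "Article")]

score = correctness(block_mappings, reference)
```

Under this reading, fewer and larger block mappings trivially cover more alignments, which is why the metric is observed as the number of block mappings varies rather than at a single setting.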
Evaluation – Evaluation Metrics (Cont'd.)
- The partitioning quality: compare the block mappings computed by BMO with the manual ones set up by volunteers.
  - The f-measure is defined as a combination of precision and recall.
  - The entropy considers the distribution of the domain entities in the block mappings and reflects the overall partitioning quality.
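The slides do not spell out either formula. The sketch below uses the definitions common in clustering evaluation, which is an assumption: each block mapping is treated as one non-empty set of entities, the f-measure takes the best-matching computed block for each manual block, and the entropy is the size-weighted entropy of the manual blocks within each computed block.

```python
import math

def f_measure(computed, manual):
    """Size-weighted best F-score of each manual block against the computed blocks."""
    n = sum(len(m) for m in manual)
    total = 0.0
    for m in manual:
        best = 0.0
        for c in computed:
            overlap = len(m & c)
            if overlap == 0:
                continue
            precision = overlap / len(c)
            recall = overlap / len(m)
            best = max(best, 2 * precision * recall / (precision + recall))
        total += len(m) / n * best
    return total

def entropy(computed, manual):
    """Size-weighted entropy of manual blocks inside each computed block (lower is better)."""
    n = sum(len(c) for c in computed)
    total = 0.0
    for c in computed:
        e = 0.0
        for m in manual:
            p = len(c & m) / len(c)
            if p > 0:
                e -= p * math.log(p)
        total += len(c) / n * e
    return total

# Hypothetical partitionings over four entities.
manual = [{"a", "b"}, {"c", "d"}]
computed = [{"a", "b", "c"}, {"d"}]

f_score = f_measure(computed, manual)
h_score = entropy(computed, manual)
```

A perfect partitioning gives an f-measure of 1 and an entropy of 0 under these definitions; the example above scores lower on the first and higher on the second because one computed block mixes entities from both manual blocks.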
Evaluation – Experimental Results
[Figure: the correctness as the number of block mappings n varies]
[Figure: the partitioning quality of BMO]
Evaluation – Experimental Results (Cont'd.)
The comparison between BMO and PBM
- The partitioning quality of the two approaches is almost the same.
- However, the mapping quality of BMO is much better than that of PBM.
Related Work
- Ontology matching
  - Very few approaches raise the issue of block matching.
  - PBM: only for class hierarchies, and the mapping quality is not good enough
- Schema matching
  - iMap: complex mappings, but the domain knowledge is hard to specify in some cases
  - Artemis: similar to our framework, but the partitioning quality is not as good
- Ontology partitioning
  - Many existing works only provide a flat partitioning of a single ontology.
  - Our work is hierarchical and partitions two ontologies simultaneously.
Concluding Remarks
- We discussed the block matching problem and suggested that both the mapping quality and the partitioning quality should be considered in block matching.
- We proposed a relatedness measure based on virtual documents that simultaneously captures both the linguistic and the structural characteristics of domain entities.
- We presented a hierarchical bisection algorithm that provides block mappings at different levels of granularity. We also described a method to automatically extract the optimal block mappings.
- We set up two kinds of metrics to evaluate the quality of block matching. The experimental results demonstrated that our approach is feasible.
Concluding Remarks – Future Work
- We would like to find other possible approaches to block matching and compare them with each other.
- We look forward to setting up systematic test cases for block matching.
- We plan to address the block matching issue for very large ontologies.
Thanks for your attention! Any comments are welcome!