SEDEX: Scalable Entity Preserving Data Exchange

IEEE Transactions on Knowledge and Data Engineering, 28(7)
Yoones A. Sekhavat¹; Jeffrey Parsons²
¹Tabriz Islamic Art University, ²Memorial University of Newfoundland

Introduction
We study the problem of information integration through data exchange. The task is to generate an instance of a target schema from an instance of a source schema, such that the generated instance adheres to the target schema. Prevailing approaches are based on schema mappings: high-level expressions that describe relationships between database schemas. Schema-mapping-based approaches suffer from two problems:
- Entity fragmentation: information about one entity is spread across several tuples in the target.
- Generalization ambiguity: incorrect mappings result from the different ways the source and the target represent entity type generalization.
We propose and evaluate an approach that addresses these problems by combining schema-level and data-level information.

Overview of Approach
- Preserve source entities in the target, regardless of how the entities are classified.
- Identify entities in the source and find the best relations in the target to host them.
- Measure the similarity between source tuple trees and target relation trees by computing distance functions between trees.
- Use a pay-as-you-go workflow that reuses data generation scripts using pq-grams.

Prior Work vs. Our Approach
Clio introduced the idea of data exchange based on schema mappings [1]. The idea is to generate a core (optimal) solution from among the universal solutions [2], either by post-processing [3] or by pre-processing [4]. Existing solutions do not handle some situations properly, due to gaps between schema mapping and data mapping.
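The Overview of Approach describes hosting each source entity in the target relation whose tree is closest to the entity's tuple tree, with pq-grams used in the workflow. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard pq-gram distance for ordered trees (p = 2, q = 3), and it assumes the tree shape: a shared root with one leaf per attribute. A source tuple contributes only its non-null properties that have a correspondence, relabeled with the corresponding target attribute. Children are sorted so that attribute order does not matter. The helper names (`tuple_tree`, `relation_tree`, `best_host`) and the Person/Employee/Student schema are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

NULL = "*"       # filler label used when extending a tree for pq-grams
ROOT = "entity"  # assumption: shared root label, so only attribute leaves differ


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def pq_gram_profile(root: Node, p: int = 2, q: int = 3) -> Counter:
    """Bag of pq-grams (tuples of p ancestor labels + q sibling labels)."""
    profile: Counter = Counter()

    def visit(node: Node, stem: tuple[str, ...]) -> None:
        stem = stem[1:] + (node.label,)  # shift label into the ancestor register
        base = (NULL,) * q
        if not node.children:
            profile[stem + base] += 1    # leaf: one pq-gram of all fillers
            return
        for child in node.children:
            base = base[1:] + (child.label,)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):           # slide fillers past the last child
            base = base[1:] + (NULL,)
            profile[stem + base] += 1

    visit(root, (NULL,) * p)
    return profile


def pq_gram_distance(t1: Node, t2: Node, p: int = 2, q: int = 3) -> float:
    """1 - 2|P1 ∩ P2| / (|P1| + |P2|), using bag intersection."""
    p1, p2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())
    total = sum(p1.values()) + sum(p2.values())
    return 1.0 - 2.0 * shared / total


def tuple_tree(tup: dict, correspondences: dict[str, str]) -> Node:
    """Hypothetical: leaves are target labels of non-null corresponding properties."""
    labels = sorted(correspondences[a] for a, v in tup.items()
                    if v is not None and a in correspondences)
    return Node(ROOT, [Node(lbl) for lbl in labels])


def relation_tree(attributes: list[str]) -> Node:
    """Hypothetical: one leaf per target attribute, sorted."""
    return Node(ROOT, [Node(a) for a in sorted(attributes)])


def best_host(tup: dict, correspondences: dict[str, str],
              target_schema: dict[str, list[str]]) -> tuple[str, dict[str, float]]:
    """Pick the target relation whose tree is closest to the tuple's tree."""
    tree = tuple_tree(tup, correspondences)
    scores = {name: pq_gram_distance(tree, relation_tree(attrs))
              for name, attrs in target_schema.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    correspondences = {"ename": "name", "mail": "email", "sal": "salary", "avg": "gpa"}
    target_schema = {
        "Person": ["name", "email"],
        "Employee": ["name", "email", "salary"],
        "Student": ["name", "email", "gpa"],
    }
    # None marks a nonexistent property and is ignored when building the tree.
    emp = {"ename": "Ann", "mail": "ann@example.org", "sal": 5000, "avg": None}
    host, scores = best_host(emp, correspondences, target_schema)
    print(host, {k: round(v, 3) for k, v in scores.items()})
```

With these inputs, the Employee tree matches the tuple tree exactly (distance 0), so Employee is chosen. Person and Student score about 0.429 and 0.625. Placing the whole tuple in one relation keeps name, email, and salary in a single target tuple. This avoids spreading the entity over, for instance, a Person tuple and an Employee tuple.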
We propose a scalable method, SEDEX, that bridges the gap between schema-level and data-level approaches to data exchange. SEDEX:
- Avoids entity fragmentation.
- Resolves ambiguous data exchange scenarios that result from different implementations of generalization relationships.

Evaluation
We performed a comprehensive set of experiments to evaluate the quality and scalability of data exchange, comparing SEDEX with ++Spicy [4] and EDEX, an earlier version of SEDEX [5]. We used iBench to generate schemas, schema constraints, and correspondences between schemas [6]. The size of the target instance was used as the measure of quality.

SEDEX Architecture
[Figure: SEDEX architecture]

Conclusions
To address the problems of entity fragmentation and ambiguity in traditional schema-mapping data exchange scenarios, we propose SEDEX, a scalable data exchange method that focuses on preserving source entities in the target. The method creates tree representations of source entities and target relations, and uses tree similarity metrics to find the best target relation to host each source entity. We show that SEDEX generates good solutions, resolves ambiguities that state-of-the-art methods do not handle, and scales to large data exchange scenarios. Further research is needed to examine other types of ambiguous data exchange scenarios.

Problem Statement
Given a source instance I (which may contain null values representing nonexistent properties), a set G of target egds (i.e., primary key constraints), and a set Σ of direct correspondences between schema-level properties of the source and the target, generate an instance J of the target schema such that every source property that has a correspondence in the target is reflected in J without entity fragmentation or ambiguity.

References
[1] R. J. Miller, L. M. Haas, and M. A. Hernández, “Schema mapping as query discovery,” in Proc. 26th Int. Conf. Very Large Data Bases, 2000, pp. 77–88.
[2] R. Fagin, P. G. Kolaitis, and L. Popa, “Data exchange: Getting to the core,” ACM Trans. Database Syst., vol. 30, no. 1, pp. 174–210, 2005.
[3] R. Pichler and V. Savenkov, “Towards practical feasibility of core computation in data exchange,” Theoretical Comput. Sci., vol. 411, nos. 7–9, pp. 935–957, 2010.
[4] B. Marnette, G. Mecca, and P. Papotti, “Scalable data exchange with functional dependencies,” Proc. Very Large Data Bases Endowment, vol. 3, nos. 1–2, pp. 105–116, 2010.
[5] Y. A. Sekhavat and J. Parsons, “EDEX: Entity preserving data exchange,” in Proc. 2nd Int. Conf. Data Technol. Appl., 2013, pp. 221–229.
[6] P. C. Arocena, B. Glavic, R. Ciucanu, and R. J. Miller, “The iBench integration metadata generator,” Proc. Very Large Data Bases Endowment, vol. 9, no. 3, pp. 108–119, 2015.
