A Principled Approach to Data Integration and Reconciliation in Data Warehousing Diego Calvanese Giuseppe De Giacomo Maurizio Lenzerini Daniele Nardi Riccardo.

Presentation transcript:

A Principled Approach to Data Integration and Reconciliation in Data Warehousing Diego Calvanese Giuseppe De Giacomo Maurizio Lenzerini Daniele Nardi Riccardo Rosati Presented by Alan Wessman

Introduction
- Problem: Acquire data from a set of sources for a particular application
- Typical architecture: wrappers and mediators
- Core problem: specify and implement mediators
- Paper focus: data warehouses

Data Warehouse Integration
- Most sources are internal to the organization
- Need a global corporate view of the data
- Conceptual model defines sources and data warehouse (local-as-view)
- Three levels of architecture:
  - Conceptual: global model
  - Logical: query specifications for sources and warehouse
  - Physical: wrappers and mediators implementing the query specifications

Architecture
[Diagram labels: Conceptual Model; Source 1, Source 2, Data Warehouse; queries q1, q2 / q3, q4, q5 / q6, q7]

Specifying Logical Schemas
- For each table of source S, create an adorned query
  - Head: table name, number of columns
  - Body: content of the table (query over the conceptual model)
  - Adornment:
    - Domains (data types) of columns
    - Key attributes

Adorned Query: Example
[Diagram labels: Conceptual Model, Source 1, Source 2; Euro, Lira, Yen]
Halibut(Date, Price) <- Menu(Date, ‘Halibut’, Price) | Price :: Lira, Date :: JulianDate
Swordfish(Date, Price) <- Menu(Date, ‘Swordfish’, Price) | Price :: Lira, Date :: JulianDate
SushiMenu(TunaPrice, SquidPrice, Date) <- Menu(Date, ‘Tuna’, TunaPrice), Menu(Date, ‘Squid’, SquidPrice) | TunaPrice :: Yen, SquidPrice :: Yen, Date :: JulianDate
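The SushiMenu query above can be written down as a small data structure, which may make the head / body / adornment split easier to see. This is only a sketch: the class names `Atom` and `AdornedQuery`, their fields, and the helper methods are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """One atom of a query body, e.g. Menu(Date, 'Tuna', TunaPrice)."""
    predicate: str
    args: tuple  # variable names, or constants written with quotes


@dataclass
class AdornedQuery:
    """Head, body, and adornment of one source table (hypothetical layout)."""
    head: str         # table name
    head_vars: tuple  # one variable per column
    body: tuple       # atoms over the conceptual model
    adornment: dict   # variable name -> domain name

    def arity(self):
        """Number of columns of the table described by the head."""
        return len(self.head_vars)

    def unadorned_vars(self):
        """Head variables that carry no domain annotation."""
        return [v for v in self.head_vars if v not in self.adornment]


sushi_menu = AdornedQuery(
    head="SushiMenu",
    head_vars=("TunaPrice", "SquidPrice", "Date"),
    body=(
        Atom("Menu", ("Date", "'Tuna'", "TunaPrice")),
        Atom("Menu", ("Date", "'Squid'", "SquidPrice")),
    ),
    adornment={"TunaPrice": "Yen", "SquidPrice": "Yen", "Date": "JulianDate"},
)

print(sushi_menu.arity(), sushi_menu.unadorned_vars())
```

Keeping the adornment separate from the body mirrors the `|` separator in the slide notation: the body says what the table contains, the adornment says how the values are represented.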

Query Consistency
- Let Q be an adorned query with body B, and let M be the conceptual model
- B is inconsistent with respect to M if, for every interpretation of M, the evaluation of B is empty
- Q is inconsistent with respect to M if either B is inconsistent or the annotations are inconsistent
- Inference techniques exist for checking query consistency

Interschema Correspondences
- Specify how data in different schemas relate
- Non-materialized relational tables (computed on demand)
- Like an adorned query, but the annotations identify helper programs
- Reusable by other correspondences

Interschema Correspondences
Three types of correspondence:
- Conversion: how data from one source is converted into data fitting a different schema
- Matching: how data from different sources matches
- Reconciliation: how data from different sources is reconciled to become data in the warehouse

Conversion Correspondence
How data from one source is converted into data fitting a different schema
convert([x], [y]) <- conj(x, y, z) through program(x, y, z)
- conj: conjunctive query, specifies when the conversion applies
- program: program that performs the conversion
- x: input tuple of values satisfying the conditions for x in conj
- y: output tuple of values satisfying the conditions for y in conj
- z: additional parameters required by program
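A conversion correspondence can be pictured as a guard plus a program. The sketch below is a simplification under stated assumptions: `conj` is reduced to a Python predicate over the input tuple and the parameters (in the slide it also constrains the output y), and the names `convert`, `always`, and `lira_to_euro` are hypothetical. The rate 1936.27 lire per euro is the official fixed conversion rate.

```python
LIRE_PER_EURO = 1936.27  # official fixed lira-to-euro conversion rate


def always(x, z):
    """Hypothetical conj: the conversion applies to every input tuple."""
    return True


def lira_to_euro(x, z):
    """Hypothetical program: x = (date, price_in_lire), z = (rate,)."""
    date, price_lire = x
    (rate,) = z
    return (date, round(price_lire / rate, 2))


def convert(x, z, conj, program):
    """Return the output tuple y, or None when conj says the
    conversion does not apply to x."""
    if not conj(x, z):
        return None
    return program(x, z)


y = convert((2451545, 19362.7), (LIRE_PER_EURO,), always, lira_to_euro)
print(y)
```

Passing the rate through z rather than hard-coding it follows the slide's role for z as "additional parameters required by program".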

Matching Correspondence
How data from different sources matches
match([x_1], …, [x_k]) <- conj(x_1, …, x_k, z) through program(x_1, …, x_k, z)
- Differs from a conversion correspondence in its use of k tuples that may be matched
- program returns true if the k tuples match
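As a rough illustration of the matching form, the sketch below treats `conj` as a cheap candidate filter and `program` as the test that returns true or false for the k tuples. The tuple layout `(date, dish_name)` and the functions `same_date` and `same_dish` are assumptions invented for this example.

```python
def same_date(x1, x2, z):
    """Hypothetical conj: only tuples for the same date are candidates."""
    return x1[0] == x2[0]


def same_dish(x1, x2, z):
    """Hypothetical program: dish names agree, ignoring case and
    surrounding spaces."""
    return x1[1].strip().lower() == x2[1].strip().lower()


def match(tuples, z, conj, program):
    """True when conj admits the k tuples and program confirms the match."""
    return conj(*tuples, z) and program(*tuples, z)


from_source_1 = (2451545, "Tuna ")
from_source_2 = (2451545, "tuna")
print(match([from_source_1, from_source_2], (), same_date, same_dish))
```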

Reconciliation Correspondence
How data from different sources is reconciled to the warehouse
reconcile([x_1], …, [x_k], [z]) <- conj(x_1, …, x_k, z, w) through program(x_1, …, x_k, z, w)
- z: data warehouse tuple; result of the reconciliation
- w: additional parameters (like z in the previous slides)
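One way to read the reconciliation form is as a function from k source tuples to a single warehouse tuple. The sketch below assumes tuples of the shape `(date, price)` and a made-up policy, `prefer_first`, that keeps the first source's price unless it is missing; neither the policy nor the function names come from the paper, and `conj` is again simplified to a predicate over the inputs.

```python
def same_date(x1, x2, w):
    """Hypothetical conj: reconcile only tuples that describe the same date."""
    return x1[0] == x2[0]


def prefer_first(x1, x2, w):
    """Hypothetical program: keep the price from x1 unless it is missing."""
    date, price_1 = x1
    _, price_2 = x2
    return (date, price_1 if price_1 is not None else price_2)


def reconcile(tuples, w, conj, program):
    """Return the warehouse tuple z, or None when conj does not hold."""
    if not conj(*tuples, w):
        return None
    return program(*tuples, w)


z = reconcile([(2451545, None), (2451545, 8.5)], (), same_date, prefer_first)
print(z)
```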

Reusing Correspondences
A correspondence can be reused only if it was previously defined
Example 1:
match([x], [y]) <- convert_1([x], [z]), convert_2([y], [z]), conj(x, y, z, w) through none
Example 2:
reconcile([x], [y], [z]) <- convert_1([x], [w_1]), convert_2([y], [w_2]), match_1([w_1], [w_2]), convert_3([w_1], [z]), conj(x, y, z, w) through none
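Example 1 says that x and y match when two existing conversions map them to the same value z, with no program of its own ("through none"). A minimal sketch of that idea follows; the two date formats and the function bodies are assumptions chosen for illustration, and the extra `conj` condition from the slide is omitted.

```python
from datetime import datetime


def convert_1(x):
    """Hypothetical conversion: source 1 writes dates as DD/MM/YYYY."""
    return datetime.strptime(x, "%d/%m/%Y").date().isoformat()


def convert_2(y):
    """Hypothetical conversion: source 2 writes dates as MM-DD-YYYY."""
    return datetime.strptime(y, "%m-%d-%Y").date().isoformat()


def match(x, y):
    """Defined purely by reuse: x and y match when both conversions
    produce the same common value z."""
    return convert_1(x) == convert_2(y)


print(match("03/04/2001", "04-03-2001"))  # both denote 3 April 2001
```

Because `match` only composes the two conversions, any fix to `convert_1` or `convert_2` carries over to it automatically, which is the practical benefit of reuse.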

Specifying Mediators
Aim: Specify for each relation in the warehouse how the tuples should be constructed from the sources
Task: Materialize a new relation T in the warehouse
Steps:
1. Specify T as an adorned query q <- q’ | c_1, …, c_n
2. Look for a rewriting of q in terms of queries q_1, …, q_s corresponding to materialized views in the warehouse
3. Look for a rewriting of (what remains of q) in terms of queries corresponding to tables in the sources and the conversion, matching, and reconciliation correspondences
The resulting query is the specification of the mediator for T

Computing the Rewriting
- Rewriting typically needs to merge the results of several queries
- Produces a set of merging clauses
- Form:
  merging tuple-spec_1 and … and tuple-spec_n
  such that matching-condition
  into tuple-spec_t1 and … and tuple-spec_tm
- Generates a template; the designer specifies the “such that” and “into” parts, or writes custom merging clauses
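The merging-clause form can be read as "combine one tuple from each input, keep the combinations that satisfy the matching condition, and build the output tuple". The sketch below is one possible reading, not the paper's algorithm: `merge`, the sample relations, and the two lambdas standing in for the "such that" and "into" parts are all hypothetical, and only a single output tuple-spec is produced.

```python
from itertools import product


def merge(relations, such_that, into):
    """Apply one merging clause: for every combination of one tuple per
    input relation that satisfies such_that, emit into(...)."""
    return [into(*combo) for combo in product(*relations) if such_that(*combo)]


source_1 = [(2451545, 10.0), (2451546, 12.0)]  # (date, halibut_price)
source_2 = [(2451545, 8.5)]                    # (date, tuna_price)

result = merge(
    [source_1, source_2],
    such_that=lambda t1, t2: t1[0] == t2[0],     # "such that": same date
    into=lambda t1, t2: (t1[0], t1[1], t2[1]),   # "into": one combined tuple
)
print(result)
```

The nested-loop evaluation via `product` is chosen for brevity; a real mediator would more likely use a join strategy suited to the data volume.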

Conclusion
- Start with a conceptual model and several types of correspondences
- Query rewriting algorithm generates mediator specifications
- Designer fills in any remaining details
- No empirical results