
A Principled Approach to Data Integration and Reconciliation in Data Warehousing
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, Riccardo Rosati


1. A Principled Approach to Data Integration and Reconciliation in Data Warehousing
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, Riccardo Rosati
Presented by Alan Wessman

2. Introduction
- Problem: acquire data from a set of sources for a particular application
- Typical architecture: wrappers and mediators
- Core problem: specifying and implementing mediators
- Paper focus: data warehouses

3. Data Warehouse Integration
- Most sources are internal to the organization
- A global, corporate view of the data is needed
- A conceptual model defines both the sources and the data warehouse (local-as-view)
- Three levels of architecture:
  - Conceptual: the global model
  - Logical: query specifications for the sources and the warehouse
  - Physical: wrappers and mediators that implement the query specifications

4. Architecture
[Figure: the conceptual model, related to Source 1 (queries q1, q2), Source 2 (queries q3, q4, q5) and the Data Warehouse (queries q6, q7)]
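At the physical level, a mediator pulls data through wrappers and combines it according to its query specification. The sketch below shows only that plumbing. The wrapper interface (a zero-argument function returning rows), `make_mediator`, and the `combine` function are hypothetical names introduced for illustration; they are not part of the paper.

```python
from typing import Callable, Dict, List

Row = tuple
# Hypothetical wrapper interface: calling it returns the rows of one source table.
Wrapper = Callable[[], List[Row]]


def make_mediator(wrappers: Dict[str, Wrapper],
                  combine: Callable[[Dict[str, List[Row]]], List[Row]]) -> Callable[[], List[Row]]:
    """Build a mediator that fetches rows through every wrapper and combines them.

    `combine` stands in for the logical-level query specification.
    """
    def mediator() -> List[Row]:
        fetched = {name: wrap() for name, wrap in wrappers.items()}
        return combine(fetched)
    return mediator


# Usage: two toy wrappers and a mediator that simply unions (and sorts) their rows.
toy_wrappers: Dict[str, Wrapper] = {
    "source1": lambda: [("2000-01-01", 10.0)],
    "source2": lambda: [("2000-01-02", 12.0)],
}
union_mediator = make_mediator(
    toy_wrappers,
    lambda tables: sorted(row for rows in tables.values() for row in rows),
)
print(union_mediator())
```

In a real system `combine` would be generated from the mediator specification rather than written by hand.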

5. Specifying Logical Schemas
For each table of a source S, create an adorned query:
- Head: table name and number of columns
- Body: content of the table (a query over the conceptual model)
- Adornment: domains (data types) of the columns; key attributes
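One way to make the three parts of an adorned query concrete is a small data structure. This is a minimal sketch under assumptions: the class names `Atom` and `AdornedQuery`, the convention that constants are written with quotes, and the validation rules are hypothetical choices, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Atom:
    predicate: str          # relation of the conceptual model, e.g. "Menu"
    args: Tuple[str, ...]   # variables, or constants written with quotes such as "'Halibut'"


@dataclass
class AdornedQuery:
    name: str                     # head: table name
    head_vars: Tuple[str, ...]    # head: one variable per column
    body: List[Atom]              # body: conjunctive query over the conceptual model
    domains: Dict[str, str]       # adornment: column variable -> domain (data type)
    key: Tuple[str, ...] = ()     # adornment: key attributes, if any

    def __post_init__(self) -> None:
        # Every column needs a declared domain, and key attributes must be columns.
        missing = [v for v in self.head_vars if v not in self.domains]
        if missing:
            raise ValueError(f"no domain for columns {missing}")
        if not set(self.key) <= set(self.head_vars):
            raise ValueError("key attributes must be head variables")

    @property
    def arity(self) -> int:
        return len(self.head_vars)


# The Halibut table from the example slide (no key is given there, so none is set).
halibut = AdornedQuery(
    name="Halibut",
    head_vars=("Date", "Price"),
    body=[Atom("Menu", ("Date", "'Halibut'", "Price"))],
    domains={"Price": "Lira", "Date": "JulianDate"},
)
print(halibut.arity)  # 2
```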

6. Adorned Query: Example
[Figure: the conceptual model, Source 1 and Source 2, with currency labels Euro, Lira and Yen]
Halibut(Date, Price) <- Menu(Date, 'Halibut', Price) | Price :: Lira, Date :: JulianDate
Swordfish(Date, Price) <- Menu(Date, 'Swordfish', Price) | Price :: Lira, Date :: JulianDate
SushiMenu(TunaPrice, SquidPrice, Date) <- Menu(Date, 'Tuna', TunaPrice), Menu(Date, 'Squid', SquidPrice) | TunaPrice :: Yen, SquidPrice :: Yen, Date :: JulianDate
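The SushiMenu body uses Menu twice and shares the variable Date, so evaluating it is a self-join on the date column. The sketch below evaluates that body by hand over a small, made-up instance of Menu(Date, Dish, Price); the data and the function name `sushi_menu` are illustrative assumptions.

```python
from typing import List, Tuple

# Made-up instance of the conceptual relation Menu(Date, Dish, Price).
# Prices are in Yen, as in the SushiMenu adornment.
menu: List[Tuple[int, str, int]] = [
    (2451545, "Tuna", 1200),
    (2451545, "Squid", 800),
    (2451546, "Tuna", 1300),  # no Squid entry on this date, so no output row
]


def sushi_menu(rows: List[Tuple[int, str, int]]) -> List[Tuple[int, int, int]]:
    """SushiMenu(TunaPrice, SquidPrice, Date) <-
       Menu(Date, 'Tuna', TunaPrice), Menu(Date, 'Squid', SquidPrice)"""
    result = []
    for d1, dish1, tuna_price in rows:
        if dish1 != "Tuna":
            continue
        for d2, dish2, squid_price in rows:
            # The shared variable Date forces both atoms to agree on the date.
            if dish2 == "Squid" and d2 == d1:
                result.append((tuna_price, squid_price, d1))
    return result


print(sushi_menu(menu))  # [(1200, 800, 2451545)]
```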

7. Query Consistency
Let Q be an adorned query, B its body, and M the conceptual model.
- B is inconsistent with respect to M if, for every interpretation of M, the evaluation of B is empty.
- Q is inconsistent with respect to M if either B is inconsistent or its annotations are inconsistent with respect to M.
- Inference techniques exist for checking query consistency.
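Checking the body B requires reasoning over the conceptual model M, which this sketch does not attempt. It covers only the simplest kind of annotation inconsistency, under the stated assumption that all domains are pairwise disjoint: a variable annotated with two different domains (say Price :: Lira and Price :: Yen) can never satisfy both. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def annotations_consistent(annotations: Iterable[Tuple[str, str]]) -> bool:
    """Return False if some variable is annotated with two different domains.

    Assumes domains are pairwise disjoint; subtyping or overlapping
    domains would need a real reasoner over the conceptual model.
    """
    seen: Dict[str, str] = {}
    for var, domain in annotations:
        if seen.setdefault(var, domain) != domain:
            return False  # var :: D1 and var :: D2 with D1 != D2
    return True


print(annotations_consistent([("Price", "Lira"), ("Date", "JulianDate")]))  # True
print(annotations_consistent([("Price", "Lira"), ("Price", "Yen")]))        # False
```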

8. Interschema Correspondences
- Specify how data in different schemas are related
- Non-materialized relational tables (computed on demand)
- Defined like adorned queries, but the annotations identify helper programs
- Reusable by other correspondences

9. Interschema Correspondences
Three types of correspondence:
- Conversion: how data from one source are converted into data fitting a different schema
- Matching: how data from different sources match
- Reconciliation: how data from different sources are reconciled to become data in the warehouse

10. Conversion Correspondence
How data from one source are converted into data fitting a different schema:
convert([x], [y]) <- conj(x, y, z) through program(x, y, z)
- conj: conjunctive query specifying when the conversion applies
- program: program that performs the conversion
- x: input tuple of values satisfying the conditions on x in conj
- y: output tuple of values satisfying the conditions on y in conj
- z: additional parameters required by program
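A conversion correspondence pairs a condition (when it applies) with a program (how to compute y). The sketch below models conj as a simple predicate on x and z, which is a simplification of a full conjunctive query, and uses z to pass the exchange rate. The `Conversion` class is hypothetical; the rate 1936.27 lire per euro is the official fixed ITL/EUR conversion rate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Conversion:
    """convert([x], [y]) <- conj(x, y, z) through program(x, y, z), simplified."""
    # Stand-in for conj: decides whether the conversion applies to x with parameters z.
    applies: Callable[[tuple, tuple], bool]
    # The conversion program: computes the output tuple y from x and z.
    program: Callable[[tuple, tuple], tuple]

    def convert(self, x: tuple, z: tuple) -> Optional[tuple]:
        if not self.applies(x, z):
            return None  # condition not satisfied: no output tuple y
        return self.program(x, z)


# Lira -> Euro, with z = (lire per euro,).
lira_to_euro = Conversion(
    applies=lambda x, z: x[0] >= 0 and z[0] > 0,
    program=lambda x, z: (round(x[0] / z[0], 2),),
)

print(lira_to_euro.convert((19362.7,), (1936.27,)))  # (10.0,)
print(lira_to_euro.convert((-5.0,), (1936.27,)))     # None
```

Passing the rate through z rather than hard-coding it keeps the same program reusable for other currencies.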

11. Matching Correspondence
How data from different sources match:
match([x1], …, [xk]) <- conj(x1, …, xk, z) through program(x1, …, xk, z)
- Differs from a conversion correspondence in its use of k tuples that may be matched
- program returns true if the k tuples match
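A matching correspondence is a k-ary test over tuples. In this sketch the conj part is not modeled, and the matching program is a hypothetical rule that treats (date, dish) tuples as the same item when the dates agree and the dish names agree up to case and surrounding whitespace. `make_match` and `same_dish_same_day` are illustrative names only.

```python
from typing import Callable


def make_match(program: Callable[..., bool]) -> Callable[..., bool]:
    """match([x1], ..., [xk]) holds iff program(x1, ..., xk) returns True."""
    def match(*tuples: tuple) -> bool:
        return len(tuples) >= 2 and program(*tuples)
    return match


def same_dish_same_day(*tuples: tuple) -> bool:
    # Hypothetical matching program: each tuple is (julian_date, dish_name).
    keys = {(date, name.strip().lower()) for date, name in tuples}
    return len(keys) == 1


match_dish = make_match(same_dish_same_day)
print(match_dish((2451545, "Tuna"), (2451545, " tuna ")))  # True
print(match_dish((2451545, "Tuna"), (2451546, "Tuna")))    # False
```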

12. Reconciliation Correspondence
How data from different sources are reconciled into the warehouse:
reconcile([x1], …, [xk], [z]) <- conj(x1, …, xk, z, w) through program(x1, …, xk, z, w)
- z: data warehouse tuple, the result of reconciliation
- w: additional parameters (the role played by z in the previous correspondences)
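A reconciliation program takes k source tuples and produces the warehouse tuple z. The policy below (average the prices when all sources agree on the date) and the use of w as a rounding precision are hypothetical choices made for illustration; real reconciliation rules are application-specific.

```python
from statistics import mean
from typing import Optional, Tuple


def reconcile_prices(*xs: Tuple[int, float], w: int = 2) -> Optional[Tuple[int, float]]:
    """Hypothetical reconciliation program.

    Each x_i is (date, price_in_euro) from a different source; the result z
    is (date, averaged price). w is an extra parameter: decimals to keep.
    Returns None when the inputs disagree on the date (the correspondence
    does not apply).
    """
    dates = {date for date, _ in xs}
    if len(dates) != 1:
        return None
    return (dates.pop(), round(mean(price for _, price in xs), w))


print(reconcile_prices((2451545, 10.0), (2451545, 12.0)))  # (2451545, 11.0)
print(reconcile_prices((2451545, 10.0), (2451546, 12.0)))  # None
```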

13. Reusing Correspondences
A correspondence may reuse only correspondences that have already been defined.
Example 1:
match([x], [y]) <- convert1([x], [z]), convert2([y], [z]), conj(x, y, z, w) through none
Example 2:
reconcile([x], [y], [z]) <- convert1([x], [w1]), convert2([y], [w2]), match1([w1], [w2]), convert3([w1], [z]), conj(x, y, z, w) through none
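Example 1 needs no program of its own: x and y match exactly when both convert to the same z. The sketch composes two previously defined conversions that way. It ignores the extra conj(x, y, z, w) condition, and the Yen rate of 130 per euro is a made-up value for illustration, not a real rate.

```python
from typing import Callable, Optional

Convert = Callable[[tuple], Optional[tuple]]


def match_via_common_conversion(convert1: Convert, convert2: Convert) -> Callable[[tuple, tuple], bool]:
    """match([x], [y]) <- convert1([x], [z]), convert2([y], [z]) through none."""
    def match(x: tuple, y: tuple) -> bool:
        z = convert1(x)
        return z is not None and z == convert2(y)
    return match


def lira_to_eur(x: tuple) -> tuple:
    return (round(x[0] / 1936.27, 2),)  # fixed ITL/EUR rate


def yen_to_eur(y: tuple) -> tuple:
    return (round(y[0] / 130.0, 2),)  # hypothetical rate, for illustration only


same_price = match_via_common_conversion(lira_to_eur, yen_to_eur)
print(same_price((19362.7,), (1300.0,)))  # True
print(same_price((19362.7,), (2600.0,)))  # False
```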

14. Specifying Mediators
Aim: specify, for each relation in the warehouse, how its tuples should be constructed from the sources.
Task: materialize a new relation T in the warehouse.
Steps:
1. Specify T as an adorned query q <- q' | c1, …, cn.
2. Look for a rewriting of q in terms of queries q1, …, qs corresponding to materialized views in the warehouse.
3. Look for a rewriting of what remains of q in terms of queries corresponding to tables in the sources and to the conversion, matching, and reconciliation correspondences.
The resulting query is the specification of the mediator for T.
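The order of steps 2 and 3 (reuse warehouse views first, then fall back to sources and correspondences) can be illustrated with a deliberately crude greedy cover over body atoms. This is not the paper's rewriting algorithm: atoms are plain strings, and variable bindings and query containment, which a real rewriting must respect, are ignored. `plan_mediator` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def plan_mediator(target: Set[str],
                  views: Dict[str, FrozenSet[str]],
                  sources: Dict[str, FrozenSet[str]]) -> Tuple[List[str], List[str], Set[str]]:
    """Greedily cover the target's atoms, first with views, then with sources.

    Returns (views used, sources used, atoms left uncovered).
    """
    remaining = set(target)
    used_views: List[str] = []
    used_sources: List[str] = []
    # Step 2 uses warehouse views; step 3 covers what remains with sources.
    for pool, used in ((views, used_views), (sources, used_sources)):
        while remaining:
            # Pick the candidate that covers the most still-uncovered atoms.
            name, atoms = max(pool.items(),
                              key=lambda kv: len(kv[1] & remaining),
                              default=(None, frozenset()))
            if name is None or not atoms & remaining:
                break  # nothing in this pool helps any more
            used.append(name)
            remaining -= atoms
    return used_views, used_sources, remaining


views = {"v_tuna": frozenset({"Menu(d,'Tuna',p)"})}
sources = {
    "s_sushi": frozenset({"Menu(d,'Tuna',p)", "Menu(d,'Squid',q)"}),
    "s_halibut": frozenset({"Menu(d,'Halibut',p)"}),
}
print(plan_mediator({"Menu(d,'Tuna',p)", "Menu(d,'Squid',q)"}, views, sources))
# (['v_tuna'], ['s_sushi'], set())
```

A non-empty third component signals that the designer (or a richer rewriting) must supply the missing part.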

15. Computing the Rewriting
- A rewriting typically needs to merge the results of several queries, so a set of merging clauses is produced, of the form:
  merging tuple-spec1 and … and tuple-specn such that matching-condition into tuple-spec_t1 and … and tuple-spec_tm
- A template is generated; the designer specifies the "such that" and "into" parts, or writes custom merging clauses.
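One executable reading of a merging clause: enumerate combinations of input tuples, keep those satisfying the matching condition, and build the target tuples from each surviving combination. The function `merging_clause` and the example condition and target builder are hypothetical; in the paper these parts are supplied by the designer.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence


def merging_clause(inputs: Sequence[Iterable[tuple]],
                   such_that: Callable[..., bool],
                   into: Callable[..., List[tuple]]) -> List[tuple]:
    """merging spec_1 and ... and spec_n such that <cond> into spec_t1 and ... and spec_tm

    inputs[i] holds the tuples described by spec_i; `such_that` is the matching
    condition; `into` builds the target tuples (one combination may yield several).
    """
    out: List[tuple] = []
    for combo in product(*inputs):
        if such_that(*combo):
            out.extend(into(*combo))
    return out


# Usage: (date, price_in_euro) tuples from two sources, merged on equal dates,
# keeping the lower price (a hypothetical "into" rule).
s1 = [(1, 10.0), (2, 11.0)]
s2 = [(1, 9.5), (3, 8.0)]
merged = merging_clause([s1, s2],
                        such_that=lambda a, b: a[0] == b[0],
                        into=lambda a, b: [(a[0], min(a[1], b[1]))])
print(merged)  # [(1, 9.5)]
```

Enumerating the full product is quadratic or worse; a generated mediator would normally join on the matching condition instead.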

16. Conclusion
- Start with a conceptual model and several types of correspondences
- A query rewriting algorithm generates the mediator specifications
- The designer fills in any remaining details
- The paper reports no empirical results

