Information Integration Using Logical Views
Jeffrey D. Ullman
Overview
– Information integration systems
– Global-as-view (GAV) vs. local-as-view (LAV)
– Query reformulation
– Specification of source descriptions
– Adding new sources
Query Reformulation
Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema.
Given a query $Q$ over the mediator schema relations and descriptions of the information sources, find a query $Q'$ that uses only the source relations, such that
– $Q' \subseteq Q$, and
– $Q'$ provides all possible answers to $Q$ given the sources.
Solving Queries by Views
[Figure: mediator relations and source relations]
Query Rewriting Using Views
Query containment: $q' \subseteq q \iff \forall D:\ q'(D) \subseteq q(D)$
Query equivalence: $q' = q \iff q' \subseteq q \wedge q \subseteq q'$
Given a query $q$ and view definitions $V = \{v_1, \ldots, v_n\}$:
$q'$ is an equivalent rewriting of $q$ using $V$ if
– $q'$ refers only to views in $V$, and
– $q' = q$.
$q'$ is a maximally-contained rewriting of $q$ using $V$ if
– $q'$ refers only to views in $V$,
– $q' \subseteq q$, and
– there is no rewriting $q_1$ of $q$ using $V$ such that $q' \subseteq q_1 \subseteq q$ and $q_1 \not\subseteq q'$.
Computational Complexity
Complexity of Query Containment
Conjunctive queries (CQs): NP-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
– Q2: p(X,Z) :- a(X,Y) & a(V,Z)
CQs with negation: $\Pi_2^p$-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)
CQs with arithmetic comparisons: $\Pi_2^p$-complete
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y
Datalog programs
– p(A,C) :- a(A,B) & b(B,C)
Specification of Source Descriptions
Views: the resources the integrator uses to help answer queries.
– GAV: each mediator relation is defined as a view over the source relations.
– LAV: each source relation is defined as a view over the mediator relations.
Information Integration Systems
Information Manifold (IM)
– AT&T
– Local-as-view (LAV)
– Description logic
– Source relations defined as views of mediator relations (a collection of global predicates)
Tsimmis
– Stanford and IBM
– Global-as-view (GAV)
– Mediator relations defined as views of source relations
IM Example
[Figure: global predicates, i.e. the mediator relations]
IM Example (Cont.)
Views: source relations
Query: “What are Sally’s phone and office?”
[Figure: the views and the query over the mediator relations]
IM Example (Cont.)
Answer: [Figure: the answer expressed over the source relations]
Query reformulation: the bucket algorithm (it must check query containment, which is NP-complete in the length of the queries)
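The two phases of bucket-style reformulation, building candidate combinations of views and then keeping only those that pass a containment check, can be sketched in Python as follows. This is a simplified sketch in the spirit of the bucket algorithm, not the IM implementation: it assumes every term is a variable (no constants such as "Sally", no comparisons), that no variable repeats in a head, and it only discards candidates that fail the containment check rather than trying to repair them. The mediator relations `phone` and `office` and the views `v1` to `v3` are hypothetical, since the relations of the original IM example are not recoverable here.

```python
from itertools import count, product

# A query is (head_vars, body); a view is (name, head_vars, body); a body is a
# list of (predicate, args) atoms. All terms are variables in this sketch.


def maps_into(q_from, q_to):
    """True if a containment mapping q_from -> q_to exists (heads map positionally)."""
    (head_f, body_f), (head_t, body_t) = q_from, q_to

    def search(i, h):
        if i == len(body_f):
            return True
        pred, args = body_f[i]
        for pred_t, args_t in body_t:
            if pred_t != pred or len(args_t) != len(args):
                continue
            m = dict(h)
            if all(m.setdefault(a, t) == t for a, t in zip(args, args_t)):
                if search(i + 1, m):
                    return True
        return False

    return len(head_f) == len(head_t) and search(0, dict(zip(head_f, head_t)))


def bucket_algorithm(query, views):
    head, body = query
    fresh = (f"_F{i}" for i in count())  # generator of fresh variable names

    # Step 1: one bucket per query subgoal, holding the view atoms that can cover it.
    buckets = []
    for pred, args in body:
        bucket = []
        for name, vhead, vbody in views:
            for vpred, vargs in vbody:
                if vpred != pred or len(vargs) != len(args):
                    continue
                m = {}
                if not all(m.setdefault(v, a) == a for v, a in zip(vargs, args)):
                    continue  # the view repeats a variable where the query does not
                if any(a in head and v not in vhead for v, a in zip(vargs, args)):
                    continue  # a distinguished query variable would be projected out
                bucket.append((name, tuple(m[v] if v in m else next(fresh) for v in vhead)))
        buckets.append(bucket)

    # Step 2: take one entry per bucket and keep candidates contained in the query.
    defs = {name: (vhead, vbody) for name, vhead, vbody in views}
    rewritings = []
    for combo in product(*buckets):
        expansion = []
        for name, terms in combo:
            vhead, vbody = defs[name]
            sub = dict(zip(vhead, terms))
            for vpred, vargs in vbody:
                for v in vargs:
                    if v not in sub:  # existential view variables get fresh names
                        sub[v] = next(fresh)
                expansion.append((vpred, tuple(sub[v] for v in vargs)))
        # the expansion is contained in the query iff the query maps into it
        if maps_into(query, (head, expansion)):
            candidate = (head, list(combo))
            if candidate not in rewritings:
                rewritings.append(candidate)
    return rewritings  # their union is the reformulated query


QUERY = (("N", "O"), [("phone", ("P", "N")), ("office", ("P", "O"))])
VIEWS = [
    ("v1", ("P", "N"), [("phone", ("P", "N"))]),   # owners with their numbers
    ("v2", ("P", "O"), [("office", ("P", "O"))]),  # owners with their rooms
    ("v3", ("N",), [("phone", ("P", "N"))]),       # numbers only, owner dropped
]

for rw_head, rw_body in bucket_algorithm(QUERY, VIEWS):
    print(rw_head, rw_body)  # ('N', 'O') [('v1', ('P', 'N')), ('v2', ('P', 'O'))]
```

Here `v3` enters the first bucket but is pruned in step 2: its owner variable is existential, so the join on `P` with `office` is lost and the expansion is not contained in the query. That containment check is the expensive part of LAV reformulation.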
Advantages and Disadvantages (IM)
Advantage: adding new sources
– Mediator (global predicates, source descriptions)
– Query processing
Disadvantage: query reformulation (bucket algorithm)
Tsimmis
OEM (Object Exchange Model) and MSL (Mediator Specification Language)
[Figure: mediator relations]
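OEM represents semistructured data as self-describing objects: roughly, each object carries an object id, a label, and either an atomic value or a set of subobjects (the OEM papers also list a type field, which this sketch leaves implicit). The Python sketch below models that shape and answers the Sally question by following label paths. The class `OEMObject`, the helper `values_at`, and all exported objects and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    """Simplified OEM object: object id, label, and an atomic value or subobjects."""
    oid: str
    label: str
    value: Union[str, int, List["OEMObject"]]


def values_at(obj, path):
    """Atomic values reached by following a label path that starts at obj."""
    if not path or obj.label != path[0]:
        return []
    if len(path) == 1:
        return [] if isinstance(obj.value, list) else [obj.value]
    if not isinstance(obj.value, list):
        return []
    return [v for child in obj.value for v in values_at(child, path[1:])]


# Hypothetical exported objects; labels and values are invented for illustration.
exported = [
    OEMObject("&1", "person", [
        OEMObject("&2", "name", "Sally"),
        OEMObject("&3", "phone", "x1234"),
        OEMObject("&4", "office", "room 101"),
    ]),
    OEMObject("&5", "person", [
        OEMObject("&6", "name", "Joe"),
        OEMObject("&7", "phone", "x5678"),  # no office subobject: irregular structure
    ]),
]

# "What are Sally's phone and office?"
answers = [
    (values_at(obj, ["person", "phone"]), values_at(obj, ["person", "office"]))
    for obj in exported
    if "Sally" in values_at(obj, ["person", "name"])
]
print(answers)  # [(['x1234'], ['room 101'])]
```

Because each object names its own fields, the second person can simply omit `office` without any schema change, which is what makes OEM suitable for heterogeneous sources.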
Tsimmis Example
Exported OEM objects
Query: “What are Sally’s phone and office?”
[Figure: mediator relations and source relations]
Advantages and Disadvantages (Tsimmis)
Advantage
– Query reformulation: rule unfolding
Disadvantages
– Mediator description
– Adding, removing, and modifying source descriptions
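Under GAV, reformulation needs no containment test: every mediator atom in the query is replaced by the body of a rule that defines it, with rule-local variables renamed apart. The Python sketch below shows this rule unfolding; when a mediator relation has several defining rules, each choice yields one source-level query and the answer is their union. The encoding, the function `unfold`, and the relations `person`, `office`, `s1_emp`, `s2_staff`, and `s1_room` are hypothetical and only illustrate the technique, not Tsimmis's MSL.

```python
from itertools import count, product

# Hypothetical encoding: atoms are (predicate, args) with variable-only args.
# gav maps each mediator predicate to a list of rules (head_vars, source_body);
# several rules for one predicate mean that relation is their union.


def unfold(query, gav):
    """Rule unfolding: replace every mediator atom by the body of a defining rule."""
    head, body = query
    fresh = (f"_U{i}" for i in count())
    per_atom = []  # for each query atom, its possible source-level expansions
    for pred, args in body:
        expansions = []
        for rule_head, rule_body in gav[pred]:
            sub = dict(zip(rule_head, args))  # rule head variables take the query's terms
            new_body = []
            for spred, sargs in rule_body:
                for v in sargs:
                    if v not in sub:  # variables local to the rule become fresh
                        sub[v] = next(fresh)
                new_body.append((spred, tuple(sub[v] for v in sargs)))
            expansions.append(new_body)
        per_atom.append(expansions)
    # one source-level CQ per choice of rule for each atom; the answer is their union
    return [(head, [a for part in combo for a in part]) for combo in product(*per_atom)]


GAV = {
    "person": [
        (("P", "N"), [("s1_emp", ("P", "N", "D"))]),
        (("P", "N"), [("s2_staff", ("P", "N"))]),
    ],
    "office": [(("P", "O"), [("s1_room", ("P", "O"))])],
}
QUERY = (("N", "O"), [("person", ("P", "N")), ("office", ("P", "O"))])

for q in unfold(QUERY, GAV):
    print(q)
# (('N', 'O'), [('s1_emp', ('P', 'N', '_U0')), ('s1_room', ('P', 'O'))])
# (('N', 'O'), [('s2_staff', ('P', 'N')), ('s1_room', ('P', 'O'))])
```

The flip side is that the `GAV` table is the mediator description: adding, removing, or changing a source means editing the rules of every mediator relation it contributes to.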
IM vs. Tsimmis
Compared on:
– Query reformulation
– Adding sources
– Levels of mediation
– Semistructured data
– Constraints
– Automatic generation of components (wrappers and mediators)