2005lav-i1 Local as View: First steps
  Introduction and an example
  Rewriting queries using views
  The Information Manifold system



2005lav-i2 Introduction and an example
LAV: local as view.
The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?).
Advantage: adding or removing a source is local to that source and does not affect the global schema (provided the schema is sufficiently general).

2005lav-i3 A source is associated with a view definition; what are the assumptions on its contents?
  Sound: the source contains a subset of the data in the view definition.
  Complete: the source contains all the data in the view definition (possibly more).
  Exact: the source contains exactly the data in the view definition.
The common assumption: sound views (this fits the WWW environment).

2005lav-i4 Answering a conjunctive query: rewrite the query in terms of the views.
  Equivalent rewriting: the rewriting is equivalent to the query.
  Contained rewriting: the rewriting is contained in the query.
  Maximally contained rewriting: contained in the query, and not contained in any other contained rewriting.
For sound views, it is reasonable to search for the maximally contained rewritings and then take their union.

2005lav-i5 Example: a university db (each attribute is explained when it first occurs)
  course(c#, ti)           // course number, title
  teaches(pr, c#, qu)      // prof, quarter
  registered(st, c#, qu)   // student
  major(st, dp)            // dept
  advises(pr, st)
Views:
  v0(p,s) :- advises(p,s)
  v1(p,s,q) :- registered(s,c,q), teaches(p,c,q), q >= w97   // winter 97

2005lav-i6 Q:
  select a.pr, a.st, r.qu
  from registered r, teaches t, advises a
  where r.c# = t.c# and r.qu = t.qu and a.pr = t.pr and a.st = r.st and r.qu >= w98
As a conjunctive query (with individual variables):
  q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q >= w98
Let Q': q(p,s,q) :- v1(p,s,q), v0(p,s), q >= w98
Expanding by the view definitions:
  Q'': q(p,s,q) :- registered(s,c,q), teaches(p,c,q), q >= w97, advises(p,s), q >= w98
Q'' is equivalent to Q (q >= w98 implies q >= w97), hence Q' is a good rewriting.
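The expansion step on this slide can be sketched in Python. This is only an illustration under assumptions that are not in the slides: atoms are (predicate, arguments) tuples, comparisons are modeled as atoms with predicate ">=", and the names expand, VIEWS and CONSTANTS are hypothetical. Existential variables of a view get a fresh name per view occurrence.

```python
from itertools import count

# Hypothetical encoding: a view is (head variables, body atoms).
VIEWS = {
    "v0": (("p", "s"), [("advises", ("p", "s"))]),
    "v1": (("p", "s", "q"), [("registered", ("s", "c", "q")),
                             ("teaches", ("p", "c", "q")),
                             (">=", ("q", "w97"))]),
}
CONSTANTS = {"w97", "w98"}

def expand(body, views, constants=frozenset()):
    """Replace every view atom in body by the view's body (query unfolding)."""
    fresh = count(1)
    expansion = []
    for name, args in body:
        if name not in views:
            expansion.append((name, args))   # base atom or comparison: keep as is
            continue
        head, view_body = views[name]
        n = next(fresh)
        sub = dict(zip(head, args))          # head variables take the caller's terms
        for pred, pred_args in view_body:
            new_args = tuple(
                a if a in constants else sub.setdefault(a, f"{a}_{n}")
                for a in pred_args           # non-head variables become fresh
            )
            expansion.append((pred, new_args))
    return expansion

# Q': q(p,s,q) :- v1(p,s,q), v0(p,s), q >= w98
q_prime_body = [("v1", ("p", "s", "q")), ("v0", ("p", "s")), (">=", ("q", "w98"))]
expanded = expand(q_prime_body, VIEWS, CONSTANTS)
for atom in expanded:
    print(atom)
```

Up to the renaming of c to c_1, the printed atoms are the body of Q''.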

2005lav-i7 Assume v1 is replaced by
  v2(s,q) :- registered(s,c,q), teaches(p,c,q), q >= w98
Can we answer the query?
The expansion of v2(s,q), v0(p,s):
  q(p,s,q) :- registered(s,c,q), teaches(p',c,q), q >= w98, advises(p,s)
We use p' (rather than p), since it is an existential variable of v2.
This is not equivalent to the query, nor contained in it.
Assume q was dropped from both v1 and Q; could we answer Q using v0 and v1?

2005lav-i8 Assume v1 is replaced by v3:
  select r.st, t.pr, r.qu
  from registered r, teaches t
  where r.c# = t.c# and r.qu >= w98
Convert to conjunctive! Can we answer the query?
Interim summary: to be usable in a query rewriting, a view must either export the variables that are subject to (arithmetic or join) conditions in the body of the query, or contain those conditions in its own body. Query head variables must likewise be head variables of the view.

2005lav-i9 Rewriting queries using views
Scenario: we have a collection of view definitions V = {v1, v2, …, vn}.
Given a query Q, we ask: is there a rewriting that uses the views? If there is, how can it be computed?
Applications:
  Query optimization by using materialized views
  Data integration

2005lav-i10 Example:
  Q: q(X, U) :- p(X, Y), r(Y, Z), s(X, W), t(W, U)
  V: v1(A, B) :- p(A, C), r(C, B), s(A, D)
A partial rewriting:
  Q': q(X, U) :- v1(X, Z), s(X, W), t(W, U)
The atom s(A, D) in v1 does not replace s(X, W) in Q (and Q').
Given also
  v2(A, B) :- s(A, C), t(C, B), r(D, E)
the query
  Q'': q(X, U) :- v1(X, Z), v2(X, U)
is a complete equivalent rewriting.
Note: looking at Q', it is not evident that replacing the last two atoms with v2 yields an equivalent rewriting; v2 contains r(D, E), and Q' does not contain r.
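The equivalence claimed for Q'' can be checked mechanically by searching for containment mappings in both directions. The following Python sketch is a brute-force search for conjunctive queries without comparisons or constants; the representation and the names find_mapping and contained_in are assumptions made for illustration. The expansion of Q'' is written out by hand, with fresh names C1, D1, C2, D2, E2 for the existential variables of v1 and v2.

```python
def find_mapping(src, dst):
    """Look for a containment mapping from query src to query dst.

    A query is (head variables, list of (predicate, arguments)).
    The mapping must send src's head to dst's head and every body atom
    of src to some body atom of dst.
    """
    src_head, src_body = src
    dst_head, dst_body = dst
    if len(src_head) != len(dst_head):
        return None
    h = {}
    for a, b in zip(src_head, dst_head):
        if h.setdefault(a, b) != b:
            return None

    def extend(i, h):
        if i == len(src_body):
            return h
        pred, args = src_body[i]
        for d_pred, d_args in dst_body:
            if d_pred != pred or len(d_args) != len(args):
                continue
            h2 = dict(h)
            if all(h2.setdefault(a, b) == b for a, b in zip(args, d_args)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None                      # backtrack

    return extend(0, h)

def contained_in(small, big):
    """small is contained in big iff a containment mapping big -> small exists."""
    return find_mapping(big, small) is not None

Q = (("X", "U"), [("p", ("X", "Y")), ("r", ("Y", "Z")),
                  ("s", ("X", "W")), ("t", ("W", "U"))])

# Expansion of Q'': v1(X, Z) and v2(X, U) replaced by their bodies.
EXP = (("X", "U"), [("p", ("X", "C1")), ("r", ("C1", "Z")), ("s", ("X", "D1")),
                    ("s", ("X", "C2")), ("t", ("C2", "U")), ("r", ("D2", "E2"))])

print(contained_in(EXP, Q), contained_in(Q, EXP))
```

In the mapping from Q to the expansion, s(X, W) has to go to the copy of s that comes from v2, not the one from v1; the search finds this by backtracking.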

2005lav-i11 Classification:
  Partial rewriting: only part of the query is replaced by views; useful in query optimization, not in data integration.
  Complete rewriting: only views occur in it; useful in both scenarios.
  Equivalent rewriting: yields the same answer on all db's; useful in both scenarios.
  Contained rewriting: its expansion is contained in the query; not interesting for query optimization, useful for integration.
  Maximally contained rewriting: contained in the query, and not contained in any other contained rewriting.
We assume (unless stated otherwise) conjunctive queries and views, with no built-in predicates.

2005lav-i12 Assume a query Q: q(..) :- Qbody is given.
A view is usable in an equivalent rewriting if it occurs in some partial equivalent rewriting of Q.
Claim 1: v is usable iff Q is contained in v when the heads of both are ignored. That is, if v(D) is empty for some db D, then so is Q(D).
Proof:
  If the containment does not hold, then there is a D such that v(D) is empty and Q(D) is not, so the rewriting is not equivalent.
  If it holds, then Q': q(..) :- Qbody, v(Y) (Y new variables) is an equivalent partial rewriting.
(Bold letters on the slide denote vectors of variables/constants.)

2005lav-i13 Corollary: checking for usability is NP-complete.
Corollary: if v1, …, vk are usable, then
  Q': q(..) :- Qbody, v1(Y1), …, vk(Yk)   (the Yi new and distinct)
is an equivalent rewriting.
Apply the containment mapping hi to each vi(Yi), obtaining Q'' that is contained in Q' (and does not introduce any new variables).
Q'' contains Q, hence Q, Q', Q'' are equivalent.
Q'' still contains all atoms of Qbody; some of these may be removed by standard minimization; then some views may be removed.
How many must be left?

2005lav-i14 Claim 2: if Qbody has n atoms, then a minimized (equivalent or contained) rewriting Q' contains at most n (view or regular) atoms.
Proof: consider the expansion exp(Q') and the containment mapping from Q to exp(Q'); its image has at most n atoms.
  Q:  q(..) :- p1(..), …, pk(..), pk+1(..), …, pn(..)
  Q': q(..) :- p1(..), …, pk(..), v1(..), …, vm(..)
  where each view atom vi(..) expands to atoms r_{i,1}, …, r_{i,k_i}

2005lav-i15 Claim 3: the problem "is there a complete equivalent rewriting?" is NP-complete.
Proof: given v, construct v': Qhead(..) :- Qbody, vbody. Then v' is usable (NP-complete) iff v' is a complete rewriting.
Claim 4: the problem above is NP-complete even if the query and the views do not contain repeated predicates in their bodies (which simplifies the search for containment mappings), and also if we are looking for a complete contained rewriting.
Proof: next page.

2005lav-i16 Exact cover by 3-sets (an NP-complete problem):
Given s = {e1, …, en} and sets s1, …, sk, each of size 3, is there a cover of s by a subset of the {sj} in which each element occurs in exactly one set?
A reduction from the above to finding a complete rewriting:
The query Q: q() :- p1(A_{1,1}, …, A_{1,k}), …, pn(A_{n,1}, …, A_{n,k})
Note: since each sj contains 3 elements, Sj occurs in 3 atoms; this is a join condition in the query body.
The view vj (for sj): its body consists of the 3 atoms (as above) for the elements of sj; its head contains all the Sm that occur in its body, except Sj (in the body, Sj is existential).
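To make the source problem of the reduction concrete, here is a brute-force Python sketch of exact cover by 3-sets. It only illustrates the problem statement, not the reduction itself; the function name exact_cover and the sample instance are made up for the example, and the search is exponential, as expected for an NP-complete problem.

```python
from itertools import combinations

def exact_cover(universe, triples):
    """Return indices of triples forming an exact cover of universe, or None."""
    universe = set(universe)
    if len(universe) % 3 != 0:
        return None
    k = len(universe) // 3
    for choice in combinations(range(len(triples)), k):
        covered = set().union(*(triples[i] for i in choice))
        # k sets of size 3 that cover 3k elements are necessarily disjoint
        if covered == universe:
            return list(choice)
    return None

elements = {1, 2, 3, 4, 5, 6}
triples = [{1, 2, 3}, {3, 4, 5}, {4, 5, 6}, {1, 2, 6}]
cover = exact_cover(elements, triples)
print(cover)
```

In the reduction, each chosen triple sj corresponds to using the view vj in the rewriting.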

2005lav-i17 Assume a complete contained rewriting Q' exists.
There is a containment mapping h: body(Q) → body(exp(Q')).
Note: Sj in exp(vj) is renamed to a new variable, say Gj, that does not occur in the expansion of any other view, since it is existential.
If h maps pi(..) from Q to the expansion of vj, then the expansion contains pi, so ei is in sj and h(Sj) = Gj.
  ⇒ The atoms pm, pq in Q (for the other two elements of sj) contain Sj, and the body of vj contains pi, pm, pq, with Gj instead of Sj.
Since h(Sj) = Gj, h maps pm and pq from Q also to exp(vj).
(Since vj does not export Sj, the join condition on Sj can be satisfied only if all three atoms pi, pm, pq are mapped by h to vj.)
  ⇒ The views with expansions in image(h) provide an exact cover.

2005lav-i18 The other direction (if an exact cover exists, it gives a complete rewriting): left for you.
Also: a complete contained rewriting in this case is always an equivalent rewriting.

2005lav-i19 Comments:
For contained rewritings, the characterization of usability (Claim 1, p. 12) does not hold: even if v(D) is empty on some D and Q(D) is not, v may be useful for a contained rewriting.
Claim 2 (p. 14) holds also for contained rewritings: the search for rewritings is restricted by the size of the query body.
But: if the db satisfies functional dependencies, the size bound fails.
Example:
  Database: a single relation e(X, Y, Z), with fd X → Y
  Query Q: q(X, Y, Z) :- e(X, Y, Z)
  Views: v1(X, Y) :- e(X, Y, Z),  v2(X, Z) :- e(X, Y, Z)
  A complete rewriting, of minimal size: Q': q(X, Y, Z) :- v1(X, Y), v2(X, Z)
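A small Python sketch of why the functional dependency matters in this example: joining v1 and v2 on X reproduces e exactly when X → Y holds, and can produce extra tuples when it does not. The sample relations are invented for illustration.

```python
def rewriting_answer(e):
    """Evaluate Q': q(X, Y, Z) :- v1(X, Y), v2(X, Z) over exact views of e."""
    v1 = {(x, y) for x, y, z in e}
    v2 = {(x, z) for x, y, z in e}
    return {(x, y, z) for x, y in v1 for x2, z in v2 if x == x2}

# Satisfies X -> Y: every X value has a single Y value.
e_with_fd = {(1, "a", "u"), (1, "a", "v"), (2, "b", "u")}

# Violates X -> Y: X = 1 occurs with both Y = "a" and Y = "b".
e_without_fd = {(1, "a", "u"), (1, "b", "v")}

print(rewriting_answer(e_with_fd) == e_with_fd)
print(sorted(rewriting_answer(e_without_fd) - e_without_fd))
```

With the dependency, the two-atom rewriting answers a one-atom query, which is how the size bound of Claim 2 is exceeded.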

2005lav-i20 If the query and views contain built-in predicates (comparisons):
Let Q' be Q without the comparisons.
If Q'' is a complete rewriting for Q', just add to it the comparisons in Q (with variables suitably renamed), to obtain a complete contained rewriting for Q (or a contradiction!).
Example (from p. 6):
  Views: v0(p,s) :- advises(p,s)
         v1(p,s,q) :- registered(s,c,q), teaches(p,c,q), q >= w97
  A query: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q >= w98
  A rewriting: q(p,s,q) :- v1(p,s,q), v0(p,s), q >= w98
Can we change the example to obtain a contradiction?

2005lav-i21 The Information Manifold system (IM)
A LAV system, implemented at Bell Labs around 94-96; it supported about 100 WWW sources.
Main ideas:
  Rewriting queries using views for answering queries
  Using fine-grained descriptions of sources to eliminate irrelevant sources
  Support for restricted-capability sources
From now on, rewriting always means complete, contained.

2005lav-i22 Rewriting queries using views in IM: the bucket algorithm
The goal: finding rewritings is difficult in the worst case, but good heuristics often work well in practice.
Outline:
  1) For each query atom (a subgoal), find the views that may be targets of a mapping from it; put them in a bucket for the subgoal. These are the candidates (hopefully, this filters out many candidates).
  2) Combine views, one from each bucket; test whether
     a) there is a containment mapping;
     b) adding the constraints yields a satisfiable query on the views.
  3) Minimize each rewriting; eliminate those contained in others.
  4) Take the union.

2005lav-i23 Step 1: computing the bucket for a query atom
Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)
We add h(vj) to bucket(pi(Ui)) if:
  vj's definition contains an atom pi(Y);
    // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)
  if the k'th var of Ui is a head var of Q, then the k'th var of Y is a head var of vj (this var is needed for the query result);
  h(C(Q)) and h(C(vj)) together are satisfiable.
h renames variables of vj and (some vars of) Q as follows:
  If y, k'th in Y, is a head var of vj, then rename it to the k'th var of Ui
    // this may be a head var of Q, or used in a join
  Otherwise, h(y) is a new distinct var
  h(pi(Ui)) = h(pi(Y))

2005lav-i24 Example (same university db):
Views:
  v1(s, c, q, t) :- registered(s, c, q), course(c, t), c >= 500, q >= a98
  v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
  v3(s, c) :- registered(s, c, q), q <= a94
  v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q <= a97
Query:
  q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c >= 300, q >= a95
Bucket for registered(s, c, q): v1(s, c, q, t1), v2(s, p1, c, q)
  (but not v3, for one (two?) reasons; not v4, for one reason)
Bucket for teaches(p, c, q): v2(s1, p, c, q), v4(p, c, t2, q)
Bucket for course(c, t): v1(s2, c, q1, t), v4(p2, c, t, q2)
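The bucket computation for this example can be sketched in Python. This is a simplified reading of step 1 (p. 23), not the Information Manifold implementation. Several things are assumptions made for the sketch: quarters are encoded as integers (a94 as 94, and so on), comparisons are restricted to bounds of the form var >= const or var <= const, each predicate occurs once in the query, atoms contain no constants or repeated variables, and the names satisfiable and compute_buckets are hypothetical.

```python
from itertools import count

def satisfiable(constraints):
    """Check a conjunction of (var, op, const) bounds, op in {'>=', '<='}."""
    lower, upper = {}, {}
    for var, op, const in constraints:
        if op == ">=":
            lower[var] = max(lower.get(var, const), const)
        else:
            upper[var] = min(upper.get(var, const), const)
    return all(lower[v] <= upper[v] for v in lower if v in upper)

def compute_buckets(query, views):
    q_head, q_body, q_cons = query
    fresh = count(1)
    buckets = {}
    for pred, q_args in q_body:
        bucket = []
        for name, (v_head, v_body, v_cons) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != pred:
                    continue
                pairs = list(zip(q_args, v_args))
                # A head variable of Q must be exported by the view.
                if any(qa in q_head and va not in v_head for qa, va in pairs):
                    continue
                # Identify view variables with query variables position by position.
                to_query = {va: qa for qa, va in pairs}
                cons = [(to_query.get(v, f"{name}.{v}"), op, k)
                        for v, op, k in v_cons]
                if not satisfiable(cons + q_cons):
                    continue
                # Head variables outside the matched atom get fresh names.
                head = tuple(to_query[y] if y in to_query else f"{y}{next(fresh)}"
                             for y in v_head)
                bucket.append((name, head))
        buckets[pred] = bucket
    return buckets

VIEWS = {
    "v1": (("s", "c", "q", "t"),
           [("registered", ("s", "c", "q")), ("course", ("c", "t"))],
           [("c", ">=", 500), ("q", ">=", 98)]),
    "v2": (("s", "p", "c", "q"),
           [("registered", ("s", "c", "q")), ("teaches", ("p", "c", "q"))],
           []),
    "v3": (("s", "c"), [("registered", ("s", "c", "q"))], [("q", "<=", 94)]),
    "v4": (("p", "c", "t", "q"),
           [("registered", ("s", "c", "q")), ("teaches", ("p", "c", "q")),
            ("course", ("c", "t"))],
           [("q", "<=", 97)]),
}
QUERY = (("s", "p", "c"),
         [("registered", ("s", "c", "q")), ("teaches", ("p", "c", "q")),
          ("course", ("c", "t"))],
         [("c", ">=", 300), ("q", ">=", 95)])

buckets = compute_buckets(QUERY, VIEWS)
for pred, bucket in buckets.items():
    print(pred, bucket)
```

In this sketch v3 is rejected for registered because q <= a94 contradicts q >= a95, and v4 is rejected because it does not export s. The fresh variable names differ from those on the slide, since one counter is shared across all buckets.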

2005lav-i25 Step 2: combining views, testing for satisfiability of the rewriting
Example (cont'd): combining the 1st element of each bucket:
  Q1: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q), v1(s2, c, q1, t)
Minimize by s2 → s, c → c, q1 → q, t → t1 (the 3rd atom is removed):
  Q1': q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q)
Expand:
  v1: registered(s, c, q), course(c, t1), c >= 500, q >= a98
  v2: registered(s1, c, q), teaches(p, c, q)
With the query constraints (under the containment mapping) c >= 300, q >= a95, this is satisfiable, so we have a contained rewriting.

2005lav-i26 Another combination:
  Q2: q(s, p, c) :- v1(s, c, q, t1), v4(p, c, t2, q), v4(p2, c, t, q2)
We can minimize and remove the 3rd atom.
Expansion:
  v1: registered(s, c, q), course(c, t1), c >= 500, q >= a98
  v4: registered(s', c, q), teaches(p, c, q), course(c, t2), q <= a97
The conjunction is unsatisfiable (the conditions on q in the two views are contradictory), so this is not a rewriting.
Taking the union of all the rewritings that pass the filters, we obtain (in this example) a maximally contained rewriting.

2005lav-i27 Another example:
The database contains a single relation: flight(from, to, carrier)
The views are:
  v1(F, T) :- flight(F, T, wn)   // wn is Southwest Airlines
  v2(F, T) :- flight(F, T, ua)   // United Airlines
  v3(F, T, C) :- flight(F, Z, C), flight(Z, T, C)
A user wants to fly from Tucson to S.F., without changing the airline, and with at most one stop:
  Q: q(C) :- flight(tus, sfo, C)
     q(C) :- flight(tus, Z, C), flight(Z, sfo, C)

2005lav-i28 Buckets are computed for each sub-query (an extension!):
For flight(tus, sfo, C): v1(tus, sfo), v2(tus, sfo), v3(tus, T1, C), v3(F1, sfo, C)
Each of these gives a candidate rewriting for q, of which only
  q1(wn) :- v1(tus, sfo)
  q2(ua) :- v2(tus, sfo)
remain.

2005lav-i29 An alternative presentation of the bucket computation (p. 23)
Step 1: computing the bucket for a query atom
Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)
We add h(vj) to bucket(pi(Ui)) if:
  vj's definition contains an atom pi(Y), and pi(Ui) and pi(Y) unify;
    // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)
  some more conditions (next page) are satisfied.
Thus, if vj's body contains pi twice, vj may be added to the bucket (at most) twice.
For h, see next page.

2005lav-i30 The additional conditions and the definition of h:
(condition) If the k'th var of Ui is a head var of Q, or a join var of Q, then the k'th var of Y is a head var of vj (this var is needed for the query result or for the join).
  (The condition on join vars does not occur in Alon's survey paper!?)
h renames variables of vj as follows:
  If y, k'th in Y, is a head var of vj, then rename it to the k'th var of Ui
  Otherwise, h(y) is a new distinct var
  This is simply a renaming of the variables of vj.
(condition) Since pi(Y) and pi(Ui) are unifiable, h can be extended to (some variables of) Q, so that h(pi(Ui)) = h(pi(Y)).
(condition) h(C(Q)) and h(C(vj)) together are satisfiable (h is defined on some variables of C(Q), so it maps part of C(Q); that part should be consistent with h(C(vj))).