Presentation is loading. Please wait.

Presentation is loading. Please wait.

2005lav-i1 Local as View: First steps  Introduction and an example  Rewriting queries using views  The Information Manifold system.

Similar presentations


Presentation on theme: "2005lav-i1 Local as View: First steps  Introduction and an example  Rewriting queries using views  The Information Manifold system."— Presentation transcript:

1 2005lav-i1 Local as View: First steps  Introduction and an example  Rewriting queries using views  The Information Manifold system

2 2005lav-i2  Introduction and an example LAV: local as view The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?) Advantage: adding / removing a source are local to the source, and do not impact the global schema (if sufficiently general)

3 2005lav-i3 A source is associated with a view definition; what are the assumptions on its contents? Sound: contains a subset of the view definition Complete: contains all the data in the definition (possibly more) Exact: contains exactly the data in the view definition The common assumption: sound views (fits the WWW environment)

4 2005lav-i4 Answering a conjunctive query: Rewrite the query in terms of the views Equivalent rewriting: the rewriting is equivalent to the query Contained rewriting: the rewriting is contained in the query Maximally contained rewriting: contained in query, not contained in another rewriting For sound views, it is reasonable to search for maximally contained rewritings, then take their union

5 2005lav-i5 Example: a university db (each attribute is explained when it first occurs) course(c#,ti) //c-number, title teaches(pr, c#, qu) // prof, quarter registered(st, c#, qu) //student major(st, dp) //dept advises(pr, st) Views: v0(p, s):- advises(p,s) v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97 //winter97

6 2005lav-i6 Q: select a.pr, a.st, r.qu from registered r, teaches t, advises a where r.c# = t.c# and r.qu = t.qu and a.pr = t.pr a.st = r.st and r.qu >=w98 As a conjunctive query : (with individual variables) jumpjump q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q>=w98 Let Q’: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98 Expanding by the view defs: Q’’: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), q>= w97, advises(p,s), q>=w98 Q’’ is equivalent to Q, hence this is a good rewriting

7 2005lav-i7 Assume v1 is replaced by v2(s,q) :- registered(s,c,q), teaches(p,c,q), q>= w98 Can we answer the query? The expansion of v2(s,q), v0(p,s): q(p,s,q) :- registered(s,c,q), teaches(p’,c,q), q>= w98, advises(p,s), We use p’ (rather than p), since it is an existential variable of v1  This is not equivalent to the query, nor contained in it Assume q was dropped from both v1 and Q, could we answer Q using v0 and v1?

8 2005lav-i8 Assume v1 is replaced by v3: select r.st, t.pr, r.qu from registered r, teaches t where r.c# = t.c# and r.qu >= win98 Convert to conjunctive! Can we answer the query? Interim summary: To be useable in a query rewriting, views have to export variables that are subject to (arithmetic or join) conditions in the body of the query, or to contain such joins in their bodies, treat query head variables as head variables of the view

9 2005lav-i9  Rewriting queries using views Scenario: We have a collection of view definitions V = {v1, v2, …, vn} Given a query Q, we ask: is there a rewriting that uses the views? If there is, how can it be computed? Applications: Query optimization by using materialized views Data integration

10 2005lav-i10 Example : Q: q(X, U) :- p(X, Y), r(Y, Z), s(X, W), t(W, U) V: v1(A,B) :- p(A,C), r(C,B), s(A, D) A partial rewriting: Q’: q(X, U) :- v1(X, Z), s(X, W), t(W, U) The atom s(A, D) in v1 does not replace s(X, W) in Q (and Q’) Given also v2(A,B) :- s(A, C), t(C, B), r(D, E) Q’’: q(X, U) :- v1(X, Z), v2(X, U) Is a complete equivalent rewriting Note: looking at Q’, it is not evident that by replacing the last two atoms with v2 we obtain an equivalent rewriting; v2 contains r(D, E), and Q’ does not contain r.

11 2005lav-i11 Classification: Partial rewriting: only part of the query is replaced by views useful in query optimization, not in data integration Complete rewriting: only views occur in it useful in both scenarios Equivalent rewriting: yields the same answer on all db’s useful in both cases Contained rewriting: its expansion is a contained query not interesting for query optimization, useful for integration Maximal contained rewriting: contained in query, not in another rewriting We assume (unless stated otherwise) conjunctive queries and views, no b.i. preds

12 2005lav-i12 Assume a query Q q(..):- Qbody is given. A view is usable in an equivalent rewriting if it occurs in some partial equivalent rewriting of Q Claim 1 : jump jump That is, if v(D) is empty, for some db D, then so is Q(D) Proof: If the containment does not hold, then there is a D s.t. v(D) is empty, Q(D) is not, so the rewriting is not equivalent If it holds, then Q’: q(..):- Qbody, v(Y) (Y new vars) is an equivalent partial rewriting (bold letters – vectors of variables/constants)

13 2005lav-i13 Corollary: checking for usability is NP-complete Corollary : if v1,…, vk are usable, then Q’: q(..):- Qbody, v1(Y1), …, vk(Yk) (Yi new and distinct) is an equivalent rewriting Apply the containment map hi to each v(Yi), obtaining Q’’ that is contained in Q’ (and does not introduce any new vars) Q’’ contains Q, hence Q, Q’ Q’’ are equivalent Q’’ still contains all atoms of bodyQ; some of these may be removed by standard minimization; then some views may be removed How many must be left?

14 2005lav-i14 Claim 2 : jumpjump If Qbody has n atoms, then a minimized, equivalent or contained, rewriting Q’ contains at most n (view or regular) atoms Proof: consider the expansion exp(Q’), and the containment mapping from Q to exp(Q’); its image has at most n atoms Q: q(..) :- p1(..), pk(..) pk+1(..) pn(..) Q’: q(..):- p1,(..),, pk(..), v1(..), vm(..) r1,1.. r1k 1 rm,1.. rmk 1

15 2005lav-i15 Claim 3: The problem: is there a complete equivalent rewriting is NP-complete Proof: given v, construct v’: Qhead(..):- Qbody, vbody Then v’ is usable (NP-complete) iff v’ is a complete rewriting Claim 4: The problem above is NP-complete even if the query and the views do not contain repeated predicates in their bodies (which simplifies the search for containment mappings) And also if we are looking for a complete contained rewriting Proof: next page

16 2005lav-i16 Exact cover by 3-sets : (NP-complete problem) Given s={e1,…,en} and sets s1,…,sk, each of size 3, is there a cover of s by a subset of the {sj} where each element occurs in just one set? A reduction from the above to finding a complete rewriting: The query Q: q( ) :- p1(A1,1,…,A1,k),…, pn(An,1,…, An,k) Note: since each sj contains 3 elements, Sj occurs in 3 atoms – this is a join condition in the query body The view vj (for sj): the 3 atoms (as above) for the elements of sj, and a head that contain all the Sm that occur in its body, but not Sj (in the body, Sj is existential)

17 2005lav-i17 Assume a complete contained rewriting Q’ exists There is a containment mapping h : body(Q)  body( exp(Q’)) Note: Sj in exp(vj) is renamed to a new var, say Gj, that does not occur in expansion of other views, since it is existential If h maps pi(..) from Q to the expansion of vj, then the expansion contain pi, so and h(Sj)=Gj  the atoms pm, pq in Q contain Sj And the body of vj contains pi, pm, pq, with Gj instead of Sj Since h(Sj) = Gj, h maps pm and pq from Q also to exp(vj) (Since vj does not export Sj, the join condition on Sj can be satisfied only if all three atoms pi, pm, pq are mapped by h to vj)  The views with expansions in image(h) provide an exact cover

18 2005lav-i18 The other direction (if an exact cover exists, it gives a complete rewriting) - left for you Also: a complete contained rewriting in this case is always an equivalent rewriting

19 2005lav-i19 Comments: For contained rewritings, the characterization (claim1, p.12) of usability does not hold (even if v(D) is empty on some D, and Q(D) is not, v may be useful for a contained rewriting)p.12 Claim 2 ( p. 14 ) holds also for contained rewritings : search for rewritings is restricted by size of query body p. 14 but: if db satisfies functional dependencies, the size bound fails Example: Database: a single relation e(X, Y, Z), with fd: X  Y Query Q: q(X, Y, Z):- e(X, Y, Z) Views: v1(X,Y):- e(X, Y, Z), v2(X,Z):- e(X, Y, Z) A complete rewriting, of minimal size : Q’: q(X, Y, Z):- v1(X, Y), v2(X, Z)

20 2005lav-i20 If the query and views contain b.i. predicates (comparisons) : Let Q’ be Q w/o the comparisons If Q’’ is a complete rewriting for Q’, just add to it the comparisons in Q (with variables suitably renamed), to obtain a complete contained rewriting for Q (or a contradiction!) Example: from p. 6p. 6 Views: v0(p, s):- advises(p,s) v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97 A query: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(q,s), q>=w98 A rewriting: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98 Can we change the example to obtain a contradiction?

21 2005lav-i21  The Information Manifold system (IM) A LAV system, implemented in Bell Labs around 94-96, supported about 100 WWW sources Main ideas: Rewriting queries using views for answering queries Using fine-grained descriptions of sources to eliminate irrelevant sources Support for restricted-capability sources From now, rewriting always means complete, contained

22 2005lav-i22 Rewriting views using views in IM – the bucket algorithm : The goal: Finding rewritings is difficult in worst case But good heuristics often work well in practice Outline: 1)Find for each query atom – a subgoal, views that may be targets of a mapping from it; put them in a bucket for the subgoal (these are candidates) (hopefully, this filters out many candidates) 2)Combine views, one from each bucket; test if 1)there is a containment mapping; 2)adding constraints yields a satisfiable query on the views 3)Minimize each rewriting, eliminate those contained in others 4)Take the union

23 2005lav-i23 Step 1 – computing the bucket for a query atom : Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q) We add h(vj) to bucket(pi(Ui)) if: vj’s definition contains an atom pi(Y) ; // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y) if the k’th var of Ui is a head var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result) h renames variables of vj and (some vars of) Q as follows: If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui // this may be a head var of Q, or used in a join Otherwise, h(y) is a new distinct var h(pi(Ui)) = h(pi(Y)) h(C(Q)) and h(C(vj)) is satisfiable

24 2005lav-i24 Example (same university db) : Views: v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98 v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q) v3(s, c) :- registered(s, c, q), q<=a94 v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97 Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95 Bucket for registered(s, c, q): v1(s, c, q, t1), v2(s, p1, c, q) (but not v3 – for one(two?) reasons, not v4 – for one reason) Bucket for teaches(p, c, q): v2(s1, p, c, q), v4(p, c, t2, q) Bucket for course(c, t):

25 2005lav-i25 Step 2 – combining views, testing for satisfiability of rewriting : Example (cont’d): Combining 1 st element of each bucket: Q1: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q), v1(s2, c, q1, t) Minimize by s2  s, c  c, q1  q, t  t1 (3 rd atom is removed) Q1’: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q) Expand: v1: registered(s, c, q), course(c, t1), c>=500, q>=a98, v2: registered(s, c, q), teaches(p, c, q) With query constraints (under containment mapping) c>=300, q>=a95, this is satisfiable, so we have a contained rewriting

26 2005lav-i26 Another combination: Q2: q(s, p, c) :- v1(s, c, q, t1), v4(p, c, t2, q), v4(p2, c, t, q2) Can minimize, remove 3 rd atom Expansion: v1: registered(s, c, q), course(c, t1), c>=500, q>=a98 v4 : registered(s, c, q), teaches(p, c, q), course(c, t2), q<=a97 The conjunction is unsatisfiable (conditions in two views are contradictory), this is not a rewriting Taking the union of all the rewritings that pass the filters, we obtain (in this example) a maximally contained rewriting

27 2005lav-i27 Another example: The database contains a single relation : flight(from, to, carrier) The views are v1(F, T) :- flight(F, T, wn) // wn is Southwest airlines v2(F, T) :- flight (F, T, ua) // United airlines v3(F, T, C) :- flight(F, Z, C), flight(Z, T, C) A user wants to fly from Tucson to S.F, w/o changing the airline, and with at most one stop Q: q(C) :- flight(tus, sfo, C) q(C) :- flight(tus, Z, C), flight(Z, sfo, C)

28 2005lav-i28 Buckets are computed for each sub-query: (an extension!) For flight(tsu, sfo, C): v1(tus, sfo), v2(tus, sfo), v3(tus, T1, C), v3(F1, sfo, C) Each of these gives a candidate rewriting for q, of which only q1(wn) :- v1(tus, sfo) q2(ua) :- v2(tus, sfo) remain

29 2005lav-i29 An alternative presentation of the bucket (p. 23) Step 1 – computing the bucket for a query atom : Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q) We add h(vj) to bucket(pi(Ui)) if: vj’s definition contains an atom pi(Y) & pi(U) and pi(Y) unify // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y) Some more conditions (next page) are satisfied Thus, if vj’s body contains pi twice, it may be added to the bucket (at most) twice For h, see next page

30 2005lav-i30 The additional conditions & the definition of h: (condition) if the k’th var of Ui is a head var of Q, or a join var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result or for the join) (the condition on join vars does not occur in Alon’s survey paper!?) h renames variables of vj as follows: If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui Otherwise, h(y) is a new distinct var h(pi(Ui)) = h(pi(Y)) this is simply a renaming of the variables of vj (condition) Since pi(Y) and pi(U) are unifiable, h can be extended to (some variables of) Q, so that h(pi(Ui)) = h(pi(Y)) (condition) h(C(Q)) and h(C(vj)) is satisfiable (h is defined on some variables of C(Q), so it maps part of C(Q); that part should be consistent with h(C(vj)) )


Download ppt "2005lav-i1 Local as View: First steps  Introduction and an example  Rewriting queries using views  The Information Manifold system."

Similar presentations


Ads by Google