2005lav-iv1  On the Inverse rules algorithm It is guaranteed to compute the certain answers But, what about its efficiency? As presented, it computes.

Presentation transcript:

2005lav-iv1 On the inverse rules algorithm. It is guaranteed to compute the certain answers. But what about its efficiency? As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples. We show examples, and then how to address the problems.

2005lav-iv2 Example:
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view: par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Given n tuples in the view, it produces 2n tuples, then joins them, then discards the results that contain f(-,-).
The bucket algorithm will spend more time on rewriting and find Q'(X, Y) :- v(X, Y), and then output the n results.
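The cost described on this slide can be made concrete with a small simulation. The following is only a sketch under stated assumptions: Skolem terms f(C, G) are modelled as Python tuples, constants as strings, and the names `invert_view`, `join_par` and `is_skolem` are hypothetical helpers, not part of any system mentioned in the slides.

```python
def invert_view(v_tuples):
    """Apply the inverse rule par(C, f(C,G)), par(f(C,G), G) :- v(C, G)."""
    par = set()
    for c, g in v_tuples:
        sk = ("f", c, g)          # Skolem term f(C, G), modelled as a tuple
        par.add((c, sk))
        par.add((sk, g))
    return par


def is_skolem(term):
    """Constants are strings here; only Skolem terms are tuples."""
    return isinstance(term, tuple)


def join_par(par):
    """Evaluate q(X, Y) :- par(X, Z), par(Z, Y) over the derived facts."""
    parents_of = {}
    for child, parent in par:
        parents_of.setdefault(child, []).append(parent)
    return {(x, y) for x, z in par for y in parents_of.get(z, [])}


v = {("a", "c"), ("b", "d"), ("c", "e")}   # n = 3 view tuples
par_facts = invert_view(v)                 # 2n derived facts
joined = join_par(par_facts)               # includes tuples with f(-,-)
answers = {t for t in joined if not any(is_skolem(a) for a in t)}
```

On this input the join also produces a tuple pairing f(a, c) with f(c, e), which is thrown away; the surviving answers are exactly the view tuples, which is what the rewriting Q'(X, Y) :- v(X, Y) returns directly.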

2005lav-iv3 Example (university db):
Views:
v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98
v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
v3(s, c) :- registered(s, c, q), q<=a94
v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97
Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
Inverting v3: registered(s, c, f(s, c)) :- v3(s, c)
This may produce any number of facts for registered, but for this query none can be used. Why?

2005lav-iv4
v3(s, c) :- registered(s, c, q), q<=a94
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
How should the constraint on q in v3 be represented? We could export it as f(s, c) <= a94, which contradicts q >= a95 in the query (how is q in the query transformed to f(s, c)?). But what if the view contained no constraint?
⇒ The view must export variables constrained in the query.
The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?) ⇒ the join will fail (cannot join f1(-,-) with f2(-,-) or with a regular variable).
⇒ The view must export join variables of the query.

2005lav-iv5 The factors that determine the usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway.
Solution: compose the query with the inverse rules, to obtain a new query that uses the views directly.
Composition:
Consider the heads of the inverse rules as a db (a collection of facts).
Look for valuations: mappings of the query variables that map the query atoms to this db.
Then replace the query goals by views.

2005lav-iv6 Example:
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view: par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Two candidate valuation mappings:
1. X → C, Z → f(C, G), Y → G ⇒ q(C, G) :- v(C, G), v(C, G)
2. X → f(C, G), Z → G, Y → f(C, G) (assuming we add C = G) ⇒ q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
The 2nd is discarded: function symbols are not allowed in the result.
Minimization of the 1st gives q(C, G) :- v(C, G), the same as bucket.

2005lav-iv7
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
registered(s, c, f(s, c)), f(s, c)<=a94 :- v3(s, c)
Any valuation that uses this fact must map q → f(s, c).
The constraint f(s, c) <= a94 contradicts q >= a95, but what if there is no constraint to export?
The mapping q → f(s, c) cannot be used to map teaches to any fact derived from other views ⇒ v3 cannot be used.

2005lav-iv8 A mapping will fail to define a valuation if:
A view does not export a join variable, and does not contain the join (why?).
The view does not export a variable that is constrained in the query (cannot 'check' the constraint in the 'db').
Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct and complete).
The amount of work invested will probably be similar.
Composition can be performed also for Datalog queries, but weeding out useless mappings is more difficult.

2005lav-iv9 The MiniCon algorithm: the final one?
Motivation
Preliminaries
The MiniCon algorithm

2005lav-iv10 Motivation
The previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views.
The bucket algorithm has a narrow peephole in the 1st stage: each bucket is for a single atom ⇒ global constraints are treated only in the 2nd stage ⇒ many useless combinations may be examined.
The inverse rules algorithm, improved by composition, seems to perform similar work.
The motivation: find an algorithm that does more work in preliminary filtering, and scales up to hundreds of views.

2005lav-iv11 Preliminaries
The idea:
Once a view is put in the bucket of a query atom, switch to considering join variables, and find which other atoms are necessarily covered by the view.
Along the way, find out also which view head variables need to be equated.
Given the coverage by views, combine views with disjoint covers.
Expected gain: more filtering in the 1st stage and a better representation of information ⇒ a smaller number of combinations and a reduced number of containment checks in the 2nd stage.

2005lav-iv12 Example:
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
Bucket: one view in each bucket: par(X, Z): {v(X, G)}; par(Z, Y): {v(P, Y)}
When the two view atoms are combined, a containment check discovers that G = Y ⇒ containment, and redundancy of the 2nd atom.
Alternative: given par(X, Z): v(X, G), since Z (a join variable) occurs in the 2nd atom of the query, add par(Z, Y) to the coverage of v(X, G), with G = Y.
In the 2nd stage, just use v(X, Y).

2005lav-iv13 Assumptions, terminology:
CQ queries and views; for now, no constants or constraints in the query or the views.
View definitions use variables different from those in the query or in other views (disjoint sets of variables).
b(Q): body atoms of Q; b(V): body atoms of view V.
A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V).
Considered mappings always map Q head vars to V head vars: head variable preservation (hvp).
If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to the same V: join variable condition (jvc).

2005lav-iv14 Given Q(X), assume Q' is a rewriting in terms of views:
Q': q(X) :- v1(X1), …, vn(Xn) (some vi, vj may be occurrences of the same view v)
⇒ There exists a containment mapping h from Q to exp(Q') (it satisfies hvp).
Let Gi be the set of atoms of b(Q) mapped to b(exp(vi)), and let h/i be h restricted to vars(Gi).
Then the Gi are pairwise disjoint and their union is b(Q).
And Gi satisfies (jvc): if h/i maps x of vars(Gi) to an existential variable of vi, then every atom g in b(Q) that contains x is in Gi.

2005lav-iv15 The occurrence of vi in Q' may have some head variables equated.
Example: the original head might be vi(A, B, C), and the head in Q' vi(X, X, Z).
These equalities are given by a unique least set of equality constraints Ei (v/E denotes the view v with head variables equated as specified by E).
Summary (so far): the containment mapping can be decomposed into "disjoint" components (vi, Ei, h/i, Gi).
All we need to do is find such components, then combine them.
What is the condition for successful combination? Does a combination (such that the Gi are pairwise disjoint and cover b(Q)) ever fail?

2005lav-iv16 To find such components, we must use the given view definitions (whose variables are different from those of Q or exp(Q')).
Answer: a component and its mapping can be expressed as h/i = h'i ∘ hi:
Gi --hi--> vi/E'i --h'i--> exp(vi(Xi))
Here:
hi is a mapping from Q to the given view definition for vi;
E'i is the least set of equalities that make hi a good mapping;
h'i is a variable renaming.
E'i and hi depend only on Q and the definition of vi ⇒ we can find component mappings from Q to the view defs, then combine and rename, possibly equating more head vars.

2005lav-iv17 One more step: a component (vi, Ei, hi, Gi) may be further decomposed into smaller components (vi, Ei1, hi1, Gi1), (vi, Ei2, hi2, Gi2), provided each of Gi1, Gi2 satisfies (jvc) and they are disjoint.
Each of Ei1, Ei2 is a subset of Ei: the least set for the mapping hi1, respectively hi2, to be ok.
When these are combined, Ei1 union Ei2 is augmented with the remaining equalities of Ei.
Minimal such components are easier to find, and can be re-used for different combinations.

2005lav-iv18 What is a minimal component? C = (vi, Ei, hi, Gi) is minimal if:
hi satisfies (hvp) + (jvc) (assuming the equalities in Ei);
there is no component C1 whose last three components are contained in C's last three components (with at least one containment proper).
A component is called a MiniCon (mini containment) description, or MCD.
The algorithm constructs and combines minimal MCDs.

2005lav-iv19 The MiniCon Algorithm
Minimal MCD construction algorithm:
For each g in b(Q) and each k in each b(vi):
  Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists
  // E(g,k) and h(g,k), if they exist, are uniquely determined by g, k
  If E(g,k) and h(g,k) exist, find all minimal MCDs that extend them:
  (vi, Ei, hi, Gi) extends them if Ei contains E(g,k), hi contains h(g,k), and Gi contains g
From the final set of MCDs, remove duplicates.

2005lav-iv20 How do we find minimal MCDs that extend a given mapping?
I. Extension to one more query atom, one view atom:
extend(vi, E, h, g, k)
// E: equalities on head vars of vi
// h: vars(Q) → vars(vi), partial, satisfies hvp with E
// g in b(Q), k in b(vi)
try to extend h to map g to k, with hvp, by adding equalities to E
return fail, or the (uniquely determined) E', h'
(The first step in the algorithm of the previous slide is this one, given empty E and h.)

2005lav-iv21 How do we find minimal MCDs that extend a given mapping?
II. Extend repeatedly, as long as needed and successful.
Given vi, g, k, E(g,k) and h(g,k):
Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {} // C holds the initial component; (jvc) is possibly not satisfied
While C is not empty:
- remove some c = (vi, E, h, G) from C
- if (jvc) is satisfied, put c in MC
- if not, there exist x in vars(Q) s.t. h(x) is existential, and g' that contains x with g' not in G
- for each k' in b(vi): if extend(vi, E, h, g', k') succeeds, put the extension in C
Remove duplicates from MC.
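The two steps (extend once, then extend repeatedly until the join variable condition holds) can be sketched in Python. This is a minimal sketch under the slides' assumptions (conjunctive queries, no constants or comparisons), not the implementation from the MiniCon paper. The data layout is an assumption made for illustration: an atom is a pair (predicate, argument tuple), a query is (head variables, body), a view is (name, head variables, body), and the equalities E are kept as a map `rep` from a head variable to its representative. The function name `minimal_mcds` is hypothetical, and the extra pruning mentioned in the paper is not applied.

```python
def extend(q_head, v_head, rep, h, g, k):
    """Try to extend the partial mapping h so that query atom g maps to view atom k.

    rep records equalities on view head variables (variable -> representative).
    Returns (rep', h') on success, or None on failure.
    """
    if g[0] != k[0] or len(g[1]) != len(k[1]):
        return None
    rep, h = dict(rep), dict(h)

    def find(y):
        return rep.get(y, y)

    for x, y in zip(g[1], k[1]):
        if x in q_head and y not in v_head:
            return None                      # (hvp) would be violated
        if x not in h:
            h[x] = y
        elif find(h[x]) != find(y):
            # x already has a different image: only head variables may be equated
            if h[x] not in v_head or y not in v_head:
                return None
            keep, drop = sorted((find(h[x]), find(y)))
            for v in v_head:
                if find(v) == drop:
                    rep[v] = keep
    return rep, h


def minimal_mcds(query, view):
    """Collect the MCDs (view name, E, h, G) reachable by the procedure above."""
    q_head, q_body = query
    name, v_head, v_body = view
    found = set()
    for gi, g in enumerate(q_body):
        for k in v_body:
            start = extend(q_head, v_head, {}, {}, g, k)
            if start is None:
                continue
            work = [(start[0], start[1], frozenset([gi]))]
            while work:
                rep, h, covered = work.pop()
                # (jvc): an uncovered atom that uses a variable mapped to an existential
                missing = next(
                    (j for j, a in enumerate(q_body) if j not in covered
                     and any(x in h and h[x] not in v_head for x in a[1])),
                    None)
                if missing is None:
                    canon = frozenset((x, rep.get(y, y)) for x, y in h.items())
                    found.add((name, frozenset(rep.items()), canon, covered))
                    continue
                for k2 in v_body:
                    ext = extend(q_head, v_head, rep, h, q_body[missing], k2)
                    if ext is not None:
                        work.append((ext[0], ext[1], covered | {missing}))
    return found


par = lambda a, b: ("par", (a, b))
q = (("X", "Y"), [par("X", "Z"), par("Z", "Y")])
q_self = (("X", "X"), [par("X", "Z"), par("Z", "X")])
v_grand = ("v", ("C", "G"), [par("C", "P"), par("P", "G")])
v_parent = ("v", ("C", "P"), [par("C", "P"), par("P", "G")])

mcds_grand = minimal_mcds(q, v_grand)
mcds_self = minimal_mcds(q_self, v_grand)
mcds_parent = minimal_mcds(q, v_parent)
```

Storing each MCD as a tuple of frozensets makes duplicate removal automatic, since the same component reached from different starting atoms collapses into one set element. The three calls correspond to the parenthood examples in this deck: the grandparent view covering the whole query, the same view with C = G forced by a query with a repeated head variable, and the view that exports the parent instead of the grandparent.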

2005lav-iv22 Example:
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}; need to extend to par(Z, Y), which can only map to the 2nd view atom
MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
1st query atom, 2nd view atom: no mapping
…
The only MCD is the above.

2005lav-iv23 Comment: In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization). I do not know how to explain this optimization, or how to prove that with it the algorithm is still complete.

2005lav-iv24 2nd phase: MCD combination and variable renaming:
A set of MCDs {(vi, Ei, hi, Gi)} is a candidate if the Gi are pairwise disjoint and their union is b(Q).
For each candidate set, rename variables: for each view variable y, if hi(x) = y (y a view variable), rename y to x; else rename y to a fresh distinct variable.
Note: if x is in the domain of both hi and hj, then hi(x), hj(x) are head variables of vi, vj (by the definition of MCD) ⇒ renaming makes them equal.

2005lav-iv25 Example (cont'd):
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCD: (v, E = {}, h = {X → C, Z → P, Y → G}, b(Q))
Rename in v: C to X, G to Y
Rewriting: q(X, Y) :- v(X, Y)

2005lav-iv26 Example:
A db: parenthood relation par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa
MCDs:
1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}; need to extend to par(Z, X), which can only map to the 2nd view atom
MCD: (v, {C = G}, {X → C, Z → P}, b(Q))
1st query atom, 2nd view atom: no mapping
…
The only MCD is the above.

2005lav-iv27 Example:
A db: parenthood relation par(c, p)
A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
h(1,1) = {X → C, Z → P}, E(1,1) = {} ⇒ MCD A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} )
h(1,2) = {X → P, Z → G}, E(1,2) = {}: fails (why?)
h(2,1) = {Z → C, Y → P}, E(2,1) = {} ⇒ MCD A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )
h(2,2) = {Z → P, Y → G}: fails (why?)

2005lav-iv28
A view: v(C, P) :- par(C, P), par(P, G)
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs: A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} ), A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )
Rewritings (rename the views to have distinct vars):
A1+A2: X → C1, Z → P1, Z → C2, Y → P2: add P1 (in 1st v) = C2 (in 2nd v)
rewriting: v(C1, P1), v(P1, P2)
renaming: v(X, Z), v(Z, Y), a correct rewriting
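The combination phase can be sketched as follows. The sketch assumes that a set of MCDs is a candidate when the covered atom sets are pairwise disjoint and together cover b(Q), which is the condition used in the MiniCon paper. The dictionary layout of an MCD and the function name `combine` are hypothetical. It also assumes that each (equated class of) view head variable is the image of at most one query variable; otherwise those query variables would have to be equated as well, which is not handled here.

```python
from itertools import combinations, count


def combine(query, mcds):
    """Build one rewriting per set of MCDs whose covers partition b(Q)."""
    q_head, q_body = query
    all_atoms = frozenset(range(len(q_body)))
    fresh = count(1)
    rewritings = []
    for size in range(1, len(mcds) + 1):
        for combo in combinations(mcds, size):
            covers = [m["G"] for m in combo]
            disjoint = sum(len(c) for c in covers) == len(all_atoms)
            if not disjoint or frozenset().union(*covers) != all_atoms:
                continue
            body = []
            for m in combo:
                rep = m["E"]                     # head variable -> representative
                # query variable that each (class of) view variable is the image of
                source = {rep.get(y, y): x for x, y in sorted(m["h"].items())}
                names, args = {}, []
                for y in m["head"]:
                    r = rep.get(y, y)
                    if r not in names:
                        # rename to the query variable, else to a fresh variable
                        names[r] = source.get(r) or f"_F{next(fresh)}"
                    args.append(names[r])
                body.append((m["view"], tuple(args)))
            rewritings.append((tuple(q_head), body))
    return rewritings


query = (("X", "Y"), [("par", ("X", "Z")), ("par", ("Z", "Y"))])

# A1 and A2 from the slide: v(C, P) :- par(C, P), par(P, G)
a1 = {"view": "v", "head": ("C", "P"), "E": {},
      "h": {"X": "C", "Z": "P"}, "G": frozenset({0})}
a2 = {"view": "v", "head": ("C", "P"), "E": {},
      "h": {"Z": "C", "Y": "P"}, "G": frozenset({1})}
two_views = combine(query, [a1, a2])

# The single MCD for v(C, G) :- par(C, P), par(P, G)
whole = {"view": "v", "head": ("C", "G"), "E": {},
         "h": {"X": "C", "Z": "P", "Y": "G"}, "G": frozenset({0, 1})}
one_view = combine(query, [whole])
```

Because the covers must be disjoint and complete, neither A1 nor A2 alone yields a rewriting; only the pair does, and renaming through the mappings makes the shared variable Z appear in both view atoms.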

2005lav-iv29 When Q or views contain constants:
MCD formation:
A constant a of Q must be mapped to a head variable of vi, or to itself.
If x is in headvar(Q), it can be mapped to headvar(vi) or to a.
Whenever x is mapped to a, hi records this fact.
MCD combination: if A1, A2 are both defined on x, then allow also:
Both map x to a.
One maps x to a, the other to a head var of the view.
In either case, rename x to a in the rewriting.

2005lav-iv30 When Q or views contain comparisons:
If the views contain comparisons, there is no change to the algorithm (it finds contained rewritings anyway).
If Q contains comparisons, then there may be no Datalog program that computes the certain answers (can express x != y).
But we can expect that extending the algorithm for comparisons will be a good heuristic, and will find the certain answers in many cases.

2005lav-iv31 When Q or views contain comparisons:
C(Q): the constraints of Q (closed under inference).
MCD formation, for (vi, Ei, hi, Gi) (extending the join variable condition):
If hi(x) is an existential of vi, and c(x, y) is in C(Q), then hi(y) must be defined.
C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential of vi.
MCD combination: add all constraints of C(Q) not covered by those of the views.