Containment of Relational Queries with Annotation Propagation
Wang-Chiew Tan, University of California, Santa Cruz


2 Annotation Management System
A system that propagates meta-data associated with a piece of data along with the data as the data is moved around.
Main feature: to trace the provenance and flow of data.
[Figure: annotations a1 and a2 travel with their data through a transformation]

3 Tracing the Provenance and Flow of Data
[Figure: annotations a1, a2, a3 and b1, b2, b3 carried along with their data through a transformation]

4 Other Applications
- Keep information that cannot otherwise be stored in the current database design
- Highlight wrong data: erroneous data may be copied, but the comment that it is wrong goes along with it
- Security: annotate the security level of data items
- Quality metric: annotate the quality level of data items

5 Main Question
Are the annotated outcomes the same for equivalent queries?
Why this question? A query optimizer rewrites a query. Will the rewritten query have the same annotation propagation behavior?

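6 A Simple Example
Given two relation schemas R(A,B) and S(B,C), compare
SELECT * FROM R NATURAL JOIN S
versus
SELECT r.A, r.B, s.C FROM R r, S s WHERE r.B = s.B
on the instance R = {(1,2)}, S = {(2,3)}, where the 2 in R is annotated a and the 2 in S is annotated b:
Result 1: (1, $2^{\{a,b\}}$, 3)
Result 2: (1, $2^{\{a\}}$, 3)
The two queries are equivalent, but their annotated results differ.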

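7 In a More Concise Notation
Ans(x,y,z) :- R(x,y), S(y,z)   with valuation $\{x \mapsto 1, y \mapsto 2, z \mapsto 3\}$
Ans(x,y,z) :- R(x,y), S(y',z), y = y'   with valuation $\{x \mapsto 1, y \mapsto 2, y' \mapsto 2, z \mapsto 3\}$
A location is a triple (R, t, A) of a relation, a tuple, and an attribute.
- Annotations of values that reside in different locations but are bound to the same variable are unioned together. In the first rule, y is bound to the locations (R, (1,2), B) and (S, (2,3), B), so the output 2 carries {a, b}; in the second rule, y is bound only to (R, (1,2), B), so the output 2 carries only {a}.
- Annotations that belong to the same output location are unioned together. For the union of
Ans(y) :- R(x,y)
Ans(y) :- S(y,z)
the answer is Ans($2^{\{a,b\}}$).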
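The two propagation rules above can be sketched as a brute-force evaluator for conjunctive queries. This is a minimal illustration under our own assumptions, not the system from the talk: a query is a hypothetical `(head, body)` pair of variable-name tuples, a database maps relation names to lists of tuples, and annotations are keyed by a location `(relation, tuple_index, column)` with 0-based indices, mirroring the (R, t, A) triples. The helper names `evaluate` and `union` are ours.

```python
from itertools import product


def evaluate(head, body, db, ann, eqs=()):
    """Evaluate the conjunctive query head :- body with annotation propagation.

    head: tuple of variable names; body: list of (relation, variables) subgoals;
    db: relation -> list of tuples; ann: (relation, tuple_index, column) -> set;
    eqs: optional (var, var) equality predicates such as y = y'.
    Returns {output_tuple: [annotation set for each head column]}.
    """
    result = {}
    # Try every way of matching each subgoal to one tuple of its relation.
    for picks in product(*(range(len(db[rel])) for rel, _ in body)):
        val, ok = {}, True
        for (rel, terms), i in zip(body, picks):
            for var, value in zip(terms, db[rel][i]):
                ok = ok and val.setdefault(var, value) == value
        if not ok or any(val[a] != val[b] for a, b in eqs):
            continue  # this choice of tuples is not a valuation of the query
        cols = result.setdefault(tuple(val[v] for v in head), [set() for _ in head])
        # Every location bound to a head variable contributes its annotations;
        # valuations producing the same output tuple accumulate into cols.
        for (rel, terms), i in zip(body, picks):
            for j, var in enumerate(terms):
                for k, hv in enumerate(head):
                    if var == hv:
                        cols[k] |= ann.get((rel, i, j), set())
    return result


def union(*results):
    """A union of rules: annotations at the same output location are unioned."""
    merged = {}
    for res in results:
        for out, cols in res.items():
            acc = merged.setdefault(out, [set() for _ in cols])
            for acc_col, col in zip(acc, cols):
                acc_col |= col
    return merged


# The instance of slide 6: the 2 in R is annotated a, the 2 in S is annotated b.
db = {"R": [(1, 2)], "S": [(2, 3)]}
ann = {("R", 0, 1): {"a"}, ("S", 0, 0): {"b"}}

natural = evaluate(("x", "y", "z"), [("R", ("x", "y")), ("S", ("y", "z"))], db, ann)
theta = evaluate(("x", "y", "z"), [("R", ("x", "y")), ("S", ("y'", "z"))], db, ann,
                 eqs=[("y", "y'")])
both = union(evaluate(("y",), [("R", ("x", "y"))], db, ann),
             evaluate(("y",), [("S", ("y", "z"))], db, ann))

print(sorted(natural[(1, 2, 3)][1]))  # ['a', 'b']
print(sorted(theta[(1, 2, 3)][1]))    # ['a']
print(sorted(both[(2,)][0]))          # ['a', 'b']
```

The equality y = y' only filters valuations here; it does not merge the locations bound to y and y', which is what makes the second rule drop annotation b.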

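8 More Examples
Q1: Ans(x,v) :- R(x,y,u), R(x,z,v), R(t,w,z)
Q2: Ans(x,v) :- R(p,q,v), R(x,z,v), R(t,w,z)
Instance: R = {(1,2,3), (1,4,5), (1,8,4), (8,9,5)}, with annotations a, b, c, d on some of its values.
First answer (Q1): Ans(1, 5)
Second answer (Q2): Ans(1, 5)
The two queries are equivalent and return the same tuple, but the 1 and the 5 carry different annotations in the two answers.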
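Which annotations each answer carries can be worked out by hand with the rules of slide 7. Below, the tuples of R are numbered in the order listed, and $\alpha(t_i.k)$ (our notation, not the slides') stands for the annotation set at column k of tuple $t_i$. The subgoals R(x,z,v), R(t,w,z) need a value that occurs both in the second column and in the third column of R, and the only such value is 4:

```latex
\begin{align*}
& t_1 = (1,2,3),\quad t_2 = (1,4,5),\quad t_3 = (1,8,4),\quad t_4 = (8,9,5) \\
& R(x,z,v),\, R(t,w,z) \text{ force } z = 4, \text{ matched only by } t_2 \text{ and } t_3, \text{ so } x = 1,\ v = 5 \\[4pt]
& Q_1:\ R(x,y,u) \text{ matches } t_1, t_2, t_3:\quad
  \mathrm{ann}(1) = \alpha(t_1.1) \cup \alpha(t_2.1) \cup \alpha(t_3.1),\quad
  \mathrm{ann}(5) = \alpha(t_2.3) \\
& Q_2:\ R(p,q,v) \text{ matches } t_2, t_4:\quad
  \mathrm{ann}(1) = \alpha(t_2.1),\quad
  \mathrm{ann}(5) = \alpha(t_2.3) \cup \alpha(t_4.3)
\end{align*}
```

So the 1 in Q1's answer picks up annotations on the first column of $t_1$ and $t_3$ that Q2 never sees, while the 5 in Q2's answer picks up the annotation on the third column of $t_4$ that Q1 never sees. Both queries are equivalent to Ans(x,v) :- R(x,z,v), R(t,w,z), so neither of them is minimal.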

9 A sufficient condition for annotation containment
Theorem: If Q1 and Q2 are equivalent and Q1 is minimal, then Q1 is annotation-contained in Q2.
Intuition of proof:
- Since Q1 is minimal, no proper subquery of Q1 is equivalent to Q1.
- The minimal query of Q2 is isomorphic to Q1 up to variable renaming; assume that they are identical.
- Any valuation $\nu$ for Q1 can be simulated by the valuation $\nu \circ h$ for Q2, which carries annotations in the same way as $\nu$ does for Q1 (h is the homomorphism from Q2 to its minimal subquery).
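The proof relies on the classical minimization of conjunctive queries: a subgoal can be dropped when the whole query maps homomorphically, with its head fixed, into the query without that subgoal. The sketch below is a brute-force illustration under our own hypothetical encoding (a `(head, body)` pair of variable-name tuples; the names `homomorphisms` and `minimize` are ours). It decides classical equivalence only and does not track annotations.

```python
from itertools import product


def homomorphisms(src, dst):
    """Yield (h, targets) for every homomorphism h from query src to query dst.

    h maps the variables of src to those of dst, sends the head of src to the
    head of dst, and sends subgoal k of src onto subgoal targets[k] of dst.
    """
    (shead, sbody), (dhead, dbody) = src, dst
    if len(shead) != len(dhead):
        return
    for targets in product(range(len(dbody)), repeat=len(sbody)):
        h, ok = {}, True
        for (rel, terms), t in zip(sbody, targets):
            drel, dterms = dbody[t]
            ok = ok and rel == drel and len(terms) == len(dterms)
            for a, b in zip(terms, dterms):
                ok = ok and h.setdefault(a, b) == b
        for a, b in zip(shead, dhead):
            ok = ok and h.setdefault(a, b) == b
        if ok:
            yield h, targets


def minimize(query):
    """Drop subgoals one at a time while the query stays (classically) equivalent."""
    head, body = query
    shrunk = True
    while shrunk:
        shrunk = False
        for k in range(len(body)):
            smaller = (head, body[:k] + body[k + 1:])
            # The smaller query always contains Q; they are equivalent iff
            # Q also maps homomorphically into the smaller query.
            if next(homomorphisms((head, body), smaller), None) is not None:
                body, shrunk = smaller[1], True
                break
    return head, body


# The two queries of slide 8.
q1 = (("x", "v"), [("R", ("x", "y", "u")), ("R", ("x", "z", "v")), ("R", ("t", "w", "z"))])
q2 = (("x", "v"), [("R", ("p", "q", "v")), ("R", ("x", "z", "v")), ("R", ("t", "w", "z"))])
print(minimize(q1) == minimize(q2))  # True: both shrink to R(x,z,v), R(t,w,z)
```

Enumerating every assignment of subgoals makes this exponential in the number of subgoals; it is meant only to make the notions of minimal subquery and homomorphism concrete.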

10 Is the sufficient condition too strong?
- Is it true that if Q1 is equivalent to Q2, then Q1 is annotation-contained in Q2? Answer: No.
- Is it true that if Q1 is contained in Q2 and Q1 is minimal, then Q1 is annotation-contained in Q2? Answer: No.
Q1: Ans(x) :- R(x, y), S(x, y)
Q2: Ans(x) :- R(x, y)
Instance: R = {(1,2), (1,3)} and S = {(1,2)}, where the 1 in R(1,2) is annotated a, the 1 in R(1,3) is annotated b, and the 1 in S(1,2) is annotated c.
Q1 returns Ans($1^{\{a,c\}}$); Q2 returns Ans($1^{\{a,b\}}$).
Both Q1 and Q2 are minimal queries, but neither is annotation-contained in the other.

11 Necessary and Sufficient condition?
If Q1 carries the annotation of the jth column of some S-tuple to the output, there is a way for Q2 to simulate this behavior via a homomorphism h:
Q1: H(… x …) :- … S(… x …) …   (x at the ith column of the head and the jth column of the pth subgoal)
Q2: H(… y …) :- … S(… y …) …   (y at the ith column of the head and the jth column of the qth subgoal)
Here h(y) = x, and h maps the qth subgoal of Q2 to the pth subgoal of Q1.

12 A necessary and sufficient condition for annotation-containment via homomorphisms
Theorem: Q1 is annotation-contained in Q2 iff for every distinguished variable x that occurs at the ith column of the head and the jth column of the pth subgoal in the body of Q1, there exists a homomorphism h from Q2 to Q1 such that
- h maps the body of Q2 into the body of Q1 and the head of Q2 to the head of Q1, and
- a subgoal of Q2 that h maps to the pth subgoal of Q1 (say the qth) has, at its jth column, the same variable as the ith column of the head of Q2.
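The condition can be checked mechanically by enumerating homomorphisms. The sketch below is a brute-force illustration with the same kind of hypothetical `(head, body)` encoding, using 0-based columns and subgoal positions; `homomorphisms` and `annotation_contained` are our names, not an API from the talk. It tries every mapping of subgoals, so its running time is exponential in the number of subgoals.

```python
from itertools import product


def homomorphisms(src, dst):
    """Yield (h, targets): h sends src's head to dst's head and subgoal k of
    src onto subgoal targets[k] of dst."""
    (shead, sbody), (dhead, dbody) = src, dst
    if len(shead) != len(dhead):
        return
    for targets in product(range(len(dbody)), repeat=len(sbody)):
        h, ok = {}, True
        for (rel, terms), t in zip(sbody, targets):
            drel, dterms = dbody[t]
            ok = ok and rel == drel and len(terms) == len(dterms)
            for a, b in zip(terms, dterms):
                ok = ok and h.setdefault(a, b) == b
        for a, b in zip(shead, dhead):
            ok = ok and h.setdefault(a, b) == b
        if ok:
            yield h, targets


def annotation_contained(q1, q2):
    """Check the homomorphism condition of slide 12 for q1 and q2."""
    (head1, body1), (head2, body2) = q1, q2
    homs = [targets for _, targets in homomorphisms(q2, q1)]
    if not homs:
        return False  # q1 is not even contained in q2
    for i, x in enumerate(head1):
        for p, (_, terms) in enumerate(body1):
            for j, var in enumerate(terms):
                if var != x:
                    continue
                # Some h must send a subgoal q of q2 onto subgoal p of q1 such
                # that column j of subgoal q holds q2's i-th head variable.
                if not any(targets[q] == p and body2[q][1][j] == head2[i]
                           for targets in homs for q in range(len(body2))):
                    return False
    return True


# Slide 10: both minimal, neither annotation-contained in the other.
s10_q1 = (("x",), [("R", ("x", "y")), ("S", ("x", "y"))])
s10_q2 = (("x",), [("R", ("x", "y"))])
print(annotation_contained(s10_q1, s10_q2), annotation_contained(s10_q2, s10_q1))  # False False

# Slide 13: x occurs in two subgoals of Q1, and each needs its own homomorphism.
s13_q1 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
s13_q2 = (("x",), [("R", ("x", "y"))])
print(annotation_contained(s13_q1, s13_q2))  # True
```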

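13 Can a single homomorphism do the job?
Q1: Ans(x) :- R(x,y), R(x,z)
Q2: Ans(x) :- R(x,y)
Every homomorphism from Q2 to Q1 maps the body of Q2 to only one subgoal of Q1, so no single homomorphism covers both occurrences of x in the body of Q1; the condition needs a separate homomorphism for each of them.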
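Spelling this out: h must send the head Ans(x) to Ans(x), and R(x,y) can land on either subgoal of Q1, so there are exactly two homomorphisms from Q2 to Q1:

```latex
\begin{align*}
h_1 &= \{x \mapsto x,\ y \mapsto y\}: && R(x,y) \mapsto R(x,y) \quad \text{(first subgoal of } Q_1\text{)} \\
h_2 &= \{x \mapsto x,\ y \mapsto z\}: && R(x,y) \mapsto R(x,z) \quad \text{(second subgoal of } Q_1\text{)}
\end{align*}
```

The distinguished variable x sits in the first column of both subgoals of Q1, so the condition of slide 12 is met for p = 1 by $h_1$ and for p = 2 by $h_2$; in each case the preimage R(x,y) has x in its first column, matching the head of Q2. Hence Q1 is annotation-contained in Q2, although no single homomorphism witnesses it.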

14 Complexity of Annotation-Containment
Proposition: It is NP-complete to decide whether Q1 is annotation-contained in Q2.

15 Propagating annotations back
If we wish to attach an annotation to a piece of data in the output, to which source data should we attach it?
- The user should be given the choice.
- Alert the user to a side-effect-free annotation when there is one.

16 Annotation Placement Problem
Given the source database, the query, and the output data that we wish to annotate, it is DP-hard to decide whether there is a side-effect-free annotation.
- The known upper bound is not DP.
- Conjecture: the problem lies in a class slightly above DP.
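For one fixed database and query, the placement question can be answered by brute force. The sketch assumes (our reading, not a definition from the slides) that an annotation is placed on a single source location `(relation, tuple_index, column)`, and that it is side-effect-free when the only output location `(output_tuple, head_column)` it reaches is the one the user wants to annotate. The function names and the example query are hypothetical. The check enumerates every valuation, so it says nothing about the DP-hardness bound above.

```python
from itertools import product


def propagate(head, body, db, src):
    """Output locations (output_tuple, head_column) that an annotation placed
    only on the source location src = (relation, tuple_index, column) reaches."""
    reached = set()
    for picks in product(*(range(len(db[rel])) for rel, _ in body)):
        val, ok = {}, True
        for (rel, terms), i in zip(body, picks):
            for var, value in zip(terms, db[rel][i]):
                ok = ok and val.setdefault(var, value) == value
        if not ok:
            continue  # not a valuation of the query
        out = tuple(val[v] for v in head)
        for (rel, terms), i in zip(body, picks):
            for j, var in enumerate(terms):
                if (rel, i, j) == src:
                    reached.update((out, k) for k, hv in enumerate(head) if hv == var)
    return reached


def side_effect_free_placements(head, body, db, target):
    """Source locations whose annotation reaches the target and nothing else."""
    return [(rel, i, j)
            for rel, rows in db.items()
            for i, row in enumerate(rows)
            for j in range(len(row))
            if propagate(head, body, db, (rel, i, j)) == {target}]


# Hypothetical example: Ans(x, z) :- R(x, y), S(y, z).
db = {"R": [(1, 2), (4, 2)], "S": [(2, 3)]}
head, body = ("x", "z"), [("R", ("x", "y")), ("S", ("y", "z"))]
print(side_effect_free_placements(head, body, db, ((1, 3), 0)))  # [('R', 0, 0)]
print(side_effect_free_placements(head, body, db, ((1, 3), 1)))  # []
```

For the 3 in Ans(1, 3), the only source location that reaches it is the 3 in S(2, 3), and its annotation also reaches Ans(4, 3); no side-effect-free placement exists, so the user has to choose among placements with side effects.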

17 Related Work
The idea is not new, though the annotations were never explicitly described as provenance-based:
- Wang & Madnick [VLDB 90], Lee, Bressan & Madnick [WIDM 98], Bernstein & Bergstraesser [IEEE Data Eng. 99]
- Annotations of Web documents
- Annotations on genomic sequences

18 Open Issues
- Are there polynomial-time algorithms for deciding annotation-containment for queries of bounded treewidth?
- Is query minimization Church-Rosser?
- What is the exact complexity of the annotation placement problem?
- Annotation and propagation for XML data
- Relationship between annotation-containment and containment of conjunctive queries under bag semantics

19 Open Issues (contd)
- Annotation propagation semantics other than those based on provenance?
- Querying the annotations?

20 Other results that do not carry over
- Query minimization: we can no longer minimize a query and preserve annotation-equivalence by discarding one subgoal at a time.
- Answering queries using views: some classical results no longer hold, e.g. [LMSS95]: if a query Q has p subgoals and a query Q' is a complete minimal rewriting of Q using a set of views V, then Q' has at most p subgoals.

21 Example
Q: A(x) :- R(x,z,v), R(x,u,z), R(x,z',t), R(x,s,z')
Q': A(x) :- R(x,u,z), R(x,z',t), R(x,s,z')
Qmin: A(x) :- R(x,z',t), R(x,s,z')
Instance: R = {(1,2,3), (1,3,2), (1,4,5), (1,4,6)}, with the four tuples annotated a1, a2, a3, a4 respectively.
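The effect of discarding subgoals can be checked on this instance with a small evaluator that applies the propagation rule of slide 7. This is a sketch under two assumptions of ours: R is given as a plain list of rows, and a1 to a4 annotate the first-column value (the 1) of the four tuples, in order. Only that column can reach the output, since x is the only head variable. The helper name `annotations_of_x` is hypothetical.

```python
from itertools import product


def annotations_of_x(body, rows, ann):
    """For A(x) :- body over a single relation R (rows), map each output value
    of x to the union of annotations on every location bound to x."""
    result = {}
    for picks in product(range(len(rows)), repeat=len(body)):
        val, ok = {}, True
        for terms, i in zip(body, picks):
            for var, value in zip(terms, rows[i]):
                ok = ok and val.setdefault(var, value) == value
        if not ok:
            continue  # not a valuation of the query
        carried = result.setdefault(val["x"], set())
        for terms, i in zip(body, picks):
            for j, var in enumerate(terms):
                if var == "x":
                    carried |= ann.get((i, j), set())
    return result


rows = [(1, 2, 3), (1, 3, 2), (1, 4, 5), (1, 4, 6)]
# Assumption: a1..a4 annotate the first column (the 1) of the four tuples.
ann = {(k, 0): {f"a{k + 1}"} for k in range(4)}

Q = [("x", "z", "v"), ("x", "u", "z"), ("x", "z'", "t"), ("x", "s", "z'")]
Q_prime = Q[1:]
Q_min = Q[2:]
for name, body in [("Q", Q), ("Q'", Q_prime), ("Qmin", Q_min)]:
    print(name, sorted(annotations_of_x(body, rows, ann)[1]))
# Q ['a1', 'a2']
# Q' ['a1', 'a2', 'a3', 'a4']
# Qmin ['a1', 'a2']
```

Q' is classically equivalent to Q and differs from it by a single subgoal, yet its answer also carries a3 and a4, because its free subgoal R(x,u,z) matches every tuple. Qmin, obtained by dropping two subgoals at once, keeps exactly the annotations of Q.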

