1 Equivalence of Aggregate Queries in Conjunctive QL
David DeHaan CS 848 February 22, 2003 4/25/2019

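2 Dialects of QL
[Diagram with axes "semantics" and "expressiveness". Dialects shown: Conjunctive QL, Positive QL, First order QL. Variants shown: Conjunctive QL with bag semantics†, and two further boxes labelled "bag semantics", the second marked ‡.]
†[Khizder et al., 1999], ‡[Lui et al., 2002]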

3 Conjunctive QL
Q ::= D as A (quantification)
  | A1 = A2.R (unnest)
  | A1.Pf1 = A2.Pf2 (selection)
  | elim A1, … , An Q (projection)
  | true (null tuple)
  | from Q1, Q2 (natural join)
  | ( Q )
D ::= THING | C (basic description)
Pf ::= id | A.Pf (path function)

4 Conjunctive QL with bag semantics
Q ::= D as A (quantification)
  | A1.Pf1 = A2.Pf2 (selection)
  | select A1, … , An Q (projection)
  | elim Q (duplicate elimination)
  | true (null tuple)
  | from Q1, Q2 (natural join)
  | ( Q )
D ::= THING | C (basic description)
Pf ::= id | A.Pf (path function)

5 Aggregate Conjunctive QL
Q ::= D as A (quantification)
  | A1.Pf1 = A2.Pf2 (selection)
  | select A1, … , An Q (projection)
  | elim Q (duplicate elimination)
  | agg A1, … , An, α(B) Q (aggregate)
  | true (null tuple)
  | from Q1, Q2 (natural join)
  | ( Q )
D ::= THING | C (basic description)
Pf ::= id | A.Pf (path function)

6 “Deciding Equivalences among Aggregate Queries”
W. Nutt, Y. Sagiv, S. Shurin
PODS 1998
Equivalence of conjunctive queries containing a single aggregate operator with comparison predicates

7 Nutt et al.
In other words, SQL queries of the form
  SELECT A1, …, An, α(B)
  FROM R1, …, Rm
  WHERE [Equality Conditions] AND [Binary Comparisons]
  GROUP BY A1, …, An
where α(B) ∈ {count(*), cntd(B), sum(B), max(B), min(B)}
Define the core q^c of q(x, α(y)) as q(x, y)

8 Count(*) Queries
q ≡ q′ ↔ q^c ≡_bs q′^c
Relational (no comparisons): q^c ≡_bs q′^c ↔ q^c, q′^c are isomorphic
Complexity: NP [Chaudhuri, Vardi; PODS 1993]

9 Count(*) Queries
With Comparisons: q^c, q′^c isomorphic → q^c ≡_bs q′^c
e.g. bag-set equivalent but not isomorphic:
  q ← p(x) ∧ p(y) ∧ p(z) ∧ x<y ∧ x<z
  q′ ← p(x) ∧ p(y) ∧ p(z) ∧ x<z ∧ y<z
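
A minimal sketch of the example on this slide, assuming the database is a plain Python set `p` of numbers (the function names are illustrative, not from the paper). It evaluates count(*) for both queries by brute force over a few sample databases. This is a spot check, not a proof of bag-set equivalence.

```python
from itertools import product

def count_q(p):
    """count(*) for the first query: p(x), p(y), p(z), x < y, x < z."""
    return sum(1 for x, y, z in product(p, repeat=3) if x < y and x < z)

def count_q_prime(p):
    """count(*) for the second query: p(x), p(y), p(z), x < z, y < z."""
    return sum(1 for x, y, z in product(p, repeat=3) if x < z and y < z)

# Compare the two counts on a handful of sample databases.
for p in [set(), {1}, {1, 2}, {3, 5, 8}, set(range(7))]:
    assert count_q(p) == count_q_prime(p)
    print(sorted(p), count_q(p))
```

The counts agree because the first query counts, for each x, the pairs of larger elements, while the second counts, for each z, the pairs of smaller elements; over a set of distinct values these sums coincide.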

10 Count(*) Queries
Compatible linearizations
  q^c: {(x<y=z), (x<y<z)}
  q′^c: {(x=y<z), (x<y<z)}
Resulting linear expansions
  q^L: { [q ← p(x) ∧ p(z) ∧ p(z) ∧ x<z], [q ← p(x) ∧ p(y) ∧ p(z) ∧ x<y<z] }
  q′^L: { [q′ ← p(y) ∧ p(y) ∧ p(z) ∧ y<z], [q′ ← p(x) ∧ p(y) ∧ p(z) ∧ x<y<z] }
q^c ≡_bs q′^c ↔ q^L, q′^L isomorphic
Complexity: P-space
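
A rough sketch of how compatible linearizations could be enumerated, assuming a linearization is a weak ordering of the variables (a sequence of equality blocks) that satisfies every strict comparison in the query. The helper names `linearizations` and `show` are hypothetical. For the first query the enumeration yields three orderings, whereas the slide lists two; one reading, which is an assumption here, is that x<y<z and x<z<y are identified because they differ only by renaming y and z.

```python
from itertools import product

def linearizations(variables, less_than):
    """Yield each weak ordering of `variables`, as a tuple of equality
    blocks from smallest to largest, that satisfies every a < b pair."""
    n = len(variables)
    for ranks in product(range(n), repeat=n):
        top = max(ranks)
        # Ranks must be gap-free so each ordering is produced exactly once.
        if set(ranks) != set(range(top + 1)):
            continue
        rank = dict(zip(variables, ranks))
        if all(rank[a] < rank[b] for a, b in less_than):
            yield tuple(
                tuple(v for v in variables if rank[v] == r)
                for r in range(top + 1)
            )

def show(blocks):
    """Render blocks such as (('x',), ('y', 'z')) as 'x < y=z'."""
    return " < ".join("=".join(block) for block in blocks)

for blocks in linearizations(["x", "y", "z"], [("x", "y"), ("x", "z")]):
    print(show(blocks))
```

Brute force over rank assignments is exponential in the number of variables, which is acceptable for a three-variable illustration only.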

11 Count Distinct Queries
Sufficient: q^c ≡_s q′^c → q ≡ q′
e.g. q ≡ q′ but q^c ≢_s q′^c
  q(cntd(y)) ← p(y) ∧ p(z) ∧ y<z
  q′(cntd(y)) ← p(y) ∧ p(z) ∧ y>z
q^c returns all elements except the greatest. q′^c returns all elements except the least.
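
A small sketch of this counterexample, assuming `p` is a Python set of numbers and the cores are evaluated under set semantics (function names are illustrative). The two cores return different sets, yet the sets have the same size, so the count-distinct results agree.

```python
def core_q(p):
    """Core of the first query: every y in p that has some larger z in p."""
    return {y for y in p for z in p if y < z}

def core_q_prime(p):
    """Core of the second query: every y in p that has some smaller z in p."""
    return {y for y in p for z in p if y > z}

p = {2, 4, 9}
print(core_q(p), core_q_prime(p))
# The cores differ as sets but have equal cardinality.
assert core_q(p) != core_q_prime(p)
assert len(core_q(p)) == len(core_q_prime(p))
```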

12 Count Distinct Queries
Necessary: q^c ≡_s q′^c ↔ q ≡ q′ only when
  q, q′ are reduced
  no variable in same position as y occurs in strict comparison (cf. previous example)
  one of:
    q, q′ range over rationals
    q, q′ don’t contain constants
    no variable in same position as y occurs in any comparison

13 Sum Queries
Relational, without Constants: q ≡ q′ ↔ q^c ≡_bs q′^c
  Complexity: NP
With Comparisons, without Constants:
  Complexity: P-space

14 Sum Queries
With Constants: q ≡ q′ if and only if q^c ≡_ws q′^c and q^c, q′^c have variable-isomorphic linear expansions
Complexity: P-space

15 Max/Min Queries
Definition: q dominates q′ if for all databases: whenever q returns tuple (x, y), q′ returns tuple (x, y′) where y ≥ y′ (for Max, ≤ for Min)
q ≡ q′ ↔ q^c dominates q′^c and q′^c dominates q^c

16 Max/Min Queries
Relational: p dominates p′ ↔ p′ ⊆_s p
  Complexity: NP-complete
With Comparisons: p dominates p′ ↔ ∀ linearizations p′^L of p′, p dominates p′^L
  Complexity: Π2^P-complete

17 Summary - Nutt et al.
Consider equivalence of CQL queries only where agg occurs at top level
Necessary and sufficient conditions differ depending upon the aggregate operator
Only consider the most general case, where no schema information is used
In reality, schema information is often present

18 “Exploiting Uniqueness in Query Optimization”
G. Paulley, P. Larson
ICDE 1994
Use schema information to remove DISTINCT operator from conjunctive SQL queries (i.e. elim operator from CQL with bag semantics)

19 Paulley et al.
Q: select distinct W from C1 as A1, …, Cn as An where R
RW(Q): select W from C1 as A1, …, Cn as An where R
R = {constraints over W ∪ {attributes of A1, …, An}}

20 Paulley et al.
Theorem: Q ≡ RW(Q) if and only if
  C1, …, Cn all have candidate keys
  Define K = key(C1) ∘ … ∘ key(Cn)
    K is a candidate key for C1 × … × Cn
  One of:
    K ⊆ W
    Some K′ ⊆ K exists such that:
      K′ ⊆ W
      Unique values for (K – K′) can be inferred from R + Schema (CHECK + KEY constraints)
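
An illustrative sketch of the theorem's effect, using Python's `sqlite3` module with a made-up schema (`Cust`, `Purch`, and the sample rows are assumptions, not taken from the paper). In the first query the select list contains `c.Id` and `p.Item`, and `p.Id` is determined by the join condition, so the whole key is covered and DISTINCT changes nothing. In the second query only `c.Name` is selected, so duplicates appear once DISTINCT is dropped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Cust (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Purch (Id INTEGER, Item TEXT, Price INTEGER,
                        PRIMARY KEY (Id, Item));
    INSERT INTO Cust VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO Purch VALUES (1, 'pen', 2), (1, 'ink', 5), (2, 'pen', 2);
""")

def rows(sql):
    """Run a query and return its rows as a sorted list (a bag)."""
    return sorted(con.execute(sql).fetchall())

join = "FROM Cust c, Purch p WHERE c.Id = p.Id"

# Key covered by the select list: DISTINCT is redundant.
keyed_with = rows(f"SELECT DISTINCT c.Id, p.Item, c.Name {join}")
keyed_without = rows(f"SELECT c.Id, p.Item, c.Name {join}")
assert keyed_with == keyed_without

# Key not covered: removing DISTINCT changes the result.
names_with = rows(f"SELECT DISTINCT c.Name {join}")
names_without = rows(f"SELECT c.Name {join}")
assert names_with != names_without
```

Checking one database instance only illustrates the condition; the theorem itself is a statement over all instances that satisfy the schema.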

21 Paulley et al.
Testing this condition
= satisfaction of arbitrary Boolean expression
= NP-complete

22 “Reasoning about Duplicate Elimination with Description Logic”
V. Khizder, D. Toman, G. Weddell
DOOD 2002
Incrementally remove conjuncts from scope of elim operator in CQL
Map equivalence to DL membership problem instead of Boolean satisfiability

23 Khizder et al.
Q: select V from C1 as A1, …, Cm as Am, (elim select W from Cm+1 as Am+1, …, Cn as An, R)
RW(Q): select V from C1 as A1, …, Cm+1 as Am+1, (elim select W ∪ {Am+1} from Cm+2 as Am+2, …, Cn as An, R)
R = {equality constraints over Pf’s on W ∪ {A1, …, An}}
This Normal Form can always be achieved

24 Khizder et al.
Define:
S = Database Schema
  Expressed in CFD (CLASSIC + Functional Dependencies)
  C(Pf1, …, Pfn → Pf)
SQ = “Query Schema”
  = {CQ ⊑ (A1:C1), …, CQ ⊑ (An:Cn), CQ ⊑ R}

25 Reformulating Paulley et al.
Schema ⊨ Q ≡ RW(Q)
⇕
Schema + R + instance of W ⊨ unique instance of K
S ∪ SQ ⊨ CQ ⊑ CQ(W → K)
S ∪ SQ ⊨ CQ ⊑ CQ(W → A1, …, An)

26 Khizder et al.
Theorem: Q ≡ RW(Q) if and only if
S ∪ SQ ⊨ CQ ⊑ CQ(A ∪ W → Am+1), where A = {A1, …, Am}
Am+1 ∈ W: FD obviously true
Am+1 ∉ W: Am+1 existentially quantified; FD guarantees no duplicates introduced

27 Khizder et al.
CQL: S ⊨ Q ≡ RW(Q)
⇕
CFD: S ∪ SQ ⊨ CQ ⊑ CQ({A1, …, Am} ∪ W → Am+1)
Apply rewrite iteratively from m=0 to m=n-1 iff:
S ∪ SQ ⊨ CQ ⊑ CQ(W → A1, …, An)

28 Complexity
Membership in CLASSIC is P-time
Holds for CFD assuming
  S ∪ SQ only contain regular path FD’s: C(Pf1, …, Pfn → Pf), Pf prefix of some Pfi
  S does not contain equation constraints
SQL CHECK constraints allow disjunction
  Not expressible in CFD
  P-time bound does not apply

29 Usefulness of Incremental Re-write
Move out of elim:
  Increase search space for join
  Shrink size of intermediate result requiring sorting
Move into elim:
  When W ∩ V = ∅ then
    Values of W not important; only existence
    Replace subquery with probe to an index

30 Rewriting with Aggregates
Using aggregate views
“A view will be usable to answer a query only if there is an isomorphism between the view and a subset of the query” [Halevy, 2002]
Note that the above quote is incorrect. It states a condition as necessary that is actually sufficient (but not necessary).
Using schema information can increase the number of usable views.

31 Simple Example
Schema:
  Cust(Id, Name)
  Purch(Id, Item, Price)
  Cust ⊑ Cust(Id → Name)
  View1(Id, Name, Sum(Price)) ← Cust(Id, Name) ∧ Purch(Id, Item, Price)
Query:
  Q(Name, Tot) ← Cust(Id, Name) ∧ Spend(Id, Tot)
  Spend(Id, Sum(P)) ← Purch(Id, P)

32 Rewriting
Rewrite:
  Q(Name, Tot) ← Cust(Id, Name) ∧ Spend(Id, Tot)
  Spend(Id, Sum(P)) ← Purch(Id, P)
  ⇕
  Q′(Name, Tot) ← Spend′(Id, Name, Tot)
  Spend′(Id, N, Sum(P)) ← Cust(Id, N) ∧ Purch(Id, Z, P)
Valid because S ∪ SQ ⊨ Cust ⊑ Cust(Id → Name)
Now Spend′ and View1 are isomorphic (sufficient condition for using View1).
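
A hedged sketch of this rewrite on one sample database, using Python's `sqlite3` module. The sample rows are invented, and the PRIMARY KEY on `Cust.Id` is an assumption standing in for the functional dependency from Id to Name. The original query joins `Cust` with a per-Id sum; the rewritten form reads the same answer from `View1`, which groups by Id and Name together.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- PRIMARY KEY on Id stands in for the FD Id -> Name (assumption).
    CREATE TABLE Cust (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Purch (Id INTEGER, Item TEXT, Price INTEGER);
    INSERT INTO Cust VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO Purch VALUES (1, 'pen', 2), (1, 'ink', 5),
                             (2, 'pen', 2), (3, 'cap', 7);
    CREATE VIEW View1 AS
        SELECT c.Id, c.Name, SUM(p.Price) AS Tot
        FROM Cust c JOIN Purch p ON c.Id = p.Id
        GROUP BY c.Id, c.Name;
""")

# Original query: aggregate Purch per Id, then join with Cust.
original = sorted(con.execute("""
    SELECT c.Name, s.Tot
    FROM Cust c
    JOIN (SELECT Id, SUM(Price) AS Tot FROM Purch GROUP BY Id) s
      ON c.Id = s.Id
""").fetchall())

# Rewritten query: project the aggregate view directly.
rewritten = sorted(con.execute("SELECT Name, Tot FROM View1").fetchall())

assert original == rewritten
print(original)
```

Without the dependency from Id to Name, grouping by (Id, Name) could split one customer's purchases across several groups, and the two results could differ.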

