Equivalence of Aggregate Queries in Conjunctive QL
David DeHaan
CS 848
February 22, 2003
4/25/2019
Dialects of QL
[Figure: QL dialects arranged by semantics and expressiveness: Conjunctive QL, Positive QL and First order QL, with bag-semantics counterparts including Conjunctive QL with bag semantics† and a bag-semantics dialect marked ‡]
†[Khizder et al., 1999], ‡[Lui et al., 2002]
Conjunctive QL
Q  ::= D as A                (quantification)
     | A1 = A2.R             (unnest)
     | A1.Pf1 = A2.Pf2       (selection)
     | elim A1, …, An Q      (projection)
     | true                  (null tuple)
     | from Q1, Q2           (natural join)
     | ( Q )
D  ::= THING | C             (basic description)
Pf ::= id | A.Pf             (path function)
Conjunctive QL with bag semantics
Q  ::= D as A                (quantification)
     | A1.Pf1 = A2.Pf2       (selection)
     | select A1, …, An Q    (projection)
     | elim Q                (duplicate elimination)
     | true                  (null tuple)
     | from Q1, Q2           (natural join)
     | ( Q )
D  ::= THING | C             (basic description)
Pf ::= id | A.Pf             (path function)
Aggregate Conjunctive QL
Q  ::= D as A                     (quantification)
     | A1.Pf1 = A2.Pf2            (selection)
     | select A1, …, An Q         (projection)
     | elim Q                     (duplicate elimination)
     | agg A1, …, An, α(B) Q      (aggregate)
     | true                       (null tuple)
     | from Q1, Q2                (natural join)
     | ( Q )
D  ::= THING | C                  (basic description)
Pf ::= id | A.Pf                  (path function)
“Deciding Equivalences among Aggregate Queries”
W. Nutt, Y. Sagiv, S. Shurin
PODS 1998
Equivalence of conjunctive queries containing a single aggregate operator with comparison predicates
Nutt et al.
In other words, SQL queries of the form
  SELECT A1, …, An, α(B)
  FROM R1, …, Rm
  WHERE [Equality Conditions] AND [Binary Comparisons]
  GROUP BY A1, …, An
where $\alpha(B) \in \{\mathrm{count}(*), \mathrm{cntd}(B), \mathrm{sum}(B), \mathrm{max}(B), \mathrm{min}(B)\}$
Define the core of $q(x, \alpha(y))$ as $q^c(x, y)$
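To make the notion of a core concrete, here is a minimal Python sketch that evaluates $q(x, \alpha(y))$ from its core. It assumes the core's answer is given as a bag (a list) of (x, y) pairs. The helper `aggregate` and the sample data are hypothetical. Each α is computed per group of x, as GROUP BY would do.

```python
from collections import defaultdict

# Hypothetical helper: evaluate q(x, alpha(y)) from the bag of (x, y) pairs
# produced by its core q^c(x, y), grouping on x as GROUP BY would.
AGGREGATES = {
    "count": len,                       # count(*): every occurrence counts
    "cntd": lambda ys: len(set(ys)),    # count(distinct y)
    "sum": sum,
    "max": max,
    "min": min,
}

def aggregate(core_rows, alpha):
    groups = defaultdict(list)
    for x, y in core_rows:              # duplicates are kept (bag semantics)
        groups[x].append(y)
    f = AGGREGATES[alpha]
    return {x: f(ys) for x, ys in groups.items()}

core = [("a", 1), ("a", 1), ("a", 3), ("b", 2)]
for alpha in AGGREGATES:
    print(alpha, aggregate(core, alpha))
```

Dropping the duplicate ("a", 1) changes count and sum for group "a" but leaves cntd, max and min unchanged. This offers one intuition for why the following slides compare cores under different notions of equivalence for different operators.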
Count(*) Queries
$q \equiv q' \iff q^c \equiv_{bs} q'^c$ (where $\equiv_{bs}$ denotes bag-set equivalence)
Relational (no comparisons): $q^c \equiv_{bs} q'^c \iff q^c, q'^c$ are isomorphic
Complexity: NP [Chaudhuri, Vardi; PODS 1993]
Count(*) Queries
With comparisons: $q^c, q'^c$ isomorphic $\Rightarrow q^c \equiv_{bs} q'^c$ (the converse fails)
e.g. bag-set equivalent but not isomorphic:
  $q \leftarrow p(x) \land p(y) \land p(z) \land x<y \land x<z$
  $q' \leftarrow p(x) \land p(y) \land p(z) \land x<z \land y<z$
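The example can be checked by brute force. This is a minimal Python sketch that assumes p is a finite set of integers and that count(*) counts the satisfying assignments to (x, y, z), which corresponds to bag-set semantics. The function names are hypothetical. The check compares both counts on every subset of a small domain, which is evidence for the equivalence, not a proof.

```python
from itertools import product

def count_q(p):
    # q <- p(x) & p(y) & p(z) & x < y & x < z
    return sum(1 for x, y, z in product(p, repeat=3) if x < y and x < z)

def count_q_prime(p):
    # q' <- p(x) & p(y) & p(z) & x < z & y < z
    return sum(1 for x, y, z in product(p, repeat=3) if x < z and y < z)

# Every subset of {0, ..., 5}: bit i of mask decides whether i is in p.
domain = range(6)
for mask in range(1 << len(domain)):
    p = {v for i, v in enumerate(domain) if mask >> i & 1}
    assert count_q(p) == count_q_prime(p), p
print("counts agree on all", 1 << len(domain), "subsets")
```

Intuitively, q counts triples whose x lies below both y and z, while q′ counts triples whose z lies above both x and y. Reversing the order maps one pattern onto the other. For the set {1, 2, 3}, both counts are 5.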
Count(*) Queries
Compatible linearizations:
  $q^c$: $\{(x<y=z), (x<y<z)\}$
  $q'^c$: $\{(x=y<z), (x<y<z)\}$
Resulting linear expansions:
  $q^L$: $\{[q \leftarrow p(x) \land p(z) \land p(z) \land x<z],\ [q \leftarrow p(x) \land p(y) \land p(z) \land x<y<z]\}$
  $q'^L$: $\{[q' \leftarrow p(y) \land p(y) \land p(z) \land y<z],\ [q' \leftarrow p(x) \land p(y) \land p(z) \land x<y<z]\}$
$q^c \equiv_{bs} q'^c \iff q^L, q'^L$ are isomorphic
Complexity: PSPACE
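A linearization of a conjunction of comparisons is a complete ordering of its variables, with ties allowed, that satisfies every comparison. The Python sketch below uses hypothetical helpers and assumes only strict comparisons of the form u < v. It enumerates all ordered set partitions of the variables and keeps the compatible ones.

```python
def ordered_partitions(items):
    """Yield every ordered set partition (weak order) of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in ordered_partitions(rest):
        for i in range(len(part)):            # `first` ties with block i
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        for i in range(len(part) + 1):        # `first` forms a new block at position i
            yield part[:i] + [{first}] + part[i:]

def linearizations(variables, less_than):
    """Weak orders of `variables` satisfying every strict comparison (u, v), i.e. u < v."""
    for part in ordered_partitions(list(variables)):
        rank = {v: i for i, block in enumerate(part) for v in block}
        if all(rank[u] < rank[v] for u, v in less_than):
            yield part

def show(part):
    # Render blocks left to right: '<' between blocks, '=' inside a block.
    return "<".join("=".join(sorted(block)) for block in part)

q_lin = sorted(show(p) for p in linearizations("xyz", [("x", "y"), ("x", "z")]))
q_prime_lin = sorted(show(p) for p in linearizations("xyz", [("x", "z"), ("y", "z")]))
print(q_lin)        # ['x<y<z', 'x<y=z', 'x<z<y']
print(q_prime_lin)  # ['x<y<z', 'x=y<z', 'y<x<z']
```

Besides the two linearizations listed on the slide, each query has one more: x<z<y for q and y<x<z for q′. Each differs from a listed one only by swapping two variables that play symmetric roles, so the slide appears to list one representative of each such pair. Substituting equal variables into the core then yields the linear expansions.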
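Count Distinct Queries
Sufficient: $q^c \equiv_s q'^c \Rightarrow q \equiv q'$
Not necessary, e.g. $q \equiv q'$ but $q^c \not\equiv_s q'^c$:
  $q(\mathrm{cntd}(y)) \leftarrow p(y) \land p(z) \land y<z$
  $q'(\mathrm{cntd}(y)) \leftarrow p(y) \land p(z) \land y>z$
$q^c$ returns all elements except the greatest; $q'^c$ returns all elements except the least.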
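This is a quick check of the example in Python. The sketch assumes p is a finite set of integers and computes cntd as the size of the set of y values that the core returns. The function names are hypothetical.

```python
from itertools import product

def core_q(p):
    # q^c(y) <- p(y) & p(z) & y < z : all elements except the greatest
    return {y for y, z in product(p, repeat=2) if y < z}

def core_q_prime(p):
    # q'^c(y) <- p(y) & p(z) & y > z : all elements except the least
    return {y for y, z in product(p, repeat=2) if y > z}

p = {2, 5, 7, 11}
print(sorted(core_q(p)), sorted(core_q_prime(p)))  # [2, 5, 7] [5, 7, 11]
print(len(core_q(p)), len(core_q_prime(p)))        # 3 3

# On every subset of {0, ..., 5} the cntd values agree, although the cores differ.
for mask in range(1 << 6):
    s = {v for v in range(6) if mask >> v & 1}
    assert len(core_q(s)) == len(core_q_prime(s))
```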
Count Distinct Queries
Necessary: $q^c \equiv_s q'^c \iff q \equiv q'$ only when:
  $q, q'$ are reduced
  no variable in the same position as y occurs in a strict comparison (cf. previous example)
  one of:
    $q, q'$ range over the rationals
    $q, q'$ don't contain constants
    no variable in the same position as y occurs in any comparison
Sum Queries
Relational, without constants: $q \equiv q' \iff q^c \equiv_{bs} q'^c$
  Complexity: NP
With comparisons, without constants:
  Complexity: PSPACE
Sum Queries
With constants: $q \equiv q'$ if and only if
  $q^c \equiv_{ws} q'^c$, and
  $q^c, q'^c$ have variable-isomorphic linear expansions
Complexity: PSPACE
Max/Min Queries
Definition: $q$ dominates $q'$ if, for all databases, whenever $q'$ returns a tuple $(x, y')$, $q$ returns a tuple $(x, y)$ where $y \ge y'$ (for Max; $y \le y'$ for Min)
$q \equiv q' \iff q^c$ dominates $q'^c$ and $q'^c$ dominates $q^c$
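Domination is defined over all databases, but it can be tested on a single instance. This minimal Python sketch assumes core answers are lists of (x, y) pairs. The helpers `group_best` and `dominates` are hypothetical. `dominates(a, b, op)` checks that every tuple (x, y′) of b is matched by a tuple (x, y) of a with y ≥ y′ for op=max (or y ≤ y′ for op=min). When this holds in both directions on an instance, the per-group maxima (or minima) coincide on that instance.

```python
def group_best(rows, op=max):
    """Per-group maximum (op=max) or minimum (op=min) of the y values."""
    best = {}
    for x, y in rows:
        best[x] = y if x not in best else op(best[x], y)
    return best

def dominates(a_rows, b_rows, op=max):
    """Every (x, y2) in b_rows has some (x, y) in a_rows with y >= y2 (max) or y <= y2 (min)."""
    best = group_best(a_rows, op)
    # op(best[x], y2) == best[x] holds exactly when best[x] >= y2 (max) or <= y2 (min).
    return all(x in best and op(best[x], y2) == best[x] for x, y2 in b_rows)

core1 = [("a", 1), ("a", 5), ("b", 2)]
core2 = [("a", 5), ("a", 3), ("b", 2), ("b", 1)]
print(dominates(core1, core2), dominates(core2, core1))  # True True
print(group_best(core1) == group_best(core2))            # True

core3 = [("a", 4)]
print(dominates(core1, core3), dominates(core3, core1))  # True False
```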
Max/Min Queries
Relational: $p$ dominates $p' \iff p' \subseteq_s p$
  Complexity: NP-complete
With comparisons: $p$ dominates $p' \iff \forall$ linearizations $p'^L$ of $p'$, $p$ dominates $p'^L$
  Complexity: $\Pi_2^p$-complete
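In the relational case, domination reduces to set containment of the cores. Containment of conjunctive queries without comparisons can be decided with the classical containment-mapping (homomorphism) test of Chandra and Merlin. The Python sketch below uses that standard technique, which may differ from the procedure in the paper. As an assumption, queries are encoded as a head tuple plus a list of (predicate, arguments) atoms, using variables only. The search is exhaustive and therefore exponential, in line with the NP-completeness bound.

```python
from itertools import product

def variables(query):
    head, body = query
    vs = set(head)
    for _pred, args in body:
        vs.update(args)
    return sorted(vs)

def contained_in(q1, q2):
    """True iff q1 is contained in q2 (set semantics, no comparisons, no constants):
    some mapping of q2's variables onto q1's variables sends q2's head to q1's head
    and every atom of q2 to an atom of q1."""
    head1, body1 = q1
    head2, body2 = q2
    atoms1 = {(pred, tuple(args)) for pred, args in body1}
    src = variables(q2)
    for image in product(variables(q1), repeat=len(src)):
        h = dict(zip(src, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        if all((pred, tuple(h[a] for a in args)) in atoms1 for pred, args in body2):
            return True
    return False

def dominates(p, p_prime):
    # Relational case from the slide: p dominates p' iff p' is contained in p.
    return contained_in(p_prime, p)

core1 = (("x", "y"), [("r", ("x", "y")), ("s", ("y",))])
core2 = (("x", "y"), [("r", ("x", "y")), ("s", ("y",)), ("r", ("x", "w"))])
core3 = (("x", "y"), [("r", ("x", "y"))])

print(dominates(core1, core2), dominates(core2, core1))  # True True
print(dominates(core3, core1), dominates(core1, core3))  # True False
```

core1 and core2 dominate each other, so by the previous slide, max (or min) queries built on them are equivalent. core3 dominates core1 but not conversely, because dropping the atom s(y) can only add answers.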
Summary - Nutt et al.
Considers equivalence of CQL queries only where agg occurs at the top level
Necessary and sufficient conditions differ depending on the aggregate operator
Considers only the most general case, where no schema information is used
In reality, schema information is often present
“Exploiting Uniqueness in Query Optimization”
G. Paulley, P. Larson
ICDE 1994
Use schema information to remove the DISTINCT operator from conjunctive SQL queries (i.e. the elim operator from CQL with bag semantics)
Paulley et al.
Q: select distinct W from C1 as A1, …, Cn as An where R
RW(Q): select W from C1 as A1, …, Cn as An where R
R = {constraints over W ∪ {attributes of A1, …, An}}
Paulley et al.
Theorem: $Q \equiv RW(Q)$ if and only if
  $C_1, \ldots, C_n$ all have candidate keys
  Define $K = key(C_1) \circ \cdots \circ key(C_n)$; $K$ is a candidate key for $C_1 \times \cdots \times C_n$
  One of:
    $K \subseteq W$
    some $K' \subseteq K$ exists such that $K' \subseteq W$ and unique values for $(K - K')$ can be inferred from R + Schema (CHECK + KEY constraints)
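The theorem can be seen on a small SQLite instance. This minimal Python sketch uses the standard `sqlite3` module. The tables Cust and Purch and their keys (Cust.Id, and (Id, Item) for Purch) are assumptions chosen for illustration. With W = {C.Id, P.Item}, K′ = {C.Id, P.Item} lies in W, and the remaining key attribute P.Id is determined by the join predicate C.Id = P.Id, so DISTINCT is redundant. With W = {C.Name}, no key is covered and DISTINCT changes the answer. A single instance only illustrates the condition; the theorem is what guarantees equivalence on every instance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Cust  (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Purch (Id INTEGER, Item TEXT, Price INTEGER, PRIMARY KEY (Id, Item));
    INSERT INTO Cust  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');
    INSERT INTO Purch VALUES (1, 'pen', 3), (1, 'ink', 5), (2, 'pen', 3);
""")

def rows(sql):
    # Sort so that results are compared as bags, independent of row order.
    return sorted(con.execute(sql).fetchall())

FROM_WHERE = " FROM Cust AS C, Purch AS P WHERE C.Id = P.Id"

# W = {C.Id, P.Item}: covers the combined key once P.Id = C.Id is used.
keyed = rows("SELECT DISTINCT C.Id, P.Item" + FROM_WHERE) == rows("SELECT C.Id, P.Item" + FROM_WHERE)

# W = {C.Name}: no key is covered, so duplicates survive without DISTINCT.
unkeyed = rows("SELECT DISTINCT C.Name" + FROM_WHERE) == rows("SELECT C.Name" + FROM_WHERE)

print(keyed, unkeyed)  # True False
```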
Paulley et al.
Testing this condition amounts to satisfiability of an arbitrary Boolean expression, which is NP-complete
“Reasoning about Duplicate Elimination with Description Logic”
V. Khizder, D. Toman, G. Weddell
DOOD 2002
Incrementally remove conjuncts from the scope of the elim operator in CQL
Map equivalence to a DL membership problem instead of Boolean satisfiability
Khizder et al.
Q: select V from C1 as A1, …, Cm as Am, (elim select W from Cm+1 as Am+1, …, Cn as An, R)
RW(Q): select V from C1 as A1, …, Cm+1 as Am+1, (elim select W ∪ {Am+1} from Cm+2 as Am+2, …, Cn as An, R)
R = {equality constraints over Pf’s on W ∪ {A1, …, An}}
This normal form can always be achieved
Khizder et al.
Define:
  S = database schema
  $S_Q$ = “query schema” $= \{C_Q \sqsubseteq (A_1 : C_1), \ldots, C_Q \sqsubseteq (A_n : C_n), C_Q \sqsubseteq R\}$
  Expressed in CFD (CLASSIC + functional dependencies), with FDs of the form $C(Pf_1, \ldots, Pf_n \rightarrow Pf)$
Reformulating Paulley et al.
Schema $\models Q \equiv RW(Q)$
$\Updownarrow$
Schema + R + instance of W $\models$ unique instance of K
$S \cup S_Q \models C_Q \sqsubseteq C_Q(W \rightarrow K)$
$S \cup S_Q \models C_Q \sqsubseteq C_Q(W \rightarrow A_1, \ldots, A_n)$
Khizder et al.
Theorem: $Q \equiv RW(Q)$ if and only if $S \cup S_Q \models C_Q \sqsubseteq C_Q(A \cup W \rightarrow A_{m+1})$, where $A = \{A_1, \ldots, A_m\}$
  $A_{m+1} \in W$: the FD is obviously true
  $A_{m+1} \notin W$: $A_{m+1}$ is existentially quantified; the FD guarantees that no duplicates are introduced
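Khizder et al.
CQL: $S \models Q \equiv RW(Q)$
$\Updownarrow$
CFD: $S \cup S_Q \models C_Q \sqsubseteq C_Q(\{A_1, \ldots, A_m\} \cup W \rightarrow A_{m+1})$
The rewrite can be applied iteratively from $m = 0$ to $m = n-1$ iff $S \cup S_Q \models C_Q \sqsubseteq C_Q(W \rightarrow A_1, \ldots, A_n)$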
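The iterative criterion mirrors ordinary functional-dependency reasoning. The Python sketch below is a simplified relational analogue: it assumes plain attribute FDs instead of CFD path FDs, and attribute closure instead of CLASSIC subsumption. It checks each step {A1, …, Am} ∪ W → Am+1 and the combined FD W → A1, …, An. The attribute names and FDs are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    return set(rhs) <= closure(lhs, fds)

def steps(fds, W, A):
    # Step m checks {A1, ..., Am} union W -> A(m+1), for m = 0 .. n-1.
    return [implies(fds, set(A[:m]) | W, {A[m]}) for m in range(len(A))]

A = ["A1", "A2", "A3"]
W = {"w"}
fds = [({"w"}, {"A1"}), ({"w", "A1"}, {"A2"}), ({"A2"}, {"A3"})]
print(steps(fds, W, A), implies(fds, W, A))        # [True, True, True] True

weaker = fds[:2]                                   # drop A2 -> A3
print(steps(weaker, W, A), implies(weaker, W, A))  # [True, True, False] False
```

All steps succeed exactly when the combined FD holds. Each step follows from W → A1, …, An by augmentation, and chaining the steps yields the combined FD by transitivity.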
Complexity
Membership in CLASSIC is P-time
Holds for CFD assuming:
  $S \cup S_Q$ only contain regular path FDs $C(Pf_1, \ldots, Pf_n \rightarrow Pf)$, where Pf is a prefix of some $Pf_i$
  S does not contain equation constraints
SQL CHECK constraints allow disjunction
  Not expressible in CFD
  P-time bound does not apply
Usefulness of Incremental Rewrite
Move out of elim:
  Increase the search space for join
  Shrink the size of the intermediate result requiring sorting
Move into elim:
  When $W \cap V = \emptyset$, the values of W are not important; only existence matters
  Replace the subquery with a probe to an index
Rewriting with Aggregates
Using aggregate views:
“A view will be usable to answer a query only if there is an isomorphism between the view and a subset of the query” [Halevy, 2002]
Note that this quote is incorrect: it states as necessary a condition that is actually sufficient (but not necessary).
Using schema information can increase the number of usable views.
Simple Example
Schema:
  Cust(Id, Name)
  Purch(Id, Item, Price)
  $Cust \sqsubseteq Cust(Id \rightarrow Name)$
View:
  $View1(Id, Name, Sum(Price)) \leftarrow Cust(Id, Name) \land Purch(Id, Item, Price)$
Query:
  $Q(Name, Tot) \leftarrow Cust(Id, Name) \land Spend(Id, Tot)$
  $Spend(Id, Sum(P)) \leftarrow Purch(Id, Z, P)$
Rewriting
$Q(Name, Tot) \leftarrow Cust(Id, Name) \land Spend(Id, Tot)$
$Spend(Id, Sum(P)) \leftarrow Purch(Id, Z, P)$
$\Updownarrow$
$Q'(Name, Tot) \leftarrow Spend'(Id, Name, Tot)$
$Spend'(Id, N, Sum(P)) \leftarrow Cust(Id, N) \land Purch(Id, Z, P)$
Rewrite valid because $S \cup S_Q \models Cust \sqsubseteq Cust(Id \rightarrow Name)$
Now $Spend'$ and View1 are isomorphic (a sufficient condition for using View1).
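The rewrite can be checked on a small instance. This minimal Python sketch assumes bag semantics, represents relations as Python lists of tuples, and uses an instance in which Id is a key of Cust (so Id → Name holds). The data and function names are hypothetical. `view1` computes View1, which has the same shape as Spend′. `q` evaluates the original query, and `q_prime` answers the rewritten query from View1 alone.

```python
from collections import defaultdict

cust = [(1, "Ann"), (2, "Bob"), (3, "Ann")]                           # Cust(Id, Name)
purch = [(1, "pen", 3), (1, "ink", 5), (2, "pen", 4), (3, "cap", 2)]  # Purch(Id, Item, Price)

def q(cust, purch):
    # Spend(Id, Sum(P)) <- Purch(Id, Z, P)
    spend = defaultdict(int)
    for cid, _item, price in purch:
        spend[cid] += price
    # Q(Name, Tot) <- Cust(Id, Name) & Spend(Id, Tot)
    return sorted((name, spend[cid]) for cid, name in cust if cid in spend)

def view1(cust, purch):
    # View1(Id, Name, Sum(Price)) <- Cust(Id, Name) & Purch(Id, Item, Price)
    totals = defaultdict(int)
    for cid, name in cust:
        for pid, _item, price in purch:
            if pid == cid:
                totals[(cid, name)] += price
    return [(cid, name, tot) for (cid, name), tot in totals.items()]

def q_prime(view_rows):
    # Q'(Name, Tot) <- Spend'(Id, Name, Tot), with Spend' read off View1
    return sorted((name, tot) for _cid, name, tot in view_rows)

print(q(cust, purch))                                 # [('Ann', 2), ('Ann', 8), ('Bob', 4)]
print(q(cust, purch) == q_prime(view1(cust, purch)))  # True
```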