Equivalence of Aggregate Queries in Conjunctive QL


Equivalence of Aggregate Queries in Conjunctive QL David DeHaan CS 848 February 22, 2003 4/25/2019

Dialects of QL
[Diagram: QL dialects placed along two axes, semantics and expressiveness. Labels: Conjunctive QL with bag semantics†, Positive QL, First order QL, bag semantics, bag semantics‡]
† [Khizder et al., 1999], ‡ [Lui et al., 2002]

Conjunctive QL
  Q ::= D as A                  (quantification)
      | A1 = A2.R               (unnest)
      | A1.Pf1 = A2.Pf2         (selection)
      | elim A1, …, An Q        (projection)
      | true                    (null tuple)
      | from Q1, Q2             (natural join)
      | ( Q )
  D ::= THING | C               (basic description)
  Pf ::= id | A.Pf              (path function)

Conjunctive QL with bag semantics
  Q ::= D as A                  (quantification)
      | A1.Pf1 = A2.Pf2         (selection)
      | select A1, …, An Q      (projection)
      | elim Q                  (duplicate elimination)
      | true                    (null tuple)
      | from Q1, Q2             (natural join)
      | ( Q )
  D ::= THING | C               (basic description)
  Pf ::= id | A.Pf              (path function)

Aggregate Conjunctive QL
  Q ::= D as A                  (quantification)
      | A1.Pf1 = A2.Pf2         (selection)
      | select A1, …, An Q      (projection)
      | elim Q                  (duplicate elimination)
      | agg A1, …, An, α(B) Q   (aggregate)
      | true                    (null tuple)
      | from Q1, Q2             (natural join)
      | ( Q )
  D ::= THING | C               (basic description)
  Pf ::= id | A.Pf              (path function)

“Deciding Equivalences among Aggregate Queries”
W. Nutt, Y. Sagiv, S. Shurin
PODS 1998
Equivalence of conjunctive queries containing a single aggregate operator with comparison predicates

Nutt et al.
In other words, SQL queries of the form
  SELECT A1, …, An, α(B)
  FROM R1, …, Rm
  WHERE [Equality Conditions] AND [Binary Comparisons]
  GROUP BY A1, …, An
where α(B) ∈ {count(*), cntd(B), sum(B), max(B), min(B)}
Define the core of q(x, α(y)) as q^c(x, y), the same query with the aggregate term replaced by the plain variable y

Count(*) Queries
  q ≡ q′ ⇔ q^c ≡_bs q′^c
  Relational (no comparisons): q^c ≡_bs q′^c ⇔ q^c, q′^c are isomorphic
  Complexity: NP
  [Chaudhuri, Vardi; PODS 1993]

Count(*) Queries
  With Comparisons: q^c, q′^c isomorphic ⇒ q^c ≡_bs q′^c
  e.g. bag-set equivalent but not isomorphic:
    q ← p(x) ∧ p(y) ∧ p(z) ∧ x<y ∧ x<z
    q′ ← p(x) ∧ p(y) ∧ p(z) ∧ x<z ∧ y<z
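The example on this slide can be spot-checked by brute force. The sketch below is only an illustration: the function names `count_q` and `count_q_prime` are made up here, `p` is modelled as a Python set (the stored relation is a set, while answers are counted with multiplicity), and agreement on a few instances is evidence, not a proof of bag-set equivalence.

```python
from itertools import product

def count_q(p):
    # q  <- p(x), p(y), p(z), x<y, x<z : number of satisfying assignments
    return sum(1 for x, y, z in product(p, repeat=3) if x < y and x < z)

def count_q_prime(p):
    # q' <- p(x), p(y), p(z), x<z, y<z : number of satisfying assignments
    return sum(1 for x, y, z in product(p, repeat=3) if x < z and y < z)

# A handful of hand-picked instances of p (hypothetical test data).
samples = [set(), {1}, {1, 2}, {3, 5, 8}, {0, 2, 4, 6, 9}]
for p in samples:
    assert count_q(p) == count_q_prime(p)
```

Intuitively, q sums (number of larger elements)² over every choice of x, and the second query sums (number of smaller elements)² over every choice of z; on a finite set of distinct values both sums range over the same squares.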

Count(*) Queries
  Compatible linearizations
    q^c: {(x<y=z), (x<y<z)}
    q′^c: {(x=y<z), (x<y<z)}
  Resulting linear expansions
    q^L: { [q ← p(x) ∧ p(z) ∧ p(z) ∧ x<z], [q ← p(x) ∧ p(y) ∧ p(z) ∧ x<y<z] }
    q′^L: { [q′ ← p(y) ∧ p(y) ∧ p(z) ∧ y<z], [q′ ← p(x) ∧ p(y) ∧ p(z) ∧ x<y<z] }
  q^c ≡_bs q′^c ⇔ q^L, q′^L isomorphic
  Complexity: PSPACE
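A minimal sketch of how linearizations might be enumerated, assuming a linearization is a weak order on the variables (a ranking that allows ties) that satisfies every strict comparison of the query. The helpers `linearizations` and `show` are hypothetical names, and the exhaustive search is only practical for a handful of variables.

```python
from itertools import product

def linearizations(variables, less_than):
    """Yield rank dicts for weak orders of `variables` that satisfy
    every strict comparison (a, b), read as a < b, in `less_than`."""
    n = len(variables)
    for ranks in product(range(n), repeat=n):
        # Canonical form: ranks must be exactly 0..k-1 with no gaps.
        used = sorted(set(ranks))
        if used != list(range(len(used))):
            continue
        rank = dict(zip(variables, ranks))
        if all(rank[a] < rank[b] for a, b in less_than):
            yield rank

def show(rank):
    """Render a rank dict as a chain such as 'x<y=z'."""
    levels = {}
    for var, r in rank.items():
        levels.setdefault(r, []).append(var)
    return "<".join("=".join(sorted(levels[r])) for r in sorted(levels))

q_lin = sorted(show(r) for r in linearizations(["x", "y", "z"], [("x", "y"), ("x", "z")]))
q_prime_lin = sorted(show(r) for r in linearizations(["x", "y", "z"], [("x", "z"), ("y", "z")]))
print(q_lin)
print(q_prime_lin)
```

This enumeration yields three chains per query, one more than the slide lists. The extra chain in each case is the mirror image obtained by swapping the two interchangeable variables (y and z in the first query, x and y in the second), so the slide presumably lists linearizations up to that symmetry.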

Count Distinct Queries
  Sufficient: q^c ≡_s q′^c ⇒ q ≡ q′
  e.g. q ≡ q′ but q^c ≢_s q′^c:
    q(cntd(y)) ← p(y) ∧ p(z) ∧ y<z
    q′(cntd(y)) ← p(y) ∧ p(z) ∧ y>z
  q^c returns all elements except the greatest.
  q′^c returns all elements except the least.
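The counterexample on this slide is easy to replay on a concrete instance. In this sketch the cores are evaluated under set semantics with set comprehensions; `core_q`, `core_q_prime`, and the sample relation `p` are illustrative names and data, not part of the original deck.

```python
def core_q(p):
    # Core of q(cntd(y)) <- p(y), p(z), y<z : every y that has a larger element
    return {y for y in p for z in p if y < z}

def core_q_prime(p):
    # Core of q'(cntd(y)) <- p(y), p(z), y>z : every y that has a smaller element
    return {y for y in p for z in p if y > z}

p = {2, 3, 5, 7}                      # hypothetical instance
print(core_q(p), core_q_prime(p))     # the two cores differ as sets
print(len(core_q(p)), len(core_q_prime(p)))  # but cntd(y) agrees
```

On any instance with at least two distinct values, one core drops the greatest element and the other drops the least, so the sets differ while their sizes coincide.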

Count Distinct Queries
  Necessary: q^c ≡_s q′^c ⇔ q ≡ q′ only when
    q, q′ are reduced
    no variable in the same position as y occurs in a strict comparison (cf. previous example)
    one of:
      q, q′ range over rationals
      q, q′ don’t contain constants
      no variable in the same position as y occurs in any comparison

Sum Queries
  Relational, without Constants: q ≡ q′ ⇔ q^c ≡_bs q′^c
    Complexity: NP
  With Comparisons, without Constants:
    Complexity: PSPACE

Sum Queries
  With Constants: q ≡ q′ if and only if
    q^c ≡_ws q′^c and
    q^c, q′^c have variable-isomorphic linear expansions
  Complexity: PSPACE

Max/Min Queries
  Definition: q dominates q′ if for all databases:
    whenever q returns tuple (x, y), q′ returns tuple (x, y′) where y ≥ y′ (for Max; ≤ for Min)
  q ≡ q′ ⇔ q^c dominates q′^c and q′^c dominates q^c

Max/Min Queries
  Relational: p dominates p′ ⇔ p′ ⊆_s p
    Complexity: NP-complete
  With Comparisons: p dominates p′ ⇔ ∀ linearizations p′^L of p′, p dominates p′^L
    Complexity: Π₂^P-complete

Summary - Nutt et al.
  Consider equivalence of CQL queries only where agg occurs at top level
  Necessary and sufficient conditions differ depending on the aggregate operator
  Only consider the most general case, where no schema information is used
  In reality, schema information is often present

“Exploiting Uniqueness in Query Optimization”
G. Paulley, P. Larson
ICDE 1994
Use schema information to remove the DISTINCT operator from conjunctive SQL queries (i.e. the elim operator from CQL with bag semantics)

Paulley et al.
  Q:     select distinct W from C1 as A1, …, Cn as An where R
  RW(Q): select W from C1 as A1, …, Cn as An where R
  R = {constraints over W ∪ {attributes of A1, …, An}}

Paulley et al.
  Theorem: Q ≡ RW(Q) if and only if
    C1, …, Cn all have candidate keys
      Define K = key(C1) ∘ … ∘ key(Cn)
      K is a candidate key for C1 × … × Cn
    One of:
      K ⊆ W
      Some K′ ⊆ K exists such that:
        K′ ⊆ W
        Unique values for (K – K′) can be inferred from R + Schema (CHECK + KEY constraints)
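A small, hedged illustration of the idea behind the theorem, using Python's standard `sqlite3` module. The tables `Cust` and `Purch`, their keys, and the rows are invented for this sketch; comparing results on one instance only illustrates the condition and does not establish equivalence over all databases.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cust  (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Purch (Id INTEGER, Item TEXT, Price INTEGER, PRIMARY KEY (Id, Item));
INSERT INTO Cust  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');
INSERT INTO Purch VALUES (1, 'pen', 2), (1, 'ink', 5), (2, 'pen', 2), (3, 'pen', 3);
""")

def rows(sql):
    # Sort so that two bags of rows can be compared as lists.
    return sorted(con.execute(sql).fetchall())

JOIN = "FROM Cust c, Purch p WHERE c.Id = p.Id"

# W = {c.Id, p.Item, c.Name}: contains key(Cust) and the Item part of key(Purch);
# the remaining key column p.Id is fixed by the condition c.Id = p.Id in R.
key_distinct = rows("SELECT DISTINCT c.Id, p.Item, c.Name " + JOIN)
key_plain = rows("SELECT c.Id, p.Item, c.Name " + JOIN)

# W = {c.Name}: no key of the join can be recovered from W.
name_distinct = rows("SELECT DISTINCT c.Name " + JOIN)
name_plain = rows("SELECT c.Name " + JOIN)

print(key_distinct == key_plain)    # DISTINCT changed nothing here
print(name_distinct == name_plain)  # DISTINCT removed duplicates here
```

In the first query every output row determines one row of the join, so duplicates cannot arise; in the second, two customers named 'Ann' and repeated purchases produce repeated names, and dropping DISTINCT would change the answer.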

Paulley et al.
  Testing this condition = satisfaction of arbitrary Boolean expression = NP-complete

“Reasoning about Duplicate Elimination with Descriptive Logic”
V. Khizder, D. Toman, G. Weddell
DOOD 2002
Incrementally remove conjuncts from the scope of the elim operator in CQL
Map equivalence to the DL membership problem instead of Boolean satisfiability

Khizder et al.
  Q:     select V from C1 as A1, …, Cm as Am, (elim select W from Cm+1 as Am+1, …, Cn as An, R)
  RW(Q): select V from C1 as A1, …, Cm+1 as Am+1, (elim select W ∪ {Am+1} from Cm+2 as Am+2, …, Cn as An, R)
  R = {equality constraints over Pf’s on W ∪ {A1, …, An}}
  This normal form can always be achieved

Khizder et al.
  Define:
    S = Database Schema
      Expressed in CFD (CLASSIC + Functional Dependencies): C(Pf1, …, Pfn → Pf)
    S_Q = “Query Schema” = {C_Q ⊑ (A1:C1), …, C_Q ⊑ (An:Cn), C_Q ⊑ R}

Reformulating Paulley et al.
  Schema ⊨ Q ≡ RW(Q)
  ⇕
  Schema + R + instance of W ⊨ unique instance of K
  S ∪ S_Q ⊨ C_Q ⊑ C_Q(W → K)
  S ∪ S_Q ⊨ C_Q ⊑ C_Q(W → A1, …, An)

Khizder et al.
  Theorem: Q ≡ RW(Q) if and only if
    S ∪ S_Q ⊨ C_Q ⊑ C_Q(A ∪ W → Am+1), where A = {A1, …, Am}
  Am+1 ∈ W:
    FD obviously true
  Am+1 ∉ W:
    Am+1 existentially quantified
    FD guarantees no duplicates introduced

Khizder et al.
  CQL: S ⊨ Q ≡ RW(Q)
  ⇕
  CFD: S ∪ S_Q ⊨ C_Q ⊑ C_Q({A1, …, Am} ∪ W → Am+1)
  Apply rewrite iteratively from m=0 to m=n-1 iff:
    S ∪ S_Q ⊨ C_Q ⊑ C_Q(W → A1, …, An)

Complexity
  Membership in CLASSIC is P-time
  Holds for CFD assuming
    S ∪ S_Q only contain regular path FDs: C(Pf1, …, Pfn → Pf), Pf prefix of some Pfi
    S does not contain equation constraints
  SQL CHECK constraints allow disjunction
    Not expressible in CFD
    P-time bound does not apply

Usefulness of Incremental Re-write
  Move out of elim:
    Increase search space for join
    Shrink size of intermediate result requiring sorting
  Move into elim:
    When W ∩ V = ∅, then
      Values of W not important; only existence
      Replace subquery with probe to an index

Rewriting with Aggregates
  Using aggregate views
    “A view will be usable to answer a query only if there is an isomorphism between the view and a subset of the query” [Halevy, 2002]
    Note that the above quote is incorrect: it states as necessary a condition that is actually sufficient (but not necessary).
    Using schema information can increase the number of usable views.

Simple Example
  Schema:
    Cust(Id, Name)
    Purch(Id, Item, Price)
    Cust ⊑ Cust(Id → Name)
  View1(Id, Name, Sum(Price)) ← Cust(Id, Name) ∧ Purch(Id, Item, Price)
  Query:
    Q(Name, Tot) ← Cust(Id, Name) ∧ Spend(Id, Tot)
    Spend(Id, Sum(P)) ← Purch(Id, P)

Rewriting
  Q(Name, Tot) ← Cust(Id, Name) ∧ Spend(Id, Tot)
  Spend(Id, Sum(P)) ← Purch(Id, P)
  ⇕ Rewrite
  Q′(Name, Tot) ← Spend′(Id, Name, Tot)
  Spend′(Id, N, Sum(P)) ← Cust(Id, N) ∧ Purch(Id, Z, P)
  Valid because S ∪ S_Q ⊨ Cust ⊑ Cust(Id → Name)
  Now Spend′ and View1 are isomorphic (sufficient condition for using View1).
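The rewrite can be tried out on a toy instance with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the SQL renderings of the two Datalog-style queries, the column types, and the sample rows are choices made here, and the dependency Id → Name is modelled by declaring `Id` as the primary key of `Cust`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cust  (Id INTEGER PRIMARY KEY, Name TEXT);   -- enforces Id -> Name
CREATE TABLE Purch (Id INTEGER, Item TEXT, Price INTEGER);
INSERT INTO Cust  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');
INSERT INTO Purch VALUES (1, 'pen', 2), (1, 'ink', 5), (2, 'pen', 2), (3, 'cap', 7);
""")

# Original query: aggregate Purch per Id first, then join with Cust.
Q = """
SELECT c.Name, s.Tot
FROM Cust c
JOIN (SELECT Id, SUM(Price) AS Tot FROM Purch GROUP BY Id) AS s ON c.Id = s.Id
"""

# Rewritten query: join first, then aggregate grouped by (Id, Name),
# which is the shape of View1 in the example.
Q_REWRITTEN = """
SELECT v.Name, v.Tot
FROM (SELECT c.Id AS Id, c.Name AS Name, SUM(p.Price) AS Tot
      FROM Cust c JOIN Purch p ON c.Id = p.Id
      GROUP BY c.Id, c.Name) AS v
"""

original = sorted(con.execute(Q).fetchall())
rewritten = sorted(con.execute(Q_REWRITTEN).fetchall())
print(original)
print(rewritten)
```

Because each Id carries exactly one Name, grouping by (Id, Name) forms the same groups as grouping by Id alone, which is why the totals agree on this instance.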