CS848: Topics in Databases: Foundations of Query Optimization

Topics Covered: Databases; QL; Query containment; More on QL.
A simple case of finding a query plan
[Figure: an SQL Server presenting a global schema over Subsystem1, Subsystem2, and Subsystem3; a single table T is accessed with Open and Scan.]
A “local as view” (LAV) integration schema: $T \equiv Q_2$.
1. The user submits $Q_1$.
2. The query optimizer must determine whether a scan of T suffices. This is the case iff $Q_1$ is equivalent to $Q_2$.
In the beginning …
Countably infinite sets of each of the following kinds of symbols:
$C = \{C_1, C_2, \dots\}$ (primitive concepts)
$A = \{A_1, A_2, \dots\} \cup \{B_1, B_2, \dots\}$ (attributes)
$R = \{R_1, R_2, \dots\}$ (roles)
Conventions: attributes (resp. primitive concepts and roles) correspond to words in lower case or to positive integers (resp. words in upper case and words in mixed case).
A particular database is a pair $I = \langle \Delta, (\cdot)^I \rangle$, where $\Delta$ is a countable, possibly infinite, domain. For each symbol:
$(C)^I \subseteq \Delta$, $(A)^I : \Delta \to \Delta$, $(R)^I \subseteq (\Delta \times \Delta)$.
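As a concrete illustration of $I = \langle \Delta, (\cdot)^I \rangle$, the Python sketch below stores a database as plain sets and dictionaries. It assumes a finite domain, although the slides allow infinite ones. It also stores each $(A)^I$ as a possibly partial map, in the style of the partial databases that follow. The class `Database` and its fields are hypothetical names. The usage lines encode only the EMP facts stated on the Relational Databases slides.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """A finite database I = <Delta, (.)^I> (hypothetical encoding; finite domain assumed)."""
    domain: set                                     # Delta
    concepts: dict = field(default_factory=dict)    # C -> subset of Delta
    attributes: dict = field(default_factory=dict)  # A -> {e1: e2}, a partial map Delta -> Delta
    roles: dict = field(default_factory=dict)       # R -> set of pairs (e1, e2)

    def is_well_formed(self) -> bool:
        """Check (C)^I ⊆ Delta, that (A)^I maps Delta into Delta, and (R)^I ⊆ Delta x Delta."""
        if any(not ext <= self.domain for ext in self.concepts.values()):
            return False
        for fn in self.attributes.values():
            if not (set(fn) <= self.domain and set(fn.values()) <= self.domain):
                return False
        return all(a in self.domain and b in self.domain
                   for pairs in self.roles.values() for a, b in pairs)


# The EMP facts from the Relational Databases slides.
emp_db = Database(
    domain={"e1", "e2", "John", 33, "Mary"},
    concepts={"EMP": {"e1", "e2"}},
    attributes={"name": {"e1": "John"}, "age": {"e1": 33, "e2": 33}},
)
```

The check only confirms that every interpretation stays inside $\Delta$. A total-function reading of $(A)^I$ would additionally require a value for every element of the domain.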
Partial Databases (Aboxes)
A partial database is drawn as a graph, read as follows:
e : {C1, …, Cn} means $e \in (C_i)^I$ for each $i$;
an arc labelled A from e1 to e2 means $(A)^I(e_1) = e_2$;
an arc labelled R from e1 to e2 means $(e_1, e_2) \in (R)^I$.
Relational Databases
[Figure: the relation EMP, with columns name and age, containing “John”, “Mary”, and 33, drawn alongside the corresponding partial database, whose nodes e1 : {EMP} and e2 : {EMP} have name and age arcs to “John”, “Mary”, and 33.]
Relational Databases (cont’d)
$\{e_1, e_2\} \subseteq (\mathrm{EMP})^I$
$(\mathrm{name})^I(e_1) = \text{"John"}$
$(\mathrm{age})^I(e_1) = (\mathrm{age})^I(e_2) = 33$
$\{e_1, e_2, \text{"John"}, 33, \text{"Mary"}\} \subseteq \Delta$
[Figure: the partial database with nodes e1 : {EMP} and e2 : {EMP} and name and age arcs to “John”, “Mary”, and 33.]
Dialects of QL
Dialects are ordered by expressiveness (conjunctive, positive, first order) and by semantics (set or bag):
Set semantics: Conjunctive QL, Positive QL, First order QL
Bag semantics: Conjunctive QL with bag semantics†, Positive QL with bag semantics‡, First order QL with bag semantics
† [Khizder et al., 1999], ‡ [Lui et al., 2002]
Conjunctive QL
Q ::= D as A (quantification)
 | A1 = A2.R (unnest)
 | A1.Pf1 = A2.Pf2 (selection)
 | elim A1, …, An Q (projection)
 | true (null tuple)
 | from Q1, Q2 (natural join)
 | ( Q )
D ::= THING | C (basic description)
Pf ::= id | A.Pf (path function)
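Well Formed Queries: Sch(Q), the attributes of Q
$\mathrm{Sch}(D \text{ as } A) \equiv \{A\}$
$\mathrm{Sch}(A_1 = A_2.R) \equiv \{A_1, A_2\}$
$\mathrm{Sch}(A_1.Pf_1 = A_2.Pf_2) \equiv \{A_1, A_2\}$
$\mathrm{Sch}(\text{elim } A_1, \dots, A_n\ Q) \equiv \{A_1, \dots, A_n\}$
$\mathrm{Sch}(\text{true}) \equiv \emptyset$
$\mathrm{Sch}(\text{from } Q_1, Q_2) \equiv \mathrm{Sch}(Q_1) \cup \mathrm{Sch}(Q_2)$
Require $\{A_1, \dots, A_n\} \subseteq \mathrm{Sch}(Q)$ for projection operators.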
Tuples and Bags
A (duplicate) tuple t with attribute bindings for attributes $\{A_1, \dots, A_n\}$ over a database $I = \langle \Delta, (\cdot)^I \rangle$ has the general form $\langle A_1 : e_1, \dots, A_n : e_n, \mathrm{cnt} : i \rangle$. Here $\{e_1, \dots, e_n\} \subseteq \Delta$, “cnt” is a distinct attribute not used in queries, and i is a positive integer. A set of duplicate tuples that contain the same attribute bindings is called a bag.
Operations on Tuples
$\mathrm{Sch}(t) \equiv$ the set of attributes occurring in t, excluding cnt.
$t.\mathrm{cnt} \equiv$ the integer i such that “cnt : i” occurs in t.
$t.A \equiv$ the element $e \in \Delta$ such that “A : e” occurs in t; defined only when $A \in \mathrm{Sch}(t)$.
$t[\{A_1, \dots, A_n\}] \equiv \{A_1 : t.A_1, \dots, A_n : t.A_n\}$; defined only when $\{A_1, \dots, A_n\} \subseteq \mathrm{Sch}(t)$.
$[t] \equiv t[\mathrm{Sch}(t)]$
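The tuple operations can be sketched in Python by representing a duplicate tuple as a dictionary that maps attribute names to elements. The reserved key `"cnt"` holds the multiplicity. This encoding and the function names `sch`, `count`, `value`, `restrict`, and `bindings` are assumptions made for illustration. "Defined only when" is modelled by raising `KeyError`.

```python
CNT = "cnt"  # the reserved count attribute, never used in queries


def sch(t: dict) -> frozenset:
    """Sch(t): the attributes occurring in t, excluding cnt."""
    return frozenset(k for k in t if k != CNT)


def count(t: dict) -> int:
    """t.cnt: the integer i such that 'cnt : i' occurs in t."""
    return t[CNT]


def value(t: dict, a: str):
    """t.A: the element bound to A; defined only when A is in Sch(t)."""
    if a not in sch(t):
        raise KeyError(f"{a} is not bound in t")
    return t[a]


def restrict(t: dict, attrs) -> dict:
    """t[{A1, ..., An}]: defined only when the attributes are a subset of Sch(t)."""
    attrs = set(attrs)
    if not attrs <= sch(t):
        raise KeyError("projection onto unbound attributes")
    return {a: t[a] for a in attrs}


def bindings(t: dict) -> dict:
    """[t] = t[Sch(t)]: the attribute bindings of t without its count."""
    return restrict(t, sch(t))
```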
Semantics
The meaning of a query Q, denoted $[\![Q]\!]$, is a function that maps databases to bags. Its behavior on a particular database $I = \langle \Delta, (\cdot)^I \rangle$ is defined as follows.
$[\![\text{THING as } A]\!](I) \equiv \{\langle A : e, \mathrm{cnt} : 1 \rangle : e \in \Delta\}$
$[\![C \text{ as } A]\!](I) \equiv \{\langle A : e, \mathrm{cnt} : 1 \rangle : e \in (C)^I\}$
$[\![A_1 = A_2.R]\!](I) \equiv \{\langle A_1 : e_1, A_2 : e_2, \mathrm{cnt} : 1 \rangle : (e_2, e_1) \in (R)^I\}$
$[\![A_1.Pf_1 = A_2.Pf_2]\!](I) \equiv \{\langle A_1 : e_1, A_2 : e_2, \mathrm{cnt} : 1 \rangle : (Pf_1)^I(e_1) = (Pf_2)^I(e_2)\}$
where $(\mathrm{id})^I \equiv \{(e, e) : e \in \Delta\}$ and $(A.Pf)^I \equiv \{(e_1, e_2) : (Pf)^I((A)^I(e_1)) = e_2\}$.
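A minimal Python sketch of the four atomic cases follows. It assumes a finite database given as a domain set, a dict of concept extensions, a dict of attribute maps, and a dict of role pairs. A bag is a list of dict-encoded tuples carrying a `"cnt"` key, and a path function is a list of attribute names, where `[]` stands for `id`. Attribute maps may be partial; a path that is undefined on an element yields no tuple. All function names are hypothetical.

```python
_UNDEFINED = object()  # marks a path function that is undefined on an element


def eval_path(attributes: dict, path: list, e):
    """(A1.A2. ... .An.id)^I(e): apply A1 first, then A2, and so on; [] is id."""
    for a in path:
        fn = attributes.get(a, {})
        if e not in fn:
            return _UNDEFINED
        e = fn[e]
    return e


def thing_as(domain: set, a: str) -> list:
    """[[THING as A]](I): one tuple per element of Delta."""
    return [{a: e, "cnt": 1} for e in domain]


def concept_as(concepts: dict, c: str, a: str) -> list:
    """[[C as A]](I): one tuple per element of (C)^I."""
    return [{a: e, "cnt": 1} for e in concepts.get(c, set())]


def unnest(roles: dict, a1: str, a2: str, r: str) -> list:
    """[[A1 = A2.R]](I): bind A2 to e2 and A1 to e1 whenever (e2, e1) is in (R)^I."""
    return [{a1: e1, a2: e2, "cnt": 1}
            for (e2, e1) in roles.get(r, set())
            if a1 != a2 or e1 == e2]


def path_eq(domain: set, attributes: dict, a1: str, pf1: list, a2: str, pf2: list) -> list:
    """[[A1.Pf1 = A2.Pf2]](I): pairs (e1, e2) whose path values are defined and equal."""
    out = []
    for e1 in domain:
        v1 = eval_path(attributes, pf1, e1)
        if v1 is _UNDEFINED:
            continue
        for e2 in domain:
            if a1 == a2 and e1 != e2:
                continue  # a single attribute cannot be bound to two elements
            if eval_path(attributes, pf2, e2) == v1:
                out.append({a1: e1, a2: e2, "cnt": 1})
    return out
```

For example, `path_eq(domain, attributes, "x", ["age"], "y", ["age"])` pairs every two elements that have the same age.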
Semantics (cont’d)
$[\![\text{elim } A_1, \dots, A_n\ Q]\!](I) \equiv \emptyset$ if not well formed; otherwise $\{\langle A_1 : t.A_1, \dots, A_n : t.A_n, \mathrm{cnt} : 1 \rangle : t \in [\![Q]\!](I)\}$
$[\![\text{true}]\!](I) \equiv \{\langle \mathrm{cnt} : 1 \rangle\}$
$[\![\text{from } Q_1, Q_2]\!](I) \equiv \{t : \mathrm{Sch}(t) = \mathrm{Sch}(Q_1) \cup \mathrm{Sch}(Q_2) \wedge \exists t_1 \in [\![Q_1]\!](I), t_2 \in [\![Q_2]\!](I) : t.\mathrm{cnt} = t_1.\mathrm{cnt} \times t_2.\mathrm{cnt} \wedge t[\mathrm{Sch}(t_1)] = [t_1] \wedge t[\mathrm{Sch}(t_2)] = [t_2]\}$
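The three remaining conjunctive operators can be sketched over finite bags, with tuples encoded as dicts carrying a `"cnt"` key (a hypothetical encoding). `elim` projects with set semantics, so every count becomes 1. `from_q` is a natural join in which counts multiply, and `true_q` is its identity. Well-formedness of `elim` depends on the query's schema rather than the data, so the sketch takes `Sch(Q)` as an explicit argument.

```python
def sch_of(t: dict) -> set:
    """Sch(t): bound attributes of a tuple, excluding cnt."""
    return set(t) - {"cnt"}


def true_q() -> list:
    """[[true]](I) = { <cnt : 1> }, the single null tuple."""
    return [{"cnt": 1}]


def elim(attrs: list, schema: set, bag: list) -> list:
    """[[elim A1, ..., An Q]](I): set-semantics projection (every count becomes 1)."""
    if not set(attrs) <= set(schema):
        return []  # not well formed
    seen = set()
    out = []
    for t in bag:
        key = frozenset((a, t[a]) for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({**dict(key), "cnt": 1})
    return out


def from_q(bag1: list, bag2: list) -> list:
    """[[from Q1, Q2]](I): natural join on shared attributes; counts multiply."""
    out = []
    for t1 in bag1:
        for t2 in bag2:
            shared = sch_of(t1) & sch_of(t2)
            if all(t1[a] == t2[a] for a in shared):
                out.append({**t1, **t2, "cnt": t1["cnt"] * t2["cnt"]})
    return out
```

If each input bag has at most one tuple per binding, every joined tuple determines its two source tuples. The join result therefore has unique bindings too.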
Syntactic Sugar
A1. … .An.id ≡ A1. … .An
select distinct A1, …, An Q ≡ elim A1, …, An Q
select * Q ≡ Q
Q1 where Q2 ≡ from Q1, Q2
Q1 and Q2 ≡ from Q1, Q2
from ≡ true
from Q1, Q2, …, Qn ≡ from (from Q1, Q2, …), Qn
Examples
The names of employees who have the same age as another employee with a given name:
select distinct :p, name
from EMP as e,
     (select distinct :p, e1
      from EMP as e1, EMP as e2
      where e1.age = e2.age and e2.name = :p)
where e.name = name and e.id = e1.id
Method Calls (more syntactic sugar)
A1.Pf1.C(A2.Pf2, …, An-1.Pfn-1) = An.Pfn ≡ select distinct A1, …, An from C as A where A.1 = A1.Pf1 and … and A.n = An.Pfn
A1.Pf1.C(A2.Pf2, …, An-1.Pfn-1) as An ≡ A1.Pf1.C(A2.Pf2, …, An-1.Pfn-1) = An.id
Examples (cont’d)
The names of employees who have an age double that of another employee:
select distinct name
from EMP as e,
     (select distinct e1
      from EMP as e1, EMP as e2
      where e2.age.+(e2.age) = e1.age)
where e.name = name and e.id = e1.id
Conjunctive Datalog (more syntactic sugar)
C(A1, …, An) ≡ select distinct A1, …, An from C as A where A.1 = A1.id and … and A.n = An.id
(A1, …, Am) :- Q1, …, Qn. ≡ select distinct A1, …, Am from Q1, …, Qn
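The two Datalog rules above are purely syntactic, so a small string-rewriting sketch can expand them into QL text. The function names and the fresh-name scheme `_t0`, `_t1`, … for the quantified attribute A are hypothetical. A distinct name per body atom keeps the expanded query free of variable reuse. The sketch assumes every body atom has at least one argument, and it parenthesizes each atom's subquery, as the Examples slides do.

```python
def desugar_atom(concept: str, args: list, fresh: str) -> str:
    """C(A1, ..., An)  ==>  select distinct A1, ..., An from C as A where A.1 = A1.id and ..."""
    # assumes args is non-empty; otherwise the where clause would be empty
    conds = " and ".join(f"{fresh}.{i} = {a}.id" for i, a in enumerate(args, start=1))
    return f"(select distinct {', '.join(args)} from {concept} as {fresh} where {conds})"


def desugar_rule(head: list, body: list) -> str:
    """(A1, ..., Am) :- Q1, ..., Qn.  ==>  select distinct A1, ..., Am from Q1, ..., Qn"""
    subqueries = [desugar_atom(c, args, f"_t{k}") for k, (c, args) in enumerate(body)]
    return f"select distinct {', '.join(head)} from {', '.join(subqueries)}"
```

For example, `desugar_atom("EMP", ["x", "y"], "t")` produces `(select distinct x, y from EMP as t where t.1 = x.id and t.2 = y.id)`.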
Positive QL
Q ::= empty A1, …, An (empty set)
 | Q1 union Q2 (union)
$\mathrm{Sch}(\text{empty } A_1, \dots, A_n) \equiv \{A_1, \dots, A_n\}$
$\mathrm{Sch}(Q_1 \text{ union } Q_2) \equiv \mathrm{Sch}(Q_1)$
Require $\mathrm{Sch}(Q_1) = \mathrm{Sch}(Q_2)$ in union operations.
Semantics
$[\![\text{empty } A_1, \dots, A_n]\!](I) \equiv \emptyset$
$[\![Q_1 \text{ union } Q_2]\!](I) \equiv \{t : t.\mathrm{cnt} = 1 \wedge \mathrm{Sch}(t) = \mathrm{Sch}(Q_1) \wedge \mathrm{Sch}(t) = \mathrm{Sch}(Q_2) \wedge ((\exists t_1 \in [\![Q_1]\!](I) : [t] = [t_1] \wedge \neg\exists t_2 \in [\![Q_2]\!](I) : [t] = [t_2]) \vee (\exists t_2 \in [\![Q_2]\!](I) : [t] = [t_2] \wedge \neg\exists t_1 \in [\![Q_1]\!](I) : [t] = [t_1]) \vee (\exists t_1 \in [\![Q_1]\!](I), t_2 \in [\![Q_2]\!](I) : [t] = [t_1] \wedge [t] = [t_2]))\}$
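The three disjuncts in the union definition cover a binding found only in $Q_1$, only in $Q_2$, or in both. Together they say the binding occurs in at least one operand. The Python sketch below therefore collapses them into a set union of bindings. It uses a hypothetical dict-based tuple encoding, with the schemas passed explicitly. When the schemas differ, no t can satisfy both schema conjuncts, so the result is empty rather than an error.

```python
def bindings_key(t: dict) -> frozenset:
    """[t] as a hashable value: the attribute bindings of t without cnt."""
    return frozenset((a, v) for a, v in t.items() if a != "cnt")


def union_q(bag1: list, bag2: list, sch1: set, sch2: set) -> list:
    """[[Q1 union Q2]](I): every binding found in Q1, in Q2, or in both, with count 1."""
    if set(sch1) != set(sch2):
        return []  # no t can satisfy Sch(t) = Sch(Q1) and Sch(t) = Sch(Q2)
    keys = {bindings_key(t) for t in bag1} | {bindings_key(t) for t in bag2}
    return [{**dict(k), "cnt": 1} for k in keys]
```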
First Order QL
Q ::= Q1 minus Q2 (difference)
$\mathrm{Sch}(Q_1 \text{ minus } Q_2) \equiv \mathrm{Sch}(Q_1)$
Require $\mathrm{Sch}(Q_1) = \mathrm{Sch}(Q_2)$ in difference operations.
Semantics
$[\![Q_1 \text{ minus } Q_2]\!](I) \equiv \{t : t.\mathrm{cnt} = 1 \wedge \mathrm{Sch}(t) = \mathrm{Sch}(Q_1) \wedge \mathrm{Sch}(t) = \mathrm{Sch}(Q_2) \wedge (\exists t_1 \in [\![Q_1]\!](I) : [t] = [t_1]) \wedge (\neg\exists t_2 \in [\![Q_2]\!](I) : [t] = [t_2])\}$
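Under the same kind of hypothetical dict-based tuple encoding, set difference keeps the bindings of $Q_1$ that do not occur in $Q_2$, each with count 1. As with union, mismatched schemas leave no tuple satisfying the schema conjuncts, so the sketch returns an empty bag.

```python
def bindings_key(t: dict) -> frozenset:
    """[t] as a hashable value: the attribute bindings of t without cnt."""
    return frozenset((a, v) for a, v in t.items() if a != "cnt")


def minus_q(bag1: list, bag2: list, sch1: set, sch2: set) -> list:
    """[[Q1 minus Q2]](I): bindings in Q1 that do not occur in Q2, with count 1."""
    if set(sch1) != set(sch2):
        return []  # no t can satisfy both schema conjuncts
    left = {bindings_key(t) for t in bag1}
    right = {bindings_key(t) for t in bag2}
    return [{**dict(k), "cnt": 1} for k in left - right]
```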
QL with Duplicates
Q ::= select A1, …, An Q (duplicate preserving projection)
 | Q1 union all Q2 (bag union)
 | Q1 minus all Q2 (bag difference)
Well Formed Queries (cont’d)
$\mathrm{Sch}(\text{select } A_1, \dots, A_n\ Q) \equiv \{A_1, \dots, A_n\}$
$\mathrm{Sch}(Q_1 \text{ union all } Q_2) \equiv \mathrm{Sch}(Q_1)$
$\mathrm{Sch}(Q_1 \text{ minus all } Q_2) \equiv \mathrm{Sch}(Q_1)$
Require $\mathrm{Sch}(Q_1) = \mathrm{Sch}(Q_2)$ in bag union and bag difference operations, and $\{A_1, \dots, A_n\} \subseteq \mathrm{Sch}(Q)$ in (duplicate preserving) projection operations.
Semantics
$[\![\text{select } A_1, \dots, A_n\ Q]\!](I) \equiv \emptyset$ if not well formed or not representable†; otherwise $\{\langle A_1 : t_1.A_1, \dots, A_n : t_1.A_n, \mathrm{cnt} : k \rangle : t_1 \in [\![Q]\!](I) \wedge k = \sum_{t_2 \in [\![Q]\!](I) \,:\, t_2[\{A_1, \dots, A_n\}] = t_1[\{A_1, \dots, A_n\}]} t_2.\mathrm{cnt}\}$
† The duplicate preserving projection is representable on database I iff, for every $t_1 \in [\![Q]\!](I)$, $|\{t_2 \in [\![Q]\!](I) : t_2[\{A_1, \dots, A_n\}] = t_1[\{A_1, \dots, A_n\}]\}|$ is finite.
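A duplicate preserving projection merges every tuple that agrees on $A_1, \dots, A_n$ and adds up their counts. By contrast, `elim` would reset each count to 1. The Python sketch below uses a `collections.Counter` keyed on the projected bindings, with a hypothetical dict-based tuple encoding. Because the input is a finite list, every projection in this sketch is representable. Only the well-formedness check can produce the empty result.

```python
from collections import Counter


def select_q(attrs: list, schema: set, bag: list) -> list:
    """[[select A1, ..., An Q]](I): project onto attrs and add up the counts of merged tuples."""
    if not set(attrs) <= set(schema):
        return []  # not well formed
    totals = Counter()
    for t in bag:
        totals[frozenset((a, t[a]) for a in attrs)] += t["cnt"]
    return [{**dict(k), "cnt": n} for k, n in totals.items()]
```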
Example
A duplicate preserving projection operation that is not representable in any database with an infinite domain:
select e1 from THING as e1, THING as e2
Each binding of e1 pairs with every element of $\Delta$ bound to e2, so its count would be infinite.
Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.
Semantics (cont’d)
$[\![Q_1 \text{ union all } Q_2]\!](I) \equiv \emptyset$ if not well formed; otherwise
$\{t \in [\![Q_1]\!](I) : \neg\exists t_2 \in [\![Q_2]\!](I) : [t] = [t_2]\}$
$\cup\ \{t \in [\![Q_2]\!](I) : \neg\exists t_1 \in [\![Q_1]\!](I) : [t] = [t_1]\}$
$\cup\ \{t : \exists t_1 \in [\![Q_1]\!](I), t_2 \in [\![Q_2]\!](I) : [t] = [t_1] \wedge [t] = [t_2] \wedge t.\mathrm{cnt} = t_1.\mathrm{cnt} + t_2.\mathrm{cnt}\}$
$[\![Q_1 \text{ minus all } Q_2]\!](I) \equiv \emptyset$ if not well formed; otherwise
$\{t \in [\![Q_1]\!](I) : \neg\exists t_2 \in [\![Q_2]\!](I) : [t] = [t_2]\}$
$\cup\ \{t : \exists t_1 \in [\![Q_1]\!](I), t_2 \in [\![Q_2]\!](I) : [t] = [t_1] \wedge [t] = [t_2] \wedge t.\mathrm{cnt} = t_1.\mathrm{cnt} - t_2.\mathrm{cnt} \wedge t_1.\mathrm{cnt} > t_2.\mathrm{cnt}\}$
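Both bag operators line up with Python's `collections.Counter` arithmetic when each bag is viewed as a map from bindings $[t]$ to counts. The Python sketch below uses a hypothetical dict-based tuple encoding with the schemas passed explicitly.

- `+` adds the counts of shared bindings and keeps one-sided bindings unchanged, which matches `union all`.
- `-` keeps a binding only when its difference is positive. It therefore drops bindings found only in $Q_2$ and those with $t_1.\mathrm{cnt} \le t_2.\mathrm{cnt}$, which matches `minus all`.

```python
from collections import Counter


def to_counter(bag: list) -> Counter:
    """Map each binding [t] to t.cnt."""
    c = Counter()
    for t in bag:
        c[frozenset((a, v) for a, v in t.items() if a != "cnt")] += t["cnt"]
    return c


def from_counter(c: Counter) -> list:
    """Turn a Counter back into dict-encoded tuples with positive counts."""
    return [{**dict(k), "cnt": n} for k, n in c.items() if n > 0]


def union_all(bag1: list, bag2: list, sch1: set, sch2: set) -> list:
    """[[Q1 union all Q2]](I): counts add; one-sided bindings keep their count."""
    if set(sch1) != set(sch2):
        return []  # not well formed
    return from_counter(to_counter(bag1) + to_counter(bag2))


def minus_all(bag1: list, bag2: list, sch1: set, sch2: set) -> list:
    """[[Q1 minus all Q2]](I): counts subtract; Counter '-' drops non-positive results."""
    if set(sch1) != set(sch2):
        return []  # not well formed
    return from_counter(to_counter(bag1) - to_counter(bag2))
```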
Summary
Operators available in each dialect of QL:
Conjunctive QL (set semantics): as, =, elim, true, from
Conjunctive QL (bag semantics): as, =, elim, true, from, select
Positive QL (set semantics): as, =, elim, true, from, empty, union
Positive QL (bag semantics): as, =, elim, true, from, select, empty, union all
First order QL (set semantics): as, =, elim, true, from, empty, union, minus
First order QL (bag semantics): as, =, elim, true, from, select, empty, union all, minus all
Query Contexts
An expression Q[] in the language QL enriched by an additional terminal symbol [] is called a query context. For a query context $Q_1[\,]$ and a query $Q_2 \in$ QL, the expression $Q_1[Q_2]$ denotes the syntactic substitution of $Q_2$ for []. $Q_2$ is compatible with $Q_1$ if $Q_1[Q_2] \in$ QL. For example, $Q_2$ is compatible with $Q_1$ in the following.
Q2: EMP as e where e.name = :p
Q1: select distinct :p, d from DEPT as d, [] where d = e.dept
The Query Equivalence Problem
$Q_1$ is equivalent to $Q_2$ for database I, written $I \models (Q_1 \equiv Q_2)$, if $[\![Q_1]\!](I) = [\![Q_2]\!](I)$. A query equivalence dependency E has the form $(Q_1 \equiv Q_2)$. $E = (Q_1 \equiv Q_2)$ is an axiom if, for any database I, $I \models (Q_1 \equiv Q_2)$. A query equivalence problem for a given set of query equivalence dependencies is to determine whether a given member of the set is an axiom.
Some Axioms
Question: Is every E of the following form an axiom?
$(\text{elim } A_1, \dots, A_m\ Q_1)[\text{elim } B_1, \dots, B_n\ Q_2] \equiv \text{elim } A_1, \dots, A_m\ Q_1[Q_2]$
Answer: No. However, any such E is an axiom if no attribute in $\mathrm{Sch}(Q_2) - \{B_1, \dots, B_n\}$ occurs in the query context $(\text{elim } A_1, \dots, A_m\ Q_1[\,])$.
Excluding variable reuse in QL
Q has an occurrence of variable reuse if $Q = Q_1[\text{elim } A_1, \dots, A_n\ Q_2]$ or $Q = Q_1[\text{select } A_1, \dots, A_n\ Q_2]$ for some query context $Q_1[\,]$ and query $Q_2$, and some attribute $A \in \mathrm{Sch}(Q_2) - \{A_1, \dots, A_n\}$ also occurs in $Q_1[\,]$.
Observation: For any query $Q_1$, there exists an equivalent query $Q_2$ that has no occurrence of variable reuse.
The Query Containment Problem
$Q_1$ is contained in $Q_2$ for database I, written $I \models (Q_1 \sqsubseteq Q_2)$, if, for any tuple $t_1 \in [\![Q_1]\!](I)$, there exists $t_2 \in [\![Q_2]\!](I)$ such that $[t_1] = [t_2]$ and $t_1.\mathrm{cnt} \le t_2.\mathrm{cnt}$. A query containment dependency C has the form $(Q_1 \sqsubseteq Q_2)$. $C = (Q_1 \sqsubseteq Q_2)$ is an axiom if, for any database I, $I \models (Q_1 \sqsubseteq Q_2)$. A query containment problem for a given set of query containment dependencies is to determine whether a given member of the set is an axiom.
Equivalence and Containment
Observation: Equivalence reduces to containment: $Q_1 \equiv Q_2$ iff $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
Observation: Containment reduces to equivalence in first order QL: $Q_1 \sqsubseteq Q_2$ iff $(Q_1 \text{ minus all } Q_2) \equiv \text{empty } A_1, \dots, A_n$, where $\{A_1, \dots, A_n\} = \mathrm{Sch}(Q_1)$.
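Both observations can be checked on the answers of a single database in the Python sketch below. Containment is the per-binding count comparison $t_1.\mathrm{cnt} \le t_2.\mathrm{cnt}$, and the `minus all` test succeeds exactly when that comparison holds everywhere. This only tests $I \models (Q_1 \sqsubseteq Q_2)$ for one given $I$. Deciding whether the dependency is an axiom would require all databases. The dict-based tuple encoding and the function names are hypothetical.

```python
from collections import Counter


def as_counter(bag: list) -> Counter:
    """Map each binding [t] to t.cnt."""
    c = Counter()
    for t in bag:
        c[frozenset((a, v) for a, v in t.items() if a != "cnt")] += t["cnt"]
    return c


def contained_on(bag1: list, bag2: list) -> bool:
    """I |= (Q1 ⊑ Q2): each t1 has a t2 with [t1] = [t2] and t1.cnt <= t2.cnt."""
    c2 = as_counter(bag2)  # missing bindings read as count 0
    return all(n <= c2[k] for k, n in as_counter(bag1).items())


def equivalent_on(bag1: list, bag2: list) -> bool:
    """I |= (Q1 ≡ Q2), via containment in both directions."""
    return contained_on(bag1, bag2) and contained_on(bag2, bag1)


def contained_via_minus_all(bag1: list, bag2: list) -> bool:
    """Q1 minus all Q2 is empty exactly when bag1 is contained in bag2."""
    return not (as_counter(bag1) - as_counter(bag2))
```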
Some Complexity Results
Theorem: The query equivalence and containment problems for conjunctive QL are NP-complete.†
† Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.
A Decision Procedure
Theorem: The following procedure decides whether $C = (Q_1 \sqsubseteq Q_2)$ is an axiom for conjunctive QL.†
1. Freeze the body of $Q_1$ by creating a partial database I whose individuals include its variables.
2. If the tuple $\langle A_1 : A_1, \dots, A_n : A_n, \mathrm{cnt} : 1 \rangle$ occurs in $[\![Q_2]\!](I)$, where $\mathrm{Sch}(Q_1) = \{A_1, \dots, A_n\}$, return true; otherwise return false.‡
† Derived from [Ullman, 1999].
‡ Use forced semantics for selection operations.
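This is the classical canonical-database ("freezing") test. The Python sketch below applies it only to the relational fragment expressible with the conjunctive Datalog sugar. A query body is a list of atoms over variables, and set semantics is assumed. There are no path functions, so the forced semantics for selections does not arise.

- Step 1 freezes the body of $Q_1$ into facts whose individuals are its variable names.
- Step 2 searches for an answer of $Q_2$ on those facts that binds each head attribute $A_i$ to the frozen individual $A_i$.

`Atom`, `freeze`, `extends`, and `contained` are hypothetical names, and both queries are assumed to use the same head attributes.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # hypothetical: (predicate name, argument variables)


def freeze(body: List[Atom]) -> Dict[str, Set[tuple]]:
    """Step 1: turn each atom into a fact whose individuals are the variable names themselves."""
    db: Dict[str, Set[tuple]] = {}
    for pred, args in body:
        db.setdefault(pred, set()).add(tuple(args))
    return db


def extends(body: List[Atom], db: Dict[str, Set[tuple]], binding: Dict[str, str]) -> bool:
    """Backtracking search: can `binding` be extended so that every atom of `body` is a fact of `db`?"""
    if not body:
        return True
    (pred, args), rest = body[0], body[1:]
    for fact in db.get(pred, set()):
        if len(fact) != len(args):
            continue
        trial = dict(binding)
        # setdefault binds a fresh variable, or returns its existing value for comparison
        if all(trial.setdefault(v, c) == c for v, c in zip(args, fact)):
            if extends(rest, db, trial):
                return True
    return False


def contained(head: List[str], body1: List[Atom], body2: List[Atom]) -> bool:
    """Step 2: Q1 ⊑ Q2 iff Q2, evaluated on the frozen body of Q1, yields <A1 : A1, ..., An : An>."""
    return extends(body2, freeze(body1), {a: a for a in head})
```

For instance, with `q1 = [("E", ("x", "y")), ("E", ("y", "y"))]` and `q2 = [("E", ("x", "u")), ("E", ("u", "v"))]`, the call `contained(["x"], q1, q2)` succeeds and the reverse direction fails. The backtracking search is exponential in the worst case, which is consistent with the NP-completeness result above.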
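Obtaining a Partial Database from Q
[Figure: the partial-database fragment produced by each construct of Q: THING as A yields a node A; C as A yields a node A : {C}; A1 = A2.R yields nodes A1 and A2 with an R arc from A2 to A1; a selection A1.A2. … .Am = B1.B2. … .Bn yields two chains of nodes, A1, A2, …, Am and B1, B2, …, Bn.]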
Deriving Partial Databases (cont’d)
[Figure: two derivation rules over nodes u : L1, v : L2 (or L3), w : L (or L2), and x : L4, connected by arcs labelled A.]
Deriving Partial Databases (cont’d)
[Figure: two derivation rules; in the first, nodes n1 : L1 and n2 : L2 become n1 : L1 ∪ L2 and n2 : L1 ∪ L2; the second involves nodes n1 : L1, n2 : L2, and n3 : L3.]
Evaluating Selections on Partial Databases
Note that selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. In particular, two nodes $n_1$ and $n_2$ satisfy a selection condition iff the condition has the form $n_1.Pf_1.Pf = n_2.Pf_2.Pf$, where $(Pf_1)^I(n_1)$ and $(Pf_2)^I(n_2)$ are defined and lead to nodes connected by an equality arc.
Some Complexity Results (cont’d)
Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete.
Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time.
Example:† In conjunctive QL with bag semantics, the query containment dependency $Q_1 \sqsubseteq Q_2$ is an axiom, where $Q_1$ and $Q_2$ have the respective definitions
Q1: select x, z from P as x, R as z where x = u.Q and z = v.Q
Q2: select x, z from P as x, R as z where y = u.Q and y = v.Q
† [Chaudhuri and Vardi, 1993]
The Query Membership Problem
A database schema, denoted T, consists of a finite set $\{C_1, \dots, C_n\}$ of query containment dependencies. C is an axiom relative to database schema $T = \{C_1, \dots, C_n\}$, written $T \models C$, if, for any database I, $I \models C$ whenever $I \models C_i$ for each i. A query membership problem for a given set of query containment dependencies is to determine whether a given member of the set is an axiom relative to a given database schema also consisting of members of the set.
More Complexity Results
Theorem: The query membership problem for conjunctive QL is undecidable.
Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL.
Proof: Assignment.
More on QL
Defining database schemas
Expressing access plans
Modeling Generalization Taxonomies
Consider a simple object-oriented schema language consisting of sentences of the following form.†
class C {A1 : ref C1, …, Am : ref Cm} [isa C1, …, Cn];
Assignment: Encode a fixed collection of such sentences as a database schema in conjunctive QL. Your encoding should be as compact as possible. It should also enable the following questions to be expressed as query containment dependencies over your schema.
1. Is C a defined class?
2. Is attribute A defined on class C?
3. Can an object reside in both class C1 and class C2?
† Assume that any object in a database was created with respect to a single class.
Modeling Pipelined Query Access Plans
(operator)              (syntax)                                   (defn of (·))
(parameter)             Q ::= (PARAM as A)                         {A}
(index scan)              | (from C as A, A.1 = B1, …, A.n = Bn)   {A}
(nested loops)            | (from Q1, Q2)                          (Q1) ∪ (Q2)
(noop)                    | (select A1, …, An Q)                   (Q) ∩ {A1, …, An}
(record field access)     | (A1 = A2.B)                            {A1}
(comparison)              | (A1 = A2)                              ∅
(catenation)              | (Q1 union all Q2)                      (Q1) ∩ (Q2)
(cut)                     | (elim A1, …, An Q)                     ∅
                          | …
Require:
1. ((Q2) − (Q2)) ⊆ (Q1) for nested loops, and
2. (Q) = (Q) for top-level queries.
Alternative Semantics
Richer model theories are required for
1. sort operations, and
2. named cuts.