CS848: Topics in Databases: Foundations of Query Optimization
Topics covered: databases; QL; query containment; more on QL.


A simple case of finding a query plan
[Figure: SQL Server, Subsystem1, Subsystem2 and Subsystem3 beneath a global schema consisting of a single table T, accessed with Open and Scan.]
A "local as view" (LAV) integration schema: $T \equiv Q_2$. The user submits $Q_1$, and the query optimizer must determine whether a scan of $T$ suffices. This is true iff $Q_1$ is equivalent to $Q_2$.

In the beginning …
Countably infinite sets of each of the following kinds of symbols:
$\mathcal{C} = \{C_1, C_2, \ldots\}$ (primitive concepts)
$\mathcal{A} = \{A_1, A_2, \ldots\} \cup \{B_1, B_2, \ldots\}$ (attributes)
$\mathcal{R} = \{R_1, R_2, \ldots\}$ (roles)
Conventions: attributes (resp. primitive concepts and roles) correspond to words in lower case or to positive integers (resp. words in upper case and words in mixed case).

For a particular database $\mathcal{I} = \langle \Delta, (\cdot)^{\mathcal{I}} \rangle$, $\Delta$ is a countable, possibly infinite domain, and for each symbol:
$(C)^{\mathcal{I}} \subseteq \Delta$
$(A)^{\mathcal{I}} : \Delta \to \Delta$
$(R)^{\mathcal{I}} \subseteq (\Delta \times \Delta)$

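Partial Databases (Aboxes)
[Figure: graphical notation for partial databases.]
- A node $e : \{C_1, \ldots, C_n\}$ denotes $e \in (C_i)^{\mathcal{I}}$ for each $i$.
- An arc labelled $A$ from $e_1$ to $e_2$ denotes $(A)^{\mathcal{I}}(e_1) = e_2$.
- An arc labelled $R$ from $e_1$ to $e_2$ denotes $(e_1, e_2) \in (R)^{\mathcal{I}}$.
- Two further arc styles between $e_1$ and $e_2$ denote $e_1 = e_2$ (an equality arc) and $e_1 \neq e_2$.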

Relational Databases
[Figure: a table EMP(name, age) with rows ("John", 33) and ("Mary", 33), and the corresponding partial database: nodes $e_1 : \{\text{EMP}\}$ and $e_2 : \{\text{EMP}\}$ with name arcs to "John" and "Mary" and age arcs to 33.]

Relational Databases (cont'd)
In this database:
$\{e_1, e_2\} \subseteq (\text{EMP})^{\mathcal{I}}$
$(\text{name})^{\mathcal{I}}(e_1) = \text{"John"}$, $(\text{name})^{\mathcal{I}}(e_2) = \text{"Mary"}$
$(\text{age})^{\mathcal{I}}(e_1) = (\text{age})^{\mathcal{I}}(e_2) = 33$
$\{e_1, e_2, \text{"John"}, 33, \text{"Mary"}\} \subseteq \Delta$
$e_1 \neq e_2$; "John" ? 33
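The following minimal Python sketch makes $\mathcal{I} = \langle \Delta, (\cdot)^{\mathcal{I}} \rangle$ concrete, assuming a finite domain. The class `Database` and its field names are hypothetical. Attributes are stored as dictionaries, so an attribute may be undefined on some individuals, as in a partial database. The EMP example is encoded at the end.

```python
from dataclasses import dataclass, field

@dataclass
class Database:
    """A finite database I = <Delta, (.)^I> (hypothetical encoding for illustration)."""
    domain: set                                      # Delta
    concepts: dict = field(default_factory=dict)     # C -> (C)^I, a subset of Delta
    attributes: dict = field(default_factory=dict)   # A -> {e: (A)^I(e)}, possibly partial
    roles: dict = field(default_factory=dict)        # R -> (R)^I, a set of pairs

    def check(self) -> bool:
        """True iff every concept, attribute and role is interpreted inside Delta."""
        for ext in self.concepts.values():
            if not ext <= self.domain:
                return False
        for fn in self.attributes.values():
            if not (set(fn) <= self.domain and set(fn.values()) <= self.domain):
                return False
        for pairs in self.roles.values():
            if not all(a in self.domain and b in self.domain for a, b in pairs):
                return False
        return True

# The EMP example: employees e1 and e2 named "John" and "Mary", both aged 33.
emp_db = Database(
    domain={"e1", "e2", "John", "Mary", 33},
    concepts={"EMP": {"e1", "e2"}},
    attributes={"name": {"e1": "John", "e2": "Mary"},
                "age": {"e1": 33, "e2": 33}},
)
```

In this encoding, employees are ordinary individuals of $\Delta$, just like the values "John" and 33.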

Dialects of QL
[Figure: dialects arranged by expressiveness and by semantics.]
Set semantics: conjunctive QL, positive QL, first order QL.
Bag semantics: conjunctive QL with bag semantics†, positive QL with bag semantics‡, first order QL with bag semantics.
† [Khizder et al., 1999], ‡ [Lui et al., 2002]

Conjunctive QL
$Q ::= D \text{ as } A$ (quantification)
  | $A_1 = A_2.R$ (unnest)
  | $A_1.Pf_1 = A_2.Pf_2$ (selection)
  | $\text{elim } A_1, \ldots, A_n\; Q$ (projection)
  | $\text{true}$ (null tuple)
  | $\text{from } Q_1, Q_2$ (natural join)
  | $(Q)$
$D ::= \text{THING} \mid C$ (basic description)
$Pf ::= \text{id} \mid A.Pf$ (path function)

Well Formed Queries
The attributes $\Sigma(Q)$ of a query $Q$:
$\Sigma(D \text{ as } A) \equiv \{A\}$
$\Sigma(A_1 = A_2.R) \equiv \{A_1, A_2\}$
$\Sigma(A_1.Pf_1 = A_2.Pf_2) \equiv \{A_1, A_2\}$
$\Sigma(\text{elim } A_1, \ldots, A_n\; Q) \equiv \{A_1, \ldots, A_n\}$
$\Sigma(\text{true}) \equiv \emptyset$
$\Sigma(\text{from } Q_1, Q_2) \equiv \Sigma(Q_1) \cup \Sigma(Q_2)$
Require $\{A_1, \ldots, A_n\} \subseteq \Sigma(Q)$ for projection operators.
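Here is a minimal Python sketch of the conjunctive QL abstract syntax, together with $\Sigma(Q)$ and the well-formedness condition on projections. Two choices are illustrative assumptions, not part of QL:

- the class names (`As`, `Unnest`, `Sel`, `Elim`, `TrueQ`, `From`);
- encoding a path function as a tuple of attribute names, where the empty tuple stands for id.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class As:        # D as A, where D is "THING" or a primitive concept C
    d: str
    a: str

@dataclass(frozen=True)
class Unnest:    # A1 = A2.R
    a1: str
    a2: str
    r: str

@dataclass(frozen=True)
class Sel:       # A1.Pf1 = A2.Pf2; () encodes the path function id
    a1: str
    pf1: Tuple[str, ...]
    a2: str
    pf2: Tuple[str, ...]

@dataclass(frozen=True)
class Elim:      # elim A1, ..., An Q
    attrs: Tuple[str, ...]
    q: "Query"

@dataclass(frozen=True)
class TrueQ:     # true
    pass

@dataclass(frozen=True)
class From:      # from Q1, Q2
    q1: "Query"
    q2: "Query"

Query = Union[As, Unnest, Sel, Elim, TrueQ, From]

def sig(q: Query) -> frozenset:
    """Sigma(Q), one case per equation of the definition."""
    if isinstance(q, As):
        return frozenset({q.a})
    if isinstance(q, (Unnest, Sel)):
        return frozenset({q.a1, q.a2})
    if isinstance(q, Elim):
        return frozenset(q.attrs)
    if isinstance(q, TrueQ):
        return frozenset()
    if isinstance(q, From):
        return sig(q.q1) | sig(q.q2)
    raise TypeError(f"not a conjunctive QL query: {q!r}")

def well_formed(q: Query) -> bool:
    """Every projection may only keep attributes of its argument query."""
    if isinstance(q, Elim):
        return set(q.attrs) <= sig(q.q) and well_formed(q.q)
    if isinstance(q, From):
        return well_formed(q.q1) and well_formed(q.q2)
    return True
```

For example, `Elim(("z",), As("EMP", "x"))` is rejected because z is not among the attributes of `EMP as x`.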

Tuples and Bags
A (duplicate) tuple $t$ with attribute bindings for attributes $\{A_1, \ldots, A_n\}$ over a database $\mathcal{I} = \langle \Delta, (\cdot)^{\mathcal{I}} \rangle$ has the general form $\langle A_1 : e_1, \ldots, A_n : e_n, \mathit{cnt} : i \rangle$. Here $\{e_1, \ldots, e_n\} \subseteq \Delta$, "cnt" is a distinct attribute not used in queries, and $i$ is a positive integer. A set of duplicate tuples that bind the same set of attributes is called a bag.

Operations on Tuples
$\Sigma(t) \equiv$ the set of attributes occurring in $t$, excluding cnt.
$t.\mathit{cnt} \equiv$ the integer $i$ such that "cnt : $i$" occurs in $t$.
$t.A \equiv$ the element $e \in \Delta$ such that "$A : e$" occurs in $t$; defined only when $A \in \Sigma(t)$.
$t[\{A_1, \ldots, A_n\}] \equiv \{A_1 : t.A_1, \ldots, A_n : t.A_n\}$; defined only when $\{A_1, \ldots, A_n\} \subseteq \Sigma(t)$.
$[t] \equiv t[\Sigma(t)]$

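Semantics
The meaning of a query $Q$, denoted $\llbracket Q \rrbracket$, is a function that maps databases to bags. Its behavior on a particular database $\mathcal{I} = \langle \Delta, (\cdot)^{\mathcal{I}} \rangle$ is defined as follows:
$\llbracket \text{THING as } A \rrbracket(\mathcal{I}) \equiv \{ \langle A : e, \mathit{cnt} : 1 \rangle : e \in \Delta \}$
$\llbracket C \text{ as } A \rrbracket(\mathcal{I}) \equiv \{ \langle A : e, \mathit{cnt} : 1 \rangle : e \in (C)^{\mathcal{I}} \}$
$\llbracket A_1 = A_2.R \rrbracket(\mathcal{I}) \equiv \{ \langle A_1 : e_1, A_2 : e_2, \mathit{cnt} : 1 \rangle : (e_2, e_1) \in (R)^{\mathcal{I}} \}$
$\llbracket A_1.Pf_1 = A_2.Pf_2 \rrbracket(\mathcal{I}) \equiv \{ \langle A_1 : e_1, A_2 : e_2, \mathit{cnt} : 1 \rangle : (Pf_1)^{\mathcal{I}}(e_1) = (Pf_2)^{\mathcal{I}}(e_2) \}$
where
$(\text{id})^{\mathcal{I}} \equiv \{ (e, e) : e \in \Delta \}$
$(A.Pf)^{\mathcal{I}} \equiv \{ (e_1, e_2) : (Pf)^{\mathcal{I}}((A)^{\mathcal{I}}(e_1)) = e_2 \}$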

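Semantics (cont'd)
$\llbracket \text{elim } A_1, \ldots, A_n\; Q \rrbracket(\mathcal{I}) \equiv \emptyset$ if not well formed; otherwise $\{ \langle A_1 : t.A_1, \ldots, A_n : t.A_n, \mathit{cnt} : 1 \rangle : t \in \llbracket Q \rrbracket(\mathcal{I}) \}$
$\llbracket \text{true} \rrbracket(\mathcal{I}) \equiv \{ \langle \mathit{cnt} : 1 \rangle \}$
$\llbracket \text{from } Q_1, Q_2 \rrbracket(\mathcal{I}) \equiv \{ t : \Sigma(t) = \Sigma(Q_1) \cup \Sigma(Q_2) \wedge \exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}), t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : t.\mathit{cnt} = t_1.\mathit{cnt} \times t_2.\mathit{cnt} \wedge t[\Sigma(t_1)] = [t_1] \wedge t[\Sigma(t_2)] = [t_2] \}$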
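The semantic equations can be sketched as a small Python evaluator. The sketch makes these assumptions:

- The domain is finite, so THING and selections can be enumerated.
- The input query is well formed.
- Queries are hypothetical tagged tuples, and the database is a plain dictionary with keys `domain`, `concepts`, `attributes` and `roles`.
- A bag maps each binding $[t]$ (a frozenset of attribute/value pairs) to $t.\mathit{cnt}$.

The helper names `follow`, `bind` and `evaluate` are illustrative only. All tuples of a bag bind the same attributes, so the binding of a join result determines $t_1$ and $t_2$. Counts therefore simply multiply.

```python
from itertools import product

_MISSING = object()  # marks an undefined attribute value (partial database)

def follow(db, pf, e):
    """(Pf)^I(e): apply the attributes of Pf left to right; () is id."""
    for a in pf:
        e = db["attributes"].get(a, {}).get(e, _MISSING)
        if e is _MISSING:
            return _MISSING
    return e

def bind(*pairs):
    """Build a binding [t]; None if one attribute would get two different values."""
    d = {}
    for a, e in pairs:
        if d.setdefault(a, e) != e:
            return None
    return frozenset(d.items())

def evaluate(db, q):
    """[[Q]](I) for tagged-tuple queries:
    ("as", D, A), ("unnest", A1, A2, R), ("sel", A1, pf1, A2, pf2),
    ("elim", (A1, ..., An), Q), ("true",), ("from", Q1, Q2)."""
    tag = q[0]
    if tag == "as":                                    # D as A
        _, d, a = q
        ext = db["domain"] if d == "THING" else db["concepts"].get(d, set())
        return {frozenset({(a, e)}): 1 for e in ext}
    if tag == "unnest":                                # A1 = A2.R, i.e. (e2, e1) in R
        _, a1, a2, r = q
        out = {}
        for e2, e1 in db["roles"].get(r, set()):
            b = bind((a1, e1), (a2, e2))
            if b is not None:
                out[b] = 1
        return out
    if tag == "sel":                                   # A1.Pf1 = A2.Pf2
        _, a1, pf1, a2, pf2 = q
        out = {}
        for e1, e2 in product(db["domain"], repeat=2):
            v1 = follow(db, pf1, e1)
            if v1 is not _MISSING and v1 == follow(db, pf2, e2):
                b = bind((a1, e1), (a2, e2))
                if b is not None:
                    out[b] = 1
        return out
    if tag == "elim":                                  # projection; result is a set
        _, attrs, sub = q
        return {frozenset(p for p in b if p[0] in attrs): 1 for b in evaluate(db, sub)}
    if tag == "true":
        return {frozenset(): 1}
    if tag == "from":                                  # natural join, counts multiply
        left, right = evaluate(db, q[1]), evaluate(db, q[2])
        out = {}
        for (b1, c1), (b2, c2) in product(left.items(), right.items()):
            b = bind(*b1, *b2)
            if b is not None:
                out[b] = c1 * c2
        return out
    raise ValueError(f"unknown query form: {tag}")
```

On the EMP database, `("sel", "x", (), "y", ("name",))` pairs each name with its employee, for example x = "John" with y = e1.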

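Syntactic Sugar
$A_1.A_2 \cdots A_n.\text{id} \equiv A_1.A_2 \cdots A_n$
$\text{select distinct } A_1, \ldots, A_n\; Q \equiv \text{elim } A_1, \ldots, A_n\; Q$
$\text{select * } Q \equiv Q$
$Q_1 \text{ where } Q_2 \equiv \text{from } Q_1, Q_2$
$Q_1 \text{ and } Q_2 \equiv \text{from } Q_1, Q_2$
$\text{from} \equiv \text{true}$
$\text{from } Q_1, Q_2, \ldots, Q_n \equiv \text{from } (\text{from } Q_1, \ldots, Q_{n-1}), Q_n$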
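The from-list and where/and rules amount to left-nesting binary joins. Below is a sketch in Python over a hypothetical tagged-tuple query encoding; any value can stand in for a $Q_i$. A one-element list is returned unchanged. This is semantically the same as from true, $Q_1$, because true contributes only $\langle \mathit{cnt} : 1 \rangle$.

```python
from functools import reduce

def desugar_from(*queries):
    """from Q1, ..., Qn  ==  from (from Q1, ..., Qn-1), Qn;  'from' alone == true."""
    if not queries:
        return ("true",)
    if len(queries) == 1:
        return queries[0]  # equivalent to: from true, Q1
    return reduce(lambda acc, q: ("from", acc, q), queries[1:], queries[0])

def where(q1, q2):
    """Q1 where Q2 (and likewise Q1 and Q2) is just from Q1, Q2."""
    return ("from", q1, q2)
```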

Examples
The names of employees who have the same age as another employee with a given name:
    select distinct :p, name
    from EMP as e,
         (select distinct :p, e1
          from EMP as e1, EMP as e2
          where e1.age = e2.age and e2.name = :p)
    where e.name = name and e.id = e1.id

Method Calls (more syntactic sugar)
$A_1.Pf_1.C(A_2.Pf_2, \ldots, A_{n-1}.Pf_{n-1}) = A_n.Pf_n \equiv \text{select distinct } A_1, \ldots, A_n \text{ from } C \text{ as } A \text{ where } A.1 = A_1.Pf_1 \text{ and} \ldots \text{and } A.n = A_n.Pf_n$
$A_1.Pf_1.C(A_2.Pf_2, \ldots, A_{n-1}.Pf_{n-1}) \text{ as } A_n \equiv A_1.Pf_1.C(A_2.Pf_2, \ldots, A_{n-1}.Pf_{n-1}) = A_n.\text{id}$

Examples (cont'd)
The names of employees who have an age double that of another employee:
    select distinct name
    from EMP as e,
         (select distinct e1
          from EMP as e1, EMP as e2
          where e2.age.+(e2.age) = e1.age)
    where e.name = name and e.id = e1.id

Conjunctive Datalog (more syntactic sugar)
$C(A_1, \ldots, A_n) \equiv \text{select distinct } A_1, \ldots, A_n \text{ from } C \text{ as } A \text{ where } A.1 = A_1.\text{id} \text{ and} \ldots \text{and } A.n = A_n.\text{id}$
$(A_1, \ldots, A_m) \text{ :- } Q_1, \ldots, Q_n. \equiv \text{select distinct } A_1, \ldots, A_m \text{ from } Q_1, \ldots, Q_n$
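The Datalog sugar can be sketched as a purely textual rewrite in Python. The function names are assumptions of this sketch. So are the fresh attribute names `_a0`, `_a1`, …, one per body atom, which keep distinct atoms from sharing the quantified attribute $A$. Atoms are assumed to have at least one argument.

```python
def atom_to_ql(pred, args, fresh):
    """C(A1, ..., An) -> select distinct A1, ..., An from C as A where A.1 = A1.id and ..."""
    conds = " and ".join(f"{fresh}.{i} = {a}.id" for i, a in enumerate(args, start=1))
    return f"(select distinct {', '.join(args)} from {pred} as {fresh} where {conds})"

def rule_to_ql(head, body):
    """(A1, ..., Am) :- Q1, ..., Qn.  ->  select distinct A1, ..., Am from Q1, ..., Qn"""
    parts = [atom_to_ql(pred, args, f"_a{k}") for k, (pred, args) in enumerate(body)]
    return f"select distinct {', '.join(head)} from {', '.join(parts)}"
```

For example, `rule_to_ql(["x"], [("R", ["x", "y"])])` yields `select distinct x from (select distinct x, y from R as _a0 where _a0.1 = x.id and _a0.2 = y.id)`.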

Positive QL
$Q ::= \text{empty } A_1, \ldots, A_n$ (empty set)
  | $Q_1 \text{ union } Q_2$ (union)
$\Sigma(\text{empty } A_1, \ldots, A_n) \equiv \{A_1, \ldots, A_n\}$
$\Sigma(Q_1 \text{ union } Q_2) \equiv \Sigma(Q_1)$
Require $\Sigma(Q_1) = \Sigma(Q_2)$ in union operations.

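Semantics
$\llbracket \text{empty } A_1, \ldots, A_n \rrbracket(\mathcal{I}) \equiv \emptyset$
$\llbracket Q_1 \text{ union } Q_2 \rrbracket(\mathcal{I}) \equiv \{ t : t.\mathit{cnt} = 1 \wedge \Sigma(t) = \Sigma(Q_1) \wedge \Sigma(t) = \Sigma(Q_2) \wedge ( (\exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}) : [t] = [t_1] \wedge \neg\exists t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_2]) \vee (\exists t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_2] \wedge \neg\exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}) : [t] = [t_1]) \vee (\exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}), t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_1] \wedge [t] = [t_2]) ) \}$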

First Order QL
$Q ::= Q_1 \text{ minus } Q_2$ (difference)
$\Sigma(Q_1 \text{ minus } Q_2) \equiv \Sigma(Q_1)$
Require $\Sigma(Q_1) = \Sigma(Q_2)$ in difference operations.

Semantics
$\llbracket Q_1 \text{ minus } Q_2 \rrbracket(\mathcal{I}) \equiv \{ t : t.\mathit{cnt} = 1 \wedge \Sigma(t) = \Sigma(Q_1) \wedge \Sigma(t) = \Sigma(Q_2) \wedge (\exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}) : [t] = [t_1]) \wedge (\neg\exists t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_2]) \}$
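Under set semantics, the three disjuncts for union collapse to "the binding occurs in $Q_1$ or in $Q_2$". Difference keeps the bindings of $Q_1$ that have no match in $Q_2$. The Python sketch below assumes a hypothetical representation of bags as dictionaries mapping a binding $[t]$ (a frozenset of attribute/value pairs) to $t.\mathit{cnt}$. It also assumes both inputs share one attribute set, as well-formedness requires.

```python
def empty():
    """empty A1, ..., An: the empty bag (the attribute set lives in the query)."""
    return {}

def set_union(b1, b2):
    """Q1 union Q2: a binding in Q1 only, in Q2 only, or in both; cnt is always 1."""
    return {b: 1 for b in b1.keys() | b2.keys()}

def set_minus(b1, b2):
    """Q1 minus Q2: bindings of Q1 with no matching binding in Q2; cnt is always 1."""
    return {b: 1 for b in b1 if b not in b2}
```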

QL with Duplicates
$Q ::= \text{select } A_1, \ldots, A_n\; Q$ (duplicate preserving projection)
  | $Q_1 \text{ union all } Q_2$ (bag union)
  | $Q_1 \text{ minus all } Q_2$ (bag difference)

Well Formed Queries (cont'd)
$\Sigma(\text{select } A_1, \ldots, A_n\; Q) \equiv \{A_1, \ldots, A_n\}$
$\Sigma(Q_1 \text{ union all } Q_2) \equiv \Sigma(Q_1)$
$\Sigma(Q_1 \text{ minus all } Q_2) \equiv \Sigma(Q_1)$
Require $\Sigma(Q_1) = \Sigma(Q_2)$ in bag union and bag difference operations, and $\{A_1, \ldots, A_n\} \subseteq \Sigma(Q)$ in (duplicate preserving) projection operations.

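Semantics
$\llbracket \text{select } A_1, \ldots, A_n\; Q \rrbracket(\mathcal{I}) \equiv \emptyset$ if the operation is not well formed or not representable†; otherwise
$\{ \langle A_1 : t_1.A_1, \ldots, A_n : t_1.A_n, \mathit{cnt} : k \rangle : t_1 \in \llbracket Q \rrbracket(\mathcal{I}) \wedge k = \sum_{t_2 \in \llbracket Q \rrbracket(\mathcal{I}) \,:\, t_2[\{A_1, \ldots, A_n\}] = t_1[\{A_1, \ldots, A_n\}]} t_2.\mathit{cnt} \}$
† A duplicate preserving projection is representable on database $\mathcal{I}$ iff, for every $t_1 \in \llbracket Q \rrbracket(\mathcal{I})$, $|\{ t_2 \in \llbracket Q \rrbracket(\mathcal{I}) : t_2[\{A_1, \ldots, A_n\}] = t_1[\{A_1, \ldots, A_n\}] \}|$ is finite.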
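A Python sketch of the duplicate preserving projection appears below. Each output count $k$ is the sum of the counts of all input tuples that agree on $\{A_1, \ldots, A_n\}$. It assumes the hypothetical bag representation of a dictionary from binding $[t]$ to $t.\mathit{cnt}$. A finite dictionary is always representable, so the empty-result case for non-representable inputs cannot arise here.

```python
from collections import Counter

def select_project(attrs, bag):
    """select A1, ..., An Q: project every binding onto attrs and sum the counts."""
    keep = set(attrs)
    out = Counter()
    for binding, cnt in bag.items():
        out[frozenset((a, v) for a, v in binding if a in keep)] += cnt
    return dict(out)
```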

Example
A duplicate preserving projection operation that is not representable in any database with an infinite domain:
    select e1 from THING as e1, THING as e2
Each binding of e1 agrees with one tuple for every element of $\Delta$ bound to e2, so the required count is infinite.
Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.

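Semantics (cont'd)
$\llbracket Q_1 \text{ union all } Q_2 \rrbracket(\mathcal{I}) \equiv \emptyset$ if not well formed; otherwise
$\{ t \in \llbracket Q_1 \rrbracket(\mathcal{I}) : \neg\exists t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_2] \} \cup \{ t \in \llbracket Q_2 \rrbracket(\mathcal{I}) : \neg\exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}) : [t] = [t_1] \} \cup \{ t : \exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}), t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_1] \wedge [t] = [t_2] \wedge t.\mathit{cnt} = t_1.\mathit{cnt} + t_2.\mathit{cnt} \}$
$\llbracket Q_1 \text{ minus all } Q_2 \rrbracket(\mathcal{I}) \equiv \emptyset$ if not well formed; otherwise
$\{ t \in \llbracket Q_1 \rrbracket(\mathcal{I}) : \neg\exists t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_2] \} \cup \{ t : \exists t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I}), t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I}) : [t] = [t_1] \wedge [t] = [t_2] \wedge t.\mathit{cnt} = t_1.\mathit{cnt} - t_2.\mathit{cnt} \wedge t_1.\mathit{cnt} > t_2.\mathit{cnt} \}$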
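With bags represented as dictionaries from binding $[t]$ to $t.\mathit{cnt}$ (a hypothetical encoding), Python's `collections.Counter` arithmetic matches both definitions:

- `+` adds the counts of matching bindings and passes unmatched ones through.
- `-` keeps a binding only when its resulting count is positive, which is exactly the condition $t_1.\mathit{cnt} > t_2.\mathit{cnt}$. A binding with no match in $Q_2$ keeps its count.

```python
from collections import Counter

def union_all(b1, b2):
    """Q1 union all Q2: counts of matching bindings add."""
    return dict(Counter(b1) + Counter(b2))

def minus_all(b1, b2):
    """Q1 minus all Q2: t.cnt = t1.cnt - t2.cnt, kept only when positive."""
    return dict(Counter(b1) - Counter(b2))
```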

Summary
Conjunctive, set semantics: as, =, elim, true, from
Conjunctive, bag semantics: as, =, elim, true, from, select
Positive, set semantics: as, =, elim, true, from, empty, union
Positive, bag semantics: as, =, elim, true, from, select, empty, union all
First order, set semantics: as, =, elim, true, from, empty, union, minus
First order, bag semantics: as, =, elim, true, from, select, empty, union all, minus all

Query Contexts
An expression $Q[\,]$ in the language QL enriched by an additional terminal symbol $[\,]$ is called a query context. For a query context $Q_1[\,]$ and a query $Q_2 \in$ QL, the expression $Q_1[Q_2]$ denotes the syntactic substitution of $Q_2$ for $[\,]$. $Q_2$ is compatible with $Q_1$ if $Q_1[Q_2] \in$ QL. For example, $Q_2$ is compatible with $Q_1$ in the following:
    Q2: EMP as e where e.name = :p
    Q1: select distinct :p, d from DEPT as d, [] where d = e.dept

The Query Equivalence Problem
$Q_1$ is equivalent to $Q_2$ for database $\mathcal{I}$, written $\mathcal{I} \models (Q_1 \equiv Q_2)$, if $\llbracket Q_1 \rrbracket(\mathcal{I}) = \llbracket Q_2 \rrbracket(\mathcal{I})$. A query equivalence dependency $E$ has the form $(Q_1 \equiv Q_2)$. $E = (Q_1 \equiv Q_2)$ is an axiom if, for any database $\mathcal{I}$, $\mathcal{I} \models (Q_1 \equiv Q_2)$. A query equivalence problem for a given set of query equivalence dependencies is to determine if a given member of the set is an axiom.

Some Axioms
Question: Is every $E$ of the following form an axiom?
$(\text{elim } A_1, \ldots, A_m\; Q_1)[\text{elim } B_1, \ldots, B_n\; Q_2] \equiv \text{elim } A_1, \ldots, A_m\; Q_1[Q_2]$
Answer: No. However, any such $E$ is an axiom if no attribute in $\Sigma(Q_2) - \{B_1, \ldots, B_n\}$ occurs in the query context $(\text{elim } A_1, \ldots, A_m\; Q_1[\,])$.

Excluding variable reuse in QL
$Q$ has an occurrence of variable reuse if there are a query context $Q_1[\,]$ and a query $Q_3$ that satisfy both of the following:
- $Q_3$ has the form $\text{elim } A_1, \ldots, A_n\; Q_2$ or the form $\text{select } A_1, \ldots, A_n\; Q_2$, and $Q = Q_1[Q_3]$;
- some attribute $A$ in $\Sigma(Q_2) - \{A_1, \ldots, A_n\}$ also occurs in $Q_1[\,]$.
Observation: For any query $Q_1$, there exists an equivalent query $Q_2$ that has no occurrence of variable reuse.

The Query Containment Problem
$Q_1$ is contained in $Q_2$ for database $\mathcal{I}$, written $\mathcal{I} \models (Q_1 \sqsubseteq Q_2)$, if, for any tuple $t_1 \in \llbracket Q_1 \rrbracket(\mathcal{I})$, there exists $t_2 \in \llbracket Q_2 \rrbracket(\mathcal{I})$ such that $[t_1] = [t_2]$ and $t_1.\mathit{cnt} \leq t_2.\mathit{cnt}$. A query containment dependency $C$ has the form $(Q_1 \sqsubseteq Q_2)$. $C = (Q_1 \sqsubseteq Q_2)$ is an axiom if, for any database $\mathcal{I}$, $\mathcal{I} \models (Q_1 \sqsubseteq Q_2)$. A query containment problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom.

Equivalence and Containment
Observation: Equivalence reduces to containment: $Q_1 \equiv Q_2$ iff $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
Observation: Containment reduces to equivalence in first order QL: $Q_1 \sqsubseteq Q_2$ iff $(Q_1 \text{ minus all } Q_2) \equiv \text{empty } \Sigma(Q_1)$.
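The definitions and both reductions can be spot-checked on the answers of two queries over one fixed database. Bags are again assumed to be dictionaries from binding $[t]$ to $t.\mathit{cnt}$, and the function names are hypothetical. This Python sketch only tests $\mathcal{I} \models (Q_1 \sqsubseteq Q_2)$ for a single $\mathcal{I}$. Deciding whether a dependency is an axiom needs an argument over all databases.

```python
from collections import Counter

def contained(b1, b2):
    """I |= (Q1 ⊑ Q2): each [t1] occurs in b2 with t1.cnt <= t2.cnt."""
    return all(b2.get(b, 0) >= c for b, c in b1.items())

def equivalent(b1, b2):
    """I |= (Q1 ≡ Q2) via mutual containment."""
    return contained(b1, b2) and contained(b2, b1)

def contained_via_minus_all(b1, b2):
    """Q1 ⊑ Q2 iff (Q1 minus all Q2) is empty; Counter '-' drops non-positive counts."""
    return not (Counter(b1) - Counter(b2))
```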

Some Complexity Results
Theorem: The query equivalence and containment problems for conjunctive QL are NP-complete.†
† Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.

A Decision Procedure
Theorem: The following procedure decides whether $C = (Q_1 \sqsubseteq Q_2)$ is an axiom for conjunctive QL.†
1. Freeze the body of $Q_1$ by creating a partial database $\mathcal{I}$ consisting of individuals that include its variables.
2. If the tuple $\langle A_1 : A_1, \ldots, A_n : A_n, \mathit{cnt} : 1 \rangle$ occurs in $\llbracket Q_2 \rrbracket(\mathcal{I})$, where $\Sigma(Q_1) = \{A_1, \ldots, A_n\}$, then return true; otherwise return false.‡
† Derived from [Ullman, 1999]. ‡ Use forced semantics for selection operations.
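The two steps can be sketched in Python for queries written in the conjunctive Datalog notation. This is the classic canonical-database test under set semantics. The sketch makes these assumptions:

- A query is a pair (head variables, body atoms).
- Every head variable occurs in the body.
- There are no constants, and no attribute paths that would need forced semantics.

Step 1 turns each body variable of $Q_1$ into an individual. Step 2 searches by backtracking for an assignment of $Q_2$'s variables that produces $Q_1$'s frozen head. The names `freeze`, `has_answer` and `contained` are hypothetical.

```python
def freeze(body):
    """Step 1: the frozen database of a body; each variable becomes an individual."""
    db = {}
    for pred, args in body:
        db.setdefault(pred, set()).add(tuple(args))
    return db

def has_answer(db, head, body, target):
    """Step 2: is `target` an answer of (head :- body) over db? Backtracking search."""
    def extend(i, h):
        if i == len(body):
            return tuple(h[v] for v in head) == target
        pred, args = body[i]
        for fact in db.get(pred, ()):
            h2 = dict(h)
            if len(fact) == len(args) and all(h2.setdefault(v, c) == c for v, c in zip(args, fact)):
                if extend(i + 1, h2):
                    return True
        return False
    return extend(0, {})

def contained(q1, q2):
    """Set-semantics test of Q1 ⊑ Q2 for (head variables, body atoms) queries."""
    (head1, body1), (head2, body2) = q1, q2
    return has_answer(freeze(body1), head2, body2, tuple(head1))
```

For example, q(x) :- R(x, y), R(y, z) is contained in q(x) :- R(x, y), but not conversely. The search is exponential in the worst case, consistent with the NP-completeness result above.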

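Obtaining a Partial Database from Q
[Figure: how each part of $Q$ contributes to the partial database. THING as $A$ yields a node $A$; $C$ as $A$ yields a node $A : \{C\}$; $A_1 = A_2.R$ yields an $R$ arc from $A_2$ to $A_1$; a selection $A_1.A_2 \cdots A_m = B_1.B_2 \cdots B_n$ yields two paths labelled $A_1, A_2, \ldots, A_m$ and $B_1, B_2, \ldots, B_n$.]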

Deriving Partial Databases (cont'd)
[Figure: rewrite rules on partial databases involving nodes $u : L_1$, $v : L_2$ or $v : L_3$, $w : L$ or $w : L_2$, and $x : L_4$, connected by $A$ arcs.]

Deriving Partial Databases (cont'd)
[Figure: label-merging rules. Nodes $n_1 : L_1$ and $n_2 : L_2$ become $n_1 : L_1 \cup L_2$ and $n_2 : L_1 \cup L_2$; a second rule involves nodes $n_1 : L_1$, $n_2 : L_2$ and $n_3 : L_3$.]

Evaluating Selections on Partial Databases
Selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. Two nodes $n_1$ and $n_2$ satisfy a selection condition iff both of the following hold:
- the condition has the form $n_1.Pf_1.Pf = n_2.Pf_2.Pf$;
- $(Pf_1)^{\mathcal{I}}(n_1)$ and $(Pf_2)^{\mathcal{I}}(n_2)$ are defined and lead to nodes connected by an equality arc.

Some Complexity Results (cont'd)
Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete.
Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time.
Example:† In conjunctive QL with bag semantics, the query containment dependency $Q_1 \sqsubseteq Q_2$ is an axiom, where $Q_1$ and $Q_2$ have the respective definitions
    Q1: select x, z from P as x, R as z where x = u.Q and z = v.Q
    Q2: select x, z from P as x, R as z where y = u.Q and y = v.Q
† [Chaudhuri and Vardi, 1993]
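To see the example's multiplicities at work, the Python sketch below brute-forces both counts for every binding $(x, z)$ on small random databases and checks $t_1.\mathit{cnt} \leq t_2.\mathit{cnt}$. By the unnest semantics, $x = u.Q$ means $(u, x) \in (Q)^{\mathcal{I}}$. So the count for $Q_1$ is $\deg(x)\deg(z)$, where $\deg(y) = |\{u : (u, y) \in (Q)^{\mathcal{I}}\}|$, and the count for $Q_2$ is $\sum_y \deg(y)^2$. The names `counts` and `random_check`, and the random-instance parameters, are assumptions of this sketch. It is a spot check, not a proof.

```python
import random
from itertools import product

def counts(P, R, Q, dom):
    """(cnt in Q1, cnt in Q2) for each binding (x, z) with x in P and z in R."""
    out = {}
    for x, z in product(P, R):
        c1 = sum(1 for u, v in product(dom, repeat=2) if (u, x) in Q and (v, z) in Q)
        c2 = sum(1 for u, v, y in product(dom, repeat=3) if (u, y) in Q and (v, y) in Q)
        out[(x, z)] = (c1, c2)
    return out

def random_check(trials=200, size=4, seed=0):
    """True iff Q1's count never exceeds Q2's on the sampled databases."""
    rng = random.Random(seed)
    dom = range(size)
    for _ in range(trials):
        P = {e for e in dom if rng.random() < 0.5}
        R = {e for e in dom if rng.random() < 0.5}
        Q = {(a, b) for a, b in product(dom, repeat=2) if rng.random() < 0.4}
        if any(c1 > c2 for c1, c2 in counts(P, R, Q, dom).values()):
            return False
    return True
```

The check always passes because $\deg(x)\deg(z) \leq \deg(x)^2 + \deg(z)^2 \leq \sum_y \deg(y)^2$.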

The Query Membership Problem
A database schema, denoted $\mathcal{T}$, consists of a finite set $\{C_1, \ldots, C_n\}$ of query containment dependencies. $C$ is an axiom relative to database schema $\mathcal{T} = \{C_1, \ldots, C_n\}$, written $\mathcal{T} \models C$, if, for any database $\mathcal{I}$, $\mathcal{I} \models C$ whenever $\mathcal{I} \models C_i$ for each $i$. A query membership problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom relative to a given database schema also consisting of members of the set.

More Complexity Results
Theorem: The query membership problem for conjunctive QL is undecidable.
Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL.
Proof: Assignment.

More on QL
- Defining database schema
- Expressing access plans

Modeling Generalization Taxonomies
Consider a simple object-oriented schema language consisting of sentences of the following form:†
    class C {A1 : ref C1, …, Am : ref Cm} [isa C1, …, Cn];
Assignment: Encode a fixed collection of such sentences as a database schema in conjunctive QL. Your encoding should be as compact as possible and should enable the following questions to be expressed as query containment dependencies over your schema:
1. Is C a defined class?
2. Is attribute A defined on class C?
3. Can an object reside in both class C1 and class C2?
† Assume that any object in a database was created with respect to a single class.

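Modeling Pipelined Query Access Plans
Each plan operator is listed with its syntax and the attribute set defined for it, written $\mathit{Out}(Q)$ here:
(parameter) $Q ::= (\text{PARAM as } A)$; $\mathit{Out} = \{A\}$
(index scan) | $(\text{from } C \text{ as } A, A.1 = B_1, \ldots, A.n = B_n)$; $\mathit{Out} = \{A\}$
(nested loops) | $(\text{from } Q_1, Q_2)$; $\mathit{Out} = \mathit{Out}(Q_1) \cup \mathit{Out}(Q_2)$
(noop) | $(\text{select } A_1, \ldots, A_n\; Q)$; $\mathit{Out} = \mathit{Out}(Q) \cap \{A_1, \ldots, A_n\}$
(record field access) | $(A_1 = A_2.B)$; $\mathit{Out} = \{A_1\}$
(comparison) | $(A_1 = A_2)$; $\mathit{Out} = \emptyset$
(catenation) | $(Q_1 \text{ union all } Q_2)$; $\mathit{Out} = \mathit{Out}(Q_1) \cap \mathit{Out}(Q_2)$
(cut) | $(\text{elim } A_1, \ldots, A_n\; Q)$; $\mathit{Out} = \emptyset$
| …
Require: 1. $(\Sigma(Q_2) - \mathit{Out}(Q_2)) \subseteq \mathit{Out}(Q_1)$ for nested loops, and 2. $\Sigma(Q) = \mathit{Out}(Q)$ for top-level queries.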

Alternative Semantics
Richer model theories are required for 1. sort operations and 2. named cuts.