CSE 544 Theory of Query Languages Tuesday, February 22nd, 2011 Dan Suciu -- 544, Winter 2011


1 CSE 544 Theory of Query Languages Tuesday, February 22nd, 2011 Dan Suciu, Winter 2011

2 Outline Conjunctive queries; containment Datalog Query complexity

3 Conjunctive Queries A subset of Relational Calculus (=FO) Correspond to SELECT-DISTINCT-FROM-WHERE Most queries in practice are conjunctive Some optimizers handle only conjunctive queries - break larger queries into many CQs CQ’s have more positive theoretical properties than arbitrary queries

4 Conjunctive Queries Definition A conjunctive query is defined by: φ ::= R(t1,...,tk) | ti = tj | φ ∧ φ’ | ∃x.φ (missing are ∀, ∨, ¬)

5 Conjunctive Queries, CQ
Example of CQ:
q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))
q(x) = ∃z.∃u.(R(x,z) ∧ R(z,u) ∧ R(u,y))
Examples of non-CQ:
q(x,y) = ∀z.(R(x,z) → R(y,z))
q(x) = T(x) ∨ ∃z.S(x,z)

6 Conjunctive Queries Any CQ query can be written as:
q(x1,...,xn) = ∃y1.∃y2...∃yp.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
(i.e. all quantifiers are at the beginning)
Same in Datalog notation (a Datalog rule):
q(x1,...,xn) :- R1(t11,...,t1m), ..., Rk(tk1,...,tkm)
Here q(x1,...,xn) is the head and the atoms to the right of :- are the body

7 Examples Employee(x), ManagedBy(x,y), Manager(y) Find all employees having the same manager as “Smith”: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

8 Examples Employee(x), ManagedBy(x,y), Manager(y) Find all employees having the same director as Smith: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z) CQs are useful in practice

9 CQ and SQL
CQ: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
SQL: select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name='Smith' AND m1.manager=m2.manager
Notice “distinct”
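A minimal sketch of how the CQ above could be evaluated over a set of tuples. The ManagedBy instance and the function name are hypothetical, made up for illustration only. The nested comprehension mirrors the two subgoals joined on the shared variable y, and returning a Python set plays the role of DISTINCT.

```python
# Hypothetical instance of ManagedBy(employee, manager); not from the lecture.
managed_by = {
    ("Smith", "Jones"),
    ("Lee", "Jones"),
    ("Kim", "Patel"),
}

def same_manager_as(name, managed_by):
    """Evaluate A(x) :- ManagedBy(name, y), ManagedBy(x, y)."""
    return {
        x
        for (n, y) in managed_by       # first subgoal: ManagedBy(name, y)
        if n == name
        for (x, y2) in managed_by      # second subgoal: ManagedBy(x, y)
        if y2 == y                     # the shared variable y acts as the join
    }

answer = same_manager_as("Smith", managed_by)
print(sorted(answer))
```

Note that "Smith" is part of the answer: the rule does not require x to differ from "Smith".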

10 CQ and SQL Are CQ queries precisely the SELECT- DISTINCT-FROM-WHERE queries ?

11 CQ and SQL Are CQ queries precisely the SELECT-DISTINCT-FROM-WHERE queries ? No: CQ queries do not allow <, ≤, ≠. But we can extend CQ with inequality predicates, and usually write the extended language as CQ<, etc.

12 CQ and RA Relational Algebra: CQ correspond precisely to σC, ΠA, × (missing: ∪, –) where C has only =
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
Π$2.name(σname=“Smith”(ManagedBy) ⋈$1.manager=$2.manager ManagedBy)

13 Extensions of CQ CQ≠ Find managers that manage at least 2 employees: A(y) :- ManagedBy(x,y), ManagedBy(z,y), x ≠ z

14 Extensions of CQ CQ< Find employees earning more than their manager: A(x) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u>v

15 Extensions of CQ CQ¬: negation applied only to one atom. Find people sharing the same office with Alice, but not the same manager: A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ¬ManagedBy(y,x)

16 Extensions of CQ UCQ: union of conjunctive queries
Datalog:
A(name) :- Employee(name, dept, age, salary), age > 50
A(name) :- RetiredEmployee(name, address)
Datalog notation is very convenient for expressing unions (no need for ∨)

17 Summary of Extensions of CQ: CQ, CQ≠, CQ<, UCQ, CQ¬. Which of these classes contain only monotone queries ?

18 Query Equivalence and Containment Justified by optimization needs Intensively studied since 1977

19 Query Equivalence Queries q1 and q2 are equivalent if for every database D, q1(D) = q2(D). Notation: q1 ≡ q2

20 Query Containment Query q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D). q1 and q2 are equivalent if for every database D, q1(D) = q2(D). Notation: q1 ⊆ q2, q1 ≡ q2. Obviously: q1 ⊆ q2 and q2 ⊆ q1 iff q1 ≡ q2. Conversely: q1 ∧ q2 ≡ q1 iff q1 ⊆ q2. We will study the containment problem only.

21 Examples of Query Containments Is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,v), R(v,w)
q2(x) :- R(x,u), R(u,v)

22 Examples of Query Containments Is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,v), R(v,x)
q2(x) :- R(x,u), R(u,x)

23 Examples of Query Containments Is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,u)
q2(x) :- R(x,u), R(u,v), R(v,w)

24 Examples of Query Containments Is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,“Smith”)
q2(x) :- R(x,u), R(u,v)

25 Query Containment Theorem Query containment for FO is undecidable Theorem Query containment for CQ is decidable and NP-complete.

26 Trakhtenbrot’s Theorem
Definition A sentence φ is called finitely satisfiable if there exists a finite database instance D s.t. D |= φ
Theorem The following problem is undecidable: Given FO sentence φ, check if φ is finitely satisfiable
Satisfiable:  x.  y.  z.(R(x,z)  R(y,z))  x.  y.T(x)   z.S(x,z)
Unsatisfiable:  x.  y.  z.(R(x,y) ∧ R(x,z)  y=z) ∧  y.  x. not R(x,y)

27 Query Containment Theorem Query containment for FO is undecidable Proof: By reduction from the finite satisfiability problem: Given a sentence φ, define two queries: q1(x) = R(x), and q2(x) = R(x) ∧ ¬φ, where R is a relation symbol not occurring in φ. Then q1 ⊆ q2 iff φ is not finitely satisfiable

28 Query Containment Algorithm How to check q1 ⊆ q2
Canonical database for q1 is: D_q1 = (D, R1^D, …, Rk^D)
–D = all variables and constants in q1
–R1^D, …, Rk^D = the body of q1
Canonical tuple for q1 is: t_q1 (the head of q1)

29 Examples of Canonical Databases
q1(x,y) :- R(x,u),R(v,u),R(v,y)
Canonical database: D_q1 = (D, R^D)
–D = {x,y,u,v}
–R^D = {(x,u), (v,u), (v,y)}
Canonical tuple: t_q1 = (x,y)

30 Examples of Canonical Databases
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
D_q1 = (D, R)
–D = {x,u,“Smith”,“Fred”}
–R = {(x,u), (u,“Smith”), (u,“Fred”), (u,u)}
t_q1 = (x)

31 Checking Containment
Theorem: q1 ⊆ q2 iff t_q1 ∈ q2(D_q1).
Example:
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
D = {x,y,u,v}, R = {(x,u), (v,u), (v,y)}, t_q1 = (x,y)
Yes, q1 ⊆ q2

32 Query Homomorphisms A homomorphism f : q2 → q1 is a function f: var(q2) → var(q1) ∪ const(q1) such that:
–f(body(q2)) ⊆ body(q1)
–f(t_q2) = t_q1
The Homomorphism Theorem q1 ⊆ q2 iff there exists a homomorphism f : q2 → q1
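The Homomorphism Theorem suggests a direct, brute-force containment test: enumerate every mapping from the variables of q2 to the terms of q1 and check the two conditions. The sketch below is one possible encoding, not the lecture's own code. It assumes a query is a pair (head, body), that atoms are (relation, args) tuples, and that constants are written with double quotes; all function names are hypothetical. The enumeration is exponential in the number of variables of q2, which is consistent with the problem being NP-complete.

```python
from itertools import product

def is_const(term):
    # Assumed convention for this sketch: constants are written with quotes.
    return term.startswith('"')

def all_terms(query):
    head, body = query
    return set(head) | {t for _, args in body for t in args}

def find_homomorphism(q2, q1):
    """Return a homomorphism f: q2 -> q1 as a dict, or None if none exists."""
    head1, body1 = q1
    head2, body2 = q2
    variables = sorted(t for t in all_terms(q2) if not is_const(t))
    targets = sorted(all_terms(q1))            # var(q1) plus const(q1)
    atoms1 = {(rel, tuple(args)) for rel, args in body1}
    for image in product(targets, repeat=len(variables)):
        f = dict(zip(variables, image))
        # Constants are not keys of f, so f.get(t, t) maps them to themselves.
        if tuple(f.get(t, t) for t in head2) != tuple(head1):
            continue                           # head of q2 must map to head of q1
        if all((rel, tuple(f.get(t, t) for t in args)) in atoms1
               for rel, args in body2):        # f(body(q2)) is inside body(q1)
            return f
    return None

def contained_in(q1, q2):
    """q1 is contained in q2 iff a homomorphism q2 -> q1 exists."""
    return find_homomorphism(q2, q1) is not None

# The two queries from the "Checking Containment" slide.
q1 = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "y"))])
q2 = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "w")),
                   ("R", ("t", "w")), ("R", ("t", "y"))])

print(contained_in(q1, q2))   # q1 contained in q2?
print(contained_in(q2, q1))   # and the other direction?
```

On these two queries the first call finds a homomorphism and the second does not, so the containment is strict.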

33 Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
One homomorphism f : q2 → q1 is x → x, u → u, v → v, w → u, t → v, y → y
Therefore q1 ⊆ q2

34 Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, “Smith”, “Fred”}
var(q2) = {x,u,v,w}
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,“Smith”), R(w,u)
One homomorphism f : q2 → q1 is x → x, u → u, v → u, w → x
Therefore q1 ⊆ q2

35 The Complexity Theorem Checking containment of two CQ queries is NP-complete

36 Containment for extensions of CQ
CQ -- NP complete
CQ≠ -- ??
CQ< -- ??
UCQ -- ??
CQ¬ -- ??
Relational Calculus -- undecidable

37 Query Containment for UCQ
q1 ∪ q2 ∪ q3 ∪ ... ⊆ q1’ ∪ q2’ ∪ q3’ ∪ ...
Notice: q1 ∪ q2 ∪ q3 ∪ ... ⊆ q iff q1 ⊆ q and q2 ⊆ q and q3 ⊆ q and ...
Theorem For a conjunctive query q: q ⊆ q1’ ∪ q2’ ∪ q3’ ∪ ... iff there exists some k such that q ⊆ qk’
It follows that containment for UCQ is decidable, NP-complete.

38 Query Containment for CQ<
q1() :- R(x,y), R(y,x)
q2() :- R(x,y), x <= y
q1 ⊆ q2 although there is no homomorphism !
To check containment do this:
-Consider all possible orderings of variables in q1
-For each of them check containment of q1 in q2
-If all hold, then q1 ⊆ q2
Still decidable, but harder than NP: now in Π^p_2

39 Query Minimization Definition A conjunctive query q is minimal if for every other conjunctive query q’ s.t. q ≡ q’, q’ has at least as many predicates (‘subgoals’) as q
Are these queries minimal ?
q(x) :- R(x,y), R(y,z), R(x,x)
q(x) :- R(x,y), R(y,z), R(x,’Alice’)

40 Query Minimization Query minimization algorithm
Choose a subgoal g of q
Remove g: let q’ be the new query
We already know q ⊆ q’ (why ?)
If q’ ⊆ q then permanently remove g
Notice: the order in which we inspect subgoals doesn’t matter
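The minimization loop above can be sketched as follows. This is an illustrative implementation under the same assumptions as before, not the lecture's own code: a query is a (head, body) pair, constants are quoted strings, and the test q’ ⊆ q is done by a brute-force search for a homomorphism from q to q’. All names are hypothetical.

```python
from itertools import product

def is_const(term):
    # Assumed convention for this sketch: constants are written with quotes.
    return term.startswith('"')

def all_terms(query):
    head, body = query
    return set(head) | {t for _, args in body for t in args}

def has_homomorphism(src, dst):
    """True iff some homomorphism src -> dst exists (so dst is contained in src)."""
    (src_head, src_body), (dst_head, dst_body) = src, dst
    variables = sorted(t for t in all_terms(src) if not is_const(t))
    targets = sorted(all_terms(dst))
    dst_atoms = {(rel, tuple(args)) for rel, args in dst_body}
    for image in product(targets, repeat=len(variables)):
        f = dict(zip(variables, image))
        if tuple(f.get(t, t) for t in src_head) != tuple(dst_head):
            continue
        if all((rel, tuple(f.get(t, t) for t in args)) in dst_atoms
               for rel, args in src_body):
            return True
    return False

def minimize(query):
    """Drop subgoals one at a time while the smaller query stays equivalent."""
    head, body = query
    body = list(body)
    i = 0
    while i < len(body):
        candidate = body[:i] + body[i + 1:]    # q' = q without subgoal g
        # q is always contained in q'; keep the removal only if q' is in q.
        if has_homomorphism((head, body), (head, candidate)):
            body = candidate
        else:
            i += 1
    return head, body

# The two queries from the "Are these queries minimal ?" slide.
qa = (("x",), [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("x", "x"))])
qb = (("x",), [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("x", '"Alice"'))])

print(minimize(qa))
print(minimize(qb))
```

For the first query the loop keeps only R(x,x), since mapping y and z to x sends every subgoal onto it. The second query comes back unchanged, so it is already minimal.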

41 Query Minimization In Practice No database system today performs minimization !!! Reason: –It’s hard (NP-complete) –Users don’t write non-minimal queries However, non-minimal queries arise when using views intensively

42 Query Minimization for Views
CREATE VIEW HappyBoaters AS
SELECT DISTINCT E1.name, E1.manager
FROM Employee E1, Employee E2
WHERE E1.manager = E2.name and E1.boater = ‘YES’ and E2.boater = ‘YES’
This query is minimal

43 Query Minimization for Views Now compute the Very-Happy-Boaters:
SELECT DISTINCT H1.name
FROM HappyBoaters H1, HappyBoaters H2
WHERE H1.manager = H2.name
This query is also minimal
What happens in SQL when we run a query on a view ?

44 Query Minimization for Views View Expansion:
SELECT DISTINCT E1.name
FROM Employee E1, Employee E2, Employee E3, Employee E4
WHERE E1.manager = E2.name and E1.boater = ‘YES’ and E2.boater = ‘YES’ and E3.manager = E4.name and E3.boater = ‘YES’ and E4.boater = ‘YES’ and E1.manager = E3.name
This query is no longer minimal ! E2 is redundant

45 Expressive Power of FO The following queries cannot be expressed in Relational Calculus (FO):
Transitive closure: –∀x.∀y. there exists x1,...,xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
Parity: the number of edges in R is even

46 Datalog Adds recursion, so we can compute transitive closure. A datalog program (query) consists of several datalog rules:
P1(t1) :- body1
P2(t2) :- body2
...
Pn(tn) :- bodyn

47 Datalog Terminology: EDB = extensional database predicates –The database predicates IDB = intensional database predicates –The new predicates constructed by the program

48 Datalog
EDBs: Employee(x), ManagedBy(x,y), Manager(y)
IDBs: HMngr(x), Answer(x)
All higher level managers that are employees:
HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
Answer(x) :- HMngr(x), Employee(x)

49 Datalog Employee(x), ManagedBy(x,y), Manager(y)
All persons: Manager ∪ Employee
Person(x) :- Manager(x)
Person(x) :- Employee(x)

50 Unfolding non-recursive rules Graph: R(x,y)
P(x,y) :- R(x,u), R(u,v), R(v,y)
A(x,y) :- P(x,u), P(u,y)
Can “unfold” it into:
A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)

51 Unfolding non-recursive rules Graph: R(x,y)
P(x,y) :- R(x,y)
P(x,y) :- R(x,u), R(u,y)
A(x,y) :- P(x,y)
Now the unfolding has a union:
A(x,y) :- R(x,y) ∨ ∃u.(R(x,u) ∧ R(u,y))
Non-recursive datalog = UCQ (why ?)

52 Recursion in Datalog Example 1: transitive closure Graph: R(x,y)
Transitive closure:
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), R(u,y)
Transitive closure (an alternative program):
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), P(u,y)

53 Recursion in Datalog Example 2: same generation Graph: R(x,y)
Sg(x,x) :- R(x,y)
Sg(y,y) :- R(x,y)
Sg(x,y) :- R(x,u), Sg(u,v), R(v,y)

54 Recursion in Datalog Example 3: value of a Boolean circuit with And/Not gates Graph: And(y,z,x), Not(y,x), Value0(x), Value1(x)
isZero(x) :- Value0(x)
isOne(x) :- Value1(x)
isZero(x) :- And(y,z,x), isZero(y)
isZero(x) :- And(y,z,x), isZero(z)
isOne(x) :- And(y,z,x), isOne(y), isOne(z)
isZero(x) :- Not(y,x), isOne(y)
isOne(x) :- Not(y,x), isZero(y)
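As an illustration, the circuit program can be run to a fixpoint in a few lines of Python. This is a sketch, not the lecture's code: the gate and wire names are hypothetical, And(y,z,x) is read as "x is the output of an And gate with inputs y and z", and the last rule is read as isOne(x) :- Not(y,x), isZero(y), i.e. the output of a Not gate is one when its input is zero.

```python
def eval_circuit(and_gates, not_gates, value0, value1):
    """Apply the isZero/isOne rules repeatedly until nothing new is derived."""
    is_zero, is_one = set(value0), set(value1)   # the two base rules
    changed = True
    while changed:
        new_zero, new_one = set(is_zero), set(is_one)
        for (y, z, x) in and_gates:
            if y in is_zero or z in is_zero:     # isZero(x) :- And(y,z,x), isZero(y) / isZero(z)
                new_zero.add(x)
            if y in is_one and z in is_one:      # isOne(x) :- And(y,z,x), isOne(y), isOne(z)
                new_one.add(x)
        for (y, x) in not_gates:
            if y in is_one:                      # isZero(x) :- Not(y,x), isOne(y)
                new_zero.add(x)
            if y in is_zero:                     # isOne(x) :- Not(y,x), isZero(y)
                new_one.add(x)
        changed = (new_zero != is_zero) or (new_one != is_one)
        is_zero, is_one = new_zero, new_one
    return is_zero, is_one

# Hypothetical circuit: g1 = a AND b, g2 = NOT g1, g3 = a AND g2, with a = 1, b = 0.
zeros, ones = eval_circuit(
    and_gates={("a", "b", "g1"), ("a", "g2", "g3")},
    not_gates={("g1", "g2")},
    value0={"b"},
    value1={"a"},
)
print(sorted(zeros), sorted(ones))
```

Each pass can only add wires to the two sets, so the loop stops after at most as many passes as there are wires.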

55 Semantics of a Datalog Program
Let:
–R = the EDB predicates = some input instance D
–S = the IDB predicates
A datalog program P maps IDB predicate instances S to new IDB predicate instances S’: S’ = P_D(S)
The function P_D is monotone (why ?)
By Knaster-Tarski’s fixpoint Theorem, P_D has a least fixpoint
Definition: the meaning of P on D is the least fixpoint of P_D
Alternatively: compute the meaning as follows:
–S_0 = emptyset; S_k+1 = P_D(S_k)
–The meaning is: S_0 ∪ S_1 ∪ S_2 ∪ … ∪ S_n ∪ …

56 Evaluation of Datalog Programs
Naïve evaluation algorithm:
–Start with S = emptyset
–Evaluate P on the EDB, and on the current IDB S; new-S = P(S); note: S ⊆ new-S (why ?)
–Replace S with new-S; repeat
–Stop when S = new-S
Semi-naïve evaluation algorithm:
–Keep track of ΔS = new-S - S
–Improve somewhat the computation of P(S)
What is the complexity of the naïve/semi-naïve algorithm ?
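Both strategies can be sketched for the transitive closure program P(x,y) :- R(x,y); P(x,y) :- P(x,u), R(u,y). The code below is an illustrative sketch with hypothetical function names, and it hard-codes this one program rather than interpreting arbitrary Datalog. The semi-naïve version joins only the tuples discovered in the previous round (ΔS) with R, which is sufficient here because the recursive rule has a single occurrence of P in its body.

```python
def naive_tc(edges):
    """Naive evaluation: re-apply both rules to the whole of S each round."""
    s = set()
    while True:
        new_s = set(edges) | {
            (x, y) for (x, u) in s for (u2, y) in edges if u == u2
        }
        if new_s == s:               # stop when S = new-S
            return s
        s = new_s

def seminaive_tc(edges):
    """Semi-naive evaluation: only join the newly derived tuples with R."""
    s = set(edges)
    delta = set(edges)               # tuples that are new in the current round
    while delta:
        derived = {(x, y) for (x, u) in delta for (u2, y) in edges if u == u2}
        delta = derived - s          # keep only genuinely new tuples
        s |= delta
    return s

# Hypothetical input graph: a chain a -> b -> c -> d.
chain = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(naive_tc(chain)))
print(sorted(seminaive_tc(chain)))
```

Both functions return the same relation; they differ only in how much work is repeated from one round to the next.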

57 Extensions of datalog with negation
Problems with negation:
S(x) :- R(x), not T(x)
T(x) :- R(x), not S(x)
There is no least fixpoint: (S = R, T = emptyset) and (S = emptyset, T = R) are both minimal fixpoints

58 Extensions of datalog with negation
Solution 1: Stratified Datalog¬
–Insist that the program be stratified: rules are partitioned into strata, and an IDB predicate that occurs only in strata ≤ k may be negated in strata ≥ k+1
Solution 2: Inflationary-fixpoint Datalog¬
–Run the naïve algorithm substituting: S = S ∪ new-S
–Always terminates (why ?)
Solution 3: Partial-fixpoint Datalog¬,*
–Run the naïve algorithm, substituting S = new-S
–May not terminate (but we can detect that – how ?)

59 Datalog¬ Example: complement transitive closure Graph: R(x,y)
Tc(x,y) :- R(x,y)
Tc(x,y) :- Tc(x,u), R(u,y)
CTc(x,y) :- R(x,u), R(v,y), not Tc(x,y)
This is stratified datalog¬
Challenge: express CTc in inflationary datalog¬
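A stratified program can be evaluated one stratum at a time: first compute Tc to its fixpoint, then evaluate the rule with negation against the finished Tc. The sketch below does exactly that for the CTc example; the function names and the input graph are hypothetical. As in the rule itself, x ranges over nodes with an outgoing edge and y over nodes with an incoming edge.

```python
def transitive_closure(edges):
    """Stratum 1: Tc(x,y) :- R(x,y); Tc(x,y) :- Tc(x,u), R(u,y)."""
    tc = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, y) for (x, u) in delta for (u2, y) in edges if u == u2}
        delta = derived - tc
        tc |= delta
    return tc

def complement_tc(edges):
    """Stratum 2: CTc(x,y) :- R(x,u), R(v,y), not Tc(x,y)."""
    tc = transitive_closure(edges)            # Tc is complete before it is negated
    sources = {x for (x, _) in edges}         # x such that R(x,u) for some u
    targets = {y for (_, y) in edges}         # y such that R(v,y) for some v
    return {(x, y) for x in sources for y in targets if (x, y) not in tc}

# Hypothetical input graph: a -> b -> c.
graph = {("a", "b"), ("b", "c")}
print(sorted(complement_tc(graph)))
```

On this graph the only candidate pair that is not in Tc is (b, b), since b has both an incoming and an outgoing edge but does not reach itself.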

60 Expressive Power Theorem: stratified-datalog¬ ⊊ inflationary-datalog¬ ⊆ partial-datalog¬ (the second inclusion is strict iff PTIME ≠ PSPACE)

61 Variants of Datalog
             without recursion               with recursion
without ¬    Non-recursive Datalog = UCQ     Datalog
with ¬       Non-recursive Datalog¬ = FO     Datalog¬ (three variants)

62 Non-recursive Datalog Union of Conjunctive Queries = UCQ –Containment is decidable, and NP-complete Non-recursive Datalog –Is equivalent to UCQ –Hence containment is decidable here too –Is it still NP-complete ?

63 Non-recursive Datalog A non-recursive datalog:
T1(x,y) :- R(x,u), R(u,y)
T2(x,y) :- T1(x,u), T1(u,y)
...
Tn(x,y) :- Tn-1(x,u), Tn-1(u,y)
Answer(x,y) :- Tn(x,y)
Its unfolding as a CQ:
Answer(x,y) :- R(x,u1), R(u1,u2), R(u2,u3), ..., R(um,y)
How big is this query ?

64 Non-recursive Datalog A non-recursive datalog:
T1(x,y) :- R(x,u), R(u,y)
T2(x,y) :- T1(x,u), T1(u,y)
...
Tn(x,y) :- Tn-1(x,u), Tn-1(u,y)
Answer(x,y) :- Tn(x,y)
Its unfolding as a CQ:
Answer(x,y) :- R(x,u1), R(u1,u2), R(u2,u3), ..., R(um,y)
How big is this query ?
Although non-recursive-datalog = UCQ, the former is exponentially more concise
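To make the size concrete: each Ti doubles the number of R atoms of Ti-1, so the unfolded body has 2^n atoms and m = 2^n - 1 intermediate variables, while the program itself has only n + 1 rules. The sketch below builds the unfolding explicitly; the function name is hypothetical, and the fresh variable names it generates are not numbered left to right along the chain.

```python
def unfold(n):
    """Return the body of Answer(x,y) after unfolding T_n, as a list of R atoms."""
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"u{counter}"

    def expand(level, a, b):
        # Level 0 is the EDB atom R(a,b); T_level(a,b) splits into two
        # copies of the previous level joined on a fresh middle variable.
        if level == 0:
            return [("R", a, b)]
        mid = fresh()
        return expand(level - 1, a, mid) + expand(level - 1, mid, b)

    return expand(n, "x", "y")

for n in range(1, 6):
    body = unfold(n)
    print(n, "rules:", n + 1, "atoms in unfolding:", len(body))
```

Already for n = 20 the program has 21 rules but its unfolding has more than a million atoms.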

65 Query Complexity Given a query φ in FO And given a model D = (D, R1^D, …, Rk^D) What is the complexity of computing the answer φ(D)

66 Query Complexity Vardi’s taxonomy:
Data Complexity: Fix φ. Compute φ(D) as a function of |D|
Query Complexity: Fix D. Compute φ(D) as a function of |φ|
Combined Complexity: Compute φ(D) as a function of |D| and |φ|
Which is the most important in databases ?

67 Example φ(x) ≡ ∃u.(R(u,x) ∧ ∃y.(∃v.S(y,v) ∧ ¬R(x,y))) R = S = How do we proceed ?

68 General Evaluation Algorithm
for every subexpression φi of φ, (i = 1, …, m) compute the answer to φi as a table Ti(x1, …, xn)
return Tm
Theorem. If φ has k variables then one can compute φ(D) in time O(|φ|*|D|^k)
Data Complexity = O(|D|^k) = in PTIME
Query Complexity = O(|φ|*c^k) = in EXPTIME

69 General Evaluation Algorithm Example: φ(x) ≡ ∃u.(R(u,x) ∧ ∃y.(∃v.S(y,v) ∧ ¬R(x,y)))
φ1(u,x) ≡ R(u,x)
φ2(y,v) ≡ S(y,v)
φ3(x,y) ≡ ¬R(x,y)
φ4(y) ≡ ∃v.φ2(y,v)
φ5(x,y) ≡ φ4(y) ∧ φ3(x,y)
φ6(x) ≡ ∃y.φ5(x,y)
φ7(u,x) ≡ φ1(u,x) ∧ φ6(x)
φ8(x) ≡ ∃u.φ7(u,x) ≡ φ(x)
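The bottom-up evaluation can be sketched directly, one table per subformula. The code assumes the example is read as φ(x) ≡ ∃u.(R(u,x) ∧ ∃y.(∃v.S(y,v) ∧ ¬R(x,y))), takes the domain to be the active domain of R and S, and uses a hypothetical instance, since the slide's own tables are not preserved in this transcript.

```python
def evaluate(R, S):
    """Compute one table per subformula, from the innermost outwards."""
    D = {v for t in (R | S) for v in t}                         # active domain
    phi1 = set(R)                                               # phi1(u,x) = R(u,x)
    phi2 = set(S)                                               # phi2(y,v) = S(y,v)
    phi3 = {(x, y) for x in D for y in D if (x, y) not in R}    # phi3(x,y) = not R(x,y)
    phi4 = {y for (y, v) in phi2}                               # phi4(y) = exists v. phi2(y,v)
    phi5 = {(x, y) for (x, y) in phi3 if y in phi4}             # phi5(x,y) = phi4(y) and phi3(x,y)
    phi6 = {x for (x, y) in phi5}                               # phi6(x) = exists y. phi5(x,y)
    phi7 = {(u, x) for (u, x) in phi1 if x in phi6}             # phi7(u,x) = phi1(u,x) and phi6(x)
    phi8 = {x for (u, x) in phi7}                               # phi8(x) = exists u. phi7(u,x)
    return phi8

# Hypothetical instance, chosen only to exercise every step.
R = {(1, 2), (2, 3)}
S = {(3, 1)}
print(evaluate(R, S))
```

No intermediate table has more than two columns, so each holds at most |D|^2 tuples; this is the pattern behind the O(|φ|*|D|^k) bound.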

70 Complexity Theorem. If φ has k variables then one can compute φ(D) in time O(|φ|*|D|^k) Remark. The number of variables matters !

71 Paying Attention to Variables Compute all chains of length m:
Chain_m(x,y) :- R(x,u1), R(u1,u2), R(u2,u3), ..., R(um-1,y)
We used m+1 variables
Can you rewrite it with fewer variables ?

72 Counting Variables FO^k = FO restricted to variables x1,…, xk
Write Chain_m in FO^3:
Chain_m(x,y) :- ∃u.R(x,u) ∧ (∃x.R(u,x) ∧ (∃u.R(x,u) ∧ … ∧ (∃u.R(u,y)) …))

73 Query Complexity Note: it suffices to investigate boolean queries only
–If non-boolean, do this:
for a1 in D, …, ak in D
  if (a1, …, ak) in φ(D) /* this is a boolean query */
    then output (a1, …, ak)

74 Computational Complexity Classes Recall computational complexity classes: AC^0, LOGSPACE, NLOGSPACE, PTIME, NP, PSPACE, EXPTIME, EXPSPACE, (Kalmar) Elementary Functions, Turing Computable functions

75 Data Complexity of Query Languages
AC^0: FO = non-rec datalog¬
PTIME: FO(LFP) = datalog¬
PSPACE: FO(PFP) = datalog¬,*
The more complex a QL, the harder it is to optimize