CSE 544 Relational Calculus Lecture #2 January 11 th, 2011 1Dan Suciu -- 544, Winter 2011.

Slides:



Advertisements
Similar presentations
Relational Calculus and Datalog
Advertisements

CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman.
2005conjunctive-ii1 Query languages II: equivalence & containment (Motivation: rewriting queries using views)  conjunctive queries – CQ’s  Extensions.
Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
1 Conjunctions of Queries. 2 Conjunctive Queries A conjunctive query is a single Datalog rule with only non-negated atoms in the body. (Note: No negated.
1 541: Relational Calculus. 2 Relational Calculus  Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).  Calculus.
Predicate Calculus Formal Methods in Verification of Computer Systems Jeremy Johnson.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Efficient Query Evaluation on Probabilistic Databases
Computability and Complexity 9-1 Computability and Complexity Andrei Bulatov Logic Reminder (Cnt’d)
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
1 9. Evaluation of Queries Query evaluation – Quantifier Elimination and Satisfiability Example: Logical Level: r   y 1,…y n  r’ Constraint.
Winter 2002Arthur Keller – CS 18014–1 Schedule Today: Feb. 26 (T) u Datalog. u Read Sections Assignment 6 due. Feb. 28 (TH) u Datalog and SQL.
Relational Calculus. Another Theoretical QL-Relational Calculus n Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus.
Relational Calculus CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth   We will occasionally use this arrow notation unless there is danger of.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Relational Calculus CS 186, Fall 2003, Lecture 6 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
Rutgers University Relational Calculus 198:541 Rutgers University.
CSE 544 Theory of Query Languages Tuesday, February 22 nd, 2011 Dan Suciu , Winter
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
The Relational Model: Relational Calculus
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
1 CS 430 Database Theory Winter 2005 Lecture 6: Relational Calculus.
1st-order Predicate Logic (FOL)
CSE 311 Foundations of Computing I Lecture 8 Proofs and Set Theory Spring
CSE314 Database Systems The Relational Algebra and Relational Calculus Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4.
Computing & Information Sciences Kansas State University Thursday, 08 Feb 2007CIS 560: Database System Concepts Lecture 11 of 42 Thursday, 08 February.
CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman Fall 2006.
Relational Calculus R&G, Chapter 4. Relational Calculus Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus.
Relational Calculus CS 186, Spring 2005, Lecture 9 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
1 Relational Algebra and Calculas Chapter 4, Part A.
Relational Algebra.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
Database Systems Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module A: Formal Relational.
Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.
1 First order theories (Chapter 1, Sections 1.4 – 1.5) From the slides for the book “Decision procedures” by D.Kroening and O.Strichman.
1 CSE544 Monday April 26, Announcements Project Milestone –Due today Next paper: On the Unusual Effectiveness of Logic in Computer Science –Need.
Database Management Systems Course Faculty of Computer Science Technion – Israel Institute of Technology Lecture 5: Queries in Logic.
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L5_Relational Calculus 1 Relational Calculus Chapter 4 – Part B.
1 Finite Model Theory Lecture 16 L  1  Summary and 0/1 Laws.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
1 Finite Model Theory Lecture 5 Turing Machines and Finite Models.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
Extensions of Datalog Wednesday, February 13, 2001.
Lecture 9: Query Complexity Tuesday, January 30, 2001.
CS589 Principles of DB Systems Fall 2008 Lecture 4c: Query Language Equivalence Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
Relational Calculus Chapter 4, Section 4.3.
Basic Operations Algebra of Bags
Relational Calculus Chapter 4, Part B
Goal for this lecture Demonstrate how we can prove that one query language is more expressive than (i.e., “contained in” as described in the book) another.
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
Motivation for Datalog
1st-order Predicate Logic (FOL)
Lecture 10: Query Complexity
CS 186, Fall 2002, Lecture 8 R&G, Chapter 4
Logic Based Query Languages
Chapter 6: Formal Relational Query Languages
CSE 544: Lecture 8 Theory.
Datalog Inspired by the impedance mismatch in relational databases.
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
Relational Algebra & Calculus
CSE544 Wednesday, March 29, 2006.
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
1st-order Predicate Logic (FOL)
Presentation transcript:

CSE 544 Relational Calculus Lecture #2 January 11 th, Dan Suciu , Winter 2011

Announcements HW1 is posted First paper review due next week Friday: project groups are due Dan Suciu , Winter 20112

Announcements Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases Need to makeup one lecture this/next week. When ? –Friday 1/14, 12:30-2pm ? –Wednesday, 1/19, morning (9…12) ? –Friday, 1/21, late morning (11...1pm) ? Dan Suciu , Winter 20113

Today’s Agenda Relational calculus: Domain Relational Calculus Non-recursive datalog (a reasonable abstraction of SQL) Relational algebra 4Dan Suciu , Winter 2011 They are equivalent and why we care

Domain Relational Calculus Given: A vocabulary: R 1, …, R k An arity, ar(R i ), for each i=1,…,k An infinite supply of variables x 1, x 2, x 3, … Constants: c 1, c 2, c 3,... Dan Suciu , Winter 20115

Domain Relational Calculus Dan Suciu , Winter t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  Terms t and Formulas  are: A query Q is any formula , usually written as: Q = { x |  } where x = (x 1,x 2,…) are the free variables in .

Example: Querying a Graph R encodes a graph R= {x |  y.R(x,y)} {x |  y.  z.  u.(R(x,y)  R(y,z)  R(z,u)) {(x,y) |  z.(R(x,z)  R(y,z)} What do these queries return ?

Example: Querying a Graph R= {x |  y.R(x,y)} {x |  y.  z.  u.(R(x,y)  R(y,z)  R(z,u)) {(x,y) |  z.(R(x,z)  R(y,z)} Nodes that have at least one child: {1,2,3} Nodes that have a great-grand-child: {1,2} Every child of x is a child of y: {(1,1),(2,2),(3,1),(3,3),(4,1), (4,2), (4,3)} R encodes a graphWhat do these queries return ?

Semantics of a Formula A database instance (or a model) is D = (D, R 1 D, …, R k D ) –D = a set, called domain, or universe –R i D  D  D ...  D, (ar(R i ) times) i = 1,...,k A valuation is a function s : {x 1, x 2,...}  D The semantics of a formula  says when D ⊨  [s] holds Dan Suciu , Winter The domain D is not part of the database, yet we need it to define the semantics. Where ?

Semantics of D ⊨  [s] Dan Suciu , Winter D ⊨ (R(t 1,..., t n )) [s] D ⊨ (t= t’) [s] If (s(t 1 ),..., s(t n ))  R D If s(t) = s(t’) D ⊨ (    ’) [s] D ⊨ (    ’) [s] D ⊨ (  ’) [s] If D ⊨  [s] and D ⊨ (  ’) [s] If D ⊨  [s] or D ⊨  ’[s] If it is not the case D ⊨  [s]

Semantics of D ⊨  [s] Dan Suciu , Winter D ⊨ (  x.  ) [s] D ⊨ (  x.  ) [s] If for every constant a in D, D ⊨  [s a/x ] If for some constant a in D, D ⊨  [s a/x ] Notation: s a/x = the valuation s modified to map x to a

Semantics of D ⊨  [s] Dan Suciu , Winter D ⊨ (  x.  ) [s] D ⊨ (  x.  ) [s] If for every constant a in D, D ⊨  [s a/x ] If for some constant a in D, D ⊨  [s a/x ] Notation: s a/x = the valuation s modified to map x to a Note: here we need the domain D

Semantics of a Query Dan Suciu , Winter The semantics of a query Q = { x |  } is Q(D) = { s(x) | for all s such that D ⊨  [s] }

The drinkers-bars-beers example 14 Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y.  z. Frequents(x, y)  Serves(y,z)  Likes(x,z) What does this query compute ?

The drinkers-bars-beers example 15 Find drinkers that frequent some bar that serves some beer they like. Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y.  z. Frequents(x, y)  Serves(y,z)  Likes(x,z) What does this query compute ?

The drinkers-bars-beers example Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y. Frequents(x, y)  (  z. Serves(y,z)  Likes(x,z)) What does this query compute ?

The drinkers-bars-beers example Find drinkers that frequent only bars that serves some beer they like. Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y. Frequents(x, y)  (  z. Serves(y,z)  Likes(x,z)) What does this query compute ?

The drinkers-bars-beers example Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y. Frequents(x, y)  z.(Serves(y,z)  Likes(x,z)) What does this query compute ?

The drinkers-bars-beers example Find drinkers that frequent some bar that serves only beers they like. Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y. Frequents(x, y)  z.(Serves(y,z)  Likes(x,z)) What does this query compute ?

The drinkers-bars-beers example Dan Suciu , Winter Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) x:  y.Frequents(x, y)   z.(Serves(y,z)  Likes(x,z)) What does this query compute ?

The drinkers-bars-beers example Dan Suciu , Winter Find drinkers that frequent only bars that serves only beer they like. Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) What does this query compute ? x:  y.Frequents(x, y)   z.(Serves(y,z)  Likes(x,z))

Summary of Relational Calculus Same as First Order Logic See book for the two variants: –Domain relational calculus (what we discussed) –Tuple relational calculus This is a powerful, concise language Dan Suciu , Winter

Datalog Rule L i = a literal, or atom, or subgoal; can be one of: R(x 1, x 2, …) = relational atom u = v, or u ≠ v, where u, v = variables or constants S = a new symbol x = (x 1, x 2, …) are head variables Dan Suciu , Winter Q(x ) :- L 1, L 2, …, L k Head Body Q(x) :- Likes(x,y),Likes(x,z),y≠z Example:

Semantics of Datalog Rule Dan Suciu , Winter Q = {x |  y 1.  y 2 … L 1  L 2  …  L k } Q(x ) :- L 1, L 2, …, L k y 1, y 2,… are all non-head variables Q(D) :- { s(x) | s = valuation s.t. D ⊨ L 1 [s],…,D ⊨ L k [s] } The semantics is: Note: a datalog rule is also called a conjunctive query

Non-Recursive Datalog Dan Suciu , Winter S 1 (x 1 ) :- body 1 S 2 (x 2 ) :- body 2 S 3 (x 3 ) :- body 3... S 1 (x 1 ) :- body 1 S 2 (x 2 ) :- body 2 S 3 (x 3 ) :- body 3... Each symbol S k may appear in body k+1, body k+2, …, but may not appear in body 1,…,body k. Example: A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y) A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y) What does Q compute ?

Non-Recursive Datalog Dan Suciu , Winter S 1 (x 1 ) :- body 1 S 2 (x 2 ) :- body 2 S 3 (x 3 ) :- body 3... S 1 (x 1 ) :- body 1 S 2 (x 2 ) :- body 2 S 3 (x 3 ) :- body 3... Each symbol S k may appear in body k+1, body k+2, …, but may not appear in body 1,…,body k. Example: A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y) A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y) What does Q compute ? A: all pairs (x,y) connected by a path of length 16.

Comparison Dan Suciu , Winter Is non-recursive datalog equivalent to the Relational Calculus ?

Comparison Dan Suciu , Winter Is non-recursive datalog equivalent to the Relational Calculus ? What about writing this query in non-recursive datalog ? Q = {x |  y. Frequents(x,y)  Serves(y,’Bud’)   z. Frequents(x,z)  Serves(z,’Miller’) }

Comparison Dan Suciu , Winter Is non-recursive datalog equivalent to the Relational Calculus ? What about writing this query in non-recursive datalog ? Q(x) :- Frequents(x,y), Serves(y,’Bud’) Q(x) :- Frequents(x,z), Serves(z,’Miller’) Q(x) :- Frequents(x,y), Serves(y,’Bud’) Q(x) :- Frequents(x,z), Serves(z,’Miller’) Q = {x |  y. Frequents(x,y)  Serves(y,’Bud’)   z. Frequents(x,z)  Serves(z,’Miller’) }

Comparison Dan Suciu , Winter Is non-recursive datalog equivalent to the Relational Calculus ? Q = {x |  y. Frequents(x,y)   z.Likes(x,z)  Serves(y,z) Now what about writing this query in non-recursive datalog ?

Comparison Dan Suciu , Winter Is non-recursive datalog equivalent to the Relational Calculus ? Now what about writing this query in non-recursive datalog ? Impossible ! Proof on next slides. Q = {x |  y. Frequents(x,y)   z.Likes(x,z)  Serves(y,z)

Monotone Queries Given two database instances D = (D, R 1, …, R k ) and D’ = (D’, R’ 1, …, R’ k ) we write D ⊆ D’ if D ⊆ D’, R 1 ⊆ R’ 1,…, R k ⊆ R’ k. In other words, D’ is obtained by adding tuples to D. We say that Q is monotone if whenever D ⊆ D’, Q(D) ⊆ Q(D’). Dan Suciu , Winter

Monotone Queries Dan Suciu , Winter Fact. Every datalog rule is monotone. Why ? Fact. Every datalog program is monotone. Why ? Fact. This query is not monotone: Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Why ?

Monotone Queries Dan Suciu , Winter Fact. Every datalog rule is monotone. Why ? Fact. Every datalog program is monotone. Why ? Fact. This query is not monotone: Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Recall the semantics: Q(D) = { s(x) | D ⊨ L 1 [s],…,D ⊨ L k [s] } If D ⊨ L i [s] then D’ ⊨ L i [s] hence Q(D) ⊆ Q(D’). Why ?

Monotone Queries Dan Suciu , Winter Fact. Every datalog rule is monotone. Why ? Fact. Every datalog program is monotone. Why ? Fact. This query is not monotone: Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Recall the semantics: Q(D) = { s(x) | D ⊨ L 1 [s],…,D ⊨ L k [s] } If D ⊨ L i [s] then D’ ⊨ L i [s] hence Q(D) ⊆ Q(D’). By induction on the number of rules Why ?

Monotone Queries Dan Suciu , Winter Fact. Every datalog rule is monotone. Why ? Fact. Every datalog program is monotone. Why ? Fact. This query is not monotone: Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Recall the semantics: Q(D) = { s(x) | D ⊨ L 1 [s],…,D ⊨ L k [s] } If D ⊨ L i [s] then D’ ⊨ L i [s] hence Q(D) ⊆ Q(D’). By induction on the number of rules By adding a tuple to Frequents we may remove some answer Why ?

Monotone Queries Dan Suciu , Winter t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  What fragment of the relational calculus is monotone ? Recall the relational calculus:

Monotone Queries Dan Suciu , Winter t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  ’ |  x.  |  x.  What fragment of the relational calculus is monotone ? t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  x.  t ::= x | a /* variable or constant */  ::= R(t 1,..., t k ) | t i = t j |    ’ |    ’ |  x.  Recall the relational calculus:

Non-Recursive Datalog  L i = may also be  R(x1, x2, …) R(x1, x2, …) = relational atom u = v, or u ≠ v, where u, v = variables or constants Dan Suciu , Winter Q(x ) :- L 1, L 2, …, L k

Non-Recursive Datalog  Dan Suciu , Winter Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Example:

Non-Recursive Datalog  Dan Suciu , Winter FS(x,y) :- Frequents(x,z),Serves(z,y) Q(x) :- Likes(x,y),  FS(x,y) FS(x,y) :- Frequents(x,z),Serves(z,y) Q(x) :- Likes(x,y),  FS(x,y) Q = {x |  y. Likes(x,y)   z.Frequents(x,z)  Serves(z,y) Example:

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter Theorem. Every query in the Relational Calculus can be translated to non-recursive Datalog 

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter Proof First remove  by using De Morgan:  x.  =  x.  Then, inductively, for each subformula  create a datalog rule F(x):-body that computes Q = { x |  }

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter R(t 1,..., t k ) F(x) :- R(t 1,..., t k )

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter ? ? ? 1  2 1  2 1  2 1  2 F 1 (x 1 ) :- … /* for  1 */ F 2 (x 2 ) :- … /* for  2 */

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter F(x) :- F 1 (x 1 ),F 2 (x 2 ) 1  2 1  2 1  2 1  2 F 1 (x 1 ) :- … /* for  1 */ F 2 (x 2 ) :- … /* for  2 */

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter 1  2 1  2 1  2 1  2 ? ? ? F 1 (x 1 ) :- … /* for  1 */ F 2 (x 2 ) :- … /* for  2 */

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter 1  2 1  2 1  2 1  2 F(x) :- F 1 (x 1 ) F(x) :- F 2 (x 2 ) F(x) :- F 1 (x 1 ) F(x) :- F 2 (x 2 ) F 1 (x 1 ) :- … /* for  1 */ F 2 (x 2 ) :- … /* for  2 */

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter  x.  1 ? ? ? F 1 (x) :- … /* for  1 */

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter  x.  1 F(y) :- F 1 (x) F 1 (x) :- … /* for  1 */ where y = x – {x}

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter  1 F(x) :-  F 1 (x) F 1 (x) :- … /* for  1 */ End of Proof

Unsafe Queries Dan Suciu , Winter Q(x) :- x=y Q(x) :-  R(x,y) What’s wrong ?

Unsafe Queries Dan Suciu , Winter Definition. A datalog  rule S :- L 1, …, L k is called safe, if every variable x appears in at least one positive relational atom. Q(x) :- T(x), S(y), x=y Q(x) :- T(x), S(y),  R(x,y) Examples We require that all rules of a non-recursive datalog  program to be safe.

Unsafe Queries Dan Suciu , Winter But what about Relational Calculus ? What does it mean for a query Q to be safe ? Q = {x |  y. Frequents(x,y)} Q = {(x,y) | Faculty(x)  Student(y)} Are these queries safe ?

Unsafe Queries Dan Suciu , Winter Definition. A query Q is domain independent, if for any two database instances D, D’, such that they have the same relations R1, …,Rk but possibly different domains D, D’, Q(D) = Q(D’). A safe datalog  rule is domain independent (obvious). Does the converse hold ? Note: domain independent queries are also called safe queries, somewhat confusingly.

Unsafe Queries Dan Suciu , Winter Q = {x |  y. Frequents(x,y)} Q = {(x,y) | Faculty(x)  Student(y)} Explain why these queries are not domain independent:

Unsafe Queries 57 Theorem. The problem: “given a relational query Q decide whether Q is safe” is undecidable. Bummer !... Why doesn’t the following work ? Translate Q to a datalog  program, then check that each rule is safe. Is this a decision procedure for domain independence ?

Active Domain Semantics Dan Suciu , Winter Definition. The active domain ADom of a database instance D = (D, R 1 D, …, R k D ), is the set of all constants in R 1 D, …, R k D ADom(x) :- Frequents(x,y) ADom(y) :- Frequents(x,y) ADom(x) :- Likes(x,y) ADom(y) :- Likes(x,y) ADom(x) :- Serves(x,y) ADom(y) :- Serves(x,y) ADom(x) :- Frequents(x,y) ADom(y) :- Frequents(x,y) ADom(x) :- Likes(x,y) ADom(y) :- Likes(x,y) ADom(x) :- Serves(x,y) ADom(y) :- Serves(x,y) The active domain can be computed by a boring relational query:

Active Domain Semantics Dan Suciu , Winter D ⊨ (  x.  ) [s] D ⊨ (  x.  ) [s] If for every constant a in ADom, D ⊨  [s a/x ] If for some constant a in ADom, D ⊨  [s a/x ] Make the following two changes to standard semantics:

Active Domain Semantics Dan Suciu , Winter Theorem. If Q is domain-independent, then its standard semantics coincides with the active domain semantics Why ?

Active Domain Semantics Dan Suciu , Winter Theorem. If Q is domain-independent, then its standard semantics coincides with the active domain semantics Why ? Let D = (D, R 1 D, …, R k D ), be a database instance. Denote D’ = (ADom, R 1 D, …, R k D ) Since Q is d.i. we have Q(D) = Q(D’) Q(D’) is the same as the active-domain semantics on D

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter  1 F(x) :-  F 1 (x) This rule is unsafe ! How do we make it safe ? If Q is domain independent, then we want to translate it into a safe non-rec. datalog  program

Rel. Calculus  non-rec. Datalog  Dan Suciu , Winter  1 F(x) :- Adom(x 1 ), Adom(x 2 ),…, Adom(x k ),  F 1 (x) Where x = (x 1, x 2, … x k ) If Q is domain independent, then we want to translate it into a safe non-rec. datalog  program

Discussion Non-recursive datalog  is a simple, friendly, very popular notation for queries Naturally compositional: sequence of rules Exposes a hierarchy of queries –Single rule datalog = conjunctive queries –Non-rec. datalog program = monotone-RC –Non-rec. datalog  program = RC No quantifiers ! Because all are  Dan Suciu , Winter

Why We Care Dan Suciu , Winter Q(x) :- R(..), S(..), T(..) Subqueries in the FROM clause SQL is non-recursive datalog  ! SELECT x FROM R, S, T WHERE … Q1(x) :- … Q2(y) :- … Q(z) :- Q1(x), Q2(y) Q(x) :- R(..), S(..),  T(..) Subquery in the WHERE clause

Why We Care When we use a database system we need to write some complex queries –The Masochists: Find drinkers that frequent only bars that server no beer that they like Write it first in Relational Calculus –Easy exercise Translate it to Non-recursive datalog  –This is subtle, but we have seen it in detail Translate Non-recursive datalog  to SQL –Easy exercise Dan Suciu , Winter

67 Relational Algebra The Basic Five operators: Union:  Difference: - Selection:  Projection:  Join: Plus: cartesian product ×, renaming ρ Dan Suciu -- CSEP544 Fall 2010 We will study RA in detail when we discuss Query Execution

Complex RA Expressions Dan Suciu -- CSEP544 Fall Person x Purchase y Person z Product u  name=fred  name=gizmo  pid  ssn y.seller-ssn=z.ssny.pid=u.pidx.ssn=y.buyer-ssn  u.name

Equivalence Theorem 69 Theorem. Every domain independent query in RC can be translated to safe non-recursive Datalog  Done that. Theorem. Every safe non-recursive Datalog  can be translated to Relational Algebra. Easy exercise – to do at home Theorem. Relational Algebra expressions can be translated to a domain independent RC query Easy exercise – to do at home

Discussion What is the monotone fragment of RA ? – What are the safe queries in RA ? – Where do we use RA (applications) ? – Dan Suciu , Winter

Discussion What is the monotone fragment of RA ? –No difference: , , , What are the safe queries in RA ? –All RA queries are safe Where do we use RA (applications) ? –Translating SQL (WHAT  HOW) –Some new query languages (Pig-Latin) Dan Suciu , Winter