CSE 544: Lecture 8 Theory.

Review of Relational Algebra
Five basic RA operators: ∪, −, σ, Π, ×

Find all employees with salary more than $40,000:
σ_{Salary > 40000}(Employee)

Π_{SSN, Name}(Employee)

Renaming Example: ρ_{LastName, SocSocNo}(Employee)
Employee:
  Name  SSN
  John  999999999
  Tony  777777777
ρ_{LastName, SocSocNo}(Employee):
  LastName  SocSocNo
  John      999999999
  Tony      777777777

Natural Join Example
Employee:
  Name  SSN
  John  999999999
  Tony  777777777
Dependents:
  SSN        Dname
  999999999  Emily
  777777777  Joe
Employee ⋈ Dependents = Π_{Name, SSN, Dname}(σ_{SSN=SSN2}(Employee × ρ_{SSN2, Dname}(Dependents))):
  Name  SSN        Dname
  John  999999999  Emily
  Tony  777777777  Joe
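
The derivation of the natural join from ρ, ×, σ and Π can be traced step by step in a few lines of Python. This is only a sketch: relations are modeled as lists of dictionaries, and the helper names `rename`, `product`, `select` and `project` are illustrative choices, not part of the lecture.

```python
# Relations are lists of dicts (attribute name -> value); set semantics is
# approximated by removing duplicate rows in project().

def rename(rel, mapping):
    """rho: rename attributes according to mapping (old name -> new name)."""
    return [{mapping.get(a, a): v for a, v in row.items()} for row in rel]

def product(r, s):
    """Cartesian product; assumes r and s have disjoint attribute names."""
    return [{**x, **y} for x in r for y in s]

def select(rel, pred):
    """sigma: keep the rows satisfying the predicate."""
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """Pi: keep only the listed attributes and drop duplicate rows."""
    out = []
    for row in rel:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

employee = [{"Name": "John", "SSN": "999999999"},
            {"Name": "Tony", "SSN": "777777777"}]
dependents = [{"SSN": "999999999", "Dname": "Emily"},
              {"SSN": "777777777", "Dname": "Joe"}]

# Employee join Dependents, written exactly as on the slide:
# rename SSN to SSN2, take the product, select SSN = SSN2, project.
joined = project(
    select(product(employee, rename(dependents, {"SSN": "SSN2"})),
           lambda row: row["SSN"] == row["SSN2"]),
    ["Name", "SSN", "Dname"],
)
```

The renaming step is what makes the product well defined here: without it the two SSN attributes would collide.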

Natural Join
R(A, B) ⋈ S(B, C) has schema (A, B, C)
[Example tables for R, S and R ⋈ S are not recoverable from the transcript.]

Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?

Today’s Outline
First Order Logic as a Query Language (reading assignment: 4.3)
Query languages and Complexity Classes
Conjunctive queries

First Order Logic: Syntax
Given:
  A vocabulary: R1, ..., Rk
  An arity, ar(Ri), for each i = 1, ..., k
  An infinite supply of variables x1, x2, x3, ...
  Constants: c1, c2, c3, ...
FO formulas, φ, are:
  φ ::= R(t1, ..., t_ar(R)) | ti = tj | φ ∧ φ’ | φ ∨ φ’ | ¬φ | ∀x.φ | ∃x.φ
Terms:
  t ::= x | c

Examples of Formulas
Most interesting case: Vocabulary = one binary relation R (encodes a graph)
[Figure: a directed graph on nodes 1, 2, 3, 4 and the corresponding relation R]

Examples of Closed Formulas
Does there exist a loop in the graph?
  φ ≡ ∃x.R(x,x)
Are there paths of length >2?
  φ ≡ ∃x.∃y.∃z.∃u.(R(x,y) ∧ R(y,z) ∧ R(z,u))
Is there a “sink” node?
  φ ≡ ∃x.∀y.R(x,y)

Examples of Closed Formulas
Is there a clique of size 4?
  φ ≡ ∃x1.∃x2.∃x3.∃x4.(x1≠x2 ∧ x1≠x3 ∧ ... ∧ x3≠x4 ∧ R(x1,x2) ∧ R(x1,x3) ∧ ... ∧ R(x3,x4))
  Here x1≠x2 stands for ¬(x1=x2), etc.
Is the graph transitively closed?
  φ ≡ ∀x.∀y.∀z.(R(x,y) ∧ R(y,z) → R(x,z))
  Here A → B stands for ¬A ∨ B

Examples of Open Formulas
Find all nodes connected by a path of length 2:
  φ(x,y) ≡ ∃u.(R(x,u) ∧ R(u,y))
Find all nodes without outgoing edges:
  φ(x) ≡ ∀y.¬R(x,y)

More Examples
Vocabulary (= schema):
  Employee(name, office, mgr)
  Manager(name, office)
Queries:
Find offices:
  φ(y) ≡ (∃x.∃z.Employee(x,y,z) ∨ ∃x.Manager(x,y))
Find offices with at least two employees:
  φ(y) ≡ ∃x.∃z.∃x’.∃z’.(Employee(x,y,z) ∧ Employee(x’,y,z’) ∧ x≠x’)
Find managers that share office with all their employees: [to do in class]

First Order Logic: Semantics
Given a vocabulary R1, ..., Rk
A model is D = (D, R1^D, ..., Rk^D)
  D = a set, called domain, or universe
  Ri^D ⊆ D × D × ... × D (ar(Ri) times), i = 1, ..., k

First Order Logic: Semantics
Given:
  A model D = (D, R1^D, ..., Rk^D)
  A formula φ
  A substitution s : {x1, x2, ...} → D
We define next the relation D |= φ[s], meaning “D satisfies φ with s”

First Order Logic: Semantics
D |= (R(t1, ..., tn))[s] if (s(t1), ..., s(tn)) ∈ R^D
D |= (t = t’)[s] if s(t) = s(t’)

First Order Logic: Semantics
D |= (φ ∧ φ’)[s] if D |= (φ)[s] and D |= (φ’)[s]
D |= (φ ∨ φ’)[s] if D |= (φ)[s] or D |= (φ’)[s]
D |= (¬φ)[s] if not D |= (φ)[s]

First Order Logic: Semantics
D |= (∀x.φ)[s] if for all s’ s.t. s(y) = s’(y) for all variables y other than x, D |= (φ)[s’]
D |= (∃x.φ)[s] if for some s’ s.t. s(y) = s’(y) for all variables y other than x, D |= (φ)[s’]
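
The satisfaction relation defined on the last three slides translates almost line by line into a recursive evaluator. The sketch below assumes a finite domain (the slides also allow infinite ones), and the tuple encoding of formulas, the names `holds` and `term_value`, and the sample graph are all hypothetical choices made for illustration.

```python
def term_value(t, s):
    """Terms are variable names (str) or constants wrapped as ("c", value)."""
    return t[1] if isinstance(t, tuple) else s[t]

def holds(domain, rels, phi, s):
    """True iff the model (domain, rels) satisfies phi under substitution s."""
    op = phi[0]
    if op == "rel":                       # R(t1, ..., tn)
        _, name, terms = phi
        return tuple(term_value(t, s) for t in terms) in rels[name]
    if op == "eq":                        # t = t'
        return term_value(phi[1], s) == term_value(phi[2], s)
    if op == "and":
        return holds(domain, rels, phi[1], s) and holds(domain, rels, phi[2], s)
    if op == "or":
        return holds(domain, rels, phi[1], s) or holds(domain, rels, phi[2], s)
    if op == "not":
        return not holds(domain, rels, phi[1], s)
    if op in ("forall", "exists"):
        _, x, body = phi
        # s' agrees with s on every variable other than x
        results = (holds(domain, rels, body, {**s, x: d}) for d in domain)
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown connective: {op!r}")

def R(a, b):
    """Shorthand for the atom R(a, b)."""
    return ("rel", "R", (a, b))

# Hypothetical graph: the chain 1 -> 2 -> 3 -> 4.
domain = {1, 2, 3, 4}
rels = {"R": {(1, 2), (2, 3), (3, 4)}}

loop = ("exists", "x", R("x", "x"))
path3 = ("exists", "x", ("exists", "y", ("exists", "z", ("exists", "u",
         ("and", R("x", "y"), ("and", R("y", "z"), R("z", "u")))))))
no_out = ("forall", "y", ("not", R("x", "y")))   # open formula, x is free

has_loop = holds(domain, rels, loop, {})
has_path3 = holds(domain, rels, path3, {})
```

An open formula such as `no_out` is evaluated by supplying a value for its free variable in the substitution, for example `holds(domain, rels, no_out, {"x": 4})`.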

FO and Databases
FO                                   Databases
Vocabulary: R1, ..., Rk              Database schema: R1, ..., Rk
Model: D = (D, R1^D, ..., Rk^D)      Database instance: D = (D, R1^D, ..., Rk^D)
Formulas are true or false           Formulas compute queries

FO and Databases
FO: a closed formula φ is true in D if D |= φ
Databases: a formula φ with free variables x1, ..., xn defines the query:
  φ(D) = {(s(x1), ..., s(xn)) | D |= φ[s]}

FO and Databases
The Relational Calculus: the query language consisting of FO
The Tuple Calculus: a minor variation on the relational calculus; uses tuple variables instead of atomic variables (reading assignment: 4.3)
But some “queries” in these languages make no sense
Define safe queries next

Safe Queries
A model D = (D, R1^D, ..., Rk^D)
In FO: both D and R1^D, ..., Rk^D may be infinite
In databases: D may be infinite (int, string, etc.), but R1^D, ..., Rk^D are always finite
We call this a finite model

Safe Queries  is a finite query if for every finite model D, (D) is finite  is a domain independent query if for every two finite models D, D’ having the same relations: D = (D, R1D, …, RkD), D’ = (D’, R1D, …, RkD) we have (D) = (D’) Domain independent query aka safe query Notice: book has different but equivalent definition

Unsafe Relational Queries
Find all nodes that are not in the graph
Find all nodes that are connected to “everything” (finite, but not safe)
Find all pairs of employees or offices
We don’t want such queries!

Safe Queries
Definition. Given D = (D, R1^D, ..., Rk^D), the active domain is Da = the set of all constants in R1^D, ..., Rk^D
Example. Given a graph D = (D, R): Da = {x | ∃y.R(x,y) ∨ ∃z.R(z,x)}
Property. If a query is safe, it suffices to range quantifiers only over the active domain (why?)
Hence we can compute safe queries
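
Domain independence can be checked by hand on a tiny graph: evaluate the same query over two domains that contain the same relation and compare the answers. The sketch below does this for two queries. The formula ∀y.R(x,y) is an assumed formalization of "connected to everything" (the slide's own formula is not in the transcript), and the function names and the sample graph are hypothetical.

```python
def active_domain(rels):
    """Da: all constants that occur in some tuple of some relation."""
    return {v for tuples in rels.values() for t in tuples for v in t}

def connected_to_everything(domain, R):
    """phi(x) = forall y. R(x, y), with quantifiers ranging over `domain`."""
    return {x for x in domain if all((x, y) in R for y in domain)}

def has_outgoing_edge(domain, R):
    """phi(x) = exists y. R(x, y), with quantifiers ranging over `domain`."""
    return {x for x in domain if any((x, y) in R for y in domain)}

R = {(1, 1), (1, 2)}      # same relation in both models
small = {1, 2}            # equals the active domain
large = {1, 2, 3}         # one extra element that occurs in no tuple

# Domain dependent: the answer changes when the domain grows.
unsafe_small = connected_to_everything(small, R)
unsafe_large = connected_to_everything(large, R)

# Domain independent: the answer is the same in both models.
safe_small = has_outgoing_edge(small, R)
safe_large = has_outgoing_edge(large, R)
```

For the second query, evaluating over `active_domain({"R": R})` gives the same answer as any larger domain, which is what makes safe queries computable.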

Safe Queries
The safe relational calculus consists only of safe queries. However:
Theorem. It is undecidable whether a given FO query is safe.
Workaround:
  Define a subset of FO queries that we know are safe = range restricted queries (rr-queries)
  Any safe query is equivalent to some rr-query

Range-restriction
A syntactic condition on queries
See [AHU] for a definition. The intuition:
Range Restricted Queries      Non-Range Restricted
S(x) ∧ ¬R(x,x)                ¬R(x,x)
S(x) ∨ T(x)                   S(x) ∨ T(y)
∃x.(S(x) ∧ ¬R(x,x))           ∃x.(¬R(x,x))
∀x.(S(x) → R(x,x))            ∀x.(R(x,x))

Range-restriction
Theorem. Every safe query is equivalent to a range-restricted query
“Proof”. Translate as follows:
  R(x, y, ...)  ⇒  x ∈ Da ∧ y ∈ Da ∧ ... ∧ R(x, y, ...)
  ∃x.φ  ⇒  ∃x.(x ∈ Da ∧ φ)
  ∀x.φ  ⇒  ∀x.(x ∈ Da → φ)
From now on we assume that all queries are safe

FO = Relational Algebra
Recall the 5 operators in the relational algebra: ∪, −, ×, σ, Π
Theorem. A query can be defined in the safe relational calculus iff it can be defined in the relational algebra

FO = Relational Algebra Proof [in class]

Limited Expressive Power
Vocabulary: binary relation R
The following queries cannot be expressed in FO:
Transitive closure: ∀x.∀y. there exist x1, ..., xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
Parity: the number of edges in R is even

Extensions of FO
FO(LFP) = FO extended with least fixpoint
Example: define transitive closure like:
  T(x,y) ≡ R(x,y) ∨ ∃z.(R(x,z) ∧ T(z,y))
Meaning. Define:
  T0 := ∅
  Tn+1 := {(x,y) | R(x,y) ∨ ∃z.(R(x,z) ∧ Tn(z,y))}
  T0 ⊆ T1 ⊆ T2 ⊆ ... ⊆ Tk-1 = Tk, stop. The answer is: Tk
Q: How many steps do we need in the iteration?
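
The iteration T0, T1, T2, ... can be run directly. The following is a minimal sketch of that naive fixpoint computation on a hypothetical chain graph; the function name `transitive_closure` and the step counter are illustrative additions for experimenting with the question above.

```python
def transitive_closure(R):
    """Iterate T_{n+1} = R union {(x, y) | R(x, z) and T_n(z, y)} from T_0 = {}.

    Returns the fixpoint together with the number of iterations that
    actually added new pairs.
    """
    T = set()                      # T0 := empty set
    steps = 0
    while True:
        T_next = set(R) | {(x, y) for (x, z) in R for (z2, y) in T if z == z2}
        if T_next == T:            # T_{k-1} = T_k: stop
            return T, steps
        T = T_next
        steps += 1

# Hypothetical input: the chain 1 -> 2 -> 3 -> 4.
chain = {(1, 2), (2, 3), (3, 4)}
closure, steps = transitive_closure(chain)
```

On this chain the iteration stabilizes after three productive steps, one for each possible path length. Each productive step adds at least one pair, and at most |Da|² pairs exist, so the iteration always stops on a finite database.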

Computational Complexity Classes
Recall computational complexity classes:
AC0, LOGSPACE, NLOGSPACE, PTIME, NP, PSPACE, EXPTIME, EXPSPACE, (Kalmar) Elementary Functions, Turing Computable functions
We care mostly about these

Query Languages and Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
PSPACE   FO(PFP)
PTIME    FO(LFP)
AC0      FO
Important: the more complex a QL, the harder it is to optimize

Conjunctive Queries
Definition. A conjunctive query is an FO formula restricted to R(t1, ..., tn), ∧, ∃ (missing are ∨, ¬, ∀)
CQ = all conjunctive queries
Any CQ query can be written as:
  ∃x1.∃x2...∃xn.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
Same in Datalog notation:
  A(x1,...,xn) :- R1(t11,...,t1m), ... , Rk(tk1,...,tkm)

Examples
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same manager as Smith:
  A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)

Examples
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same director as Smith:
  A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)

Equivalent Formulations
Relational Algebra: conjunctive queries correspond precisely to σ_C, Π_A, × (missing: ∪, −)
  A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
  Π_{$2.name}(σ_{name="Smith"}(ManagedBy) ⋈_{$1.manager=$2.manager} ManagedBy)

Equivalent Formulations
SQL: conjunctive queries correspond to single select-distinct-from-where blocks with equality conditions in the WHERE clause
  select distinct m2.name
  from ManagedBy m1, ManagedBy m2
  where m1.name = 'Smith' AND m1.manager = m2.manager
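
The SQL form of the "same manager as Smith" query can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the column names follow the slide, while the three sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagedBy (name TEXT, manager TEXT)")
# Hypothetical data: Smith and Lee report to Jones, Kim reports to Park.
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?)",
    [("Smith", "Jones"), ("Lee", "Jones"), ("Kim", "Park")],
)

rows = conn.execute(
    """
    SELECT DISTINCT m2.name
    FROM ManagedBy m1, ManagedBy m2
    WHERE m1.name = 'Smith' AND m1.manager = m2.manager
    """
).fetchall()
same_manager = sorted(r[0] for r in rows)
conn.close()
```

Note that Smith appears in the answer as well, exactly as in the Datalog rule, since nothing in the query requires x to differ from "Smith".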

Conjunctive Queries
Most useful class of queries
Also enjoys remarkable, positive properties
Focus of research during the 70’s and 80’s
Still a focus of research in the 00’s
We discuss the most celebrated property of conjunctive queries: containment is decidable

Query Containment
Definition. Given two queries q1, q2, we say that q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D).
Notation: q1 ⊆ q2
Obviously: if q1 ⊆ q2 and q2 ⊆ q1 then q1 = q2.

Examples of Query Containments
q2(x) :- R(x,u), R(u,v)
Three choices of q1:
  q1(x) :- R(x,u), R(u,v), R(v,w)
  q1(x) :- R(x,u), R(u,"Smith")
  q1(x) :- R(x,u), R(u,u)
In all cases: q1 ⊆ q2

Examples of Query Containments
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
Then q1 ⊆ q2 (why?)

Examples of Query Containments
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
Then q1 ⊆ q2 (why?)

Query Containment
Theorem. Query containment for FO is undecidable
Theorem. Query containment for CQ is decidable and NP-complete

Query Containment
The most interesting part: how we check q1 ⊆ q2
The canonical database and the canonical tuple for q1:
Canonical database: Dq1 = (D, R1, ..., Rk) where:
  D = all variables and constants in q1
  R1, ..., Rk = the body of q1
Canonical tuple: tq1 = the head of q1

Examples of Canonical Databases
q1(x,y) :- R(x,u), R(v,u), R(v,y)
Dq1 = (D, R)
  D = {x, y, u, v}
  R = {(x,u), (v,u), (v,y)}
tq1 = (x,y)

Examples of Canonical Databases
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
Dq1 = (D, R)
  D = {x, u, "Smith", "Fred"}
  R = {(x,u), (u,"Smith"), (u,"Fred"), (u,u)}
tq1 = (x)

Checking Containment
Theorem: q1 ⊆ q2 iff tq1 ∈ q2(Dq1).
Example:
  q1(x,y) :- R(x,u), R(v,u), R(v,y)
  q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
  Dq1: D = {x, y, u, v}, R = {(x,u), (v,u), (v,y)}, tq1 = (x,y)
Yes, q1 ⊆ q2

Query Homomorphisms
How do we evaluate q2 on Dq1?
A homomorphism f : q2 → q1 is a function f : var(q2) → var(q1) ∪ const(q1) such that:
  f(body(q2)) ⊆ body(q1)
  f(tq2) = tq1

Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)

Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, "Smith", "Fred"}
var(q2) = {x, u, v, w}
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)

The Homomorphism Theorem
Theorem. q1 ⊆ q2 iff there exists a homomorphism from q2 to q1.
Theorem. Conjunctive query containment is:
  (1) decidable (why?)
  (2) in NP (why?)
  (3) NP-hard
In short: it is NP-complete
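
The homomorphism theorem suggests a simple, worst-case exponential decision procedure: search for a mapping from the variables of q2 into the terms of q1 that sends the head to the head and every body atom to a body atom. The backtracking sketch below is one way to do that. The query encoding (variables as strings, constants as `("c", value)` tuples) and the function names are assumptions made for illustration.

```python
def is_var(t):
    """Variables are plain strings; constants are wrapped as ("c", value)."""
    return isinstance(t, str)

def unify(args2, args1, f):
    """Extend f so that f(args2) == args1; return None if impossible."""
    f = dict(f)
    for a2, a1 in zip(args2, args1):
        if is_var(a2):
            if f.setdefault(a2, a1) != a1:   # already mapped elsewhere
                return None
        elif a2 != a1:                       # constants map to themselves
            return None
    return f

def find_homomorphism(q2, q1):
    """Return f : var(q2) -> terms of q1 witnessing q1 ⊆ q2, or None."""
    head2, body2 = q2
    head1, body1 = q1
    if len(head1) != len(head2):
        return None
    start = unify(head2, head1, {})          # f(head of q2) = head of q1
    if start is None:
        return None

    def search(i, f):
        if i == len(body2):                  # every atom of q2 is mapped
            return f
        rel, args = body2[i]
        for rel1, args1 in body1:
            if rel1 == rel and len(args1) == len(args):
                g = unify(args, args1, f)
                if g is not None:
                    result = search(i + 1, g)
                    if result is not None:
                        return result
        return None                          # backtrack

    return search(0, start)

def contained_in(q1, q2):
    """q1 ⊆ q2 iff there is a homomorphism from q2 to q1."""
    return find_homomorphism(q2, q1) is not None

# The two queries from the earlier containment example.
q1 = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "y"))])
q2 = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "w")),
                   ("R", ("t", "w")), ("R", ("t", "y"))])
```

Searching over the body of q1 is the same as evaluating q2 on the canonical database Dq1 and checking that tq1 is in the answer. A successful mapping is also a polynomial-size certificate, which is one way to see membership in NP.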