1
CSE 544: Lecture 8 Theory
2
Review of Relational Algebra
Five basic RA operators: ∪, -, σ, π, ×
3
Find all employees with salary more than $40,000.
σ_{Salary > 40000}(Employee)
4
π_{SSN, Name}(Employee)
6
Renaming Example
Employee(Name, SSN), with one row each for John and Tony
ρ_{LastName, SocSocNo}(Employee) has schema (LastName, SocSocNo) and the same two rows, John and Tony
7
Natural Join Example
Employee(Name, SSN), with rows for John and Tony
Dependents(SSN, Dname), with rows for Emily and Joe
Employee ⋈ Dependents = π_{Name, SSN, Dname}(σ_{SSN=SSN2}(Employee × ρ_{SSN2, Dname}(Dependents)))
Result schema (Name, SSN, Dname): John is paired with Emily, Tony with Joe
8
Natural Join
R has columns A, B (values X, Y, Z, V); S has columns B, C (values Z, U, V, W); R ⋈ S has columns A, B, C
9
Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
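The three questions above can be explored with a minimal sketch of the natural join. The function name `natural_join` and the sample tuples are illustrative assumptions, not part of the lecture; relations are modeled as a list of attribute names plus a set of tuples.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Natural join of two relations, each given as (attribute list, set of tuples)."""
    shared = [a for a in r_schema if a in s_schema]
    # Output schema: R's attributes, followed by S's attributes not already in R.
    out_schema = list(r_schema) + [a for a in s_schema if a not in r_schema]
    out_rows = set()
    for r in r_rows:
        r_map = dict(zip(r_schema, r))
        for s in s_rows:
            s_map = dict(zip(s_schema, s))
            # Keep the pair only if the tuples agree on every shared attribute.
            if all(r_map[a] == s_map[a] for a in shared):
                merged = {**s_map, **r_map}
                out_rows.add(tuple(merged[a] for a in out_schema))
    return out_schema, out_rows

# R(A, B, C, D) and S(A, C, E): shared attributes appear once in the result.
schema1, _ = natural_join(["A", "B", "C", "D"], set(), ["A", "C", "E"], set())

# R(A, B, C) and S(D, E): nothing shared, so the join is a cross product.
schema2, rows2 = natural_join(["A", "B", "C"], {(1, 2, 3)},
                              ["D", "E"], {(4, 5), (6, 7)})

# R(A, B) and S(A, B): everything shared, so the join is an intersection.
schema3, rows3 = natural_join(["A", "B"], {(1, 2), (3, 4)},
                              ["A", "B"], {(3, 4), (5, 6)})
```

With these made-up inputs, the first two joins both have schema (A, B, C, D, E), and the third keeps only the tuple present in both relations.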
10
Today’s Outline
First Order Logic as a Query Language (reading assignment: 4.3)
Query languages and Complexity Classes
Conjunctive queries
11
First Order Logic: Syntax
Given:
A vocabulary: R1, …, Rk
An arity, ar(Ri), for each i=1,…,k
An infinite supply of variables x1, x2, x3, …
Constants: c1, c2, c3, ...
FO formulas, φ, are:
φ ::= R(t1, ..., t_ar(R)) | ti = tj | φ ∧ φ' | φ ∨ φ' | ¬φ' | ∀x.φ | ∃x.φ
t ::= x | c
12
Examples of Formulas
Most interesting case: vocabulary = one binary relation R (encodes a graph)
[Figure: a directed graph on the nodes 1, 2, 3, 4, next to its edge relation R]
13
Examples of Closed Formulas
Does there exist a loop in the graph? ∃x.R(x,x)
Are there paths of length >2? ∃x.∃y.∃z.∃u.(R(x,y) ∧ R(y,z) ∧ R(z,u))
Is there a "sink" node? ∃x.∀y.R(x,y)
14
Examples of Closed Formulas
Is there a clique of size 4?
∃x1.∃x2.∃x3.∃x4.(x1≠x2 ∧ x1≠x3 ∧ ... ∧ x3≠x4 ∧ R(x1,x2) ∧ R(x1,x3) ∧ ... ∧ R(x3,x4))
Here x1≠x2 stands for ¬(x1=x2), etc.
Is the graph transitively closed?
∀x.∀y.∀z.(R(x,y) ∧ R(y,z) ⇒ R(x,z))
Here A ⇒ B stands for ¬A ∨ B
15
Examples of Open Formulas
Find all nodes connected by a path of length 2: φ(x,y) ≡ ∃u.(R(x,u) ∧ R(u,y))
Find all nodes without outgoing edges: φ(x) ≡ ¬∃y.R(x,y)
16
More Examples
Vocabulary (= schema): Employee(name, office, mgr), Manager(name, office)
Queries:
Find offices: φ(y) ≡ (∃x.∃z.Employee(x,y,z) ∨ ∃x.Manager(x,y))
Find offices with at least two employees: φ(y) ≡ ∃x.∃z.∃x'.∃z'.(Employee(x,y,z) ∧ Employee(x',y,z') ∧ x≠x')
Find managers that share office with all their employees: [to do in class]
17
First Order Logic: Semantics
Given a vocabulary R1, …, Rk
A model is D = (D, R1^D, …, Rk^D)
D = a set, called domain, or universe
Ri^D ⊆ D × D × ... × D (ar(Ri) times), i = 1,...,k
18
First Order Logic: Semantics
Given:
A model D = (D, R1^D, ..., Rk^D)
A formula φ
A substitution s : {x1, x2, ...} → D
We define next the relation D |= φ[s], meaning "D satisfies φ with s"
19
First Order Logic: Semantics
D |= (R(t1, ..., tn)) [s] if (s(t1), ..., s(tn)) ∈ R^D
D |= (t = t') [s] if s(t) = s(t')
20
First Order Logic: Semantics
D |= (φ ∧ φ') [s] if D |= φ[s] and D |= φ'[s]
D |= (φ ∨ φ') [s] if D |= φ[s] or D |= φ'[s]
D |= (¬φ) [s] if not D |= φ[s]
21
First Order Logic: Semantics
D |= (∀x.φ) [s] if for all s' s.t. s(y) = s'(y) for all variables y other than x, D |= φ[s']
D |= (∃x.φ) [s] if for some s' s.t. s(y) = s'(y) for all variables y other than x, D |= φ[s']
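The satisfaction relation defined over the last few slides can be sketched as a recursive evaluator for finite models. Everything about the encoding is an assumption of this sketch: formulas are nested tuples, strings are variables, non-strings are constants, and the four-node graph is made up for illustration (it is not the graph from the earlier slide).

```python
def term_value(t, s):
    # Assumption of this sketch: strings are variables, anything else is a constant.
    return s[t] if isinstance(t, str) else t

def satisfies(model, phi, s):
    """Return True iff the (finite) model satisfies phi under substitution s."""
    domain, relations = model
    op = phi[0]
    if op == "atom":                       # R(t1, ..., tn)
        _, name, terms = phi
        return tuple(term_value(t, s) for t in terms) in relations[name]
    if op == "eq":                         # t = t'
        return term_value(phi[1], s) == term_value(phi[2], s)
    if op == "and":
        return satisfies(model, phi[1], s) and satisfies(model, phi[2], s)
    if op == "or":
        return satisfies(model, phi[1], s) or satisfies(model, phi[2], s)
    if op == "not":
        return not satisfies(model, phi[1], s)
    if op in ("forall", "exists"):
        _, x, body = phi
        # s' agrees with s on every variable other than x.
        outcomes = (satisfies(model, body, {**s, x: d}) for d in domain)
        return all(outcomes) if op == "forall" else any(outcomes)
    raise ValueError(f"unknown connective: {op!r}")

def R(a, b):
    return ("atom", "R", (a, b))

# Hypothetical graph: edges 1->2, 2->3, 3->4 and a self-loop on 4.
graph = ({1, 2, 3, 4}, {"R": {(1, 2), (2, 3), (3, 4), (4, 4)}})

loop = ("exists", "x", R("x", "x"))
path = ("exists", "x", ("exists", "y", ("exists", "z", ("exists", "u",
        ("and", R("x", "y"), ("and", R("y", "z"), R("z", "u")))))))
# A => B is encoded as (not A) or B.
closed = ("forall", "x", ("forall", "y", ("forall", "z",
          ("or", ("not", ("and", R("x", "y"), R("y", "z"))), R("x", "z")))))

has_loop = satisfies(graph, loop, {})
has_long_path = satisfies(graph, path, {})
is_transitively_closed = satisfies(graph, closed, {})
```

On this sample graph the loop and path formulas hold, while transitive closure fails because (1,2) and (2,3) are edges but (1,3) is not.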
22
FO and Databases
FO                                    Databases
Vocabulary: R1, ..., Rk               Database schema: R1, ..., Rk
Model: D = (D, R1^D, …, Rk^D)         Database instance: D = (D, R1^D, …, Rk^D)
Formulas are true or false            Formulas compute queries
23
FO and Databases
FO: a closed formula φ is true in D if D |= φ
Databases: a formula φ with free variables x1, ..., xn defines the query:
φ(D) = {(s(x1), ..., s(xn)) | D |= φ[s]}
24
FO and Databases
The Relational Calculus: the query language consisting of FO
The Tuple Calculus: a minor variation on the relational calculus; uses tuple variables instead of atomic variables (reading assignment 4.3)
But some "queries" in these languages make no sense
Define safe queries next
25
Safe Queries
A model D = (D, R1^D, …, Rk^D)
In FO: both D and R1^D, …, Rk^D may be infinite
In databases: D may be infinite (int, string, etc.), but R1^D, …, Rk^D are always finite
We call this a finite model
26
Safe Queries
φ is a finite query if for every finite model D, φ(D) is finite
φ is a domain independent query if for every two finite models D, D' having the same relations:
D = (D, R1^D, …, Rk^D), D' = (D', R1^D, …, Rk^D)
we have φ(D) = φ(D')
Domain independent query aka safe query
Notice: book has different but equivalent definition
27
Unsafe Relational Queries
Find all nodes that are not in the graph:
Find all nodes that are connected to "everything":
Find all pairs of employees or offices:
We don't want such queries!
Finite, but not safe
28
Safe Queries
Definition. Given D = (D, R1^D, …, Rk^D), the active domain is Da = the set of all constants in R1^D, …, Rk^D
Example. Given a graph D = (D, R): Da = { x | ∃y.R(x,y) ∨ ∃z.R(z,x) }
Property. If a query is safe, it suffices to range quantifiers only over the active domain (why?)
Hence we can compute safe queries
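A small sketch of the active domain, and of why domain dependence is a problem. As an assumption, "connected to everything" is formalized here as φ(x) ≡ ∀y.R(x,y); that is one possible reading, and the edge set and function names are made up for illustration.

```python
def active_domain(relations):
    """All constants that occur in some tuple of some relation."""
    return {c for rel in relations.values() for tup in rel for c in tup}

def connected_to_everything(domain, R):
    # Assumed formalization: phi(x) = forall y. R(x, y),
    # with both x and the quantifier ranging over `domain`.
    return {x for x in domain if all((x, y) in R for y in domain)}

R = {(1, 1), (1, 2)}                      # hypothetical edge relation
Da = active_domain({"R": R})

# Same relation, two different domains: the answers disagree,
# so this query is not domain independent.
answer_small = connected_to_everything({1, 2}, R)
answer_large = connected_to_everything({1, 2, 3}, R)

# Ranging only over the active domain gives an answer that depends on R alone.
answer_active = connected_to_everything(Da, R)
```

Adding the unused element 3 to the domain empties the answer, even though the relation R did not change.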
29
Safe Queries
The safe relational calculus consists only of safe queries. However:
Theorem. It is undecidable whether a given FO query is safe.
Workaround: define a subset of FO queries that we know are safe = range restricted queries (rr-query)
Any safe query is equivalent to some rr-query
30
Range-restriction
A syntactic condition on queries
See [AHU] for a definition. The intuition:
Range Restricted Queries        Non-Range Restricted
S(x) ∧ ¬R(x,x)                  ¬R(x,x)
S(x) ∨ T(x)                     S(x) ∨ T(y)
∃x.(S(x) ∧ ¬R(x,x))             ∃x.(¬R(x,x))
∀x.(S(x) ⇒ R(x,x))              ∀x.(R(x,x))
31
Range-restriction
Theorem. Every safe query is equivalent to a range-restricted query
"Proof". Translate as follows:
¬R(x, y, ...)  ↦  x ∈ Da ∧ y ∈ Da ∧ ... ∧ ¬R(x, y, ...)
∃x.φ  ↦  ∃x.(x ∈ Da ∧ φ)
∀x.φ  ↦  ∀x.(x ∈ Da ⇒ φ)
From now on we assume that all queries are safe
32
FO = Relational Algebra
Recall the 5 operators in the relational algebra: ∪, -, ×, σ, π
Theorem. A query can be defined in the safe relational calculus iff it can be defined in the relational algebra
33
FO = Relational Algebra
Proof [in class]
34
Limited Expressive Power
Vocabulary: binary relation R
The following queries cannot be expressed in FO:
Transitive closure: ∀x.∀y. there exists x1, ..., xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
Parity: the number of edges in R is even
35
Extensions of FO
FO(LFP) = FO extended with least fixpoint:
Example: define transitive closure like: T(x,y) = R(x,y) ∨ ∃z.(R(x,z) ∧ T(z,y))
Meaning. Define:
T0 := ∅
Tn+1 = {(x,y) | R(x,y) ∨ ∃z.(R(x,z) ∧ Tn(z,y))}
T0 ⊆ T1 ⊆ T2 ⊆ ... ⊆ Tk-1 = Tk, stop. The answer is: Tk
Q: How many steps do we need in the iteration?
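The iteration can be sketched directly: start from the empty relation and reapply the defining rule until nothing changes. The function name, the step counter, and the three-edge sample graph are assumptions made for illustration.

```python
def transitive_closure(R):
    """Least fixpoint of T(x,y) = R(x,y) or exists z.(R(x,z) and T(z,y))."""
    T = set()                              # T0 is the empty relation
    steps = 0
    while True:
        # One application of the rule, computing T_{n+1} from T_n.
        T_next = set(R) | {(x, y) for (x, z) in R for (z2, y) in T if z == z2}
        if T_next == T:                    # T_{k-1} = T_k: fixpoint reached
            return T, steps
        T = T_next
        steps += 1

# Hypothetical chain graph 1 -> 2 -> 3 -> 4.
closure, steps = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Each round can only add pairs, and only finitely many pairs exist over a finite set of nodes, so the loop is guaranteed to stop.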
36
Computational Complexity Classes
Recall computational complexity classes:
AC0 ⊆ LOGSPACE ⊆ NLOGSPACE ⊆ PTIME ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ EXPSPACE ⊆ (Kalmar) Elementary Functions ⊆ Turing Computable functions
We care mostly about these
37
Query Languages and Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
PSPACE: FO(PFP)
PTIME: FO(LFP)
AC0: FO
Important: the more complex a QL, the harder it is to optimize
38
Conjunctive Queries
Definition. A conjunctive query is a FO formula restricted to R(t1, ..., tn), ∧, ∃ (missing are ∀, ∨, ¬)
CQ = all conjunctive queries
Any CQ query can be written as:
∃x1.∃x2...∃xn.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
Same in Datalog notation:
A(x1,...,xn) :- R1(t11,...,t1m), ... , Rk(tk1,...,tkm)
39
Examples
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same manager as Smith:
A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
40
Examples
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same director as Smith:
A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
41
Equivalent Formulations
Relational Algebra: Conjunctive queries correspond precisely to σ_C, π_A, × (missing: ∪, -)
A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
π_{$2.name}(σ_{$1.manager=$2.manager}(σ_{name="Smith"}(ManagedBy) × ManagedBy))
42
Equivalent Formulations
SQL: Conjunctive queries correspond to single select-distinct-from-where blocks with equality conditions in the WHERE clause
select distinct m2.name
from ManagedBy m1, ManagedBy m2
where m1.name='Smith' AND m1.manager=m2.manager
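To try the select-distinct-from-where block, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The column names `name` and `manager` follow the query above; the three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagedBy (name TEXT, manager TEXT)")
# Hypothetical data: Smith and Lee report to Jones, Kim reports to Park.
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?)",
    [("Smith", "Jones"), ("Lee", "Jones"), ("Kim", "Park")],
)

rows = conn.execute(
    """
    SELECT DISTINCT m2.name
    FROM ManagedBy m1, ManagedBy m2
    WHERE m1.name = 'Smith' AND m1.manager = m2.manager
    """
).fetchall()
names = sorted(row[0] for row in rows)
conn.close()
```

Note that Smith appears in his own answer, since the query only asks for a shared manager.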
43
Conjunctive Queries
Most useful class of queries
Also enjoys remarkable, positive properties
Focus of research during the 70's and 80's
Still a focus of research in the 00's
We discuss the most celebrated property of conjunctive queries: containment is decidable
44
Query Containment
Definition. Given two queries q1, q2, we say that q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D).
Notation: q1 ⊆ q2
Obviously: if q1 ⊆ q2 and q2 ⊆ q1 then q1 = q2.
45
Examples of Query Containments
q1(x) :- R(x,u), R(u,v), R(v,w)
q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,"Smith")
q1(x) :- R(x,u), R(u,u)
In all cases: q1 ⊆ q2
46
Examples of Query Containments
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
Then q1 ⊆ q2 (why?)
47
Examples of Query Containments
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
Then q1 ⊆ q2 (why?)
48
Query Containment
Theorem. Query containment for FO is undecidable.
Theorem. Query containment for CQ is decidable and NP-complete.
49
Query Containment
The most interesting part: how we check q1 ⊆ q2
The canonical database and the canonical tuple for q1:
Canonical database: Dq1 = (D, R1, …, Rk) where:
D = all variables and constants in q1
R1, …, Rk = the body of q1
Canonical tuple: tq1 = the head of q1
50
Examples of Canonical Databases
q1(x,y) :- R(x,u), R(v,u), R(v,y)
Dq1 = (D, R)
D = {x, y, u, v}
R = {(x,u), (v,u), (v,y)}
tq1 = (x,y)
51
Examples of Canonical Databases
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
Dq1 = (D, R)
D = {x, u, "Smith", "Fred"}
R = {(x,u), (u,"Smith"), (u,"Fred"), (u,u)}
tq1 = (x)
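Building the canonical database is mechanical: each atom in the body becomes a tuple, with variables treated as ordinary values. In this sketch a query is encoded as a pair (head, body). The encoding, the function name, and the convention that constants carry their double quotes are all assumptions made for illustration.

```python
def canonical_database(query):
    """Return (domain, relations, canonical tuple) for a conjunctive query."""
    head, body = query
    # Domain: every variable and constant that occurs in the body.
    domain = {t for _, args in body for t in args}
    # Relations: each body atom R(t1, ..., tn) contributes the tuple (t1, ..., tn).
    relations = {}
    for rel, args in body:
        relations.setdefault(rel, set()).add(tuple(args))
    return domain, relations, tuple(head)

# q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q1 = (("x",), [("R", ("x", "u")),
               ("R", ("u", '"Smith"')),
               ("R", ("u", '"Fred"')),
               ("R", ("u", "u"))])

domain, relations, canonical_tuple = canonical_database(q1)
```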
52
Checking Containment
Theorem: q1 ⊆ q2 iff tq1 ∈ q2(Dq1).
Example:
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
D = {x, y, u, v}
R = {(x,u), (v,u), (v,y)}
tq1 = (x,y)
Yes, q1 ⊆ q2
53
Query Homomorphisms
How do we evaluate q2 on Dq1?
A homomorphism f : q2 → q1 is a function f: var(q2) → var(q1) ∪ const(q1) such that:
f(body(q2)) ⊆ body(q1)
f(canonicalTuple(q2)) = canonicalTuple(q1)
54
Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
55
Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, "Smith", "Fred"}
var(q2) = {x, u, v, w}
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
56
The Homomorphism Theorem
Theorem. q1 ⊆ q2 iff there exists a homomorphism from q2 to q1.
Theorem. Conjunctive query containment is:
(1) decidable (why?)
(2) in NP (why?)
(3) NP-hard
In short: it is NP-complete
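The theorem suggests a brute-force containment test: try every assignment of q2's variables to the terms of q1 and check the two homomorphism conditions. The sketch below is exponential, in line with NP-completeness. As assumptions of this sketch, queries are encoded as (head, body) pairs and constants are strings that carry their double quotes.

```python
from itertools import product

def is_var(t):
    # Assumed convention: constants are written with surrounding double quotes.
    return not t.startswith('"')

def find_homomorphism(q2, q1):
    """Return some homomorphism from q2 to q1 as a dict, or None if none exists."""
    head2, body2 = q2
    head1, body1 = q1
    if len(head1) != len(head2):
        return None
    vars2 = sorted({t for _, args in body2 for t in args if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    targets = sorted({t for _, args in body1 for t in args})   # var(q1) and const(q1)
    atoms1 = {(rel, tuple(args)) for rel, args in body1}
    for image in product(targets, repeat=len(vars2)):
        f = dict(zip(vars2, image))
        def apply(t, f=f):
            return f.get(t, t)             # constants are mapped to themselves
        if tuple(apply(t) for t in head2) != tuple(head1):
            continue                       # head of q2 must map onto head of q1
        if all((rel, tuple(apply(t) for t in args)) in atoms1 for rel, args in body2):
            return f                       # every atom of q2 lands in body(q1)
    return None

def contained_in(q1, q2):
    """q1 is contained in q2 iff there is a homomorphism from q2 to q1."""
    return find_homomorphism(q2, q1) is not None

q1a = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "y"))])
q2a = (("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "w")),
                    ("R", ("t", "w")), ("R", ("t", "y"))])
q1b = (("x",), [("R", ("x", "u")), ("R", ("u", '"Smith"')),
                ("R", ("u", '"Fred"')), ("R", ("u", "u"))])
q2b = (("x",), [("R", ("x", "u")), ("R", ("u", "v")),
                ("R", ("u", '"Smith"')), ("R", ("w", "u"))])

first_holds = contained_in(q1a, q2a)
reverse_holds = contained_in(q2a, q1a)
second_holds = contained_in(q1b, q2b)
```

For the first pair, one witness is x ↦ x, u ↦ u, v ↦ v, w ↦ u, t ↦ v, y ↦ y. The reverse containment fails for these two queries, because no assignment sends the body of q1 into the body of q2.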