Published by Tapani Mäkinen. Modified over 5 years ago.
1
CSE 544: Lecture 9
Conjunctive Queries, Views, Datalog
Monday, 4/29/2002
2
Conjunctive Queries
A conjunctive query is an FO formula containing:
R(t1, ..., tn), ∧, ∃ (missing are ∨, ¬, ∀)
CQ = set of conjunctive queries
Example: q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))
3
Conjunctive Queries
Any CQ can be written as: (why ?)
q(x1,...,xn) = ∃y1.∃y2...∃yp.(R1(t11,...,t1m) ∧ ... ∧ Rk(tk1,...,tkm))
Same in Datalog notation (Datalog rule):
q(x1,...,xn) :- R1(t11,...,t1m), ... , Rk(tk1,...,tkm)
head: q(x1,...,xn)
body: R1(t11,...,t1m), ... , Rk(tk1,...,tkm)
4
Examples
Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same manager as Smith:
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
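To make the rule above concrete, here is a minimal Python sketch of a naive conjunctive-query evaluator. It is not part of the lecture: the function names `is_var` and `eval_cq`, the tuple encoding of atoms, the sample data, and the convention that lowercase-initial strings are variables are all assumptions made for illustration.

```python
def is_var(term):
    # Convention assumed by this sketch: lowercase-initial strings are
    # variables; everything else (e.g. "Smith") is a constant.
    return isinstance(term, str) and term[:1].islower()


def eval_cq(head, body, db):
    """Evaluate a conjunctive query by extending variable bindings atom by atom."""
    bindings = [{}]
    for rel, terms in body:
        extended = []
        for env in bindings:
            for row in db.get(rel, ()):
                new_env = dict(env)
                # A row matches if every constant agrees and every variable
                # is either unbound or already bound to the same value.
                if len(row) == len(terms) and all(
                    new_env.setdefault(t, v) == v if is_var(t) else t == v
                    for t, v in zip(terms, row)
                ):
                    extended.append(new_env)
        bindings = extended
    return {tuple(env[t] if is_var(t) else t for t in head) for env in bindings}


# Hypothetical sample data: (employee, manager) pairs.
db = {"ManagedBy": {("Smith", "Jones"), ("Lee", "Jones"), ("Kim", "Park")}}

# A(x) :- ManagedBy("Smith", y), ManagedBy(x, y)
same_manager = eval_cq(
    head=("x",),
    body=[("ManagedBy", ("Smith", "y")), ("ManagedBy", ("x", "y"))],
    db=db,
)
print(sorted(same_manager))  # [('Lee',), ('Smith',)]
```

Note that Smith appears in the answer, since the rule does not require x to differ from “Smith”.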
5
Examples
Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same director as Smith:
A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
6
CQ and Relational Algebra
Conjunctive queries correspond precisely to σC, ΠA, × (missing: ∪, –)
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
Π$2.name( σname=“Smith”(ManagedBy) ⋈$1.manager=$2.manager ManagedBy )
7
CQ and SQL
Conjunctive queries correspond to single select-distinct-from-where blocks with equality conditions in the WHERE clause:
select distinct m2.name
from ManagedBy m1, ManagedBy m2
where m1.name='Smith' AND m1.manager=m2.manager
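The select-distinct-from-where block can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the column names `name` and `manager` follow the slide, while the in-memory table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagedBy (name TEXT, manager TEXT)")
# Hypothetical sample rows: (employee name, manager name).
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?)",
    [("Smith", "Jones"), ("Lee", "Jones"), ("Kim", "Park")],
)

# Self-join of ManagedBy with equality conditions only, as in the CQ
# A(x) :- ManagedBy("Smith", y), ManagedBy(x, y).
rows = conn.execute(
    """
    SELECT DISTINCT m2.name
    FROM ManagedBy m1, ManagedBy m2
    WHERE m1.name = 'Smith' AND m1.manager = m2.manager
    """
).fetchall()
same_manager = sorted(name for (name,) in rows)
print(same_manager)  # ['Lee', 'Smith']
conn.close()
```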
8
Conjunctive Queries
Main focus of optimization techniques
Focus of research during the 70’s and 80’s
Still a focus of research in the 00’s
Properties of CQ:
Containment is decidable [Chandra&Merlin’77]
Query rewriting using views [Levy et al.’95, Ullman’99]
9
Query Containment
Query q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D).
Notation: q1 ⊆ q2
Obviously: if q1 ⊆ q2 and q2 ⊆ q1 then q1 = q2.
10
Examples of Query Containments
In which cases is q1 ⊆ q2 ?
q1(x) :- R(x,u), R(u,v), R(v,w)
q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,v), R(v,x)
q2(x) :- R(x,u), R(u,x)
q1(x) :- R(x,u), R(u,u)
q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,“Smith”)
q2(x) :- R(x,u), R(u,v)
11
Query Containment
Theorem: Query containment for FO is undecidable.
Theorem: Query containment for CQ is decidable and NP-complete.
12
Query Containment Algorithm
How to check q1 ⊆ q2
Canonical database for q1 is: Dq1 = (D, R1^D, …, Rk^D)
D = all variables and constants in q1
R1^D, …, Rk^D = the body of q1
Canonical tuple for q1 is: tq1 (the head of q1)
13
Examples of Canonical Databases
q1(x,y) :- R(x,u),R(v,u),R(v,y)
Canonical database: Dq1 = (D, R^D)
D = {x,y,u,v}
R^D = {(x,u), (v,u), (v,y)}
Canonical tuple: tq1 = (x,y)
14
Examples of Canonical Databases
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
Dq1 = (D, R)
D = {x,u,“Smith”,“Fred”}
R = {(x,u), (u,“Smith”), (u,“Fred”), (u,u)}
tq1 = (x)
15
Checking Containment
Theorem: q1 ⊆ q2 iff tq1 ∈ q2(Dq1).
Example:
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
D = {x,y,u,v}
R = {(x,u), (v,u), (v,y)}
tq1 = (x,y)
Yes, q1 ⊆ q2
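The canonical-database test can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's code: the names `canonical_db`, `answers`, and `contained`, the encoding of a query as a `(head, body)` pair, and the convention that lowercase-initial strings are variables are assumptions. Freezing q1 simply means treating its variables as ordinary values in the database.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables,
    # everything else (e.g. "Smith") is a constant.
    return isinstance(term, str) and term[:1].islower()


def canonical_db(body):
    """Freeze a query body: every atom becomes a tuple of the database."""
    db = {}
    for rel, terms in body:
        db.setdefault(rel, set()).add(tuple(terms))
    return db


def answers(head, body, db, env=None):
    """Yield the head tuples of a CQ over db by backtracking over its atoms."""
    env = env or {}
    if not body:
        yield tuple(env[t] if is_var(t) else t for t in head)
        return
    (rel, terms), rest = body[0], body[1:]
    for row in db.get(rel, ()):
        new_env = dict(env)
        if len(row) == len(terms) and all(
            new_env.setdefault(t, v) == v if is_var(t) else t == v
            for t, v in zip(terms, row)
        ):
            yield from answers(head, rest, db, new_env)


def contained(q1, q2):
    """q1 ⊆ q2 iff the canonical tuple of q1 is in q2(D_q1)."""
    (head1, body1), (head2, body2) = q1, q2
    return tuple(head1) in set(answers(head2, list(body2), canonical_db(body1)))


def R(a, b):
    return ("R", (a, b))


q1 = (("x", "y"), [R("x", "u"), R("v", "u"), R("v", "y")])
q2 = (("x", "y"), [R("x", "u"), R("v", "u"), R("v", "w"), R("t", "w"), R("t", "y")])
print(contained(q1, q2))  # True
print(contained(q2, q1))  # False
```

The search in `answers` is exponential in the worst case, which is consistent with containment being NP-complete.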
16
Query Homomorphisms
A homomorphism f : q2 → q1 is a function f: var(q2) → var(q1) ∪ const(q1) such that:
f(body(q2)) ⊆ body(q1)
f(tq2) = tq1
The Homomorphism Theorem: q1 ⊆ q2 iff there exists a homomorphism f : q2 → q1
17
Example of Query Homomorphism
var(q1) = {x, u, v, y}
var(q2) = {x, u, v, w, t, y}
q1(x,y) :- R(x,u),R(v,u),R(v,y)
q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)
One homomorphism f : q2 → q1 is: x → x, u → u, v → v, w → u, t → v, y → y
18
Example of Query Homomorphism
var(q1) ∪ const(q1) = {x, u, “Smith”, “Fred”}
var(q2) = {x,u,v,w}
q1(x) :- R(x,u), R(u,“Smith”), R(u,“Fred”), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,“Smith”), R(w,u)
One homomorphism f : q2 → q1 is: x → x, u → u, v → “Smith”, w → x
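A homomorphism can also be searched for directly by backtracking. The sketch below is an illustration under assumptions: `find_homomorphism`, the `(head, body)` encoding of queries, and the rule that lowercase-initial strings are variables are not from the lecture. It maps each atom of q2 onto some atom of q1 while keeping the variable mapping consistent and sending the head of q2 to the head of q1.

```python
def is_var(term):
    # Assumed convention: lowercase-initial strings are variables.
    return isinstance(term, str) and term[:1].islower()


def find_homomorphism(q2, q1):
    """Return a dict f: var(q2) -> var(q1) ∪ const(q1), or None if none exists."""
    (head2, body2), (head1, body1) = q2, q1
    target = sorted({(rel, tuple(terms)) for rel, terms in body1})

    def unify(terms, image, f):
        # Extend f so that f(terms) == image; return None on a conflict.
        if len(terms) != len(image):
            return None
        f = dict(f)
        for t, v in zip(terms, image):
            if is_var(t):
                if f.setdefault(t, v) != v:
                    return None
            elif t != v:  # constants must be mapped to themselves
                return None
        return f

    def search(atoms, f):
        if not atoms:
            return f
        (rel, terms), rest = atoms[0], atoms[1:]
        for rel1, image in target:
            if rel1 == rel:
                g = unify(terms, image, f)
                if g is not None:
                    result = search(rest, g)
                    if result is not None:
                        return result
        return None

    start = unify(tuple(head2), tuple(head1), {})
    return None if start is None else search(list(body2), start)


def R(a, b):
    return ("R", (a, b))


q1 = (("x",), [R("x", "u"), R("u", "Smith"), R("u", "Fred"), R("u", "u")])
q2 = (("x",), [R("x", "u"), R("u", "v"), R("u", "Smith"), R("w", "u")])

f = find_homomorphism(q2, q1)  # some homomorphism, so q1 ⊆ q2
g = find_homomorphism(q1, q2)  # None: R(u, "Fred") has no image in q2
```

Several homomorphisms exist for this pair (v may go to “Smith”, “Fred”, or u, and w to x or u); the sketch returns whichever one the search reaches first.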
19
The Homomorphism Theorem
Theorem: q1 ⊆ q2 iff there exists a homomorphism from q2 to q1.
Theorem: Conjunctive query containment is:
(1) decidable (why ?)
(2) in NP (why ?)
(3) NP-hard
Short: it is NP-complete
20
Views
Employee(x), ManagedBy(x,y), Manager(y)
Views:
L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
E(x,y) :- ManagedBy(x,y), Employee(y)
Query:
Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y)
How can we answer Q if we only have L and E ?
21
Views
Query rewriting using views (when possible):
Q(x,y) :- L(x,u), L(u,y), E(v,y)
Query answering:
Sometimes we cannot express it in CQ or FO, but we can still answer it
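As a sanity check on the rewriting Q(x,y) :- L(x,u), L(u,y), E(v,y), the following Python sketch materializes the views L and E over a small hypothetical instance and compares the rewriting with Q evaluated directly on the base relations. The data and variable names are assumptions for illustration; agreement on one instance is of course not a proof of equivalence.

```python
# Hypothetical instance: a management chain a -> b -> c -> d -> e,
# plus one extra person p managed by e's peer chain end.
managed_by = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("p", "e")}
employee = {"e"}

# Views, as defined on the slide:
# L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
L = {(x, y) for (x, u) in managed_by for (u2, y) in managed_by if u == u2}
# E(x,y) :- ManagedBy(x,y), Employee(y)
E = {(x, y) for (x, y) in managed_by if y in employee}

# Q evaluated directly on the base relations:
# Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y)
q_direct = {
    (x, y)
    for (x, u) in managed_by
    for (u2, v) in managed_by if u2 == u
    for (v2, w) in managed_by if v2 == v
    for (w2, y) in managed_by if w2 == w and y in employee
}

# Q answered from the views only:
# Q(x,y) :- L(x,u), L(u,y), E(v,y)
q_views = {
    (x, y)
    for (x, u) in L
    for (u2, y) in L if u2 == u
    for (v, y2) in E if y2 == y
}

print(q_direct, q_views)  # {('a', 'e')} {('a', 'e')}
```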
22
Views
Applications:
Using advanced indexes
Using replicated data
Data integration [Ullman’99]
23
Expressive Power
Vocabulary: binary relation R
The following queries cannot be expressed in FO:
Transitive closure: ∀x.∀y. there exists x1, ..., xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
Parity: the number of edges in R is even
24
Datalog
Adds recursion, so we can compute transitive closure
A datalog program (query) consists of several datalog rules:
P1(t1) :- body1
P2(t2) :- body2
...
Pn(tn) :- bodyn
25
Datalog
Terminology:
EDB = extensional database predicates: the database predicates
IDB = intensional database predicates: the new predicates constructed by the program
26
Datalog
EDBs: Employee(x), ManagedBy(x,y), Manager(y)
All higher level managers that are employees:
HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
Answer(x) :- HMngr(x), Employee(x)
IDBs: HMngr, Answer
27
Datalog
Employee(x), ManagedBy(x,y), Manager(y)
All persons:
Person(x) :- Manager(x)
Person(x) :- Employee(x)
(Person is the union of Manager and Employee)
28
Datalog
Graph: R(x,y)
P(x,y) :- R(x,u), R(u,v), R(v,y)
A(x,y) :- P(x,u), P(u,y)
Can “unfold” it into:
A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)
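Unfolding can be mechanized: replace each atom over a defined predicate by that predicate's body, substituting the actual arguments for the head variables and giving the remaining variables fresh names. The Python sketch below does one level of unfolding. The function name `unfold`, the encoding of rules, and the `name_n` scheme for fresh variables are assumptions; the sketch also assumes rule bodies contain only variables and that fresh names do not clash with existing ones.

```python
from itertools import count


def unfold(body, definitions):
    """Replace each atom over a defined predicate by the defining body.

    definitions maps a predicate name to (head_vars, body). Variables of the
    defining body that are not head variables get fresh names per occurrence.
    """
    fresh = count(1)
    result = []
    for rel, args in body:
        if rel not in definitions:
            result.append((rel, tuple(args)))
            continue
        head_vars, def_body = definitions[rel]
        rename = dict(zip(head_vars, args))  # head variables -> actual arguments
        n = next(fresh)
        for def_rel, def_args in def_body:
            result.append(
                (def_rel, tuple(rename.setdefault(v, f"{v}_{n}") for v in def_args))
            )
    return result


# P(x,y) :- R(x,u), R(u,v), R(v,y)
definitions = {
    "P": (("x", "y"), [("R", ("x", "u")), ("R", ("u", "v")), ("R", ("v", "y"))]),
}
# A(x,y) :- P(x,u), P(u,y)
unfolded = unfold([("P", ("x", "u")), ("P", ("u", "y"))], definitions)
for atom in unfolded:
    print(atom)
```

The result is a chain of six R atoms from x to y, the same as the unfolded rule on the slide up to the names of the intermediate variables.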
29
Recursion in Datalog
Graph: R(x,y)
Transitive closure:
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), R(u,y)
Transitive closure (another formulation):
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), P(u,y)
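A recursive program such as P(x,y) :- R(x,y) and P(x,y) :- P(x,u), R(u,y) can be evaluated bottom-up: start from the facts of the first rule and apply the recursive rule until no new facts appear. The Python sketch below shows this naive fixpoint iteration; the function name and the sample graph are assumptions for illustration.

```python
def transitive_closure(R):
    """Naive bottom-up evaluation of
         P(x,y) :- R(x,y)
         P(x,y) :- P(x,u), R(u,y)
    """
    P = set(R)  # first rule
    while True:
        # One application of the recursive rule to the current P.
        derived = {(x, y) for (x, u) in P for (u2, y) in R if u == u2}
        new = derived - P
        if not new:  # fixpoint reached: nothing new can be derived
            return P
        P |= new


# Hypothetical sample graph: a path 1 -> 2 -> 3 -> 4.
edges = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure(edges)
print(sorted(closure))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The loop terminates because P only grows and is bounded by the finite set of pairs of nodes.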
30
Recursion in Datalog
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Root(x)
Find out if the tree value is 0 or 1
One(x) :- Leaf1(x)
One(x) :- AND(x, y1, y2), One(y1), One(y2)
One(x) :- OR(x, y1, y2), One(y1)
One(x) :- OR(x, y1, y2), One(y2)
Answer() :- Root(x), One(x)
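The One(x) rules can be evaluated with the same bottom-up idea: keep adding nodes to One until nothing changes. The Python sketch below is an illustration; the function name `tree_value_is_one`, the node names, and the sample trees are assumptions. As in the Datalog program, Leaf0 is never consulted, because a node is 0 exactly when it is not derived as One.

```python
def tree_value_is_one(leaf1, and_nodes, or_nodes, root):
    """Bottom-up evaluation of the One(x) rules; returns Answer()."""
    one = set(leaf1)  # One(x) :- Leaf1(x)
    changed = True
    while changed:
        changed = False
        for x, y1, y2 in and_nodes:
            # One(x) :- AND(x, y1, y2), One(y1), One(y2)
            if x not in one and y1 in one and y2 in one:
                one.add(x)
                changed = True
        for x, y1, y2 in or_nodes:
            # One(x) :- OR(x, y1, y2), One(y1)   and   One(x) :- OR(x, y1, y2), One(y2)
            if x not in one and (y1 in one or y2 in one):
                one.add(x)
                changed = True
    return root in one  # Answer() :- Root(x), One(x)


# Hypothetical tree: r = OR(a, b), a = AND(l1, l2), with l1 = 1, l2 = 0.
and_nodes = {("a", "l1", "l2")}
or_nodes = {("r", "a", "b")}

value_when_b_is_one = tree_value_is_one({"l1", "b"}, and_nodes, or_nodes, "r")
value_when_b_is_zero = tree_value_is_one({"l1"}, and_nodes, or_nodes, "r")
print(value_when_b_is_one, value_when_b_is_zero)  # True False
```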
31
Exercise
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Not(x,y), Root(x)
Hint: compute both One(x) and Zero(x); here you need to use Leaf0
32
Variants of Datalog
              without recursion                              with recursion
without ¬     Non-recursive Datalog = union of CQ (why ?)    Datalog
with ¬        Non-recursive Datalog¬ = FO                    Datalog¬
33
Computational Complexity Classes
Recall computational complexity classes:
AC0
LOGSPACE
NLOGSPACE
PTIME
NP
PSPACE
EXPTIME
EXPSPACE
(Kalmar) Elementary Functions
Turing Computable functions
We care mostly about these
34
Query Languages and Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
PSPACE: FO(PFP) = datalog¬,*
PTIME: FO(LFP) = datalog¬
AC0: FO = non-rec datalog¬
Important: the more complex a QL, the harder it is to optimize