Conjunctive Queries, Views, Datalog Monday, 4/29/2002

1 CSE 544: Lecture 9: Conjunctive Queries, Views, Datalog, Monday, 4/29/2002

2 Conjunctive Queries
A conjunctive query is an FO formula built only from atoms $R(t_1, \ldots, t_n)$, $\wedge$, and $\exists$ (missing are $\vee$, $\neg$, $\forall$).
CQ = the set of conjunctive queries.
Example: $q(x,y) = \exists z.\,(R(x,z) \wedge \exists u.\,(R(z,u) \wedge R(u,y)))$
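
A minimal Python sketch of what the example formula computes, assuming a small hypothetical instance of R stored as a set of pairs. Each $\exists$ becomes an inner loop whose variable is joined on but not returned, so q(x,y) holds exactly when a path of three R-edges leads from x to y.

```python
# Hypothetical sample instance of the binary relation R (a directed graph).
R = {(1, 2), (2, 3), (3, 4), (2, 5), (5, 6)}

# q(x,y) = ∃z.(R(x,z) ∧ ∃u.(R(z,u) ∧ R(u,y)))
# z and u are existentially quantified: we loop over them but keep only (x, y).
q = {
    (x, y)
    for (x, z) in R
    for (z2, u) in R if z2 == z
    for (u2, y) in R if u2 == u
}

print(sorted(q))  # [(1, 4), (1, 6)]
```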

3 Conjunctive Queries
Any CQ can be written as (why?):
$q(x_1,\ldots,x_n) = \exists y_1.\exists y_2.\ldots\exists y_p.\,(R_1(t_{11},\ldots,t_{1m}) \wedge \ldots \wedge R_k(t_{k1},\ldots,t_{km}))$
Same in Datalog notation, as a Datalog rule (head on the left of :-, body on the right):
q(x1,...,xn) :- R1(t11,...,t1m), ..., Rk(tk1,...,tkm)
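
A hedged sketch of how one such rule can be evaluated: search for every assignment of the body variables that maps each atom onto a fact of the database, then project that assignment onto the head. The encoding is an assumption made for illustration: plain strings are variables, a hypothetical `Const` wrapper marks constants, and the database is a dict from relation names to sets of tuples. The code also assumes every head variable occurs in the body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant such as "Smith"; plain strings stand for variables (assumed encoding)."""

    value: object


def bind(term, value, env):
    """Match one term of an atom against one value of a fact, extending env in place."""
    if isinstance(term, Const):
        return term.value == value
    if term in env:
        return env[term] == value
    env[term] = value
    return True


def evaluate(head, body, db):
    """Answers of the rule head :- body on db (relation name -> set of tuples)."""
    results = set()

    def search(i, env):
        if i == len(body):
            # Every atom is matched: project the assignment onto the head.
            results.add(tuple(t.value if isinstance(t, Const) else env[t] for t in head))
            return
        rel, terms = body[i]
        for fact in db.get(rel, set()):
            new_env = dict(env)
            if len(fact) == len(terms) and all(bind(t, v, new_env) for t, v in zip(terms, fact)):
                search(i + 1, new_env)

    search(0, {})
    return results


db = {"R": {(1, 2), (2, 3), (3, 4), (2, 5), (5, 6)}}
# q(x,y) :- R(x,z), R(z,u), R(u,y)
body = [("R", ("x", "z")), ("R", ("z", "u")), ("R", ("u", "y"))]
print(sorted(evaluate(("x", "y"), body, db)))  # [(1, 4), (1, 6)]
```

The search matches the atoms left to right; a real optimizer would reorder them and use indexes, which this sketch omits.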

4 Examples Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same manager as Smith: A(x) :- ManagedBy("Smith",y), ManagedBy(x,y)
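
A quick check of the rule on a hypothetical ManagedBy instance, reading ManagedBy(x,y) as "x is managed by y":

```python
# Hypothetical instance: ManagedBy(x, y) means "x is managed by y".
ManagedBy = {
    ("Smith", "Jones"),
    ("Brown", "Jones"),
    ("Green", "Jones"),
    ("Jones", "Black"),
}

# A(x) :- ManagedBy("Smith", y), ManagedBy(x, y)
A = {
    x
    for (s, y) in ManagedBy if s == "Smith"
    for (x, y2) in ManagedBy if y2 == y
}

print(sorted(A))  # ['Brown', 'Green', 'Smith']
```

Smith appears in his own answer: the rule cannot say x ≠ "Smith", because the CQs defined above have no inequality.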

5 Examples Employee(x), ManagedBy(x,y), Manager(y)
Find all employees having the same director (manager's manager) as Smith: A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
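
The same style of sketch for the director query, again on a hypothetical instance. The join variable z is Smith's director, and u ranges over the managers who are managed by z:

```python
# Hypothetical instance: ManagedBy(x, y) means "x is managed by y".
ManagedBy = {
    ("Smith", "Jones"), ("Brown", "Jones"), ("Green", "Jones"),
    ("Gray", "White"),
    ("Jones", "Black"), ("White", "Black"),
}

# A(x) :- ManagedBy("Smith",y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
A = {
    x
    for (s, y) in ManagedBy if s == "Smith"
    for (y2, z) in ManagedBy if y2 == y
    for (x, u) in ManagedBy
    for (u2, z2) in ManagedBy if u2 == u and z2 == z
}

print(sorted(A))  # ['Brown', 'Gray', 'Green', 'Smith']
```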

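6 CQ and Relational Algebra
Conjunctive queries correspond precisely to $\sigma_C$, $\Pi_A$, $\bowtie$ (missing: $\cup$, $-$).
Example: A(x) :- ManagedBy("Smith",y), ManagedBy(x,y) corresponds to the plan
$\Pi_{\$2.\text{name}}\big(\sigma_{\text{name}=\text{"Smith"}}(\text{ManagedBy}) \bowtie_{\$1.\text{manager}=\$2.\text{manager}} \text{ManagedBy}\big)$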
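
A small Python sketch of the plan above, with relations as lists of dicts. `select`, `join`, and `project` are hypothetical helper names, and the join prefixes attributes with `$1.` and `$2.` to mirror the slide's notation:

```python
# Hypothetical instance of ManagedBy(name, manager), one dict per tuple.
ManagedBy = [
    {"name": "Smith", "manager": "Jones"},
    {"name": "Brown", "manager": "Jones"},
    {"name": "Jones", "manager": "Black"},
]


def select(rel, cond):
    """σ_C: keep the tuples that satisfy condition C."""
    return [t for t in rel if cond(t)]


def join(left, right, cond):
    """⋈_C as a nested-loop join; attributes are renamed $1.* and $2.*."""
    return [
        {**{"$1." + k: v for k, v in a.items()}, **{"$2." + k: v for k, v in b.items()}}
        for a in left
        for b in right
        if cond(a, b)
    ]


def project(rel, attrs):
    """Π_A: keep only the attributes in A; building a set removes duplicates."""
    return {tuple(t[attr] for attr in attrs) for t in rel}


# Π_{$2.name}( σ_{name="Smith"}(ManagedBy) ⋈_{$1.manager=$2.manager} ManagedBy )
plan = project(
    join(
        select(ManagedBy, lambda t: t["name"] == "Smith"),
        ManagedBy,
        lambda a, b: a["manager"] == b["manager"],
    ),
    ["$2.name"],
)
print(sorted(plan))  # [('Brown',), ('Smith',)]
```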

7 CQ and SQL
Conjunctive queries correspond to single select-distinct-from-where blocks whose WHERE clause is a conjunction of equality conditions:
select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name = 'Smith' and m1.manager = m2.manager
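
The SQL query can be run as-is with Python's built-in `sqlite3` module against a hypothetical in-memory ManagedBy table. The string constant is written 'Smith' with single quotes, the standard SQL spelling of a string literal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagedBy (name TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?)",
    [("Smith", "Jones"), ("Brown", "Jones"), ("Green", "Jones"), ("Jones", "Black")],
)

rows = conn.execute(
    """
    select distinct m2.name
    from ManagedBy m1, ManagedBy m2
    where m1.name = 'Smith' and m1.manager = m2.manager
    """
).fetchall()
print(sorted(rows))  # [('Brown',), ('Green',), ('Smith',)]
conn.close()
```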

8 Conjunctive Queries
Main focus of optimization techniques.
Focus of research during the '70s and '80s; still a focus of research in the '00s.
Properties of CQs: containment is decidable [Chandra & Merlin '77]; query rewriting using views [Levy et al. '95, Ullman '99].

9 Query Containment
Query $q_1$ is contained in $q_2$ if for every database D, $q_1(D) \subseteq q_2(D)$. Notation: $q_1 \subseteq q_2$.
Obviously: if $q_1 \subseteq q_2$ and $q_2 \subseteq q_1$, then $q_1 = q_2$.

10 Examples of Query Containments
In which cases is $q_1 \subseteq q_2$?
q1(x) :- R(x,u), R(u,v), R(v,w)   and   q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,v), R(v,x)   and   q2(x) :- R(x,u), R(u,x)
q1(x) :- R(x,u), R(u,u)   and   q2(x) :- R(x,u), R(u,v)
q1(x) :- R(x,u), R(u,"Smith")   and   q2(x) :- R(x,u), R(u,v)

11 Query Containment Theorem Query containment for FO is undecidable
Theorem Query containment for CQ is decidable and NP-complete.

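12 Query Containment Algorithm
How to check $q_1 \subseteq q_2$:
The canonical database for $q_1$ is $D_{q_1} = (D, R_1^D, \ldots, R_k^D)$, where $D$ = all variables and constants in $q_1$, and $R_1^D, \ldots, R_k^D$ = the body of $q_1$ (each atom $R_i(t_1, \ldots, t_m)$ becomes the tuple $(t_1, \ldots, t_m)$ of $R_i^D$).
The canonical tuple for $q_1$ is $t_{q_1}$ (the head of $q_1$).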
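
A minimal sketch of building the canonical database, assuming the hypothetical encoding used for rule evaluation (strings for variables, a `Const` wrapper for constants). Variables are reused directly as domain values, which is all that "freezing" the query amounts to:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant of the query; plain strings stand for variables (assumed encoding)."""

    value: object


def canonical_database(head, body):
    """Return (D, relations, t_q) for the CQ head :- body.

    Every variable and constant becomes a domain value, and every body atom
    R(t1,...,tm) becomes the tuple (t1,...,tm) of relation R.
    """
    relations = {}
    for rel, terms in body:
        relations.setdefault(rel, set()).add(tuple(terms))
    domain = set(head) | {t for _, terms in body for t in terms}
    return domain, relations, tuple(head)


# q1(x,y) :- R(x,u), R(v,u), R(v,y)
D, rels, t_q1 = canonical_database(
    ("x", "y"), [("R", ("x", "u")), ("R", ("v", "u")), ("R", ("v", "y"))]
)
print(sorted(D), sorted(rels["R"]), t_q1)
```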

13 Examples of Canonical Databases
q1(x,y) :- R(x,u), R(v,u), R(v,y)
Canonical database: $D_{q_1} = (D, R^D)$ with $D = \{x,y,u,v\}$ and $R^D = \{(x,u), (v,u), (v,y)\}$.
Canonical tuple: $t_{q_1} = (x,y)$.

14 Examples of Canonical Databases
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
$D_{q_1} = (D, R^D)$ with $D = \{x, u, \text{"Smith"}, \text{"Fred"}\}$ and $R^D = \{(x,u), (u,\text{"Smith"}), (u,\text{"Fred"}), (u,u)\}$.
$t_{q_1} = (x)$.

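15 Checking Containment
Theorem: $q_1 \subseteq q_2$ iff $t_{q_1} \in q_2(D_{q_1})$.
Example:
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)
$D = \{x,y,u,v\}$, $R^D = \{(x,u), (v,u), (v,y)\}$, $t_{q_1} = (x,y)$.
Yes, $q_1 \subseteq q_2$: the assignment $x \mapsto x, y \mapsto y, u \mapsto u, v \mapsto x, w \mapsto u, t \mapsto v$ sends every atom of q2 into $R^D$ and the head of q2 to $(x,y)$.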
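
The theorem turns into a short decision procedure: build $D_{q_1}$, evaluate $q_2$ on it, and test whether $t_{q_1}$ is among the answers. The sketch below assumes the same hypothetical `Const`/string encoding; constants of $q_1$ stay wrapped in `Const` inside the canonical database so they can never collide with variable names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant of a query; plain strings stand for variables (assumed encoding)."""

    value: object


def bind(term, value, env):
    """Match a term of q2 against a value of the canonical database."""
    if isinstance(term, Const):
        return term == value  # a constant only matches the same constant
    if term in env:
        return env[term] == value
    env[term] = value
    return True


def evaluate(head, body, db):
    """All answers of head :- body on db, by backtracking over the body atoms."""
    results = set()

    def search(i, env):
        if i == len(body):
            results.add(tuple(t if isinstance(t, Const) else env[t] for t in head))
            return
        rel, terms = body[i]
        for fact in db.get(rel, set()):
            new_env = dict(env)
            if len(fact) == len(terms) and all(bind(t, v, new_env) for t, v in zip(terms, fact)):
                search(i + 1, new_env)

    search(0, {})
    return results


def contained(q1, q2):
    """q1 ⊆ q2 iff the canonical tuple t_q1 is an answer of q2 on D_q1."""
    (head1, body1), (head2, body2) = q1, q2
    canonical_db = {}
    for rel, terms in body1:
        canonical_db.setdefault(rel, set()).add(tuple(terms))
    return tuple(head1) in evaluate(head2, body2, canonical_db)


def R(*terms):
    """Shorthand for an atom of the binary relation R."""
    return ("R", terms)


# The example above.
q1 = (("x", "y"), [R("x", "u"), R("v", "u"), R("v", "y")])
q2 = (("x", "y"), [R("x", "u"), R("v", "u"), R("v", "w"), R("t", "w"), R("t", "y")])
print(contained(q1, q2), contained(q2, q1))  # True False
```

The search over $q_2$'s atoms can take time exponential in the size of $q_2$, in line with containment being NP-complete.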

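16 Query Homomorphisms
A homomorphism $f : q_2 \to q_1$ is a function $f : \text{var}(q_2) \to \text{var}(q_1) \cup \text{const}(q_1)$ (extended to constants as the identity) such that:
$f(\text{body}(q_2)) \subseteq \text{body}(q_1)$ and $f(t_{q_2}) = t_{q_1}$.
The Homomorphism Theorem: $q_1 \subseteq q_2$ iff there exists a homomorphism $f : q_2 \to q_1$.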
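
The definition can also be checked directly. Verifying the two conditions for a given f is cheap, while finding one by brute force may try up to $|\text{var}(q_1) \cup \text{const}(q_1)|^{|\text{var}(q_2)|}$ maps. A hedged sketch of that brute-force search (same hypothetical encoding, constants mapped to themselves), applied to the queries of slide 18:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Const:
    """A constant of a query; plain strings stand for variables (assumed encoding)."""

    value: object


def terms_of(query):
    """All variables and constants of a query, in order of first appearance."""
    head, body = query
    seen = []
    for t in list(head) + [t for _, terms in body for t in terms]:
        if t not in seen:
            seen.append(t)
    return seen


def apply(f, term):
    """Constants are fixed by every homomorphism; variables are mapped by f."""
    return term if isinstance(term, Const) else f[term]


def is_homomorphism(f, q2, q1):
    """Check f(body(q2)) ⊆ body(q1) and f(t_q2) = t_q1."""
    (head2, body2), (head1, body1) = q2, q1
    atoms1 = {(rel, tuple(terms)) for rel, terms in body1}
    return tuple(apply(f, t) for t in head2) == tuple(head1) and all(
        (rel, tuple(apply(f, t) for t in terms)) in atoms1 for rel, terms in body2
    )


def find_homomorphism(q2, q1):
    """Brute force over all maps var(q2) -> var(q1) ∪ const(q1); None if none works."""
    vars2 = [t for t in terms_of(q2) if not isinstance(t, Const)]
    targets = terms_of(q1)
    for image in product(targets, repeat=len(vars2)):
        f = dict(zip(vars2, image))
        if is_homomorphism(f, q2, q1):
            return f
    return None


def R(*terms):
    """Shorthand for an atom of the binary relation R."""
    return ("R", terms)


# q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q1 = (("x",), [R("x", "u"), R("u", Const("Smith")), R("u", Const("Fred")), R("u", "u")])
# q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)
q2 = (("x",), [R("x", "u"), R("u", "v"), R("u", Const("Smith")), R("w", "u")])
print(find_homomorphism(q2, q1))
```

Any map printed here witnesses $q_1 \subseteq q_2$; in the other direction the search returns None, because no atom of $q_2$ mentions "Fred".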

17 Example of Query Homomorphism
$\text{var}(q_1) = \{x, u, v, y\}$, $\text{var}(q_2) = \{x, u, v, w, t, y\}$
q1(x,y) :- R(x,u), R(v,u), R(v,y)
q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y)

18 Example of Query Homomorphism
$\text{var}(q_1) \cup \text{const}(q_1) = \{x, u, \text{"Smith"}, \text{"Fred"}\}$, $\text{var}(q_2) = \{x, u, v, w\}$
q1(x) :- R(x,u), R(u,"Smith"), R(u,"Fred"), R(u,u)
q2(x) :- R(x,u), R(u,v), R(u,"Smith"), R(w,u)

19 The Homomorphism Theorem
Theorem: $q_1 \subseteq q_2$ iff there exists a homomorphism from $q_2$ to $q_1$.
Theorem: Conjunctive query containment is (1) decidable (why?), (2) in NP (why?), (3) NP-hard. In short: it is NP-complete.

20 Views
Employee(x), ManagedBy(x,y), Manager(y)
Views:
L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
E(x,y) :- ManagedBy(x,y), Employee(y)
Query:
Q(x,y) :- ManagedBy(x,u), ManagedBy(u,v), ManagedBy(v,w), ManagedBy(w,y), Employee(y)
How can we answer Q if we only have L and E?

21 Views
Query rewriting using views (when possible):
Q(x,y) :- L(x,u), L(u,y), E(v,y)
Query answering: sometimes we cannot express it in CQ or FO, but we can still answer it.
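
A sketch that materializes L and E on a hypothetical ManagedBy/Employee instance and compares the rewriting with Q evaluated on the base relations. Agreement on one instance only illustrates the rewriting; equivalence in general can be checked with the homomorphism theorem after expanding L and E into their definitions.

```python
# Hypothetical instance: ManagedBy(x, y) means "x is managed by y".
managed_by = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)}
employee = {5}


def compose(r, s):
    """Relational composition: {(x, z) | r(x, y) and s(y, z)}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


# Q on the base relations: a ManagedBy path of length 4 ending at an employee.
path4 = compose(compose(compose(managed_by, managed_by), managed_by), managed_by)
q_direct = {(x, y) for (x, y) in path4 if y in employee}

# Materialized views.
L = compose(managed_by, managed_by)                      # L(x,y) :- ManagedBy(x,u), ManagedBy(u,y)
E = {(x, y) for (x, y) in managed_by if y in employee}   # E(x,y) :- ManagedBy(x,y), Employee(y)

# Rewriting over the views only: Q(x,y) :- L(x,u), L(u,y), E(v,y)
y_with_e = {y for (_, y) in E}                           # the y for which some v has E(v,y)
q_views = {(x, y) for (x, y) in compose(L, L) if y in y_with_e}

print(q_direct == q_views, sorted(q_views))  # True [(1, 5)]
```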

22 Views
Applications: using advanced indexes; using replicated data; data integration [Ullman '99].

23 Expressive Power
Vocabulary: a binary relation R.
The following queries cannot be expressed in FO:
Transitive closure: $\forall x.\forall y.$ there exist $x_1, \ldots, x_n$ such that $R(x,x_1) \wedge R(x_1,x_2) \wedge \ldots \wedge R(x_{n-1},x_n) \wedge R(x_n,y)$
Parity: the number of edges in R is even.

24 Datalog
Adds recursion, so we can compute transitive closure.
A Datalog program (query) consists of several Datalog rules:
P1(t1) :- body1
P2(t2) :- body2
...
Pn(tn) :- bodyn

25 Datalog
Terminology:
EDB = extensional database predicates: the database predicates.
IDB = intensional database predicates: the new predicates constructed by the program.

26 Datalog
EDBs: Employee(x), ManagedBy(x,y), Manager(y)
All higher-level managers that are employees:
HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
Answer(x) :- HMngr(x), Employee(x)
IDBs: HMngr, Answer
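
A hypothetical instance and a direct Python evaluation of the two rules. HMngr is computed first because Answer uses it:

```python
# Hypothetical EDB instance; ManagedBy(x, y) means "x is managed by y".
employee = {"Smith", "Jones", "Brown"}
manager = {"Jones", "Black", "Brown"}
managed_by = {("Smith", "Jones"), ("Jones", "Black"), ("Black", "Brown")}

# HMngr(x) :- Manager(x), ManagedBy(y,x), ManagedBy(z,y)
hmngr = {
    x
    for x in manager
    for (y, x2) in managed_by if x2 == x
    for (_z, y2) in managed_by if y2 == y
}

# Answer(x) :- HMngr(x), Employee(x)
answer = {x for x in hmngr if x in employee}

print(sorted(hmngr), sorted(answer))  # ['Black', 'Brown'] ['Brown']
```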

27 Datalog
Employee(x), ManagedBy(x,y), Manager(y)
All persons:
Person(x) :- Manager(x)
Person(x) :- Employee(x)
Person = Manager $\cup$ Employee

28 Datalog
Graph: R(x,y)
P(x,y) :- R(x,u), R(u,v), R(v,y)
A(x,y) :- P(x,u), P(u,y)
Can "unfold" it into:
A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y)

29 Recursion in Datalog
Graph: R(x,y)
Transitive closure:
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), R(u,y)
Transitive closure (alternative program):
P(x,y) :- R(x,y)
P(x,y) :- P(x,u), P(u,y)
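
A sketch of naive bottom-up evaluation for both transitive-closure programs: start from R (the first rule) and apply the recursive rule until it derives no new pairs. Semi-naive evaluation, which joins only the pairs derived in the previous round, would avoid recomputation but is not shown.

```python
def linear_tc(R):
    """P(x,y) :- R(x,y).   P(x,y) :- P(x,u), R(u,y)."""
    P = set(R)  # first rule: every edge is a path
    while True:
        new = {(x, y) for (x, u) in P for (u2, y) in R if u2 == u}
        if new <= P:  # nothing new: fixpoint reached
            return P
        P |= new


def nonlinear_tc(R):
    """P(x,y) :- R(x,y).   P(x,y) :- P(x,u), P(u,y)."""
    P = set(R)
    while True:
        new = {(x, y) for (x, u) in P for (u2, y) in P if u2 == u}
        if new <= P:
            return P
        P |= new


R = {(1, 2), (2, 3), (3, 4)}
print(sorted(linear_tc(R)))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(linear_tc(R) == nonlinear_tc(R))  # True
```

The non-linear program roughly doubles the path length it covers per round, while the linear one extends paths by one edge per round; both reach the same fixpoint.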

30 Recursion in Datalog
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Root(x)
Find out if the tree value is 0 or 1:
One(x) :- Leaf1(x)
One(x) :- AND(x, y1, y2), One(y1), One(y2)
One(x) :- OR(x, y1, y2), One(y1)
One(x) :- OR(x, y1, y2), One(y2)
Answer() :- Root(x), One(x)
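
A hedged sketch of evaluating the One rules bottom-up on a hypothetical tree r = OR(a, b), a = AND(l1, l2), b = AND(l3, l4), where l2 is a 0-leaf and the other leaves are 1-leaves:

```python
# Hypothetical tree; and_ / or_ hold (node, left child, right child) triples.
leaf0 = {"l2"}  # not used by the One rules, which never mention Leaf0
leaf1 = {"l1", "l3", "l4"}
and_ = {("a", "l1", "l2"), ("b", "l3", "l4")}
or_ = {("r", "a", "b")}
root = {"r"}

one = set()
while True:
    derived = (
        set(leaf1)                                                         # One(x) :- Leaf1(x)
        | {x for (x, y1, y2) in and_ if y1 in one and y2 in one}           # AND rule
        | {x for (x, y1, _y2) in or_ if y1 in one}                         # OR rule, left child
        | {x for (x, _y1, y2) in or_ if y2 in one}                         # OR rule, right child
    )
    if derived <= one:  # fixpoint reached
        break
    one |= derived

answer = any(x in one for x in root)  # Answer() :- Root(x), One(x)
print(answer)  # True
```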

31 Exercise
Boolean trees: Leaf0(x), Leaf1(x), AND(x, y1, y2), OR(x, y1, y2), Not(x,y), Root(x)
Hint: compute both One(x) and Zero(x); here you need to use Leaf0.

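32 Variants of Datalog
Without $\neg$, without recursion: non-recursive Datalog = union of CQs (why?)
Without $\neg$, with recursion: Datalog
With $\neg$, without recursion: non-recursive $\text{Datalog}^{\neg}$ = FO
With $\neg$, with recursion: $\text{Datalog}^{\neg}$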

33 Computational Complexity Classes
Recall the computational complexity classes: AC0, LOGSPACE, NLOGSPACE, PTIME, NP, PSPACE, EXPTIME, EXPSPACE, (Kalmar) Elementary Functions, Turing Computable Functions.
We care mostly about these.

34 Query Languages and Complexity Classes
Paper: On the Unusual Effectiveness of Logic in Computer Science
PSPACE: FO(PFP) = $\text{datalog}^{\neg,*}$
PTIME: FO(LFP) = $\text{datalog}^{\neg}$
AC0: FO = non-recursive $\text{datalog}^{\neg}$
Important: the more complex a query language, the harder it is to optimize.

