CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.

CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides from Dave Maier)

Goals for this Unit Study recursive query processing using Datalog
(Note: there are other data languages with recursion: QBE, G-Whiz, SQL:1999, LDL) Models of a Datalog program Evaluation methods Negation and recursion Applications (If time) Constraints and Typing

Example Recursive Program
Three predicates, that talk about a distributed algebra expression: Op1(Id, In, Loc): unary operation Id with input In is performed at location Loc Op2(Id, In1, In2, Loc): binary operation Id with inputs In1 and In2 is performed at location Loc Local(Id, Loc): expression with root Id can be evaluated completely at location Loc j2 Node 1 j1 s2 Node 2 s1 u1 u r s q

Clauses Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Clauses – Run this program to find the operations that are Local.
Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Clauses – Run this program to find the operations that are Local
Clauses – Run this program to find the operations that are Local. Answer. Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1). For this query: :- Local(x, y). we get: Local(s1, node1). Local(s2, node2). Local(u1, node1). Local(j1, node1). plus the first four facts in the program.

Facts versus Rules It will be useful if we can separate predicates into Extensional: defined by facts Intensional: defined by rules Notice that in our program the Local predicate has both facts and rules. How could we modify the program to get this separation?

Clauses – use new name for extensional “local” facts – shown here as Local0
Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Local(Id, Loc) :- Local0(Id, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1). Add this rule here.

Conventions Assume predicates are extensional or intensional
If I write just the intensional rules, we will assume that predicates without rules are extensional predicates. Can think of them as the database Local(Id, Loc) :- Local0(Id, Loc) Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc).

Computed Predicates Bancilhon and Ramakrishnan allow for computed predicates (An Amateur’s …) Sum(X, Y, Z): true if X + Y = Z as integers Can think of Sum as an infinite extensional predicate: Sum(1,1,2), Sum(1,2,3), Sum(1,3,4), … Sum(2,1,3), Sum(2,2,4), Sum(2,3,5), … Sum(3,1,4), … … But … this is a big problem for the evaluation engine – if we ever need to compute the full extension.

When are computed (evaluable) predicates safe
In the paper (An Amateur’s …), they define a rule to be strongly safe iff: every variable that appears in the head appears in the body and every variable that appears in an evaluable predicate also appears in at least one base predicate. Note: the paper (An Amateur’s …) uses recursive Datalog WITHOUT negation. So, this definition of safe datalog is simpler.

(extended) Example with Computed Predicate Note: this Local has 3 arguments! We’ll use it from now on. Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Are these rules strongly safe
Are these rules strongly safe? Are they evaluable – even if infinite domains? Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Are these rules strongly safe. No
Are these rules strongly safe? No. variables M and K aren’t in base predicates. Are they evaluable – even if infinite domains? Yes, we hope so. (We’ll see.) Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Ground Instance of a Clause
Given a Datalog clause C, we get a ground instance of C by replacing all variables with constants from the program Clause: Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Possible ground instances: Local(s2, node2, 2) :- Op1(s2, u, node2), Local(u, node2, 1), Sum(1, 1, 2). Local(s2, node2, 2) :- Op1(s2, r, node2), Local(r, node2, 1), Sum(1, 1, 2). Marked ones are not true.

Model of a Datalog Program
A database (aka interpretation) M for a program P is a set of facts over the predicates in P. A database M is a model for program P if It assigns each computable predicate its natural interpretation For any ground instance of a clause H :- L1, L2, …, Ln. if {L1, L2, …, Ln}  M then H  M. Note that all facts of P must be in M.

Aside: what do we mean by interpretation? First-Order theory
An infinite set of variables A (possibly empty) set of constant symbols plus {true, false} A (possibly empty) set of n-ary predicates, for n  0 A (possibly empty) set of n-ary functions, for n  1. (We don’t use functions when we formalize a DB.) Sometimes equality is included as one of the predicates, sometimes not. A set of axioms.

Aside: Example of a First-order theory
No constant symbols No functions One predicate, p Two proper axioms: (x)p(x,x) (x)(y)(z)p(x,y)p(y,z)  p(x,z)

Interpretation for a 1st Order Theory
Note: I added this slide after class; it explains what an interpretation is. An interpretation I = (U, C, P, F) where U is a nonempty set of elements (the domain) C is a function that maps the constants of the theory into elements of U P maps each n-ary predicate (from the theory) into an n-place relation over U F maps each n-ary function (from the theory) into an function from Un  U Variables range over the set U An interpretation is a model (of the theory) iff all of the axioms are true. This is where the DB comes in; we’ll find out when the predicates are true by looking in the DB.

Example interpretations
Axioms: (x)p(x,x) (x)(y)p(x,y)p(y,z)  p(x,z) Interpretation 1: U = non-negative integers, p is the predicate not-equal. Interpretation 2: U = non-negative integers, p is less-than-or-equal-to. Interpretation 3: U = integers, p is less-than. Which of these interpretations are models? Define another model for this theory.

Example interpretations - answers
Axioms: (x)p(x,x) (x)(y)p(x,y)p(y,z)  p(x,z) Interpretation 1: U = non-negative integers, p is the predicate not-equal. No. This interpretation violates the 2nd axiom. Interpretation 2: U = non-negative integers, p is less-than-or-equal-to. No. This interpretation violates the 1st axiom. Interpretation 3: U = integers, p is less-than. Yes. This interpretation is a model.

Another first-order theory
One constant symbol {zero} One function, + One predicate, equality Proper axioms: (x)(y)(z) x+(y+z) = (x+y)+z (x) x+zero = x (x)(y) x+y=zero (x)(y) x=y  y=x (x) x=x (x)(y)(z) x=y  (x=z  y=z) (x)(y)(z) (x=y)  ((x+z=y+z) AND (z+y=z+x))

Which ones are models of the theory?
Interpretations for theory (on previous slide) (Exercise to do at home) Interpretation 1: U = integers zero is mapped to 0 + is regular addition = is regular equality Interpretation 2: U = non-negative integers Interpretation 3: U = integers modulo 5 zero is mapped to 0 + is modulo 5 addition = is regular equality Interpretation 4: U = integers zero is mapped to 1 + is regular addition Which ones are models of the theory?

Interpretations for theory (on previous slide) (Exercise to do at home) Answers
U = integers zero is mapped to 0 + is regular addition = is regular equality Yes. This is a model. Interpretation 2: U = non-negative integers No. This fails 3rd axiom. Interpretation 3: U = integers modulo 5 zero is mapped to 0 + is modulo 5 addition = is regular equality Yes. This is a model. Interpretation 4: U = integers zero is mapped to 1 + is regular addition No. This fails 2nd axiom.

Model of a Datalog Program (repeated)
A database (aka interpretation) M for a program P is a set of facts over the predicates in P. A database M is a model for program P if It assigns each computable predicate its natural interpretation For any ground instance of a clause H :- L1, L2, …, Ln. if {L1, L2, …, Ln}  M then H  M. Note that all facts of P must be in M. To be a model, if body of rule is true, head must be true.

CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.

Similar presentations

Presentation on theme: "CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.

Similar presentations

Presentation on theme: "CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides."— Presentation transcript:

Similar presentations

About project

Feedback