Recursive Query Processing: Naïve and Semi-Naïve Query Processing. Lecture 2-2. Lois Delcambre (slides from Dave Maier)

Dependency Graphs for Datalog Programs

Datalog program, no recursion, with negation: dependency graph. Given a DB with two relations: Topics(Topic) and Interests(Person,Topic). What is this query computing? Diff(a) :- Prod(a,b), not Interests(a,b). Prod(a,b) :- Interests(a,c), Topics(b). Create one node for each predicate. Draw an arrow from each body predicate to the corresponding head predicate. Acyclic dependency graph ≡ not recursive. It's useful to label an arrow if its predicate occurs negatively. (Graph: Topics → Prod; Interests → Prod; Prod → Diff; Interests ¬→ Diff.)

Datalog program, no recursion, with negation: dependency graph. Given a DB with two relations: Topics(Topic) and Interests(Person,Topic). What is this query computing? Diff(a) :- Prod(a,b), not Interests(a,b). Prod(a,b) :- Interests(a,c), Topics(b). Evaluate predicates completely, according to the order of operations in the dependency graph. (Graph: Topics → Prod; Interests → Prod; Prod → Diff; Interests ¬→ Diff.)
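The construction just described is mechanical enough to sketch in code. This is a hypothetical sketch, not from the slides: a rule is assumed to be a pair (head predicate, list of (body predicate, negated?) pairs), and a program is recursive exactly when its dependency graph has a cycle.

```python
# Sketch (assumed rule representation): build the dependency graph of a
# Datalog program and test whether it is recursive.
from collections import defaultdict

def dependency_graph(rules):
    """One edge body_pred -> head_pred per body literal; True marks negation."""
    edges = defaultdict(set)
    for head, body in rules:
        for pred, negated in body:
            edges[pred].add((head, negated))
    return edges

def is_recursive(rules):
    """A program is recursive iff its dependency graph has a cycle (DFS)."""
    edges = dependency_graph(rules)
    seen, on_stack = set(), set()
    def dfs(node):
        seen.add(node); on_stack.add(node)
        for nxt, _ in edges.get(node, ()):
            if nxt in on_stack or (nxt not in seen and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in list(edges) if n not in seen)

# The Diff/Prod program from the slide: acyclic, hence not recursive.
prog = [("Prod", [("Interests", False), ("Topics", False)]),
        ("Diff", [("Prod", False), ("Interests", True)])]
# The ancestor program: Anc depends on itself, hence recursive.
anc = [("Anc", [("Parent-of", False)]),
       ("Anc", [("Anc", False), ("Parent-of", False)])]
```

The negation flag is carried on the edges so the same graph can later be checked for cycles through negative edges.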

Datalog with recursion, without negation: dependency graph. Anc(x,y) :- Parent-of(x,y). Anc(x,z) :- Anc(x,y), Parent-of(y,z). A recursive Datalog program has a cycle in the dependency graph. (Graph: Parent-of → Anc; Anc → Anc.)

Datalog with recursion, without negation
Needed(D, M, C) :- MajAll(M, C), Offered(D, M).
Needed(D, M, C) :- DegAll(D, C), Offered(D, M).
MajAll(M, C) :- MajReq(M, C).
MajAll(M, C) :- MajAll(M, C1), Prereq(C, C1).
DegAll(D, C) :- DegReq(D, C).
DegAll(D, C) :- DegAll(D, C1), Prereq(C, C1).
(Graph: MajAll → Needed; DegAll → Needed; Offered → Needed; MajReq → MajAll; MajAll → MajAll; Prereq → MajAll; DegReq → DegAll; DegAll → DegAll; Prereq → DegAll.)

Here too, you can evaluate in order of the dependency graph: Compute DegAll from DegReq and Prereq (and DegAll itself, recursively). Compute MajAll from MajReq and Prereq (and MajAll itself, recursively). Compute Needed from MajAll, DegAll, and Offered.

Naïve Evaluation of Recursive Datalog (without negation)

Naïve Evaluation of P A “bottom-up” or “right-to-left” method. Have a current set of facts f (a partial model). Use clauses in P in a right-to-left manner to derive additional facts that must also be in a minimum model for P.

Immediate Consequence Operator. If P is a Datalog program, the immediate consequence operator IP(f) computes all facts derivable from the set of facts f by a single application of the clauses in P. Note: If f is a subset of the minimum model of P, then so are IP(f), IP(IP(f)), IP(IP(IP(f))), … So start the process with f = ∅, because we know ∅ is a subset of the minimum model. Inflationary semantics of program P: start with the empty set of facts and apply IP until there is no change.

Naïve Evaluation

Naïve(P, q):
  fold = ∅
  f = IP(fold)
  while f ≠ fold do
    fold = f
    f = IP(f)
  return match(q, f)

Naïve always halts on programs that don't use computed predicates, with f as the minimum model.
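To make the pseudocode concrete, here is a runnable sketch (the representation is an assumption, not the slides'): facts are tuples whose first element is the predicate name, rule variables are strings beginning with "?", and the database facts are supplied directly as the starting set rather than as ground clauses.

```python
# Sketch of naïve bottom-up evaluation for function-free Datalog
# without negation.
def unify(pattern, fact, env):
    """Match one body literal against a fact, extending env, or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    env = dict(env)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env

def ip(rules, facts):
    """Immediate consequence operator IP(f): f plus everything derivable
    by one application of the clauses."""
    out = set(facts)
    for head, body in rules:
        envs = [{}]
        for lit in body:  # join the body literals left to right
            envs = [e2 for e in envs for fact in facts
                    for e2 in (unify(lit, fact, e),) if e2 is not None]
        for env in envs:
            out.add(tuple(env.get(t, t) for t in head))
    return out

def naive(rules, edb):
    """Naïve(P, q) minus the final match step: iterate IP to a fixpoint."""
    f_old, f = set(), set(edb)
    while f != f_old:
        f_old, f = f, ip(rules, f)
    return f

# Anc(x,y) :- Parent-of(x,y).   Anc(x,z) :- Anc(x,y), Parent-of(y,z).
rules = [(("Anc", "?x", "?y"), [("Parent-of", "?x", "?y")]),
         (("Anc", "?x", "?z"), [("Anc", "?x", "?y"), ("Parent-of", "?y", "?z")])]
edb = {("Parent-of", "a", "b"), ("Parent-of", "b", "c")}
model = naive(rules, edb)   # adds Anc(a,b), Anc(b,c), Anc(a,c)
```

The loop halts because only finitely many facts can be built over the active domain; the result is the minimum model.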

Working through naïve evaluation (1)
f0 = ∅
Program:
Local(Id, Loc, 1) :- Local0(Id, Loc).
Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
Database facts:
Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Working through naïve evaluation (2)
f1 = IP(f0) = the database facts themselves:
Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Working through naïve evaluation (3)
f2 = IP(f1) = f1 plus the first Local facts:
Local(r, node1, 1). Local(s, node1, 1). Local(q, node1, 1). Local(u, node2, 1).

Working through naïve evaluation (4)
f3 = IP(f2) = f2 plus:
Local(s1, node1, 2). Local(u1, node1, 3). Local(s2, node2, 2).

Working through naïve evaluation (5)
f4 = IP(f3) = f3 plus:
Local(j1, node1, 6).

Working through naïve evaluation (6)
f5 = IP(f4) = f4, so we have reached the fixpoint; f4 is the minimum model. (Op2(j2, j1, s2, node1) never fires: it would need Local(s2, node1, N), but s2 is local only to node2.)

Semi-Naïve Evaluation of Datalog

Semi-Naïve Evaluation
Naïve evaluation discovers the same fact over and over; e.g., it re-derives Local(r, node1, 1) at every round after the first. Idea: when evaluating a rule, make sure we use at least one newly discovered fact; otherwise we derive a fact we've already seen. Notice: after f1, the only new facts are Local facts. (We only need to work on the recursive intensional predicates.)

Example: rediscovering facts vs. discovering new facts (using Naïve)
Consider the rule
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
when computing f4 = IP(f3). If we use the facts Op2(u1, s, q, node1), Local(s, node1, 1), Local(q, node1, 1), we derive Local(u1, node1, 3), which we already had. If we use Op2(j1, s1, u1, node1), Local(s1, node1, 2), Local(u1, node1, 3), we get Local(j1, node1, 6), which is new.

Implementing Semi-Naïve: use deltas (new facts from this round)
Suppose at each step in Naïve we record NewLocal(Id, Loc, K) iff Local(Id, Loc, K) is in f − fold. Then replace the recursive rules (the Op1 rule and the Op2 rule) with delta versions, so that every rule body uses at least one new fact:
Local(Id, Loc, K) :- Op1(Id, In, Loc), NewLocal(In, Loc, M), Sum(M, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), NewLocal(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), NewLocal(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), NewLocal(In1, Loc, M), NewLocal(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
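The delta idea can be illustrated compactly on the ancestor program (an assumed example; the slides' Local program works the same way but needs all three Op2 delta rules). Since Anc has a single recursive subgoal, one delta rule suffices, and each round joins only last round's new facts against Parent-of:

```python
# Semi-naïve sketch of:
#   Anc(x,y) :- Parent-of(x,y).
#   Anc(x,z) :- NewAnc(x,y), Parent-of(y,z).   (the delta version)
def semi_naive_anc(parent):
    anc = set(parent)            # base rule, fired once
    delta = set(parent)          # this round's new facts
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in parent if y == y2}
        delta = derived - anc    # keep only genuinely new facts
        anc |= delta
    return anc

parent = {("a", "b"), ("b", "c"), ("c", "d")}
anc = semi_naive_anc(parent)     # the six ancestor pairs
```

No round ever re-joins old Anc facts against Parent-of, which is exactly the redundant work that Naïve repeats.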

One Other Modification for Semi-Naïve
If P′ is the modified program, then instead of f = IP(f) we need f = IP′(f) ∪ f. Note: Semi-Naïve doesn't guarantee we won't generate a fact at multiple steps if there are distinct "proofs" of it, i.e., two ways to derive it. For example, with
MajReq(eng, wr310). MajReq(eng, wr408). Prereq(wr309, wr310). Prereq(wr401, wr408). Prereq(wr309, wr401).
MajAll(eng, wr309) is derived once from wr310 and again, a round later, via wr408 and wr401.

Evaluating Datalog with Recursion and Negation

Now consider Datalog with recursion & negation
Person(Dan). (one tuple in the base relation)
Student(x) :- Person(x), not Employee(x).
Employee(x) :- Person(x), not Student(x).
What is the query answer? Either Student(Dan) or Employee(Dan), depending on which rule fires first. Two different query answers!

We have a problem – for any data.
Student(x) :- Person(x), not Employee(x).
Employee(x) :- Person(x), not Student(x).
Imagine we have some number of persons in the database. They will ALL end up as Students or as Employees, depending on which rule fires first. Two different query answers!

What does the dependency graph look like? Person(Dan). (one tuple in the base relation) Student(x) :- Person(x), not Employee(x). Employee(x) :- Person(x), not Student(x). This is a recursive program. (Graph: Person → Student; Person → Employee; Employee ¬→ Student; Student ¬→ Employee.)

Now consider recursion & negation. Person(Dan). (one tuple in the base relation) Student(x) :- Person(x), not Employee(x). Employee(x) :- Person(x), not Student(x). We have a problem when a cycle in the dependency graph includes (at least) one negative edge. (Graph: Person → Student; Person → Employee; Employee ¬→ Student; Student ¬→ Employee.)

Cycle with one (or more) negative edges. Person(Dan). (one tuple in the base relation) Student(x) :- Person(x), not Employee(x). Employee(x) :- Person(x), not Student(x). Recursion (for Student, say) is attempting to add tuples to the answer. But it is subtracting Employee tuples (to find the Students). And Employees are being computed recursively at the same time.

Another example with negation & recursion
Node(a). Node(b). Node(c). Node(d). Arc(a,b). Arc(c,d). Start(a). (These are facts from the database.)
Reachable(z) :- Start(z).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), not Reachable(x).
Draw the dependency graph; label the negative edges.

Dependency graph for this program; notice the negative edge
Node(a). Node(b). Node(c). Node(d). Arc(a,b). Arc(c,d). Start(a).
Reachable(z) :- Start(z).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), not Reachable(x).
(Graph: Start → Reachable; Arc → Reachable; Reachable → Reachable; Node → Unreachable; Reachable ¬→ Unreachable.)
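Evaluating this program can be sketched directly on the slide's data (a minimal sketch; the point is that Reachable is computed to completion before it is ever used negatively):

```python
# The slide's database.
nodes = {"a", "b", "c", "d"}
arcs = {("a", "b"), ("c", "d")}
start = {"a"}

# First, compute Reachable in its entirety.
reachable = set(start)                  # Reachable(z) :- Start(z).
while True:
    # Reachable(y) :- Reachable(x), Arc(x,y).
    new = {y for (x, y) in arcs if x in reachable} - reachable
    if not new:
        break
    reachable |= new

# Only then use it negatively: Unreachable(x) :- Node(x), not Reachable(x).
unreachable = {x for x in nodes if x not in reachable}
```

Because the Reachable fixpoint is finished before the subtraction, the answer does not depend on rule-firing order.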

Stratification – for Datalog with negation and recursion
A stratification of a Datalog program P is a partition of P into strata (layers) such that, for every rule H :- A1, …, Am, ¬B1, …, ¬Bq: The rules that define the predicate H are all in the same stratum. For every positive predicate Ai in the rule, its stratum is less than or equal to the stratum of the rule: Strata(Ai) ≤ Strata(H). For every negated predicate Bi, its stratum is strictly less than the stratum of the rule: Strata(Bi) < Strata(H).

Intuition behind stratification. In any stratum, compute each predicate in its entirety. If you EVER use a predicate negatively, then make sure it is ALREADY, COMPLETELY computed. So stratification prevents you from using a predicate negatively while you are still computing it recursively.

Exercise: Define a stratification for this Datalog program.
Node(a). Node(b). Node(c). Node(d). Arc(a,b). Arc(c,d). Start(a).
Reachable(z) :- Start(z).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), not Reachable(x).
(Graph: Start → Reachable; Arc → Reachable; Reachable → Reachable; Node → Unreachable; Reachable ¬→ Unreachable.)

Exercise: Define a stratification for this Datalog program.
Node(a). Node(b). Node(c). Node(d). Arc(a,b). Arc(c,d). Start(a).
Reachable(z) :- Start(z).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), not Reachable(x).
Solution: compute all Reachable facts (positively) in Stratum 1; then use Reachable negatively in Stratum 2.
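The stratification found by hand above can also be computed mechanically. A sketch, under an assumed rule format (head predicate plus (body predicate, negated?) pairs) and with strata numbered from 0: repeatedly raise strata until every constraint holds, and report failure if the values keep growing, which happens exactly when a cycle passes through negation.

```python
def stratify(rules):
    """Assign a stratum to every predicate, or return None if the
    program is not stratifiable (a cycle through negation)."""
    preds = {h for h, _ in rules} | {p for _, b in rules for p, _ in b}
    strata = {p: 0 for p in preds}
    for _ in range(len(preds) + 1):
        changed = False
        for head, body in rules:
            for pred, negated in body:
                # positive: Strata(pred) <= Strata(head); negated: strictly less
                need = strata[pred] + (1 if negated else 0)
                if strata[head] < need:
                    strata[head] = need
                    changed = True
        if not changed:
            return strata            # a valid stratification
    return None                      # strata keep growing: not stratifiable

reach = [("Reachable", [("Start", False)]),
         ("Reachable", [("Reachable", False), ("Arc", False)]),
         ("Unreachable", [("Node", False), ("Reachable", True)])]
bad = [("Student", [("Person", False), ("Employee", True)]),
       ("Employee", [("Person", False), ("Student", True)])]
```

On the reachability program this puts Unreachable one stratum above Reachable; on the Student/Employee program it correctly fails.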

Can you define a stratification for this program? Person(Dan). (one tuple in the base relation) Student(x) :- Person(x), not Employee(x). Employee(x) :- Person(x), not Student(x). (Graph: Person → Student; Person → Employee; Employee ¬→ Student; Student ¬→ Employee.)

Compare these two dependency graphs. Person(Dan). (one tuple in the base relation) Student(x) :- Person(x), not Employee(x). Employee(x) :- Person(x), not Student(x). The one on the left (Student/Employee) has a cycle with negation; the one on the right (Reachable/Unreachable) doesn't. The one on the left can't be stratified.

Comments. Stratified Datalog programs always have a unique answer. Note that stratification forces the choice of one of various possible answers (models). Not all Datalog programs can be stratified. Some programs can be stratified in more than one way; if so, all stratifications produce the same result.

Models and Interpretations

How do we use relational databases for relational calculus and for Datalog? An interpretation is I = (U, C, P, F) where: U is a nonempty set of elements (the domain; here, the union of all the attribute domains). C maps the constants of the theory to elements of U (no constant symbols are needed in our theory). P maps each n-ary predicate (from the theory) to an n-place relation over U. F maps each n-ary function (from the theory) to a function from U^n to U (our theory is function-free). Variables range over the set U. An interpretation is a model (of the theory) iff all of the axioms are true.

How do we use relational databases for relational calculus and for Datalog? In order to have an interpretation, we need to describe (to write down) all of the tuples where a given predicate is true. Think about a relational database. What does it store for us? It stores exactly the tuples that belong in a particular table. So a relational database is exactly an interpretation of a very simple first-order theory with no function symbols and no constant symbols.

Relational DB as an interpretation “In [domain calculus] a relational database is considered to be an interpretation of a first-order theory. The domain of the interpretation is a superset of the active domain of the database and the relations in the database are the extensions of the relation symbols of the database schema. A query in the domain relational calculus is essentially checking [i.e., finding] whether [i.e., where] the database is a model of the first-order formula in the query.” P. 108, textbook

Model of a Datalog Program (repeated). A database (aka interpretation) M for a program P is a set of facts over the predicates in P. A database M is a model for program P if: 1. It assigns each computable predicate its natural interpretation. 2. For any ground instance of a clause H :- L1, L2, …, Ln, if {L1, L2, …, Ln} ⊆ M then H ∈ M. Note that all facts of P must be in M.
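For ground clauses, this definition can be checked directly. A minimal sketch (facts written as plain strings, ground rules only; computable predicates such as Sum are omitted, which is an assumed simplification):

```python
def is_model(ground_rules, facts, m):
    """m is a model iff it contains all facts of P and, for every ground
    clause whose whole body is in m, the head is in m too."""
    m = set(m)
    if not set(facts) <= m:
        return False
    return all(head in m for head, body in ground_rules if set(body) <= m)

# A tiny program: p(a).  q(a) :- p(a).  r(a) :- p(a), q(a).
facts = ["p(a)"]
rules = [("q(a)", ["p(a)"]),
         ("r(a)", ["p(a)", "q(a)"])]
```

Note that a superset of a model can still be a model, which is exactly the point of the next slides.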

A Program Can Have Many Models (extra facts, but still a model)
Model:
Local(r, node1, 1). Local(r, node2, 1). Local(s, node1, 1). Local(q, node1, 1). Local(u, node2, 1). Local(s1, node1, 2). Local(u1, node1, 3). Local(j1, node1, 6). Local(s2, node2, 2). Local0(r, node1). Local0(r, node2). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).
Program:
Local(Id, Loc, 1) :- Local0(Id, Loc).
Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

A Second Model
Model:
Local(r, node1, 1). Local(s, node1, 1). Local(q, node1, 1). Local(u, node2, 1). Local(s1, node1, 2). Local(u1, node1, 3). Local(j1, node1, 6). Local(s2, node2, 2). Local(j2, node2, 9). Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).
Program:
Local(Id, Loc, 1) :- Local0(Id, Loc).
Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K).
Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K).
Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Closure Under Intersection. Lemma: If M1 and M2 are both models for Datalog program P, then so is M1 ∩ M2. Proof: Let M = M1 ∩ M2. Both M1 and M2 give any computed predicate its natural interpretation, so part 1 of the model definition is covered. How could M fail to be a model of P? There must be a ground instance of a clause H :- L1, L2, …, Ln where {L1, L2, …, Ln} ⊆ M but H ∉ M. So H must be missing from M1 or M2 (or both). But {L1, L2, …, Ln} ⊆ M ⊆ M1, and M1 is a model, so H ∈ M1; likewise H ∈ M2. Then H ∈ M, a contradiction.

Example Intersection
M1: Local(r, node1, 1). Local(r, node2, 1). Local(s, node1, 1). Local(q, node1, 1). Local(u, node2, 1). Local(s1, node1, 2). Local(u1, node1, 3). Local(j1, node1, 6). Local(s2, node2, 2). Local0(r, node1). Local0(r, node2). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).
M2: Local(r, node1, 1). Local(s, node1, 1). Local(q, node1, 1). Local(u, node2, 1). Local(s1, node1, 2). Local(u1, node1, 3). Local(j1, node1, 6). Local(s2, node2, 2). Local(j2, node2, 9). Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).
M1 ∩ M2 drops Local(r, node2, 1) and Local0(r, node2) (only in M1) and Local(j2, node2, 9) (only in M2).

Minimal Model. A model M for program P is minimal if no proper subset of M is also a model. Every Datalog program P (without negation) has a unique minimal model: consider two different minimal models N1 and N2 for P; then N = N1 ∩ N2 is also a model, and it is a proper subset of at least one of them, contradicting minimality. Minimum model of P: the unique minimal model.

Minimum Model and Queries. When we answer queries against Datalog program P, we answer them against the minimum model. (Closed-World Assumption) Here's a query: :- Local(j1, L, C). We want all Local facts in the minimum model whose first component is j1.

Immediate Consequence Operator. If P is a Datalog program, the immediate consequence operator IP(f) computes all facts derivable from the set of facts f by a single application of the clauses in P. Note: If f is a subset of the minimum model of P, then so are IP(f), IP(IP(f)), IP(IP(IP(f))), … So start the process with f = ∅, because we know ∅ is a subset of the minimum model. Inflationary semantics of program P: start with the empty set of facts and apply IP until there is no change.

Comment on Immediate Consequence
Algorithm 3.4 (Meaning): this algorithm gives the meaning for unsafe, recursive Datalog programs. It uses what is called "inflationary" semantics: if you ever derive a fact in the answer, it stays there.
Temp-21(x) :- Person(x, 21, y). Temp-male(x) :- Person(x, z, "male"). Answer(x) :- Temp-21(x), Temp-male(x).
Algorithm 3.4 tells us that whenever a rule fires (in any order), the new tuples are added to the result. If the first rule fires and then the third, we'll have some people in Answer who are not male. (And this is not regarded as a problem.) A program may have more than one fixed point.

Comment on Immediate Consequence
Algorithm 3.5 (New Meaning): this algorithm is for safe, non-recursive Datalog programs. Since the program is not recursive, we can use the dependency graph to induce an ordering on the relations: compute a relation before it is used in the body of other rules. Algorithm 3.5 computes the relations in that order.
Temp-21(x) :- Person(x, 21, y). Temp-male(x) :- Person(x, z, "male"). Answer(x) :- Temp-21(x), Temp-male(x).
Order the relations: either Temp-21 or Temp-male is first, the other is second, and Answer is computed third. This follows from the dependency graph. (Graph: Person → Temp-21; Person → Temp-male; Temp-21 → Answer; Temp-male → Answer.)

Comments on Datalog without recursion, with negation. For safe Datalog without recursion but with negation, if we compute the relations one by one according to the dependency graph, then there is a unique fixed point, and it is the same as the minimal model. Note that for this language, each rule only needs to be fired once.

Comments on Datalog with recursion, NO negation. For Datalog with recursion but NO negation: The minimal model is unique. The minimal model is the intersection of all the models. The minimal model is the same as the least fixed point of the immediate consequence operator. This language is monotonic (rules only add facts).

Semantics of a Datalog Program with recursion and negation (slide repeated). Stratified Datalog programs always have a unique answer. Note that stratification forces the choice of one of various possible answers (models). Not all Datalog programs can be stratified. Some programs can be stratified in more than one way; if so, all stratifications produce the same result.