1 Datalog Rules Programs Negation. 2 Review of Logical If-Then Rules h(X,…) :- a(Y,…) & b(Z,…) & … head body subgoals “The head is true if all the subgoals.

Slides:



Advertisements
Similar presentations
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Advertisements

Interprocedural Analysis. Currently, we only perform data-flow analysis on procedures one at a time. Such analyses are called intraprocedural analyses.
CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman.
1 Extended Conjunctive Queries Unions Arithmetic Negation.
Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.
Introduction to PROLOG ME 409 Lab - 1. Introduction to PROLOG.
Answer Set Programming Overview Dr. Rogelio Dávila Pérez Profesor-Investigador División de Posgrado Universidad Autónoma de Guadalajara
1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Deductive Databases Chapter 25.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
Lecture 8 Recursively enumerable (r.e.) languages
1 9. Evaluation of Queries Query evaluation – Quantifier Elimination and Satisfiability Example: Logical Level: r   y 1,…y n  r’ Constraint.
Winter 2002Arthur Keller – CS 18014–1 Schedule Today: Feb. 26 (T) u Datalog. u Read Sections Assignment 6 due. Feb. 28 (TH) u Datalog and SQL.
CSE 636 Data Integration Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering Using Definite Knowledge Notes for Ch.3 of Poole et al. CSCE 580 Marco Valtorta.
1 Decidability Turing Machines Coded as Binary Strings Diagonalizing over Turing Machines Problems as Languages Undecidable Problems.
1 Semantics of Datalog With Negation Local Stratification Stable Models Well-Founded Models.
Discrete Mathematics Recursion and Sequences
1 SQL/PSM Procedures Stored in the Database General-Purpose Programming.
Embedded SQL Direct SQL is rarely used: usually, SQL is embedded in some application code. We need some method to reference SQL statements. But: there.
Decision Properties of Regular Languages
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
Logical Rules Recursion
1 Datalog Logical Rules Recursion. 2 Logic As a Query Language uIf-then logical rules have been used in many systems. wMost important today: EII (Enterprise.
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
Deductive Databases Chapter 25
CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction to Logic and Databases R. Ramakrishnan Database Management Systems, Gehrke- Ramakrishnan,
Defining Polynomials p 1 (n) is the bound on the length of an input pair p 2 (n) is the bound on the running time of f p 3 (n) is a bound on the number.
Database Systems Logical Query Languages assoc. prof., dr. Vladimir Dimitrov web: is.fmi.uni-sofia.bg.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
DEDUCTIVE DATABASE.
Discrete Mathematics CS 2610 March 26, 2009 Skip: structural induction generalized induction Skip section 4.5.
Databases 1 8th lecture. Topics of the lecture Multivalued Dependencies Fourth Normal Form Datalog 2.
DECIDABILITY OF PRESBURGER ARITHMETIC USING FINITE AUTOMATA Presented by : Shubha Jain Reference : Paper by Alexandre Boudet and Hubert Comon.
Recursive query plans for Data Integration Oliver Michael By Rajesh Kanisetti.
The Relational Model: Relational Calculus
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
1 Cluster Computing and Datalog Recursion Via Map-Reduce Seminaïve Evaluation Re-engineering Map-Reduce for Recursion.
Defining Programs, Specifications, fault-tolerance, etc.
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman Fall 2006.
Reading and Writing Mathematical Proofs Spring 2015 Lecture 4: Beyond Basic Induction.
Induction Proof. Well-ordering A set S is well ordered if every subset has a least element. [0, 1] is not well ordered since (0,1] has no least element.
CS Introduction to AI Tutorial 8 Resolution Tutorial 8 Resolution.
Datalog –Another query language –cleaner – closer to a “logic” notation, prolog – more convenient for analysis – can express queries that are not expressible.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
MATH 224 – Discrete Mathematics
Lecture 4: Predicates and Quantifiers; Sets.
Tries Data Structure. Tries  Trie is a special structure to represent sets of character strings.  Can also be used to represent data types that are.
Row Types in SQL-3 Row types define types for tuples, and they can be nested. CREATE ROW TYPE AddressType{ street CHAR(50), city CHAR(25), zipcode CHAR(10)
Introduction to Finite Automata
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
1 Reasoning with Infinite stable models Piero A. Bonatti presented by Axel Polleres (IJCAI 2001,
ICS 321 Fall 2011 Algebraic and Logical Query Languages (ii) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at.
Deterministic Finite Automata
PREDICATES AND QUANTIFIERS COSC-1321 Discrete Structures 1.
Today’s Agenda  Quiz 4  Temporal Logic Formal Methods in Software Engineering1.
1 Part 5. Permission Rules for Two-Level Systems Controlling access (visibility or scope) of static references. Analogous to “private” in C/C++/Java.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
Extensions of Datalog Wednesday, February 13, 2001.
1 Datalog with negation Adapted from slides by Jeff Ullman.
1 Finite Model Theory Lecture 9 Logics and Complexity Classes (cont’d)
CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.
Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman
Tests, Backtracking, and Recursion
Semantics of Datalog With Negation
Motivation for Datalog
Logic Based Query Languages
Datalog Inspired by the impedance mismatch in relational databases.
Rules Programs Negation
Presentation transcript:

1 Datalog Rules Programs Negation

2 Review of Logical If-Then Rules h(X,…) :- a(Y,…) & b(Z,…) & … head body subgoals “The head is true if all the subgoals are true.”

3 Terminology uHead and subgoals are atoms. uAn atom consists of a predicate (lower case) applied to zero or more arguments (upper case letters or constants).

4 Semantics uPredicates represent relations. uAn atom is true for given values of its variables iff the arguments form a tuple of the relation. uWhenever an assignment of values to all variables makes all subgoals true, the rule asserts that the resulting head is also true.

5 Example uWe shall develop rules that describe what is necessary to “make” a file. uThe predicates/relations:  source(F) = F is a “source” file.  includes(F,G) = F #includes G.  create(F,P,G) = F is created by applying process P to file G.

6 Example --- Continued uRules to define “view” req(X,Y) = file Y is required to create file X : req(F,F) :- source(F) req(F,G) :- includes(F,G) req(F,G) :- create(F,P,G) req(F,G) :- req(F,H) & req(H,G) G is required for F if there is some process P that creates F from G. G is required for F if there is some H such that H is required for F and G is required for H.

7 Why Not Just Use SQL? 1.Recursion is much easier to express in Datalog.  Viz. last rule for req. 2.Rules express things that go on in both FROM and WHERE clauses, and let us state some general principles (e.g., containment of rules) that are almost impossible to state correctly in SQL.

8 IDB/EDB uA predicate representing a stored relation is called EDB (extensional database). uA predicate representing a “view,” i.e., a defined relation that does not exist in the database is called IDB (intesional database). uHead is always IDB; subgoals may be IDB or EDB.

9 Datalog Programs uA collection of rules is a (Datalog) program. uEach program has a distinguished IDB predicate that represents the result of the program.  E.g., req in our example.

10 Extensions 1.Negated subgoals. 2.Constants as arguments. 3.Arithmetic subgoals.

11 Negated Subgoals uNOT in front of a subgoal means that an assignment of values to variables must make it false in order for the body to be true. uExample: cycle(F) :- req(F,F) & NOT source(F)

12 Constants as Arguments uWe use numbers, lower-case letters, or quoted strings to indicate a constant.  Example: req(“foo.c”, “stdio.h”) :- wNote that empty body is OK. wMixed constants and variables also OK.

13 Arithmetic Subgoals uComparisons like < may be thought of as infinite, binary relations. wHere, the set of all tuples (x,y) such that x<y. uUse infix notation for these predicates.  Example: composite(A) :- divides(B,A) & B > 1 & B != A

14 Evaluating Datalog Programs 1.Nonrecursive programs. 2.Naïve evaluation of recursive programs without IDB negation. 3.Seminaïve evaluation of recursive programs without IDB negation. uEliminates some redundant computation.

15 Safety uWhen we apply a rule to finite relations, we need to get a finite result. uSimple guarantee: safety = all variables appear in some nonnegated, relational (not arithmetic) subgoal of the body. wStart with the join of the nonnegated, relational subgoals and select/delete from there.

16 Examples: Nonsafety p(X) :- q(Y) bachelor(X) :- NOT married(X,Y) bachelor(X) :- person(X) & NOT married(X,Y) X is the problem Both X and Y are problems Y is still a problem

17 Nonrecursive Evaluation uIf (and only if!) a Datalog program is not recursive, then we can order the IDB predicates so that in any rule for p (i.e., p is the head predicate), the only IDB predicates in the body precede p.

18 Why? uConsider the dependency graph with: wNodes = IDB predicates. wArc p -> q iff there is a rule for p with q in the body. uCycle involving node p means p is recursive. uNo cycles: use topological order to evaluate predicates.

19 Applying Rules uTo evaluate an IDB predicate p : 1.Apply each rule for p to the current relations corresponding to its subgoals. u“Apply” = If an assignment of values to variables makes the body true, insert the tuple that the head becomes into the relation for p (no duplicates). 2.Take the union of the result for each p- rule.

20 Example p(X,Y) :- q(X,Z) & r(Z,Y) & Y<10 Q = {(1,2), (3,4)}; R = {(2,5), (4,9), (4,10), (6,7)} uAssignments making the body true: (X,Y,Z) = (1,5,2), (3,9,4) uSo P = {(1,5), (3,9)}.

21 Algorithm for Nonrecursive FOR (each predicate p in topological order) DO apply the rules for p to previously computed relations to compute relation P for p;

22 Naïve Evaluation for Recursive make all IDB relations empty; WHILE (changes to IDB) DO FOR (each IDB predicate p) DO evaluate p using current values of all relations;

23 Important Points uAs long as there is no negation of IDB subgoals, then each IDB relation “grows,” i.e., on each round it contains at least what it used to contain. uSince relations are finite, the loop must eventually terminate. uResult is the least fixedpoint (minimal model ) of rules.

24 Seminaïve Evaluation uKey idea: to get a new tuple for relation P on one round, the evaluation must use some tuple for some relation of the body that was obtained on the previous round. uMaintain  P = new tuples added to P on previous round. u“Differentiate” rule bodies to be union of bodies with one IDB subgoal made “ .”

25 Example (“make files”) r(F,F) :- s(F) r(F,G) :- i(F,G)) r(F,G) :- c(F,P,G) r(F,G) :- r(F,H) & r(H,F) uAssume EDB predicates s, i, c have relations S, I, C.

26 Example --- Continued uInitialize: R =  R =  #1=#2 (S  S)  I   1,3 (C) uRepeat until  R =  : 1.  R =  1,3 (R ⋈  R   R ⋈ R) 2.  R =  R - R 3.R = R   R

27 Function Symbols in Rules uExtends Datalog by allowing arguments built from constants, variables, and function names, recursively applied. uFunction names look like predicate names, but are allowed only within the arguments of atoms. wPredicates return true/false; functions return arbitrary values.

28 Example  Instead of a string argument like “101 Maple” we could use a term like addr(street(“Maple”), number(101)) uCompare with the XML term Maple 101

29 Another Example uPredicates: 1.isTree(X) = X is a binary tree. 2.label(L) = L is a node label. uFunctions: 1.node(A,L,R) = a tree with root labeled A, left subtree L, and right subtree R. 2.null = 0-ary function (constant) representing the empty tree.

30 Example --- Continued uThe rules: isTree(null) :- isTree(node(L,T1,T2)) :- label(L) & isTree(T1) & isTree(T2)

31 Example --- Concluded uAssume label(a) and label(b) are true. wI.e., a and b are in the relation for label. uInfer isTree(node(a,null,null)). uInfer isTree(node(b,null,node(a,null,null))). baba

32 Evaluation of Rules With Function Symbols uNaïve, seminaïve still work when there are no negated IDB subgoals. uThey both lead to the unique least fixedpoint (minimal model). uBut… this fixedpoint may not be reached in any finite number of rounds.  The isTree rules are an example.

33 Problems With IDB Negation uWhen rules have negated IDB subgoals, there can be several minimal models. uRecall: model = set of IDB facts, plus the given EDB facts, that make the rules true for every assignment of values to variables. wRule is true unless body is true and head is false.

34 Example: EDB red(X,Y) = the Red bus line runs from X to Y. green(X,Y) = the Green bus line runs from X to Y.

35 Example: IDB greenPath(X,Y) = you can get from X to Y using only Green buses. monopoly(X,Y) = Red has a bus from X to Y, but you can’t get there on Green, even changing buses.

36 Example: Rules greenPath(X,Y) :- green(X,Y) greenPath(X,Y) :- greenPath(X,Z) & greenPath(Z,Y) monopoly(X,Y) :- red(X,Y) & NOT greenPath(X,Y)

37 EDB Data red(1,2), red(2,3), green(1,2) 123

38 Two Minimal Models 1.EDB + greenPath(1,2) + monopoly(2,3) 2.EDB + greenPath(1,2) + greenPath(2,3) + greenPath(1,3) greenPath(X,Y) :- green(X,Y) greenPath(X,Y) :- greenPath(X,Z) & greenPath(Z,Y) monopoly(X,Y) :- red(X,Y) & NOT greenPath(X,Y) 123

39 Stratified Models 1.Dependency graph describes how IDB predicates depend negatively on each other. 2.Stratified Datalog = no recursion involving negation. 3.Stratified model is a particular model that “makes sense” for stratified Datalog programs.

40 Dependency Graph uNodes = IDB predicates. uArc p -> q iff there is a rule for p that has a subgoal with predicate q. uArc p -> q labeled – iff there is a subgoal with predicate q that is negated.

41 Monopoly Example monopoly greenPath --

42 Another Example: “Win” win(X) :- move(X,Y) & NOT win(Y) uRepresents games like Nim where you win by forcing your opponent to a position where they have no move.

43 Dependency Graph for “Win” win --

44 Strata uThe stratum of an IDB predicate is the largest number of –’s on a path from that predicate, in the dependency graph. uExamples: monopoly greenPath -- win -- Infinite stratum Stratum 0 Stratum 1

45 Stratified Programs uIf all IDB predicates have finite strata, then the Datalog program is stratified. uIf any IDB predicate has the infinite stratum, then the program is unstratified, and no stratified model exists.

46 Stratified Model uEvaluate strata 0, 1,… in order. uIf the program is stratified, then any negated IDB subgoal has already had its relation evaluated. wSafety assures that we can “subtract it from something.” wTreat it as EDB. uResult is the stratified model.

47 Examples  For “Monopoly,” greenPath is in stratum 0: compute it (the transitive closure of green ).  Then, monopoly is in stratum 1: compute it by taking the difference of red and greenPath. uResult is first model proposed. u“Win” is not stratified, thus no stratified model.