Datalog: Logical Rules, Recursion, SQL-99 Recursion

Presentation transcript:

1 Datalog Logical Rules Recursion SQL-99 Recursion

2 Logic As a Query Language
- If-then logical rules have been used in many systems.
  - Most important today: EII (Enterprise Information Integration).
- Nonrecursive rules are equivalent to the core relational algebra.
- Recursive rules extend relational algebra and have been used to add recursion to SQL-99.

3 A Logical Rule
- Our first example of a rule uses the relations Frequents(drinker,bar), Likes(drinker,beer), and Sells(bar,beer,price).
- The rule is a query asking for “happy” drinkers: those who frequent a bar that serves a beer that they like.

4 Anatomy of a Rule
  Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
- Head = “consequent,” a single subgoal: Happy(d).
- Body = “antecedent” = AND of subgoals: everything to the right of <-.
- Read the symbol <- as “if.”

5 Subgoals Are Atoms
- An atom is a predicate, or relation name, with variables or constants as arguments.
- The head is an atom; the body is the AND of one or more atoms.
- Convention: predicates begin with a capital letter, variables begin with a lower-case letter.

6 Example: Atom
  Sells(bar, beer, p)
- The predicate Sells is the name of a relation.
- The arguments bar, beer, and p are variables.

7 Interpreting Rules
- A variable appearing in the head is called distinguished; otherwise it is nondistinguished.
- Rule meaning: the head is true of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

8 Example: Interpretation
  Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
- Distinguished variable: d.
- Nondistinguished variables: bar, beer, p.
- Interpretation: drinker d is happy if there exist a bar, a beer, and a price p such that d frequents the bar, likes the beer, and the bar sells the beer at price p.
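The interpretation of the Happy rule can be sketched in Python. This is only an illustration: the relation contents below are made-up sample data, and the function name `happy` is an assumption, not part of the slides.

```python
# Hypothetical sample relations (assumed data, not from the slides).
frequents = {("alice", "Joe's Bar"), ("bob", "Sue's Bar")}
likes = {("alice", "Bud"), ("bob", "Miller")}
sells = {("Joe's Bar", "Bud", 2.50), ("Sue's Bar", "Bud", 3.00)}


def happy(frequents, likes, sells):
    """Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)."""
    result = set()
    for d, bar in frequents:
        for d2, beer in likes:
            if d2 != d:
                continue  # both subgoals must agree on the drinker d
            # Existential check over the nondistinguished price p.
            if any(b == bar and be == beer for b, be, _p in sells):
                result.add(d)
    return result


print(happy(frequents, likes, sells))
```

Only the distinguished variable d reaches the result; bar, beer, and p are checked for existence and then discarded.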

9 Arithmetic Subgoals
- In addition to relations as predicates, a predicate for a subgoal of the body can be an arithmetic comparison.
  - We write such subgoals in the usual way, e.g.: x < y.

10 Example: Arithmetic
- A beer is “cheap” if there are at least two bars that sell it for under $2.
  Cheap(beer) <- Sells(bar1,beer,p1) AND Sells(bar2,beer,p2) AND p1 < 2.00 AND p2 < 2.00 AND bar1 <> bar2

11 Negated Subgoals
- We may put NOT in front of a subgoal to negate its meaning.
- Example: think of Arc(a,b) as arcs in a graph.
  - S(x,y) says the graph is not transitive from x to y; i.e., there is a path of length 2 from x to y, but no arc from x to y.
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)

12 Safe Rules
- A rule is safe if:
  1. each distinguished variable,
  2. each variable in an arithmetic subgoal, and
  3. each variable in a negated subgoal
  also appears in a nonnegated, relational subgoal.
- We allow only safe rules.

13 Example: Unsafe Rules
- Each of the following is unsafe and not allowed:
  1. S(x) <- R(y)
  2. S(x) <- R(y) AND NOT R(x)
  3. S(x) <- R(y) AND x < y
- In each case, an infinity of x’s can satisfy the rule, even if R is a finite relation.
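The safety condition can be checked mechanically. The sketch below assumes a simplified rule representation invented for this example: each subgoal is given only as a tuple of its variable names, and constants are ignored.

```python
def is_safe(head_vars, positive, negated=(), arithmetic=()):
    """Return True if the rule satisfies the safety condition.

    head_vars:  variables of the head (the distinguished variables)
    positive:   one tuple of variables per nonnegated relational subgoal
    negated:    one tuple of variables per negated subgoal
    arithmetic: one tuple of variables per arithmetic subgoal
    """
    covered = {v for subgoal in positive for v in subgoal}
    needed = set(head_vars)
    for subgoal in list(negated) + list(arithmetic):
        needed |= set(subgoal)
    # Every variable that must be limited has to occur in a positive subgoal.
    return needed <= covered


# The three unsafe rules, in order.
print(is_safe(["x"], [("y",)]))
print(is_safe(["x"], [("y",)], negated=[("x",)]))
print(is_safe(["x"], [("y",)], arithmetic=[("x", "y")]))
```

For comparison, the rule S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) passes the check because x, y, and z all occur in the two positive Arc subgoals.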

14 Algorithms for Applying Rules
- Two approaches:
  1. Variable-based: consider all possible assignments to the variables of the body. If the assignment makes the body true, add that tuple for the head to the result.
  2. Tuple-based: consider all assignments of tuples from the non-negated, relational subgoals. If the body becomes true, add the head’s tuple to the result.

15 Example: Variable-Based
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
- Arc(1,2) and Arc(2,3) are the only tuples in the Arc relation.
- The only assignments that make the first subgoal Arc(x,z) true are:
  1. x = 1; z = 2
  2. x = 2; z = 3

16 Example: Variable-Based; x=1, z=2
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
- y = 3 is the only value of y that makes all three subgoals true.
- This makes S(1,3) a tuple of the answer.

17 Example: Variable-Based; x=2, z=3 S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) No value of y makes Arc(3,y) true. Thus, no contribution to the head tuples; S = {(1,3)}
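The variable-based approach can be sketched in Python as a brute-force loop over assignments. One assumption is made explicit here: because the rule is safe, it is enough to try values that actually occur in Arc (the active domain) instead of all possible values.

```python
from itertools import product

arc = {(1, 2), (2, 3)}


def s_variable_based(arc):
    """S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y), variable-based."""
    # Assumption: safety lets us restrict the search to values occurring in Arc.
    domain = sorted({value for pair in arc for value in pair})
    result = set()
    for x, y, z in product(domain, repeat=3):
        if (x, z) in arc and (z, y) in arc and (x, y) not in arc:
            result.add((x, y))
    return result


print(s_variable_based(arc))
```

With three values in the domain this tries 27 assignments, of which only x = 1, z = 2, y = 3 makes the body true.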

18 Tuple-Based Assignment
- Start with the non-negated, relational subgoals only.
- Consider all assignments of tuples to these subgoals.
  - Choose tuples only from the corresponding relations.
- If the assigned tuples give a consistent value to all variables and make the other subgoals true, add the head tuple to the result.

19 Example: Tuple-Based
  S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
- Arc relation: Arc(1,2), Arc(2,3)
- Four possible assignments to the first two subgoals:
  Arc(x,z)   Arc(z,y)
  (1,2)      (1,2)
  (1,2)      (2,3)
  (2,3)      (1,2)
  (2,3)      (2,3)
- The only assignment with a consistent z-value is Arc(x,z) = (1,2), Arc(z,y) = (2,3). Since it also makes NOT Arc(x,y) true, add S(1,3) to the result.
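A minimal Python sketch of the tuple-based approach for the same rule follows. The function name is an assumption; the data are the two Arc tuples from the example.

```python
from itertools import product

arc = {(1, 2), (2, 3)}


def s_tuple_based(arc):
    """S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y), tuple-based."""
    result = set()
    # One tuple of Arc is assigned to each of the two positive subgoals.
    for (x, z1), (z2, y) in product(arc, repeat=2):
        if z1 != z2:
            continue  # inconsistent value for z
        if (x, y) not in arc:  # the negated subgoal must also hold
            result.add((x, y))
    return result


print(s_tuple_based(arc))
```

Here only 4 tuple pairs are examined, compared with every combination of variable values in the variable-based approach.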

20 Datalog Programs
- A Datalog program is a collection of rules.
- In a program, predicates can be either
  1. EDB = Extensional Database = stored table.
  2. IDB = Intensional Database = relation defined by rules.
  - Never both! No EDB in heads.

21 Evaluating Datalog Programs
- As long as there is no recursion, we can pick an order to evaluate the IDB predicates so that, when a predicate is evaluated, all the predicates in the bodies of its rules have already been evaluated.
- If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

22 Example: Datalog Program
- Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell.
  JoeSells(b) <- Sells('Joe''s Bar', b, p)
  Answer(m) <- Beers(b,m) AND NOT JoeSells(b)
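Evaluating this two-rule program in dependency order can be sketched as follows. The beers, bars, and manufacturer names are hypothetical sample data chosen for the illustration.

```python
# Hypothetical EDB contents (assumed for illustration).
sells = {
    ("Joe's Bar", "Bud", 2.50),
    ("Joe's Bar", "Miller", 2.75),
    ("Sue's Bar", "Coors", 3.00),
}
beers = {("Bud", "ManfA"), ("Miller", "ManfB"), ("Coors", "ManfB"), ("Export", "ManfC")}


def evaluate(sells, beers):
    # JoeSells(b) <- Sells('Joe''s Bar', b, p); evaluated first because Answer uses it.
    joe_sells = {beer for bar, beer, _price in sells if bar == "Joe's Bar"}
    # Answer(m) <- Beers(b,m) AND NOT JoeSells(b)
    answer = {manf for beer, manf in beers if beer not in joe_sells}
    return joe_sells, answer


joe_sells, answer = evaluate(sells, beers)
print(joe_sells, answer)
```

Note that ManfB appears in the answer even though Joe sells Miller: the rule asks for manufacturers of at least one beer that Joe does not sell.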

23 Expressive Power of Datalog
- Without recursion, Datalog can express all and only the queries of core relational algebra.
  - The same as SQL select-from-where, without aggregation and grouping.
- But with recursion, Datalog can express more than these languages.
- Yet it is still not Turing-complete.

24 Recursive Example
- EDB: Par(c,p) = p is a parent of c.
- Generalized cousins: people with common ancestors one or more generations back:
  Sib(x,y) <- Par(x,p) AND Par(y,p) AND x <> y
  Cousin(x,y) <- Sib(x,y)
  Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

25 Definition of Recursion
- Form a dependency graph whose nodes = IDB predicates.
- Arc X -> Y if and only if there is a rule with X in the head and Y in the body.
- Cycle = recursion; no cycle = no recursion.

26 Example: Dependency Graphs
- Recursive: Cousin -> Cousin and Cousin -> Sib (the arc from Cousin to itself is a cycle).
- Nonrecursive: Answer -> JoeSells (no cycle).
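Building the dependency graph and testing it for a cycle can be sketched in Python. The rule encoding (a head name plus a list of body predicate names) is an assumption made for this example.

```python
def dependency_graph(rules, idb):
    """Nodes are IDB predicates; arc X -> Y if some rule has head X and Y in its body."""
    graph = {p: set() for p in idb}
    for head, body in rules:
        graph[head] |= {q for q in body if q in idb}
    return graph


def is_recursive(graph):
    """Depth-first search for a cycle (a self-loop counts as a cycle)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY:
                return True  # back edge: a cycle exists
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


cousin_rules = [
    ("Sib", ["Par", "Par"]),
    ("Cousin", ["Sib"]),
    ("Cousin", ["Par", "Par", "Cousin"]),
]
joe_rules = [("JoeSells", ["Sells"]), ("Answer", ["Beers", "JoeSells"])]

print(is_recursive(dependency_graph(cousin_rules, {"Sib", "Cousin"})))
print(is_recursive(dependency_graph(joe_rules, {"JoeSells", "Answer"})))
```

EDB predicates such as Par, Sells, and Beers are dropped when the graph is built, since only IDB predicates are nodes.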

27 Evaluating Recursive Rules
- The following works when there is no negation:
  1. Start by assuming all IDB relations are empty.
  2. Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB.
  3. End when there is no change to the IDB.

28 The “Naïve” Evaluation Algorithm
1. Start: IDB = ∅.
2. Apply the rules to the IDB and EDB.
3. Change to IDB? If yes, go back to step 2; if no, done.

29 Example: Evaluation of Cousin
- We’ll proceed in rounds to infer Sib facts (red) and Cousin facts (green).
- Remember the rules:
  Sib(x,y) <- Par(x,p) AND Par(y,p) AND x <> y
  Cousin(x,y) <- Sib(x,y)
  Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)
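A minimal Python sketch of naive evaluation for the Sib and Cousin rules follows. The Par data here are a small hypothetical family, not the data from the slides, and the function name is an assumption.

```python
# Hypothetical Par(child, parent) facts: b and c are children of a,
# d is a child of b, and e is a child of c.
par = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}


def naive_cousin(par):
    sib, cousin = set(), set()  # start with all IDB relations empty
    rounds = 0
    while True:
        rounds += 1
        # Each round re-evaluates every rule from the EDB and the PREVIOUS IDB.
        new_sib = {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}
        new_cousin = set(sib) | {
            (x, y) for (x, xp) in par for (y, yp) in par if (xp, yp) in cousin
        }
        if new_sib == sib and new_cousin == cousin:
            return sib, cousin, rounds  # no change: fixpoint reached
        sib, cousin = new_sib, new_cousin


sib, cousin, rounds = naive_cousin(par)
print(sorted(sib), sorted(cousin), rounds)
```

On this data the Sib facts appear in the first round, they are copied into Cousin in the second, the pairs (d,e) and (e,d) follow in the third, and a fourth round confirms that nothing changes.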

30 Seminaive Evaluation
- Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round.
- Saves work; lets us avoid rediscovering most known facts.
  - A fact could still be derived in a second way.

31 Par Data: Parent Above Child
[Figure: the Par relation drawn with parents above children, over the nodes a through k, annotated with Round 1, Round 2, Round 3, and Round 4.]
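Seminaive evaluation of Cousin can be sketched in Python by joining only the tuples that were new in the previous round. The Par facts are a hypothetical family (not the data in the figure), and the `delta` name is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical Par(child, parent) facts.
par = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}


def seminaive_cousin(par):
    children = defaultdict(set)
    for child, parent in par:
        children[parent].add(child)

    # Sib is not recursive, so it is computed once.
    sib = {(x, y) for kids in children.values() for x in kids for y in kids if x != y}

    cousin = set(sib)  # Cousin(x,y) <- Sib(x,y)
    delta = set(sib)   # tuples that were new in the previous round
    while delta:
        # Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp),
        # using only the NEW Cousin(xp,yp) tuples.
        derived = {
            (x, y)
            for (xp, yp) in delta
            for x in children.get(xp, ())
            for y in children.get(yp, ())
        }
        delta = derived - cousin  # a fact may be rederived; keep only new ones
        cousin |= delta
    return sib, cousin


sib, cousin = seminaive_cousin(par)
print(sorted(cousin))
```

The subtraction `derived - cousin` is where a fact derived a second way is discarded.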

32 Recursion Plus Negation
- “Naïve” evaluation doesn’t work when there are negated subgoals.
- In fact, negation wrapped in a recursion makes no sense in general.
- Even when recursion and negation are separate, we can have ambiguity about the correct IDB relations.

33 Stratified Negation
- Stratification is a constraint usually placed on Datalog with recursion and negation.
- It rules out negation wrapped inside recursion.
- It gives the sensible IDB relations when negation and recursion are separate.

34 Problematic Recursive Negation
  P(x) <- Q(x) AND NOT P(x)
- EDB: Q(1), Q(2)
- Initial: P = { }
- Round 1: P = {(1), (2)}
- Round 2: P = { }
- Round 3: P = {(1), (2)}, etc., etc. …
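The oscillation is easy to reproduce. This short Python sketch applies the rule repeatedly, each time using the P from the previous round; the helper name `apply_rule` is an assumption.

```python
def apply_rule(q, p):
    """P(x) <- Q(x) AND NOT P(x), evaluated against the previous P."""
    return {x for x in q if x not in p}


q = {1, 2}
p = set()
history = [set(p)]  # history[0] is the initial (empty) P
for _ in range(4):
    p = apply_rule(q, p)
    history.append(set(p))

print(history)
```

The sequence alternates between the empty set and {1, 2}, so the "no change" stopping condition of naive evaluation is never met.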

35 Strata
- Intuitively, the stratum of an IDB predicate P is the maximum number of negations that can be applied to an IDB predicate used in evaluating P.
- Stratified negation = “finite strata.”
- Notice that in P(x) <- Q(x) AND NOT P(x), we can negate P an infinite number of times while deriving P(x).

36 Stratum Graph
- To formalize strata, use the stratum graph:
  - Nodes = IDB predicates.
  - Arc A -> B if predicate A depends on B.
  - Label this arc “–” if the B subgoal is negated.

37 Stratified Negation Definition
- The stratum of a node (predicate) is the maximum number of – arcs on a path leading from that node.
- A Datalog program is stratified if all its IDB predicates have finite strata.

38 Example
  P(x) <- Q(x) AND NOT P(x)
[Figure: stratum graph with the single node P and an arc from P to itself labeled “–”.]

39 Another Example
- EDB = Source(x), Target(x), Arc(x,y).
- Rules for “targets not reached from any source”:
  Reach(x) <- Source(x)
  Reach(x) <- Reach(y) AND Arc(y,x)
  NoReach(x) <- Target(x) AND NOT Reach(x)

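40 The Stratum Graph
[Figure: nodes NoReach and Reach, with an arc NoReach -> Reach labeled “–” and an unlabeled arc Reach -> Reach.]
- Reach is in stratum 0: no – arcs on any path out.
- NoReach is in stratum 1: at most one – arc on any path out.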
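Strata can be computed from the stratum graph by a simple relaxation. This is one possible sketch, not a method given in the slides: it relies on the observation that, with n nodes, a finite stratum can never exceed n - 1, so any larger value signals a cycle containing a – arc. The encoding of arcs as (from, to, negated) triples is an assumption.

```python
def strata(nodes, arcs):
    """Return {predicate: stratum}, or None if some stratum is infinite.

    arcs: (a, b, negated) triples meaning a depends on b; negated marks a "-" arc.
    """
    stratum = {n: 0 for n in nodes}
    limit = len(stratum) - 1  # largest stratum possible without a negative cycle
    changed = True
    while changed:
        changed = False
        for a, b, negated in arcs:
            candidate = stratum[b] + (1 if negated else 0)
            if candidate > stratum[a]:
                if candidate > limit:
                    return None  # a cycle through a "-" arc: not stratified
                stratum[a] = candidate
                changed = True
    return stratum


reach_arcs = [("Reach", "Reach", False), ("NoReach", "Reach", True)]
print(strata(["Reach", "NoReach"], reach_arcs))

# P(x) <- Q(x) AND NOT P(x): a single node with a negated self-loop.
print(strata(["P"], [("P", "P", True)]))
```

The first call assigns Reach stratum 0 and NoReach stratum 1; the second returns None because P has no finite stratum.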

41 Models
- A model is a choice of IDB relations that, with the given EDB relations, makes all rules true regardless of what values are substituted for the variables.
  - Remember: a rule is true whenever its body is false.
  - But if the body is true, then the head must be true as well.

42 Minimal Models
- When there is no negation, a Datalog program has a unique minimal model (one that does not contain any other model).
- But with negation, there can be several minimal models.
- The stratified model is the one that “makes sense.”

43 The Stratified Model
- When the Datalog program is stratified, we can evaluate IDB predicates lowest-stratum-first.
- Once a predicate has been evaluated, treat it as EDB for higher strata.
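Lowest-stratum-first evaluation can be sketched for the Reach and NoReach rules. The graph below is hypothetical sample data, and the function name is an assumption.

```python
def evaluate_stratified(source, target, arc):
    # Stratum 0: Reach(x) <- Source(x); Reach(x) <- Reach(y) AND Arc(y,x).
    reach = set(source)
    frontier = set(source)
    while frontier:
        frontier = {x for (y, x) in arc if y in frontier} - reach
        reach |= frontier

    # Stratum 1: Reach is now complete and is treated like an EDB relation.
    # NoReach(x) <- Target(x) AND NOT Reach(x)
    noreach = {x for x in target if x not in reach}
    return reach, noreach


# Hypothetical EDB (assumed for illustration).
source = {"s"}
target = {"b", "d"}
arc = {("s", "a"), ("a", "b"), ("c", "d")}

reach, noreach = evaluate_stratified(source, target, arc)
print(sorted(reach), sorted(noreach))
```

The negation is applied only after the recursion for Reach has reached its fixpoint, which is exactly what stratification guarantees is possible.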

44 Example: Multiple Models
  Reach(x) <- Source(x)
  Reach(x) <- Reach(y) AND Arc(y,x)
  NoReach(x) <- Target(x) AND NOT Reach(x)
[Figure: a graph on nodes 1, 2, 3, 4 with Source, Target, and Arc labels.]
- Stratum 0: Reach(1), Reach(2)
- Stratum 1: NoReach(3)

45 Example: Multiple Models
  Reach(x) <- Source(x)
  Reach(x) <- Reach(y) AND Arc(y,x)
  NoReach(x) <- Target(x) AND NOT Reach(x)
[Figure: a graph on nodes 1, 2, 3, 4 with Source, Target, and Arc labels.]
- Another model! Reach(1), Reach(2), Reach(3), Reach(4); NoReach is empty.
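Whether a choice of IDB relations is a model can be checked rule by rule. The sketch below needs concrete EDB facts, and the figure's contents are not recoverable from the transcript, so the EDB used here is an assumption chosen to be consistent with the two models listed: Source(1), Target(2), Target(3), Arc(1,2), Arc(3,4), Arc(4,3).

```python
def is_model(reach, noreach, source, target, arc):
    """Check that every rule is true: whenever a body holds, so does its head."""
    rule1 = all(x in reach for x in source)                 # Reach(x) <- Source(x)
    rule2 = all(x in reach for (y, x) in arc if y in reach)  # Reach(x) <- Reach(y) AND Arc(y,x)
    rule3 = all(x in noreach for x in target if x not in reach)  # NoReach rule
    return rule1 and rule2 and rule3


# ASSUMED EDB, chosen to match the facts stated for the two models.
source = {1}
target = {2, 3}
arc = {(1, 2), (3, 4), (4, 3)}

stratified = ({1, 2}, {3})        # Reach, NoReach from stratified evaluation
other = ({1, 2, 3, 4}, set())     # the other model

print(is_model(*stratified, source, target, arc))
print(is_model(*other, source, target, arc))
print(is_model({1, 2}, set(), source, target, arc))  # drops NoReach(3)
```

Under this assumed EDB both listed choices satisfy all three rules, while dropping NoReach(3) from the stratified model violates the third rule.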

46 SQL-99 Recursion
- Datalog recursion has inspired the addition of recursion to the SQL-99 standard.
- SQL recursion is trickier, because SQL allows grouping-and-aggregation, which behaves like negation and requires a more complex notion of stratification.

47 Form of SQL Recursive Queries
  WITH <rules> <query>
- Each rule has the form: [RECURSIVE] <name>(<arguments>) AS <query>

48 Example: SQL Recursion
- Find Sally’s cousins, using SQL like the recursive Datalog example.
- Par(child,parent) is the EDB.
  WITH Sib(x,y) AS
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child;
- Like Sib(x,y) <- Par(x,p) AND Par(y,p) AND x <> y

49 Example: SQL Recursion
  WITH … RECURSIVE Cousin(x,y) AS
    (SELECT * FROM Sib)
    UNION
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y);
- The first SELECT reflects Cousin(x,y) <- Sib(x,y).
- The second SELECT reflects Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp).

50 Example: SQL Recursion
- With those definitions, we can add the query, which is about the “temporary view” Cousin(x,y):
  SELECT y FROM Cousin WHERE x = 'Sally';
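The complete query can be tried with Python's built-in sqlite3 module. Two things here are assumptions of this sketch: the Par rows are made-up sample data, and the syntax is adapted to SQLite, which writes RECURSIVE once after WITH and separates the definitions with commas, rather than the SQL-99 form shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par(child TEXT, parent TEXT)")
# Hypothetical family: Ann and Bob are children of Gran;
# Sally and Sue are children of Ann; Tom is a child of Bob.
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [("Ann", "Gran"), ("Bob", "Gran"), ("Sally", "Ann"), ("Sue", "Ann"), ("Tom", "Bob")],
)

QUERY = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT * FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'Sally' ORDER BY y
"""

cousins = [row[0] for row in conn.execute(QUERY)]
print(cousins)
```

Sue appears because siblings count as generalized cousins, and Tom appears because his parent Bob is a sibling of Sally's parent Ann.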

51 Plan to Explain Legal SQL Recursion
1. Define “monotone,” a generalization of “stratified.”
2. Generalize stratum graph to apply to SQL.
3. Define proper SQL recursions in terms of the stratum graph.

52 Monotonicity
- If relation P is a function of relation Q (and perhaps other relations), we say P is monotone in Q if inserting tuples into Q cannot cause any tuple to be deleted from P.
- Examples:
  - P = Q UNION R.
  - P = SELECT_{a=10}(Q), the selection of the tuples of Q with a = 10.

53 Example: Nonmonotonicity
- If Sells(bar,beer,price) is our usual relation, then the result of the query:
  SELECT AVG(price) FROM Sells WHERE bar = 'Joe''s Bar';
  is not monotone in Sells.
- Inserting a Joe’s-Bar tuple into Sells usually changes the average price and thus deletes the old average price.
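The difference between the two kinds of query can be illustrated in Python. The helper `preserved` is an assumption of this sketch; it only tests one particular insertion, so a True result illustrates monotonicity without proving it, whereas a single False result is enough to show nonmonotonicity.

```python
def select_a_eq_10(q):
    """Selection of the tuples of Q whose first component a equals 10."""
    return {t for t in q if t[0] == 10}


def avg_joe_price(sells):
    """SELECT AVG(price) FROM Sells WHERE bar = 'Joe''s Bar' as a one-tuple relation."""
    prices = [price for bar, _beer, price in sells if bar == "Joe's Bar"]
    return {sum(prices) / len(prices)} if prices else set()


def preserved(query, before, after):
    """True if no tuple of query(before) was lost after the insertion."""
    return query(before) <= query(after)


# Hypothetical data (assumed for illustration).
q_before = {(10, "x"), (5, "y")}
q_after = q_before | {(10, "z")}

sells_before = {("Joe's Bar", "Bud", 2.0)}
sells_after = sells_before | {("Joe's Bar", "Miller", 3.0)}

print(preserved(select_a_eq_10, q_before, q_after))
print(preserved(avg_joe_price, sells_before, sells_after))
```

After the insertion the average moves from 2.0 to 2.5, so the old result tuple 2.0 is deleted, which is exactly the failure of monotonicity.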

54 SQL Stratum Graph
- Nodes =
  1. IDB relations declared in the WITH clause.
  2. Subqueries in the body of the “rules.”
    - Includes subqueries at any level of nesting.

55 SQL Stratum Graph
- Arcs P -> Q:
  1. P is a rule head and Q is a relation in the FROM list (not of a subquery).
  2. P is a rule head and Q is an immediate subquery of that rule.
  3. P is a subquery, and Q is a relation in its FROM or an immediate subquery (like 1 and 2).
  - Put “–” on an arc if P is not monotone in Q.
  - Stratified SQL = finite numbers of –’s on paths.

56 Example: Stratum Graph
- In our Cousin example, the structure of the rules was:
  Sib = …
  Cousin = ( … FROM Sib ) UNION ( … FROM Cousin … )
- Subquery S1 is ( … FROM Sib ); subquery S2 is ( … FROM Cousin … ).

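57 The Graph
[Figure: stratum graph with nodes Sib, S1, S2, and Cousin; by the arc rules, Cousin -> S1, Cousin -> S2, S1 -> Sib, and S2 -> Cousin.]
- No “–” at all, so surely stratified.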

58 Nonmonotone Example
- Change the UNION in the Cousin example to EXCEPT:
  Sib = …
  Cousin = ( … FROM Sib ) EXCEPT ( … FROM Cousin … )
- Subquery S1 is ( … FROM Sib ); subquery S2 is ( … FROM Cousin … ).
- Inserting a tuple into S2 can delete a tuple from Cousin.

59 The Graph
[Figure: the same stratum graph on Sib, S1, S2, and Cousin, now with the arc Cousin -> S2 labeled “–”.]
- An infinite number of –’s exist on cycles involving Cousin and S2.

60 NOT Doesn’t Mean Nonmonotone
- Not every NOT means the query is nonmonotone.
  - We need to consider each case separately.
- Example: negating a condition in a WHERE clause just changes the selection condition.
  - But all selections are monotone.

61 Example: Revised Cousin
  RECURSIVE Cousin AS
    (SELECT * FROM Sib)
    UNION
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x AND NOT (p2.parent = Cousin.y) );
- The second SELECT is the revised subquery S2.
- The only difference is the NOT in front of (p2.parent = Cousin.y).

62 S2 Still Monotone in Cousin
- Intuitively, adding a tuple to Cousin cannot delete from S2.
- All former tuples in Cousin can still work with Par tuples to form S2 tuples.
- In addition, the new Cousin tuple might even join with Par tuples to add to S2.