
Logical Query Languages
Motivation:
1. Logical rules extend more naturally to recursive queries than does relational algebra.
   - Used in SQL recursion.
2. Logical rules form the basis for many information-integration systems and applications.

Datalog
Datalog uses first-order predicate logic both to represent knowledge and as a language for expressing operations on relations. Example:
boss(E,M) :- manages(E,M).
boss(E,M) :- boss(E,N) & manages(N,M).
Substitute constants for the variables E, N, M. If the substitution makes the right side (the body) true, then the left side (the head) must also be true.
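The two boss rules can be evaluated bottom-up. Start with an empty boss relation and apply both rules repeatedly until no new facts appear. The sketch below does this in Python, assuming hypothetical manages facts and treating each pair simply as the arguments (E,M) of the slide.

```python
# Naive bottom-up evaluation of:
#   boss(E,M) :- manages(E,M).
#   boss(E,M) :- boss(E,N) & manages(N,M).
# The manages facts are hypothetical sample data.
manages = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

boss = set()
while True:
    # Rule 1 copies manages; rule 2 joins boss with manages on N.
    new_boss = set(manages) | {
        (e, m) for (e, n) in boss for (n2, m) in manages if n == n2
    }
    if new_boss == boss:  # no new facts: fixed point reached
        break
    boss = new_boss

print(sorted(boss))
```

The second rule makes boss the transitive closure of manages, so ("alice", "dave") is derived after two applications of the recursive rule.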

Datalog Example
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p)
Above is a rule. Left side = head. Right side = body = AND of subgoals.
Head and subgoals are atoms.
- Atom = predicate and arguments.
- Predicate = relation name or arithmetic predicate, e.g. <.
- Arguments are variables or constants.
Subgoals (not the head) may optionally be negated by NOT.

Meaning of Rules
The head is true of its arguments if there exist values for the local variables (those in the body but not in the head) that make all of the subgoals true.
If there is no negation and no arithmetic comparison, just take the natural join of the subgoals and project onto the head variables.
Example: the rule above is equivalent to Happy(d) = $\pi_{drinker}(Frequents \bowtie Likes \bowtie Sells)$.
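The "join, then project" reading of a negation-free rule can be checked directly. This minimal sketch stores rows as Python dicts keyed by attribute name and uses a hand-written `natural_join` helper, which is hypothetical and not a library function. The Frequents, Likes, and Sells contents are invented sample data.

```python
def natural_join(r, s):
    """Join two lists of dict-rows on every attribute name they share."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result

# Hypothetical relation contents.
frequents = [{"drinker": "Ann", "bar": "Joe's"}, {"drinker": "Bob", "bar": "Sue's"}]
likes = [{"drinker": "Ann", "beer": "Bud"}, {"drinker": "Bob", "beer": "Bud"}]
sells = [{"bar": "Joe's", "beer": "Bud", "price": 2.50},
         {"bar": "Sue's", "beer": "Coors", "price": 3.00}]

# Happy = pi_drinker(Frequents JOIN Likes JOIN Sells)
happy = {row["drinker"] for row in natural_join(natural_join(frequents, likes), sells)}
print(happy)  # {'Ann'}
```

Bob is not happy in this data, because the bar he frequents does not sell the beer he likes.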

Evaluation of Rules
Two dual approaches:
1. Variable-based: consider all possible assignments of values to variables. If all subgoals are true, add the head to the result relation.
2. Tuple-based: consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result.
Example: Variable-Based Assignment
S(x,y) <- R(x,z) AND R(z,y) AND NOT R(x,y)
R(A,B) = {(1,2), (2,3)}

Only two assignments make the first subgoal true:
1. x → 1, z → 2.
2. x → 2, z → 3.
In case (1), y → 3 makes the second subgoal true. Since (1,3) is not in R, the third subgoal is also true.
- Thus, add (x,y) = (1,3) to relation S.
In case (2), no value of y makes the second subgoal true.
Thus, S(A,B) = {(1,3)}.
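The variable-based method can be sketched as a brute-force loop over assignments. As an assumption made for illustration, x, y, and z range only over values that occur in R (the active domain). This is enough here because the rule is safe.

```python
from itertools import product

R = {(1, 2), (2, 3)}
# Because the rule is safe, letting x, y, z range over the values that
# actually occur in R (the "active domain") is enough.
domain = sorted({v for t in R for v in t})

S = set()
for x, y, z in product(domain, repeat=3):
    # All three subgoals must be true under this assignment.
    if (x, z) in R and (z, y) in R and (x, y) not in R:
        S.add((x, y))

print(S)  # {(1, 3)}
```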

Example: Tuple-Based Assignment
Trick: start with the positive (not negated), relational (not arithmetic) subgoals only.
S(x,y) <- R(x,z) AND R(z,y) AND NOT R(x,y)
R(A,B) = {(1,2), (2,3)}
Four assignments of tuples to subgoals:
R(x,z) | R(z,y)
(1,2)  | (1,2)
(1,2)  | (2,3)
(2,3)  | (1,2)
(2,3)  | (2,3)
Only the second gives a consistent value to z (z = 2). That assignment (x = 1, y = 3) also makes NOT R(x,y) true. Thus, (1,3) is the only tuple for the head.
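The tuple-based method can be written as a loop in the same way. This sketch assigns one tuple of R to each positive subgoal and keeps an assignment only when the shared variable z agrees. The negated subgoal is checked after that.

```python
from itertools import product

R = {(1, 2), (2, 3)}

S = set()
# Assign one tuple of R to each positive relational subgoal R(x,z), R(z,y).
for (x, z1), (z2, y) in product(sorted(R), repeat=2):
    # The shared variable z must get the same value from both tuples;
    # only then do we check the negated subgoal NOT R(x,y).
    if z1 == z2 and (x, y) not in R:
        S.add((x, y))

print(S)  # {(1, 3)}
```

The loop runs over only |R|² = 4 tuple pairs, compared with 3³ = 27 value assignments in the variable-based version. That difference is why the tuple-based approach is the usual basis for evaluation.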

Datalog Programs
A collection of rules is a Datalog program.
Predicates/relations divide into two classes:
- EDB = extensional database = relation stored in the DB.
- IDB = intensional database = relation defined by one or more rules.
A predicate must be IDB or EDB, not both.
- Thus, an IDB predicate can appear in the body or head of a rule; an EDB predicate only in the body.

Example
Convert the following SQL (find the manufacturers of the beers Joe sells), over Beers(name, manf) and Sells(bar, beer, price):
SELECT manf
FROM Beers
WHERE name IN (SELECT beer
               FROM Sells
               WHERE bar = 'Joe''s Bar');
to a Datalog program:
JoeSells(b) <- Sells('Joe''s Bar', b, p)
Answer(m) <- JoeSells(b) AND Beers(b,m)
Note: Beers, Sells = EDB; JoeSells, Answer = IDB.
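The two IDB rules can be evaluated in dependency order: JoeSells first, then Answer. This minimal sketch does so over hypothetical Sells and Beers contents. The manufacturer names "ManfA", "ManfB", and "ManfC" are placeholders.

```python
# Hypothetical EDB contents; only the schemas come from the slide.
sells = {("Joe's Bar", "Bud", 2.50), ("Joe's Bar", "Export", 2.75),
         ("Sue's Bar", "Coors", 3.00)}
beers = {("Bud", "ManfA"), ("Export", "ManfB"), ("Coors", "ManfC")}

# JoeSells(b) <- Sells('Joe''s Bar', b, p)
joe_sells = {beer for (bar, beer, _price) in sells if bar == "Joe's Bar"}

# Answer(m) <- JoeSells(b) AND Beers(b,m)
answer = {manf for (name, manf) in beers if name in joe_sells}

print(sorted(answer))  # ['ManfA', 'ManfB']
```

The doubled quote in 'Joe''s Bar' is SQL string escaping. In a Python string literal, the apostrophe is written once.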

sibling(X,Y) :- parent(X,Z) & parent(Y,Z) & X <> Y.
cousin(X,Y) :- parent(X,Xp) & parent(Y,Yp) & sibling(Xp,Yp).
cousin(X,Y) :- parent(X,Xp) & parent(Y,Yp) & cousin(Xp,Yp).
related(X,Y) :- sibling(X,Y).
related(X,Y) :- related(X,Z) & parent(Y,Z).
related(X,Y) :- related(Z,Y) & parent(X,Z).
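All three predicates can be computed together by naive iteration: every rule is applied each round, using the previous round's relations, until nothing changes. The parent facts below are hypothetical. parent(X,Z) is read as "Z is a parent of X", which matches the sibling rule.

```python
# Hypothetical parent(X,Z) facts, read as "Z is a parent of X".
parent = {("b", "a"), ("c", "a"), ("d", "b"), ("e", "c")}

sibling, cousin, related = set(), set(), set()
while True:
    new_sibling = {(x, y) for (x, z) in parent for (y, z2) in parent
                   if z == z2 and x != y}
    # Both cousin rules: the parents are siblings, or the parents are cousins.
    new_cousin = {(x, y) for (x, xp) in parent for (y, yp) in parent
                  if (xp, yp) in sibling or (xp, yp) in cousin}
    new_related = (
        set(sibling)
        # related(X,Z) & parent(Y,Z)
        | {(x, y) for (x, z) in related for (y, z2) in parent if z == z2}
        # related(Z,Y) & parent(X,Z)
        | {(x, y) for (z, y) in related for (x, z2) in parent if z == z2}
    )
    if (new_sibling, new_cousin, new_related) == (sibling, cousin, related):
        break
    sibling, cousin, related = new_sibling, new_cousin, new_related

print(sorted(cousin))  # [('d', 'e'), ('e', 'd')]
print(len(related))    # 8
```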

Safety
A rule can make no sense if variables appear in funny ways. Examples:
S(x) <- R(y)
S(x) <- NOT R(x)
S(x) <- R(y) AND x < y
Each of these rules can define an infinite relation even when R is finite (the first and third whenever R is nonempty).
To make sense as a database operation, we require three things of a variable x (this is the definition of safety). If x appears in either
1. the head,
2. a negated subgoal, or
3. an arithmetic comparison,
then x must also appear in a nonnegated, "ordinary" (relational) subgoal of the body.
We insist that rules be safe, henceforth.

Safety (Contd.)
Avoid rules that create infinite relations from finite ones by insisting that each variable appearing in the rule be "limited." Formally, a variable is limited if:
1. it appears as an argument in an ordinary predicate of the body;
2. it is a variable X that appears in a subgoal X = a or a = X, where a is a constant; or
3. it is a variable X that appears in a subgoal X = Y or Y = X, where Y is a variable already known to be limited.

Safety (Contd.)
P(X,Y) :- q(X,Z) & W = a & Y = W.
X and Z are limited by rule (1) because of the first subgoal in the body. W is limited by rule (2) because of the second subgoal, and therefore rule (3) tells us Y is limited because of the third subgoal. Every variable in the rule is limited, so the rule is safe.
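The three "limited" conditions form a small fixed-point computation. This sketch uses a hypothetical encoding: subgoals are ("pred", name, args) or ("eq", left, right), and, following the slide's example, variables start with an uppercase letter while constants do not. Negated subgoals and < comparisons are deliberately left out of the encoding.

```python
def is_var(term):
    # Slide convention: variables start with an uppercase letter, constants do not.
    return term[:1].isupper()

def limited_variables(body):
    """body: list of ("pred", name, args) or ("eq", left, right) subgoals."""
    limited = set()
    for sg in body:                       # condition (1): ordinary predicates
        if sg[0] == "pred":
            limited.update(t for t in sg[2] if is_var(t))
    changed = True
    while changed:                        # conditions (2) and (3), to a fixed point
        changed = False
        for sg in body:
            if sg[0] != "eq":
                continue
            for a, b in ((sg[1], sg[2]), (sg[2], sg[1])):
                # X = constant, or X = Y with Y already limited.
                if is_var(a) and a not in limited and (not is_var(b) or b in limited):
                    limited.add(a)
                    changed = True
    return limited

def is_safe(head_args, body):
    """Safe if every variable appearing in the rule is limited."""
    terms = list(head_args)
    for sg in body:
        terms.extend(sg[2] if sg[0] == "pred" else sg[1:])
    return {t for t in terms if is_var(t)} <= limited_variables(body)

# P(X,Y) :- q(X,Z) & W = a & Y = W.
body = [("pred", "q", ["X", "Z"]), ("eq", "W", "a"), ("eq", "Y", "W")]
print(sorted(limited_variables(body)))          # ['W', 'X', 'Y', 'Z']
print(is_safe(["X", "Y"], body))                # True
print(is_safe(["X"], [("pred", "r", ["Y"])]))   # False: S(X) :- r(Y)
```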

Evaluating Nonrecursive Rules
This involves two steps:
1. Compute the relation defined by each rule body.
2. Compute the relation for the nonrecursive predicate (the head of the rule).
Algorithm 3.1: compute the relation for a rule body using relational-algebra operations.
Algorithm 3.2: evaluate nonrecursive rules using relational-algebra operations.
(Refer to the handouts.)

Rectified Rules
Before applying Alg. 3.2, we rectify the rules. Rectification makes every rule head for a predicate p identical, of the form p(X1,...,Xk) for distinct variables X1,...,Xk. We then consider all rules with p in the head, compute the relation for each rule, project onto the variables appearing in the head, and take the union.

Rectification (Contd.)
Example: consider the predicate p defined by the rules
p(a,X,Y) :- r(X,Y).
p(X,Y,X) :- r(Y,X).
We rectify these rules by making both heads be p(U,V,W) and adding subgoals as follows:
p(U,V,W) :- r(X,Y) & U=a & V=X & W=Y.
p(U,V,W) :- r(Y,X) & U=X & V=Y & W=X.
Next, substituting for X and Y one of the new variables U, V, or W, as appropriate, we get
p(U,V,W) :- r(V,W) & U=a.
p(U,V,W) :- r(V,U) & W=U.
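Rectification should not change the relation for p. One way to check this is to evaluate both versions on the same data. In this sketch, the contents of r are hypothetical, and each equality subgoal is modeled as a one-value loop that binds the remaining head variable.

```python
# Hypothetical contents for the EDB relation r; 'a' is the slide's constant.
r = {(1, 2), (3, 4)}

# Original rules:
#   p(a,X,Y) :- r(X,Y).
#   p(X,Y,X) :- r(Y,X).
original = {("a", x, y) for (x, y) in r} | {(x, y, x) for (y, x) in r}

# Rectified rules, both with head p(U,V,W):
#   p(U,V,W) :- r(V,W) & U=a.
#   p(U,V,W) :- r(V,U) & W=U.
rule1 = {(u, v, w) for (v, w) in r for u in ["a"]}   # U fixed by U=a
rule2 = {(u, v, w) for (v, u) in r for w in [u]}     # W fixed by W=U
rectified = rule1 | rule2

print(original == rectified)  # True
```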

Expressive Power of Datalog
Nonrecursive Datalog (with negation) = (classical) relational algebra.
- See discussion in text.
Datalog simulates SQL select-from-where without aggregation and grouping.
Recursive Datalog expresses queries that cannot be expressed in relational algebra or in SQL without recursion (e.g., transitive closure).
But none of these languages has full expressive power (Turing completeness).

Recursion
IDB predicate P depends on predicate Q if there is a rule with P in the head and Q in a subgoal.
Draw a graph: nodes = IDB predicates, arc P → Q means P depends on Q.
The program is recursive if and only if this graph has a cycle.
Recursive Example
Sib(x,y) <- Par(x,p) AND Par(y,p) AND x <> y
Cousin(x,y) <- Sib(x,y)
Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)
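The recursion test reduces to cycle detection in the dependency graph. This sketch encodes each program as a dict from an IDB predicate to the predicates it depends on. It then runs a standard three-color depth-first search, in which reaching a node that is still on the current path means a cycle exists.

```python
def is_recursive(depends_on):
    """Return True if the IDB dependency graph has a cycle (DFS with colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current DFS path, finished
    color = {p: WHITE for p in depends_on}

    def visit(p):
        color[p] = GRAY
        for q in depends_on.get(p, ()):
            state = color.get(q, WHITE)
            if state == GRAY:            # back to a node on the current path
                return True
            if state == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(depends_on))

# Arcs P -> Q for the Sib/Cousin program (IDB predicates only; Par is EDB).
sib_cousin = {"Sib": set(), "Cousin": {"Sib", "Cousin"}}
# Arcs for the nonrecursive JoeSells/Answer program.
joe = {"JoeSells": set(), "Answer": {"JoeSells"}}

print(is_recursive(sib_cousin), is_recursive(joe))  # True False
```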

Iterative Fixed-Point Evaluates Recursive Rules
Start with IDB = ∅. Apply the rules to the current IDB and the EDB. If the IDB changed, apply the rules again; if it did not change, we are done.
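The loop just described can be sketched in Python for the Sib/Cousin rules. The Par facts are hypothetical and much smaller than the family tree in the next example. The counter records how many rounds changed the IDB.

```python
# Hypothetical Par(child, parent) facts: b and c are children of a,
# f is a child of b, and g is a child of c.
par = {("b", "a"), ("c", "a"), ("f", "b"), ("g", "c")}

sib, cousin = set(), set()   # start with IDB = empty
rounds = 0
while True:
    # Apply every rule once to the current IDB and the EDB.
    new_sib = {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}
    new_cousin = set(sib) | {
        (x, y) for (x, xp) in par for (y, yp) in par if (xp, yp) in cousin
    }
    if new_sib == sib and new_cousin == cousin:   # no change to IDB: done
        break
    sib, cousin = new_sib, new_cousin
    rounds += 1

print(rounds, sorted(cousin))  # 3 [('b', 'c'), ('c', 'b'), ('f', 'g'), ('g', 'f')]
```

As in the table below, Sib facts appear in round 1, are copied into Cousin in round 2, and then generate new Cousin facts through the recursive rule.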

Example
EDB Par = [Figure: family tree over the nodes a through k; the parent edges are not recoverable from the transcript.]
Note: because of symmetry, Sib and Cousin facts appear in pairs, so we shall mention only (x,y) when both (x,y) and (y,x) are meant.

SibCousin Initial  Round 1(b,c), (c,e)  add:(g,h), (j,k) Round 2(b,c), (c,e) add:(g,h), (j,k) Round 3(f,g), (f,h) add:(g,i), (h,i) (i,k) Round 4(k,k) add:(i,j)

Another example
path(X,Y) :- arc(X,Y).
path(X,Y) :- path(X,Z) & path(Z,Y).
Datalog equation for the relation P corresponding to the path predicate (A is the relation for arc):
$P(X,Y) = A(X,Y) \cup \pi_{X,Y}\big(P(X,Z) \bowtie P(Z,Y)\big)$
Find a solution to the equation if A = {(1,2), (2,3)}.
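One way to solve the equation is to iterate its right-hand side starting from the empty relation. This yields the least solution. The equation has other solutions too: for example, all nine pairs over {1,2,3} also satisfy it. The least one is the intended meaning of the rules.

```python
A = {(1, 2), (2, 3)}

def rhs(P):
    """Right-hand side of the equation: A union pi_{X,Y}(P(X,Z) JOIN P(Z,Y))."""
    return A | {(x, y) for (x, z) in P for (z2, y) in P if z == z2}

P = set()
while rhs(P) != P:      # iterate from the empty relation until P = rhs(P)
    P = rhs(P)

print(sorted(P))  # [(1, 2), (1, 3), (2, 3)]
```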

Stratified Negation
Negation wrapped inside a recursion makes no sense.
Even when negation and recursion are separated, there can be ambiguity about what the rules mean, and a single meaning must be selected.
Stratified negation is an additional restriction on recursive rules (like safety) that solves both problems:
1. It rules out negation wrapped in recursion.
2. When negation is separate from recursion, it yields the intuitively correct meaning of rules (the stratified model).

Problem with Recursive Negation
Consider: P(x) <- Q(x) AND NOT P(x), with Q = EDB = {1,2}.
Compute IDB P iteratively?
- Initially, P = ∅.
- Round 1: P = {1,2}.
- Round 2: P = ∅, etc., etc.
The iteration oscillates and never reaches a fixed point.
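The oscillation is easy to reproduce. This sketch applies the rule for a few rounds, each time using the previous round's P, and records every intermediate value.

```python
Q = {1, 2}      # EDB
P = set()       # IDB, initially empty
history = [frozenset(P)]
for _ in range(4):
    # One round of P(x) <- Q(x) AND NOT P(x), using the previous round's P.
    P = {x for x in Q if x not in P}
    history.append(frozenset(P))

print([sorted(s) for s in history])  # [[], [1, 2], [], [1, 2], []]
```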

Problem (Contd.)
p(X) :- r(X) & NOT q(X).   P = R - Q
q(X) :- r(X) & NOT p(X).   Q = R - P
Suppose R consists of a single tuple 1, i.e., R = {1}.
S1: P = ∅ and Q = {1}.
S2: P = {1} and Q = ∅.
Both S1 and S2 are solutions to the equations P = R - Q and Q = R - P. Both are minimal fixed points, and neither is contained in the other, so the rules don't have a least fixed point.
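With R this small, every candidate solution can be checked by brute force. This sketch enumerates all pairs of subsets of R, keeps those that satisfy both equations, and then confirms that neither solution contains the other.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

R = frozenset({1})
solutions = [(P, Q) for P in subsets(R) for Q in subsets(R)
             if P == R - Q and Q == R - P]
print(solutions)  # [(frozenset(), frozenset({1})), (frozenset({1}), frozenset())]

(p1, q1), (p2, q2) = solutions
# Neither solution is contained in the other, so there is no least one.
print(p1 <= p2 and q1 <= q2, p2 <= p1 and q2 <= q1)  # False False
```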

Strata
Intuitively: the stratum of an IDB predicate = the maximum number of negations you can pass through on the way to an EDB predicate.
It must not be ∞ in "stratified" rules.
Define the stratum graph:
- Nodes = IDB predicates.
- Arc P → Q if Q appears in the body of a rule with head P.
- Label that arc "–" if Q is in a negated subgoal.
Example
P(x) <- Q(x) AND NOT P(x)
[Figure: stratum graph with the single node P and a self-loop labeled –.]

Example
Which target nodes cannot be reached from any source node?
Reach(x) <- Source(x)
Reach(x) <- Reach(y) AND Arc(y,x)
NoReach(x) <- Target(x) AND NOT Reach(x)
[Figure: stratum graph with nodes NoReach and Reach; the arc NoReach → Reach is labeled –.]

Computing Strata
Stratum of an IDB predicate A = maximum number of "–" arcs on any path from A in the stratum graph.
Examples
For the first example, the stratum of P is ∞.
For the second example, the stratum of Reach is 0; the stratum of NoReach is 1.
Stratified Negation
A Datalog program is stratified if every IDB predicate has a finite stratum.
Stratified Model
If a Datalog program is stratified, we can compute the relations for the IDB predicates lowest-stratum-first.
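Strata can be computed as longest paths in which a negated arc counts 1 and any other arc counts 0. This sketch relaxes arcs until values stop growing and caps them one above the total number of negated arcs. A finite stratum can never exceed that count, because no path can then repeat a negated arc. Reaching the cap therefore signals ∞. The graph encoding is an assumption made for the example.

```python
import math

def strata(graph):
    """graph maps each IDB predicate P to a list of (Q, negated) arcs P -> Q.

    Returns each node's stratum: the maximum number of negated arcs on
    any path from it, or math.inf if that number is unbounded.
    """
    nodes = set(graph) | {q for arcs in graph.values() for q, _ in arcs}
    # With finite strata, no path repeats a negated arc, so this is an upper bound.
    bound = sum(1 for arcs in graph.values() for _, neg in arcs if neg)
    stratum = {n: 0 for n in nodes}
    changed = True
    while changed:
        changed = False
        for p, arcs in graph.items():
            for q, neg in arcs:
                candidate = min(stratum[q] + (1 if neg else 0), bound + 1)
                if candidate > stratum[p]:
                    stratum[p] = candidate
                    changed = True
    return {n: (math.inf if s > bound else s) for n, s in stratum.items()}

first = strata({"P": [("P", True)]})
second = strata({"Reach": [("Reach", False)], "NoReach": [("Reach", True)]})
print(first["P"], second["Reach"], second["NoReach"])  # inf 0 1
```

A program is stratified exactly when no value in the returned dict is `math.inf`.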

Example
Reach(x) <- Source(x)
Reach(x) <- Reach(y) AND Arc(y,x)
NoReach(x) <- Target(x) AND NOT Reach(x)
EDB:
- Source = {1}.
- Arc = {(1,2), (3,4), (4,3)}.
- Target = {2,3}.
First compute Reach = {1,2} (stratum 0). Next compute NoReach = {3} (stratum 1).
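Lowest-stratum-first evaluation of this program can be sketched directly with the slide's EDB. Reach is completed by fixed-point iteration before the negated subgoal NOT Reach(x) is ever evaluated.

```python
source = {1}
arc = {(1, 2), (3, 4), (4, 3)}
target = {2, 3}

# Stratum 0: Reach is recursive but negation-free, so iterate to a fixed point.
reach = set()
while True:
    new_reach = source | {x for (y, x) in arc if y in reach}
    if new_reach == reach:
        break
    reach = new_reach

# Stratum 1: Reach is now complete, so NOT Reach(x) can be evaluated.
no_reach = {x for x in target if x not in reach}

print(sorted(reach), sorted(no_reach))  # [1, 2] [3]
```

Node 3 lies on the cycle 3 → 4 → 3, but no path leads to it from the source, so it ends up in NoReach.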

SQL Recursion
WITH
  <stuff that looks like Datalog rules>
<an SQL query about EDB, IDB>
Rule = [RECURSIVE] R(<arguments>) AS <SQL query>

Example
Find Sally's cousins, using EDB Par(child, parent).
WITH
  Sib(x,y) AS
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND
          p1.child <> p2.child,
  RECURSIVE Cousin(x,y) AS
    Sib
    UNION
    (SELECT p1.child, p2.child
     FROM Par p1, Par p2, Cousin
     WHERE p1.parent = Cousin.x AND
           p2.parent = Cousin.y)
SELECT y
FROM Cousin
WHERE x = 'Sally';
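The query can be run almost unchanged in SQLite through Python's standard `sqlite3` module. The main difference is syntax: SQLite writes `WITH RECURSIVE` once at the top rather than marking individual rules, and the recursive CTE must be a compound SELECT joined by UNION, with the non-recursive part first. The Par rows below are hypothetical: Ann and Carl are siblings, Sally is Ann's child, and Bob is Carl's child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par(child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [("Sally", "Ann"), ("Bob", "Carl"), ("Ann", "Dora"), ("Carl", "Dora")],
)

query = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'Sally';
"""
sally_cousins = sorted(row[0] for row in conn.execute(query))
print(sally_cousins)  # ['Bob']
```

Because UNION removes duplicates, the recursion stops once an iteration produces no new (x, y) pair.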

Plan for Describing Legal SQL Recursion
- Define "monotonicity," a property that generalizes "stratification."
- Generalize the stratum graph to apply to SQL queries instead of Datalog rules.
  - (Non)monotonicity replaces NOT in subgoals.
- Define semantically correct SQL recursions in terms of the stratum graph.
Monotonicity
If relation P is a function of relation Q (and perhaps other things), we say P is monotone in Q if adding tuples to Q cannot cause any tuple of P to be deleted.

Monotonicity Example
In addition to certain negations, an aggregation can cause nonmonotonicity.
Sells(bar, beer, price)
SELECT AVG(price)
FROM Sells
WHERE bar = 'Joe''s Bar';
Adding to Sells a tuple that gives a new beer Joe sells will usually change the average price of beer at Joe's. Thus, the former result, which might be a single tuple like (2.78), becomes another single tuple like (2.81), and the old tuple is lost.
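The loss of the old tuple can be shown concretely. This sketch models the query result as a set of 1-tuples and uses hypothetical prices chosen so that the averages are exact in floating point. It then checks whether the old result is still contained in the new one.

```python
def avg_price(sells, bar):
    """Result of SELECT AVG(price) FROM Sells WHERE bar = <bar>, as a set of 1-tuples."""
    prices = [price for (b, _beer, price) in sells if b == bar]
    return {(sum(prices) / len(prices),)}

# Hypothetical Sells contents.
sells = {("Joe's Bar", "Bud", 2.00), ("Joe's Bar", "Export", 3.00)}
before = avg_price(sells, "Joe's Bar")
after = avg_price(sells | {("Joe's Bar", "Coors", 4.00)}, "Joe's Bar")

print(before, after)    # {(2.5,)} {(3.0,)}
print(before <= after)  # False: the old tuple was lost
```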

Generalizing Stratum Graph to SQL
- Node for each relation defined by a "rule."
- Node for each subquery in the "body" of a rule.
- Arc P → Q if
  (a) P is the "head" of a rule, and Q is a relation appearing in the FROM list of the rule (not in the FROM list of a subquery), as an argument of a UNION, etc.;
  (b) P is the head of a rule, and Q is a subquery directly used in that rule (not nested within some larger subquery); or
  (c) P is a subquery, and Q is a relation or subquery used directly within P [analogous to (a) and (b) for rule heads].
- Label the arc – if P is not monotone in Q.
Requirement for legal SQL recursion: finite strata only.

Example
For the Sib/Cousin example, there are three nodes: Sib, Cousin, and SQ (the second term of the union in the rule for Cousin).
[Figure: stratum graph over the nodes Sib, Cousin, and SQ.]
No nonmonotonicity, hence legal.

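A Nonmonotonic Example
Change the UNION to EXCEPT in the rule for Cousin:
RECURSIVE Cousin(x,y) AS
  Sib
  EXCEPT
  (SELECT p1.child, p2.child
   FROM Par p1, Par p2, Cousin
   WHERE p1.parent = Cousin.x AND
         p2.parent = Cousin.y)
Now, adding to the result of the subquery can delete Cousin facts; i.e., Cousin is nonmonotone in SQ.
The cycle Cousin → SQ → Cousin contains a – arc, so paths around it pass through an unbounded number of –'s. The strata are infinite, so this recursion is illegal in SQL.
[Figure: stratum graph over the nodes Sib, Cousin, and SQ, with the arc from Cousin to SQ labeled –.]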

Another Example: NOT Doesn't Mean Nonmonotone
Leave Cousin as it was, but negate one of the conditions in the WHERE clause:
RECURSIVE Cousin(x,y) AS
  Sib
  UNION
  (SELECT p1.child, p2.child
   FROM Par p1, Par p2, Cousin
   WHERE p1.parent = Cousin.x AND
         NOT (p2.parent = Cousin.y))
You might think that SQ depends negatively on Cousin, but it doesn't.
- If I add a new tuple to Cousin, all the old tuples still exist and yield whatever tuples in SQ they used to yield.
- In addition, the new Cousin tuple might combine with old p1 and p2 tuples to yield something new.
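The argument can be checked exhaustively on a tiny instance. This sketch fixes hypothetical Par facts and tries every possible Cousin relation over three parents. For each one, it adds every possible extra tuple and confirms that SQ never loses a tuple, even though its WHERE clause contains NOT.

```python
from itertools import combinations, product

# Hypothetical Par(child, parent) facts and the parents Cousin may mention.
par = {("f", "b"), ("g", "c"), ("h", "d")}
people = ["b", "c", "d"]
candidates = list(product(people, repeat=2))  # every possible Cousin tuple

def sq(cousin):
    """SQ = SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin
            WHERE p1.parent = Cousin.x AND NOT (p2.parent = Cousin.y)"""
    return {(c1, c2)
            for (c1, p1) in par
            for (c2, p2) in par
            for (x, y) in cousin
            if p1 == x and not (p2 == y)}

# Adding any tuple t to any Cousin instance never removes a tuple from SQ.
monotone = all(
    sq(set(base)) <= sq(set(base) | {t})
    for k in range(len(candidates) + 1)
    for base in combinations(candidates, k)
    for t in candidates
)
print(monotone)  # True
```

The NOT applies only to a comparison inside a single Cousin tuple. Each Cousin tuple therefore contributes its SQ tuples independently of the others, which is why SQ stays monotone in Cousin.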