CIS 550, Fall 2001. Handout 2: SQL, Relational Calculus and Datalog.


1 CIS 550, Fall 2001. Handout 2: SQL, Relational Calculus and Datalog

2 What we cannot compute with RA
– Recursive queries. Given a relation Parent(Parent, Child), compute the Ancestor relation. (Can do this in Datalog.)
– Aggregate operations, e.g. "The number of climbers who have climbed 'Last Tango'" or "The average age of climbers".
– Computing with non-1NF relations, e.g. lists, arrays, multisets, nested relations.

3 Basic Query
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
– relation-list: a list of relation names (possibly with a range-variable after each name).
– target-list: a list of attributes of relations in relation-list. * can be used to denote all attributes.
– qualification: comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, <=, >=, <>) combined using AND, OR and NOT.
– DISTINCT: optional keyword indicating that the answer should not contain duplicates. By default, duplicates are not eliminated!

4 Conceptual Evaluation Strategy
– Compute the product of relation-list.
– Discard tuples that fail qualification.
– Project over attributes in target-list.
– If DISTINCT, then eliminate duplicates.
This is probably a very bad way of executing the query, and a good query optimizer will use all sorts of tricks to find efficient strategies to compute the same answer.
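The four steps can be sketched directly in Python. This is only an illustration of the conceptual strategy, not of how a real engine works; the function name conceptual_eval and the choice of query are assumptions made for the example, and the rows are a two-column subset of the handout's sample Climbers and Climbs tables.

```python
from itertools import product

# Rows copied from the handout's sample tables (only the columns needed here).
climbers = [(123, "Edmund"), (214, "Arnold"), (313, "Bridget"), (212, "James")]  # (CId, CName)
climbs = [(123, 1), (123, 3), (313, 1), (214, 2), (313, 1)]                      # (CId, RId)


def conceptual_eval(relations, qualification, target, distinct=False):
    """Evaluate SELECT/FROM/WHERE the slow, conceptual way (hypothetical helper)."""
    rows = product(*relations)                      # 1. product of relation-list
    rows = [r for r in rows if qualification(r)]    # 2. discard tuples failing the qualification
    rows = [target(r) for r in rows]                # 3. project onto target-list
    if distinct:                                    # 4. optional duplicate elimination
        rows = list(dict.fromkeys(rows))            #    (keeps the first occurrence of each row)
    return rows


# SELECT [DISTINCT] c.CName FROM Climbers c, Climbs b WHERE c.CId = b.CId
where = lambda r: r[0][0] == r[1][0]
select = lambda r: (r[0][1],)

with_duplicates = conceptual_eval([climbers, climbs], where, select)
without_duplicates = conceptual_eval([climbers, climbs], where, select, distinct=True)
print(with_duplicates)
print(without_duplicates)
```

The product has 4 x 5 = 20 rows even though only 5 survive the qualification, which is why an optimizer would avoid materializing it.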

5 Sample tables
Routes:
RId RName       Grade Rating Height
1   Last Tango  II    12     100
2   Garden Path I     2      60
3   The Sluice  I     8      60
4   Picnic      III   3      400
Climbers:
CId CName   Skill Age
123 Edmund  EXP   80
214 Arnold  BEG   25
313 Bridget EXP   33
212 James   MED   27
Climbs:
CId RId Date     Duration
123 1   10/10/88 5
123 3   11/08/87 1
313 1   12/08/89 5
214 2   08/07/92 2
313 1   06/07/94 3

6 Select/project queries
SELECT * FROM Routes WHERE Height < 200;
RID RNAME       GRADE RATING HEIGHT
1   Last Tango  II    12     100
2   Garden Path I     2      60
3   The Sluice  I     8      60
SELECT Grade, Height FROM Routes;
GRADE HEIGHT
II    100
I     60
I     60
III   400

7 Distinct
Note that SQL did not eliminate duplicates. We need to request this explicitly.
SELECT DISTINCT Grade, Height FROM Routes;
GRADE HEIGHT
I     60
II    100
III   400

8 String Matching
LIKE can be used in the WHERE clause. "_" denotes any single character, "%" denotes 0 or more characters.
SELECT * FROM Routes WHERE RName LIKE 'L_%o'
RId RName      Grade Rating Height
1   Last Tango II    12     100

9 Arithmetic
"AS" can be used to label columns in the output; arithmetic can be used to compute results.
SELECT DISTINCT Grade, Height/10 AS H FROM Routes;
Grade H
II    10
I     6
III   40

10 Set operations: union
SELECT CId FROM Climbers WHERE Age < 40
UNION
SELECT CId FROM Climbs WHERE RId = 1;
CID
123
212
214
313
Duplicates do not occur in the union.

11 The UNION ALL operator preserves duplicates
SELECT CId FROM Climbers WHERE Age < 40
UNION ALL
SELECT CId FROM Climbs WHERE RId = 1;
CID
214
313
212
123
313
313

12 What does "union compatible" mean?
SELECT CId FROM Climbers UNION SELECT RId FROM Routes;      (OK)
SELECT CName FROM Climbers UNION SELECT RId FROM Routes;    (Error)

13 Intersection and difference
SELECT CId FROM Climbers WHERE Age > 40
INTERSECT
SELECT CId FROM Climbs WHERE RId = 1;
CID
123
SELECT CId FROM Climbers WHERE Age < 40
MINUS
SELECT CId FROM Climbs WHERE RId = 1;
CID
212
214

14 Nested queries
We could also have written the previous queries as follows:
SELECT CId FROM Climbers
WHERE Age > 40 AND CId IN (SELECT CId FROM Climbs WHERE RId = 1);
SELECT CId FROM Climbers
WHERE Age < 40 AND CId NOT IN (SELECT CId FROM Climbs WHERE RId = 1);

15 Nested queries with correlation
SELECT CId FROM Climbers c
WHERE EXISTS (SELECT * FROM Climbs b WHERE c.CId = b.CId AND b.RId = 1);
SELECT CId FROM Climbers c
WHERE NOT EXISTS (SELECT * FROM Climbs b WHERE c.CId = b.CId);
SELECT CId FROM Climbers c
WHERE EXISTS UNIQUE (SELECT * FROM Climbs b WHERE c.CId = b.CId AND RId = 1);

16 More on set comparison ops
Besides IN, NOT IN, EXISTS, NOT EXISTS, UNIQUE and NOT UNIQUE we can also say: op ANY, op ALL, where op is any of <, >, =, <=, >=, <>.
What does the following mean in English?
SELECT CName, Age FROM Climbers
WHERE Age >= ALL (SELECT Age FROM Climbers)
CName  Age
Edmund 80

17 Set comparison ops, cont.
What does the following mean in English?
SELECT CName, Age FROM Climbers
WHERE Age > ANY (SELECT Age FROM Climbers WHERE CName = 'Arnold')
CName   Age
Edmund  80
Bridget 33
James   27

18 Using expressions for relation names
Consider the following query: "Find the names of climbers who have not climbed any route."
SELECT CName
FROM (SELECT CId FROM Climbers MINUS SELECT CId FROM Climbs) Temp, Climbers
WHERE Temp.CId = Climbers.CId;
CNAME
James

19 Products
SELECT * FROM Climbers, Climbs;
Note that the CID column name is duplicated in the output.
CID CNAME   SKILL AGE CID RID DAY       DURATION
123 Edmund  EXP   80  123 1   10-OCT-88 5
214 Arnold  BEG   25  123 1   10-OCT-88 5
313 Bridget EXP   33  123 1   10-OCT-88 5
212 James   MED   27  123 1   10-OCT-88 5
123 Edmund  EXP   80  123 3   08-NOV-87 1
214 Arnold  BEG   25  123 3   08-NOV-87 1
...

20 Conditional join
SELECT * FROM Climbers, Climbs WHERE Climbers.CId = Climbs.CId;
CID CNAME   SKILL AGE CID RID DAY       DURATION
123 Edmund  EXP   80  123 1   10-OCT-88 5
123 Edmund  EXP   80  123 3   08-NOV-87 1
313 Bridget EXP   33  313 1   08-DEC-89 5
214 Arnold  BEG   25  214 2   07-AUG-92 2
313 Bridget EXP   33  313 1   07-JUN-94 3

21 Example 1
The names of climbers who have climbed route 1.
SELECT CName FROM Climbers, Climbs
WHERE Climbers.CId = Climbs.CId AND RId = 1;
CNAME
Edmund
Bridget

22 Example 2
The names of climbers who have climbed the route named "Last Tango".
SELECT CName FROM Climbers, Climbs, Routes
WHERE Climbers.CId = Climbs.CId AND Routes.RId = Climbs.RId AND RName = 'Last Tango';
CNAME
Edmund
Bridget

23 Example 3
The IDs of climbers who have climbed the same route twice. Note the use of aliases for relations.
SELECT C1.CId FROM Climbs C1, Climbs C2
WHERE C1.CId = C2.CId AND C1.RId = C2.RId
AND (C1.Day <> C2.Day OR C1.Duration <> C2.Duration);
CID
313

24 Example 4
Recall: the names of climbers who have not climbed any route.
SELECT CName
FROM (SELECT CId FROM Climbers MINUS SELECT CId FROM Climbs) Temp, Climbers
WHERE Temp.CId = Climbers.CId;
CNAME
James

25 Example 4, cont.
A simpler alternative:
SELECT CName FROM Climbers WHERE CId NOT IN (SELECT CId FROM Climbs);
CNAME
James

26 Universal Quantification
The IDs of climbers who have climbed all routes.
SELECT CId FROM Climbs c1
WHERE NOT EXISTS (SELECT RId FROM Routes r      -- routes not climbed by c1
                  WHERE NOT EXISTS (SELECT * FROM Climbs c2
                                    WHERE c1.CId = c2.CId AND c2.RId = r.RId));
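A quick way to see the double NOT EXISTS at work is to run it against the sample data. The sketch below uses Python's sqlite3 module and only the CId/RId columns; the added DISTINCT and the two extra Climbs rows inserted for climber 123 are assumptions made for the demonstration, not part of the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Routes (RId INTEGER, RName TEXT);
    CREATE TABLE Climbs (CId INTEGER, RId INTEGER);
    INSERT INTO Routes VALUES (1,'Last Tango'),(2,'Garden Path'),(3,'The Sluice'),(4,'Picnic');
    INSERT INTO Climbs VALUES (123,1),(123,3),(313,1),(214,2),(313,1);
""")

# "There is no route that c1 has not climbed."  DISTINCT is added here so that
# each climber is reported once rather than once per Climbs row.
ALL_ROUTES = """
    SELECT DISTINCT CId FROM Climbs c1
    WHERE NOT EXISTS (SELECT RId FROM Routes r        -- routes not climbed by c1
                      WHERE NOT EXISTS (SELECT * FROM Climbs c2
                                        WHERE c1.CId = c2.CId AND c2.RId = r.RId))
"""

before = conn.execute(ALL_ROUTES).fetchall()
# Hypothetical extra climbs so that climber 123 has now climbed routes 1, 2, 3 and 4.
conn.executemany("INSERT INTO Climbs VALUES (?, ?)", [(123, 2), (123, 4)])
after = conn.execute(ALL_ROUTES).fetchall()
print(before, after)
```

On the original sample data nobody qualifies, because route 4 is never climbed; after the two extra rows only climber 123 does.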

27 Non-algebraic operations
SQL has a number of operations that cannot be expressed in relational algebra. The first is to express arithmetic in queries.
SELECT RName, Rating * Height AS Difficulty FROM Routes;
RNAME       DIFFICULTY
Last Tango  1200
Garden Path 120
The Sluice  480
Picnic      1200

28 Arithmetic, cont.
Arithmetic (and other expressions) cannot be used at the top level. E.g. 2+2 isn't an SQL query.
Question -- how would you get SQL to compute 2+2?

29 Counting
SELECT COUNT(RId) FROM Routes;
SELECT COUNT(Grade) FROM Routes;
Surprisingly, the answer to both of these is the following:
COUNT(GRADE)
4

30 Counting, cont.
To fix this, we use the keyword "DISTINCT":
SELECT COUNT(DISTINCT Grade) FROM Routes;
COUNT(GRADE)
3
Can also use SUM, AVG, MIN and MAX.

31 Group by
So far, these aggregate operators have been applied to all qualifying tuples. Sometimes we want to apply them to each of several groups of tuples. For example: "Print the number of routes in each grade."

32 Group by
SELECT Grade, COUNT(*) FROM Routes GROUP BY Grade;
GRADE COUNT(*)
I     2
II    1
III   1
Note that only the columns that appear in the GROUP BY clause and "aggregated" columns can appear in the output. So the following would generate an error:
SELECT Grade, RName, COUNT(*) FROM Routes GROUP BY Grade;

33 Group by … having
HAVING is to GROUP BY as WHERE is to FROM. "HAVING" is used to restrict the groups that appear in the result.
SELECT Height, AVG(Rating) FROM Routes GROUP BY Height HAVING Height < 300;
HEIGHT AVG(RATING)
60     5
100    12

34 Another example
SELECT Height, AVG(Rating) FROM Routes GROUP BY Height HAVING MAX(Rating) < 10;
HEIGHT AVG(RATING)
60     5
400    3
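The grouping queries can be reproduced with Python's sqlite3 module as a sanity check. This is a sketch under the assumption that SQLite's behaviour is close enough to the system used in the course; the ORDER BY clauses are additions that only make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Routes (RId INTEGER, RName TEXT, Grade TEXT, Rating INTEGER, Height INTEGER)")
conn.executemany("INSERT INTO Routes VALUES (?,?,?,?,?)", [
    (1, "Last Tango", "II", 12, 100),
    (2, "Garden Path", "I", 2, 60),
    (3, "The Sluice", "I", 8, 60),
    (4, "Picnic", "III", 3, 400),
])

# Number of routes in each grade.
per_grade = conn.execute(
    "SELECT Grade, COUNT(*) FROM Routes GROUP BY Grade ORDER BY Grade").fetchall()

# HAVING on a grouping column: keep only the groups with Height < 300.
low_routes = conn.execute(
    "SELECT Height, AVG(Rating) FROM Routes GROUP BY Height "
    "HAVING Height < 300 ORDER BY Height").fetchall()

# HAVING on an aggregate: keep only the groups whose hardest route is rated below 10.
easy_groups = conn.execute(
    "SELECT Height, AVG(Rating) FROM Routes GROUP BY Height "
    "HAVING MAX(Rating) < 10 ORDER BY Height").fetchall()

print(per_grade, low_routes, easy_groups)
```

SQLite happens to accept a non-grouped, non-aggregated column in the SELECT list, so the erroneous query from the Group by slide is not demonstrated here.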

35 Null Values
The value of an attribute can be unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse).
– SQL provides a special value null for such situations.
The presence of null complicates many issues. E.g.:
– Special operators are needed to check if a value is/is not null.
– Is rating>8 true or false when rating is equal to null? What about the AND, OR and NOT connectives? This requires a 3-valued logic (true, false and unknown).
– The meaning of constructs must be defined carefully (e.g., the WHERE clause eliminates rows that don't evaluate to true).
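The effect of the third truth value is easy to observe with Python's sqlite3 module. In this sketch the row named 'Unrated' is made up for the example, and names() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Routes (RName TEXT, Rating INTEGER)")
conn.executemany(
    "INSERT INTO Routes VALUES (?, ?)",
    [("Last Tango", 12), ("Garden Path", 2), ("Unrated", None)],  # 'Unrated' is a made-up row
)


def names(where):
    """Route names for which the WHERE condition evaluates to true (hypothetical helper)."""
    sql = f"SELECT RName FROM Routes WHERE {where} ORDER BY RName"
    return [row[0] for row in conn.execute(sql)]


high = names("Rating > 8")                    # true only for Last Tango
not_high = names("NOT (Rating > 8)")          # NOT unknown is still unknown
either = names("Rating > 8 OR Rating <= 8")   # unknown OR unknown is unknown
unknown = names("Rating IS NULL")             # the special operator for null
print(high, not_high, either, unknown)
```

The unrated route appears in neither the condition nor its negation, which is exactly the "rows that don't evaluate to true are eliminated" rule.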

36 Outer Join
A variant of the join that relies on null values:
SELECT Climbers.CId, Climbs.RId FROM Climbers NATURAL LEFT OUTER JOIN Climbs
Tuples of Climbers that do not match some tuple in Climbs would normally be excluded from the result; the "left" outer join preserves them with null values for the missing Climbs attributes.

37 Result of left outer join
CId CName   Skill Age RId  Date     Duration
123 Edmund  EXP   80  1    10/10/88 5
123 Edmund  EXP   80  3    11/08/87 1
214 Arnold  BEG   25  2    08/07/92 2
313 Bridget EXP   33  1    12/08/89 5
313 Bridget EXP   33  1    06/07/94 3
212 James   MED   27  null null     null
Null values can be disallowed in a query result by specifying NOT NULL.
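The preserved row can be checked with Python's sqlite3 module. This sketch writes the join condition explicitly with ON instead of NATURAL, and keeps only the columns needed; both choices are simplifications made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Climbers (CId INTEGER, CName TEXT);
    CREATE TABLE Climbs (CId INTEGER, RId INTEGER);
    INSERT INTO Climbers VALUES (123,'Edmund'),(214,'Arnold'),(313,'Bridget'),(212,'James');
    INSERT INTO Climbs VALUES (123,1),(123,3),(313,1),(214,2),(313,1);
""")

inner = conn.execute("""
    SELECT Climbers.CId, CName, RId
    FROM Climbers JOIN Climbs ON Climbers.CId = Climbs.CId
""").fetchall()

outer = conn.execute("""
    SELECT Climbers.CId, CName, RId
    FROM Climbers LEFT OUTER JOIN Climbs ON Climbers.CId = Climbs.CId
""").fetchall()

# Rows that only the outer join produces; null shows up as None in Python.
only_in_outer = [row for row in outer if row not in inner]
print(only_in_outer)
```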

38 Summary
SQL is "relationally complete": all of the operators of the relational algebra can be simulated. Additional features: string comparisons, set membership, arithmetic and grouping.

39 Views in SQL
A view is a query with a name that can be used in SELECT statements.
CREATE VIEW ExpClimbers AS SELECT CId, CName, Age FROM Climbers WHERE Skill = 'EXP';
SELECT CName FROM ExpClimbers WHERE Age < 50;
Note that ExpClimbers is not a stored relation!

40 Querying views
The system would perform the following translation:
SELECT CName FROM ExpClimbers WHERE Age < 50;
is translated to
SELECT CName FROM Climbers WHERE Skill = 'EXP' AND Age < 50;
This is done using the relational algebra "operator tree" representation of the query, and relational algebra equivalences.

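41 The "how" of translation
The operator tree for
SELECT CName FROM ExpClimbers WHERE Age < 50;
is expanded so that the leaf ExpClimbers is replaced by the view definition over Climbers.
[Figure: the operator tree before and after expansion, with leaves ExpClimbers and Climbers.]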

42 Changing the database
How do we initialize the database? How do we update and modify the database state? SQL supports an update language for insertions, deletions and modifications of tuples.
– INSERT INTO R(A1,…,An) VALUES (V1,…,Vn);
– DELETE FROM R WHERE <condition>;
– UPDATE R SET <assignments> WHERE <condition>;

43 Tuple insertion
Recall our rock climbing database, with the following instance of Routes:
RId RName       Grade Rating Height
1   Last Tango  II    12     100
2   Garden Path I     2      60
3   The Sluice  I     8      60
4   Picnic      III   3      400
To insert a new tuple into Routes:
INSERT INTO Routes(RId, RName, Grade, Rating, Height) VALUES (5, 'Desperation', 'III', 12, 600);

44 Tuple insertion, cont.
Alternatively, we could omit the attributes since the order given matches the DDL for Routes:
INSERT INTO Routes VALUES (5, 'Desperation', 'III', 12, 600);
RId RName       Grade Rating Height
1   Last Tango  II    12     100
2   Garden Path I     2      60
3   The Sluice  I     8      60
4   Picnic      III   3      400
5   Desperation III   12     600

45 Set insertion
Suppose we had the following relation and wanted to add all the routes with rating > 8:
HardClimbs:
Route      Rating FeetHigh
SlimyClimb 9      200
The Sluice 8      60
INSERT INTO HardClimbs(Route, Rating, FeetHigh)
SELECT DISTINCT RName, Rating, Height FROM Routes WHERE Rating > 8;
HardClimbs after the insertion:
Route      Rating FeetHigh
SlimyClimb 9      200
The Sluice 8      60
Last Tango 12     100

46 Deletion
Deletion is set-oriented: the only way to delete a single tuple is to specify its key. Suppose we wanted to get rid of all tuples in HardClimbs that are in Routes:
DELETE FROM HardClimbs WHERE Route IN (SELECT RName FROM Routes)
HardClimbs:
Route      Rating FeetHigh
SlimyClimb 9      200

47 Modifying tuples
Non-key values of a relation can be changed using UPDATE. Suppose we want to increase the age of all experienced climbers by 1:
UPDATE Climbers SET Age = Age+1 WHERE Skill = 'EXP';
NOTE: SQL uses an "old-value" semantics. New values are calculated using the old state, not a partially modified state.

48 Old-value semantics
"Give a $1000 raise to every employee who earns less than their manager."
Emp Manager Salary
1   3       20,000
2   3       21,500
3           21,000
Old-value semantics: employees 1 and 3 are given a raise.
Otherwise: employee 2 will get a raise if they are considered after employee 3 receives a raise!

49 Modifying views
Since the view definition is not stored, the view "changes" as the relations in the FROM clause change. We could also think of making changes to the view itself:
INSERT INTO ExpClimbers VALUES (7, 'Jean', 48);
Unfortunately, this particular view definition is not updatable!

50 Modifying views, cont.
INSERT INTO ExpClimbers VALUES (7, 'Jean', 48);
This would imply the following insertion, since we are not given a value for Skill:
INSERT INTO Climbers VALUES (7, 'Jean', NULL, 48);
If the view were computed after this update, the new tuple would not appear because 'EXP' = NULL does not evaluate to true!

51 An updatable view
The problem with ExpClimbers was the projection, which eliminated an attribute used to create the view.
CREATE VIEW OldClimbers AS SELECT * FROM Climbers WHERE Age > 40;

52 Deleting using views
We may also want to delete a tuple in the view:
DELETE FROM ExpClimbers WHERE CName = 'Jeremy';
would translate to
DELETE FROM Climbers WHERE CName = 'Jeremy';
What about views involving joins?
CREATE VIEW ClimbInfo AS
SELECT B.CId, B.CName, RId, Date, Duration
FROM Climbers B, Climbs C WHERE C.CId = B.CId

53 When is a view updatable?
For a view to be updatable:
– it must involve a single relation R
– the WHERE clause must not involve R in a subquery
– the SELECT clause must include enough attributes that the missing ones can be filled with null or default values.

54 Schema modification
Requirements change over time, so it is useful to be able to add/delete columns, drop tables and drop views:
– DROP TABLE Climbers;
– DROP VIEW ExpClimbers;
– ALTER TABLE Climbs ADD Weather CHAR(50);
– ALTER TABLE Routes DROP Grade;
Problem: must validate changes against legacy applications and code! Views can be useful here.

55 Summary
Views are useful for frequently executed queries and as a layer to shield applications from changes in the schema. SQL has an update language that allows set-oriented updates. Updates (insertions, deletions and modifications) change the database state.

56 Relational Calculus
First-order logic (FOL) can also be thought of as a query language, and can be used in two ways:
– Tuple relational calculus
– Domain relational calculus
The difference is the level at which variables are used: for attributes (domains) or for tuples. The calculus is non-procedural (declarative) as compared to the algebra.

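57 Domain relational calculus
Queries have the form $\{\langle x_1, x_2, \ldots, x_n\rangle \mid p\}$, where $x_1, x_2, \ldots, x_n$ are domain variables and $p$ is a predicate which may mention the variables $x_1, x_2, \ldots, x_n$.
Example, simple projection: $\{\langle RN, H\rangle \mid \exists RI, G, R.\ \langle RI, RN, G, R, H\rangle \in Routes\}$
Example, selection and projection: $\{\langle RN, H\rangle \mid \exists RI, G, R.\ \langle RI, RN, G, R, H\rangle \in Routes \land G > 5.5\}$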

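58 DRC examples, cont.
Join: $\{\langle CI, R\rangle \mid \exists RI, RN, G, H, RI', Da, Du.\ \langle RI, RN, G, R, H\rangle \in Routes \land \langle CI, RI', Da, Du\rangle \in Climbs \land RI = RI'\}$
We could also have written the above as: $\{\langle CI, R\rangle \mid \exists RI, RN, G, H, Da, Du.\ \langle RI, RN, G, R, H\rangle \in Routes \land \langle CI, RI, Da, Du\rangle \in Climbs\}$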

59 Predicate Logic - a quick review
The syntax of predicate logic starts with variables, constants and predicates that can be built using a collection of boolean-valued operators (boolean expressions).
Examples: $1 = 2$, $x \le y$, prime(x), contains(t, "Joe").
Precisely what operations are available depends on the domain and on the query language. For now we will assume the following boolean expressions:
– $\langle X_1, \ldots, X_n\rangle \in Rel$, X op Y, X op constant, or constant op X, where op is $<, >, =, \le, \ge, \ne$ and X, Y, … are domain variables

60 Predicate Logic, cont.
Starting with these basic predicates (also called atomic), we can build up new predicates by the following rules:
– Logical connectives: if p and q are predicates, then so are $p \land q$, $p \lor q$, $\lnot p$, and $p \rightarrow q$. Examples: $(x>2) \land (x<4)$, $(x>2) \lor \lnot(x>0)$
– Existential quantification: if p is a predicate, then so is $\exists x.\, p$. Example: $\exists x.\ (x>2) \land (x<4)$
– Universal quantification: if p is a predicate, then so is $\forall x.\, p$. Examples: $\forall x.\ x>2$, $\forall x.\ \exists y.\ y>x$

61 Logical Equivalences
There are two logical equivalences that will be heavily used:
– $p \rightarrow q \equiv \lnot p \lor q$ (Whenever p is true, q must also be true.)
– $\forall x.\ p(x) \equiv \lnot \exists x.\ \lnot p(x)$ (p is true for all x)
The second will be especially important when we study SQL.

62 Free and bound variables
A variable v is bound in a predicate p when p is of the form $\forall v \ldots$ or $\exists v \ldots$
A variable occurs free in p if it occurs in a position where it is not bound by an enclosing $\forall$ or $\exists$.
Examples:
– x is free in $x > 2$
– x is bound in $\exists x.\ x > y$
– x is free in $(x > 17) \land (\exists x.\ x > 2)$
Note that there are two occurrences of x in the last example.

63 Renaming variables
When a variable is bound, one can replace it with some other variable without altering the meaning of the expression, provided there are no name clashes.
Example: $\exists x.\ x > 2$ is equivalent to $\exists y.\ y > 2$

64 Some queries…
Try the following examples:
– The names and ages of climbers
– The names and ages of climbers who have climbed route 214
– The names of climbers who have climbed "Last Tango"
– The names of climbers who have climbed all routes with rating greater than 5.5
– The names of climbers who have climbed the same route twice

65 Safety
There is a problem with what we have done so far. How should we treat a query like:
$\{\langle I, N, S, A\rangle \mid \lnot(\langle I, N, S, A\rangle \in Climbers)\}$
This presumably means the set of all tuples (of the appropriate type) that are not climbers, which is presumably an infinite set. A query is safe if no matter how we instantiate the relations, it always produces a finite answer. Unfortunately, safety (a semantic condition) is undecidable. That is, there is no program which can look at the syntax of a query and decide if it is safe. A more restrictive syntactic condition (domain independence) can be used.

66 Translating from RA to DRC
Recall that the relational algebra consists of $\sigma, \pi, \cup, \times, -$. We need to work our way through the structure of an RA expression, translating each possible form. Let TR[e] be the translation of RA expression e into DRC.
Relation names: for the RA expression R, the DRC expression is $\{\langle x_1, \ldots, x_n\rangle \mid \langle x_1, \ldots, x_n\rangle \in R\}$

67 Selection
Suppose the RA expression is $\sigma_C(e')$, where $e'$ is another RA expression with $TR[e'] = \{\langle x_1, \ldots, x_n\rangle \mid p\}$. Then the translation of $\sigma_C(e')$ is $\{\langle x_1, \ldots, x_n\rangle \mid p \land C'\}$, where $C'$ is the condition obtained from $C$ by replacing each attribute with the corresponding variable.
Example: $TR[\sigma_{\#1=\#2 \land \#4>2.5}(R)]$ (where R has arity 4) is $\{\langle x_1, x_2, x_3, x_4\rangle \mid \langle x_1, x_2, x_3, x_4\rangle \in R \land x_1 = x_2 \land x_4 > 2.5\}$

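68 Projection
If $TR[e] = \{\langle x_1, \ldots, x_n\rangle \mid p\}$ then $TR[\pi_{i_1, i_2, \ldots, i_m}(e)] = \{\langle x_{i_1}, x_{i_2}, \ldots, x_{i_m}\rangle \mid \exists x_{j_1}, x_{j_2}, \ldots, x_{j_k}.\ p\}$, where $x_{j_1}, x_{j_2}, \ldots, x_{j_k}$ are the variables in $x_1, x_2, \ldots, x_n$ that are not in $x_{i_1}, x_{i_2}, \ldots, x_{i_m}$.
Example: with R as before, $TR[\pi_{\#1,\#3}(R)] = \{\langle x_1, x_3\rangle \mid \exists x_2, x_4.\ \langle x_1, x_2, x_3, x_4\rangle \in R\}$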

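69 Union
We know that R and S in $R \cup S$ must be union compatible, so they must have the same arity. Therefore we can assume that for $e_1 \cup e_2$, where $e_1, e_2$ are algebra expressions, $TR[e_1] = \{\langle x_1, \ldots, x_n\rangle \mid p\}$ and $TR[e_2] = \{\langle y_1, \ldots, y_n\rangle \mid q\}$. Relabel the variables in the second so that $TR[e_2] = \{\langle x_1, \ldots, x_n\rangle \mid q'\}$. This may involve relabeling bound variables in q to avoid clashes. Then $TR[e_1 \cup e_2] = \{\langle x_1, \ldots, x_n\rangle \mid p \lor q'\}$.
Example: $TR[R \cup S] = \{\langle x_1, \ldots, x_n\rangle \mid \langle x_1, \ldots, x_n\rangle \in R \lor \langle x_1, \ldots, x_n\rangle \in S\}$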

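70 Other binary operators
Difference: the same conditions hold as for union. So, after relabeling, $TR[e_1] = \{\langle x_1, \ldots, x_n\rangle \mid p\}$ and $TR[e_2] = \{\langle x_1, \ldots, x_n\rangle \mid q\}$. Then $TR[e_1 - e_2] = \{\langle x_1, \ldots, x_n\rangle \mid p \land \lnot q\}$
Product: if $TR[e_1] = \{\langle x_1, \ldots, x_n\rangle \mid p\}$ and $TR[e_2] = \{\langle y_1, \ldots, y_m\rangle \mid q\}$, then $TR[e_1 \times e_2] = \{\langle x_1, \ldots, x_n, y_1, \ldots, y_m\rangle \mid p \land q\}$
Example: $TR[R \times S] = \{\langle x_1, \ldots, x_n, y_1, \ldots, y_m\rangle \mid \langle x_1, \ldots, x_n\rangle \in R \land \langle y_1, \ldots, y_m\rangle \in S\}$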
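Python set comprehensions read almost like domain relational calculus, so the translations can be tried out on small relations. In this sketch the relations R and S and their contents are made up; and because Python must range over a finite set rather than over a whole domain, union and difference draw their candidate tuples from R | S, which is an assumption of the sketch and not part of the calculus.

```python
# Two made-up relations of arity 4.
R = {(1, 1, "a", 3.0), (2, 5, "b", 1.0), (4, 4, "c", 2.0)}
S = {(2, 5, "b", 1.0), (7, 7, "d", 9.0)}

# Selection sigma_{#1=#2 and #4>2.5}(R):
#   {<x1,x2,x3,x4> | <x1,x2,x3,x4> in R and x1 = x2 and x4 > 2.5}
selection = {(x1, x2, x3, x4) for (x1, x2, x3, x4) in R if x1 == x2 and x4 > 2.5}

# Projection pi_{#1,#3}(R):
#   {<x1,x3> | exists x2,x4. <x1,x2,x3,x4> in R}
projection = {(x1, x3) for (x1, _x2, x3, _x4) in R}

# Union and difference: p or q', and p and not q.
candidates = R | S   # finite stand-in for "all tuples of the right arity"
union = {t for t in candidates if t in R or t in S}
difference = {t for t in candidates if t in R and t not in S}

# Product: {<x1..xn, y1..ym> | <x1..xn> in R and <y1..ym> in S}
cross = {r + s for r in R for s in S}

print(selection, projection, len(union), difference, len(cross))
```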

71 Summary
We've seen how to translate relational algebra into (domain) relational calculus. There are various syntactic restrictions for guaranteeing the safety of a DRC query. From any of these we can translate back into relational algebra. It was this correspondence between an (implementable and optimizable) algebra and first-order logic that was responsible for the initial development of relational databases – a prime example of some theory leading to highly successful practical developments!

72 What we cannot compute with relational calculus/algebra
– Aggregate operations, e.g. "The number of climbers who have climbed 'Last Tango'" or "The average age of climbers." These are possible in SQL.
– Recursive queries. Given a relation Parent(Parent, Child) compute the ancestor relation. This appears to call for an arbitrary number of joins. It is known that it cannot be expressed in first-order logic, hence it cannot be expressed in relational algebra.

73 What we cannot compute with relational algebra, cont.
– Computing with complex structures that are not (1NF) relations, e.g. lists, arrays, multisets.
Of course, we can always compute such things if we can "talk to" a database from a full-blown (Turing complete) programming language, and we'll see how to do this later. However, communicating with a database in this way may well be inefficient, and adding computational power to a query language remains an important research topic.

74 Datalog
The general idea behind Datalog is to use Horn clauses ("if-then" rules) as a query language for relational databases. Relations are represented by predicates, e.g. Climbers, Climbs and Routes are interpreted as predicates with fixed arity. Arguments are interpreted positionally, e.g. Climbers(X, "Bridget", "EXP", "33"). The arguments can be constants (e.g. "Bridget", "EXP", and "33") or variables (e.g. X).
– We will use upper case for variables, lower case for constants.

75 Truth values
A predicate is ground if all of its arguments are constants. Ground predicates have truth values, which mirror whether or not the "tuple" is in the relation.
Climbers:
CId CName   Skill Age
123 edmund  exp   80
214 arnold  beg   25
313 bridget exp   33
212 james   med   27
Climbers(313,bridget,exp,33) is true. Climbers(518,jeremy,exp,17) is false.
Predicates can also be negated: NOT Climbers(518,jeremy,exp,17) is true.

76 "Arithmetic" Predicates
We will want to mirror selection conditions, and will use comparison predicates such as X < Y or A > 32 for this. A ground comparison has a truth value, i.e. 2 < 3 is true. Note that in contrast to "relational" predicates, arithmetic predicates are infinite!

77 Datalog Rules
A rule has the form p ← q, where p is a relational predicate called the head and q is a conjunction of predicates (subgoals) called the body.
Example:
EXPClimbers(I,N,A) ← Climbers(I,N,S,A) AND S=exp
Here EXPClimbers(I,N,A) is the head and Climbers(I,N,S,A) AND S=exp is the body.
When a variable is not used, it can be replaced by "_" (anonymous variable):
EXPClimbers(N) ← Climbers(_,N,exp,_)

78 Some examples…
The names of climbers older than 32:
OLD(N) ← Climbers(I,N,S,A) AND A>32
The names of climbers who have climbed route 1:
Route1(N) ← Climbers(I,N,_,_) AND Climbs(I,1,_,_)
The names of climbers with age less than 40 who have climbed a route with rating higher than 5:
Rating5(N) ← Climbers(I,N,_,A) AND Climbs(I,R,_,_) AND Routes(R,_,_,Ra,_) AND Ra>5 AND A<40
Note the positional interpretation of attributes!

79 Safety
A rule is safe if every variable occurs at least once in a positive relational predicate in the body.
Some unsafe rules:
Likes(X,Y) ← Starved(X)
Sedate(X) ← NOT Climbers(_,X,_,_)
Some safe rules (and all the ones we have seen so far):
Likes(X,Y) ← Starved(X) AND Food(Y)
Sedate(X) ← Person(X) AND NOT Climbers(_,X,_,_)
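The safety condition is purely syntactic, so it can be checked mechanically. The sketch below is one possible encoding, not part of the handout: Atom and is_safe are hypothetical names, variables are recognized by an upper-case first letter, "_" is ignored, and arithmetic subgoals are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str
    args: tuple            # strings: upper-case initial = variable, "_" = anonymous
    negated: bool = False


def variables(atom):
    """Named variables of an atom (constants and '_' are skipped)."""
    return {a for a in atom.args if a[:1].isupper()}


def is_safe(head, body):
    """True if every head or negated-subgoal variable occurs in a positive subgoal."""
    positive = set()
    for atom in body:
        if not atom.negated:
            positive |= variables(atom)
    needed = variables(head)
    for atom in body:
        if atom.negated:
            needed |= variables(atom)
    return needed <= positive


not_climber = Atom("Climbers", ("_", "X", "_", "_"), negated=True)

unsafe_1 = is_safe(Atom("Likes", ("X", "Y")), [Atom("Starved", ("X",))])
unsafe_2 = is_safe(Atom("Sedate", ("X",)), [not_climber])
safe_1 = is_safe(Atom("Likes", ("X", "Y")), [Atom("Starved", ("X",)), Atom("Food", ("Y",))])
safe_2 = is_safe(Atom("Sedate", ("X",)), [Atom("Person", ("X",)), not_climber])
print(unsafe_1, unsafe_2, safe_1, safe_2)
```

The first two calls correspond to the unsafe rules on the slide and the last two to the safe ones.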

80 Datalog Query
A query is a collection of one or more rules. A rule with an empty body is called a fact (positive ground relational predicate).
Student(123,j.smith,compsci)
Student(456,k.tappet,french)
Offers(cookery,baking)
Offers(compsci,compilers)
Enroll(123,baking)
Enroll(012,compilers)
InterestedIn(X,S) ← Student(X,Y,S)
InterestedIn(X,S) ← Enroll(X,Z) AND Offers(S,Z)

81 The query in relational algebra
The previous query corresponds to the following relational algebra expression:
$\pi_{\#1,\#3}(Student) \cup \pi_{\#1,\#3}(\sigma_{\#2=\#4}(Enroll \times Offers))$
What would you expect the output of the query to be?

82 Intensional versus Extensional Predicates
Extensional predicates are those whose relations are stored in the db; intensional predicates are those which are computed by applying one or more rules.
– Student, Offers, and Enroll are extensional
– InterestedIn is intensional
Extensional predicates can never appear in the head of a rule.

83 Another example
Parent(mary,jane)
Parent(jane,fred)
Parent(ed,bob)
Parent(bob,fred)
Parent(fred,jill)
Ancestor(X,Y) ← Parent(X,Y)
Ancestor(X,Y) ← Parent(X,Z) AND Ancestor(Z,Y)
Can this be translated to relational algebra? What do you expect the output of the query to be? Which are EDB predicates and which are IDB?

84 Meaning of Datalog rules
Consider every possible assignment of values to variables. For every such assignment which makes all the subgoals true, the tuple corresponding to the head is true and added to the result.
Example: with the facts Offers(compsci,compilers) and Enroll(012,compilers) and the rule
InterestedIn(X,S) ← Enroll(X,Z) AND Offers(S,Z)
the assignment X=012, Z=compilers, S=compsci makes both subgoals true. So InterestedIn(012,compsci) is added to the result.
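The "every possible assignment" reading can be executed literally for a small database. This sketch uses the Student, Offers and Enroll facts of the InterestedIn query and takes the set of all constants in those facts as the range of each variable, which is an assumption of the sketch; it is deliberately naive and tries 1000 assignments per rule.

```python
from itertools import product

# Facts from the InterestedIn query (constants kept as strings).
student = {("123", "j.smith", "compsci"), ("456", "k.tappet", "french")}
offers = {("cookery", "baking"), ("compsci", "compilers")}
enroll = {("123", "baking"), ("012", "compilers")}

# Every constant that occurs anywhere in the database.
domain = {c for rel in (student, offers, enroll) for row in rel for c in row}

interested_in = set()

# InterestedIn(X,S) <- Student(X,Y,S)
for x, y, s in product(domain, repeat=3):
    if (x, y, s) in student:
        interested_in.add((x, s))

# InterestedIn(X,S) <- Enroll(X,Z) AND Offers(S,Z)
for x, z, s in product(domain, repeat=3):
    if (x, z) in enroll and (s, z) in offers:
        interested_in.add((x, s))

print(sorted(interested_in))
```

Most of the assignments tried are meaningless, such as binding X to a course name, and that observation motivates looking only at assignments consistent with the stored tuples.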

85 Another way to define meaning...
The "assignment" method of defining meaning considers "meaningless" variable assignments. For example, for the rule
InterestedIn(X,S) ← Enroll(X,Z) AND Offers(S,Z)
it considers X=compilers, Z=012, S=f.dunham.
Another method is to consider the set of tuples in each non-negated relational subgoal, and look at "consistent" variable assignments. If all subgoals are true (negated as well as arithmetic), then the tuple in the head is added to the result. This will suggest an implementation using RA!

86 An interesting "incorrectly" written query
It is easy to write queries that do not express your intention. E.g.
Single(X) ← Person(X) AND NOT Married(X,Y)
What does this query mean in English? If the intention was to get all people who are not married, how should the query have been written? The query also isn't safe!

87 RA versus Datalog
The Ancestor example is called recursive because the definition of ancestor depends on itself (directly). This cannot be simulated in RA, and we will need to add a fixpoint operator to the algebra to simulate it. If subgoals are not allowed to be negated, we cannot emulate set difference in Datalog. However, if subgoals can be negated we can simulate any RA expression in Datalog.

88 Simulating RA in Datalog
Intersection: simulate by a rule with each relation as a subgoal. E.g. recall Climbers and Hikers from a previous lecture. $Climbers \cap Hikers$ would be written as
CandH(I,N,S,A) ← Climbers(I,N,S,A) AND Hikers(I,N,S,A)
Difference: simulate by a rule with each relation as a subgoal, with the second subgoal negated. So write $Climbers - Hikers$ as
CnotH(I,N,S,A) ← Climbers(I,N,S,A) AND NOT Hikers(I,N,S,A)

89 Simulating RA, contd.
Union: simulate by two rules, each of which has a body consisting of one of the relations as its sole subgoal. So $Climbers \cup Hikers$ would be written as
CorH(I,N,S,A) ← Climbers(I,N,S,A)
CorH(I,N,S,A) ← Hikers(I,N,S,A)
Projection: simulate by one rule, the head of which uses variables corresponding to the attributes being projected on. So $\pi_{CName,Age}(Climbers)$ would be written as
Result(N,A) ← Climbers(_,N,_,A)

90 Simulating Selection
If the condition is a conjunction of arithmetic atoms, this is easy: append each conjunct as a subgoal. So $\sigma_{CName=bridget \land Age>30}(Climbers)$ becomes
Result(I,N,S,A) ← Climbers(I,N,S,A) AND N=bridget AND A>30
OR can be simulated using union; recall the equivalence $\sigma_{C_1 \lor C_2}(R) = \sigma_{C_1}(R) \cup \sigma_{C_2}(R)$

91 Simulating Selection, contd.
Now, recall from logic that any expression involving AND, OR and NOT can be put into disjunctive normal form: an OR of conjuncts, each of which is the AND of comparisons.

92 Simulating Product and Join
The product of two relations is expressed by a single rule with both relations as subgoals; all the variables in the relations appear in the head:
R(A,B) ← S(A) AND T(B)
A join is just an equality selection and projection on a product; the equal terms can be expressed by reusing variables:
R(A,B,C) ← S(A,B) AND T(D,C) AND B=D
or
R(A,B,C) ← S(A,B) AND T(B,C)

93 Simulating multiple operation RA expressions
Create the "operator tree". Create an IDB predicate for each interior node, and write the corresponding rule. The IDB predicate corresponding to the root is the result.
[Figure: operator tree with leaves Climbers and Climbs.]

94 Datalog: Depends-On Graph
Nodes correspond to relational predicates; there is an edge from A to B if B appears as a subgoal (positive or negative) in a rule with head A. The edge is annotated with "-" for negated subgoals:
InterestedIn(X,S) ← Student(X,Y,S)
InterestedIn(X,S) ← Enroll(X,Z) AND Offers(S,Z)
[Figure: Depends-On graph with edges from InterestedIn to Student, Enroll and Offers.]

95 Cyclic Depends-On Graphs
A cyclic Depends-On graph indicates a recursive query. Recall the ancestor example:
Ancestor(X,Y) ← Parent(X,Y)
Ancestor(X,Y) ← Parent(X,Z) AND Ancestor(Z,Y)
[Figure: Depends-On graph with nodes Parent and Ancestor.]

96 Evaluating Datalog in RA
Assume for now non-recursive Datalog programs with no negated predicates. Evaluation of IDB predicates proceeds "bottom-up" from the leaves of the Depends-On graph so that subgoals are completely evaluated by the time they are used. (Note that the EDB predicates must be leaves in this graph!)

97 Evaluating Datalog in RA, cont.
Procedure to evaluate one rule (no negation):
– Take the product of the relational (non-arithmetic) subgoals
– Form a selection of the product with a condition that equates positions with the same variable and captures all arithmetic predicates
– Project over variables appearing in the head
For each IDB predicate R, take the union of the expressions of rules with head R.
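The procedure for a single rule can be sketched in a few lines of Python. The helper eval_rule and its calling convention are assumptions made for this example; it handles only variables and "_" in subgoals, so constants have to be expressed through the condition argument.

```python
from itertools import product


def eval_rule(head_vars, subgoals, condition=lambda env: True):
    """Evaluate one negation-free rule (hypothetical helper).

    subgoals is a list of (relation, variable_names); '_' marks an unused position.
    """
    results = set()
    for combo in product(*(rel for rel, _ in subgoals)):   # product of the subgoals
        env, consistent = {}, True
        for row, (_, names) in zip(combo, subgoals):
            for name, value in zip(names, row):
                if name == "_":
                    continue
                if env.setdefault(name, value) != value:   # same variable, same value
                    consistent = False
        if consistent and condition(env):                   # selection
            results.add(tuple(env[v] for v in head_vars))   # projection onto the head
    return results


climbers = {(123, "edmund", "exp", 80), (214, "arnold", "beg", 25),
            (313, "bridget", "exp", 33), (212, "james", "med", 27)}
student = {("123", "j.smith", "compsci"), ("456", "k.tappet", "french")}
offers = {("cookery", "baking"), ("compsci", "compilers")}
enroll = {("123", "baking"), ("012", "compilers")}

# OLD(N) <- Climbers(I,N,S,A) AND A>32
old = eval_rule(["N"], [(climbers, ["I", "N", "S", "A"])], lambda env: env["A"] > 32)

# InterestedIn is the union of the results of its two rules.
interested_in = (
    eval_rule(["X", "S"], [(student, ["X", "Y", "S"])])
    | eval_rule(["X", "S"], [(enroll, ["X", "Z"]), (offers, ["S", "Z"])])
)
print(old, interested_in)
```

Reusing the variable Z in both subgoals of the second rule is what turns the product into a join.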

98 Examples
Result(N) ← Climbers(I,N,_,A) AND A [op] 5
InterestedIn(X,S) ← Student(X,Y,S)
InterestedIn(X,S) ← Enroll(X,Z) AND Offers(Z,S)
NotSingle(X) ← Married(X,Y)
NotSingle(X) ← Married(Y,X)
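The product, select, project procedure can be tried on the NotSingle and InterestedIn rules above. This is a minimal sketch: the rule encoding, the function names `eval_rule` and `eval_predicate`, the `Aged` relation, and all sample tuples are assumptions made for illustration.

```python
from itertools import product

def eval_rule(head_vars, subgoals, db, conditions=()):
    """Evaluate one negation-free rule.
    subgoals: list of (relation_name, variable_tuple); "_" is an anonymous variable.
    conditions: functions from a variable binding (dict) to bool (arithmetic predicates)."""
    result = set()
    relations = [db[name] for name, _ in subgoals]
    for combo in product(*relations):                  # step 1: product of the subgoals
        binding, ok = {}, True
        for (_, variables), tup in zip(subgoals, combo):
            for var, val in zip(variables, tup):
                if var == "_":
                    continue
                if binding.setdefault(var, val) != val:  # step 2: same variable, same value
                    ok = False
        if ok and all(cond(binding) for cond in conditions):
            result.add(tuple(binding[v] for v in head_vars))  # step 3: project on the head
    return result

def eval_predicate(rules, db):
    """Union of the results of all rules sharing one head predicate."""
    out = set()
    for head_vars, subgoals, conditions in rules:
        out |= eval_rule(head_vars, subgoals, db, conditions)
    return out

db = {  # hypothetical EDB instance
    "Married": {("ann", "bob")},
    "Student": {("s1", "Sue", "db")},
    "Enroll": {("s2", "c1")},
    "Offers": {("c1", "ai")},
    "Aged": {("ann", 30), ("tim", 9)},
}
not_single = eval_predicate(
    [(("X",), [("Married", ("X", "Y"))], ()),
     (("X",), [("Married", ("Y", "X"))], ())], db)
interested = eval_predicate(
    [(("X", "S"), [("Student", ("X", "Y", "S"))], ()),
     (("X", "S"), [("Enroll", ("X", "Z")), ("Offers", ("Z", "S"))], ())], db)
# Hypothetical rule with an arithmetic predicate: Adult(N) <- Aged(N,A) AND A >= 18
adults = eval_rule(("N",), [("Aged", ("N", "A"))], db, [lambda b: b["A"] >= 18])
print(not_single, interested, adults)
```

Like the conceptual evaluation strategy for SQL, this enumerates the full product and is meant to show the semantics, not an efficient plan.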

99 Handling negated subgoals
Suppose NOT R(X,Y,Z) appears as a negated subgoal in some query. Let DOM = {any symbol that appears in the rule} ∪ {any symbol that appears in any relational instance appearing in a subgoal of the rule}. It is sufficient to “evaluate” NOT R(X,Y,Z) as (DOM × DOM × DOM) − R. However, there are problems when this is combined with recursion! (More later…)
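A small sketch of this construction, assuming relations are sets of tuples. The helper names `active_domain` and `negate` and the sample instance of R are hypothetical.

```python
from itertools import product

def active_domain(rule_constants, *relations):
    """DOM: constants of the rule plus every symbol in the given relation instances."""
    dom = set(rule_constants)
    for rel in relations:
        for tup in rel:
            dom.update(tup)
    return dom

def negate(rel, dom, arity):
    """Evaluate NOT R as (DOM x ... x DOM) - R for a relation of the given arity."""
    return set(product(dom, repeat=arity)) - rel

R = {(0, 0, 0), (0, 1, 1)}   # made-up instance of R(X,Y,Z)
DOM = active_domain([], R)   # the rule has no constants in this example
not_R = negate(R, DOM, 3)
print(len(not_R))
```

The complement has |DOM|^3 minus |R| tuples, which is why a real evaluator would normally push the negation into a difference against the positive subgoals instead of materializing it.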

100 Example
NotSingle(X) ← Married(X,Y)
NotSingle(X) ← Married(Y,X)
Single(X) ← Person(X) AND NOT NotSingle(X)
How do we evaluate this program? Let DOM = {all symbols in Person} ∪ {all symbols in NotSingle}. The correct version of the example finding all single people in the database is:

101 Recursive queries (no negation): “Naïve” evaluation
Let R, S, ... be IDB predicates occurring in a single cycle in the Depends-On graph.
R = ∅, S = ∅
while there is a change to R, S, ... do
    R = R ∪ {evaluation of R}
    S = S ∪ {evaluation of S}

102 Example
Facts: Parent(mary,jane), Parent(jane,fred), Parent(ed,bob), Parent(bob,fred), Parent(fred,jill)
Rules:
Ancestor(X,Y) ← Parent(X,Y)
Ancestor(X,Y) ← Parent(X,Z) AND Ancestor(Z,Y)
Evaluation of Ancestor:
1. Ancestor = ∅
2. Ancestor = {(mary,jane), (jane,fred), (ed,bob), (bob,fred), (fred,jill)}

103 Example, cont.
3. Ancestor = {(mary,jane), (jane,fred), (ed,bob), (bob,fred), (fred,jill), (mary,fred), (jane,jill), (ed,fred), (bob,jill)}
4. Ancestor = {(mary,jane), (jane,fred), (ed,bob), (bob,fred), (fred,jill), (mary,fred), (jane,jill), (ed,fred), (bob,jill), (mary,jill), (ed,jill)}
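The naïve evaluation of the Ancestor example can be reproduced with a short loop. This is a sketch under the assumption that relations are sets of pairs; the function name `naive_ancestor` and the `history` list are introduced here only to expose the intermediate steps.

```python
def naive_ancestor(parent):
    """Naive bottom-up evaluation of the two Ancestor rules until nothing changes."""
    ancestor = set()
    history = [set(ancestor)]          # snapshot after each step, starting from the empty set
    while True:
        new = set(ancestor)
        # Ancestor(X,Y) <- Parent(X,Y)
        new |= parent
        # Ancestor(X,Y) <- Parent(X,Z) AND Ancestor(Z,Y)
        new |= {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if new == ancestor:            # no change: fixpoint reached
            return ancestor, history
        ancestor = new
        history.append(set(ancestor))

parent = {("mary", "jane"), ("jane", "fred"), ("ed", "bob"),
          ("bob", "fred"), ("fred", "jill")}
ancestor, history = naive_ancestor(parent)
print([len(step) for step in history])
```

The snapshots correspond to steps 1 to 4 of the example: the relation grows from 0 to 5, 9 and finally 11 tuples, and one further pass confirms that nothing changes.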

104 Negation in Recursive Rules
Problem: what is the semantics? For example, suppose an EDB of R(0):
P(X) ← R(X) AND NOT Q(X)
Q(X) ← R(X) AND NOT P(X)
There are two “correct” answers: {R(0), P(0)} and {R(0), Q(0)}. Both are minimal in the sense that we cannot throw out anything and still get a correct answer (i.e. {R(0)} is inconsistent).
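The claim about two minimal answers can be checked by brute force on this tiny program. The sketch below assumes a "correct answer" means a set of facts in which every rule whose body holds also has its head; the names `is_model`, `candidates`, `models` and `minimal` are illustrative.

```python
from itertools import product

def is_model(P, Q, R):
    """Check both rules against the candidate fact sets P and Q for a fixed EDB R."""
    rule1 = all(x in P for x in R if x not in Q)   # P(X) <- R(X) AND NOT Q(X)
    rule2 = all(x in Q for x in R if x not in P)   # Q(X) <- R(X) AND NOT P(X)
    return rule1 and rule2

R = {0}
# All four combinations of P and Q over the single symbol 0.
candidates = [(set(p), set(q)) for p, q in product([(), (0,)], repeat=2)]
models = [(P, Q) for P, Q in candidates if is_model(P, Q, R)]
# A model is minimal if no other model is contained in it.
minimal = [(P, Q) for P, Q in models
           if not any(P2 <= P and Q2 <= Q and (P2, Q2) != (P, Q)
                      for P2, Q2 in models)]
print(minimal)
```

With P and Q both empty, rule 1 requires P(0), so {R(0)} alone is rejected; the two surviving minimal candidates are exactly the two answers named on the slide.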

105 Stratified Negation: an overview
A technique for assigning a single meaning to certain safe Datalog programs with negation. Works by dividing predicates up into “strata”, which are linearly ordered. Each stratum must be completely evaluated before the next stratum is evaluated. It would not be able to handle the previous example, as that program has a cycle of “negative” arcs in the Depends-On graph.

106 Stratified Negation: the Details
A program is stratified iff whenever there is a rule with head p in which q occurs as a negated subgoal, there is no path from q to p in the Depends-On graph (that is, no cycle passes through a negated edge).
NotSingle(X) ← Married(X,Y)
NotSingle(X) ← Married(Y,X)
Single(X) ← Person(X) AND NOT NotSingle(X)
(Figure: Depends-On graph with edges from NotSingle to Married, from Single to Person, and a negated edge (¬) from Single to NotSingle.)

107 Strata-labeling Algorithm
for each predicate p do stratum(p) = 1
repeat until no changes to any stratum or some stratum exceeds the number of predicates
    for each rule r with head p do begin
        for each negated subgoal of r with predicate q do
            stratum(p) := max(stratum(p), 1 + stratum(q))
        for each positive subgoal of r with predicate q do
            stratum(p) := max(stratum(p), stratum(q))
    end for
end repeat
Result for the Single program:
Stratum(1): Person, Married, NotSingle
Stratum(2): Single
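The labeling algorithm translates almost line for line into Python. In this sketch the rule encoding (head name plus `(predicate, negated)` pairs) and the choice to return `None` for a program that cannot be stratified are assumptions of the example.

```python
def stratify(rules):
    """Assign a stratum to every predicate, or return None if a stratum
    grows past the number of predicates (the program is not stratified)."""
    preds = {head for head, _ in rules} | {q for _, body in rules for q, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for q, negated in body:
                # Negated subgoals must sit in a strictly lower stratum.
                needed = stratum[q] + 1 if negated else stratum[q]
                if needed > stratum[head]:
                    stratum[head] = needed
                    changed = True
                    if needed > len(preds):
                        return None
    return stratum

single_program = [
    ("NotSingle", [("Married", False)]),
    ("NotSingle", [("Married", False)]),
    ("Single", [("Person", False), ("NotSingle", True)]),
]
# The earlier P/Q program, which has a cycle through negated edges.
pq_program = [
    ("P", [("R", False), ("Q", True)]),
    ("Q", [("R", False), ("P", True)]),
]
print(stratify(single_program))
print(stratify(pq_program))
```

For the Single program this reproduces the labeling on the slide; for the P/Q program the strata keep pushing each other up until the bound is exceeded.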

108 Summary: RA versus Datalog
If subgoals can be negated, we can simulate any RA expression in Datalog. (Without negated subgoals we cannot handle “−”, set difference.) To simulate “recursive” Datalog queries, we must add the fixpoint operator to the relational algebra. Datalog with negated subgoals AND recursion may yield programs with more than one “minimal model”; stratified negation is a technique for evaluating some of these programs.

