Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 16, 2004 Some slide content.


1 Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 16, 2004 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Administrivia
• Homework 1 handed out
• Will be due in 1 week, unless otherwise announced
• Focus: relational algebra and calculus

3 Example Data Instance
STUDENT:
  sid  name
  1    Jill
  2    Qun
  3    Nitin
PROFESSOR:
  fid  name
  1    Ives
  2    Saul
  8    Roth
Takes:
  sid  exp-grade  cid
  1    A          550-0103
  1    A          700-1003
  3    C          500-0103
COURSE:
  cid       subj  sem
  550-0103  DB    F03
  700-1003  AI    S03
  501-0103  Arch  F03
Teaches:
  fid  cid
  1    550-0103
  2    700-1003
  8    501-0103

4 Codd’s Relational Algebra
• A set of mathematical operators that compose, modify, and combine tuples within different relations
• Relational algebra operations operate on relations and produce relations (“closure”):
  f: Relation → Relation
  f: Relation × Relation → Relation

5 A Set of Logical Operations: The Relational Algebra
• Six basic operations:
  • Projection: π_α(R)
  • Selection: σ_θ(R)
  • Union: R1 ∪ R2
  • Difference: R1 – R2
  • Product: R1 × R2
  • (Rename): ρ_{α→β}(R)
• And some other useful ones:
  • Join: R1 ⋈_θ R2
  • Semijoin: R1 ⊲_θ R2
  • Intersection: R1 ∩ R2
  • Division: R1 ÷ R2

6 Data Instance for Operator Examples
STUDENT:
  sid  name
  1    Jill
  2    Qun
  3    Nitin
  4    Marty
PROFESSOR:
  fid  name
  1    Ives
  2    Saul
  8    Roth
Takes:
  sid  exp-grade  cid
  1    A          550-0103
  1    A          700-1003
  3    A          (cid missing)
  3    C          500-0103
  4    C          (cid missing)
COURSE:
  cid       subj  sem
  550-0103  DB    F03
  700-1003  AI    S03
  501-0103  Arch  F03
Teaches:
  fid  cid
  1    550-0103
  2    700-1003
  8    501-0103

7 Projection, π_α

8 Selection, σ_θ

9 Product, ×

10 Join, ⋈_θ: A Combination of Product and Selection

11 Union, ∪

12 Difference, –
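The operator examples on slides 7 through 12 were images and did not survive in this transcript. As a stand-in, here is a minimal Python sketch of the basic operators, assuming relations are modelled as sets of tuples with attributes addressed by position (so rename is not needed). The helper names (`project`, `select`, and so on) and the sample queries are illustrative choices, not taken from the slides; the data is the instance from slide 3.

```python
# Relations are modelled as sets of tuples; attributes are addressed by position.
TAKES = {(1, "A", "550-0103"), (1, "A", "700-1003"), (3, "C", "500-0103")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"),
          ("501-0103", "Arch", "F03")}
TEACHES = {(1, "550-0103"), (2, "700-1003"), (8, "501-0103")}


def project(rel, *cols):
    """Projection: keep the listed column positions (a set drops duplicates)."""
    return {tuple(row[c] for c in cols) for row in rel}


def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {row for row in rel if pred(row)}


def product(r1, r2):
    """Product: concatenate every row of r1 with every row of r2."""
    return {a + b for a in r1 for b in r2}


def join(r1, r2, pred):
    """Theta-join: a selection applied to a product."""
    return select(product(r1, r2), pred)


def union(r1, r2):
    return r1 | r2


def difference(r1, r2):
    return r1 - r2


# Selection: Takes rows with an expected grade of "A".
grade_a = select(TAKES, lambda row: row[1] == "A")

# Join then projection: (fid, subj) pairs, matching Teaches.cid to COURSE.cid.
# Joined rows have the layout (fid, cid, cid, subj, sem).
taught_subjects = project(join(TEACHES, COURSE, lambda row: row[1] == row[2]), 0, 3)

# Union and difference over single-column relations of cids.
all_cids = union(project(TAKES, 2), project(COURSE, 0))
untaken = difference(project(COURSE, 0), project(TAKES, 2))
```

Every helper takes relations and returns a relation, which is the closure property from slide 4: results can be fed straight into further operators.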

13 Rename, ρ_{α→β}
• The rename operator can be expressed several ways:
  • The book has a very odd definition that’s not algebraic
  • An alternate definition: ρ_{α→β}(x)
    • Takes the relation x with schema α
    • Returns a relation with the attribute list β
• Rename isn’t all that useful, except if you join a relation with itself
Why would it be useful here?

14 Mini-Quiz
• This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations. Try writing queries for these:
  • The names of students named “Bob”
  • The names of students expecting an “A”
  • The names of students in Amir Roth’s 501 class
  • The sids and names of students not enrolled

15 Deriving Intersection
Intersection: as with set operations, derivable from difference
(Venn diagram of A and B, with the regions A – B and B – A labeled)
A ∩ B ≡ (A ∪ B) – (A – B) – (B – A) ≡ A – (A – B)
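A quick sanity check of the derivation, as a sketch using Python's built-in sets. It enumerates every pair of subsets of a small, arbitrarily chosen universe and compares two difference-based expressions for intersection, (A ∪ B) – (A – B) – (B – A) and A – (A – B), against the built-in `&`.

```python
from itertools import combinations

UNIVERSE = [1, 2, 3, 4]  # arbitrary small universe, chosen for illustration

# All 16 subsets of the universe.
subsets = [set(c) for n in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, n)]


def via_union(a, b):
    """(A ∪ B) – (A – B) – (B – A)"""
    return ((a | b) - (a - b)) - (b - a)


def via_difference(a, b):
    """A – (A – B), using difference only"""
    return a - (a - b)


both_hold = all(
    via_union(a, b) == via_difference(a, b) == (a & b)
    for a in subsets
    for b in subsets
)
```

An exhaustive check over a four-element universe is not a proof, but it would catch a wrong identity: for example, (A – B) – (B – A) simplifies to A – B, not to the intersection.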

16 Division
• A somewhat messy operation that can be expressed in terms of the operations we have already defined
• Used to express queries such as “The fids of faculty who have taught all subjects”
• Paraphrased: “The fids of professors for which there does not exist a subject that they haven’t taught”

17 Division Using Our Existing Operators
• All possible teaching assignments:
  Allpairs: π_{fid,subj}(PROFESSOR × π_subj(COURSE))
• NotTaught, all (fid,subj) pairs for which professor fid has not taught subj:
  NotTaught: Allpairs – π_{fid,subj}(Teaches ⋈ COURSE)
• Answer is all faculty not in NotTaught:
  π_fid(PROFESSOR) – π_fid(NotTaught)
  ≡ π_fid(PROFESSOR) – π_fid(π_{fid,subj}(PROFESSOR × π_subj(COURSE)) – π_{fid,subj}(Teaches ⋈ COURSE))

18 Division: R1 ÷ R2
• Requirement: schema(R1) ⊃ schema(R2)
• Result schema: schema(R1) – schema(R2)
• “Professors who have taught all courses”:
  π_fid(π_{fid,subj}(Teaches ⋈ COURSE) ÷ π_subj(COURSE))
• What about “Courses that have been taught by all faculty”?
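The two formulations of division can be compared side by side. The sketch below assumes relations are sets of tuples and uses the slide 3 instance plus two hypothetical extra Teaches rows (marked in the code) so that one professor has actually taught every subject; without them the answer is empty. The `divide` helper is an illustrative definition for a binary R1 and unary R2 only.

```python
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"),
          ("501-0103", "Arch", "F03")}
TEACHES = {
    (1, "550-0103"), (2, "700-1003"), (8, "501-0103"),
    (1, "700-1003"), (1, "501-0103"),  # hypothetical rows, not in the slides
}

fids = {(fid,) for fid, _ in PROFESSOR}           # projection of PROFESSOR on fid
subjects = {(subj,) for _, subj, _ in COURSE}     # projection of COURSE on subj

# Projection on (fid, subj) of Teaches joined with COURSE on cid.
taught = {(fid, subj)
          for fid, cid in TEACHES
          for cid2, subj, _ in COURSE
          if cid == cid2}

# Step 1: Allpairs, every (fid, subj) combination.
all_pairs = {f + s for f in fids for s in subjects}
# Step 2: NotTaught, the combinations that never occurred.
not_taught = all_pairs - taught
# Step 3: professors with no entry in NotTaught.
answer = fids - {(fid,) for fid, _ in not_taught}


def divide(r1, r2):
    """R1 ÷ R2 for binary R1(a, b) and unary R2(b): the a values paired with every b."""
    return {(a,) for a, _ in r1 if all((a, b) in r1 for (b,) in r2)}


direct = divide(taught, subjects)
```

With the extra rows, both `answer` and `direct` contain only professor 1. Dropping the two hypothetical rows makes both empty, since on the original instance each professor has taught exactly one subject.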

19 The Big Picture: SQL to Algebra to Query Plan to Web Page
SQL query:
  SELECT *
  FROM STUDENT, Takes, COURSE
  WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
Query plan (an operator tree): Merge and Hash-by-cid operators over the leaves STUDENT, Takes, and COURSE
(Diagram components: Web Server / UI / etc., Optimizer, Execution Engine, Storage Subsystem)
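To see the query from this slide run end to end, here is a small sketch using Python's standard `sqlite3` module with an in-memory database loaded with the slide 3 instance. Two things are assumptions of the sketch rather than part of the slides: the column `exp-grade` is spelled `exp_grade` so it is a plain SQL identifier, and the unqualified `cid` is written as `COURSE.cid` to avoid an ambiguous column reference. SQLite's plan will not necessarily match the merge and hash operators drawn on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name TEXT);
CREATE TABLE Takes (sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE COURSE (cid TEXT, subj TEXT, sem TEXT);
INSERT INTO STUDENT VALUES (1, 'Jill'), (2, 'Qun'), (3, 'Nitin');
INSERT INTO Takes VALUES (1, 'A', '550-0103'), (1, 'A', '700-1003'),
                         (3, 'C', '500-0103');
INSERT INTO COURSE VALUES ('550-0103', 'DB', 'F03'), ('700-1003', 'AI', 'S03'),
                          ('501-0103', 'Arch', 'F03');
""")

QUERY = """
SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
"""

# Each result row is a STUDENT row, a Takes row, and a COURSE row side by side.
rows = conn.execute(QUERY).fetchall()

# Ask SQLite how it intends to evaluate the query; the output format is
# engine-specific and is only fetched here, not interpreted.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
conn.close()
```

On this instance the Takes row for `500-0103` finds no partner in COURSE, so only Jill's two enrollments appear in `rows`.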

20 Hint of Future Things: Optimization Is Based on Algebraic Equivalences
• Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics
• They may be different in cost of evaluation!
  σ_{c ∨ d}(R) ≡ σ_c(R) ∪ σ_d(R)
  σ_c(R1 × R2) ≡ R1 ⋈_c R2
  σ_{c ∧ d}(R) ≡ σ_c(σ_d(R))
• Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
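These equivalences can be checked on small relations. The sketch below assumes relations are sets of tuples; the predicates `c` and `d` and the relation `R` are arbitrary illustrations. For the product/join law it compares a selection over a full product with a hash-based equi-join, which produces the same rows without materialising the product. That is the kind of cost difference an optimizer exploits.

```python
def select(rel, pred):
    """Selection over a set of tuples."""
    return {row for row in rel if pred(row)}


def product(r1, r2):
    """Cartesian product: every pairing of rows."""
    return {a + b for a in r1 for b in r2}


def hash_join(r1, r2, i, j):
    """Equi-join on r1 column i = r2 column j, using a hash index on r2."""
    index = {}
    for b in r2:
        index.setdefault(b[j], []).append(b)
    return {a + b for a in r1 for b in index.get(a[i], [])}


def c(row):
    return row[0] > 1


def d(row):
    return row[1] < 2


R = {(x, y) for x in range(4) for y in range(4)}

# Selection on a disjunction equals the union of the two selections.
disjunction_ok = select(R, lambda r: c(r) or d(r)) == select(R, c) | select(R, d)

# Selection on a conjunction equals cascaded selections.
conjunction_ok = select(R, lambda r: c(r) and d(r)) == select(select(R, d), c)

# Selection over a product equals the join with the same condition.
TEACHES = {(1, "550-0103"), (2, "700-1003"), (8, "501-0103")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"),
          ("501-0103", "Arch", "F03")}
via_product = select(product(TEACHES, COURSE), lambda r: r[1] == r[2])
via_hash = hash_join(TEACHES, COURSE, 1, 0)
join_ok = via_product == via_hash
```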

21 Switching Gears: An Equivalent, But Very Different, Formalism
• Codd invented a relational calculus that he proved was equivalent in expressiveness
• Based on a subset of first-order logic – declarative, without an implicit order of evaluation
  • Tuple relational calculus
  • Domain relational calculus
• More convenient for describing certain things, and for certain kinds of manipulations
• The database uses the relational algebra internally
• But query languages (e.g., SQL) are mostly based on the relational calculus

22 Domain Relational Calculus
Queries have form: {<x1,x2,…,xn> | p}, where x1,x2,…,xn are domain variables and p is a predicate
Predicate: boolean expression over x1,x2,…,xn
• Precise operations depend on the domain and query language – may include special functions, etc.
• Assume the following at minimum:
  <xi,xj,…> ∈ R    X op Y    X op const    const op X
  where op is =, ≠, <, >, ≤, ≥ and xi,xj,… are domain variables

23 More Complex Predicates
Starting with these atomic predicates, build up new predicates by the following rules:
• Logical connectives: If p and q are predicates, then so are p ∧ q, p ∨ q, ¬p, and p ⇒ q
  • (x>2) ∧ (x<4)
  • (x>2) ∧ ¬(x>0)
• Existential quantification: If p is a predicate, then so is ∃x.p
  • ∃x. (x>2) ∧ (x<4)
• Universal quantification: If p is a predicate, then so is ∀x.p
  • ∀x. x>2
  • ∀x. ∃y. y>x

24 Some Examples
• Faculty ids
• Subjects for courses with students expecting a “C”
• All course numbers for which there exists a smaller course number
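The formulas for these three examples did not survive in this transcript. One possible reading of each is sketched below: a DRC expression in a comment, followed by a Python set comprehension that mirrors it over the slide 3 instance. The DRC formulations are reconstructions rather than the slide's own answers, and course numbers are compared as strings, which is an assumption of this sketch.

```python
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}
TAKES = {(1, "A", "550-0103"), (1, "A", "700-1003"), (3, "C", "500-0103")}
COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"),
          ("501-0103", "Arch", "F03")}

# Faculty ids:
#   {<fid> | ∃name. <fid,name> ∈ PROFESSOR}
faculty_ids = {fid for (fid, name) in PROFESSOR}

# Subjects for courses with students expecting a "C":
#   {<subj> | ∃cid,sem,sid. <cid,subj,sem> ∈ COURSE ∧ <sid,"C",cid> ∈ Takes}
subjects_with_c = {
    subj
    for (cid, subj, sem) in COURSE
    for (sid, grade, taken_cid) in TAKES
    if taken_cid == cid and grade == "C"
}

# Course numbers for which a smaller course number exists:
#   {<cid> | ∃subj,sem. <cid,subj,sem> ∈ COURSE
#            ∧ ∃cid2,subj2,sem2. (<cid2,subj2,sem2> ∈ COURSE ∧ cid2 < cid)}
has_smaller = {
    cid
    for (cid, subj, sem) in COURSE
    if any(cid2 < cid for (cid2, subj2, sem2) in COURSE)
}
```

On the slide 3 instance the second query comes back empty: the only “C” row in Takes refers to `500-0103`, while COURSE lists `501-0103`. Existentially quantified variables become the loop variables that do not appear in the result.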

25 Logical Equivalences
• There are two logical equivalences that will be heavily used:
  • p ⇒ q ≡ ¬p ∨ q (Whenever p is true, q must also be true.)
  • ∀x. p(x) ≡ ¬∃x. ¬p(x) (p is true for all x)
• The second can be a lot easier to check!
• Example:
  • The highest course number offered
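Both equivalences can be checked mechanically on finite inputs. The sketch below verifies the implication rule against its truth table, then writes “the highest course number offered” twice over the slide 3 COURSE relation: once with a universal quantifier and once in the ¬∃ form. Comparing course numbers as strings is an assumption of the sketch.

```python
from itertools import product

COURSE = {("550-0103", "DB", "F03"), ("700-1003", "AI", "S03"),
          ("501-0103", "Arch", "F03")}

# Truth table for p ⇒ q, written out explicitly.
IMPLIES = {
    (False, False): True,
    (False, True): True,
    (True, False): False,
    (True, True): True,
}

# p ⇒ q agrees with ¬p ∨ q on every row of the table.
implication_matches = all(
    IMPLIES[p, q] == ((not p) or q)
    for p, q in product([False, True], repeat=2)
)

cids = {cid for (cid, subj, sem) in COURSE}

# ∀ form: cid is highest if every course number is at most cid.
highest_forall = {cid for cid in cids if all(other <= cid for other in cids)}

# ¬∃ form: cid is highest if no course number exceeds it.
highest_not_exists = {cid for cid in cids if not any(other > cid for other in cids)}
```

The ¬∃ form maps directly onto a search for a single counterexample, which is one way to read the slide's remark that the second equivalence can be easier to check.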

26 Free and Bound Variables
• A variable v is bound in a predicate p when p is of the form ∀v… or ∃v…
• A variable occurs free in p if it occurs in a position where it is not bound by an enclosing ∀ or ∃
• Examples:
  • x is free in x > 2
  • x is bound in ∃x. x > y

27 Can Rename Bound Variables Only
• When a variable is bound one can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes
• Example: ∃x. x > 2 is equivalent to ∃y. y > 2
• Otherwise, the variable is defined outside our “scope”…

28 Safety
• Pitfall in what we have done so far – how do we interpret: {<x,y> | ¬ <x,y> ∈ STUDENT}
  • Set of all binary tuples that are not students: an infinite set (and unsafe query)
• A query is safe if no matter how we instantiate the relations, it always produces a finite answer
• Domain independent: answer is the same regardless of the domain in which it is evaluated
• Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable
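Domain dependence is easy to observe even with finite domains. In this sketch the negated query is evaluated over two hypothetical domains (the extra id 4 and name "Marty" are chosen only for illustration) and gives different answers, while a query whose variables are all tied to STUDENT gives the same answer over both.

```python
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin")}


def not_students(ids, names):
    """{<x,y> | ¬ <x,y> ∈ STUDENT}, with x and y ranging over the given domains."""
    return {(x, y) for x in ids for y in names if (x, y) not in STUDENT}


def student_names(ids, names):
    """{<y> | ∃x. <x,y> ∈ STUDENT}, evaluated over the same domains."""
    return {y for x in ids for y in names if (x, y) in STUDENT}


small_ids, small_names = {1, 2, 3}, {"Jill", "Qun", "Nitin"}
large_ids, large_names = small_ids | {4}, small_names | {"Marty"}

# The negated query grows with the domain: 3*3 - 3 pairs, then 4*4 - 3 pairs.
unsafe_small = not_students(small_ids, small_names)
unsafe_large = not_students(large_ids, large_names)

# The existential query is unaffected by enlarging the domain.
safe_small = student_names(small_ids, small_names)
safe_large = student_names(large_ids, large_names)
```

Over an infinite domain the negated query has no finite answer at all, which is the pitfall the slide describes.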

29 Safety and Termination Guarantees
• There are syntactic conditions that are used to guarantee “safe” formulas
  • The definition is complicated, and we won’t discuss it; you can find it in Ullman’s Principles of Database and Knowledge-Base Systems
  • The formulas that are expressible in real query languages based on relational calculus are all “safe”
• Many DB languages include additional features, like recursion, that must be restricted in certain ways to guarantee termination and consistent answers

30 Mini-Quiz
How do you write:
• Which students have taken more than one course from the same professor?

31 Translating from RA to DRC
• Core of relational algebra: π, σ, ∪, ×, –
• We need to work our way through the structure of an RA expression, translating each possible form.
  • Let TR[e] be the translation of RA expression e into DRC.
• Relation names: For the RA expression R, the DRC expression is {<x1,…,xn> | <x1,…,xn> ∈ R}

32 Selection: TR[σ_θ R]
• Suppose we have σ_θ(e’), where e’ is another RA expression that translates as:
  TR[e’] = {<x1,…,xn> | p}
• Then the translation of σ_θ(e’) is {<x1,…,xn> | p ∧ θ’}, where θ’ is obtained from θ by replacing each attribute with the corresponding variable
• Example: TR[σ_{#1=#2 ∧ #4>2.5} R] (if R has arity 4) is
  {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R ∧ x1=x2 ∧ x4>2.5}

33 Projection: TR[π_{i1,…,im}(e)]
• If TR[e] = {<x1,…,xn> | p} then
  TR[π_{i1,i2,…,im}(e)] = {<x_i1,x_i2,…,x_im> | ∃ x_j1,x_j2,…,x_jk. p},
  where x_j1,x_j2,…,x_jk are variables in x1,x2,…,xn that are not in x_i1,x_i2,…,x_im
• Example: With R as before,
  π_{#1,#3}(R) = {<x1,x3> | ∃x2,x4. <x1,x2,x3,x4> ∈ R}

34 Union: TR[R1 ∪ R2]
• R1 and R2 must have the same arity
• For e1 ∪ e2, where e1, e2 are algebra expressions:
  TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,yn> | q}
• Relabel the variables in the second: TR[e2] = {<x1,…,xn> | q’}
• This may involve relabeling bound variables in q to avoid clashes
  TR[e1 ∪ e2] = {<x1,…,xn> | p ∨ q’}
• Example: TR[R1 ∪ R2] = {<x1,…,xn> | <x1,…,xn> ∈ R1 ∨ <x1,…,xn> ∈ R2}

35 Other Binary Operators
• Difference: The same conditions hold as for union
  If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<x1,…,xn> | q}
  Then TR[e1 – e2] = {<x1,…,xn> | p ∧ ¬q}
• Product:
  If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,ym> | q}
  Then TR[e1 × e2] = {<x1,…,xn,y1,…,ym> | p ∧ q}
• Example: TR[R × S] = {<x1,…,xn,y1,…,ym> | <x1,…,xn> ∈ R ∧ <y1,…,ym> ∈ S}
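The translation rules on slides 31 through 35 can be turned into a small recursive function. The sketch below is one possible encoding, not the course's own: the class names, the `#i` positional syntax inside `Select.cond`, and the `y1, y2, …` naming of bound variables are all assumptions. It also passes the head variables top-down, so both sides of a union or difference are generated over the same variables and the explicit relabeling step from slide 34 is not needed.

```python
import re
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Rel:
    name: str
    arity: int


@dataclass(frozen=True)
class Select:
    cond: str      # attributes written positionally, e.g. "#1=#2 ∧ #4>2.5"
    child: object


@dataclass(frozen=True)
class Project:
    cols: tuple    # 1-based, distinct column positions to keep
    child: object


@dataclass(frozen=True)
class Binary:
    op: str        # "∪", "-", or "×"
    left: object
    right: object


def arity(e):
    if isinstance(e, Rel):
        return e.arity
    if isinstance(e, Select):
        return arity(e.child)
    if isinstance(e, Project):
        return len(e.cols)
    if e.op == "×":
        return arity(e.left) + arity(e.right)
    return arity(e.left)


def sub(e, head, fresh):
    """Predicate for a subexpression, parenthesised unless it is atomic."""
    text = pred(e, head, fresh)
    return text if isinstance(e, Rel) else f"({text})"


def pred(e, head, fresh):
    """DRC predicate for e whose free variables are exactly those in head."""
    if isinstance(e, Rel):
        return f"<{','.join(head)}> ∈ {e.name}"
    if isinstance(e, Select):
        # Replace each attribute reference #i by the corresponding variable.
        theta = re.sub(r"#(\d+)", lambda m: head[int(m.group(1)) - 1], e.cond)
        return f"{sub(e.child, head, fresh)} ∧ {theta}"
    if isinstance(e, Project):
        inner = [None] * arity(e.child)
        for var, col in zip(head, e.cols):
            inner[col - 1] = var
        bound = []
        for i, var in enumerate(inner):
            if var is None:              # projected-away column: quantify it
                inner[i] = f"y{next(fresh)}"
                bound.append(inner[i])
        body = sub(e.child, inner, fresh)
        return f"∃{','.join(bound)}. {body}" if bound else body
    if e.op == "×":
        n = arity(e.left)
        return f"{sub(e.left, head[:n], fresh)} ∧ {sub(e.right, head[n:], fresh)}"
    if e.op == "∪":
        return f"{sub(e.left, head, fresh)} ∨ {sub(e.right, head, fresh)}"
    return f"{sub(e.left, head, fresh)} ∧ ¬({pred(e.right, head, fresh)})"


def translate(e):
    """TR[e]: a DRC query string for the RA expression e."""
    head = [f"x{i}" for i in range(1, arity(e) + 1)]
    return f"{{<{','.join(head)}> | {pred(e, head, count(1))}}}"


R = Rel("R", 4)
selection_example = translate(Select("#1=#2 ∧ #4>2.5", R))
projection_example = translate(Project((1, 3), R))
```

`selection_example` reproduces the slide 32 example verbatim. `projection_example` comes out as `{<x1,x2> | ∃y1,y2. <x1,y1,x2,y2> ∈ R}`, which is the slide 33 example up to renaming of variables.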

36 What about the Tuple Relational Calculus?
• We’ve been looking at the Domain Relational Calculus
• The Tuple Relational Calculus is nearly the same, but variables are at the level of a tuple, not an attribute
  {Q | ∃S ∈ COURSE, ∃T ∈ Takes (S.cid = T.cid ∧ Q.cid = S.cid ∧ Q.exp-grade = T.exp-grade)}
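The TRC query above reads naturally as a comprehension over whole tuples. This sketch uses `collections.namedtuple` so that variables range over tuples and fields are reached by name, as in `S.cid`. The field name `exp_grade` stands in for `exp-grade` because hyphens are not valid in Python identifiers, and the data is the slide 3 instance.

```python
from collections import namedtuple

Course = namedtuple("Course", "cid subj sem")
Takes = namedtuple("Takes", "sid exp_grade cid")
Q = namedtuple("Q", "cid exp_grade")  # schema of the result tuples

COURSE = {
    Course("550-0103", "DB", "F03"),
    Course("700-1003", "AI", "S03"),
    Course("501-0103", "Arch", "F03"),
}
TAKES = {
    Takes(1, "A", "550-0103"),
    Takes(1, "A", "700-1003"),
    Takes(3, "C", "500-0103"),
}

# {Q | ∃S ∈ COURSE, ∃T ∈ Takes (S.cid = T.cid ∧ Q.cid = S.cid
#                               ∧ Q.exp-grade = T.exp-grade)}
result = {
    Q(cid=S.cid, exp_grade=T.exp_grade)
    for S in COURSE
    for T in TAKES
    if S.cid == T.cid
}
```

In the domain calculus the same query would need one variable per attribute of COURSE and Takes; here `S` and `T` each stand for a whole row.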

37 Limitations of the Relational Algebra / Calculus
Can’t do:
• Aggregate operations
• Recursive queries
• Complex (non-tabular) structures
• Most of these are expressible in SQL, OQL, XQuery – using other special operators
• Sometimes we even need the power of a Turing-complete programming language

38 Summary
• Can translate relational algebra into relational calculus
  • DRC and TRC are slightly different syntaxes but equivalent
• Given syntactic restrictions that guarantee safety of DRC query, can translate back to relational algebra
• These are the principles behind initial development of relational databases
  • SQL is close to calculus; query plan is close to algebra
  • Great example of theory leading to practice!

