Database Systems 236363: Relational Calculus


Database Systems Relational Calculus

Relational Algebra vs. Relational Calculus
Relational algebra queries are relatively easy to implement in a programming language, yet translating a query from a natural language (or even SQL) to relational algebra is not intuitive.
In the Domain Relational Calculus (DRC) language and its variant, Tuple Relational Calculus (TRC), the terms of the query are expressed as first-order logic expressions, which is closer to the way humans compose questions, in contrast to the “algorithmic” structure of relational algebra.

Domain Relational Calculus (DRC)

DRC – Basic Concept
DRC enables writing logical expressions describing conditions on the records.
Each such expression has free variables (defined later); the truth value of the expression is evaluated with respect to these variables.
The expression φ(x₁,...,xₙ) describes a condition φ whose free variables are x₁,...,xₙ
– φ is also called a formula
The free variables of a formula describe the attributes of the records
– For any set of possible values α₁,...,αₙ for the free variables x₁,...,xₙ, the expression φ returns a truth value: either φ(α₁,...,αₙ) is satisfied or φ(α₁,...,αₙ) is not satisfied
– R[A₁,...,Aₙ] = {<x₁,...,xₙ> : φ(x₁,...,xₙ)} stands for the relation consisting of exactly all n-tuples satisfying φ(x₁,...,xₙ)

Atomic Expressions
Belongs to
– If R[A₁,...,Aₙ] is a relation in the database, then R(x₁,...,xₙ) is a formula with n free variables. For any sequence of values α₁,...,αₙ, the formula R(α₁,...,αₙ) is satisfied iff the relation includes a record with attributes corresponding to these values
– We can also write <α₁,...,αₙ> ∈ R instead of R(α₁,...,αₙ)
Comparisons
– Between variables, and between variables and constants, using the usual comparison operators, such as “z=2”, “x<y”, etc.
– The free variables are the ones appearing in the formula

Composing Expressions – Boolean
Negation
– If φ(x₁,...,xₙ) is a valid expression, then ψ(x₁,...,xₙ) = ¬φ(x₁,...,xₙ) is also a valid expression with the same free variables
– Meaning: ψ(x₁,...,xₙ) is satisfied iff φ(x₁,...,xₙ) is not satisfied
Disjunction
– If φ₁(x₁,...,xₙ) and φ₂(x₁,...,xₙ) are valid expressions, then φ(x₁,...,xₙ) = φ₁(x₁,...,xₙ) ∨ φ₂(x₁,...,xₙ) is a valid expression that is satisfied if either φ₁ or φ₂ is satisfied (or both)
– It is possible that only some of the variables x₁,...,xₙ actually appear in φ₁ or in φ₂. An expression may “declare” free variables that do not appear in it explicitly

Additional Boolean Operations
Conjunction
– The expression φ₁(x₁,...,xₙ) ∧ φ₂(x₁,...,xₙ) is satisfied for a given sequence of values x₁,...,xₙ iff both φ₁(x₁,...,xₙ) and φ₂(x₁,...,xₙ) are satisfied
– It is equivalent to ¬((¬φ₁(x₁,...,xₙ)) ∨ (¬φ₂(x₁,...,xₙ))). The parentheses describe the order in which operators are applied in the composed expression when it is not obvious
Implication
– Written as φ₁(x₁,...,xₙ) → φ₂(x₁,...,xₙ)
– This expression is equivalent to ((¬φ₁(x₁,...,xₙ)) ∨ (φ₂(x₁,...,xₙ)))
Equivalence
– Written as φ₁(x₁,...,xₙ) ↔ φ₂(x₁,...,xₙ)
– This expression is equivalent to (φ₁(x₁,...,xₙ) → φ₂(x₁,...,xₙ)) ∧ (φ₂(x₁,...,xₙ) → φ₁(x₁,...,xₙ))
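The equivalences on this slide can be checked mechanically over all truth assignments. The following is a minimal Python sketch; the helper names `implies` and `iff` are illustrative and not part of the original slides.

```python
from itertools import product

def implies(p, q):
    # Implication defined via negation and disjunction: (not p) or q
    return (not p) or q

def iff(p, q):
    # Equivalence defined as implication in both directions
    return implies(p, q) and implies(q, p)

ASSIGNMENTS = list(product([False, True], repeat=2))

# Conjunction expressed through negation and disjunction (De Morgan)
conjunction_ok = all(
    (p and q) == (not ((not p) or (not q))) for p, q in ASSIGNMENTS
)

# Equivalence holds exactly when both sides have the same truth value
equivalence_ok = all(iff(p, q) == (p == q) for p, q in ASSIGNMENTS)
```

Both flags are computed by exhausting the four possible truth assignments, which is enough because each operator takes only two Boolean arguments.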

The Existential Quantifier
If φ(x₁,...,xₙ) is a valid expression, then ψ(x₂,...,xₙ) = ∃x₁ φ(x₁,...,xₙ) is a valid expression whose free variables are x₂,...,xₙ
– The meaning of this expression is that ψ(α₂,...,αₙ) is satisfied iff there exists a value α₁ for which φ(α₁,...,αₙ) is satisfied
Example:
– For attributes whose value ranges are the natural numbers, ψ(x₂,x₃) = ∃x₁((x₂ > x₁) ∧ (x₁ > x₃)) is satisfied iff x₂,x₃ are a pair of decreasing non-consecutive numbers
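As a rough illustration of the example above, the existential quantifier can be evaluated with `any` over a finite stand-in for the natural numbers. The bound of 20 in `DOMAIN` is an assumption made only so that the code terminates; the slide's domain is infinite.

```python
# Hypothetical finite stand-in for the natural numbers (assumption for illustration)
DOMAIN = range(0, 20)

def psi(x2, x3):
    """psi(x2, x3) holds iff some x1 in DOMAIN satisfies x2 > x1 and x1 > x3."""
    return any(x2 > x1 and x1 > x3 for x1 in DOMAIN)

# The relation defined by psi, restricted to DOMAIN
result = {(x2, x3) for x2 in DOMAIN for x3 in DOMAIN if psi(x2, x3)}
```

Here `psi(5, 3)` holds because x1 = 4 is a witness, while `psi(4, 3)` does not, since no natural number lies strictly between 3 and 4.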

The Universal Quantifier
Given a valid expression φ(x₁,...,xₙ), the expression ψ(x₂,...,xₙ) = ∀x₁ φ(x₁,...,xₙ) has the following meaning: for any given sequence of values α₂,...,αₙ, ψ(α₂,...,αₙ) is satisfied iff for every possible value α₁, φ(α₁,...,αₙ) is satisfied
How do we define “every possible value”?
– Naturally, if the set of possible values is infinite, verifying such an expression becomes problematic…
– For now, we assume that this is restricted to the range of possible values for the corresponding attribute
– If the range is not clear, we need to state it explicitly
– A similar problem also exists for ∃ due to negation. In particular, ∀x₁ φ(x₁,...,xₙ) is equivalent to ¬∃x₁ ¬φ(x₁,...,xₙ)
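The duality between the two quantifiers, and the dependence of ∀ on the chosen range, can be sketched in Python as follows. The condition `phi` and the ranges are hypothetical choices made for illustration only.

```python
def phi(x1, x2):
    # Illustrative condition (an assumption, not from the slides): x1 <= x2
    return x1 <= x2

def forall_direct(x2, domain):
    # "for every x1 in the range, phi(x1, x2) holds"
    return all(phi(x1, x2) for x1 in domain)

def forall_via_exists(x2, domain):
    # "there is no x1 in the range for which phi(x1, x2) fails"
    return not any(not phi(x1, x2) for x1 in domain)

SMALL = range(0, 10)   # assumed attribute range
LARGE = range(0, 100)  # a different assumed range

# The two formulations agree for every x2 in the small range
agree = all(forall_direct(x2, SMALL) == forall_via_exists(x2, SMALL) for x2 in SMALL)
```

With these definitions `forall_direct(9, SMALL)` holds but `forall_direct(9, LARGE)` does not, which is one way to see why the range of the quantified variable has to be fixed before the expression has a definite meaning.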

Quantified and Free Variables
Consider the formula φ(y,z) = (∃x φ₁(x,y)) ∧ (∃x φ₂(x,z))
– The free variables here are y and z
– What about x? x appears twice, once on each side of the ∧ operator. Yet, there is no relation between these occurrences. In fact, it is equivalent to the formula φ(y,z) = (∃x₁ φ₁(x₁,y)) ∧ (∃x₂ φ₂(x₂,z))
– Denote R₁[A,B] the relation defined by φ₁ and R₂[A,C] the relation defined by φ₂. The equivalent expression in RA is (π_B R₁) × (π_C R₂)
– To enforce the same value on both sides of the ∧ operator, the formula should be written φ(y,z) = ∃x(φ₁(x,y) ∧ φ₂(x,z)). This is equivalent in RA to π_{B,C}(R₁ ⋈ R₂)
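The difference between the two formulas shows up clearly on a small instance. In this Python sketch the contents of `R1` and `R2` are made-up sample data.

```python
# Hypothetical sample relations: R1[A, B] and R2[A, C]
R1 = {(1, "a"), (2, "b")}
R2 = {(1, "c"), (3, "d")}

# (exists x: R1(x, y)) and (exists x: R2(x, z)): the two x's are unrelated,
# so this is the product of the projections on B and on C.
independent = {(y, z) for (_, y) in R1 for (_, z) in R2}

# exists x: (R1(x, y) and R2(x, z)): the same x is required on both sides,
# so this is the projection on B, C of the natural join.
shared = {(y, z) for (x1, y) in R1 for (x2, z) in R2 if x1 == x2}
```

On this data `independent` contains all four (B, C) combinations, while `shared` contains only `("a", "c")`, the single pair that agrees on A.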

Implementing RA Expressions in DRC
Let φ(x₁,...,xₙ) be a formula representing T[A₁,...,Aₙ]
– The relation π_{A₁,...,Aₘ} T[A₁,...,Aₙ] (for m<n) is {<x₁,...,xₘ> : ∃xₘ₊₁,...,xₙ φ(x₁,...,xₘ,xₘ₊₁,...,xₙ)}
– To obtain σ_θ T[A₁,...,Aₙ] for a given condition θ, denote θ′(x₁,...,xₙ) the expression obtained from θ by replacing each Aᵢ with xᵢ (for each 1 ≤ i ≤ n). We get the following first-order logic expression: σ_θ T = {<x₁,...,xₙ> : φ(x₁,...,xₙ) ∧ θ′(x₁,...,xₙ)}

Cartesian Product in DRC
Let φ₁(x₁,...,xₙ) be a formula representing T₁[A₁,...,Aₙ] and φ₂(x₁,...,xₘ) be a formula representing T₂[B₁,...,Bₘ]
– We can represent T₁ × T₂ by the following expression: T₁ × T₂ = {<x₁,...,xₙ₊ₘ> : φ₁(x₁,...,xₙ) ∧ φ₂(xₙ₊₁,...,xₙ₊ₘ)}
Notice that we need to use different sets of variables for φ₁ and φ₂

Subtract, Union, Intersection
Let φ₁(x₁,...,xₙ) be a formula representing T₁[A₁,...,Aₙ] and φ₂(x₁,...,xₙ) be a formula representing T₂[A₁,...,Aₙ]
We have:
– T₁ \ T₂ = {<x₁,...,xₙ> : φ₁(x₁,...,xₙ) ∧ ¬φ₂(x₁,...,xₙ)}
– T₁ ∪ T₂ = {<x₁,...,xₙ> : φ₁(x₁,...,xₙ) ∨ φ₂(x₁,...,xₙ)}
– T₁ ∩ T₂ = {<x₁,...,xₙ> : φ₁(x₁,...,xₙ) ∧ φ₂(x₁,...,xₙ)}

Composed Expressions in DRC
Let φ₁(x₁,...,xₙ,y₁,...,yₘ) be a formula representing T₁[A₁,...,Aₙ,B₁,...,Bₘ] and φ₂(y₁,...,yₘ,z₁,...,zₖ) a formula representing T₂[B₁,...,Bₘ,C₁,...,Cₖ]
– The natural join of T₁ and T₂ can be expressed as follows: T₁ ⋈ T₂ = {<x₁,...,xₙ,y₁,...,yₘ,z₁,...,zₖ> : φ₁(x₁,...,xₙ,y₁,...,yₘ) ∧ φ₂(y₁,...,yₘ,z₁,...,zₖ)}
– The semi-join can be expressed as T₁ ⋉ T₂ = {<x₁,...,xₙ,y₁,...,yₘ> : ∃z₁,...,zₖ (φ₁(x₁,...,xₙ,y₁,...,yₘ) ∧ φ₂(y₁,...,yₘ,z₁,...,zₖ))}
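A minimal Python sketch of the join and semi-join formulas, for the special case n = m = k = 1. The sample tuples are invented, and the variables are ranged over the values that occur in the relations (an assumption that keeps the evaluation finite).

```python
# Hypothetical sample relations: T1[A, B] and T2[B, C]
T1 = {(1, "x"), (2, "y"), (3, "z")}
T2 = {("x", 10), ("x", 11), ("z", 30)}

def phi1(a, b):
    return (a, b) in T1

def phi2(b, c):
    return (b, c) in T2

# Values appearing in the relations, per attribute
A = {a for (a, _) in T1}
B = {b for (_, b) in T1} | {b for (b, _) in T2}
C = {c for (_, c) in T2}

# {<a, b, c> : phi1(a, b) and phi2(b, c)}
natural_join = {(a, b, c) for a in A for b in B for c in C
                if phi1(a, b) and phi2(b, c)}

# {<a, b> : exists c (phi1(a, b) and phi2(b, c))}
semi_join = {(a, b) for a in A for b in B
             if any(phi1(a, b) and phi2(b, c) for c in C)}
```

The tuple `(2, "y")` of T1 has no partner in T2, so it appears in neither result; the semi-join keeps only the T1 tuples that do have at least one partner.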

Formal Equivalence Proof
In order to exemplify how equivalence proofs are structured, we will formally prove that if T₁ corresponds to φ₁(x₁,...,xₙ,y₁,...,yₘ) and T₂ corresponds to φ₂(y₁,...,yₘ,z₁,...,zₖ), then for the expression
φ(x₁,...,xₙ,y₁,...,yₘ,z₁,...,zₖ) = φ₁(x₁,...,xₙ,y₁,...,yₘ) ∧ φ₂(y₁,...,yₘ,z₁,...,zₖ)
the following holds: T₁ ⋈ T₂ = {<x₁,...,zₖ> : φ(x₁,...,zₖ)}
To that end, we need to show that for every two relations T₁ and T₂ and every sequence of values t = (α₁,...,αₙ,β₁,...,βₘ,γ₁,...,γₖ):
– If t ∈ T₁ ⋈ T₂ then φ(t) is satisfied
– If φ(t) is satisfied then t ∈ T₁ ⋈ T₂

First Direction
If t = (α₁,...,αₙ,β₁,...,βₘ,γ₁,...,γₖ) ∈ T₁ ⋈ T₂, then by the definition of join, (α₁,...,αₙ,β₁,...,βₘ) ∈ T₁, meaning that φ₁(α₁,...,αₙ,β₁,...,βₘ) is satisfied, and (β₁,...,βₘ,γ₁,...,γₖ) ∈ T₂, meaning that φ₂(β₁,...,βₘ,γ₁,...,γₖ) is also satisfied.
Hence, φ₁(α₁,...,αₙ,β₁,...,βₘ) ∧ φ₂(β₁,...,βₘ,γ₁,...,γₖ) = φ(α₁,...,αₙ,β₁,...,βₘ,γ₁,...,γₖ) is satisfied, as needed.

Second Direction
If φ(t) = φ₁(α₁,...,αₙ,β₁,...,βₘ) ∧ φ₂(β₁,...,βₘ,γ₁,...,γₖ) is satisfied, then by the definition of conjunction, φ₁(α₁,...,αₙ,β₁,...,βₘ) is satisfied, meaning that (α₁,...,αₙ,β₁,...,βₘ) ∈ T₁.
Similarly, φ₂(β₁,...,βₘ,γ₁,...,γₖ) is satisfied, meaning that (β₁,...,βₘ,γ₁,...,γₖ) ∈ T₂.
Hence, by the definition of the natural join T₁ ⋈ T₂, we have (α₁,...,αₙ,β₁,...,βₘ,γ₁,...,γₖ) ∈ T₁ ⋈ T₂, as needed.

Boolean Queries
If φ is a DRC expression without any free variables (but uses the database relations), then the query “{<> : φ}” is still meaningful.
The result will be a relation without any attributes
– It is the empty relation if φ is not satisfied over the database
– Otherwise, it is the relation that includes a single empty record
We sometimes use the notation “φ” for such a query.
For example, the following query checks whether the relations R and S satisfy R ⊆ S: {<> : ∀x₁,...,xₙ (R(x₁,...,xₙ) → S(x₁,...,xₙ))}
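The two possible results of a Boolean query can be modelled in Python as the empty set and the set containing the empty tuple. This is only a sketch: the names `boolean_query` and `contained` are illustrative, and the universal quantifier is evaluated over the tuples occurring in R or S, which suffices here because the implication holds trivially for any tuple outside R.

```python
def boolean_query(holds):
    """Return {()} (a single empty record) if holds, else the empty relation."""
    return {()} if holds else set()

def contained(r, s):
    # forall t: (t in r) -> (t in s), written as (not p) or q
    return all((t not in r) or (t in s) for t in r | s)

# Hypothetical sample relations
R = {(1, 2), (3, 4)}
S = {(1, 2), (3, 4), (5, 6)}

r_in_s = boolean_query(contained(R, S))
s_in_r = boolean_query(contained(S, R))
```

Here `r_in_s` is `{()}` and `s_in_r` is `set()`, matching the "single empty record" and "empty relation" cases described on the slide.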

Problematic Expressions
Consider the DRC expression: {<x> : ¬R(x)}
– Can it be written in RA?
– Can it be implemented in a table?
How about the expression: {<x,y> : x=y}?
We should avoid extremely large (and in particular infinite) tables, as well as queries (and sub-queries) that require addressing extremely large (and in particular infinite) domains
– In other words, we would like to restrict ourselves to the content of the DB and avoid being dependent on the domains
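A short Python sketch of why a negated membership test is domain dependent: the same relation R yields different answers under different assumed domains. The relation contents and both domains are made up for the example.

```python
# Hypothetical unary relation R
R = {1, 3}

def complement(domain):
    # Evaluates {<x> : not R(x)} with x ranging over the given domain
    return {x for x in domain if x not in R}

small = complement(range(5))
large = complement(range(10))
```

Although the database (R) is identical in both calls, `small` and `large` differ, so the result is not determined by the database content alone; over an infinite domain the result would itself be infinite.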

Safe DRC
Safe-DRC is intended to ensure that all expressions can be implemented without having to scan entire domains
– This is both in order to prepare the input and in order to evaluate expressions with quantifiers
A DRC expression is called safe if it satisfies some additional requirements that ensure that its result only depends on the relations in the database and not on the entire possible range of attribute values
– Not every DRC expression has an equivalent Safe-DRC expression
A more formal definition is given in the tutorial.

DRC – Summarizing Example
We are going to compare a few RA and DRC queries.
[Figure: ER diagram of the train system. Labels: Station (S_Name, Height, S_Type), Line (L_Num, Direction, L_Type), Serves (Km), Train (T_Num, Days), Service (T_Category, Class, Food), Gives, Arrives (A_Time, D_Time, Platform)]

The Relevant Relations
Since in DRC the order of variables may impact the result, we will first define the order of attributes for each relation:
– Station[S_Name,Height]
– Station_Type[S_Name,S_Type]
– Line[L_Num,Direction,L_Type]
– Serves[S_Name,L_Num,Direction,Km]
– Train[T_Num,Days]
– Service[T_Category,Class,Food]
– Gives[T_Num,T_Category,Class]
– Arrives[T_Num,S_Name,L_Num,Direction,A_Time,D_Time,Platform]

Sample Query in DRC
Which stations are served on Line 1-south?
– As in RA, we can extract the data from “Serves”: {<x> : ∃y,z,w (Serves(x,y,z,w) ∧ y=1 ∧ z=“south”)}
– This translates directly to RA as π_{S_Name}(σ_{(L_Num=1) ∧ (Direction=“south”)}(Serves))
– Such an expression can also be written in a shortened manner as {<x> : ∃w Serves(x,1,“south”,w)}

Another Example
Which lines have stations below sea level?
– For every pair of variables y,z representing a line (L_Num,Direction) for which there are stations, we can identify the stations using the variable x for which there is some w such that Serves(x,y,z,w) is true
– Hence, the expression is {<y,z> : ∃x,w (Serves(x,y,z,w) ∧ φ(x))}, where φ(x) is satisfied iff x is a station below sea level
– For this, we have φ(x) = ∃u (Station(x,u) ∧ (u<0))
– Resulting in the following query: {<y,z> : ∃x,w (Serves(x,y,z,w) ∧ (∃u (Station(x,u) ∧ (u<0))))}
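The two sample queries (stations served on Line 1-south, and lines with stations below sea level) can be mirrored with Python set comprehensions. The station names, heights and kilometre values below are invented sample data; only the attribute order follows the relation definitions Station[S_Name,Height] and Serves[S_Name,L_Num,Direction,Km].

```python
# Hypothetical sample data
Station = {("Alpha", 120), ("Bravo", -15), ("Charlie", -200)}
Serves = {
    ("Alpha", 1, "south", 0),
    ("Bravo", 1, "south", 12),
    ("Alpha", 2, "north", 0),
    ("Charlie", 2, "north", 5),
}

# {<x> : exists w Serves(x, 1, "south", w)}
stations_line_1_south = {x for (x, y, z, w) in Serves if y == 1 and z == "south"}

def below_sea_level(x):
    # phi(x) = exists u (Station(x, u) and u < 0)
    return any(name == x and u < 0 for (name, u) in Station)

# {<y, z> : exists x, w (Serves(x, y, z, w) and phi(x))}
lines_below_sea_level = {(y, z) for (x, y, z, w) in Serves if below_sea_level(x)}
```

Each existentially quantified variable becomes a loop variable that is simply not part of the output tuple, which is how the projection in the RA version arises.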

The Other Direction
What does the following query mean?
{<x> : ∀y₁,y₂ (Station(y₁,y₂) → ∃z₁,z₂,z₃,z₄,z₅ (Arrives(x,y₁,z₁,z₂,z₃,z₄,z₅)))}
– The sub-expression ∃z₁,z₂,z₃,z₄,z₅ (Arrives(x,y₁,z₁,z₂,z₃,z₄,z₅)) represents that train number x stops in the station called y₁ on a certain line number, direction, etc.
– The sub-expression around the → sign (without the quantifiers) indicates that if (y₁,y₂) is a station then x is a train that stops there
– Hence, the complete expression gives the train numbers stopping in all stations. Always? How can this be fixed?
– Personal exercise: write an expression that will only take into account stations at which trains stop
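One possible reading of the “Always?” question can be explored with a small Python sketch. The sample data and the explicit `candidates` set (standing in for the domain of x) are assumptions made for illustration; the slides do not fix them.

```python
def trains_stopping_everywhere(station, arrives, candidates):
    """Evaluate {<x> : forall (y1, y2) in station, x arrives at y1},
    with x ranging over the given candidate values."""
    return {
        x for x in candidates
        if all(any(a[0] == x and a[1] == y1 for a in arrives)
               for (y1, _y2) in station)
    }

# Hypothetical data; Arrives[T_Num, S_Name, L_Num, Direction, A_Time, D_Time, Platform]
Station = {("Alpha", 120), ("Bravo", -15)}
Arrives = {
    (7, "Alpha", 1, "south", "08:00", "08:02", 1),
    (7, "Bravo", 1, "south", "08:20", "08:22", 2),
    (9, "Alpha", 1, "south", "09:00", "09:02", 1),
}
CANDIDATES = {7, 9, 42}  # 42 is deliberately not a train number

result = trains_stopping_everywhere(Station, Arrives, CANDIDATES)
vacuous = trains_stopping_everywhere(set(), Arrives, CANDIDATES)
```

With two stations only train 7 qualifies. With an empty Station relation the implication holds vacuously for every candidate, including 42, so the answer then depends on the domain of x rather than on the database.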

A More Complex Example
What are the lines that share a change station?
– First, here is an expression for obtaining all change stations: φ₁(x) = ∃y (Station(x,y) ∧ Station_Type(x,“change”))
– Next, we locate pairs of lines appearing with the same change station: φ₂(u₁,v₁,u₂,v₂) = ∃x,w₁,w₂ (φ₁(x) ∧ Serves(x,u₁,v₁,w₁) ∧ Serves(x,u₂,v₂,w₂))
– The complete query, including line types, is then {<u₁,v₁,t₁,u₂,v₂,t₂> : Line(u₁,v₁,t₁) ∧ Line(u₂,v₂,t₂) ∧ φ₂(u₁,v₁,u₂,v₂) ∧ (u₁≠u₂)}

Tuple Relational Calculus (TRC)

TRC
The main difference between TRC and DRC is that in TRC the values of variables are complete records rather than single attributes.
Hence, to define a query that returns a relation, we need an expression φ(t) that has a single free variable (although it may include sub-expressions that have quantified variables).
The general query structure is {t[A₁,...,Aₙ] : φ(t)}, which returns a relation with attributes A₁,...,Aₙ including all possible records having corresponding attributes for which φ is satisfied.

Atomic Formulas in TRC
Belongs to
– For a relation R[A₁,...,Aₙ] and variable t, the atomic formula “t ∈ R” is satisfied iff the value of t is a record inside R
Comparisons
– We denote t[A] the value of attribute A in the variable t
– For example, “t[A]=2” is satisfied if the value of A in t is 2
– “r[B]<s[C]” is satisfied if the values of the corresponding attributes of r and s satisfy this condition

Composed Formulas in TRC
Boolean Formulas
– These are written in TRC exactly the same as in DRC and have the same meaning
– For example, the formula φ₁(t₁,...,tₖ) ∧ φ₂(t₁,...,tₖ) is satisfied iff both φ₁ and φ₂ are satisfied with respect to t₁,...,tₖ. Recall that here each tᵢ is a complete record (tuple)
Quantifiers
– Are also written as expected
– For example, ∃t₁ φ(t₁,...,tₖ) is satisfied iff there is a possible record (regardless of whether it is in the database or not) for which φ(t₁,...,tₖ) is satisfied; here t₂,...,tₖ are free variables
– Yet, here the quantifier applies to complete records (lines) rather than single attributes
– When it is not clear from the context, we write the attributes of the quantified variable explicitly, e.g., ∃t₁[A₁,…,Aₙ] φ(t₁,...,tₖ)

Sample Queries in TRC
In the train operation example, what is the formula for “which stations are served by line 1-south”?
– {t[S_Name] : ∃s[S_Name,L_Num,Direction,Km] (s ∈ Serves ∧ t[S_Name]=s[S_Name] ∧ s[L_Num]=1 ∧ s[Direction]=“south”)}
What about “which lines serve stations below sea level”?
– {t[L_Num,Direction] : ∃r,s (r ∈ Station ∧ s ∈ Serves ∧ s[L_Num]=t[L_Num] ∧ s[Direction]=t[Direction] ∧ r[S_Name]=s[S_Name] ∧ r[Height]<0)}
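In TRC the variables range over whole records, which maps naturally onto named tuples in Python. The sketch below uses invented sample data; the record type names `StationRec` and `ServesRec` are hypothetical, and the quantified variables r and s are ranged over the stored relations because the formulas require r ∈ Station and s ∈ Serves.

```python
from collections import namedtuple

# Record types following the attribute lists of Station and Serves
StationRec = namedtuple("StationRec", ["S_Name", "Height"])
ServesRec = namedtuple("ServesRec", ["S_Name", "L_Num", "Direction", "Km"])

# Hypothetical sample data
Station = {StationRec("Alpha", 120), StationRec("Bravo", -15)}
Serves = {
    ServesRec("Alpha", 1, "south", 0),
    ServesRec("Bravo", 1, "south", 12),
    ServesRec("Bravo", 2, "north", 3),
}

# {t[S_Name] : exists s (s in Serves and t.S_Name = s.S_Name
#                        and s.L_Num = 1 and s.Direction = "south")}
served_by_1_south = {
    (s.S_Name,) for s in Serves if s.L_Num == 1 and s.Direction == "south"
}

# {t[L_Num, Direction] : exists r, s (r in Station and s in Serves
#     and t agrees with s on L_Num, Direction
#     and r.S_Name = s.S_Name and r.Height < 0)}
lines_below_sea = {
    (s.L_Num, s.Direction)
    for r in Station for s in Serves
    if r.S_Name == s.S_Name and r.Height < 0
}
```

The result tuple t is built directly from the attributes it is required to share with s, so the equalities t[S_Name]=s[S_Name] and so on become the expression in the comprehension's head.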

The Expressive Power of TRC and DRC
Every query that can be expressed in TRC can also be expressed in DRC and vice versa.
In particular, TRC also has the problem that it can be used to define queries that cannot be implemented in a database.
Similarly to DRC, TRC also has a safe version
– Safe-TRC and Safe-DRC have the same expressive power, i.e., any formula written in Safe-TRC can be expressed in Safe-DRC and vice versa

The Expressive Power of Relational Calculus
The expressive power of relational calculus is at least as strong as that of RA since, as we saw for DRC (and it is also true for TRC), any query in RA can be expressed in DRC.
Yet, there are expressions in DRC (and TRC) that cannot be expressed in RA
– However, this is because DRC enables writing expressions that cannot be implemented in a database
Safe relational calculus (both Safe-DRC and Safe-TRC), as studied in the tutorials, has the same expressive power as RA.
Relational calculus (both safe and unsafe) cannot be used for expressing transitive closure
– E.g., “What are all the stations that can be reached from station S in a finite number of train changes?”
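For contrast, a general-purpose language computes reachability with a loop whose number of iterations depends on the data, which a single fixed calculus formula cannot do. The following Python sketch assumes a hypothetical binary relation of direct connections between stations; it is not one of the relations defined in the slides.

```python
def reachable(edges, start):
    """Return all nodes reachable from start, including start itself,
    by repeatedly following edges until nothing new is found."""
    reached = {start}
    frontier = {start}
    while frontier:
        # One more step from the newest nodes, dropping those already seen
        frontier = {b for (a, b) in edges if a in frontier} - reached
        reached |= frontier
    return reached

# Hypothetical direct connections (from_station, to_station)
EDGES = {("S", "A"), ("A", "B"), ("B", "C"), ("D", "E")}

from_s = reachable(EDGES, "S")
```

Each pass of the loop corresponds to one additional join of the connection relation with itself; the loop stops only when a pass adds nothing, so the number of joins is not known in advance.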