The Relational Model: Relational Calculus Fall 2002 CSE330/CIS550 Handout 2

Relational Calculus
First-order logic (FOL) can also be thought of as a query language, and can be used in two ways:
  Tuple relational calculus
  Domain relational calculus
The difference is the level at which variables are used: for attributes (domains) or for tuples.
The calculus is non-procedural (declarative), in contrast to the algebra.

Domain relational calculus
Queries have the form {<x1,x2, …, xn> | p}, where x1,x2, …, xn are domain variables and p is a predicate which may mention the variables x1,x2, …, xn.
Example: simple projection
  {<RN,H> | ∃RI,G,R. <RI,RN,G,R,H> ∈ Routes}
Example: selection and projection
  {<RN,H> | ∃RI,G,R. <RI,RN,G,R,H> ∈ Routes ∧ G > 15}
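
One way to read the two example queries is as set comprehensions over a finite relation instance. The sketch below assumes a small made-up `routes` instance whose tuples follow the slide's order <RI,RN,G,R,H>; the data values are illustrative only and are not taken from the course database.

```python
# Hypothetical instance of Routes; tuple layout <RI, RN, G, R, H> follows the slide.
routes = {
    (1, "Last Tango", 20, 12, 100),
    (2, "Garden Path", 10, 2, 60),
    (3, "The Sluice", 16, 8, 60),
}

# {<RN,H> | ∃RI,G,R. <RI,RN,G,R,H> ∈ Routes}
# The existentially quantified variables are the ones that are not output.
projection = {(rn, h) for (ri, rn, g, r, h) in routes}

# {<RN,H> | ∃RI,G,R. <RI,RN,G,R,H> ∈ Routes ∧ G > 15}
selection_projection = {(rn, h) for (ri, rn, g, r, h) in routes if g > 15}

print(sorted(projection))
print(sorted(selection_projection))
```

The comprehension variables play the role of domain variables: each one ranges over a single attribute value rather than over a whole tuple.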

DRC examples, cont.
Join:
  {<CI,R> | ∃RI,RN,G,H,RI’,Da,Du. <RI,RN,G,R,H> ∈ Routes ∧ <CI,RI’,Da,Du> ∈ Climbs ∧ RI = RI’}
We could also have written the above as:
  {<CI,R> | ∃RI,RN,G,H,Da,Du. <RI,RN,G,R,H> ∈ Routes ∧ <CI,RI,Da,Du> ∈ Climbs}
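
The two ways of writing the join can be mimicked in Python as follows. This is only a sketch over made-up `routes` and `climbs` instances (tuple layouts <RI,RN,G,R,H> and <CI,RI,Da,Du> as on the slide; all values are hypothetical). The first comprehension uses two variables and an explicit equality test; the second reuses the route identifier directly as a lookup key, which corresponds to using the same variable RI in both atoms.

```python
# Hypothetical instances; only the tuple layouts come from the slides.
routes = {
    (1, "Last Tango", 20, 12, 100),
    (2, "Garden Path", 10, 2, 60),
    (3, "The Sluice", 16, 8, 60),
}
climbs = {
    (123, 1, "10/10/88", 5),
    (123, 3, "11/08/87", 1),
    (313, 1, "12/08/89", 5),
}

# Form 1: distinct variables RI and RI' plus the condition RI = RI'.
join_explicit = {
    (ci, r)
    for (ri, rn, g, r, h) in routes
    for (ci, ri2, da, du) in climbs
    if ri == ri2
}

# Form 2: one shared variable RI. Index Climbs by route id, then look up
# the climbers of each route with the same ri value.
climbers_of = {}
for (ci, ri, da, du) in climbs:
    climbers_of.setdefault(ri, set()).add(ci)

join_shared = {
    (ci, r)
    for (ri, rn, g, r, h) in routes
    for ci in climbers_of.get(ri, set())
}

print(join_explicit == join_shared)
```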

Predicate Logic - a quick review
The syntax of predicate logic starts with variables, constants and predicates that can be built using a collection of boolean-valued operators (boolean expressions).
Examples: 1=2, x≤y, prime(x), contains(t,”Joe”).
Precisely what operations are available depends on the domain and on the query language.
For now we will assume the following boolean expressions:
  <X,Y,…> ∈ Rel
  X op Y
  X op constant, or constant op X
where op is a comparison operator such as =, ≠, <, >, ≤, ≥, and X,Y,… are domain variables.

Predicate Logic, cont.
Starting with these basic predicates (also called atomic), we can build up new predicates by the following rules:
Logical connectives: If p and q are predicates, then so are p∧q, p∨q, ¬p, and p⇒q
  (x>2) ∧ (x<4)
  (x>2) ⇒ (x>0)
Existential quantification: If p is a predicate, then so is ∃x.p
  ∃x. (x>2) ∧ (x<4)
Universal quantification: If p is a predicate, then so is ∀x.p
  ∀x.x>2
  ∀x. ∃y.y>x

Logical Equivalences
There are two logical equivalences that will be heavily used:
  p⇒q ≡ ¬p ∨ q (Whenever p is true, q must also be true.)
  ∀x. p(x) ≡ ¬∃x. ¬p(x) (p is true for all x)
The second will be especially important when we study SQL.
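
Both equivalences can be spot-checked mechanically. The sketch below checks the first over all truth assignments, and the second over one small finite domain with a handful of sample predicates. The domain and the predicates are arbitrary choices made for illustration; a finite check of this kind is a sanity test, not a proof.

```python
from itertools import product

# Truth table of p ⇒ q, written out explicitly so the check is not circular.
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

# p ⇒ q  ≡  ¬p ∨ q, checked on all four truth assignments.
first_holds = all(
    IMPLIES[(p, q)] == ((not p) or q)
    for p, q in product([True, False], repeat=2)
)


def forall(domain, pred):
    """∀x. pred(x) over a finite domain."""
    return all(pred(x) for x in domain)


def not_exists_not(domain, pred):
    """¬∃x. ¬pred(x) over a finite domain."""
    return not any(not pred(x) for x in domain)


# Arbitrary finite domain and sample predicates (assumptions of this sketch).
domain = range(-3, 8)
predicates = [
    lambda x: x > 2,
    lambda x: x > -10,
    lambda x: x * x >= 0,
    lambda x: x % 2 == 0,
]
second_holds = all(
    forall(domain, p) == not_exists_not(domain, p) for p in predicates
)

print(first_holds, second_holds)
```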

Free and bound variables
A variable v is bound in a predicate p when p is of the form ∀v… or ∃v…
A variable occurs free in p if it occurs in a position where it is not bound by an enclosing ∀ or ∃.
Examples:
  x is free in x>2
  x is bound in ∃x.x>y
  x is free in (x>17) ∧ (∃x.x>2)
Note that there are two occurrences of x in the last example: the first is free, and the second is bound by the quantifier.
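
The definition of free variables is recursive over the structure of a predicate, which a short function can make concrete. The nested-tuple encoding of formulas used below is an ad hoc choice for this sketch, not notation from the course; atomic predicates are represented only by the set of variables they mention.

```python
# Ad hoc formula encoding (an assumption of this sketch):
#   ("atom", {"x", "y"})            atomic predicate mentioning these variables
#   ("not", f)
#   ("and", f, g), ("or", f, g), ("implies", f, g)
#   ("exists", v, f), ("forall", v, f)

def free_vars(f):
    """Return the set of variables that occur free in formula f."""
    tag = f[0]
    if tag == "atom":
        return set(f[1])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or", "implies"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("exists", "forall"):
        # The quantifier binds its variable inside the body.
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown formula tag: {tag}")


ex1 = ("atom", {"x"})                                   # x>2
ex2 = ("exists", "x", ("atom", {"x", "y"}))             # ∃x.x>y
ex3 = ("and", ("atom", {"x"}),                          # (x>17) ∧ (∃x.x>2)
       ("exists", "x", ("atom", {"x"})))

print(free_vars(ex1), free_vars(ex2), free_vars(ex3))
```

In the third example x is reported as free because of its first occurrence, even though the second occurrence is bound.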

Renaming variables
When a variable is bound, one can replace it with some other variable without altering the meaning of the expression, provided there are no name clashes.
Example: ∃x.x>2 is equivalent to ∃y.y>2

Some queries…
Try the following examples:
  The names and ages of climbers
  The names and ages of climbers who have climbed route 214
  The names of climbers who have climbed “Last Tango”
  The names of climbers who have climbed all routes with rating greater than 15
  The names of climbers who have climbed the same route twice
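
As a hedged illustration of the "all routes" query, one possible formulation is {<CN> | ∃CI,S,A. <CI,CN,S,A> ∈ Climbers ∧ ∀RI,RN,G,R,H. ((<RI,RN,G,R,H> ∈ Routes ∧ R > 15) ⇒ ∃Da,Du. <CI,RI,Da,Du> ∈ Climbs)}. The sketch below evaluates it over made-up instances, once with a universal test and once in the equivalent ¬∃¬ form. It assumes that R is the rating attribute and that CN is the climber name; all data values are hypothetical.

```python
# Hypothetical instances; layouts <CI,CN,S,A>, <RI,RN,G,R,H>, <CI,RI,Da,Du>.
climbers = {(123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25)}
routes = {
    (1, "Last Tango", 20, 17, 100),
    (2, "Garden Path", 10, 2, 60),
    (3, "The Sluice", 16, 18, 60),
}
climbs = {
    (123, 1, "10/10/88", 5),
    (123, 3, "11/08/87", 1),
    (214, 1, "12/08/89", 5),
}

climbed = {(ci, ri) for (ci, ri, da, du) in climbs}
# Routes with rating greater than 15 (assuming R is the rating).
hard = {ri for (ri, rn, g, r, h) in routes if r > 15}

# Universal form: every hard route has been climbed by this climber.
names_forall = {
    cn for (ci, cn, s, a) in climbers
    if all((ci, ri) in climbed for ri in hard)
}

# ¬∃¬ form: there is no hard route that this climber has not climbed.
names_not_exists = {
    cn for (ci, cn, s, a) in climbers
    if not any((ci, ri) not in climbed for ri in hard)
}

print(names_forall, names_not_exists)
```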

Safety
There is a problem with what we have done so far. How should we treat a query like:
  {<CI,CN,S,A> | ¬(<CI,CN,S,A> ∈ Climbers)}
This presumably means the set of all four-tuples that are not climbers, which is presumably an infinite set (making this an unsafe query).
A query is safe if, no matter how we instantiate the relations, it always produces a finite answer.
In particular, the query should be domain independent, meaning that the answer is the same regardless of the domain in which it is evaluated.
Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable.
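
Domain dependence can be observed even on finite domains. The sketch below evaluates the negated query over two different finite domains while keeping the `climbers` instance fixed; the instance and the domain values are made up for illustration. Because the answer changes when only the domain changes, the query is not domain independent.

```python
from itertools import product

# Hypothetical instance of Climbers, layout <CI, CN, S, A>.
climbers = {(123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25)}


def non_climbers(ids, names, skills, ages):
    """All four-tuples over the given finite domains that are not in climbers."""
    return {t for t in product(ids, names, skills, ages) if t not in climbers}


names = {"Edmund", "Arnold"}
skills = {"EXP", "BEG"}
ages = {80, 25}

# Same relation instance, two different domains for the first attribute.
small = non_climbers({123, 214}, names, skills, ages)
large = non_climbers({123, 214, 999}, names, skills, ages)

print(len(small), len(large))
```

Over an infinite domain such as all integers the same query has no finite answer at all.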

Syntactic Safety
There are syntactic conditions that are used to define “safe” formulas. In particular:
  Every “safe” formula is domain independent.
  It is implementable.
  The formulas that are expressible in real query languages based on relational calculus are all “safe”.
The definition is complicated, and is not in the textbook. It can be found in Ullman’s book on databases (Principles of Database and Knowledge-Base Systems).

Translating from RA to DRC
Recall that the relational algebra consists of σ, π, ∪, ×, -.
We need to work our way through the structure of an RA expression, translating each possible form.
Let TR[e] be the translation of RA expression e into DRC.
Relation names: For the RA expression R, the DRC expression is {<x1,x2, …, xn> | <x1,x2, …, xn> ∈ R}

Selection
Suppose the RA expression is σC(e’), where e’ is another RA expression with TR[e’] = {<x1,x2, …, xn> | p}
Then the translation of σC(e’) is {<x1,x2, …, xn> | p ∧ C’}, where C’ is the condition obtained from C by replacing each attribute with the corresponding variable.
Example: TR[σ#1=#2 ∧ #4>2.5(R)] (where R has arity 4) is
  {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R ∧ x1=x2 ∧ x4>2.5}

Projection
If TR[e] = {<x1,x2, …, xn> | p} then
  TR[πi1,i2,…,im(e)] = {<xi1,xi2, …, xim> | ∃xj1,xj2, …, xjk. p},
where xj1,xj2, …, xjk are the variables in x1,x2, …, xn that are not in xi1,xi2, …, xim
Example: With R as before,
  TR[π#1,#3(R)] = {<x1,x3> | ∃x2,x4. <x1,x2,x3,x4> ∈ R}

Union
We know that R and S in R∪S must be union compatible, so they must have the same arity.
Therefore we can assume that for e1∪e2, where e1, e2 are algebra expressions, TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,yn> | q}.
Relabel the variables in the second so that TR[e2] = {<x1,…,xn> | q’}. This may involve relabeling bound variables in q to avoid clashes.
Then TR[e1∪e2] = {<x1,…,xn> | p ∨ q’}.
Example: TR[R∪S] = {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R ∨ <x1,x2,x3,x4> ∈ S}

Other binary operators
Difference: The same conditions hold as for union. So TR[e1] = {<x1,…,xn> | p} and, after relabeling, TR[e2] = {<x1,…,xn> | q}.
Then TR[e1 - e2] = {<x1,…,xn> | p ∧ ¬q}
Product: If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,ym> | q}, then
  TR[e1 × e2] = {<x1,…,xn, y1,…,ym> | p ∧ q}
Example: TR[R × S] = {<x1,…,xn, y1,…,ym> | <x1,…,xn> ∈ R ∧ <y1,…,ym> ∈ S}
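
The translation rules for relation names, selection, projection, union, difference and product can be put together in a small recursive function. The following is a minimal sketch, not the course's implementation: RA expressions are encoded as nested tuples (an encoding invented here), conditions are plain strings that refer to attributes as #i, and every variable is generated fresh so that bound variables never clash. Its output carries a few more parentheses than the slides use.

```python
import re
from itertools import count


def translate(expr):
    """Translate an RA expression (nested tuples) into a DRC query string."""
    counter = count(1)

    def fresh(n):
        # Hand out n variable names that have not been used before.
        return [f"x{next(counter)}" for _ in range(n)]

    def par(p):
        # Parenthesize anything that is not a single atomic predicate.
        return f"({p})" if any(s in p for s in "∧∨¬∃") else p

    def tr(e):
        op = e[0]
        if op == "rel":                      # ("rel", name, arity)
            xs = fresh(e[2])
            return xs, f"<{','.join(xs)}> ∈ {e[1]}"
        if op == "select":                   # ("select", condition, e1)
            xs, p = tr(e[2])
            # Replace each attribute position #i by the i-th head variable.
            c = re.sub(r"#(\d+)", lambda m: xs[int(m.group(1)) - 1], e[1])
            return xs, f"{par(p)} ∧ {par(c)}"
        if op == "project":                  # ("project", [i1, ..., im], e1)
            xs, p = tr(e[2])
            keep = [xs[i - 1] for i in e[1]]
            hidden = [x for x in xs if x not in keep]
            if hidden:
                p = f"∃{','.join(hidden)}. {par(p)}"
            return keep, p
        xs, p = tr(e[1])                     # binary operators: (op, e1, e2)
        ys, q = tr(e[2])
        if op == "product":
            return xs + ys, f"{par(p)} ∧ {par(q)}"
        if op not in ("union", "diff"):
            raise ValueError(f"unknown operator: {op}")
        if len(xs) != len(ys):
            raise ValueError("operands are not union compatible")
        # Relabel the head variables of the second operand to those of the
        # first; bound variables cannot clash because every name is fresh.
        ren = dict(zip(ys, xs))
        q = re.sub(r"\bx\d+\b", lambda m: ren.get(m.group(0), m.group(0)), q)
        if op == "union":
            return xs, f"{par(p)} ∨ {par(q)}"
        return xs, f"{par(p)} ∧ ¬({q})"

    xs, p = tr(expr)
    return "{<" + ",".join(xs) + "> | " + p + "}"


R = ("rel", "R", 4)
S = ("rel", "S", 4)
print(translate(("select", "#1=#2 ∧ #4>2.5", R)))
print(translate(("project", [1, 3], R)))
print(translate(("union", R, S)))
print(translate(("diff", R, S)))
```

Generating globally fresh names is one simple way to satisfy the "no name clashes" requirement when relabeling; a real translator would more likely track scopes explicitly.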

Summary
We’ve seen how to translate relational algebra into (domain) relational calculus.
There are various syntactic restrictions for guaranteeing the safety of a DRC query. From any of these we can translate back into relational algebra.
It was this correspondence between an (implementable and optimizable) algebra and first-order logic that was responsible for the initial development of relational databases: a prime example of some theory leading to highly successful practical developments!

What we cannot compute with relational algebra
Aggregate operations, e.g. “The number of climbers who have climbed ‘Last Tango’” or “The average age of climbers.” These are possible in SQL.
Recursive queries. Given a relation Parent(Parent, Child), compute the ancestor relation. This appears to call for an arbitrary number of joins. It is known that it cannot be expressed in first-order logic, hence it cannot be expressed in relational algebra.
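
The "arbitrary number of joins" can be seen by computing the ancestor relation in a general-purpose language, where a loop repeats the join until nothing new appears. This is a sketch over a made-up `parent` instance; the loop is exactly the part that a fixed relational algebra expression cannot supply, since the number of iterations depends on the data.

```python
# Hypothetical instance of Parent(Parent, Child).
parent = {("Ann", "Bob"), ("Bob", "Cy"), ("Cy", "Dee")}


def ancestors(parent):
    """Transitive closure of parent, computed by repeated joins."""
    anc = set(parent)
    while True:
        # One more join step: a is an ancestor of c if a is an ancestor
        # of some b and b is a parent of c.
        step = {(a, c) for (a, b) in anc for (b2, c) in parent if b == b2}
        if step <= anc:          # nothing new: a fixpoint has been reached
            return anc
        anc |= step


print(sorted(ancestors(parent)))
```

The loop always terminates on a finite relation because `anc` can only grow and is bounded by the set of all pairs of values in `parent`.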

What we cannot compute with relational algebra, cont.
Computing with complex structures that are not (1NF) relations, e.g. lists, arrays, multisets.
Of course, we can always compute such things if we can “talk to” a database from a full-blown (Turing complete) programming language, and we’ll see how to do this later. However, communicating with a database in this way may well be inefficient, and adding computational power to a query language remains an important research topic.
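
As a rough illustration of talking to a database from a host language, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout and the rows are assumptions made for this example, not the course schema. It computes the average age twice: once as an SQL aggregate, and once by fetching every row and averaging in Python, which shows where the inefficiency of the second approach comes from.

```python
import sqlite3

# In-memory database with a hypothetical Climbers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Climbers (CId INTEGER, CName TEXT, Skill TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Climbers VALUES (?, ?, ?, ?)",
    [
        (123, "Edmund", "EXP", 80),
        (214, "Arnold", "BEG", 25),
        (313, "Bridget", "EXP", 33),
    ],
)

# Aggregate evaluated inside the query language: a single row comes back.
(avg_in_sql,) = conn.execute("SELECT AVG(Age) FROM Climbers").fetchone()

# Same aggregate in the host language: every row crosses the interface first.
ages = [age for (age,) in conn.execute("SELECT Age FROM Climbers")]
avg_in_python = sum(ages) / len(ages)

conn.close()
print(avg_in_sql, avg_in_python)
```

With three rows the difference is invisible, but the second approach transfers the whole relation to the client before doing any work.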