Lecture 3 Relational Algebra and SQL CM036: Advanced Database Lecture 3 Relational Algebra and SQL
Content 1 Models, Languages and their Use in databases 2 The Relational Algebra (RA) 3 Simulating RA operations in SQL CM036: Advanced Databases Lecture 3: Relational Languages
1.1 Relational and other data models Relational models (relational algebra, relational calculus) – most of the contemporary RDBMS are based on them Tree models (hierarchical, object-relational) – both legacy systems and new systems use them Object-oriented models (ODMG) – recent development, still not widely employed Note: XML native databases have some similarities with the hierarchical database systems (legacy systems), but they have more elaborated model and query languages, which are close to OQL (the standard query language of object-oriented databases) CM036: Advanced Databases Lecture 3: Relational Languages
1.2 Relational Languages and their Use Data Manipulation Language (DML) Use: Populates, updates, and queries relational DB Example: relational algebra, SQL DML Data Definition Language (DDL) Use: Specifies the data structures and defines the relational schema Example: domain calculus, SQL DDL Data Control Language (DCL) Use: Specifies operation permissions, resource access discipline and user profiles Example: SQL DCL, LDAP Note: Contemporary relational languages often incorporate some object-relational features of the model – e.g. Oracle 8i SQL has types, Oracle 9i SQL has type inheritance CM036: Advanced Databases Lecture 3: Relational Languages
1.3 Can DB live without formal model? The answer is NO for several reasons: as we will see, SQL has ambiguities, while the relational algebra is unambiguous – so it can provide semantic interpretation for SQL; Moreover, because of the same reason, SQL cannot be executed directly, it needs to be translated into a realistic structure of operations first, which can then be interpreted; Finally, if we want to control the execution of the SQL statements, we need to know how it works. CM036: Advanced Databases Lecture 3: Relational Languages
2 The Relational Algebra Proposed by Codd in 1970 as a formal data model. Describes the relations and the operations to manipulate relations Relational operations in relational algebra transform either a single relation (unary operation), or a pair (binary operation) into another relation Can also be used to specify retrieval requests (queries). Query result is also in the form of a relation. Relational Operations: RESTRICT (s) and PROJECT () unary operations. Set operations: UNION (), INTERSECTION (), DIFFERENCE (—), CARTESIAN PRODUCT (). JOIN operations (⋈) are binary. Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS. CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
2.1 RESTRICT s RESTRICT operation (called also SELECT- denoted bys ): Selects the tuples (rows) from a relation R that satisfy a certain selection condition c on the attributes of R : s c(R) Resulting relation includes each tuple in R whose attribute values satisfy c, i.e. it has the same attributes as R Examples: s DNO=4(EMPLOYEE) s SALARY>30000(EMPLOYEE) s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE) CM036: Advanced Databases Lecture 3: Relational Languages
2.2 PROJECT P PROJECT operation (denoted by ): Keeps only certain attributes (columns) from a relation R specified in an attribute list L: L(R) Resulting relation has only those attributes of R specified in L Example: FNAME,LNAME,SALARY(EMPLOYEE) The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a true set (no duplicate elements). Example: SEX,SALARY(EMPLOYEE) If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation. CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
2.3 Combining Operations Because of closure, several operations can be combined to form a relational algebra expression. For example, the names and salaries of employees in Department 4: FNAME,LNAME,SALARY(DNO=4(EMPLOYEE)) Alternatively, we could specify explicit intermediate relations for each step: TEMP DNO=4(EMPLOYEE) R FNAME,LNAME,SALARY(TEMP) Attributes can optionally be renamed in a left-hand-side relation (this may be required for some operations that will be presented later), e.g. R (FIRSTNAME,LASTNAME,SALARY) FNAME,LNAME,SALARY(TEMP) CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
2.4 Set Operations Binary operations from set theory: UNION: R1 R2,INTERSECTION: R1 R2, DIFFERENCE: R1 — R2, For , , —, the operand relations R1(A1, ..., An) and R2(B1, ..., Bn) must have the same number of attributes, and the domains of attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility. The resulting relation for , , or — has the same attribute names as the first operand relation R1 (by convention). CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2 More Set Operations CARTESIAN PRODUCT R(A1, A2, ..., Am, B1, ..., Bn) R1(A1, A2, ..., Am) R2 (B1, ..., Bn) A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that: t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2 3 attributes + 2 attributes = 5 attributes = R1 A B C X R2 D E A B C D E R 1 2 3 a b 1 2 3 a b 2 3 4 b c 1 2 3 b c 3 4 5 2 3 4 a b 2 3 4 b c 3 tuples * 2 tuples = 6 tuples 3 4 5 a b 3 4 5 b c The resulting relation R has m1 + m2 columns If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples. CM036: Advanced Databases Lecture 3: Relational Languages
CARTESIAN PRODUCT – continued Obviously, CARTESIAN PRODUCT is useless if alone, since it generates all possible combinations. It can combine related tuples from two relations in a more informative way if followed by the appropriate RESTRICT operation Example: Retrieve a list of the names of dependents for each female employee FEMALE_EMPS s SEX=‘F’(EMPLOYEE) EMPNAMES FNAME,LNAME,SSN(FEMALE_EMPS) EMP_DEPENDENTS EMPNAMES DEPENDENT ACTUAL_DEPENDENTS s SSN=ESSN(EMP_DEPENDENTS) RESULT FNAME,LNAME,DEPENDENT_NAME(ACTUAL_DEPENDENTS) CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
2.5 JOIN Operations THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a RESTRICT. The condition c is called a join condition. R(A1, A2, ..., Am, B1, B2, ..., Bn) R1(A1, A2, ..., Am) ⋈ c R2 (B1, B2, ..., Bn) EQUIJOIN: The join condition c includes equality comparisons involving attributes from R1 and R2. That is, c is of the form: (Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n In the above EQUIJOIN operation: Ai, ..., Ah are called the join attributes of R1 Bj, ..., Bk are called the join attributes of R2 Example: Retrieve each department's name and its manager's name: T DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE RESULT DNAME,FNAME,LNAME(T) CM036: Advanced Databases Lecture 3: Relational Languages
DEPT_MGR DNAME,…,MGRSSN,…LNAME,…,SSN… (DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE) CM036: Advanced Databases Lecture 3: Relational Languages
Natural Join In an EQUIJOIN R R1 ⋈c R2, the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN, the join attributes of R2 are eliminated from R. The equality is implied and there is no need to specify it. The form of the operator is R R1 * R2 Example: Retrieve each project's details along with the details of its department: Step 1: (Rename DNUMBER to DNUM) DEPT (DNAME, DNUM, MGRSSN, MGRSTARTDATE) DNAME, DNUMBER, MGRSSN, MGRSTARTDATE (DEPARTMENT) Step 2: (Now both DEPT and PROJECT have DNUM) PROJ_DEPT PROJECT * DEPT CM036: Advanced Databases Lecture 3: Relational Languages
CM036: Advanced Databases Lecture 3: Relational Languages
Multiple Join Example: Retrieve each employee’s name and the name of the department he/she works for: T EMPLOYEE ⋈ DNO=DNUMBER DEPARTMENT RESULT FNAME,LNAME,DNAME(T) JOIN ATTRIBUTES RELATIONSHIP EMPLOYEE.SSN = DEPARTMENT.MGRSSN EMPLOYEE manages the DEPARTMENT EMPLOYEE.DNO = DEPARTMENT.DNUMBER EMPLOYEE works in the DEPARTMENT CM036: Advanced Databases Lecture 3: Relational Languages
Self Join A relation can have a set of join attributes to join it with itself : JOIN ATTRIBUTES RELATIONSHIP EMPLOYEE(1).SUPERSSN= EMPLOYEE(2) supervises EMPLOYEE(2).SSN EMPLOYEE(1) One can think of this as joining two distinct copies of the relation, although only one relation actually exists In this case, renaming can be useful Example: Retrieve each employee’s name and the name of his/her supervisor: SUPERVISOR(SSSN,SFN,SLN) SSN,FNAME,LNAME(EMPLOYEE) T EMPLOYEE ⋈SUPERSSN = SSSNSUPERVISOR RESULT FNAME,LNAME,SFN,SLN(T) CM036: Advanced Databases Lecture 3: Relational Languages
2.6 Complete Set of Operations All the operations discussed so far can be described as a sequence of only the operations RESTRICT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT. Hence, the set {s , , ,—, } is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete. For database applications, additional operations are needed that were not part of the original relational algebra. These include: 1. Aggregate functions and grouping. 2. OUTER JOIN and OUTER UNION. CM036: Advanced Databases Lecture 3: Relational Languages
2.7 More Relational Operations OUTER JOIN In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result. Some queries require all tuples in R1 (or R2 or both) to appear in the result When no matching tuples are found, nulls are placed for the missing attributes LEFT OUTER JOIN: R1 R2 lets every tuple in R1 appear in the result RIGHT OUTER JOIN: R1 R2 lets every tuple in R2 appear in the result FULL OUTER JOIN: R1 R2 lets every tuple in both R1 and R2 appear in the result CM036: Advanced Databases Lecture 3: Relational Languages
TEMP EMPLOYEE SSN=MGRSSNDEPARTMENT RESULT FNAME,MINIT,LNAME,DNAME(TEMP) CM036: Advanced Databases Lecture 3: Relational Languages
3 Simulating RA operations All relational algebra operations can be simulated in SQL using SELECT statements only. For this purpose the following variations of different parameters in the SELECT clauses can be used: The list of attributes to be selected in the SELECT clause The number of tables to be looked into in the FROM clause The conditions specified in the WHERE clause The resulting relation printed out or transferred row-by-row to some variables if INTO clause is present (e.g. using PL/SQL). There is no structural correspondence between the expressions of relational algebra and the SQL expressions; in a single SQL statement the following combinations are possible: single unary operator applied to one single relation single binary operator applied to pair of relations sequence of unary operators applied inner side out to one relation and all the intermediate results combinations of binary and unary operators applied to several relations and the intermediate results according nesting rules CM036: Advanced Databases Lecture 3: Relational Languages
3.1 Summary of RA vs. SQL CM036: Advanced Databases Lecture 3: Relational Languages