1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Advanced Relational Algebra These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see
2 © Ellis Cohen Overview of Lecture Cross Joins and Natural Joins Joins & Renaming Natural Outer Joins Un-Natural Outer Joins Views Collection Operators Assignment & Reassignment Relational Equivalence & Completeness Relational Division
3 © Ellis Cohen Cross Joins & Natural Joins
4 © Ellis Cohen Natural Join
  Classic Relational Algebra: Emps ⋈ OtherEmpData
  REAL: Emps |X| OtherEmpData
  SQL-92 (also in Oracle 9i, but not 8i): SELECT * FROM (Emps NATURAL JOIN OtherEmpData)
5 © Ellis Cohen Projected Join of 1:M Relationship
To just get each employee's name & department name, do
  SQL:  SELECT ename, dname FROM (Emps NATURAL JOIN Depts)
  REAL: (Emps |X| Depts){ ename, dname }
Emps |X| Depts
  empno  ename   addr  deptno  dname
  7499   ALLEN   ...   30      SALES
  7654   MARTIN  …     30      SALES
  7698   BLAKE   …     30      SALES
  7839   KING    …     10      ACCOUNTING
  7844   TURNER  …     30      SALES
  7986   STERN   …     50      SUPPORT
(Emps |X| Depts){ ename, dname }
  ENAME   DNAME
  ALLEN   SALES
  MARTIN  SALES
  BLAKE   SALES
  KING    ACCOUNTING
  TURNER  SALES
  STERN   SUPPORT
6 © Ellis Cohen REAL Parentheses Matter
Restriction and Projection bind more tightly than Join
  Emps |X| Depts{ ename, dname }
means
  Emps |X| ( Depts{ ename, dname } )
(and in this case, neither would work)
What we want is
  (Emps |X| Depts){ ename, dname }
7 © Ellis Cohen Cross Product Joins SELECT * FROM Cats catid missy buffy puff SELECT * FROM Dogs dogid rover spot ubu duke SELECT * FROM Cats, Dogs catid dogid missy rover missy spot missy ubu missy duke buffy rover buffy spot buffy ubu buffy duke puff rover puff spot puff ubu puff duke Cats X Dogs
8 © Ellis Cohen Restricted Joins SELECT * FROM Cats, Dogs catid dogid missy rover missy spot missy ubu missy duke buffy rover buffy spot buffy ubu buffy duke puff rover puff spot puff ubu puff duke SELECT * FROM Cats, Dogs WHERE length(catid) = length(dogid) catid dogid missy rover buffy rover puff spot puff duke Restricted joins restrict a cross product with a join condition relating the columns from the specified tables Join condition (Cats X Dogs)[ length(catid) = length(dogid)]
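The restricted join above can be sketched in Python by modeling each relation as a list of dicts. This is only an illustration: the helper names `cross` and `restrict` are made up here and are not part of REAL or SQL.

```python
from itertools import product

cats = [{"catid": c} for c in ("missy", "buffy", "puff")]
dogs = [{"dogid": d} for d in ("rover", "spot", "ubu", "duke")]

def cross(r, s):
    """Cross join: pair every tuple of r with every tuple of s.
    Assumes the attribute names of r and s do not overlap."""
    return [{**a, **b} for a, b in product(r, s)]

def restrict(r, pred):
    """Restriction: keep only the tuples that satisfy pred."""
    return [t for t in r if pred(t)]

# (Cats X Dogs)[ length(catid) = length(dogid) ]
same_len = restrict(cross(cats, dogs),
                    lambda t: len(t["catid"]) == len(t["dogid"]))
for t in same_len:
    print(t["catid"], t["dogid"])
```

The cross join produces all 12 pairs; the restriction keeps the 4 pairs whose names have equal length.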
9 © Ellis Cohen Cross Joins and Natural Joins
In REAL
  You can only use a cross join when the joined relations have attribute names which are all different
  You can use a natural join to join relations on the attributes whose names are the same
  (Note: a natural join between two relations whose attribute names are all different is a cross join)
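A small Python sketch of a natural join over lists of dicts may make the note concrete: when the two relations share no attribute names, every pair of tuples trivially agrees, so the result is a cross join. The function `natural_join` and the trimmed-down sample tuples are illustrative assumptions.

```python
def natural_join(r, s):
    """Combine the tuples of r and s that agree on every commonly named attribute."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

emps = [
    {"ename": "ALLEN", "deptno": 30},
    {"ename": "KING", "deptno": 10},
]
depts = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 30, "dname": "SALES"},
]
cats = [{"catid": "missy"}, {"catid": "puff"}]

joined = natural_join(emps, depts)    # shared attribute: deptno
crossed = natural_join(cats, depts)   # no shared attributes: behaves as a cross join
print(joined)
print(len(crossed))
```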
10 © Ellis Cohen DISTINCT Join Exercise List the names of employees who manage projects with budgets over empno ename job mgr hiredate sal comm deptno Projs pno pname pmgr persons budget pstart pend Emps
11 © Ellis Cohen Answer: DISTINCT Join Exercise List the names of employees who manage projects with budgets over empno ename job mgr hiredate sal comm deptno Emps Projs pno pname pmgr persons budget pstart pend SELECT DISTINCT ename FROM (Emps JOIN Projs ON empno = pmgr) WHERE budget > (Emps X Projs)[empno = pmgr][budget > ]{ ename ! } >
12 © Ellis Cohen Advanced Join/Grouping Exercise For each named department, list the average salary of the project managers in that department empno ename job mgr hiredate sal comm deptno Projs pno pname pmgr persons budget pstart pend Emps Depts deptno dname loc
13 © Ellis Cohen Advanced Join/Grouping Answer (Depts |X| Emps X Projs)[ empno = pmgr ] { empno, sal, dname ! }{ dname ! avgsal:avg(sal) } For each named department, list the average salary of the project managers in that department empno ename job mgr hiredate sal comm deptno Projs pno pname pmgr persons budget pstart pend Emps Depts deptno dname loc ( (Emps X Projs)[ empno = pmgr ]{empno !} |X| Emps |X| Depts ) { dname ! avgsal:avg(sal) } or avg()
14 © Ellis Cohen Joins & Renaming
15 © Ellis Cohen Using Natural Joins If you want to use a natural join, the joined attributes MUST have the same name. Use Renaming in the Relational Algebra empno ename job mgr hiredate sal comm deptno Emps Join Diagram Projs pno pname pmgr persons budget pstart pend (Emps{*, ep empno} |X| Projs{*, ep pmgr}) { pname, ename } empno ename job mgr hiredate sal comm deptno Emps Join Diagram Projs pno pname pmgr persons budget pstart pend ep (Emps X Projs)[ empno = pmgr ]{ pname, ename }
16 © Ellis Cohen Using Cross Joins If you want to use a cross join, all attributes MUST have different names. Use Renaming in the Relational Algebra Join Diagram (Emps{*, ed deptno} X Depts{*, dd deptno}) [ed = dd]{ ename, dname } Join Diagram (Emps |X| Depts){ ename, dname } empno ename job mgr hiredate sal comm deptno Depts deptno dname loc Emps empno ename job mgr hiredate sal comm deptno Depts deptno dname loc Emps ed dd
17 © Ellis Cohen Cross Joins and Qualified Names In the Relational Algebra, Cross Joined Relations MUST have different attribute names In SQL, this is not necessary, since qualified names can be used (Emps |X| Depts){ ename, dname } (Emps{*, ed deptno} X Depts{*, dd deptno}) [ed = dd]{ ename, dname } SELECT ename, dname FROM (Emps NATURAL JOIN Depts) SELECT ename, dname FROM Emps e, Depts d WHERE e.deptno = d.deptno
18 © Ellis Cohen Bulk Prefixing vs Qualified Names REAL does NOT have qualified names. But the bulk prefixing operator $$ adds a prefix to all attributes in a relation e$$Emps has attributes named e$empno, e$ename, …, e$deptno SELECT ename, dname FROM Emps e, Depts d WHERE e.deptno = d.deptno (e$$Emps X d$$Depts) [e$deptno = d$deptno]{ e$ename, d$dname }
19 © Ellis Cohen REAL Self Join Using Cross Join
To use a cross join in REAL, the attribute names in the joined tables cannot overlap.
  SQL:  SELECT e.ename, m.ename as mname FROM (Emps e JOIN Emps m ON e.mgr = m.empno)
  REAL: (Emps{ ename, mgr } X Emps{ empno, mname:ename })[mgr = empno]{ ename, mname }
  REAL: (e$$Emps X m$$Emps)[e$mgr = m$empno]{ e$ename, m$ename }
[Join Diagram: Emps e (empno ename job mgr hiredate sal comm deptno) joined on mgr to empno of Emps m, whose ename is shown as mname]
20 © Ellis Cohen REAL Self Join Using Natural Join To use a Natural Join in REAL, the ONLY attribute names that overlap are the ones to be joined empno ename job mgr hiredate sal comm deptno Emps e Join Diagram empno ename job mgr hiredate sal comm deptno Emps m (Emps{ ename, em:mgr } |X| Emps{ em:empno, mname:ename }) { ename, mname } mname SELECT e.ename, m.ename as mname FROM (Emps e JOIN Emps m ON e.mgr = m.empno) em
21 © Ellis Cohen Natural Outer Joins
22 © Ellis Cohen Natural Outer Joins
  :X|   NATURAL LEFT JOIN
  |X:   NATURAL RIGHT JOIN
  :X:   NATURAL FULL JOIN
23 © Ellis Cohen Natural Left Join
Show names and department names of the employees in each department. Include departments with no employees.
  SQL:  SELECT dname, ename FROM Depts NATURAL LEFT JOIN Emps
  REAL: (Depts :X| Emps){ dname, ename }
Depts( deptno, dname, … ): 10 ACCOUNTING …, 20 RESEARCH …, 30 SALES …, 40 OPERATIONS …
Emps( empno, ename, …, deptno ): 7782 CLARK …, ALLEN …, JOJO …, 7698 BLAKE … 30
Result
  DNAME       ENAME
  ACCOUNTING  CLARK
  RESEARCH    (null)
  SALES       ALLEN
  SALES       BLAKE
  OPERATIONS  (null)
24 © Ellis Cohen Natural Right Joins
Show the names and department names of all employees. Include employees who are unassigned.
  SQL:  SELECT ename, dname FROM Depts NATURAL RIGHT JOIN Emps
  REAL: (Depts |X: Emps){ ename, dname }
Depts( deptno, dname, … ): 10 ACCOUNTING …, 20 RESEARCH …, 30 SALES …, 40 OPERATIONS …
Emps( empno, ename, …, deptno ): 7782 CLARK …, ALLEN …, JOJO …, 7698 BLAKE … 30
Result
  ENAME  DNAME
  CLARK  ACCOUNTING
  ALLEN  SALES
  JOJO   (null)
  BLAKE  SALES
25 © Ellis Cohen Natural Full Join
  SQL:  SELECT dname, ename FROM Depts NATURAL FULL JOIN Emps
  REAL: (Depts :X: Emps){ dname, ename }
Depts( deptno, dname, … ): 10 ACCOUNTING …, 20 RESEARCH …, 30 SALES …, 40 OPERATIONS …
Emps( empno, ename, …, deptno ): 7782 CLARK …, ALLEN …, JOJO …, 7698 BLAKE … 30
Result
  DNAME       ENAME
  (null)      JOJO
  ACCOUNTING  CLARK
  RESEARCH    (null)
  SALES       ALLEN
  SALES       BLAKE
  OPERATIONS  (null)
26 © Ellis Cohen Un-Natural Outer Joins
27 © Ellis Cohen Un-Natural Outer Joins
For each project, list its name, and the name of its manager (show NULL if there is no manager)
  SQL:  SELECT pname, ename FROM (Projs LEFT JOIN Emps ON pmgr = empno)
  REAL: ( Projs :X| Emps{ pmgr:empno, ename } ){ pname, ename }
REAL only has natural outer joins. Renaming can be used so join attributes have the same name
[Join Diagram: Projs (pno pname pmgr persons budget pstart pend) joined on pmgr to Emps (empno ename job mgr hiredate sal comm deptno), with empno renamed to pmgr]
28 © Ellis Cohen Problem: Joining Cats & Dogs
Given Cats( catid ) and Dogs( dogid ), for each cat list the dogs whose names are the same length, but make sure each cat is listed (with a NULL dog if necessary)
  SQL: SELECT catid, dogid FROM (Cats LEFT JOIN Dogs ON length(catid) = length(dogid))
REAL only supports NATURAL OUTER JOINS. How can this be done in REAL?
29 © Ellis Cohen Outer Rejoin Idiom SELECT catid, dogid FROM (Cats LEFT JOIN Dogs ON length(catid) = length(dogid)) This requires an un-natural outer join, but in REAL, all outer joins are natural! There's a standard REAL outer rejoin idiom: 1.Perform a cross join 2.Restrict it with the join condition 3.Then, perform the natural outer join Cats :X| (Cats X Dogs) [length(catid) = length(dogid)]
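The three steps of the outer rejoin idiom can be traced in a short Python sketch. The helper `natural_left_join` is a hypothetical stand-in for REAL's `:X|`, and the cat `mittens` is an added, made-up tuple so that at least one cat has no dog of matching length.

```python
def natural_left_join(r, s, s_attrs):
    """Natural left outer join over lists of dicts.
    s_attrs names the attributes of s; unmatched r-tuples get None for
    the attributes that appear only in s."""
    out = []
    for a in r:
        matches = [b for b in s
                   if all(a[k] == b[k] for k in a.keys() & b.keys())]
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            padding = {k: None for k in s_attrs if k not in a}
            out.append({**a, **padding})
    return out

# "mittens" is hypothetical sample data, not from the slides.
cats = [{"catid": c} for c in ("missy", "buffy", "puff", "mittens")]
dogs = [{"dogid": d} for d in ("rover", "spot", "ubu", "duke")]

# 1. Perform a cross join
crossed = [{**c, **d} for c in cats for d in dogs]
# 2. Restrict it with the join condition
matched = [t for t in crossed if len(t["catid"]) == len(t["dogid"])]
# 3. Then perform the natural outer join (the shared attribute is catid)
result = natural_left_join(cats, matched, s_attrs=("catid", "dogid"))

for t in result:
    print(t["catid"], t["dogid"])
```

Because the restricted cross join still carries `catid`, the final natural left join matches on that attribute alone and pads the unmatched cat with a null dog.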
30 © Ellis Cohen Outer Joins with Renaming
For each project, list its name, and the name of its manager (show NULL if there is no manager)
  SQL:  SELECT pname, ename FROM (Projs LEFT JOIN Emps ON pmgr = empno)
  REAL: ( Projs :X| Emps{ pmgr:empno, ename } ){ pname, ename }
How can this be written in REAL without renaming?
31 © Ellis Cohen Using the Rejoin Idiom
For each project, list its name, and the name of its manager (show NULL if there is no manager)
  SQL:  SELECT pname, ename FROM (Projs LEFT JOIN Emps ON pmgr = empno)
  REAL: (Projs :X| (Projs X Emps)[pmgr = empno]){ pname, ename }
What's the REAL to list the name of every employee, and (if they have a manager), their manager's name?
32 © Ellis Cohen REAL Alternatives
List the name of every employee, and, (if they have a manager), the name of their manager
  (Emps{ ename, em:mgr } :X| Emps{ em:empno, mname:ename }){ ename, mname }
  (e$$Emps :X| (e$$Emps X m$$Emps)[e$mgr = m$empno]){ ename:e$ename, mname:m$ename }
[Join Diagram: Emps e (empno ename job mgr hiredate sal comm deptno) joined on mgr to empno of Emps m via em; m.ename shown as mname]
33 © Ellis Cohen Views
34 © Ellis Cohen Creating Tables vs. Views
Table
  SQL:  CREATE TABLE HiEmps AS SELECT * FROM Emps WHERE sal > 1500
  REAL: HiEmps ← Emps[sal > 1500]
  Creates a new table. Subsequent changes to Emps will not be reflected in HiEmps.
View
  SQL:  CREATE VIEW HiEmps AS SELECT * FROM Emps WHERE sal > 1500
  REAL: HiEmps: Emps[sal > 1500]
  Just creates a view based on Emps. Every use of HiEmps actually refers to its underlying base tables (in this case, Emps).
35 © Ellis Cohen Database Engines Expand based on Relational Algebra Suppose we define CREATE VIEW HiEmps AS SELECT * FROM Emps WHERE sal > 1500 HiEmps: Emps[ sal > 1500] and then execute the query SELECT ename, job FROM HiEmps HiEmps{ ename, job } The database engine automatically turns this into SELECT ename, job FROM Emps WHERE sal > 1500 Emps[ sal > 1500 ]{ ename, job }
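The difference between a view and a table created from the same query can be checked with Python's built-in `sqlite3` module. This is a sketch on a toy three-column Emps table; the names `HiEmpsView` and `HiEmpsTable` and the sample salaries are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Emps (ename TEXT, job TEXT, sal INTEGER);
    INSERT INTO Emps VALUES ('ALLEN', 'SALESMAN', 1600), ('SMITH', 'CLERK', 800);
    CREATE VIEW  HiEmpsView  AS SELECT * FROM Emps WHERE sal > 1500;
    CREATE TABLE HiEmpsTable AS SELECT * FROM Emps WHERE sal > 1500;
    -- Change Emps after both objects have been created.
    INSERT INTO Emps VALUES ('KING', 'PRESIDENT', 5000);
""")

view_rows = con.execute(
    "SELECT ename FROM HiEmpsView ORDER BY ename").fetchall()
table_rows = con.execute(
    "SELECT ename FROM HiEmpsTable ORDER BY ename").fetchall()
print(view_rows)   # the view is evaluated against the current contents of Emps
print(table_rows)  # the table holds a snapshot taken when it was created
```

The view sees the later insert because every query against it is expanded into a query on Emps; the table does not.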
36 © Ellis Cohen SQL & REAL Factoring WITH DeptMgrs AS (SELECT * FROM Emps WHERE job = 'DEPTMGR') SELECT pname, ename FROM (Projs p JOIN DeptMgrs m ON p.pmgr = m.empno) (DeptMgrs: Emps[job = 'DEPTMGR']) ( (Projs X DeptMgrs) [empno = pmgr]{ pname, ename } )
37 © Ellis Cohen Collection Operators
38 © Ellis Cohen Unions event evdate ALLEN20-FEB-81 MARTIN28-SEP-81 BLAKE01-MAY-81 KING17-NOV-81 TURNER08-SEP-81 STERN23-NOV-99 Running Amuck12-FEB-82 Cooling Off01-JAN-05 Lifting Off01-JAN-05 Emps{ event:ename, evdate:hiredate } U Projs{ event:pname, evdate:pstart } SELECT ename AS event, hiredate AS evdate FROM Emps UNION SELECT pname AS event, pstart AS evdate FROM Projs
39 © Ellis Cohen SQL & REAL Union
Counted Union counts # of tuples
  SQL:  SELECT * FROM A1 UNION ALL SELECT * FROM A2
  REAL: A1 #U A2
Set Union eliminates duplicates
  SQL:  SELECT * FROM A1 UNION SELECT * FROM A2
  REAL: A1 U A2
  REAL: (A1 #U A2){ * ! }   Use if there are duplicates to be gotten rid of
Unlike SQL, in REAL, the names & types must conform
40 © Ellis Cohen Counted Union vs. Set Union
Operands
  Emps[sal ≥ 3000]{ job }:             ANALYST, PRESIDENT, ANALYST
  Emps[hiredate > '1-jan-82']{ job }:  ANALYST, CLERK
Counted Union counts # of tuples & includes all of them
  Emps[sal ≥ 3000]{ job } #U Emps[hiredate > '1-jan-82']{ job }
  Result: ANALYST, PRESIDENT, ANALYST, ANALYST, CLERK
Set Union eliminates duplicates
  Emps[sal ≥ 3000]{ job } U Emps[hiredate > '1-jan-82']{ job }
  Result: ANALYST, CLERK, PRESIDENT
41 © Ellis Cohen SQL & REAL Difference and Intersect
Counted Difference
  SQL:  SELECT * FROM A1 EXCEPT ALL SELECT * FROM A2
  REAL: A1 #– A2
Set Difference
  SQL:  SELECT * FROM A1 EXCEPT SELECT * FROM A2
  REAL: A1 – A2
  REAL: A1{ * ! } #– A2{ * ! }
Counted Intersect
  SQL:  SELECT * FROM A1 INTERSECT ALL SELECT * FROM A2
  REAL: A1 #∩ A2
Set Intersect
  SQL:  SELECT * FROM A1 INTERSECT SELECT * FROM A2
  REAL: A1 ∩ A2
  REAL: A1{ * ! } #∩ A2{ * ! }
  REAL: (A1 #∩ A2){ * ! }
42 © Ellis Cohen Counted Difference vs. Set Difference
Operands
  Emps[sal ≥ 3000]{ job }:             ANALYST, PRESIDENT, ANALYST
  Emps[hiredate > '1-jan-82']{ job }:  ANALYST, CLERK
Counted Difference counts # of tuples
  Emps[sal ≥ 3000]{ job } #– Emps[hiredate > '1-jan-82']{ job }
  Result: ANALYST, PRESIDENT
Set Difference eliminates duplicates before taking the difference
  Emps[sal ≥ 3000]{ job } – Emps[hiredate > '1-jan-82']{ job }
  Result: PRESIDENT
43 © Ellis Cohen Counted Intersect vs. Set Intersect
Operands
  Emps[deptno = 20]{ job }:            CLERK, MANAGER, ANALYST, CLERK, ANALYST
  Emps[hiredate > '1-jan-82']{ job }:  ANALYST, CLERK
Counted Intersect counts # of tuples & computes minimum #
  Emps[deptno = 20]{ job } #∩ Emps[hiredate > '1-jan-82']{ job }
  Result: ANALYST, CLERK
Set Intersect eliminates duplicates
  Emps[deptno = 20]{ job } ∩ Emps[hiredate > '1-jan-82']{ job }
  Result: ANALYST, CLERK
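The counted and set-oriented collection operators behave much like Python's `collections.Counter` (a multiset) and `set`. The sketch below uses the two job lists from the union and difference slides; the mapping to `Counter` is an analogy offered here, not something REAL defines.

```python
from collections import Counter

hi_paid = Counter(["ANALYST", "PRESIDENT", "ANALYST"])  # Emps[sal >= 3000]{ job }
recent = Counter(["ANALYST", "CLERK"])                  # Emps[hiredate > '1-jan-82']{ job }

counted_union = hi_paid + recent   # multiplicities are added
counted_diff = hi_paid - recent    # multiplicities are subtracted, never below zero
counted_inter = hi_paid & recent   # minimum multiplicity of each tuple

set_union = set(hi_paid) | set(recent)   # duplicates eliminated
set_diff = set(hi_paid) - set(recent)    # duplicates eliminated before the difference
set_inter = set(hi_paid) & set(recent)

print(sorted(counted_union.elements()))
print(sorted(counted_diff.elements()))
print(sorted(counted_inter.elements()))
print(sorted(set_union), sorted(set_diff), sorted(set_inter))
```

Applying `set(...)` to a counted result plays the role of the duplicate-eliminating projection `{ * ! }`.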
44 © Ellis Cohen Representing Outer Joins using Collection Operators
  (Depts :X| Emps){ dname, ename }
corresponds to
  (EmptyDepts: Depts{deptno} – Emps{deptno})
  ( (Emps |X| Depts){ dname, ename } U (EmptyDepts |X| Depts){ dname, ename } )
45 © Ellis Cohen Assignment & Reassignment
46 © Ellis Cohen REAL Query & Assignment
Two kinds of REAL statements:
  1. Expression (query), e.g. Emps[ sal < 1000 ]
  2. Assignment, e.g. PoorEmps ← Emps[ sal < 1000 ]
     corresponds to CREATE TABLE PoorEmps AS SELECT * FROM Emps WHERE sal < 1000
47 © Ellis Cohen REAL Reassignment
In REAL, you can also reassign to an existing relation
  Emps ← Emps[ sal IS NOT < 1000 ]
corresponds to
  DELETE Emps WHERE sal < 1000
48 © Ellis Cohen Insert as Reassignment
  REAL: Emps ← Emps #U OtherEmps[ sal < 1000 ]
  SQL:  INSERT INTO Emps SELECT * FROM OtherEmps WHERE sal < 1000
49 © Ellis Cohen Update as Reassignment
  REAL: Emps ← Emps[ job IS NOT = 'DEPTMGR' ] #U Emps[ job = 'DEPTMGR' ]{ *, sal:(sal + 200) }
  SQL:  UPDATE Emps SET sal = sal + 200 WHERE job = 'DEPTMGR'
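Update-as-reassignment can be sketched in Python by splitting the relation into the tuples left alone and the tuples rewritten, then reassigning their counted union. The sample tuples (including the employee with no job) are hypothetical, and reading `IS NOT =` as "the comparison is not true, so NULLs are kept" is an interpretation inferred from the slides.

```python
emps = [
    {"ename": "BLAKE", "job": "DEPTMGR", "sal": 2850},
    {"ename": "ALLEN", "job": "SALESMAN", "sal": 1600},
    {"ename": "JOJO", "job": None, "sal": 900},   # NULL job, hypothetical
]

# Emps[ job IS NOT = 'DEPTMGR' ]: tuples for which the comparison is not true.
# In Python, None != "DEPTMGR" is True, so the NULL-job tuple is kept.
unchanged = [t for t in emps if t["job"] != "DEPTMGR"]

# Emps[ job = 'DEPTMGR' ]{ *, sal:(sal + 200) }
raised = [{**t, "sal": t["sal"] + 200} for t in emps if t["job"] == "DEPTMGR"]

# Counted union of the two parts, reassigned to the relation's name.
emps = unchanged + raised

for t in sorted(emps, key=lambda t: t["ename"]):
    print(t["ename"], t["sal"])
```

Every original tuple lands in exactly one of the two parts, so the reassigned relation has the same number of tuples as before.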
50 © Ellis Cohen Relational Equivalence & Completeness
51 © Ellis Cohen Primitive SQL
Suppose we define Primitive SQL. It is exactly like standard SQL queries (SELECTs only), except we remove
  numeric/string operations (+, –, ||, etc.)
  ordinary or aggregate functions
  duplicates (no base relations have duplicates, and all SELECTs automatically remove duplicates).
52 © Ellis Cohen Primitive Relational Algebra
Imagine a Relational Algebra which ONLY has
  comparison operators (<, =, etc.)
  logical operators (AND, OR, NOT)
  restriction and simple projection
  cross joins and natural inner joins
  set-oriented collection operators
It does NOT have
  numeric/string operations (+, –, ||, etc.)
  ordinary or aggregate functions
  duplicates (no base relation has duplicates, and every algebra operation automatically removes duplicate tuples)
  grouping
  outer joins
This is the Primitive Relational Algebra
53 © Ellis Cohen Primitive Tuple Relational Calculus
The Primitive Tuple Relational Calculus ONLY has
  tuple variables with qualified attributes (e.g. e.sal)
  comparison operators (<, =, etc.)
  tuple membership tests (e.g. e ∈ Emps)
  logical operators (AND, OR, NOT)
  quantification expressions
    SOME e1, e2, … SATISFIES …
    EACH e1, e2, … SATISFIES …
    EACH e1, e2, … WHERE … SATISFIES …
  simple SELECT (SELECT … WHERE …) at the outermost level only
It does NOT have
  numeric/string operations (+, –, ||, etc.)
  ordinary or aggregate functions
  duplicates (no base relations have duplicates, and SELECT does not produce duplicates)
  any mechanism for grouping
  collection operations (UNION, INTERSECT, EXCEPT)
  join operators
54 © Ellis Cohen Primitive Domain Relational Calculus
The Primitive Domain Relational Calculus ONLY has
  domain variables
  comparison operators (<, =, etc.)
  attribute matching tests (e.g. Emps( job: 'CLERK' ))
  logical operators (AND, OR, NOT)
  quantification expressions
    SOME e1, e2, … SATISFIES …
    EACH e1, e2, … SATISFIES …
    EACH e1, e2, … WHERE … SATISFIES …
  simple SELECT (SELECT … WHERE …) at the outermost level only
It does NOT have
  numeric/string operations (+, –, ||, etc.)
  ordinary or aggregate functions
  duplicates (no base relations have duplicates, and SELECT does not produce duplicates)
  any mechanism for grouping
  collection operations (UNION, INTERSECT, EXCEPT)
  join operators
55 © Ellis Cohen Relational Equivalence Exactly the same set of queries can be expressed using Primitive SQL The Primitive Relational Algebra The Safe Primitive Tuple Relational Calculus The Safe Primitive Domain Relational Calculus
56 © Ellis Cohen Relational Completeness
Any query language which can express a query equivalent to any query expressible in the Primitive Relational Algebra is relationally complete.
So
  Primitive SQL
  The Safe Primitive Tuple Relational Calculus
  The Safe Primitive Domain Relational Calculus
are all relationally complete
57 © Ellis Cohen Relational Division
58 © Ellis Cohen Cool Projects Given Emps( empno, ename, job, … ) Projects( pno, pname, … ) Assigns( empno, pno ) empno references Emps pno references Projects CoolProjs( pno ) pno references Projects Problem Use SQL or REAL to List the cool employees -- those who are assigned to at least one of the cool projects
59 © Ellis Cohen Cool Employees SELECT DISTINCT empno FROM Assigns NATURAL JOIN CoolProjs (Assigns |X| CoolProjs){ empno ! } That's easy! How about the ultra-cool employees: those assigned to every one of the cool projects This is called a Relational Division problem
60 © Ellis Cohen Ultra Cool in the Primitive Domain Relational Calculus
  SELECT eno
  WHERE (EACH cpno WHERE ?CoolProjs( pno: cpno )
         SATISFIES ?Assigns( empno: eno, pno: cpno ))
61 © Ellis Cohen Ultra Cool in the Primitive Tuple Relational Calculus
  SELECT e.empno
  WHERE (e ∈ Emps) AND
    (EACH c WHERE (c ∈ CoolProjs) SATISFIES
      (SOME a SATISFIES (a ∈ Assigns) AND (a.empno = e.empno) AND (a.pno = c.pno)))
62 © Ellis Cohen Ultra Cool in Extended SQL SELECT empno FROM Emps e WHERE (EACH CoolProjs c SATISFIES SOME Assigns a SATISFIES a.empno = e.empno AND a.pno = c.pno)
63 © Ellis Cohen Ultra Cool in Extended SQL with Attribute Filtering SELECT empno FROM Emps e WHERE (EACH CoolProjs c SATISFIES (SOME Assigns( empno: e.empno, pno: c.pno )))
64 © Ellis Cohen Ultra Cool with Extended SQL Mapped to Standard SQL SELECT empno FROM Emps e WHERE NOT exists( SELECT * FROM CoolProjs c WHERE NOT exists( SELECT * FROM Assigns a WHERE a.empno = e.empno AND a.pno = c.pno) ) SELECT empno FROM Emps e WHERE (SELECT count(*) FROM CoolProjs) = (SELECT count(*) FROM CoolProjs c WHERE exists( SELECT * FROM Assigns a WHERE a.empno = e.empno AND a.pno = c.pno) )
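Both standard-SQL formulations can be run side by side with Python's `sqlite3` module. The sketch uses a stripped-down schema and made-up sample data (employee 1 is on both cool projects, employee 2 on only one, employee 3 on none); the variable names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Emps (empno INTEGER PRIMARY KEY);
    CREATE TABLE CoolProjs (pno INTEGER PRIMARY KEY);
    CREATE TABLE Assigns (empno INTEGER, pno INTEGER);
    INSERT INTO Emps VALUES (1), (2), (3);
    INSERT INTO CoolProjs VALUES (10), (20);
    INSERT INTO Assigns VALUES (1, 10), (1, 20), (2, 10), (2, 30);
""")

# "There is no cool project that the employee is not assigned to."
double_not_exists = """
    SELECT empno FROM Emps e
    WHERE NOT EXISTS (
        SELECT * FROM CoolProjs c
        WHERE NOT EXISTS (
            SELECT * FROM Assigns a
            WHERE a.empno = e.empno AND a.pno = c.pno))
"""

# "The number of cool projects equals the number the employee is assigned to."
count_match = """
    SELECT empno FROM Emps e
    WHERE (SELECT count(*) FROM CoolProjs) =
          (SELECT count(*) FROM CoolProjs c
           WHERE EXISTS (
               SELECT * FROM Assigns a
               WHERE a.empno = e.empno AND a.pno = c.pno))
"""

by_not_exists = [row[0] for row in con.execute(double_not_exists)]
by_count = [row[0] for row in con.execute(count_match)]
print(by_not_exists, by_count)
```

One point worth noting: if CoolProjs is empty, both queries return every employee, since "every one of zero projects" is vacuously satisfied.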
65 © Ellis Cohen Ultra Cool without Aggregation (based on the Primitive Relational Algebra)
REAL (without using Aggregation)
  ( AsnEmps: Assigns{ empno ! },
    CoolAssigns: CoolProjs X AsnEmps,
    MissingAssigns: CoolAssigns – Assigns,
    UncoolEmps: MissingAssigns{ empno ! } )
  AsnEmps – UncoolEmps
CORRESPONDING SQL
  WITH AsnEmps AS (SELECT DISTINCT empno FROM Assigns),
       MissingAssigns AS (SELECT empno, pno FROM CoolProjs, AsnEmps
                          EXCEPT SELECT empno, pno FROM Assigns)
  SELECT empno FROM AsnEmps
  EXCEPT SELECT empno FROM MissingAssigns
66 © Ellis Cohen Best Ultra Cool SQL
BEST SQL SOLUTION
  SELECT empno FROM (Assigns NATURAL JOIN CoolProjs)
  GROUP BY empno
  HAVING count(*) = (SELECT count(*) FROM CoolProjs)
CORRESPONDING REAL
  (EmpNumAssigns: (Assigns |X| CoolProjs){ empno ! epk:count(*) },
   ProjKnt: CoolProjs{ ! pk:count(*) })
  (EmpNumAssigns X ProjKnt)[pk = epk]{ empno }
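The aggregation-free steps map directly onto Python set operations. In this sketch Assigns is a set of (empno, pno) pairs and CoolProjs a set of pno values; the sample data is made up for illustration.

```python
assigns = {(1, 10), (1, 20), (2, 10), (2, 30)}   # (empno, pno), hypothetical data
cool_projs = {10, 20}

# AsnEmps: Assigns{ empno ! }
asn_emps = {empno for empno, _ in assigns}
# CoolAssigns: CoolProjs X AsnEmps (every assignment an ultra-cool employee would need)
cool_assigns = {(empno, pno) for empno in asn_emps for pno in cool_projs}
# MissingAssigns: CoolAssigns - Assigns (the needed assignments that do not exist)
missing_assigns = cool_assigns - assigns
# UncoolEmps: MissingAssigns{ empno ! }
uncool_emps = {empno for empno, _ in missing_assigns}
# AsnEmps - UncoolEmps
ultra_cool = asn_emps - uncool_emps

print(sorted(missing_assigns))
print(sorted(ultra_cool))
```

Employee 2 lacks the assignment to project 20, so only employee 1 survives the final difference.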
67 © Ellis Cohen Best REAL Answer The ultra-cool employees: those assigned to every one of the cool projects Use the REAL Divide operator Assigns ÷ CoolProjs
68 © Ellis Cohen Why Division?
Because it's the inverse of JOIN (which is like multiplication)
  Assigns ⊇ (Assigns ÷ CoolProjs) X CoolProjs
In general,
  RelA = (RelA X RelB) ÷ RelB
  RelC ⊇ (RelC ÷ RelB) X RelB
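The two division properties can be checked on small sets in Python. The functions `divide` and `cross` below are illustrative helpers for binary relations stored as sets of pairs, not REAL's actual operator, and the data is made up.

```python
def divide(rel, divisor):
    """rel is a set of (x, y) pairs and divisor a set of y values.
    Returns the x values that are paired with every y in divisor."""
    xs = {x for x, _ in rel}
    return {x for x in xs if all((x, y) in rel for y in divisor)}

def cross(xs, ys):
    """Cross join of two single-attribute relations."""
    return {(x, y) for x in xs for y in ys}

assigns = {(1, 10), (1, 20), (2, 10), (2, 30)}   # (empno, pno), hypothetical data
cool_projs = {10, 20}

ultra_cool = divide(assigns, cool_projs)    # Assigns ÷ CoolProjs
rejoined = cross(ultra_cool, cool_projs)    # (Assigns ÷ CoolProjs) X CoolProjs

rel_a = {1, 2, 3}
rel_b = {10, 20}
round_trip = divide(cross(rel_a, rel_b), rel_b)   # (RelA X RelB) ÷ RelB

print(ultra_cool, rejoined <= assigns, round_trip == rel_a)
```

Multiplying the quotient back only recovers a subset of Assigns, because the assignments of employee 2 are lost in the division. In this sketch the round-trip equality also relies on RelB being non-empty.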