1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Basic Relational Algebra These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see
2 © Ellis Cohen Overview of Lecture Relational Algebra using REAL Operator Composition Extended Projection Comparisons Case Expressions Duplicate Elimination Aggregate Functions Distinct Aggregation
3 © Ellis Cohen Relational Algebra using REAL
4 © Ellis Cohen Algebra Domain (e.g. numbers) Operators (e.g. for numbers) Unary Operators (e.g. Unary Minus) - 3 -3 - (-7) 7 Binary Operators (e.g. Sum, Product) 10 3 * -5 -15 An algebra is closed Apply the operators to values in the domain, and the result is ALWAYS another value from the domain
5 © Ellis Cohen Relations A relation is just a collection of tuples. A relation corresponds to both a SQL table and a SQL result set (i.e. the result of executing a SQL query)
6 © Ellis Cohen Relational Algebra Domain: Relations Unary Operators: Restrict (~ like the WHERE clause) Project (~ like the SELECT clause) Binary Operators: Joins (Cross, Inner & Outer Natural) Collection Operators (Union, …) Division (inverse of cross Join) The Relational Algebra DOES NOT have subqueries They're not needed!
7 © Ellis Cohen REAL Relations as Unordered Bags The language we use for the relational algebra is called REAL (Relation Expression and Assignment Language). 1. Relations in REAL are unordered: there is no way to express ORDER BY in the Relational Algebra. 2. Most Relational Algebras (including the Classic Relational Algebra) do not allow duplicate tuples in a relation (or support aggregation or grouping). REAL does allow duplicates (formally, relations are bags, not sets).
8 © Ellis Cohen Unary Relation Operators In the relational algebra, a unary relation operator is applied to a relation and produces a relation as its result. In Mathematical Terms, Unary Operator: Relation → Relation. That is, if O is a relation operator and R is a relation, then O (R) is also a relation
9 © Ellis Cohen Unary Relational Operators Two important relational operators: Restrict: Chooses subset of rows Project: Chooses subset of columns
10 © Ellis Cohen Restrict Operator
REAL restrict operator: Emps[ sal > 1550 ]
Standard forms of restrict operator: Restrict [sal > 1550] ( Emps ), σ sal > 1550 ( Emps )
SQL equivalent: SELECT * FROM Emps WHERE sal > 1550
Restriction is a unary relation operator: apply it to a relation, and the result is another relation. Restriction checks one tuple at a time and keeps the tuples which satisfy the restriction condition
11 © Ellis Cohen The Restriction Machine Emps[ sal > 1550 ]
[Figure: the Emps relation (empno, ename, deptno, sal, comm; employees ALLEN, MARTIN, BLAKE, KING, TURNER, STERN) is run through the restriction machine [ sal > 1550 ], producing a relation with the same attributes that keeps only ALLEN, BLAKE and KING.]
Start with a relation. Run it through the restriction machine. Get a new relation as a result
12 © Ellis Cohen Project Operator
REAL project operator: Emps{ empno, ename }
Standard forms of project operator: Project [empno,ename] ( Emps ), π empno,ename ( Emps )
SQL equivalent: SELECT empno, ename FROM Emps
Projection is also a unary relation operator: it looks at one tuple at a time and keeps only the listed attributes of each tuple
13 © Ellis Cohen The Projection Machine Emps{ empno, ename }
[Figure: the Emps relation (empno, ename, deptno, sal, comm; employees ALLEN, MARTIN, BLAKE, KING, TURNER, STERN) is run through the projection machine { empno, ename }, producing a relation with only the empno and ename attributes for all six employees.]
Start with a relation. Run it through the projection machine. Get a new relation as a result
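The two machines above can be sketched in a few lines of Python. This is only an illustration, not an implementation of REAL: the function names `restrict` and `project` are hypothetical, relations are modeled as lists of dicts, and the deptno and sal values are invented sample data.

```python
# Hypothetical sample rows; deptno and sal values are invented for illustration.
emps = [
    {"empno": 7499, "ename": "ALLEN", "deptno": 30, "sal": 1600},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30, "sal": 1250},
    {"empno": 7839, "ename": "KING", "deptno": 10, "sal": 5000},
]


def restrict(relation, condition):
    """Keep the tuples for which condition(tuple) is true (like WHERE)."""
    return [row for row in relation if condition(row)]


def project(relation, attrs):
    """Keep only the named attributes of every tuple (like the SELECT list)."""
    return [{a: row[a] for a in attrs} for row in relation]


high_paid = restrict(emps, lambda r: r["sal"] > 1550)  # Emps[ sal > 1550 ]
names = project(emps, ["empno", "ename"])              # Emps{ empno, ename }
```

Both functions take a relation and return a relation, which is the closure property the slides rely on.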
14 © Ellis Cohen Why Learn REAL? Semantic Clarity - REAL is simpler than SQL and helps explain how SQL works - Clear semantics allows us to reason about systems and is the basis for optimizing queries. Succinctness - Can be a good shorthand for describing complicated queries and assertions. It can sometimes be easier to write complicated queries and assertions in REAL first, and then translate them to SQL
15 © Ellis Cohen Operator Composition
16 © Ellis Cohen Closure and Composition An algebra is closed: apply the operators to values in the domain, and the result is ALWAYS another value from the domain. So, operators can be composed: ((5 + 7) * 3) + 8 → (12 * 3) + 8 → 44
17 © Ellis Cohen Operator Composition
REAL composition: Emps[ sal > 1550 ]{ empno, ename }
Standard forms of composition: Project [empno,ename] ( Restrict [sal > 1550] ( Emps ) ), π empno,ename ( σ sal > 1550 ( Emps ) )
SQL equivalent: SELECT empno, ename FROM Emps WHERE sal > 1550
18 © Ellis Cohen REAL Composition Processed L-to-R Emps[ sal > 1550 ]{ empno, ename }
Emps
  -- the base relation containing employees
Emps[ sal > 1550 ]
  -- Emps restricted to the employees with sal > 1550
Emps[ sal > 1550 ]{ empno, ename }
  -- First we restrict Emps (to the employees with sal > 1550)
  -- Then, from that, we extract only the empno and ename attributes
19 © Ellis Cohen Composition Yields Relations Emps[ sal > 1550 ]{ empno, ename }
[Figure: step 1, [ sal > 1550 ], turns Emps (empno, ename, deptno, sal, comm; employees ALLEN, MARTIN, BLAKE, KING, TURNER, STERN) into a relation with the same attributes containing ALLEN, BLAKE and KING; step 2, { empno, ename }, turns that into a relation with only empno and ename for ALLEN, BLAKE and KING.]
Can you do the project and the restrict in the opposite sequence?
20 © Ellis Cohen Sequence Matters Emps{ empno, ename }[ sal > 1550 ]
[Figure: step 1, { empno, ename }, turns Emps into a relation with only empno and ename for all six employees; step 2, [ sal > 1550 ], is then applied to that result.]
Fails! No sal attribute to restrict
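The effect of ordering can be reproduced with a small Python sketch. It is a stand-in for REAL under stated assumptions: `restrict` and `project` are hypothetical helper names, relations are lists of dicts, and the salary values are invented.

```python
# Hypothetical helpers and invented salaries, for illustration only.
def restrict(relation, condition):
    return [row for row in relation if condition(row)]


def project(relation, attrs):
    return [{a: row[a] for a in attrs} for row in relation]


emps = [
    {"empno": 7499, "ename": "ALLEN", "sal": 1600},
    {"empno": 7654, "ename": "MARTIN", "sal": 1250},
    {"empno": 7698, "ename": "BLAKE", "sal": 2850},
]

# Emps[ sal > 1550 ]{ empno, ename }: restrict first, then project.
ok = project(restrict(emps, lambda r: r["sal"] > 1550), ["empno", "ename"])

# Emps{ empno, ename }[ sal > 1550 ]: after the projection there is no sal
# attribute left, so the restriction cannot be evaluated.
try:
    restrict(project(emps, ["empno", "ename"]), lambda r: r["sal"] > 1550)
    opposite_order_failed = False
except KeyError:
    opposite_order_failed = True
```

In this sketch the failure shows up as a `KeyError`; how a real REAL or SQL processor reports the missing attribute is not something the slides specify.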
21 © Ellis Cohen Composition Exercise For each of the REAL expressions below: write the corresponding SQL, and write a simpler equivalent REAL expression.
Emps{ ename, job, sal }[sal < 1000]{ ename, job }
Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK']
22 © Ellis Cohen Combined Projection
Emps{ ename, job, sal }
  -- get the ename, job & sal of each employee
Emps{ ename, job, sal }[sal < 1000]
  -- get the ename, job & sal of each employee,
  -- and get those whose sal is less than 1000
Emps{ ename, job, sal }[sal < 1000]{ ename, job }
  -- get the ename, job & sal of each employee,
  -- get those whose sal is less than 1000,
  -- but really just get the ename & job
  -- NO POINT in projecting { ename, job, sal } first
Emps[sal < 1000]{ ename, job }
  -- just get the employees whose sal is less than 1000,
  -- and get just their ename & job
SELECT ename, job FROM Emps WHERE sal < 1000
23 © Ellis Cohen Combined Restriction
Emps[sal < 1000]
  -- get the employees whose sal is less than 1000
Emps[sal < 1000]{ empno, ename, job }
  -- get the employees whose sal is less than 1000,
  -- and get their empno, ename & job
Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK']
  -- get the employees whose sal is less than 1000,
  -- get their empno, ename & job
  -- get that information, but only for the clerks
Emps[sal < 1000][ job = 'CLERK']
Emps[ (sal < 1000) AND (job = 'CLERK') ]
  -- get the clerks whose sal is less than 1000
Emps[sal < 1000][job = 'CLERK']{ empno, ename, job }
Emps[ (sal < 1000) AND (job = 'CLERK') ]{ empno, ename, job }
  -- get the clerks whose sal is less than 1000,
  -- and get their empno, ename & job
SELECT empno, ename, job FROM Emps WHERE (sal < 1000) AND (job = 'CLERK')
24 © Ellis Cohen Transformation Rules for Algebras Algebras have rules for transforming one algebraic expression into another.
Elementary Algebra. Over: Numbers. Operators (include): Sum (a.k.a. +), Product (a.k.a. *). Rules: Commutative: Sum(a,b) ↔ Sum(b,a), i.e. a + b ↔ b + a; Associative: a + (b + c) ↔ (a + b) + c; Distributive: a * (b + c) ↔ a*b + a*c
Relational Algebra. Over: Relations. Operators (include): Restrict (subset of rows), Project (subset of columns). Rules: What are they?
25 © Ellis Cohen Some Rules for Restrict
Commutativity Rule for Restrict: R[C1][C2] ↔ R[C2][C1]
  Emps[ sal < 1000 ][ job = 'CLERK' ] ↔ Emps[ job = 'CLERK' ][ sal < 1000 ]
Conjunction Rule for Restrict: R[C1][C2] ↔ R[ C1 AND C2 ]
  Emps[ sal < 1000 ][ job = 'CLERK' ] ↔ Emps[ (sal < 1000) AND (job = 'CLERK') ]
What are some other rules for REAL?
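The two restrict rules can be spot-checked on sample data. The sketch below is a check on one invented relation, not a proof: `restrict` and `same_bag` are hypothetical helper names, tuples are (ename, job, sal), and the data contains no NULLs.

```python
from collections import Counter

# Invented sample tuples of the form (ename, job, sal); no NULLs involved.
emps = [
    ("SMITH", "CLERK", 800),
    ("ALLEN", "SALESMAN", 1600),
    ("JAMES", "CLERK", 950),
    ("MILLER", "CLERK", 1300),
]


def restrict(relation, condition):
    return [t for t in relation if condition(t)]


def same_bag(x, y):
    """Bag equality: same tuples with the same multiplicities, order ignored."""
    return Counter(x) == Counter(y)


def c1(t):
    return t[2] < 1000          # sal < 1000


def c2(t):
    return t[1] == "CLERK"      # job = 'CLERK'


r12 = restrict(restrict(emps, c1), c2)               # R[C1][C2]
r21 = restrict(restrict(emps, c2), c1)               # R[C2][C1]
r_and = restrict(emps, lambda t: c1(t) and c2(t))    # R[ C1 AND C2 ]
```

Comparing with `same_bag` rather than `==` reflects that REAL relations are unordered bags.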
26 © Ellis Cohen Extended Projection
27 © Ellis Cohen Calculating & Naming Attributes SELECT empno, ename AS empname, job, (sal * 52) AS yrsal FROM Emps Emps{ empno, empname:ename, job, yrsal:(sal*52) } Named Projection with Calculation in REAL
28 © Ellis Cohen Bulk Prefixing Given Emps (empno, ename, job, sal, comm), we want our query result to have all the same attributes, but all prefixed in the same way: (empno, ename, job, sal, comm) becomes (z_empno, z_ename, z_job, z_sal, z_comm).
SELECT empno AS z_empno, ename AS z_ename, job AS z_job, sal AS z_sal, comm AS z_comm FROM Emps
Bulk Attribute Naming in REAL: Emps{ z_(*):* } or just z_$Emps
29 © Ellis Cohen Attribute Removal Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some of them removed empno ename job sal comm empno ename sal comm SELECT empno, ename, sal, comm FROM Emps Emps{ *, job } Attribute Removal in REAL note: job not listed Then, remove job First, include all of Emps attributes
30 © Ellis Cohen Attribute Replacement Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some attribute names replaced empno ename job sal comm empno empname job wksal comm SELECT empno, ename AS empname, job, sal AS wksal, comm FROM Emps Emps{ *, empname ename, wksal sal } Attribute Replacement in REAL Then, replace ename by empname First, include all of Emps attributes
31 © Ellis Cohen Relational Algebra Exercise Assume sal is the weekly salary, and that all employees are paid 52 weeks/year. a)Write the REAL expressions to list the names, weekly salary (as wksal) and yearly salaries (as yrsal) of employees whose yearly salary is more than 70,000. b)Just list their names & weekly salaries (as wksal) c)Just list their employee number, name, job, & weekly salaries (as wksal)
32 © Ellis Cohen Answer (a) to REAL Exercise List the names, weekly and yearly salaries of employees whose yearly salary is more than 70,000.
Restrict first, then project:
  Emps[52*sal > 70000]{ ename, wksal:sal, yrsal:(52*sal) }
  SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE 52*sal > 70000
Project first, then restrict on the new name:
  Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]
  SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE yrsal > 70000
  (OK in SQL Server; NOT OK in Oracle)
33 © Ellis Cohen Answer (b) to REAL Exercise List the names, and weekly salaries (as wksal) of employees whose yearly salary is more than 70,000.
Emps[52*sal > 70000]{ ename, wksal:sal }
SELECT ename, sal AS wksal FROM Emps WHERE 52*sal > 70000
Emps{ ename, wksal:sal }[52*wksal > 70000]
Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]{ ename, wksal }
34 © Ellis Cohen Answer (c) to REAL Exercise List the employee number, name, job, & weekly salaries of employees whose yearly salary is more than 70,000.
Emps[52*sal > 70000]{ empno, ename, job, wksal:sal }
Emps[52*sal > 70000]{ *, wksal sal, comm }
SELECT empno, ename, job, sal AS wksal FROM Emps WHERE 52*sal > 70000
35 © Ellis Cohen REAL Rules Exercise Design some additional REAL rules based on –Project –Restrict –Named Projection –Removal & Replacement
36 © Ellis Cohen Comparisons
37 © Ellis Cohen IS Comparison Operator v1 = v2 (as defined in SQL and REAL): Result is NULL (think UNKNOWN) if either v1 IS NULL or v2 IS NULL. v1 IS v2 (only defined in REAL, not in SQL; like =, but two NULLs match): Result is TRUE if either v1 = v2, or v1 IS NULL and v2 IS NULL. Result is FALSE otherwise.
38 © Ellis Cohen IS and IS NOT v1 IS NOT v2 means the same as NOT( v1 IS v2 ). It's like ≠, with NULL treated as an ordinary value. Result is FALSE if either v1 = v2, or v1 IS NULL and v2 IS NULL. Result is TRUE otherwise.
39 © Ellis Cohen IS-Augmented Comparisons v1 > v2 (as defined in SQL and REAL): TRUE if v1 > v2, FALSE if v1 ≤ v2, NULL if either v1 or v2 is NULL (i.e. result is unknown if either value is unknown). v1 IS > v2 (only defined in REAL, not in SQL): TRUE if v1 > v2, FALSE if v1 ≤ v2, FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely > v2).
40 © Ellis Cohen Real Notions of Equality Strict Equality: v1 = v2. Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL. Projected Strict Equality: v1 IS = v2. Result is FALSE if v1 and/or v2 is NULL. Extended Equality: v1 IS v2. Result is TRUE if both v1 and v2 are NULL; result is FALSE if only one of v1 or v2 is NULL. All are the same if both v1 and v2 are non-NULL
41 © Ellis Cohen IS NOT Augmented Comparisons v1 IS NOT > v2 means the same as NOT( v1 IS > v2 ) It represents the cases other than those where v1 is definitely > v2 sal IS NOT > 300 is equivalent to (sal ≤ 300) OR (sal IS NULL)
42 © Ellis Cohen Negated Augmented Comparisons v1 IS ≤ v2: FALSE if v1 > v2, TRUE if v1 ≤ v2, FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely ≤ v2). v1 IS NOT > v2, i.e. NOT( v1 IS > v2 ): FALSE if v1 > v2, TRUE if v1 ≤ v2, TRUE if either v1 or v2 is NULL (i.e. read this as v1 is not definitely > v2).
43 © Ellis Cohen Real Notions of Inequality Strict Inequality: v1 ≠ v2, NOT(v1 = v2). Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL. Projected Strict Inequality: v1 IS ≠ v2. Result is FALSE if v1 and/or v2 is NULL. Counter-Projected Strict Inequality: v1 IS NOT = v2, NOT( v1 IS = v2 ). Result is TRUE if v1 and/or v2 is NULL. Extended Inequality: v1 IS NOT v2, NOT(v1 IS v2). Result is TRUE if only one of v1 or v2 is NULL; result is FALSE if both v1 and v2 are NULL. All are the same if both v1 and v2 are non-NULL
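The different notions of equality and comparison can be made concrete with a small Python model. This is a sketch under assumptions: Python's `None` stands in for NULL (and for the UNKNOWN result), and all function names are hypothetical, chosen only to mirror the REAL operators described above.

```python
def strict_eq(v1, v2):
    """v1 = v2: NULL (None) if either operand is NULL."""
    if v1 is None or v2 is None:
        return None
    return v1 == v2


def is_eq(v1, v2):
    """v1 IS = v2: TRUE only when v1 = v2 is definitely TRUE."""
    return strict_eq(v1, v2) is True


def is_(v1, v2):
    """v1 IS v2: like =, but two NULLs match."""
    if v1 is None and v2 is None:
        return True
    if v1 is None or v2 is None:
        return False
    return v1 == v2


def is_not(v1, v2):
    """v1 IS NOT v2: NOT( v1 IS v2 )."""
    return not is_(v1, v2)


def is_not_eq(v1, v2):
    """v1 IS NOT = v2: NOT( v1 IS = v2 )."""
    return not is_eq(v1, v2)


def is_gt(v1, v2):
    """v1 IS > v2: TRUE only when v1 is definitely greater than v2."""
    return v1 is not None and v2 is not None and v1 > v2


def is_not_gt(v1, v2):
    """v1 IS NOT > v2: NOT( v1 IS > v2 )."""
    return not is_gt(v1, v2)
```

For example, `is_not_gt(None, 300)` is true, matching the reading of `sal IS NOT > 300` as `(sal ≤ 300) OR (sal IS NULL)`.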
44 © Ellis Cohen Case Expressions
45 © Ellis Cohen Simple Case Expressions SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' ELSE to_char(sal) END) AS salary, sal FROM Emps Emps{ ename, salary:( sal =3000 ? 'OVERPAID', to_char(sal) ), sal } REAL Simple Case Expressions
46 © Ellis Cohen Case Expressions and NULLs SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' END) AS salary, sal FROM Emps Emps{ ename, salary:( sal =3000 ? 'OVERPAID' ), sal } salary will be NULL for those who are neither UNDERPAID or OVERPAID
47 © Ellis Cohen Searched Case Expressions SELECT ename, (CASE job WHEN 'CLERK' THEN 'ASSISTANT' WHEN 'MANAGER' THEN 'CHIEF' ELSE job END) AS title, sal FROM Emps; Emps{ ename, title:( job='CLERK' ? 'ASSISTANT', job='MANAGER' ? 'CHIEF', job), sal } Searched Case Expressions in REAL
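The SQL side of the case expressions above can be tried out with Python's built-in sqlite3 module. This is a sketch only: SQLite is an assumed stand-in for whatever database the course uses, the three rows are invented, and the ORDER BY is added purely to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, sal INTEGER)")
# Invented sample rows for illustration.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?, ?)",
    [(1, "SMITH", "CLERK", 800), (2, "BLAKE", "MANAGER", 2850), (3, "FORD", "ANALYST", 3000)],
)

# CASE with an ELSE branch: every row gets a non-NULL title.
titles = conn.execute(
    "SELECT ename, (CASE job WHEN 'CLERK' THEN 'ASSISTANT' "
    "WHEN 'MANAGER' THEN 'CHIEF' ELSE job END) AS title "
    "FROM Emps ORDER BY empno"
).fetchall()

# CASE without an ELSE branch: rows matching no WHEN get NULL (None in Python).
flags = conn.execute(
    "SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' END) AS salary "
    "FROM Emps ORDER BY empno"
).fetchall()
```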
48 © Ellis Cohen Duplicate Elimination
49 © Ellis Cohen REAL Duplicate Elimination SELECT DISTINCT deptno FROM Emps Emps{ deptno ! } REAL Duplicate Elimination Note: The Classical Relational Algebra is set-based and automatically eliminates duplicates. REAL is based on Garcia-Molina, Ullman & Widom, and allows duplicate tuples in a relation Read ! as squeeze, specifically Read a trailing ! as "group squeeze" – Group together all the employees with the same deptno & squeeze out the duplicate deptno's
50 © Ellis Cohen REAL Grouped Squeeze Emps{ deptno ! }
Order doesn't matter, so just show the Emps table ordered by deptno:
  empno  ename   deptno  …
  7839   KING    10      …
  7499   ALLEN   30      …
  7654   MARTIN  30      …
  7698   BLAKE   30      …
  7844   TURNER  30      …
  7986   STERN   50      …
[Figure: the grouped squeeze { deptno ! } turns Emps into a relation with the single attribute deptno and one tuple per distinct deptno.]
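The difference between a plain projection (a bag, duplicates kept) and the grouped squeeze can be seen by running the SQL equivalents on the rows shown above. The sketch uses Python's sqlite3 module as an assumed stand-in database; results are sorted in Python because relations are unordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
# The six (empno, ename, deptno) rows from the slide.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        (7839, "KING", 10),
        (7499, "ALLEN", 30),
        (7654, "MARTIN", 30),
        (7698, "BLAKE", 30),
        (7844, "TURNER", 30),
        (7986, "STERN", 50),
    ],
)

# Emps{ deptno }: plain projection keeps duplicates (a bag).
bag = sorted(conn.execute("SELECT deptno FROM Emps").fetchall())

# Emps{ deptno ! }: grouped squeeze removes duplicate deptno values.
squeezed = sorted(conn.execute("SELECT DISTINCT deptno FROM Emps").fetchall())
```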
51 © Ellis Cohen Exercise: REAL Restriction & Grouped Squeeze What is the meaning of Emps[ sal > 1550 ]{ deptno ! }
52 © Ellis Cohen Answer: REAL Restriction & Grouped Squeeze Emps[ sal > 1550 ]{ deptno ! }
[Figure: step 1, [ sal > 1550 ], turns Emps (empno, ename, deptno, sal, comm; employees ALLEN, MARTIN, BLAKE, KING, TURNER, STERN) into a relation containing ALLEN, BLAKE and KING; step 2, { deptno ! }, turns that into a relation with the single attribute deptno.]
1. Get the employees who make > 1550. 2. Get the departments in which those employees work. Result: list the departments which have employees who make > 1550
53 © Ellis Cohen Composite Duplicate Elimination SELECT DISTINCT deptno, job FROM Emps Emps{ deptno, job ! } List the distinct jobs within each department.
[Figure: Emps (empno, deptno, job) has eight employees with jobs CLERK, ANALYST, ANALYST, CLERK, SALESMAN, CLERK, CLERK, SALESMAN; the result has attributes (deptno, job) and the tuples (10, CLERK), (30, ANALYST), (30, CLERK), (30, SALESMAN), (50, CLERK), (50, SALESMAN).]
54 © Ellis Cohen Distinct Tuples 1. What is the effect of SELECT DISTINCT * from Emps Emps{ * ! } 2. What's the difference between Emps{ job, sal ! } Emps{ job, sal }{ * ! }
55 © Ellis Cohen Distinct Tuple Answers 1. What is the effect of SELECT DISTINCT * from Emps Emps{ * ! } Lists Emps, eliminating duplicate tuples. This is the same as Emps, since Emps has a primary key, which ensures that all values of empno (and therefore all tuples) are unique. 2. What's the difference between Emps{ job, sal ! } Emps{ job, sal }{ * ! } No difference. They both find all the unique pairs of jobs and salaries in Emps
56 © Ellis Cohen Aggregate Functions
57 © Ellis Cohen REAL Aggregate Functions How many employees get commissions?
SQL: SELECT count(comm) AS knt FROM Emps
Aggregate Functions in REAL: Emps{ ! knt:count(comm) } (the name, here knt, is required in REAL)
Read a leading ! as "aggregate squeeze": apply an aggregation function to all the rows and squeeze them down to a single result
58 © Ellis Cohen Aggregation Produces Relations SELECT avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps Emps{ ! avgsal:avg(sal), maxsal:max(sal) } still produces a relation That relation has a single tuple with two attributes: avgsal and maxsal
59 © Ellis Cohen REAL Aggregate Squeeze Emps{ ! avgsal:avg(sal), maxsal:max(sal) }
[Figure: Emps (empno, ename, deptno, sal; employees ALLEN, MARTIN, BLAKE, KING, TURNER, STERN) goes through the aggregate squeeze { ! avgsal:avg(sal), maxsal:max(sal) }, producing a relation with attributes avgsal and maxsal.]
Aggregation results in a relation with a single tuple!
60 © Ellis Cohen Exercise: REAL Restriction & Aggregation What is the REAL equivalent to SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10
61 © Ellis Cohen REAL Aggregation & Restriction Emps[ deptno = 10 ]{ ! avgsal:avg(sal) }
SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10
[Figure: step 1, [ deptno = 10 ], turns Emps (empno, ename, deptno, sal, comm; employees DILIP, MARTIN, BLAKE, KING, TURNER, STERN) into a relation containing DILIP and KING; step 2, { ! avgsal:avg(sal) }, turns that into a single tuple with avgsal = 3300.]
Can you do the aggregation and the restrict in the opposite sequence?
62 © Ellis Cohen Sequence Matters Again! Emps { ! avgsal:avg(sal) }[ deptno = 10 ]
[Figure: step 1, { ! avgsal:avg(sal) }, squeezes Emps down to a single tuple with only the avgsal attribute; step 2, [ deptno = 10 ], is then applied to that result.]
Fails! No deptno attribute to restrict
63 © Ellis Cohen REAL Placement of Aggregate Functions Emps{ ! knt:count(deptno) } In REAL: The ONLY place aggregate functions can appear is in curly braces after the !. The ONLY things that can appear after the ! are (expressions involving) aggregate functions. Aggregate functions CANNOT be used in restrictions, e.g. [count(*) > 10] is ILLEGAL! Restriction specifies a test applied to a tuple at a time, so aggregation makes no sense there. Remember: the name is required in REAL.
64 © Ellis Cohen Aggregate Function Exercise Using Emps( empno, ename, deptno, sal, comm ) Assume sal is the weekly salary, and that all employees work 40 hrs/week. Write REAL to determine the average hourly salary.
65 © Ellis Cohen REAL Answers: Aggregate Functions Determine the average hourly salary.
Emps{ ! avghsal:avg(sal/40) }
Emps{ hrsal:(sal/40) }{ ! avghsal:avg(hrsal) }
Emps{ ! avgsal:avg(sal) }{ avghsal:(avgsal/40) }
66 © Ellis Cohen Attribute Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) }
67 © Ellis Cohen Attribute Aggregation Answer If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) } Emps[ job IS NOT NULL ] { ! knt:count(*) }
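The equivalence in this answer rests on count( attribute ) skipping NULLs while count(*) counts every row. A minimal check with Python's sqlite3 module, using invented rows (one of them with a NULL job) and SQLite as an assumed stand-in database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT)")
# Invented rows; TURNER has no job (NULL).
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(1, "KING", "PRESIDENT"), (2, "ALLEN", "SALESMAN"), (3, "TURNER", None)],
)

# Emps{ ! knt:count(job) }
count_attr = conn.execute("SELECT count(job) FROM Emps").fetchone()[0]

# Emps[ job IS NOT NULL ]{ ! knt:count(*) }
count_star_restricted = conn.execute(
    "SELECT count(*) FROM Emps WHERE job IS NOT NULL"
).fetchone()[0]

# Emps{ ! knt:count(*) } counts every tuple, NULL job or not.
count_star_all = conn.execute("SELECT count(*) FROM Emps").fetchone()[0]
```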
68 © Ellis Cohen Distinct Aggregation
69 © Ellis Cohen Distinct Aggregation SELECT count(DISTINCT deptno) AS knt FROM Emps Emps{ ! knt:count(deptno !) } REAL Distinct Aggregation Distinct Aggregation can be used with any aggregation function, though it is primarily used with count How many different departments do employees work in?
70 © Ellis Cohen Distinct Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If distinct aggregation were not supported in REAL, (but you still could use ! for aggregation and to eliminate duplicates) how else could you write Emps{ ! knt:count(deptno !) } ?
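71 © Ellis Cohen Diagram for Distinct Aggregation Emps{ ! knt:count(deptno!) }
Emps (empno, ename, deptno, …): (7839, KING, 10), (7499, ALLEN, 30), (7654, MARTIN, 30), (7698, BLAKE, 30), (7844, TURNER, 30), (7986, STERN, 50)
[Figure: { deptno ! } turns Emps into a relation with the single attribute deptno; { ! knt:count(deptno) } then squeezes that into a single tuple with knt = 3.]
Equivalent: Emps{ deptno ! }{ ! knt:count(deptno) }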
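The two-step formulation in the diagram can be compared with the direct distinct count using the SQL equivalents. The sketch below uses Python's sqlite3 module (an assumed stand-in database) and the six rows from the slide; the nested SELECT plays the role of the intermediate `Emps{ deptno ! }` relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        (7839, "KING", 10),
        (7499, "ALLEN", 30),
        (7654, "MARTIN", 30),
        (7698, "BLAKE", 30),
        (7844, "TURNER", 30),
        (7986, "STERN", 50),
    ],
)

# Emps{ ! knt:count(deptno !) }
direct = conn.execute("SELECT count(DISTINCT deptno) FROM Emps").fetchone()[0]

# Emps{ deptno ! }{ ! knt:count(deptno) }
two_step = conn.execute(
    "SELECT count(deptno) FROM (SELECT DISTINCT deptno FROM Emps)"
).fetchone()[0]
```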
72 © Ellis Cohen Grouped Aggregation
73 © Ellis Cohen REAL Grouped Aggregate Squeeze Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
[Figure: the grouped aggregate squeeze { deptno ! avgsal:avg(sal), maxsal:max(sal) } groups Emps (empno, deptno, sal) by deptno, then aggregates each group, producing a relation with attributes deptno, avgsal and maxsal.]
A Grouped Aggregate Squeeze results in a relation with one tuple for each group!
74 © Ellis Cohen SQL vs REAL Grouping SELECT deptno, avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps GROUP BY deptno Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) } GROUPING in REAL DON'T include deptno here too! The result already has attributes deptno and avgsal and maxsal
75 © Ellis Cohen GROUP and DISTINCT Compare the results of SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps How would you write these both in REAL?
76 © Ellis Cohen Answer: GROUP and DISTINCT SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps Emps{ job ! } Identical Results!
77 © Ellis Cohen Composite Grouping How many employees hold each job within each department?
SELECT deptno, job, count(*) AS knt FROM Emps GROUP BY deptno, job
Emps{ deptno, job ! knt:count(*) }
[Figure: Emps (empno, deptno, job) has eight employees with jobs CLERK, ANALYST, ANALYST, CLERK, SALESMAN, CLERK, CLERK, SALESMAN; the result has attributes (deptno, job, knt) and the tuples (10, CLERK, 1), (30, ANALYST, 2), (30, CLERK, 1), (30, SALESMAN, 1), (50, CLERK, 2), (50, SALESMAN, 1).]
78 © Ellis Cohen Grouping & Distinct Aggregation How many different jobs are there within each department?
SELECT deptno, count(DISTINCT job) AS njob FROM Emps GROUP BY deptno
Emps{ deptno ! njob:count(job !) }
[Figure: Emps (empno, deptno, job) has eight employees with jobs CLERK, ANALYST, ANALYST, CLERK, SALESMAN, CLERK, CLERK, SALESMAN; the result has attributes (deptno, njob).]
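The composite grouping result can be reproduced with Python's sqlite3 module. The (deptno, job) pairs below are the ones implied by the slide's result counts; the empno values are hypothetical, since they are not given, and SQLite is an assumed stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, job TEXT)")
# (deptno, job) pairs consistent with the slide; empno values are made up.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        (1, 10, "CLERK"),
        (2, 30, "ANALYST"),
        (3, 30, "ANALYST"),
        (4, 30, "CLERK"),
        (5, 30, "SALESMAN"),
        (6, 50, "CLERK"),
        (7, 50, "CLERK"),
        (8, 50, "SALESMAN"),
    ],
)

# Emps{ deptno, job ! knt:count(*) }
# ORDER BY is only there to make the Python list predictable.
per_dept_job = conn.execute(
    "SELECT deptno, job, count(*) AS knt FROM Emps "
    "GROUP BY deptno, job ORDER BY deptno, job"
).fetchall()
```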
79 © Ellis Cohen Distinct Counts Problem What's the difference between Emps{ deptno ! knt:count(job!) } Emps{ deptno, job ! } { deptno ! knt:count(job) }
80 © Ellis Cohen Diagram for Grouping Exercise Emps{ deptno ! knt:count(job!) }
[Figure: Emps (empno, deptno, job) has eight employees with jobs CLERK, ANALYST, ANALYST, CLERK, SALESMAN, CLERK, CLERK, SALESMAN; { deptno, job ! } yields (10, CLERK), (30, ANALYST), (30, CLERK), (30, SALESMAN), (50, CLERK), (50, SALESMAN); { deptno ! knt:count(job) } then yields a relation with one tuple per deptno (its second column is labeled njob in the diagram).]
81 © Ellis Cohen Distinct Counts With NULLs
Emps{ deptno ! knt:count(job!) }
  - this ignores employees with NULL jobs
Emps{ deptno, job ! }{ deptno ! knt:count(job) }
  - No difference! This also ignores employees with NULL jobs
Emps{ deptno, job ! }{ deptno ! knt:count(*) }
  - the count for a department will be one higher if any of its employees have NULL jobs
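The three variants can be compared on data that actually contains a NULL job. The sketch uses Python's sqlite3 module (an assumed stand-in database) and invented rows in which department 10 has one employee with a NULL job; the nested SELECT DISTINCT plays the role of `Emps{ deptno, job ! }`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, job TEXT)")
# Invented rows; employee 2 in department 10 has a NULL job.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(1, 10, "CLERK"), (2, 10, None), (3, 10, "CLERK"), (4, 30, "ANALYST"), (5, 30, "CLERK")],
)

# Emps{ deptno ! knt:count(job!) }
v1 = dict(conn.execute(
    "SELECT deptno, count(DISTINCT job) FROM Emps GROUP BY deptno"
).fetchall())

# Emps{ deptno, job ! }{ deptno ! knt:count(job) }
v2 = dict(conn.execute(
    "SELECT deptno, count(job) FROM (SELECT DISTINCT deptno, job FROM Emps) GROUP BY deptno"
).fetchall())

# Emps{ deptno, job ! }{ deptno ! knt:count(*) }
v3 = dict(conn.execute(
    "SELECT deptno, count(*) FROM (SELECT DISTINCT deptno, job FROM Emps) GROUP BY deptno"
).fetchall())
```

With this data the first two variants agree, and only department 10, the one with a NULL job, gets a higher count from the third.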
82 © Ellis Cohen Group Restriction
83 © Ellis Cohen Group Restriction Problem Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) } Find the average and maximum salary of the employees in each department. But suppose we only care about departments where the average salary is > 2000.
[Figure: the grouped aggregate squeeze { deptno ! avgsal:avg(sal), maxsal:max(sal) } turns Emps (empno, deptno, sal) into a relation with attributes deptno, avgsal and maxsal.]
84 © Ellis Cohen REAL Group Restriction Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000] Suppose we want to only keep departments whose average salary > 2000.
[Figure: { deptno ! avgsal:avg(sal), maxsal:max(sal) } turns Emps (empno, deptno, sal) into a relation (deptno, avgsal, maxsal); [ avgsal > 2000 ] then keeps those groups whose average salary > 2000.]
85 © Ellis Cohen Projected Group Restriction Exercise The preceding result has deptno, avgsal & maxsal attributes Write REAL to Determine just the deptno and the maximum salary of those departments where the average salary > 2000
86 © Ellis Cohen REAL Projected Group Restriction Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]{ deptno, maxsal }
[Figure: { deptno ! avgsal:avg(sal), maxsal:max(sal) } turns Emps (empno, deptno, sal) into a relation (deptno, avgsal, maxsal); [ avgsal > 2000 ] keeps the qualifying groups; { deptno, maxsal } then yields a relation with attributes deptno and maxsal.]
87 © Ellis Cohen Real HAVING SELECT deptno, max(sal) AS maxsal FROM Emps GROUP BY deptno HAVING avg(sal) > 2000 Determine the deptno and the maximum salary of those departments where the average salary > 2000 Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) } [avgsal > 2000] { deptno, maxsal }
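The correspondence between HAVING and the REAL pipeline (group, then restrict, then project) can be checked by writing the pipeline as a nested query. This is a sketch using Python's sqlite3 module as an assumed stand-in database, with invented salaries; the inner SELECT mirrors the grouped aggregate squeeze, the outer WHERE mirrors `[avgsal > 2000]`, and the outer SELECT list mirrors `{ deptno, maxsal }`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER)")
# Invented rows: dept 10 averages 3150, dept 30 about 1267, dept 50 averages 2300.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        (1, 10, 5000), (2, 10, 1300),
        (3, 30, 1600), (4, 30, 1250), (5, 30, 950),
        (6, 50, 2500), (7, 50, 2100),
    ],
)

# SQL with HAVING, as on the slide (ORDER BY added for a predictable list).
with_having = conn.execute(
    "SELECT deptno, max(sal) AS maxsal FROM Emps "
    "GROUP BY deptno HAVING avg(sal) > 2000 ORDER BY deptno"
).fetchall()

# The REAL pipeline spelled out as a nested query.
pipeline = conn.execute(
    "SELECT deptno, maxsal FROM "
    "(SELECT deptno, avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps GROUP BY deptno) "
    "WHERE avgsal > 2000 ORDER BY deptno"
).fetchall()
```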
88 © Ellis Cohen Group Restriction Exercise Using Emps( empno, ename, job, sal, comm, deptno ) Write the REAL expression for the following: Show the average salary per job, excluding those jobs found only in a single department
89 © Ellis Cohen Answer to Group Restriction Exercise Show the average salary per job, excluding those jobs found only in a single department Emps{ job ! avgsal:avg(sal), knt:count(deptno!) } [knt > 1]{ job, avgsal } SELECT job, avg(sal) AS avgsal FROM Emps GROUP BY job HAVING count(DISTINCT deptno) > 1