Theory, Practice & Methodology of Relational Database Design and Programming. Copyright © Ellis Cohen 2002-2007. Basic Relational Algebra.


1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Basic Relational Algebra These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see

2 © Ellis Cohen Overview of Lecture: Relational Algebra using REAL; Operator Composition; Extended Projection; Comparisons; Case Expressions; Duplicate Elimination; Aggregate Functions; Distinct Aggregation

3 © Ellis Cohen Relational Algebra using REAL

4 © Ellis Cohen Algebra. Domain (e.g. numbers). Operators (e.g. for numbers): Unary Operators (e.g. Unary Minus): -(3) → -3, -(-7) → 7. Binary Operators (e.g. Sum, Product): … → 10, 3 * -5 → -15. An algebra is closed: apply the operators to values in the domain, and the result is ALWAYS another value from the domain.

5 © Ellis Cohen Relations. A relation is just a collection of tuples. A relation corresponds to both a SQL table and a SQL result set (i.e. the result of executing a SQL query).

6 © Ellis Cohen Relational Algebra. Domain: Relations. Unary Operators: Restrict (~ like the WHERE clause), Project (~ like the SELECT clause). Binary Operators: Joins (Cross, Inner & Outer, Natural), Collection Operators (Union, …), Division (inverse of Cross Join). The Relational Algebra DOES NOT have subqueries. They're not needed!

7 © Ellis Cohen REAL Relations as Unordered Bags. The language we use for the relational algebra is called REAL (Relation Expression and Assignment Language). 1. Relations in REAL are unordered: there is no way to express ORDER BY in the Relational Algebra. 2. Most Relational Algebras (including the Classic Relational Algebra) do not allow duplicate tuples in a relation (or support aggregation or grouping). REAL does allow duplicates (formally, relations are bags, not sets).

8 © Ellis Cohen Unary Relation Operators. In the relational algebra, a unary relation operator is applied to a relation and produces a relation as its result. In Mathematical Terms: Unary Operator: Relation → Relation. That is, if O is a relation operator and R is a relation, then O (R) is also a relation.

9 © Ellis Cohen Unary Relational Operators Two important relational operators: Restrict: Chooses subset of rows Project: Chooses subset of columns

10 © Ellis Cohen Restrict Operator
REAL restrict operator: Emps[ sal > 1550 ]
Standard forms of restrict operator: Restrict [sal > 1550] ( Emps ) and σ sal > 1550 ( Emps )
SQL Equivalent: SELECT * FROM Emps WHERE sal > 1550
Restriction is a unary relation operator: apply it to a relation, and the result is another relation. Restriction checks one tuple at a time and keeps the tuples which satisfy the restriction condition.

11 © Ellis Cohen The Restriction Machine: Emps[ sal > 1550 ]. Start with a relation (Emps, with attributes empno, ename, deptno, sal, comm and tuples for ALLEN, MARTIN, BLAKE, KING, TURNER and STERN). Run it through the restriction machine [ sal > 1550 ]. Get a new relation as a result (same attributes, keeping only the tuples for ALLEN, BLAKE and KING).

12 © Ellis Cohen Project Operator
REAL project operator: Emps{ empno, ename }
Standard forms of project operator: Project [empno,ename] ( Emps ) and π empno,ename ( Emps )
SQL Equivalent: SELECT empno, ename FROM Emps
Projection processes one tuple at a time and keeps only the listed attributes of each tuple.

13 © Ellis Cohen The Projection Machine: Emps{ empno, ename }. Start with a relation (Emps, with attributes empno, ename, deptno, sal, comm). Run it through the projection machine { empno, ename }. Get a new relation as a result:
empno  ename
7499   ALLEN
7654   MARTIN
7698   BLAKE
7839   KING
7844   TURNER
7986   STERN
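The two machines can be sketched in Python by treating a relation as a list of dictionaries (a bag of tuples). This is only an illustrative model, not part of REAL: the helper names restrict and project are assumptions, and the sal and comm values are made up because the figures on the slides did not survive (they are chosen so that sal > 1550 keeps ALLEN, BLAKE and KING, as on the slide).

```python
# Hypothetical sample data: only empno, ename and deptno come from the slides;
# the sal and comm values are illustrative.
Emps = [
    {"empno": 7499, "ename": "ALLEN", "deptno": 30, "sal": 1600, "comm": 300},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30, "sal": 1250, "comm": 1400},
    {"empno": 7698, "ename": "BLAKE", "deptno": 30, "sal": 2850, "comm": None},
    {"empno": 7839, "ename": "KING", "deptno": 10, "sal": 5000, "comm": None},
    {"empno": 7844, "ename": "TURNER", "deptno": 30, "sal": 1500, "comm": 0},
    {"empno": 7986, "ename": "STERN", "deptno": 50, "sal": 1500, "comm": None},
]


def restrict(relation, condition):
    """Restriction machine: keep the tuples for which condition(tuple) is true."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Projection machine: keep only the named attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


high_paid = restrict(Emps, lambda r: r["sal"] > 1550)  # Emps[ sal > 1550 ]
names = project(Emps, ["empno", "ename"])              # Emps{ empno, ename }
```

Both helpers take a relation and return a relation, which is the closure property the slides rely on.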

14 © Ellis Cohen Why Learn REAL? Semantic Clarity: REAL is simpler than SQL and helps explain how SQL works. Clear semantics allows us to reason about systems and is the basis for optimizing queries. Succinctness: REAL can be a good shorthand for describing complicated queries and assertions. It can sometimes be easier to write complicated queries and assertions in REAL first, and then translate them to SQL.

15 © Ellis Cohen Operator Composition

16 © Ellis Cohen Closure and Composition. An algebra is closed: apply the operators to values in the domain, and the result is ALWAYS another value from the domain. So, operators can be composed: ((5 + 7) * 3) + 8 → (12 * 3) + 8 → 36 + 8 → 44

17 © Ellis Cohen Operator Composition
REAL composition: Emps[ sal > 1550 ]{ empno, ename }
Standard forms of composition: Project [empno,ename] ( Restrict [sal > 1550] ( Emps ) ) and π empno,ename ( σ sal > 1550 ( Emps ) )
SQL Equivalent: SELECT empno, ename FROM Emps WHERE sal > 1550

18 © Ellis Cohen REAL Composition Processed L-to-R: Emps[ sal > 1550 ]{ empno, ename }
Emps -- the base relation containing employees
Emps[ sal > 1550 ] -- Emps restricted to the employees with sal > 1550
Emps[ sal > 1550 ]{ empno, ename } -- First we restrict Emps (to the employees with sal > 1550). Then, from that, we extract only the empno and ename attributes

19 © Ellis Cohen Composition Yields Relations: Emps[ sal > 1550 ]{ empno, ename }. Step 1: [ sal > 1550 ] turns Emps (empno, ename, deptno, sal, comm) into a relation with the same attributes, keeping only the tuples for ALLEN, BLAKE and KING. Step 2: { empno, ename } turns that into a relation with just empno and ename: (7499, ALLEN), (7698, BLAKE), (7839, KING). Can you do the project and the restrict in the opposite sequence?

20 © Ellis Cohen Sequence Matters: Emps{ empno, ename }[ sal > 1550 ]. Step 1: { empno, ename } projects Emps down to the empno and ename attributes. Step 2: [ sal > 1550 ] fails! There is no sal attribute left to restrict.
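A small Python sketch of why the sequence matters, again modelling a relation as a list of dictionaries. The helper names and the sal values are assumptions for illustration; in this model the failed restriction shows up as a KeyError on the missing sal attribute.

```python
# Hypothetical sample data (sal values are illustrative).
Emps = [
    {"empno": 7499, "ename": "ALLEN", "sal": 1600},
    {"empno": 7654, "ename": "MARTIN", "sal": 1250},
    {"empno": 7839, "ename": "KING", "sal": 5000},
]


def restrict(relation, condition):
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    return [{a: row[a] for a in attributes} for row in relation]


# Emps[ sal > 1550 ]{ empno, ename }: restrict first, then project.
restricted_then_projected = project(
    restrict(Emps, lambda r: r["sal"] > 1550), ["empno", "ename"]
)

# Emps{ empno, ename }[ sal > 1550 ]: project first, then try to restrict.
try:
    restrict(project(Emps, ["empno", "ename"]), lambda r: r["sal"] > 1550)
    projection_first_failed = False
except KeyError as missing:
    # The projected tuples no longer carry a sal attribute.
    projection_first_failed = True
    missing_attribute = missing.args[0]
```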

21 © Ellis Cohen Composition Exercise For each of the REAL expressions below Write the corresponding SQL Write a simpler equivalent REAL expression Emps{ ename, job, sal }[sal < 1000]{ ename, job } Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK']

22 © Ellis Cohen Combined Projection
Emps{ ename, job, sal } -- get the ename, job & sal of each employee
Emps{ ename, job, sal }[sal < 1000] -- get the ename, job & sal of each employee, and keep those whose sal is less than 1000
Emps{ ename, job, sal }[sal < 1000]{ ename, job } -- get the ename, job & sal of each employee, keep those whose sal is less than 1000, but really just get the ename & job
-- NO POINT in projecting { ename, job, sal } first
Emps[sal < 1000]{ ename, job } -- just get the employees whose sal is less than 1000, then get just their ename & job
SELECT ename, job FROM Emps WHERE sal < 1000

23 © Ellis Cohen Combined Restriction
Emps[sal < 1000] -- get the employees whose sal is less than 1000
Emps[sal < 1000]{ empno, ename, job } -- get the employees whose sal is less than 1000, then get their empno, ename & job
Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK'] -- get the employees whose sal is less than 1000, get their empno, ename & job, then keep that information only for the clerks
Emps[sal < 1000][ job = 'CLERK'] or Emps[ (sal < 1000) AND (job = 'CLERK') ] -- get the clerks whose sal is less than 1000
Emps[sal < 1000][job = 'CLERK']{ empno, ename, job } or Emps[ (sal < 1000) AND (job = 'CLERK') ]{ empno, ename, job } -- get the clerks whose sal is less than 1000, then get their empno, ename & job
SELECT empno, ename, job FROM Emps WHERE (sal < 1000) AND (job = 'CLERK')

24 © Ellis Cohen Transformation Rules for Algebras. Algebras have rules for transforming one algebraic expression into another.
Elementary Algebra. Over: Numbers. Operators (include): Sum (a.k.a. +), Product (a.k.a. *). Rules: Commutative: Sum(a,b) ↔ Sum(b,a), i.e. a + b ↔ b + a. Associative: a + (b + c) ↔ (a + b) + c. Distributive: a * (b + c) ↔ a*b + a*c.
Relational Algebra. Over: Relations. Operators (include): Restrict (subset of rows), Project (subset of columns). Rules: What are they?

25 © Ellis Cohen Some Rules for Restrict Emps[ sal < 1000 ][ job = 'CLERK' ] Emps[ job = 'CLERK' ][ sal < 1000 ] Commutativity Rule for Restrict R[C1][C2] ↔ R[C2][C1] Emps[ (sal < 1000) AND (job = 'CLERK') ] Conjunction Rule for Restrict R[C1][C2] ↔ R[ C1 AND C2 ] What are some other rules for REAL?
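The two rules can be checked on a small example. The sketch below is a Python model under stated assumptions: the sample rows are hypothetical, restrict is an illustrative helper, and as_bag compares relations as bags (ignoring order but counting duplicates), which is how REAL treats relations.

```python
from collections import Counter

# Hypothetical sample data.
Emps = [
    {"ename": "SMITH", "job": "CLERK", "sal": 800},
    {"ename": "JAMES", "job": "CLERK", "sal": 950},
    {"ename": "ADAMS", "job": "CLERK", "sal": 1100},
    {"ename": "WARD", "job": "SALESMAN", "sal": 900},
]


def restrict(relation, condition):
    return [row for row in relation if condition(row)]


def as_bag(relation):
    """Order-insensitive view of a relation that still counts duplicates."""
    return Counter(tuple(sorted(row.items())) for row in relation)


def c1(row):
    return row["sal"] < 1000


def c2(row):
    return row["job"] == "CLERK"


r_c1_c2 = restrict(restrict(Emps, c1), c2)            # R[C1][C2]
r_c2_c1 = restrict(restrict(Emps, c2), c1)            # R[C2][C1]
r_and = restrict(Emps, lambda r: c1(r) and c2(r))     # R[ C1 AND C2 ]
```

A check on one data set is of course not a proof of the rules; it only shows what they claim.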

26 © Ellis Cohen Extended Projection

27 © Ellis Cohen Calculating & Naming Attributes SELECT empno, ename AS empname, job, (sal * 52) AS yrsal FROM Emps Emps{ empno, empname:ename, job, yrsal:(sal*52) } Named Projection with Calculation in REAL
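Named projection with calculation can be modelled in Python as a mapping from each output attribute name to a function of the input tuple. The helper extended_project and the sample rows are assumptions made for illustration.

```python
# Hypothetical sample data.
Emps = [
    {"empno": 7499, "ename": "ALLEN", "job": "SALESMAN", "sal": 1600},
    {"empno": 7839, "ename": "KING", "job": "PRESIDENT", "sal": 5000},
]


def extended_project(relation, columns):
    """columns maps each output attribute name to a function of the input tuple."""
    return [{name: fn(row) for name, fn in columns.items()} for row in relation]


# Emps{ empno, empname:ename, job, yrsal:(sal*52) }
result = extended_project(Emps, {
    "empno": lambda r: r["empno"],
    "empname": lambda r: r["ename"],
    "job": lambda r: r["job"],
    "yrsal": lambda r: r["sal"] * 52,
})
```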

28 © Ellis Cohen Bulk Prefixing Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but all prefixed in the same way empno ename job sal comm z_empno z_ename z_job z_sal z_comm SELECT empno AS z_empno, ename AS z_ename, job AS z_job, sal AS z_sal, comm AS z_comm FROM Emps Emps{ z_(*):* } or just z_$Emps Bulk Attribute Naming in REAL

29 © Ellis Cohen Attribute Removal Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some of them removed empno ename job sal comm empno ename sal comm SELECT empno, ename, sal, comm FROM Emps Emps{ *,  job } Attribute Removal in REAL note: job not listed Then, remove job First, include all of Emps attributes

30 © Ellis Cohen Attribute Replacement Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some attribute names replaced empno ename job sal comm empno empname job wksal comm SELECT empno, ename AS empname, job, sal AS wksal, comm FROM Emps Emps{ *, empname  ename, wksal  sal } Attribute Replacement in REAL Then, replace ename by empname First, include all of Emps attributes

31 © Ellis Cohen Relational Algebra Exercise. Assume sal is the weekly salary, and that all employees are paid 52 weeks/year. a) Write the REAL expressions to list the names, weekly salary (as wksal) and yearly salaries (as yrsal) of employees whose yearly salary is more than 70,000. b) Just list their names & weekly salaries (as wksal). c) Just list their employee number, name, job, & weekly salaries (as wksal).

32 © Ellis Cohen Answer (a) to REAL Exercise. List the names, weekly and yearly salaries of employees whose yearly salary is more than 70,000.
Emps[52*sal > 70000]{ ename, wksal:sal, yrsal:(52*sal) }
SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE 52*sal > 70000
Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]
SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE yrsal > 70000 (OK in SQL Server; NOT OK in Oracle)

33 © Ellis Cohen Answer (b) to REAL Exercise. List the names, and weekly salaries (as wksal) of employees whose yearly salary is more than 70,000.
Emps[52*sal > 70000]{ ename, wksal:sal }
SELECT ename, sal AS wksal FROM Emps WHERE 52*sal > 70000
Emps{ ename, wksal:sal }[52*wksal > 70000]
Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]{ ename, wksal }

34 © Ellis Cohen Answer (c) to REAL Exercise. List the employee number, name, job, & weekly salaries of employees whose yearly salary is more than 70,000.
Emps[52*sal > 70000]{ empno, ename, job, wksal:sal }
Emps[52*sal > 70000]{ *, wksal  sal,  comm }
SELECT empno, ename, job, sal AS wksal FROM Emps WHERE 52*sal > 70000

35 © Ellis Cohen REAL Rules Exercise Design some additional REAL rules based on –Project –Restrict –Named Projection –Removal & Replacement

36 © Ellis Cohen Comparisons

37 © Ellis Cohen IS Comparison Operator
v1 = v2 (as defined in SQL and REAL): Result is NULL (think UNKNOWN) if either v1 IS NULL or v2 IS NULL
v1 IS v2 (only defined in REAL, not in SQL; like =, but two NULLs match): Result is TRUE if either v1 = v2, or v1 IS NULL and v2 IS NULL. Result is FALSE otherwise

38 © Ellis Cohen IS and IS NOT. v1 IS NOT v2 means the same as NOT( v1 IS v2 ). It's like ≠, with NULL treated as an ordinary value. Result is FALSE if either v1 = v2, or v1 IS NULL and v2 IS NULL. Result is TRUE otherwise.

39 © Ellis Cohen IS-Augmented Comparisons
v1 > v2 (as defined in SQL and REAL): TRUE if v1 > v2, FALSE if v1 ≤ v2, NULL if either v1 or v2 is NULL (i.e. result is unknown if either value is unknown)
v1 IS > v2 (only defined in REAL, not in SQL): TRUE if v1 > v2, FALSE if v1 ≤ v2, FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely > v2)

40 © Ellis Cohen REAL Notions of Equality. Strict Equality: v1 = v2. Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL. Projected Strict Equality: v1 IS = v2. Result is FALSE if v1 and/or v2 is NULL. Extended Equality: v1 IS v2. Result is TRUE if both v1 and v2 are NULL; result is FALSE if only one of v1 or v2 is NULL. All are the same if both v1 and v2 are non-NULL.

41 © Ellis Cohen IS NOT Augmented Comparisons v1 IS NOT > v2 means the same as NOT( v1 IS > v2 ) It represents the cases other than those where v1 is definitely > v2 sal IS NOT > 300 is equivalent to (sal ≤ 300) OR (sal IS NULL)

42 © Ellis Cohen Negated Augmented Comparisons
v1 IS ≤ v2: FALSE if v1 > v2, TRUE if v1 ≤ v2, FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely ≤ v2)
v1 IS NOT > v2, i.e. NOT( v1 IS > v2 ): FALSE if v1 > v2, TRUE if v1 ≤ v2, TRUE if either v1 or v2 is NULL (i.e. read this as v1 is not definitely > v2)

43 © Ellis Cohen REAL Notions of Inequality. Strict Inequality: v1 ≠ v2, NOT(v1 = v2). Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL. Projected Strict Inequality: v1 IS ≠ v2. Result is FALSE if v1 and/or v2 is NULL. Counter-Projected Strict Inequality: v1 IS NOT = v2, NOT( v1 IS = v2 ). Result is TRUE if v1 and/or v2 is NULL. Extended Inequality: v1 IS NOT v2, NOT(v1 IS v2). Result is TRUE if only one of v1 or v2 is NULL; result is FALSE if both v1 and v2 are NULL. All are the same if both v1 and v2 are non-NULL.
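The comparison variants can be summarised in a short Python sketch in which None stands in for NULL (and for the UNKNOWN result). The function names are hypothetical; they only mirror the definitions given on the slides.

```python
def eq(v1, v2):
    """v1 = v2: unknown (None) if either side is NULL."""
    if v1 is None or v2 is None:
        return None
    return v1 == v2


def is_same(v1, v2):
    """v1 IS v2: like =, but two NULLs match and a lone NULL gives FALSE."""
    if v1 is None or v2 is None:
        return v1 is None and v2 is None
    return v1 == v2


def is_gt(v1, v2):
    """v1 IS > v2: TRUE only when v1 is definitely greater than v2."""
    if v1 is None or v2 is None:
        return False
    return v1 > v2


def is_not_gt(v1, v2):
    """v1 IS NOT > v2: NOT( v1 IS > v2 ), so it is TRUE when either side is NULL."""
    return not is_gt(v1, v2)
```

For example, is_not_gt(sal, 300) behaves like (sal ≤ 300) OR (sal IS NULL), matching the equivalence stated for sal IS NOT > 300.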

44 © Ellis Cohen Case Expressions

45 © Ellis Cohen Simple Case Expressions SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' ELSE to_char(sal) END) AS salary, sal FROM Emps Emps{ ename, salary:( sal =3000 ? 'OVERPAID', to_char(sal) ), sal } REAL Simple Case Expressions

46 © Ellis Cohen Case Expressions and NULLs SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' END) AS salary, sal FROM Emps Emps{ ename, salary:( sal = 3000 ? 'OVERPAID' ), sal } salary will be NULL for those who are not OVERPAID

47 © Ellis Cohen Searched Case Expressions SELECT ename, (CASE job WHEN 'CLERK' THEN 'ASSISTANT' WHEN 'MANAGER' THEN 'CHIEF' ELSE job END) AS title, sal FROM Emps; Emps{ ename, title:( job='CLERK' ? 'ASSISTANT', job='MANAGER' ? 'CHIEF', job ), sal } Searched Case Expressions in REAL
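The behaviour of the two SQL CASE forms can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: it uses SQLite rather than Oracle, the rows are hypothetical, and the to_char call from the earlier slide is left out because it is Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (ename TEXT, job TEXT, sal INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [("SMITH", "CLERK", 800), ("BLAKE", "MANAGER", 2850), ("FORD", "ANALYST", 3000)],
)

# CASE with an ELSE branch: every row gets a title.
titles = conn.execute("""
    SELECT ename,
           CASE job WHEN 'CLERK' THEN 'ASSISTANT'
                    WHEN 'MANAGER' THEN 'CHIEF'
                    ELSE job END AS title
    FROM Emps ORDER BY ename
""").fetchall()

# CASE without an ELSE branch: rows that match no WHEN get NULL (None in Python).
flags = conn.execute("""
    SELECT ename, CASE WHEN sal = 3000 THEN 'OVERPAID' END AS salary
    FROM Emps ORDER BY ename
""").fetchall()
```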

48 © Ellis Cohen Duplicate Elimination

49 © Ellis Cohen REAL Duplicate Elimination SELECT DISTINCT deptno FROM Emps Emps{ deptno ! } REAL Duplicate Elimination Note: The Classical Relational Algebra is set-based and automatically eliminates duplicates. REAL is based on Garcia-Molina, Ullman & Widom, and allows duplicate tuples in a relation Read ! as squeeze, specifically Read a trailing ! as "group squeeze" – Group together all the employees with the same deptno & squeeze out the duplicate deptno's

50 © Ellis Cohen REAL Grouped Squeeze: Emps{ deptno ! }
Order doesn't matter, so just show the Emps table ordered by deptno:
empno  ename   deptno  …
7839   KING    10      …
7499   ALLEN   30      …
7654   MARTIN  30      …
7698   BLAKE   30      …
7844   TURNER  30      …
7986   STERN   50      …
The grouped squeeze { deptno ! } yields a relation with the single attribute deptno and one tuple per distinct deptno (10, 30, 50).

51 © Ellis Cohen Exercise: REAL Restriction & Grouped Squeeze What is the meaning of Emps[ sal > 1550 ]{ deptno ! }

52 © Ellis Cohen Answer: REAL Restriction & Grouped Squeeze: Emps[ sal > 1550 ]{ deptno ! }. Step 1: [ sal > 1550 ] gets the employees who make > 1550 (ALLEN, BLAKE and KING in the example). Step 2: { deptno ! } gets the departments in which those employees work. Meaning: list the departments which have employees who make > 1550.

53 © Ellis Cohen Composite Duplicate Elimination. SELECT DISTINCT deptno, job FROM Emps. Emps{ deptno, job ! }. List the distinct jobs within each department. Starting from Emps (empno, deptno, job), the result is:
deptno  job
10      CLERK
30      ANALYST
30      CLERK
30      SALESMAN
50      CLERK
50      SALESMAN

54 © Ellis Cohen Distinct Tuples 1. What is the effect of SELECT DISTINCT * from Emps Emps{ * ! } 2. What's the difference between Emps{ job, sal ! } Emps{ job, sal }{ * ! }

55 © Ellis Cohen Distinct Tuple Answers 1. What is the effect of SELECT DISTINCT * from Emps Emps{ * ! } Lists Emps, eliminating duplicate tuples. This is the same as Emps, since Emps has a primary key, which ensures that (all values of empno, and therefore) all tuples are unique 2. What's the difference between Emps{ job, sal ! } Emps{ job, sal }{ * ! } No difference. They both find all the unique pairs of jobs and salaries in Emps
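Duplicate elimination on bags can be sketched in Python as follows. The rows reuse the empno, ename and deptno values shown on the Grouped Squeeze slide; the helper names squeeze and group_squeeze are assumptions, and group_squeeze is deliberately defined as project followed by squeeze, which is the equivalence stated in answer 2.

```python
Emps = [
    {"empno": 7839, "ename": "KING", "deptno": 10},
    {"empno": 7499, "ename": "ALLEN", "deptno": 30},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30},
    {"empno": 7698, "ename": "BLAKE", "deptno": 30},
    {"empno": 7844, "ename": "TURNER", "deptno": 30},
    {"empno": 7986, "ename": "STERN", "deptno": 50},
]


def project(relation, attributes):
    return [{a: row[a] for a in attributes} for row in relation]


def squeeze(relation):
    """{ * ! }: drop duplicate tuples, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def group_squeeze(relation, attributes):
    """{ attributes ! }: project, then squeeze out the duplicates."""
    return squeeze(project(relation, attributes))


departments = group_squeeze(Emps, ["deptno"])   # Emps{ deptno ! }
all_distinct = squeeze(Emps)                    # Emps{ * ! }
```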

56 © Ellis Cohen Aggregate Functions

57 © Ellis Cohen REAL Aggregate Functions SELECT count(comm) AS knt FROM Emps Emps{ ! knt:count(comm) } Aggregate Functions in REAL Read a leading ! as "aggregate squeeze" – Apply an aggregation function to all the rows and squeeze them down to a single result How many employees get commissions? The name is required in REAL

58 © Ellis Cohen Aggregation Produces Relations SELECT avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps Emps{ ! avgsal:avg(sal), maxsal:max(sal) } still produces a relation That relation has a single tuple with two attributes: avgsal and maxsal

59 © Ellis Cohen REAL Aggregate Squeeze: Emps{ ! avgsal:avg(sal), maxsal:max(sal) }. The aggregate squeeze { ! avgsal:avg(sal), maxsal:max(sal) } takes Emps (empno, ename, deptno, sal) and produces a relation with the attributes avgsal and maxsal. Aggregation results in a relation with a single tuple!

60 © Ellis Cohen Exercise: REAL Restriction & Aggregation What is the REAL equivalent to SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10

61 © Ellis Cohen REAL Aggregation & Restriction: Emps[ deptno = 10 ]{ ! avgsal:avg(sal) }. SQL Equivalent: SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10. Step 1: [ deptno = 10 ] keeps the employees in department 10 (DILIP and KING in the example). Step 2: { ! avgsal:avg(sal) } squeezes them down to a single tuple (avgsal = 3300 in the example). Can you do the project and the restrict in the opposite sequence?

62 © Ellis Cohen Sequence Matters Again! Emps{ ! avgsal:avg(sal) }[ deptno = 10 ]. Step 1: { ! avgsal:avg(sal) } squeezes Emps down to a single tuple whose only attribute is avgsal. Step 2: [ deptno = 10 ] fails! There is no deptno attribute to restrict.

63 © Ellis Cohen REAL Placement of Aggregate Functions, e.g. Emps{ ! knt:count(deptno) }. In REAL: The ONLY place aggregate functions can appear is in curly braces after the !. The ONLY thing that can appear after the ! is (expressions involving) aggregate functions. Remember: The name is required in REAL. Aggregate functions CANNOT be used in restrictions, e.g. [count(*) > 10] is ILLEGAL! Restriction specifies a test applied to a tuple at a time, so aggregation makes no sense!

64 © Ellis Cohen Aggregate Function Exercise Using Emps( empno, ename, deptno, sal, comm ) Assume sal is the weekly salary, and that all employees work 40 hrs/week. Write REAL to determine the average hourly salary.

65 © Ellis Cohen REAL Answers: Aggregate Functions. Determine the average hourly salary.
Emps{ ! avghsal:avg(sal/40) }
Emps{ hrsal:(sal/40) }{ ! avghsal:avg(hrsal) }
Emps{ ! avgsal:avg(sal) }{ avghsal:(avgsal/40) }

66 © Ellis Cohen Attribute Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) }

67 © Ellis Cohen Attribute Aggregation Answer If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) } Emps[ job IS NOT NULL ] { ! knt:count(*) }
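The answer works because count( attribute ) skips NULLs while count(*) counts every tuple. A minimal Python sketch of that equivalence, with None standing in for NULL; the rows and helper names are hypothetical.

```python
# Hypothetical sample data; None models a NULL job.
Emps = [
    {"ename": "SMITH", "job": "CLERK"},
    {"ename": "ALLEN", "job": "SALESMAN"},
    {"ename": "NEWHIRE", "job": None},
    {"ename": "KING", "job": "PRESIDENT"},
]


def restrict(relation, condition):
    return [row for row in relation if condition(row)]


def count_attribute(relation, attribute):
    """count( attribute ): counts only the tuples where the attribute is not NULL."""
    return sum(1 for row in relation if row[attribute] is not None)


def count_star(relation):
    """count(*): counts every tuple."""
    return len(relation)


# Emps{ ! knt:count(job) }
direct = [{"knt": count_attribute(Emps, "job")}]

# Emps[ job IS NOT NULL ]{ ! knt:count(*) }
rewritten = [{"knt": count_star(restrict(Emps, lambda r: r["job"] is not None))}]
```

Each result is a relation with a single tuple, as aggregation always produces in REAL.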

68 © Ellis Cohen Distinct Aggregation

69 © Ellis Cohen Distinct Aggregation SELECT count(DISTINCT deptno) AS knt FROM Emps Emps{ ! knt:count(deptno !) } REAL Distinct Aggregation Distinct Aggregation can be used with any aggregation function, though it is primarily used with count How many different departments do employees work in?

70 © Ellis Cohen Distinct Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If distinct aggregation were not supported in REAL, (but you still could use ! for aggregation and to eliminate duplicates) how else could you write Emps{ ! knt:count(deptno !) } ?

71 © Ellis Cohen Diagram for Distinct Aggregation: Emps{ ! knt:count(deptno!) }
empno  ename   deptno  …
7839   KING    10      …
7499   ALLEN   30      …
7654   MARTIN  30      …
7698   BLAKE   30      …
7844   TURNER  30      …
7986   STERN   50      …
Emps{ ! knt:count(deptno !) } gives the same result as Emps{ deptno ! }{ ! knt:count(deptno) }: first { deptno ! } leaves one tuple per distinct deptno, then { ! knt:count(deptno) } counts them, so knt = 3.
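The same equivalence can be checked in SQL through Python's sqlite3 module. The rows are the ones on the slide; using SQLite and writing the two-step form as a subquery in the FROM clause are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [
        (7839, "KING", 10),
        (7499, "ALLEN", 30),
        (7654, "MARTIN", 30),
        (7698, "BLAKE", 30),
        (7844, "TURNER", 30),
        (7986, "STERN", 50),
    ],
)

# Emps{ ! knt:count(deptno !) }
distinct_count = conn.execute(
    "SELECT count(DISTINCT deptno) AS knt FROM Emps"
).fetchone()[0]

# Emps{ deptno ! }{ ! knt:count(deptno) }
two_step_count = conn.execute(
    "SELECT count(deptno) AS knt FROM (SELECT DISTINCT deptno FROM Emps)"
).fetchone()[0]
```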

72 © Ellis Cohen Grouped Aggregation

73 © Ellis Cohen REAL Grouped Aggregate Squeeze: Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }. The grouped aggregate squeeze takes Emps (empno, deptno, sal), groups by deptno, and aggregates each group, producing a relation with the attributes deptno, avgsal and maxsal. A Grouped Aggregate Squeeze results in a relation with one tuple for each group!

74 © Ellis Cohen SQL vs REAL Grouping. SQL: SELECT deptno, avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps GROUP BY deptno. GROUPING in REAL: Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }. DON'T include deptno again after the !; the result already has attributes deptno and avgsal and maxsal.

75 © Ellis Cohen GROUP and DISTINCT Compare the results of SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps How would you write these both in REAL?

76 © Ellis Cohen Answer: GROUP and DISTINCT SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps Emps{ job ! } Identical Results!

77 © Ellis Cohen Composite Grouping. SELECT deptno, job, count(*) AS knt FROM Emps GROUP BY deptno, job. Emps{ deptno, job ! knt:count(*) }. How many employees hold each job within each department? Starting from Emps (empno, deptno, job), the result is:
deptno  job       knt
10      CLERK     1
30      ANALYST   2
30      CLERK     1
30      SALESMAN  1
50      CLERK     2
50      SALESMAN  1
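Composite grouping can be sketched in Python with collections.Counter. The (deptno, job) rows below are reconstructed from the counts in the slide's result, and empno is omitted because its values did not survive; both choices are assumptions of this sketch.

```python
from collections import Counter

# Rows reconstructed from the slide's result counts (empno omitted).
Emps = [
    {"deptno": 10, "job": "CLERK"},
    {"deptno": 30, "job": "ANALYST"},
    {"deptno": 30, "job": "ANALYST"},
    {"deptno": 30, "job": "CLERK"},
    {"deptno": 30, "job": "SALESMAN"},
    {"deptno": 50, "job": "CLERK"},
    {"deptno": 50, "job": "CLERK"},
    {"deptno": 50, "job": "SALESMAN"},
]

# Emps{ deptno, job ! knt:count(*) }: one tuple per (deptno, job) group.
group_sizes = Counter((row["deptno"], row["job"]) for row in Emps)
result = [
    {"deptno": deptno, "job": job, "knt": knt}
    for (deptno, job), knt in sorted(group_sizes.items())
]
```

The sort is only there to make the output predictable; relations in REAL are unordered.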

78 © Ellis Cohen Grouping & Distinct Aggregation. SELECT deptno, count(DISTINCT job) AS njob FROM Emps GROUP BY deptno. Emps{ deptno ! njob:count(job !) }. How many different jobs are there within each department? The result has the attributes deptno and njob, with one tuple per department.

79 © Ellis Cohen Distinct Counts Problem What's the difference between Emps{ deptno ! knt:count(job!) } Emps{ deptno, job ! } { deptno ! knt:count(job) }

80 © Ellis Cohen Diagram for Grouping Exercise. Emps{ deptno ! knt:count(job!) } can also be computed in two steps. Step 1: { deptno, job ! } yields the distinct (deptno, job) pairs:
deptno  job
10      CLERK
30      ANALYST
30      CLERK
30      SALESMAN
50      CLERK
50      SALESMAN
Step 2: { deptno ! knt:count(job) } counts the jobs listed for each department.

81 © Ellis Cohen Distinct Counts With NULLs Emps{ deptno ! knt:count(job!) } – this ignores employees with NULL jobs Emps{ deptno, job ! } { deptno ! knt:count(job) } – No difference! This also ignores employees with NULL jobs Emps{ deptno, job ! } { deptno ! knt:count(*) } – the count will be one higher if any employees have NULL jobs
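The three variants can be compared in SQL using Python's sqlite3 module. This is a sketch under assumptions: the rows are hypothetical (department 10 contains one employee with a NULL job), SQLite stands in for whichever database is actually used, and each two-step REAL expression is written as a subquery in the FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, job TEXT)")
# Hypothetical sample rows; None is stored as NULL.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(1, 10, "CLERK"), (2, 10, None), (3, 30, "CLERK"), (4, 30, "CLERK"), (5, 30, "SALESMAN")],
)

# Emps{ deptno ! knt:count(job!) }
distinct_jobs = conn.execute(
    "SELECT deptno, count(DISTINCT job) FROM Emps GROUP BY deptno ORDER BY deptno"
).fetchall()

# Emps{ deptno, job ! }{ deptno ! knt:count(job) }
two_step_jobs = conn.execute(
    "SELECT deptno, count(job) FROM (SELECT DISTINCT deptno, job FROM Emps) "
    "GROUP BY deptno ORDER BY deptno"
).fetchall()

# Emps{ deptno, job ! }{ deptno ! knt:count(*) }
two_step_star = conn.execute(
    "SELECT deptno, count(*) FROM (SELECT DISTINCT deptno, job FROM Emps) "
    "GROUP BY deptno ORDER BY deptno"
).fetchall()
```

Only department 10, the one with a NULL job, gets a higher count from the count(*) variant.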

82 © Ellis Cohen Group Restriction

83 © Ellis Cohen Group Restriction Problem. Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }: find the average and maximum salary of the employees in each department. But suppose we only care about departments where the average salary is > 2000.

84 © Ellis Cohen REAL Group Restriction. Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]. Suppose we want to only keep departments whose average salary > 2000. First the grouped aggregate squeeze produces a (deptno, avgsal, maxsal) tuple per department; then [ avgsal > 2000 ] keeps those groups whose average salary > 2000.

85 © Ellis Cohen Projected Group Restriction Exercise The preceding result has deptno, avgsal & maxsal attributes Write REAL to Determine just the deptno and the maximum salary of those departments where the average salary > 2000

86 © Ellis Cohen REAL Projected Group Restriction. Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]{ deptno, maxsal }. Step 1: { deptno ! avgsal:avg(sal), maxsal:max(sal) } produces a (deptno, avgsal, maxsal) tuple per department. Step 2: [ avgsal > 2000 ] keeps the departments whose average salary > 2000. Step 3: { deptno, maxsal } keeps just the deptno and maxsal attributes.

87 © Ellis Cohen REAL HAVING. Determine the deptno and the maximum salary of those departments where the average salary > 2000. SQL: SELECT deptno, max(sal) AS maxsal FROM Emps GROUP BY deptno HAVING avg(sal) > 2000. REAL: Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]{ deptno, maxsal }

88 © Ellis Cohen Group Restriction Exercise Using Emps( empno, ename, job, sal, comm, deptno ) Write the REAL expression for the following: Show the average salary per job, excluding those jobs found only in a single department

89 © Ellis Cohen Answer to Group Restriction Exercise Show the average salary per job, excluding those jobs found only in a single department Emps{ job ! avgsal:avg(sal), knt:count(deptno!) } [knt > 1]{ job, avgsal } SELECT job, avg(sal) AS avgsal FROM Emps GROUP BY job HAVING count(DISTINCT deptno) > 1
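The HAVING query and the step-by-step REAL pipeline can be compared with Python's sqlite3 module. This is a sketch under assumptions: the rows are hypothetical, SQLite stands in for the target database, and the REAL pipeline (group, restrict on knt, project) is written as a subquery followed by WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, job TEXT, sal INTEGER, deptno INTEGER)")
# Hypothetical sample rows: only CLERK appears in more than one department.
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?, ?)",
    [
        (1, "CLERK", 1000, 10),
        (2, "CLERK", 2000, 30),
        (3, "ANALYST", 3000, 20),
        (4, "ANALYST", 3100, 20),
        (5, "SALESMAN", 1500, 30),
    ],
)

# SQL form with HAVING.
with_having = conn.execute("""
    SELECT job, avg(sal) AS avgsal
    FROM Emps
    GROUP BY job
    HAVING count(DISTINCT deptno) > 1
    ORDER BY job
""").fetchall()

# Emps{ job ! avgsal:avg(sal), knt:count(deptno!) }[knt > 1]{ job, avgsal }
step_by_step = conn.execute("""
    SELECT job, avgsal
    FROM (SELECT job, avg(sal) AS avgsal, count(DISTINCT deptno) AS knt
          FROM Emps
          GROUP BY job)
    WHERE knt > 1
    ORDER BY job
""").fetchall()
```

The second query mirrors the REAL expression: HAVING is simply a restriction applied after the grouped aggregate squeeze.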