

SQL, RA, Sets, Bags
- Fundamental difference between theoretical relational algebra (RA) and its practical application in DBMSs and SQL:
  - RA uses sets
  - SQL uses bags (multisets)
- There are good performance reasons for using bags:
  - Queries often involve two or more joins, unions, etc.; eliminating duplicates would require an extra pass through the relation being built
  - There are times we WANT every instance, particularly for aggregate functions (e.g., taking an average)
- Downside: extra memory

Section 5.1
- Topics include:
  - Union, difference, and intersection, and how they are affected by operating over bags
  - Projection operator over bags
  - Selection operator over bags
  - Product and join over bags
  - All of the above behave as you would expect
- Other topics in 5.1: algebraic laws of set operators applied to bags
- Make sure you remember duplicate behavior. If t appears in R m times and in S n times, then:
  - R ∪ S has m+n copies of t
  - R ∩ S has min(m,n) copies of t
  - R - S has max(0,m-n) copies of t
- Projection and selection do not eliminate the duplicates that result

Examples: set operators over bags
  {1,2,1} ∪ {1,1,2,3,1} = {1,1,1,1,1,2,2,3}
  {1,2,1,1} ∩ {1,2,1,3} = {1,1,2}
  {1,2,1,1,1} - {1,1,2,3} = {1,1}
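The three bag examples can be checked mechanically. The sketch below is only an illustration, assuming Python's collections.Counter as the bag representation (element mapped to multiplicity); the helper names bag and as_sorted_list are made up for this example and are not from the slides or the text.

```python
from collections import Counter


def bag(*items):
    """Build a bag (multiset) as a Counter: element -> number of copies."""
    return Counter(items)


def as_sorted_list(b):
    """Expand a bag into a sorted list so that duplicates are visible."""
    return sorted(b.elements())


# Bag union: multiplicities add, giving m + n copies of each element.
union = bag(1, 2, 1) + bag(1, 1, 2, 3, 1)

# Bag intersection: min(m, n) copies of each element.
intersection = bag(1, 2, 1, 1) & bag(1, 2, 1, 3)

# Bag difference: max(0, m - n) copies of each element.
difference = bag(1, 2, 1, 1, 1) - bag(1, 1, 2, 3)

print(as_sorted_list(union))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(as_sorted_list(intersection))  # [1, 1, 2]
print(as_sorted_list(difference))    # [1, 1]
```

Counter's +, & and - operators happen to match the m+n, min(m,n) and max(0,m-n) rules, which is why no custom arithmetic is needed here.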

Exercise 5.1.3a

Exercise 5.1.3b
  π_bore(Ships ⋈ Classes)

More relational algebra

δ: Duplicate elimination
- δ(R) eliminates duplicates from relation R (i.e., converts a relation from a bag to a set representation)
- R2 := δ(R1)
- R2 consists of one copy of each tuple that appears in R1 one or more times
- SQL: the DISTINCT modifier in a SELECT statement

δ: Example
  R = ( A  B )
        1  2
        3  4
  δ(R) =
        A  B
        1  2
        3  4

τ: Sorting
- R2 := τL(R1)
- L is a list of some attributes of R1; it specifies the order of sorting
- Sorts in increasing order
- Tuples that agree on every attribute in L may appear in any order relative to each other
- Benefits:
  - Obvious: ordered output
  - Not so obvious: stored sorted relations can substantially speed up queries
  - Recall that binary search runs in O(log n), which is far better than the O(n) of a linear scan
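To make the "not so obvious" benefit concrete, here is a small sketch assuming the relation fits in a Python list of tuples. The function name contains_sorted and the sample tuples are hypothetical; a real DBMS would use indexes or sorted files rather than an in-memory list.

```python
from bisect import bisect_left


def contains_sorted(sorted_rows, row):
    """Membership test by binary search: O(log n) comparisons on sorted input."""
    i = bisect_left(sorted_rows, row)
    return i < len(sorted_rows) and sorted_rows[i] == row


# Hypothetical relation R(A, B), stored as a bag of tuples in arbitrary order.
relation = [(3, 4), (1, 2), (3, 2), (1, 3)]

# tau_{A,B}(R): sort by A, then by B (Python compares tuples lexicographically).
tau = sorted(relation)

print(tau)                           # [(1, 2), (1, 3), (3, 2), (3, 4)]
print(contains_sorted(tau, (3, 2)))  # True
print(contains_sorted(tau, (2, 2)))  # False
```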

Aggregation Operators
- Used to summarize the values in an attribute of a relation
- Each produces a single value as a result
- SUM(attr), AVG(attr), MIN(attr), MAX(attr), COUNT(attr)

Example: Aggregation
  R = ( A  B )
        1  3
        3  4
        3  2
  SUM(A), COUNT(A), MAX(B), AVG(B) = ?
  SUM(A) = 7
  COUNT(A) = 3
  MAX(B) = 4
  AVG(B) = 3

Grouping Operator
- R2 := γL(R1)
- L is a list of elements, each of which is either:
  - An individual attribute of R1, called a grouping attribute
  - An aggregated attribute of R1; use an arrow and a new name to rename the resulting component
- R2 projects only what is in L

How does γL(R) work?
- Form one group for each distinct list of values of the grouping attributes in R
- Within each group, compute AGG(A) for each aggregation in L
- The result has one tuple for each group, consisting of:
  - The grouping attributes' values for the group
  - The aggregations over all tuples of the group (for the aggregated attributes)

Example: Grouping / Aggregation
  R = ( A  B  C )
        1  2  3
        4  5  6
        1  2  5
        1  3  5
  γA,B,AVG(C)->X (R) = ??
  First, partition R by A and B:
        A  B  C
        1  2  3
        1  2  5
        4  5  6
        1  3  5
  Then, average C within groups:
        A  B  X
        1  2  4
        4  5  6
        1  3  5
  Note that groups are formed from ALL grouping attributes first
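The partition-then-aggregate steps can be sketched in a few lines. This is an illustration only: the function gamma and its parameters are invented for this example, rows are assumed to be Python dicts, and only a single aggregated attribute is handled.

```python
from collections import defaultdict


def gamma(rows, group_by, agg_attr, agg):
    """Sketch of the grouping operator with one aggregated attribute.

    Step 1: partition the rows by the values of ALL grouping attributes.
    Step 2: apply the aggregate to agg_attr within each group.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in group_by)
        groups[key].append(row[agg_attr])
    return [key + (agg(values),) for key, values in groups.items()]


def avg(values):
    return sum(values) / len(values)


R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 1, "B": 2, "C": 5},
    {"A": 1, "B": 3, "C": 5},
]

# gamma_{A, B, AVG(C) -> X}(R)
result = gamma(R, ["A", "B"], "C", avg)
print(sorted(result))  # [(1, 2, 4.0), (1, 3, 5.0), (4, 5, 6.0)]
```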

Note about aggregation
- If R is a relation with attributes A1…An, then δ(R) = γA1,A2,…,An(R)
- Grouping on ALL attributes of R eliminates duplicates, i.e., δ is not really necessary
- Also, if R is a set, then πA1,A2,…,An(R) = γA1,A2,…,An(R)

Extended Projection
- Recall R2 := πL(R1): R2 contains only the attributes of R1 listed in L
- L can be extended to allow arbitrary expressions:
  - Renaming (e.g., A -> B)
  - Arithmetic expressions (e.g., A + B -> SUM)
  - Duplicate attributes (i.e., an attribute included in L multiple times)

Example: Extended Projection
  R = ( A  B )
        1  2
        3  4
  πA+B->C,A,A (R) =
        C  A1  A2
        3  1   1
        7  3   3
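One way to picture extended projection is as a list of (output name, expression) pairs evaluated once per tuple. The sketch below assumes rows are dicts; extended_projection is a hypothetical helper, not an operator from any library.

```python
def extended_projection(rows, expressions):
    """Sketch of extended projection.

    expressions is a list of (output_name, function) pairs; each function
    maps an input row to one output component. Duplicates are kept (bag
    semantics), and the same attribute may be listed more than once.
    """
    names = [name for name, _ in expressions]
    out = [tuple(f(row) for _, f in expressions) for row in rows]
    return names, out


R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

# pi_{A+B -> C, A, A}(R); the two copies of A are named A1 and A2.
names, projected = extended_projection(
    R,
    [
        ("C", lambda r: r["A"] + r["B"]),
        ("A1", lambda r: r["A"]),
        ("A2", lambda r: r["A"]),
    ],
)
print(names)      # ['C', 'A1', 'A2']
print(projected)  # [(3, 1, 1), (7, 3, 3)]
```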

Outer joins
- Recall that the standard natural join produces a tuple only when there is a match from both relations
- A tuple of R that has NO tuple of S with which it can join is said to be dangling; vice versa for tuples of S
- Outer join: preserves dangling tuples in the join, with missing components set to NULL
- Notation: R |>◦<|C S, where C is the join condition (this is a bad approximation of the symbol; see the text)
- With no C, it is the natural outer join
- We visited this already: SELECT * FROM R LEFT JOIN S ON…

Example: Outer Join
  R = ( A  B )      S = ( B  C )
        1  2              2  3
        4  5              6  7
  (1,2) joins with (2,3), but the other two tuples are dangling.
  R |>◦<| S =
        A     B  C
        1     2  3
        4     5  NULL
        NULL  6  7
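The padding of dangling tuples can be traced with a small nested-loop sketch. It assumes R(A, B) and S(B, C) are lists of tuples and uses Python's None in place of NULL; natural_full_outer_join is a made-up name, and the quadratic loop is chosen for clarity, not speed.

```python
def natural_full_outer_join(r_rows, s_rows):
    """Sketch of the natural full outer join of R(A, B) and S(B, C)."""
    result = []
    matched_s = set()  # positions of S tuples that joined with something
    for a, b in r_rows:
        partners = [i for i, (s_b, _) in enumerate(s_rows) if s_b == b]
        if partners:
            for i in partners:
                result.append((a, b, s_rows[i][1]))
                matched_s.add(i)
        else:
            # Dangling tuple of R: pad the S-only attribute C with NULL.
            result.append((a, b, None))
    for i, (b, c) in enumerate(s_rows):
        if i not in matched_s:
            # Dangling tuple of S: pad the R-only attribute A with NULL.
            result.append((None, b, c))
    return result


R = [(1, 2), (4, 5)]
S = [(2, 3), (6, 7)]
joined = natural_full_outer_join(R, S)
print(joined)  # [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```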

Types of outer joins
- R |>◦<| S: no condition, requires matching attributes; pads dangling tuples from both sides
- R |>◦<|L S: pads dangling tuples of R only
- R |>◦<|R S: pads dangling tuples of S only
- SQL:
    R NATURAL {LEFT | RIGHT} JOIN S
    R {LEFT | RIGHT} JOIN S ON condition
- NOTE: MySQL does not support FULL OUTER JOIN, only LEFT or RIGHT
  - Just UNION a left outer join and a right outer join… mostly (UNION also collapses duplicate rows, which a full outer join over bags would keep)

  A+B  A²  B²
  1    0   1
  5    4   9
  6    4   16
  7    9   16

  B+1  C-1
  1    0
  3    3
  3    4
  4    3
  1    1

  A  B
  0  1
  2  3
  2  4
  3  4

  SELECT A, SUM(B) FROM R GROUP BY A;

  A  SUM(B)
  0  2
  2  7
  3  4

  SELECT A FROM R GROUP BY A;
  SELECT DISTINCT A FROM R;

  A
  2
  3

  SELECT A, MAX(C) FROM R NATURAL JOIN S GROUP BY A;

  A  MAX(C)
  2  4

  What if MAX(C) were SUM(C)? SUM(C) gives you:

  A  SUM(C)
  2  8

  SELECT * FROM R NATURAL LEFT JOIN S;

  A  B  C
  2  3  4
  0  1  ┴
  2  4  ┴
  3  4  ┴

  If the group has matches, there is no

  SELECT * FROM R NATURAL RIGHT JOIN S;

  A  B  C
  2  3  4
  ┴  0  1
  ┴  2  4
  ┴  2  5
  ┴  0  2

  If the group has matches, there is no

  SELECT * FROM R NATURAL LEFT JOIN S
  UNION
  SELECT * FROM R NATURAL RIGHT JOIN S;

  A  B  C
  2  3  4
  0  1  ┴
  2  4  ┴
  3  4  ┴
  ┴  0  1
  ┴  2  4
  ┴  2  5
  ┴  0  2

  We could use UNION ALL, but then we end up with FOUR copies of (2,3,4), since it is included in both the LEFT and the RIGHT join. Right?

  SELECT * FROM R NATURAL LEFT JOIN S
  UNION ALL
  SELECT * FROM R NATURAL RIGHT JOIN S WHERE A IS NULL;

- You can NOT use comparison operators (=, <>, <, and so on) to test for NULL
- NULL means "missing value"; a comparison with NULL evaluates to UNKNOWN (NULL), never to TRUE, so a WHERE clause rejects the row
- Use IS NULL or IS NOT NULL to check for NULL
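The UNION ALL pattern can be tried without a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than MySQL, reuses the small R and S from the outer join example rather than the exercise data, and writes the right join as a left join with the operands swapped, because older SQLite versions have no RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER, B INTEGER);
    CREATE TABLE S (B INTEGER, C INTEGER);
    INSERT INTO R VALUES (1, 2), (4, 5);
    INSERT INTO S VALUES (2, 3), (6, 7);
""")

# Left outer join, plus only the dangling tuples of S (those where A is NULL).
# "S NATURAL LEFT JOIN R" plays the role of "R NATURAL RIGHT JOIN S".
full_outer = conn.execute("""
    SELECT A, B, C FROM R NATURAL LEFT JOIN S
    UNION ALL
    SELECT A, B, C FROM S NATURAL LEFT JOIN R WHERE A IS NULL
""").fetchall()

# Comparing with "= NULL" never evaluates to TRUE, so no row qualifies.
equals_null_count = conn.execute(
    "SELECT COUNT(*) FROM S NATURAL LEFT JOIN R WHERE A = NULL"
).fetchone()[0]

print(full_outer)
print(equals_null_count)  # 0
```

Because the second branch keeps only rows where A is NULL, matched tuples come from the left join alone and are not doubled by UNION ALL.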

  A  R.B  S.B  C
  0  1    2    4
  0  1    2    5
  0  1    3    4
  2  3    ┴    ┴
  2  4    ┴    ┴
  3  4    ┴    ┴
  ┴  ┴    0    1
  ┴  ┴    0    2

Back to SQL

Aggregations
- SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause
- Each produces an aggregation over the attribute
- COUNT(*) counts the number of tuples
- Use DISTINCT inside an aggregation to eliminate duplicates before the function is applied

Example: Sells(bar, beer, price)
- Find the average price of Guinness
    SELECT AVG(price) FROM Sells WHERE beer = 'Guinness';
- Find the number of different prices charged for Guinness
    SELECT COUNT(DISTINCT price) AS "# Prices" FROM Sells WHERE beer = 'Guinness';
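To see the effect of DISTINCT inside an aggregate, the two queries can be run against a tiny table. This is a sketch using Python's sqlite3 module instead of MySQL; the bar names and prices are invented sample data, not values from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    -- Hypothetical rows: two bars charge the same price for Guinness.
    INSERT INTO Sells VALUES
        ('Bar A', 'Guinness', 5.0),
        ('Bar B', 'Guinness', 5.0),
        ('Bar C', 'Guinness', 8.0),
        ('Bar A', 'Other beer', 3.0);
""")

avg_price = conn.execute(
    "SELECT AVG(price) FROM Sells WHERE beer = 'Guinness'"
).fetchone()[0]

# COUNT(price) counts every tuple; COUNT(DISTINCT price) counts each price once.
all_prices, distinct_prices = conn.execute(
    "SELECT COUNT(price), COUNT(DISTINCT price) "
    "FROM Sells WHERE beer = 'Guinness'"
).fetchone()

print(avg_price)        # 6.0
print(all_prices)       # 3
print(distinct_prices)  # 2
```

Note that AVG still uses all three tuples (bag semantics), which is why the average is 6.0 rather than 6.5.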

Grouping
    SELECT attr(s) FROM tbls WHERE cond_expr GROUP BY attr(s)
- The SELECT-FROM-WHERE relation is determined FIRST, then grouped according to the GROUP BY clause
- Older MySQL versions (before 8.0) also sort the result by the attributes listed in the GROUP BY clause, and therefore allow an optional ASC or DESC there (just like ORDER BY); MySQL 8.0 no longer does this, so add an explicit ORDER BY when order matters
- Aggregations are applied only within each group

Grouping and NULLS

Note on NULL and Aggregation
- NULL values in a tuple:
  - never contribute to a sum, average, or count
  - can never be the min or max of an attribute
- If all values for an attribute are NULL, then the result of an aggregation is NULL
  - Exception: COUNT of an empty set is 0
- NULL values are treated as ordinary values when forming groups
- Could illustrate NULL grouping with
    SELECT * FROM Outcomes LEFT JOIN Ships ON (Outcomes.ship = Ships.name);
  then add GROUP BY launched and change the SELECT list to launched, COUNT(*)
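These NULL rules are easy to check on a toy table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the table T with columns g and x is hypothetical and exists only to exercise the rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (g TEXT, x INTEGER);
    -- Hypothetical rows with NULLs in both the grouping and the value column.
    INSERT INTO T VALUES ('a', 1), ('a', NULL), (NULL, 3), (NULL, 5), ('b', NULL);
""")

# NULLs in x are skipped by every aggregate except COUNT(*).
overall = conn.execute(
    "SELECT COUNT(*), COUNT(x), SUM(x), AVG(x), MIN(x), MAX(x) FROM T"
).fetchone()

# Group 'b' has only NULLs in x: SUM is NULL, but COUNT(x) is 0.
all_null = conn.execute(
    "SELECT SUM(x), COUNT(x) FROM T WHERE g = 'b'"
).fetchone()

# NULL is treated as an ordinary value when forming groups.
group_sizes = dict(conn.execute("SELECT g, COUNT(*) FROM T GROUP BY g"))

print(overall)      # (5, 3, 9, 3.0, 1, 5)
print(all_null)     # (None, 0)
print(group_sizes)
```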

Example: Grouping
  Sells(bar, beer, price)
  Frequents(drinker, bar)
- Find the average price for each beer
    SELECT beer, AVG(price) FROM Sells GROUP BY beer;
- Find for each drinker the average price of Guinness at the bars they frequent
    SELECT drinker, AVG(price)
    FROM Frequents NATURAL JOIN Sells
    WHERE beer = 'Guinness'
    GROUP BY drinker;

Restrictions
- Example: find the bar that sells Guinness the cheapest
    SELECT bar, MIN(price) FROM Sells WHERE beer = 'Guinness';
- Is this correct? The book states that this is illegal SQL:
  - If an aggregation is used, then each SELECT element must either be aggregated or be an attribute in the GROUP BY clause
- MySQL allows the above (when the ONLY_FULL_GROUP_BY SQL mode is disabled), but such queries give meaningless results
  - What happens if multiple bars sell Guinness? How does the query know which bar has the minimum price?

Example of confusing aggregation
- Find the country of the ship with bore of 15 with the smallest displacement
    SELECT country, MIN(displacement) FROM Classes WHERE bore = 15;
- Demonstrate this!

Not quite the correct answer! Be sure to follow the rules for aggregation.

What if we wanted only the country of the class with the smallest displacement listed?
    SELECT country FROM Classes WHERE displacement IN (SELECT MIN(displacement) FROM Classes);
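The subquery approach can be compared with and without the bore = 15 restriction from the previous example. This is a sketch using Python's sqlite3 module; the Classes table is cut down to four columns, and the class names, countries, and displacements are placeholder values, not the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Classes (class TEXT, country TEXT, bore INTEGER, displacement INTEGER);
    -- Hypothetical rows: the overall smallest class does not have bore 15.
    INSERT INTO Classes VALUES
        ('c1', 'Country X', 15, 32000),
        ('c2', 'Country Y', 15, 29000),
        ('c3', 'Country Z', 16, 25000);
""")

# The slide's query: smallest displacement over ALL classes.
overall = [row[0] for row in conn.execute("""
    SELECT country FROM Classes
    WHERE displacement IN (SELECT MIN(displacement) FROM Classes)
""")]

# Assumed variant for the bore-15 question: the restriction is repeated
# inside the subquery so that both levels look at the same set of classes.
with_bore = [row[0] for row in conn.execute("""
    SELECT country FROM Classes
    WHERE bore = 15
      AND displacement = (SELECT MIN(displacement) FROM Classes WHERE bore = 15)
""")]

print(overall)    # ['Country Z']
print(with_bore)  # ['Country Y']
```

Either form avoids mixing an unaggregated attribute with MIN in the same SELECT list, so it follows the aggregation rules.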

HAVING Clause
    HAVING cond
- Follows a GROUP BY clause
- The condition applies to each possible group
- Groups not satisfying the condition are eliminated
- Rules for conditions in a HAVING clause:
  - Aggregated attributes: any attribute of a relation in the FROM clause can be aggregated; the aggregate applies only to the group being tested
  - Unaggregated attributes: only attributes in the GROUP BY list
  - MySQL is more lenient about this, though such conditions yield meaningless information

Example: HAVING
  Sells(bar, beer, price)
- Find the average price of those beers that are served in at least three bars
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(*) >= 3;
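A quick way to watch HAVING drop a group is to run the query on a few rows. The sketch below uses Python's sqlite3 module in place of MySQL; the bars, beers ('b1', 'b2'), and prices are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    -- Hypothetical rows: b1 is sold in three bars, b2 in only two.
    INSERT INTO Sells VALUES
        ('Bar A', 'b1', 4.0),
        ('Bar B', 'b1', 5.0),
        ('Bar C', 'b1', 6.0),
        ('Bar A', 'b2', 3.0),
        ('Bar B', 'b2', 7.0);
""")

# Without HAVING: one result tuple per group.
all_groups = conn.execute(
    "SELECT beer, AVG(price) FROM Sells GROUP BY beer ORDER BY beer"
).fetchall()

# With HAVING: COUNT(*) is evaluated per group, and b2 (two bars) is eliminated.
popular = conn.execute(
    "SELECT beer, AVG(price) FROM Sells GROUP BY beer HAVING COUNT(*) >= 3"
).fetchall()

print(all_groups)  # [('b1', 5.0), ('b2', 5.0)]
print(popular)     # [('b1', 5.0)]
```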

Example: HAVING
  Sells(bar, beer, price)
  Beers(name, manf)
- Find the average price of beers that are either served in at least three bars or are manufactured by Sam Adams
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(*) >= 3
        OR beer IN (SELECT name FROM Beers WHERE manf = 'Sam Adams');
- Demonstrates a condition on both an aggregated and an unaggregated attribute

Find the average displacement of ships from each country having at least two classes
    SELECT country, AVG(displacement)
    FROM Classes
    GROUP BY country
    HAVING COUNT(*) >= 2;
Notice that you can use COUNT(*) >= 2 because each aggregated condition applies only to the group!

Summary so far
    SELECT S FROM R1,…,Rn WHERE C1 GROUP BY a1,…,ak HAVING C2 ORDER BY b1,…,bk;
- S: attributes from R1,…,Rn, or aggregates
- C1: conditions on attributes of R1,…,Rn
- a1,…,ak: attributes from R1,…,Rn
- C2: conditions on any aggregation, or on any attribute in the GROUP BY clause
- b1,…,bk: attributes from R1,…,Rn

Exercises

Exercise 6.2.3f
    SELECT battle
    FROM Outcomes INNER JOIN Ships ON Outcomes.ship = Ships.name
         NATURAL JOIN Classes
    GROUP BY country, battle
    HAVING COUNT(ship) >= 3;
Now this makes sense!

Exercise 6.4.7a
    SELECT COUNT(type) FROM Classes WHERE type = 'bb';

Exercise 6.4.7b
    SELECT AVG(numGuns) AS 'Avg Guns' FROM Classes WHERE type = 'bb';

Exercise 6.4.7c
    SELECT AVG(numGuns) AS 'Avg Guns' FROM Classes NATURAL JOIN Ships WHERE type = 'bb';

Exercise 6.4.7d
    SELECT class, MIN(launched) AS First_Launched FROM Classes NATURAL JOIN Ships GROUP BY class;

Exercise 6.4.7e
    SELECT C.class, COUNT(O.ship) AS '# sunk'
    FROM Classes AS C NATURAL JOIN Ships AS S
         INNER JOIN Outcomes AS O ON S.name = O.ship
    WHERE O.result = 'sunk'
    GROUP BY C.class;
Why so few? There are many classes, many ships, and many outcomes. So why so few answers?
    SELECT * FROM Outcomes LEFT JOIN Ships ON Outcomes.ship = Ships.name WHERE result = 'sunk';
Notice: only ONE match!