Relational Database Systems 1 Instructor: Prof. James Cheng Acknowledgement: The slides are extracted from and modified based on the slides provided by Prof. Sourav S. Bhowmick from Nanyang Technological University.
“Relational databases are the foundation of western civilization.” Bruce Lindsay IBM Fellow IBM Almaden Research Center
Topics to be covered
ER model
Relational Algebra
SQL
Storage and Index Structures
Query Processing and Query Optimization
Entity/Relationship Model (ER Model)
The first step of database design:
Analyze: determine what information should be stored in the database.
Relationships: identify the relationships between the components of that information.
A popular approach to this step is the Entity/Relationship model.
Purpose of E/R Diagram
Design database informally: the E/R model allows us to sketch the design of a database informally, describing the schema of the database.
Graphical: designs are pictures called entity-relationship diagrams.
Conversion to implementation: there are mechanical ways to convert E/R diagrams to real implementations.
The Process: Ideas → E/R Design → Relational Schema → RDBMS
Example
Meaning: Bars sell some beers. Drinkers like some beers. Drinkers visit some bars.
E/R diagram: entity sets Bars (name, addr), Beers (name, manf), and Drinkers (ID, name, addr), connected by the relationships Sells, Likes, and Visits.
Relations:
Beers(name, manf)
Bars(name, addr)
Drinkers(ID, name, addr)
Sells(bar, beer)
Visits(drinker, bar)
Likes(drinker, beer)
Place in the big picture
Declarative query language (SQL, relational calculus) → Algebra (relational algebra) → Implementation
Core Relational Algebra
Union, Intersection, and Difference: the usual set operations, but both operands must have the same relation schema.
Selection: picking certain rows.
Projection: picking certain columns.
Products and Joins: compositions of relations.
Renaming: renaming of relations and attributes.
Union
Union operator: builds a relation consisting of all tuples appearing in either or both of two specified relations. It combines all rows from two tables, excluding duplicate rows.
Rule: the two tables must have the same attribute characteristics.
Example: R1 ∪ R2
R1 (Name, Addr, favBeer): Pauline hku Heineken; Joe cuhk Bud
R2 (Name, Addr, favBeer): Eve; Harry cuhk Tiger
R1 ∪ R2 (Name, Addr, favBeer): Pauline hku Heineken; Eve; Joe cuhk Bud; Harry Tiger
Intersection
Intersection operator: builds a relation consisting of all tuples appearing in both of two specified relations.
Results: yields only the rows that appear in both tables.
Example: R1 ∩ R2
R1 (Name, Addr, favBeer): Pauline hku Heineken; Joe cuhk Bud
R2 (Name, Addr, favBeer): Eve; Harry cuhk Tiger
R1 ∩ R2 (Name, Addr, favBeer): Pauline hku Heineken
Difference
Difference operator: builds a relation consisting of all tuples appearing in the first relation but not in the second.
Results: it subtracts one table from the other.
Example: R1 - R2
R1 (Name, Addr, favBeer): Pauline hku Heineken; Joe cuhk Bud
R2 (Name, Addr, favBeer): Eve; Harry cuhk Tiger
R1 - R2 (Name, Addr, favBeer): Joe cuhk Bud
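As a minimal sketch, the three set operators can be mirrored with Python's built-in set type. R1 follows the slides; the rows of R2 are assumptions chosen so that the intersection and difference match the examples.

```python
# Relations modeled as sets of (name, addr, fav_beer) tuples.
# R1 follows the slides; R2's rows are illustrative assumptions.
R1 = {("Pauline", "hku", "Heineken"), ("Joe", "cuhk", "Bud")}
R2 = {("Pauline", "hku", "Heineken"), ("Harry", "cuhk", "Tiger")}

union = R1 | R2          # tuples in either relation, duplicates removed
intersection = R1 & R2   # tuples that appear in both relations
difference = R1 - R2     # tuples in R1 that are not in R2

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because both operands are sets of tuples with the same arity and attribute order, the "same schema" rule is satisfied by construction.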
Selection
Selection operator: extracts specified tuples (rows) from a specified relation (table). It returns all tuples that satisfy a condition.
Representation: R1 := σ_C(R2)
C is a condition (as in “if” statements) that refers to attributes of R2.
R1 is all those tuples of R2 that satisfy C.
Example
JoeMenu := σ_{Bar = “Joe’s”}(Sells)
Sells (Bar, Beer, Price): Ku De Ta Miller 9.00; Bud 7.60; Clinic; Harry’s Tiger 9.50
JoeMenu (Bar, Beer, Price): Joe’s Heineken 8.00; Bud 7.60
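A selection can be sketched as a filter over a set of tuples. The `Sale` type, the `select` helper, and the rows below are hypothetical names and data introduced for illustration; they are not the exact table from the slide.

```python
from typing import Callable, NamedTuple


class Sale(NamedTuple):
    bar: str
    beer: str
    price: float


# Illustrative rows (assumed), following the Sells(bar, beer, price) schema.
sells = {
    Sale("Joe's", "Heineken", 8.00),
    Sale("Joe's", "Bud", 7.60),
    Sale("Harry's", "Tiger", 9.50),
}


def select(relation: set, condition: Callable[[Sale], bool]) -> set:
    """sigma_C(R): keep exactly the tuples of R for which C holds."""
    return {t for t in relation if condition(t)}


# JoeMenu := sigma_{bar = "Joe's"}(Sells)
joe_menu = select(sells, lambda t: t.bar == "Joe's")
print(sorted(joe_menu))
```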
Projection
Projection operator: extracts specified attributes (columns) from a specified relation.
Representation: R1 := π_L(R2)
L is a list of attributes from the schema of R2.
R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1.
Eliminate duplicate tuples, if any.
Example
Prices := π_{Beer, Price}(Sells)
Sells (Bar, Beer, Price): Joe’s Heineken 8.00; Ku De Ta Miller 9.00; Bud 7.60; Clinic; Harry’s Tiger 9.50
Prices (Beer, Price): Heineken 8.00; Miller 9.00; Bud 7.60; Tiger 9.50
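The projection rule (extract the listed attributes in order, then drop duplicates) can be sketched as below. The `project` helper is a hypothetical name, and the assignment of bars to the two Bud rows is an assumption made so that a duplicate actually occurs.

```python
# Rows of Sells as dictionaries; the bar values of the Bud rows are assumed.
sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Ku De Ta", "beer": "Miller", "price": 9.00},
    {"bar": "Ku De Ta", "beer": "Bud", "price": 7.60},
    {"bar": "Clinic", "beer": "Bud", "price": 7.60},
    {"bar": "Harry's", "beer": "Tiger", "price": 9.50},
]


def project(relation, attrs):
    """pi_L(R): take the attributes in attrs, in order; the set drops duplicates."""
    return {tuple(row[a] for a in attrs) for row in relation}


# Prices := pi_{beer, price}(Sells)
prices = project(sells, ["beer", "price"])
print(sorted(prices))
```

Five input rows yield four output tuples here, because the two Bud rows collapse into one under set semantics.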
Cartesian Product
Cartesian product: builds a relation from two specified relations consisting of all possible concatenated pairs of tuples, one from each of the two relations.
Representation: R3 := R1 × R2
Pair each tuple t1 of R1 with each tuple t2 of R2.
The concatenation “t1 t2” is a tuple of R3.
The schema of R3 is the attributes of R1 followed by the attributes of R2, in order.
Beware of an attribute A with the same name in R1 and R2: use R1.A and R2.A.
Example: R1 × R2
Schemas: R1 (A, B); R2 (B, C); R1 × R2 (A, R1.B, R2.B, C)
1 2 5 6 3 3 4 7 8
1 2 3 4
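A short sketch of the product using `itertools.product` from the standard library. The tuple values are illustrative assumptions; only the schemas R1(A, B) and R2(B, C) come from the slide.

```python
from itertools import product

R1 = [(1, 2), (3, 4)]   # schema (A, B); values assumed for illustration
R2 = [(5, 6), (7, 8)]   # schema (B, C); values assumed for illustration

# Each result tuple is the concatenation t1 + t2.
# Result schema: (A, R1.B, R2.B, C); the two B columns are kept apart by qualification.
R3 = [t1 + t2 for t1, t2 in product(R1, R2)]

for row in R3:
    print(row)
```

The result always has len(R1) * len(R2) tuples, which is why products are usually combined with a selection in practice.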
Theta-Join
Join: builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition.
Representation: R3 := R1 ⋈_C R2
Take the product R1 × R2, then apply σ_C to the result:
R1 ⋈_C R2 = σ_C(R1 × R2)
Equi-Join
The condition C: C can be any boolean-valued condition. Historic versions of this operator allowed only A theta B, where theta was =, <, etc.; hence the name “theta-join.”
Equi-join: if C is a conjunction of equalities, the join is called an equi-join.
Example
BarInfo := Sells ⋈_{Sells.Bar = Bars.Name} Bars
Sells (Bar, Beer, Price): Joe’s Heineken 8.00; Bud 7.60; Pump Room
Bars (Name, Addr): Joe’s Scotts Rd; Pump Room Clark Quay; Harry’s Esplanade
BarInfo (Bar, Beer, Price, Name, Addr): Joe’s Heineken 8.00 Scotts Rd; Bud 7.60; Pump Room Clark Quay
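The identity "join = selection over product" translates directly into code. This is a sketch: `theta_join` is a hypothetical helper, the Bars rows follow the slide, and the Sells rows are assumed.

```python
from itertools import product

# Sells(bar, beer, price): rows assumed for illustration.
sells = [("Joe's", "Heineken", 8.00), ("Joe's", "Bud", 7.60), ("Pump Room", "Bud", 7.60)]
# Bars(name, addr): rows as on the slide.
bars = [("Joe's", "Scotts Rd"), ("Pump Room", "Clark Quay"), ("Harry's", "Esplanade")]


def theta_join(r1, r2, condition):
    """R1 join_C R2 = sigma_C(R1 x R2): filter the concatenated pairs by C."""
    return [t1 + t2 for t1, t2 in product(r1, r2) if condition(t1, t2)]


# BarInfo := Sells join_{Sells.bar = Bars.name} Bars  (an equi-join)
bar_info = theta_join(sells, bars, lambda s, b: s[0] == b[0])

for row in bar_info:
    print(row)
```

Harry's does not appear in the output because no Sells tuple matches it, and both the bar and name columns are kept, as in the slide's result schema.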
Natural Join
Natural join connects two relations by:
Equating attributes of the same name, and
Projecting out one copy of each pair of equated attributes.
Representation: R3 := R1 ⋈ R2
Example
BarInfo := Sells ⋈ Bars
Sells (Bar, Beer, Price): Joe’s Heineken 8.00; Bud 7.60; Pump Room
Bars (Bar, Addr): Joe’s Scotts Rd; Pump Room Clark Quay; Harry’s Esplanade
BarInfo (Bar, Beer, Price, Addr): Joe’s Heineken 8.00 Scotts Rd; Bud 7.60; Pump Room Clark Quay
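A natural join can be sketched with rows stored as dictionaries, so that shared attribute names are found automatically. `natural_join` is a hypothetical helper and the Sells rows are assumed.

```python
def natural_join(r1, r2):
    """Pair rows that agree on every shared attribute name; keep one copy of each."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                result.append({**t1, **t2})  # merging keeps a single 'bar' column
    return result


# Sells(bar, beer, price): rows assumed; Bars(bar, addr): rows as on the slide.
sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Joe's", "beer": "Bud", "price": 7.60},
    {"bar": "Pump Room", "beer": "Bud", "price": 7.60},
]
bars = [
    {"bar": "Joe's", "addr": "Scotts Rd"},
    {"bar": "Pump Room", "addr": "Clark Quay"},
    {"bar": "Harry's", "addr": "Esplanade"},
]

bar_info = natural_join(sells, bars)
for row in bar_info:
    print(row)
```

If the two relations share no attribute names, the condition is vacuously true and the function degenerates to the Cartesian product, which matches the usual definition.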
Expression Trees
Structure:
Leaves are operands: variables standing for relations.
Interior nodes are operators: applied to their child or children.
Example: using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Nathan Road or sell Bud for less than $7.
Sequence of Assignments
Example: using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Nathan Road or sell Bud for less than $7.
R1 := σ_{price < 7 AND beer = “Bud”}(Sells)
R2 := σ_{addr = “Nathan Rd”}(Bars)
R3 := π_bar(R1)
R4 := π_name(R2)
R5 := ρ_{R(name)}(R3)
R6 := R5 ∪ R4
Expression Tree
The same query, drawn as a tree with the relations Bars and Sells at the leaves:
UNION
  RENAME R(name)
    PROJECT bar
      SELECT price < 7 AND beer = “Bud”
        Sells
  PROJECT name
    SELECT addr = “Nathan Rd”
      Bars
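The six assignments for the Nathan Road query can be traced step by step on small relations. The data below is hypothetical; only the schemas Bars(name, addr) and Sells(bar, beer, price) and the conditions come from the slides.

```python
# Hypothetical data for Bars(name, addr) and Sells(bar, beer, price).
bars = {("Joe's", "Nathan Rd"), ("Sue's", "River Rd")}
sells = {("Joe's", "Bud", 7.50), ("Sue's", "Bud", 6.50), ("Sue's", "Miller", 8.00)}

R1 = {t for t in sells if t[2] < 7 and t[1] == "Bud"}  # selection on Sells
R2 = {t for t in bars if t[1] == "Nathan Rd"}          # selection on Bars
R3 = {(t[0],) for t in R1}                             # projection onto bar
R4 = {(t[0],) for t in R2}                             # projection onto name
R5 = R3               # renaming bar -> name changes the schema, not the tuples
R6 = R5 | R4          # union of two relations that now share the schema (name)

print(sorted(R6))
```

The renaming step matters for the algebra because union requires both operands to have the same schema; in this tuple-based sketch the attribute names are implicit, so it is a no-op.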
Relational Algebra on Bags
SQL: SQL, the most important query language for relational databases, is actually a bag language. SQL will eliminate duplicates, but usually only if you ask it to do so explicitly.
Efficiency: some operations, like projection, are much more efficient on bags than on sets.
Example - Projection
Beers := π_Beer(Sells)
Sells (Bar, Beer, Price): Joe’s Heineken 8.00; Ku De Ta Miller 9.00; Bud 7.60; Pump Room; Harry’s Tiger 9.50
Beers (Beer): Heineken; Miller; Bud; Tiger
Example - Union
R1 (Name, Addr, favBeer): Pauline hku Heineken; Joe cuhk Bud
R2 (Name, Addr, favBeer): Pauline hku Heineken; Eve; Harry cuhk Tiger
R1 ∪ R2 (Name, Addr, favBeer): Pauline hku Heineken; Eve; Joe cuhk Bud; Harry Tiger
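To make the difference between bag and set semantics concrete, the sketch below uses a plain list for bag projection and `collections.Counter` for bag union, where multiplicities add up. All rows are illustrative assumptions.

```python
from collections import Counter

# Sells(bar, beer, price): rows assumed for illustration.
sells = [
    ("Joe's", "Heineken", 8.00),
    ("Ku De Ta", "Miller", 9.00),
    ("Ku De Ta", "Bud", 7.60),
    ("Pump Room", "Bud", 7.60),
    ("Harry's", "Tiger", 9.50),
]

# Bag projection: one output row per input row, no duplicate elimination needed.
beers_bag = [beer for _bar, beer, _price in sells]
# Set projection: the extra deduplication step is what costs time.
beers_set = set(beers_bag)

# Bag union: a tuple occurring m times in R1 and n times in R2 occurs m + n times.
R1 = Counter([("Pauline", "hku", "Heineken"), ("Joe", "cuhk", "Bud")])
R2 = Counter([("Pauline", "hku", "Heineken"), ("Harry", "cuhk", "Tiger")])
bag_union = R1 + R2

print(beers_bag)
print(bag_union[("Pauline", "hku", "Heineken")])
```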
Operations on Relations
What do we want to do with the relations? Retrieve, insert, delete, and update.
SQL: Structured Query Language (SQL) is the standard query language for relational databases. It first became an official standard in 1986, as defined by the American National Standards Institute (ANSI). All major database vendors conform to the SQL standard with minor variations in syntax (different dialects).
SQL
Declarative language: SQL is a declarative (non-procedural) language. A SQL query specifies what to retrieve but not how to retrieve it.
Not a complete programming language: it does not have control or iteration commands.
Aspects of SQL
Data Manipulation Language (DML), the focus of this course: perform queries; perform updates.
Data Definition Language (DDL): create databases, tables, and indices; create views; specify authorization; specify integrity constraints.
Embedded SQL: wrap a Turing-complete programming language around DML to do more sophisticated queries and updates.
Principal Form of SQL
Basic structure of SQL:
  SELECT desired attributes (A1, A2, …, An)
  FROM one or more tables (R1, R2, …, Rm)
  WHERE condition about tuples of the tables (P)
Mapping to relational algebra: π_{A1, A2, …, An}(σ_P(R1 × R2 × … × Rm))
Our Running Example
Relational database:
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
Example
Query: using Beers(name, manf), what beers are made by Anheuser-Busch?
  SELECT name
  FROM Beers
  WHERE manf = 'Anheuser-Busch';
Results
Beers (Name, Manf): (Heineken, Dutch); (Bud, Anheuser-Busch); (Michelob, Anheuser-Busch); (Beck’s Beer, Bremen); (Bud Lite, Anheuser-Busch)
Result (Name): Bud; Michelob; Bud Lite
* In SELECT Clause
  SELECT *
  FROM Beers
  WHERE manf = 'Anheuser-Busch';
Beers (Name, Manf): (Heineken, Dutch); (Bud, Anheuser-Busch); (Michelob, Anheuser-Busch); (Beck’s Beer, Bremen); (Bud Lite, Anheuser-Busch)
Result (Name, Manf): (Bud, Anheuser-Busch); (Michelob, Anheuser-Busch); (Bud Lite, Anheuser-Busch)
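The two queries can be tried out with Python's built-in `sqlite3` module and an in-memory database. This is only a sketch: the table contents are an illustrative reading of the slide's Beers table, and the choice of SQLite is an assumption, not something the course prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [
        ("Heineken", "Dutch"),
        ("Bud", "Anheuser-Busch"),
        ("Michelob", "Anheuser-Busch"),
        ("Beck's Beer", "Bremen"),
        ("Bud Lite", "Anheuser-Busch"),
    ],
)

# Projection onto name, after selecting on manf.
names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM Beers WHERE manf = 'Anheuser-Busch'")
)

# SELECT * keeps every attribute of the selected tuples.
full_rows = conn.execute(
    "SELECT * FROM Beers WHERE manf = 'Anheuser-Busch'"
).fetchall()

print(names)
print(full_rows)
```

The results are sorted in Python because a query without ORDER BY does not guarantee any row order.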
Multi-Relation Queries
Motivation: queries often combine data from more than one relation. We can address several relations in one query by listing them all in the FROM clause. Distinguish attributes of the same name by “<relation>.<attribute>”.
Query: using Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Bar X.
  SELECT beer
  FROM Likes AS L, Frequents AS F
  WHERE bar = 'Bar X' AND F.drinker = L.drinker;
Example
Likes (Drinker, Beer): Melissa Heineken; Sean Bud; …; Sally
Frequents (Drinker, Bar): Sally MOS; Melissa Bar X; …
Result (Beer): Heineken
Explicit Tuple Variables
Motivation: a query may use two copies of the same relation. Distinguish the copies by following the relation name with the name of a tuple variable in the FROM clause. Renaming relations this way is always an option, even when it is not required.
Query: from Beers(name, manf), find all pairs of beers by the same manufacturer. Do not produce pairs like (Bud, Bud). Produce pairs in alphabetic order, e.g. (Bud, Miller).
  SELECT b1.name, b2.name
  FROM Beers b1, Beers b2
  WHERE b1.manf = b2.manf AND b1.name < b2.name;
Example
Two copies of Beers (Name, Manf), b1 and b2, each containing Heineken Dutch; Bud Anheuser-Busch; Beck’s Beer Bremen; Bud Lite.
  SELECT b1.name, b2.name
  FROM Beers b1, Beers b2
  WHERE b1.manf = b2.manf AND b1.name < b2.name;
Each pair of one tuple from b1 and one tuple from b2 is tested against the WHERE condition: the pair is output when the condition is True and discarded when it is False.
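The self-join with tuple variables can be run on an in-memory SQLite database through Python's `sqlite3` module. The rows are illustrative assumptions; the query is the one from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [
        ("Bud", "Anheuser-Busch"),
        ("Bud Lite", "Anheuser-Busch"),
        ("Michelob", "Anheuser-Busch"),
        ("Heineken", "Dutch"),
    ],
)

# b1 and b2 range independently over the same table.
# b1.name < b2.name removes (Bud, Bud) and keeps only one order of each pair.
pairs = sorted(
    conn.execute(
        """
        SELECT b1.name, b2.name
        FROM Beers b1, Beers b2
        WHERE b1.manf = b2.manf AND b1.name < b2.name
        """
    ).fetchall()
)

print(pairs)
```

Heineken never appears because no other beer shares its manufacturer in this data.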
Subqueries
A subquery (a nested SQL query) can appear in the SELECT clause, the FROM clause, or the WHERE clause of another SQL query.
Example
Query: from Sells(bar, beer, price), find the bars that serve Heineken for the same price Bar X charges for Bud.
Subqueries:
Find the price Bar X charges for Bud.
Find the bars that serve Heineken at that price.
Scalar Subquery
  SELECT bar
  FROM Sells
  WHERE beer = 'Heineken' AND
        price = (SELECT price
                 FROM Sells
                 WHERE bar = 'Bar X' AND beer = 'Bud');
Example
Sells (Bar, Beer, Price): Clinic Heineken 8.00; Bud 6.60; Bar X 7.90; MOS
Inner query:
  SELECT price FROM Sells WHERE bar = 'Bar X' AND beer = 'Bud';
Result (Price): 7.90
Outer query with the inner result substituted:
  SELECT bar FROM Sells WHERE beer = 'Heineken' AND price = 7.90;
Result (Bar): MOS
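A sketch of the scalar subquery, run with `sqlite3`. The Sells rows are assumptions chosen to agree with the results shown on the slide (Bar X charges 7.90 for Bud, and MOS serves Heineken at that price).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Clinic", "Heineken", 8.00),
        ("Clinic", "Bud", 6.60),
        ("Bar X", "Bud", 7.90),
        ("MOS", "Heineken", 7.90),
    ],
)

# Step 1: the inner query alone returns a single value.
inner_price = conn.execute(
    "SELECT price FROM Sells WHERE bar = 'Bar X' AND beer = 'Bud'"
).fetchone()[0]

# Step 2: the full query uses that value in the outer WHERE clause.
matching_bars = conn.execute(
    """
    SELECT bar FROM Sells
    WHERE beer = 'Heineken'
      AND price = (SELECT price FROM Sells
                   WHERE bar = 'Bar X' AND beer = 'Bud')
    """
).fetchall()

print(inner_price, matching_bars)
```

The subquery is usable as a scalar only because (bar, beer) identifies at most one row of Sells in this data.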
Operators in Table Subqueries
IN: <tuple> IN <relation> is true if and only if the tuple is a member of the relation.
EXISTS: EXISTS(<relation>) is true if and only if the <relation> is not empty, that is, if the nested query returns one or more tuples.
ANY: x = ANY(<relation>) is a boolean condition meaning that x equals at least one tuple in the relation.
ALL: x <> ALL(<relation>) is true if and only if for every tuple t in the relation, x is not equal to t.
Note: any of the comparison operators (<, <=, =, etc.) can be used with ANY and ALL. The keyword NOT can precede any of the operators (e.g. s NOT IN R).
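The sketch below exercises IN, NOT IN, and NOT EXISTS with `sqlite3`. SQLite does not implement ANY and ALL, so those two are not run here; per the SQL standard, x = ANY(R) behaves like x IN R and x <> ALL(R) like x NOT IN R. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.execute("CREATE TABLE Likes (drinker TEXT, beer TEXT)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Bud", "Anheuser-Busch"), ("Michelob", "Anheuser-Busch"), ("Heineken", "Dutch")],
)
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Sally", "Bud"), ("Melissa", "Heineken")],
)


def names(sql):
    """Run a one-column query and return its values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# IN: beers liked by at least one drinker.
liked = names("SELECT name FROM Beers WHERE name IN (SELECT beer FROM Likes)")

# NOT IN: beers liked by nobody (same meaning as name <> ALL(...)).
not_liked = names("SELECT name FROM Beers WHERE name NOT IN (SELECT beer FROM Likes)")

# NOT EXISTS with a correlated subquery: beers that are their maker's only beer.
only_beer = names(
    """
    SELECT name FROM Beers b1
    WHERE NOT EXISTS (SELECT * FROM Beers
                      WHERE manf = b1.manf AND name <> b1.name)
    """
)

print(liked, not_liked, only_beer)
```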
Union, Intersection, Difference
Usefulness: UNION, INTERSECT, and EXCEPT are generally used to combine the results of two separate SQL queries.
Syntax:
  ( subquery ) UNION ( subquery )
  ( subquery ) INTERSECT ( subquery )
  ( subquery ) EXCEPT ( subquery )
Bag Semantics for SQL
Difference between relational algebra and SQL: relations in SQL are bags instead of sets.
The default for SELECT-FROM-WHERE is bag semantics.
The default for UNION, INTERSECT, and EXCEPT is set semantics.
How to change the default?
Force set semantics with DISTINCT after SELECT.
Force bag semantics with ALL after UNION, etc.
Why? When doing projection, it is easier to avoid eliminating duplicates. When doing intersection or difference, it is most efficient to sort the relations first, and duplicates can be eliminated at that point.
Example: DISTINCT
Query: from Sells(bar, beer, price), find all the different prices charged for beers.
  SELECT DISTINCT price
  FROM Sells;
Note: without DISTINCT, each price would be listed as many times as there were bar/beer pairs at that price.
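The two defaults and their overrides can be checked with `sqlite3`. The rows are hypothetical. One SQLite-specific detail: it does not accept parentheses around the operands of UNION, so the compound queries below are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 7.60), ("Sue's", "Bud", 7.60), ("Joe's", "Miller", 9.00)],
)

# SELECT-FROM-WHERE defaults to bag semantics: duplicates are kept.
all_prices = conn.execute("SELECT price FROM Sells").fetchall()
# DISTINCT forces set semantics.
distinct_prices = conn.execute("SELECT DISTINCT price FROM Sells").fetchall()

beers_at = "SELECT beer FROM Sells WHERE bar = ?"
params = ("Joe's", "Sue's")
# UNION defaults to set semantics: Bud appears once.
union_set = conn.execute(f"{beers_at} UNION {beers_at}", params).fetchall()
# UNION ALL forces bag semantics: Bud appears twice.
union_bag = conn.execute(f"{beers_at} UNION ALL {beers_at}", params).fetchall()

print(len(all_prices), len(distinct_prices), len(union_set), len(union_bag))
```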
ORDER BY Clause
Ordering tuples: the query result is not ordered on any attribute by default. We can order the data using the ORDER BY clause. ASC sorts the data in ascending order, and DESC sorts it in descending order. The default is ASC.
Order of sorted attributes: the first attribute specified is sorted on first, then the second attribute is used to break any ties, and so on.
What about NULL? The position of NULL is implementation-dependent: some systems (e.g. MySQL, SQLite) treat NULL as less than all non-null values, while others (e.g. PostgreSQL, Oracle) treat it as greater.
Example
Query: using Beers(name, manf, price), list the beers (and their prices) that are made by Anheuser-Busch. List the more expensive beers first, and sort beers with the same price in ascending order according to their names.
  SELECT name, price
  FROM Beers
  WHERE manf = 'Anheuser-Busch'
  ORDER BY price DESC, name ASC;
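A sketch of the two-level sort, run with `sqlite3`. The prices are hypothetical and chosen so that two beers tie on price, which makes the secondary sort key visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?, ?)",
    [
        ("Bud Lite", "Anheuser-Busch", 7.60),
        ("Michelob", "Anheuser-Busch", 9.00),
        ("Bud", "Anheuser-Busch", 7.60),
        ("Heineken", "Dutch", 8.00),
    ],
)

# price DESC is the primary key; name ASC only breaks ties on price.
ordered = conn.execute(
    """
    SELECT name, price
    FROM Beers
    WHERE manf = 'Anheuser-Busch'
    ORDER BY price DESC, name ASC
    """
).fetchall()

print(ordered)
```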
Join Expressions
Joins in SQL: SQL provides a number of expression forms that act like varieties of join in relational algebra, but using bag semantics, not set semantics. These expressions can be stand-alone queries or used in place of relations in a FROM clause.
Natural join:
  R NATURAL JOIN S;
  Example: Likes NATURAL JOIN Sells;
Product:
  R CROSS JOIN S;
Theta Join
Syntax: R JOIN S ON <condition> is a theta-join using <condition> for selection.
Example: using Drinkers(name, addr) and Frequents(drinker, bar):
  Drinkers JOIN Frequents ON name = drinker;
Inner join: the general form for an equi-join on <attribute list> is
  R INNER JOIN S USING (<attribute list>)
  Example: Likes INNER JOIN Frequents USING (drinker);
Outer Join
Syntax: R OUTER JOIN S is the core of an outerjoin expression.
Different variants:
Optional NATURAL in front of OUTER.
Optional ON <condition> after JOIN.
Optional LEFT, RIGHT, or FULL before OUTER.
LEFT = pad dangling tuples of R only.
RIGHT = pad dangling tuples of S only.
FULL = pad both; this choice is the default in this notation, although most SQL implementations require LEFT, RIGHT, or FULL to be written explicitly.
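The join forms can be compared on small tables with `sqlite3`. This sketch uses the join expressions inside a FROM clause and shows only LEFT OUTER JOIN, since RIGHT and FULL are available only in newer SQLite releases. All rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Drinkers (name TEXT, addr TEXT)")
conn.execute("CREATE TABLE Frequents (drinker TEXT, bar TEXT)")
conn.execute("CREATE TABLE Likes (drinker TEXT, beer TEXT)")
conn.executemany("INSERT INTO Drinkers VALUES (?, ?)", [("Sally", "hku"), ("Sean", "cuhk")])
conn.execute("INSERT INTO Frequents VALUES ('Sally', 'MOS')")
conn.execute("INSERT INTO Likes VALUES ('Sally', 'Bud')")

# Theta-join: only drinkers with a matching Frequents row survive.
inner = conn.execute(
    "SELECT name, bar FROM Drinkers JOIN Frequents ON name = drinker"
).fetchall()

# Left outer join: the dangling tuple for Sean is padded with NULL (None in Python).
left = conn.execute(
    "SELECT name, bar FROM Drinkers LEFT OUTER JOIN Frequents "
    "ON name = drinker ORDER BY name"
).fetchall()

# Equi-join with USING: the shared column 'drinker' appears once in the result.
using = conn.execute(
    "SELECT drinker, beer, bar FROM Likes INNER JOIN Frequents USING (drinker)"
).fetchall()

print(inner, left, using)
```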
Aggregate Functions
Five functions:
COUNT returns the number of values in a column.
SUM returns the sum of the values in a column.
AVG returns the average of the values in a column.
MIN returns the smallest value in a column.
MAX returns the largest value in a column.
Rules:
COUNT, MAX, and MIN apply to all types of fields; SUM and AVG apply only to numeric fields.
Except for COUNT(*), all functions ignore nulls. COUNT(*) returns the number of rows in the table.
Use DISTINCT inside the function to eliminate duplicates before aggregating.
Example
Query: from Sells(bar, beer, price), find the average price of Bud.
  SELECT AVG(price)
  FROM Sells
  WHERE beer = 'Bud';
Example: Duplicate Elimination
Query: from Sells(bar, beer, price), find the number of different prices charged for Bud.
  SELECT COUNT(DISTINCT price)
  FROM Sells
  WHERE beer = 'Bud';
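The null-handling rules are easiest to see on a table that contains a NULL price. The sketch below uses `sqlite3` with hypothetical rows; one Bud price is deliberately missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Joe's", "Bud", 7.00),
        ("Sue's", "Bud", 8.00),
        ("MOS", "Bud", 8.00),
        ("Bar X", "Bud", 9.00),
        ("Clinic", "Bud", None),  # price unknown: stored as NULL
    ],
)

row_count, price_count, distinct_count, avg_price, min_price, max_price = conn.execute(
    """
    SELECT COUNT(*), COUNT(price), COUNT(DISTINCT price),
           AVG(price), MIN(price), MAX(price)
    FROM Sells
    WHERE beer = 'Bud'
    """
).fetchone()

# COUNT(*) counts rows; every other aggregate skips the NULL price.
print(row_count, price_count, distinct_count, avg_price, min_price, max_price)
```

AVG divides by the number of non-null prices (four), not by the number of rows (five).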
Grouping
Motivation: in many cases, we want to apply the aggregate functions to subgroups of tuples in a relation. Each subgroup consists of the set of tuples that have the same value for the grouping attribute(s). The function is applied to each subgroup independently.
GROUP BY clause: we may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.
Example
Query: from Sells(bar, beer, price), find the average price of each beer.
  SELECT beer, AVG(price)
  FROM Sells
  GROUP BY beer;
Results
Sells (Bar, Beer, Price): (Joe’s, Heineken, 8.00); (Sky Bar, Miller, 9.00); (Bar X, Tiger, 7.60); (Harry’s, Tiger, 9.50)
Grouped by beer: Miller: (Sky Bar, 9.00); Heineken: (Joe’s, 8.00); Tiger: (Bar X, 7.60), (Harry’s, 9.50)
Result (Beer, Avg(Price)): (Miller, 9.00); (Heineken, 8.00); (Tiger, 8.55)
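The grouping example can be reproduced with `sqlite3`. The rows follow the slide's table, read so that both Bar X and Harry's sell Tiger, which is what the average of 8.55 implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Joe's", "Heineken", 8.00),
        ("Sky Bar", "Miller", 9.00),
        ("Bar X", "Tiger", 7.60),
        ("Harry's", "Tiger", 9.50),
    ],
)

# One output row per distinct beer; AVG is computed inside each group.
avg_by_beer = dict(
    conn.execute("SELECT beer, AVG(price) FROM Sells GROUP BY beer").fetchall()
)

for beer, avg in sorted(avg_by_beer.items()):
    print(beer, round(avg, 2))
```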
HAVING Clauses
Syntax and semantics: HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated.
Example: HAVING
Query: from Sells(bar, beer, price) and Beers(name, manf), find the average price of those beers that are either served in at least three bars or are manufactured by Pete’s.
The result keeps the beer groups with at least 3 non-NULL bars and also the beer groups whose manufacturer is Pete’s.
  SELECT beer, AVG(price)
  FROM Sells
  GROUP BY beer
  HAVING COUNT(bar) >= 3 OR
         beer IN (SELECT name
                  FROM Beers
                  WHERE manf = 'Pete''s');
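A sketch of the HAVING query on hypothetical data, run with `sqlite3`. The beer names, bar names, and prices are invented for illustration: Bud qualifies through the bar count, PaleAle through its manufacturer, and Miller through neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Bud", "Anheuser-Busch"), ("PaleAle", "Pete's"), ("Miller", "MillerCo")],
)
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("A", "Bud", 7.00), ("B", "Bud", 8.00), ("C", "Bud", 9.00),  # three bars
        ("A", "PaleAle", 6.00),                                      # one bar, made by Pete's
        ("A", "Miller", 5.00), ("B", "Miller", 7.00),                # two bars only
    ],
)

# HAVING filters whole groups after GROUP BY has formed them.
result = dict(
    conn.execute(
        """
        SELECT beer, AVG(price)
        FROM Sells
        GROUP BY beer
        HAVING COUNT(bar) >= 3
            OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s')
        """
    ).fetchall()
)

print(result)
```

Note the doubled apostrophe in 'Pete''s', which is how a single quote is written inside a SQL string literal.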
SQL Summary
  SELECT <attribute list>
  FROM <table list>
  [WHERE (condition)]
  [GROUP BY <grouping attributes>]
  [HAVING <group condition>]
  [ORDER BY <attribute list>]
Clauses in square brackets ([ ]) are optional.
Evaluation: a query is evaluated by first applying the WHERE clause, then GROUP BY and HAVING, and finally the SELECT clause.
More SQL
Database modification
Creation and deletion of tables
Reference: Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom. Database Systems: The Complete Book, Second Edition (Prentice Hall).