Relational Database Systems 1


Relational Database Systems 1 Instructor: Prof. James Cheng Acknowledgement: The slides are extracted from and modified based on the slides provided by Prof. Sourav S. Bhowmick from Nanyang Technological University.

“Relational databases are the foundation of western civilization.” Bruce Lindsay IBM Fellow IBM Almaden Research Center

Topics to be covered: ER model; Relational Algebra; SQL; Storage and Index Structures; Query Processing and Query Optimization

Entity/Relationship Model (ER Model)
The first step: analyze the information that should be stored in the database and the relationships between the components of that information.
A popular approach: the Entity/Relationship Model.

Purpose of E/R Diagram
Design the database informally: the E/R model allows us to sketch the design of a database informally, describing the schema of the database.
Graphical: designs are pictures called entity-relationship diagrams.
Conversion to implementation: there are mechanical ways to convert E/R diagrams to real implementations.

The Process: Ideas → E/R Design → Relational Schema → RDBMS

Example
Meaning: Bars sell some beers. Drinkers like some beers. Drinkers visit some bars.
E/R diagram: entity sets Bars (name, addr), Beers (name, manf), and Drinkers (ID, name, addr); relationships Sells, Likes, and Visits.
Relations: Beers(name, manf); Bars(name, addr); Drinkers(ID, name, addr); Sells(bar, beer); Visits(drinker, bar); Likes(drinker, beer)
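The conversion from this design to a relational schema can be sketched with Python's built-in sqlite3 module. This is only an illustration: the slide gives attribute names but no types or keys, so the column types and PRIMARY KEY choices below are assumptions.

```python
import sqlite3

# Column types and primary keys are illustrative assumptions;
# the slide lists only relation and attribute names.
SCHEMA = """
CREATE TABLE Beers    (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Bars     (name TEXT PRIMARY KEY, addr TEXT);
CREATE TABLE Drinkers (ID INTEGER PRIMARY KEY, name TEXT, addr TEXT);
CREATE TABLE Sells    (bar TEXT, beer TEXT, PRIMARY KEY (bar, beer));
CREATE TABLE Visits   (drinker INTEGER, bar TEXT, PRIMARY KEY (drinker, bar));
CREATE TABLE Likes    (drinker INTEGER, beer TEXT, PRIMARY KEY (drinker, beer));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that now exist in the in-memory database.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Each entity set becomes one table, and each relationship becomes a table holding the keys of the entity sets it connects.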

Place in the big picture: declarative query language (SQL, relational calculus) → algebra (relational algebra) → implementation

Core Relational Algebra
Union, Intersection, and Difference: the usual set operations, but both operands must have the same relation schema.
Selection: picking certain rows.
Projection: picking certain columns.
Products and Joins: compositions of relations.
Renaming: renaming of relations and attributes.

Union
Union operator: builds a relation consisting of all tuples appearing in either or both of two specified relations. It combines all rows from two tables, excluding duplicate rows.
Rule: the tables must have the same attribute characteristics.

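Example
R1 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Joe, cuhk, Bud)
R2 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Eve, hku, Heineken), (Harry, cuhk, Tiger)
R1 ∪ R2 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Eve, hku, Heineken), (Joe, cuhk, Bud), (Harry, cuhk, Tiger)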

Intersection
Intersection operator: builds a relation consisting of all tuples appearing in both of two specified relations.
Results: yields only the rows that appear in both tables.

Example
R1 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Joe, cuhk, Bud)
R2 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Eve, hku, Heineken), (Harry, cuhk, Tiger)
R1 ∩ R2 (Name, Addr, favBeer): (Pauline, hku, Heineken)

Difference
Difference operator: builds a relation consisting of all tuples appearing in the first relation but not the second.
Results: subtracts the second table from the first.

Example
R1 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Joe, cuhk, Bud)
R2 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Eve, hku, Heineken), (Harry, cuhk, Tiger)
R1 − R2 (Name, Addr, favBeer): (Joe, cuhk, Bud)
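The three set operations map directly onto Python's built-in set type when each tuple is a Python tuple. A minimal sketch; Eve's Addr and favBeer values are assumed here because the slide table does not show them clearly.

```python
# Relations as sets of (Name, Addr, favBeer) tuples.
# Eve's Addr and favBeer are assumptions made for this sketch.
R1 = {("Pauline", "hku", "Heineken"), ("Joe", "cuhk", "Bud")}
R2 = {
    ("Pauline", "hku", "Heineken"),
    ("Eve", "hku", "Heineken"),
    ("Harry", "cuhk", "Tiger"),
}

union = R1 | R2          # tuples in either relation, duplicates removed
intersection = R1 & R2   # tuples in both relations
difference = R1 - R2     # tuples in R1 but not in R2

print(len(union), intersection, difference)
```

Both operands have the same schema, which is what makes comparing whole tuples meaningful.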

Selection
Selection operator: extracts specified tuples (rows) from a specified relation (table). Returns all tuples that satisfy a condition.
Representation: R1 := σ_C(R2). C is a condition (as in "if" statements) that refers to attributes of R2. R1 is all those tuples of R2 that satisfy C.

Example
JoeMenu := σ_{Bar = "Joe's"}(Sells)
Sells (Bar, Beer, Price) has rows for Joe's, Ku De Ta, Clinic, and Harry's.
JoeMenu (Bar, Beer, Price): (Joe's, Heineken, 8.00), (Joe's, Bud, 7.60)
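Selection can be sketched as a filter over a list of tuples. The helper name `select` and the dictionary representation are choices made for this illustration, and the non-Joe's rows of Sells are assumed values.

```python
def select(relation, condition):
    """sigma_C(R): keep the tuples of R that satisfy the condition C."""
    return [t for t in relation if condition(t)]

sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Joe's", "beer": "Bud", "price": 7.60},
    {"bar": "Ku De Ta", "beer": "Miller", "price": 9.00},  # assumed row
    {"bar": "Harry's", "beer": "Tiger", "price": 9.50},    # assumed row
]

# JoeMenu := sigma_{Bar = "Joe's"}(Sells)
joe_menu = select(sells, lambda t: t["bar"] == "Joe's")
print(joe_menu)
```

The result keeps the schema of Sells; only the set of rows changes.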

Projection
Projection operator: extracts specified attributes (columns) from a specified relation.
Representation: R1 := π_L(R2). L is a list of attributes from the schema of R2. R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1. Duplicate tuples, if any, are eliminated.

Example
Prices := π_{Beer, Price}(Sells)
Sells (Bar, Beer, Price) has rows for Joe's, Ku De Ta, Clinic, and Harry's.
Prices (Beer, Price): (Heineken, 8.00), (Miller, 9.00), (Bud, 7.60), (Tiger, 9.50)
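Set-based projection needs the duplicate-elimination step, which the following sketch makes explicit. The helper name `project` is hypothetical, and the second Bud row is an assumed tuple added so that a duplicate actually occurs.

```python
def project(relation, attrs):
    """pi_L(R) on sets: keep the listed attributes in order, drop duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:      # duplicate elimination
            seen.add(row)
            result.append(row)
    return result

sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Joe's", "beer": "Bud", "price": 7.60},
    {"bar": "Ku De Ta", "beer": "Miller", "price": 9.00},
    {"bar": "Ku De Ta", "beer": "Bud", "price": 7.60},   # assumed duplicate (Beer, Price)
    {"bar": "Harry's", "beer": "Tiger", "price": 9.50},
]

# Prices := pi_{Beer, Price}(Sells)
prices = project(sells, ["beer", "price"])
print(prices)
```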

Cartesian Product
Cartesian product: builds a relation from two specified relations consisting of all possible concatenated pairs of tuples, one from each of the two relations.
Representation: R3 := R1 × R2. Pair each tuple t1 of R1 with each tuple t2 of R2. The concatenation "t1 t2" is a tuple of R3. The schema of R3 is the attributes of R1 followed by those of R2, in order.
Beware: if an attribute A has the same name in R1 and R2, use R1.A and R2.A.

Example
R1 has schema (A, B) and R2 has schema (B, C). R1 × R2 has schema (A, R1.B, R2.B, C).
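A small sketch of the product using itertools.product. The tuple values are assumptions chosen for illustration; only the schemas (A, B) and (B, C) come from the slide.

```python
from itertools import product

def cartesian(r1, r2):
    """R1 x R2: concatenate every tuple of R1 with every tuple of R2."""
    return [t1 + t2 for t1, t2 in product(r1, r2)]

R1 = [(1, 2), (3, 4)]    # schema (A, B); values are illustrative
R2 = [(5, 6), (7, 8)]    # schema (B, C); values are illustrative

R3 = cartesian(R1, R2)   # schema (A, R1.B, R2.B, C)
print(R3)
```

The result has len(R1) * len(R2) tuples, which is why products are usually combined with a selection.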

Theta-Join
Join: builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition.
Representation: R3 := R1 ⋈_C R2. Take the product R1 × R2, then apply σ_C to the result: R1 ⋈_C R2 = σ_C(R1 × R2).

Equi-Join
The condition C: C can be any boolean-valued condition. Historic versions of this operator allowed only A θ B, where θ was =, <, etc.; hence the name "theta-join."
Equi-join: if C is a conjunction of equalities, the join is called an equi-join.

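Example
BarInfo := Sells ⋈_{Sells.Bar = Bars.Name} Bars
Sells (Bar, Beer, Price): rows for Joe's (Heineken at 8.00, Bud at 7.60) and Pump Room.
Bars (Name, Addr): (Joe's, Scotts Rd), (Pump Room, Clark Quay), (Harry's, Esplanade)
BarInfo (Bar, Beer, Price, Name, Addr): each Joe's row is paired with (Joe's, Scotts Rd) and each Pump Room row with (Pump Room, Clark Quay). Harry's has no matching Sells tuple, so it does not appear.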
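The definition R1 ⋈_C R2 = σ_C(R1 × R2) can be written almost literally. In this sketch the helper name `theta_join` is hypothetical, the beer and price of the Pump Room row are assumed, and merging dictionaries works only because the two relations share no attribute names.

```python
def theta_join(r1, r2, condition):
    """Pair every t1 with every t2 and keep the pairs that satisfy C."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if condition(t1, t2)]

sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Joe's", "beer": "Bud", "price": 7.60},
    {"bar": "Pump Room", "beer": "Bud", "price": 7.60},  # beer and price assumed
]
bars = [
    {"name": "Joe's", "addr": "Scotts Rd"},
    {"name": "Pump Room", "addr": "Clark Quay"},
    {"name": "Harry's", "addr": "Esplanade"},
]

# BarInfo := Sells join_{Sells.Bar = Bars.Name} Bars (an equi-join)
bar_info = theta_join(sells, bars, lambda s, b: s["bar"] == b["name"])
print(bar_info)
```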

Natural Join
Natural join: connects two relations by equating attributes of the same name and projecting out one copy of each pair of equated attributes.
Representation: R3 := R1 ⋈ R2

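Example
BarInfo := Sells ⋈ Bars
Sells (Bar, Beer, Price): rows for Joe's (Heineken at 8.00, Bud at 7.60) and Pump Room.
Bars (Bar, Addr): (Joe's, Scotts Rd), (Pump Room, Clark Quay), (Harry's, Esplanade)
BarInfo (Bar, Beer, Price, Addr): the Joe's rows gain Addr Scotts Rd and the Pump Room rows gain Addr Clark Quay; the shared attribute Bar appears only once.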
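A sketch of the natural join over dictionaries: tuples are paired when they agree on every shared attribute name, and the shared attributes are kept once. The helper name `natural_join` is hypothetical and the Pump Room beer and price are assumed values.

```python
def natural_join(r1, r2):
    """Equate attributes of the same name and keep one copy of each."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                result.append({**t1, **t2})   # shared keys collapse into one
    return result

sells = [
    {"bar": "Joe's", "beer": "Heineken", "price": 8.00},
    {"bar": "Joe's", "beer": "Bud", "price": 7.60},
    {"bar": "Pump Room", "beer": "Bud", "price": 7.60},  # beer and price assumed
]
bars = [
    {"bar": "Joe's", "addr": "Scotts Rd"},
    {"bar": "Pump Room", "addr": "Clark Quay"},
    {"bar": "Harry's", "addr": "Esplanade"},
]

bar_info = natural_join(sells, bars)
print(bar_info)
```

If the two relations share no attribute names, the condition is vacuously true and the function returns the Cartesian product.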

Expression Trees
Structure: leaves are operands (variables standing for relations). Interior nodes are operators, applied to their child or children.
Example: using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Nathan Road or sell Bud for less than $7.

Sequence of Assignments
Example: using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Nathan Road or sell Bud for less than $7.
Sequence of assignments:
R1 := σ_{price < 7 AND beer = "Bud"}(Sells)
R2 := σ_{addr = "Nathan Rd"}(Bars)
R3 := π_{bar}(R1)
R4 := π_{name}(R2)
R5 := ρ_{R(name)}(R3)
R6 := R5 ∪ R4

Expression Tree
The sequence of assignments R1 to R6 corresponds to this tree, with the relations Bars and Sells at the leaves:
UNION
    PROJECT_{name}
        SELECT_{addr = "Nathan Rd."}
            Bars
    RENAME_{R(name)}
        PROJECT_{bar}
            SELECT_{price < 7 AND beer = "Bud"}
                Sells
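The six assignments can be traced step by step on a small instance. The bar names, addresses, and prices below are made up for this sketch; only the query itself comes from the slides.

```python
# Hypothetical instance: tuples are (bar, beer, price) and (name, addr).
sells = [("Joe's", "Bud", 6.50), ("Sue's", "Bud", 7.50), ("Sue's", "Miller", 6.00)]
bars = [("Joe's", "Maple St"), ("Sue's", "River Rd"), ("Ann's", "Nathan Rd")]

R1 = [t for t in sells if t[2] < 7 and t[1] == "Bud"]   # selection on Sells
R2 = [t for t in bars if t[1] == "Nathan Rd"]           # selection on Bars
R3 = {t[0] for t in R1}                                 # projection onto bar
R4 = {t[0] for t in R2}                                 # projection onto name
R5 = R3                  # renaming changes the attribute name, not the values
R6 = R5 | R4                                            # union

print(sorted(R6))
```

The renaming step matters in the algebra because union requires both operands to have the same schema; in this value-only representation it is a no-op.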

Relational Algebra on Bags
SQL: SQL, the most important query language for relational databases, is actually a bag language. SQL will eliminate duplicates, but usually only if you ask it to do so explicitly.
Efficiency: some operations, like projection, are much more efficient on bags than on sets.

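Example - Projection
Beers := π_{Beer}(Sells), evaluated on bags
Sells (Bar, Beer, Price) has rows for Joe's, Ku De Ta, Pump Room, and Harry's, with beers Heineken, Miller, Bud, and Tiger.
On bags, every Sells tuple contributes its Beer value to the result, so a beer sold at several bars appears once per bar.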

Example - Union
R1 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Joe, cuhk, Bud)
R2 (Name, Addr, favBeer): (Pauline, hku, Heineken), (Eve, hku, Heineken), (Harry, cuhk, Tiger)
R1 ∪ R2 on bags keeps every tuple from both operands, so (Pauline, hku, Heineken) appears twice, alongside the tuples for Eve, Joe, and Harry.
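Bags can be modelled with collections.Counter, which maps each tuple to its number of occurrences. A minimal sketch contrasting bag union with set union; Eve's Addr and favBeer values are assumed.

```python
from collections import Counter

# Each Counter maps a tuple to how many times it occurs in the bag.
R1 = Counter({("Pauline", "hku", "Heineken"): 1, ("Joe", "cuhk", "Bud"): 1})
R2 = Counter({
    ("Pauline", "hku", "Heineken"): 1,
    ("Eve", "hku", "Heineken"): 1,      # Addr and favBeer assumed
    ("Harry", "cuhk", "Tiger"): 1,
})

bag_union = R1 + R2              # multiplicities add up
set_union = set(R1) | set(R2)    # duplicates collapse

print(bag_union[("Pauline", "hku", "Heineken")], len(set_union))
```

Bag union needs no comparison between tuples at all, which is the efficiency argument made for bags.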

Operations on Relations
What do we want to do with the relations? Retrieve, insert, delete, update.
SQL: Structured Query Language (SQL) is the standard query language for relational databases. It first became an official standard in 1986, as defined by the American National Standards Institute (ANSI). All major database vendors conform to the SQL standard with minor variations in syntax (different dialects).

SQL
Declarative language: SQL is a declarative (non-procedural) language. A SQL query specifies what to retrieve but not how to retrieve it.
Not a complete programming language: it does not have control or iteration commands.

Aspects of SQL
Data Manipulation Language (DML): perform queries; perform updates. This is the focus of this course.
Data Definition Language (DDL): creates databases, tables, and indices; creates views; specifies authorization; specifies integrity constraints.
Embedded SQL: wrap a Turing-complete programming language around DML to do more sophisticated queries and updates.

Principal Form of SQL
Basic structure of SQL:
SELECT desired attributes (A1, A2, …, An)
FROM one or more tables (R1, R2, …, Rm)
WHERE condition about tuples of the tables (P)
Mapping to relational algebra: π_{A1, A2, …, An}(σ_P(R1 × R2 × … × Rm))

Our Running Example
Relational database: Beers(name, manf); Bars(name, addr, license); Drinkers(name, addr, phone); Likes(drinker, beer); Sells(bar, beer, price); Frequents(drinker, bar)

Example
Query: using Beers(name, manf), what beers are made by Anheuser-Busch?
SELECT name
FROM Beers
WHERE manf = 'Anheuser-Busch';

Results
Beers (Name, Manf): (Heineken, Dutch), (Bud, Anheuser-Busch), (Michelob, Anheuser-Busch), (Beck's Beer, Bremen), (Bud Lite, Anheuser-Busch)
Result (Name): Bud, Michelob, Bud Lite
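The query can be run as-is against an in-memory SQLite database through Python's sqlite3 module. A minimal sketch; the TEXT column types are an assumption, and the rows are those of the Beers table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")  # types assumed
conn.executemany("INSERT INTO Beers VALUES (?, ?)", [
    ("Heineken", "Dutch"),
    ("Bud", "Anheuser-Busch"),
    ("Michelob", "Anheuser-Busch"),
    ("Beck's Beer", "Bremen"),
    ("Bud Lite", "Anheuser-Busch"),
])

rows = conn.execute(
    "SELECT name FROM Beers WHERE manf = 'Anheuser-Busch'").fetchall()
names = [r[0] for r in rows]
print(names)
```

Replacing `name` with `*` in the SELECT clause returns both columns of the matching rows.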

* in SELECT Clause
SELECT *
FROM Beers
WHERE manf = 'Anheuser-Busch';
Beers (Name, Manf): (Heineken, Dutch), (Bud, Anheuser-Busch), (Michelob, Anheuser-Busch), (Beck's Beer, Bremen), (Bud Lite, Anheuser-Busch)
Result (Name, Manf): (Bud, Anheuser-Busch), (Michelob, Anheuser-Busch), (Bud Lite, Anheuser-Busch)

Multi-Relation Queries
Motivation: queries often combine data from more than one relation. We can address several relations in one query by listing them all in the FROM clause. Distinguish attributes of the same name by "<relation>.<attribute>".
Query: using Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Bar X.
SELECT beer
FROM Likes AS L, Frequents AS F
WHERE bar = 'Bar X' AND F.drinker = L.drinker;

Example
Likes (Drinker, Beer) includes (Melissa, Heineken) and (Sean, Bud), plus a row for Sally.
Frequents (Drinker, Bar) includes (Sally, MOS) and a Bar X row for Melissa.
Result (Beer): Heineken
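A runnable sketch of the two-relation query using sqlite3. Several rows are assumptions added to fill out the example: the beer Sally likes and the bar Sean frequents are not given on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (drinker TEXT, beer TEXT)")
conn.execute("CREATE TABLE Frequents (drinker TEXT, bar TEXT)")
conn.executemany("INSERT INTO Likes VALUES (?, ?)", [
    ("Melissa", "Heineken"),
    ("Sean", "Bud"),
    ("Sally", "Bud"),        # assumed beer
])
conn.executemany("INSERT INTO Frequents VALUES (?, ?)", [
    ("Sally", "MOS"),
    ("Melissa", "Bar X"),
    ("Sean", "MOS"),         # assumed row
])

# Both relations have a drinker attribute, so it is qualified as L.drinker / F.drinker.
rows = conn.execute("""
    SELECT beer
    FROM Likes AS L, Frequents AS F
    WHERE bar = 'Bar X' AND F.drinker = L.drinker
""").fetchall()
print(rows)
```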

Explicit Tuple Variables
Motivation: a query may use two copies of the same relation. Distinguish the copies by following the relation name with the name of a tuple variable in the FROM clause. Renaming relations this way is always an option.
Query: from Beers(name, manf), find all pairs of beers by the same manufacturer. Do not produce pairs like (Bud, Bud). Produce pairs in alphabetic order, e.g. (Bud, Miller).
SELECT b1.name, b2.name
FROM Beers b1, Beers b2
WHERE b1.manf = b2.manf AND b1.name < b2.name;

Example
The tuple variables b1 and b2 each range over a copy of Beers (Name, Manf), which contains Heineken (Dutch), Bud (Anheuser-Busch), Beck's Beer (Bremen), and Bud Lite. For each pair (b1, b2) the WHERE condition is evaluated: the pair is output when it is true and skipped when it is false.
SELECT b1.name, b2.name
FROM Beers b1, Beers b2
WHERE b1.manf = b2.manf AND b1.name < b2.name;
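The self-join can be checked with sqlite3. In this sketch the Beers rows follow the earlier Beers table (including Michelob and the manufacturer of Bud Lite, which this slide does not show), and the string comparison uses SQLite's default ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.executemany("INSERT INTO Beers VALUES (?, ?)", [
    ("Heineken", "Dutch"),
    ("Bud", "Anheuser-Busch"),
    ("Michelob", "Anheuser-Busch"),
    ("Beck's Beer", "Bremen"),
    ("Bud Lite", "Anheuser-Busch"),
])

# b1 and b2 are two tuple variables over the same relation.
# b1.name < b2.name removes (Bud, Bud) and keeps one ordering of each pair.
pairs = set(conn.execute("""
    SELECT b1.name, b2.name
    FROM Beers b1, Beers b2
    WHERE b1.manf = b2.manf AND b1.name < b2.name
"""))
print(sorted(pairs))
```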

Subqueries
A SQL subquery can appear in the SELECT clause, the FROM clause, or the WHERE clause.

Example
Query: from Sells(bar, beer, price), find the bars that serve Heineken for the same price Bar X charges for Bud.
Subqueries: first find the price Bar X charges for Bud, then find the bars that serve Heineken at that price.

Scalar Subquery
SELECT bar
FROM Sells
WHERE beer = 'Heineken' AND
      price = (SELECT price
               FROM Sells
               WHERE bar = 'Bar X' AND beer = 'Bud');

Example
Sells (Bar, Beer, Price) includes (Clinic, Heineken, 8.00), a Bud row at 6.60, Bar X's Bud at 7.90, and MOS's Heineken at 7.90.
Step 1: SELECT price FROM Sells WHERE bar = 'Bar X' AND beer = 'Bud'; returns Price 7.90.
Step 2: SELECT bar FROM Sells WHERE beer = 'Heineken' AND price = 7.90; returns Bar MOS.
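The scalar subquery can be run in one statement with sqlite3. The rows below are a reconstruction consistent with the two results shown; which bar sells Bud at 6.60 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Clinic", "Heineken", 8.00),
    ("Clinic", "Bud", 6.60),      # bar assumed
    ("Bar X", "Bud", 7.90),
    ("MOS", "Heineken", 7.90),
])

# The parenthesized subquery yields exactly one value, so it can be compared with =.
rows = conn.execute("""
    SELECT bar
    FROM Sells
    WHERE beer = 'Heineken' AND
          price = (SELECT price FROM Sells
                   WHERE bar = 'Bar X' AND beer = 'Bud')
""").fetchall()
print(rows)
```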

Operators in Table Subqueries
IN: <tuple> IN <relation> is true if and only if the tuple is a member of the relation.
EXISTS: EXISTS(<relation>) is true if and only if the <relation> is not empty, that is, the nested query returns one or more tuples.
ANY: x = ANY(<relation>) is a boolean condition meaning that x equals at least one tuple in the relation.
ALL: x <> ALL(<relation>) is true if and only if for every tuple t in the relation, x is not equal to t.
Note: any of the comparison operators (<, <=, =, etc.) can be used with ANY and ALL. The keyword NOT can precede any of the operators (e.g. s NOT IN R).
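IN and EXISTS can be tried out with sqlite3. This sketch leaves out ANY and ALL because SQLite does not implement them; the two queries and the sample rows are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Beers (name TEXT, manf TEXT);
CREATE TABLE Likes (drinker TEXT, beer TEXT);
INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch'), ('Heineken', 'Dutch'),
                         ('Beck''s Beer', 'Bremen');
INSERT INTO Likes VALUES ('Melissa', 'Heineken'), ('Sean', 'Bud');
""")

# IN: beers that appear in the result of the subquery.
liked = {r[0] for r in conn.execute(
    "SELECT name FROM Beers WHERE name IN (SELECT beer FROM Likes)")}

# NOT EXISTS: beers for which the correlated subquery returns no tuples.
unliked = [r[0] for r in conn.execute("""
    SELECT name FROM Beers b
    WHERE NOT EXISTS (SELECT * FROM Likes WHERE Likes.beer = b.name)
""")]
print(liked, unliked)
```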

Union, Intersection, Difference
Usefulness: UNION, INTERSECT, and EXCEPT are generally used to combine the results of two separate SQL queries.
Syntax:
( subquery ) UNION ( subquery )
( subquery ) INTERSECT ( subquery )
( subquery ) EXCEPT ( subquery )

Bag Semantics for SQL
Difference between relational algebra and SQL: relations in SQL are bags instead of sets. The default for SELECT-FROM-WHERE is bag semantics. The default for UNION, INTERSECT, and EXCEPT is set semantics.
How to change the default: force set semantics with DISTINCT after SELECT; force bag semantics with ALL after UNION, etc.
Why: when doing projection, it is easier to avoid eliminating duplicates. When doing intersection or difference, it is most efficient to sort the relations first, and the duplicates can be eliminated at that point.
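The two defaults and their overrides can be observed by counting result rows in sqlite3. The sample rows are made up for this sketch. One dialect detail: SQLite does not accept parentheses around the operands of UNION, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Likes (drinker TEXT, beer TEXT);
CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
INSERT INTO Likes VALUES ('Melissa', 'Heineken'), ('Sean', 'Bud');
INSERT INTO Sells VALUES ('Joe''s', 'Bud', 7.60), ('Bar X', 'Bud', 7.90),
                         ('Sky Bar', 'Miller', 9.00);
""")

def run(sql):
    return conn.execute(sql).fetchall()

bag_select = run("SELECT beer FROM Sells")            # bag: Bud appears twice
set_select = run("SELECT DISTINCT beer FROM Sells")   # set: duplicates removed
set_union = run("SELECT beer FROM Likes UNION SELECT beer FROM Sells")
bag_union = run("SELECT beer FROM Likes UNION ALL SELECT beer FROM Sells")

print(len(bag_select), len(set_select), len(set_union), len(bag_union))
```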

Example: DISTINCT
Query: from Sells(bar, beer, price), find all the different prices charged for beers.
SELECT DISTINCT price
FROM Sells;
Note: without DISTINCT, each price would be listed as many times as there were bar/beer pairs at that price.

ORDER BY Clause
Ordering tuples: the query result is not ordered on any attribute by default. We can order the data using the ORDER BY clause. ASC sorts the data in ascending order, and DESC sorts it in descending order. The default is ASC.
Order of sorted attributes: the first attribute specified is sorted on first, then the second attribute is used to break any ties, and so on.
What about NULL? The placement of NULL in the sort order depends on the system: many treat NULL as less than all non-null values, while others treat it as greater.

Example
Query: using Beers(name, manf, price), list the beers (and their prices) that are made by Anheuser-Busch. List the more expensive beers first, and sort beers with the same price in ascending order according to their names.
SELECT name, price
FROM Beers
WHERE manf = 'Anheuser-Busch'
ORDER BY price DESC, name ASC;
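A runnable sketch of the two-key sort using sqlite3. The prices are invented for illustration, chosen so that two beers tie on price and the name breaks the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT, price REAL)")
conn.executemany("INSERT INTO Beers VALUES (?, ?, ?)", [
    ("Bud Lite", "Anheuser-Busch", 7.60),   # prices are hypothetical
    ("Bud", "Anheuser-Busch", 7.60),
    ("Michelob", "Anheuser-Busch", 8.50),
    ("Heineken", "Dutch", 8.00),
])

rows = conn.execute("""
    SELECT name, price
    FROM Beers
    WHERE manf = 'Anheuser-Busch'
    ORDER BY price DESC, name ASC
""").fetchall()
print(rows)
```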

Join Expressions
Joins in SQL: SQL provides a number of expression forms that act like varieties of join in relational algebra, but using bag semantics, not set semantics. These expressions can be stand-alone queries or used in place of relations in a FROM clause.
Natural join: R NATURAL JOIN S; Example: Likes NATURAL JOIN Sells;
Product: R CROSS JOIN S;

Theta Join
Syntax: R JOIN S ON <condition> is a theta-join using <condition> for selection.
Example: using Drinkers(name, addr) and Frequents(drinker, bar): Drinkers JOIN Frequents ON name = drinker;
Inner join: the general form for an equi-join is R INNER JOIN S USING (<attribute list>), an equi-join on <attribute list>. Example: Likes INNER JOIN Frequents USING (drinker);

Outer Join
Syntax: R OUTER JOIN S is the core of an outerjoin expression.
Different variants:
Optional NATURAL in front of OUTER.
Optional ON <condition> after JOIN.
Optional LEFT, RIGHT, or FULL before OUTER. LEFT = pad dangling tuples of R only. RIGHT = pad dangling tuples of S only. FULL = pad both; this choice is the default, although many SQL systems require one of the three keywords to be written explicitly.
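Padding of dangling tuples can be seen with a LEFT OUTER JOIN in sqlite3. The sketch sticks to the LEFT variant because older SQLite versions lack RIGHT and FULL; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Drinkers (name TEXT, addr TEXT)")
conn.execute("CREATE TABLE Frequents (drinker TEXT, bar TEXT)")
conn.executemany("INSERT INTO Drinkers VALUES (?, ?)", [
    ("Sally", "hku"), ("Melissa", "cuhk"), ("Sean", "hku"),   # hypothetical rows
])
conn.executemany("INSERT INTO Frequents VALUES (?, ?)", [
    ("Sally", "MOS"), ("Melissa", "Bar X"),
])

# Sean frequents no bar: he is a dangling tuple of Drinkers and is padded with NULL.
rows = conn.execute("""
    SELECT name, bar
    FROM Drinkers LEFT OUTER JOIN Frequents ON name = drinker
    ORDER BY name
""").fetchall()
print(rows)
```

An inner join on the same condition would drop Sean entirely.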

Aggregate Functions
Five functions:
COUNT returns the number of values in a column.
SUM returns the sum of the values in a column.
AVG returns the average of the values in a column.
MIN returns the smallest value in a column.
MAX returns the largest value in a column.
Rules: COUNT, MAX, and MIN apply to all types of fields; SUM and AVG apply only to numeric fields. Except for COUNT(*), all functions ignore nulls. COUNT(*) returns the number of rows selected by the query. Use DISTINCT inside the function to eliminate duplicates before aggregating.

Example
Query: from Sells(bar, beer, price), find the average price of Bud.
SELECT AVG(price)
FROM Sells
WHERE beer = 'Bud';

Example – Duplicate Elimination
Query: from Sells(bar, beer, price), find the number of different prices charged for Bud.
SELECT COUNT(DISTINCT price)
FROM Sells
WHERE beer = 'Bud';
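Both aggregate queries can be run side by side with sqlite3 to see the effect of DISTINCT. The Sells rows are made up for this sketch: three bars sell Bud, two of them at the same price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 7.60),        # hypothetical rows
    ("Ku De Ta", "Bud", 7.60),
    ("Bar X", "Bud", 7.90),
    ("Joe's", "Heineken", 8.00),
])

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

avg_price = scalar("SELECT AVG(price) FROM Sells WHERE beer = 'Bud'")
n_rows = scalar("SELECT COUNT(*) FROM Sells WHERE beer = 'Bud'")
n_prices = scalar("SELECT COUNT(DISTINCT price) FROM Sells WHERE beer = 'Bud'")

print(avg_price, n_rows, n_prices)
```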

Grouping
Motivation: in many cases, we want to apply the aggregate functions to subgroups of tuples in a relation. Each subgroup consists of the set of tuples that have the same value for the grouping attribute(s). The function is applied to each subgroup independently.
GROUP BY clause: we may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

Example
Query: from Sells(bar, beer, price), find the average price of each beer.
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer;

Results
Sells (Bar, Beer, Price): (Joe's, Heineken, 8.00), (Sky Bar, Miller, 9.00), (Bar X, Tiger, 7.60), (Harry's, Tiger, 9.50)
Grouped by Beer: Miller: (Sky Bar, 9.00); Heineken: (Joe's, 8.00); Tiger: (Bar X, 7.60), (Harry's, 9.50)
Result (Beer, Avg(Price)): (Miller, 9.00), (Heineken, 8.00), (Tiger, 8.55)
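The grouping result can be reproduced with sqlite3 on the four Sells tuples of this example. A minimal sketch; the REAL column type is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Heineken", 8.00),
    ("Sky Bar", "Miller", 9.00),
    ("Bar X", "Tiger", 7.60),
    ("Harry's", "Tiger", 9.50),
])

# One output row per beer; AVG is computed within each group only.
avg_by_beer = dict(conn.execute(
    "SELECT beer, AVG(price) FROM Sells GROUP BY beer"))
print(avg_by_beer)
```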

HAVING Clauses
Syntax and semantics: HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated.

Example: HAVING
Query: from Sells(bar, beer, price) and Beers(name, manf), find the average price of those beers that are either served in at least three bars or manufactured by Pete's.
The HAVING condition keeps beer groups with at least 3 non-NULL bars, and also beer groups where the manufacturer is Pete's.
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR
       beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
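The HAVING query can be exercised with sqlite3 on a small instance. All rows are hypothetical, including the beer name "Pete's Wicked Ale"; they are chosen so that one beer qualifies by bar count, one by manufacturer, and one by neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Beers VALUES (?, ?)", [
    ("Bud", "Anheuser-Busch"),
    ("Miller", "Miller Brewing"),          # hypothetical manufacturer
    ("Pete's Wicked Ale", "Pete's"),       # hypothetical beer name
])
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 7.0), ("Bar X", "Bud", 7.5), ("MOS", "Bud", 8.0),
    ("Joe's", "Miller", 9.0), ("Bar X", "Miller", 9.5),
    ("MOS", "Pete's Wicked Ale", 9.0),
])

# Groups failing the HAVING condition (here: Miller) are eliminated.
result = dict(conn.execute("""
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR
           beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s')
"""))
print(result)
```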

SQL Summary
SQL:
SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <grouping attributes>]
[HAVING <group condition>]
[ORDER BY <attribute list>]
Clauses in square brackets ([ ]) are optional.
Evaluation: a query is evaluated by first applying the WHERE clause, then GROUP BY and HAVING, and finally the SELECT clause.

More SQL: database modification; creation and deletion of tables.
Reference: Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. Database Systems: The Complete Book, Second Edition (Prentice Hall).