Lecture 33: The Relational Model 2


Professor Xiannong Meng, Spring 2018
Lecture and activity contents are based on what Prof. Chris Ré of Stanford used in his CS 145 course in the fall 2016 term, with permission.

Relational Algebra (RA)
Five basic operators:
  Selection: $\sigma$
  Projection: $\Pi$
  Cartesian Product: $\times$
  Union: $\cup$
  Difference: $-$
Derived or auxiliary operators:
  Intersection, complement
  Joins (natural, equi-join, theta join, semi-join)
  Renaming: $\rho$
  Division
We'll look at selection, projection, and Cartesian product first, and also at one example of a derived operator (natural join) and a special operator (renaming).

1. Selection ($\sigma$)
Returns all tuples which satisfy a condition.
Notation: $\sigma_c(R)$
Examples: $\sigma_{\text{Salary} > 40000}(\text{Employee})$ and $\sigma_{\text{name} = \text{``Smith''}}(\text{Employee})$
The condition c can use the comparisons =, <, ≤, >, ≥, <>.
Schema: Students(sid, sname, gpa)
SQL: SELECT * FROM Students WHERE gpa > 3.5;
RA: $\sigma_{\text{gpa} > 3.5}(\text{Students})$
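As a rough illustration of selection, the sketch below models a relation as a Python set of immutable rows. The `row` and `select` helpers and the sample Students rows are assumptions made for this example; they are not part of the lecture.

```python
def row(**attrs):
    """Build an immutable row keyed by attribute name (hypothetical helper)."""
    return frozenset(attrs.items())

def select(relation, condition):
    """sigma_c(R): keep the rows whose attribute dict satisfies the condition."""
    return {r for r in relation if condition(dict(r))}

# Hypothetical instance of Students(sid, sname, gpa).
students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
    row(sid="003", sname="Ann", gpa=3.9),
}

# Mirrors: SELECT * FROM Students WHERE gpa > 3.5;
high_gpa = select(students, lambda t: t["gpa"] > 3.5)
print([dict(r)["sname"] for r in high_gpa])
```

The schema of the result is the same as the schema of the input; only the set of rows shrinks.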

2. Projection ($\Pi$)
Eliminates columns, then removes duplicates.
Notation: $\Pi_{A_1,\ldots,A_n}(R)$
Example: project social-security number and names: $\Pi_{\text{SSN},\text{Name}}(\text{Employee})$
Output schema: Answer(SSN, Name)
Schema: Students(sid, sname, gpa)
SQL: SELECT DISTINCT sname, gpa FROM Students;
RA: $\Pi_{\text{sname},\text{gpa}}(\text{Students})$
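A minimal sketch of set-based projection, assuming the same set-of-rows model; `row`, `project`, and the sample data are illustrative names only. Because the result is a Python set, duplicate rows collapse automatically, which matches the DISTINCT in the SQL form.

```python
def row(**attrs):
    """Build an immutable row keyed by attribute name (hypothetical helper)."""
    return frozenset(attrs.items())

def project(relation, attrs):
    """Pi_attrs(R): keep only the named attributes; the set drops duplicates."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in relation}

# Hypothetical instance: two students share the same sname and gpa.
students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
    row(sid="003", sname="John", gpa=3.4),
}

# Mirrors: SELECT DISTINCT sname, gpa FROM Students;
names_and_gpas = project(students, {"sname", "gpa"})
print(len(students), len(names_and_gpas))
```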

Note that RA Operators are Compositional!
Schema: Students(sid, sname, gpa)
SQL: SELECT DISTINCT sname, gpa FROM Students WHERE gpa > 3.5;
How do we represent this query in RA?
  $\Pi_{\text{sname},\text{gpa}}(\sigma_{\text{gpa} > 3.5}(\text{Students}))$
  $\sigma_{\text{gpa} > 3.5}(\Pi_{\text{sname},\text{gpa}}(\text{Students}))$
Are these logically equivalent?
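One way to explore the question is to run both plans on a small instance. The sketch below is a sanity check on hypothetical data, not a proof; the helper names are assumptions. The two plans agree here because the projection keeps gpa, the only attribute the selection needs.

```python
def row(**attrs):
    return frozenset(attrs.items())

def select(relation, condition):
    """sigma_c(R)."""
    return {r for r in relation if condition(dict(r))}

def project(relation, attrs):
    """Pi_attrs(R) with duplicate elimination."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in relation}

# Hypothetical instance of Students(sid, sname, gpa).
students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
    row(sid="003", sname="Ann", gpa=3.9),
}
high = lambda t: t["gpa"] > 3.5

# Plan 1: select first, then project.
select_then_project = project(select(students, high), {"sname", "gpa"})
# Plan 2: project first, then select.
project_then_select = select(project(students, {"sname", "gpa"}), high)

print(select_then_project == project_then_select)
```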

3. Cross-Product ($\times$)
Each tuple in R1 with each tuple in R2.
Notation: $R_1 \times R_2$
Example: $\text{Employee} \times \text{Dependents}$
Rare in practice; mainly used to express joins.
Schemas: Students(sid, sname, gpa), People(ssn, pname, address)
SQL: SELECT * FROM Students, People;
RA: $\text{Students} \times \text{People}$

Another example: $\text{Students} \times \text{People}$
People:
ssn      pname  address
1234545  John   216 Rosse
5423341  Bob    217 Rosse
Students:
sid  sname  gpa
001  John   3.4
002  Bob    1.3
Students × People:
ssn      pname  address    sid  sname  gpa
1234545  John   216 Rosse  001  John   3.4
5423341  Bob    217 Rosse  001  John   3.4
1234545  John   216 Rosse  002  Bob    1.3
5423341  Bob    217 Rosse  002  Bob    1.3
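The cross product of the two example relations can be sketched as follows. The `row` and `cross` helpers are assumptions for illustration, and the sketch assumes the two schemas share no attribute names.

```python
from itertools import product

def row(**attrs):
    return frozenset(attrs.items())

def cross(r1, r2):
    """R1 x R2: pair every row of R1 with every row of R2.
    Assumes the two schemas have no attribute names in common."""
    return {a | b for a, b in product(r1, r2)}

students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
}
people = {
    row(ssn=1234545, pname="John", address="216 Rosse"),
    row(ssn=5423341, pname="Bob", address="217 Rosse"),
}

pairs = cross(students, people)
print(len(pairs))  # |Students| * |People| combined rows
```

Every combined row carries all six attributes, including mismatched pairs such as student John with person Bob, which is why a cross product is usually followed by a selection.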

Renaming ($\rho$)
Changes the schema, not the instance.
A 'special' operator: neither basic nor derived.
Notation: $\rho_{B_1,\ldots,B_n}(R)$
Note: this is shorthand for the proper form (since names, not order, matter!): $\rho_{A_1 \to B_1,\ldots,A_n \to B_n}(R)$
Schema: Students(sid, sname, gpa)
SQL: SELECT sid AS studId, sname AS name, gpa AS gradePtAvg FROM Students;
RA: $\rho_{\text{studId},\text{name},\text{gradePtAvg}}(\text{Students})$
We care about this operator because we are working in a named perspective.

Another example: $\rho_{\text{studId},\text{name},\text{gradePtAvg}}(\text{Students})$
Students (before):
sid  sname  gpa
001  John   3.4
002  Bob    1.3
Students (after renaming):
studId  name  gradePtAvg
001     John  3.4
002     Bob   1.3
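A small sketch of renaming in the explicit old-name to new-name form, under the same set-of-rows model; `row` and `rename` are hypothetical helpers. Only attribute names change; the values in each row stay as they were.

```python
def row(**attrs):
    return frozenset(attrs.items())

def rename(relation, mapping):
    """rho_{A1->B1,...}(R): rename attributes; values are untouched."""
    return {frozenset((mapping.get(a, a), v) for a, v in r) for r in relation}

students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
}

renamed = rename(students, {"sid": "studId", "sname": "name", "gpa": "gradePtAvg"})
print(sorted(sorted(dict(r)) for r in renamed))
```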

Natural Join ($\bowtie$)
Notation: $R_1 \bowtie R_2$
Joins R1 and R2 on equality of all shared attributes.
If R1 has attribute set A, and R2 has attribute set B, and they share attributes $A \cap B = C$, this can also be written $R_1 \bowtie_C R_2$.
Our first example of a derived RA operator.
Meaning: $R_1 \bowtie R_2 = \Pi_{A \cup B}(\sigma_{C=D}(\rho_{C \to D}(R_1) \times R_2))$
Where:
  The rename $\rho_{C \to D}$ renames the shared attributes in one of the relations.
  The selection $\sigma_{C=D}$ checks equality of the shared attributes.
  The projection $\Pi_{A \cup B}$ eliminates the duplicate common attributes.
Schemas: Students(sid, name, gpa), People(ssn, name, address)
SQL: SELECT DISTINCT sid, S.name, gpa, ssn, address FROM Students S, People P WHERE S.name = P.name;
RA: $\text{Students} \bowtie \text{People}$

Another example: $\text{Students} \bowtie \text{People}$
Students S:
sid  S.name  gpa
001  John    3.4
002  Bob     1.3
People P:
ssn      P.name  address
1234545  John    216 Rosse
5423341  Bob     217 Rosse
Students ⋈ People:
sid  S.name  gpa  ssn      address
001  John    3.4  1234545  216 Rosse
002  Bob     1.3  5423341  217 Rosse
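The definition of natural join as rename, cross product, selection, and projection can be sketched step by step. This is an illustrative construction under the set-of-rows model, not how an engine executes a join; the `_r1` suffix used for the temporary names is an arbitrary assumption.

```python
from itertools import product

def row(**attrs):
    return frozenset(attrs.items())

def natural_join(r1, r2, shared):
    """R1 join R2, built from the basic operators.
    `shared` lists the attribute names the two schemas have in common."""
    # rho_{C->D}: give R1's shared attributes temporary names.
    tmp = {c: c + "_r1" for c in shared}
    renamed = {frozenset((tmp.get(a, a), v) for a, v in r) for r in r1}
    # Cross product followed by sigma_{C=D}.
    matched = {
        a | b
        for a, b in product(renamed, r2)
        if all(dict(a)[tmp[c]] == dict(b)[c] for c in shared)
    }
    # Pi_{A u B}: drop the temporary copies of the shared attributes.
    drop = set(tmp.values())
    return {frozenset((a, v) for a, v in r if a not in drop) for r in matched}

students = {
    row(sid="001", name="John", gpa=3.4),
    row(sid="002", name="Bob", gpa=1.3),
}
people = {
    row(ssn=1234545, name="John", address="216 Rosse"),
    row(ssn=5423341, name="Bob", address="217 Rosse"),
}

joined = natural_join(students, people, shared=["name"])
print(len(joined))
```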

Natural Join
Given schemas R(A, B, C, D), S(A, C, E), what is the schema of $R \bowtie S$?
Given R(A, B, C), S(D, E), what is $R \bowtie S$?
Given R(A, B), S(A, B), what is $R \bowtie S$?
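One way to check answers to these three questions is to compute the shared attributes and the output schema directly. The helper names below are assumptions for illustration. An empty shared list means the join condition is vacuous, so the join behaves like a cross product; when every attribute is shared, the join keeps exactly the rows present in both relations.

```python
def shared_attrs(s1, s2):
    """Attributes the join compares for equality."""
    return [a for a in s1 if a in s2]

def join_schema(s1, s2):
    """Schema of R1 join R2: every attribute once, shared ones not repeated."""
    return list(s1) + [a for a in s2 if a not in s1]

# R(A, B, C, D) with S(A, C, E)
case1 = (shared_attrs("ABCD", "ACE"), join_schema("ABCD", "ACE"))
# R(A, B, C) with S(D, E): nothing shared, so the join acts as a cross product
case2 = (shared_attrs("ABC", "DE"), join_schema("ABC", "DE"))
# R(A, B) with S(A, B): everything shared, so the join acts as an intersection
case3 = (shared_attrs("AB", "AB"), join_schema("AB", "AB"))

for shared, schema in (case1, case2, case3):
    print(shared, schema)
```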

Example: Converting SFW Query -> RA
Schemas: Students(sid, sname, gpa), People(ssn, sname, address)
SQL: SELECT DISTINCT gpa, address FROM Students S, People P WHERE gpa > 3.5 AND S.sname = P.sname;
How do we represent this query in RA?
RA: $\Pi_{\text{gpa},\text{address}}(\sigma_{\text{gpa} > 3.5}(S \bowtie P))$

Logical Equivalence of RA Plans
Given relations R(A, B) and S(B, C):
Here, projection & selection commute: $\sigma_{A=5}(\Pi_A(R)) = \Pi_A(\sigma_{A=5}(R))$
What about here? $\sigma_{A=5}(\Pi_B(R)) \stackrel{?}{=} \Pi_B(\sigma_{A=5}(R))$
We'll look at this in more depth later in the lecture…
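Both cases can be tried on a small instance. In the sketch below (hypothetical data and helper names), the first pair of plans agrees, while selecting on A after projecting onto B fails, because attribute A no longer exists in the projected rows.

```python
def row(**attrs):
    return frozenset(attrs.items())

def select(relation, condition):
    return {r for r in relation if condition(dict(r))}

def project(relation, attrs):
    return {frozenset((a, v) for a, v in r if a in attrs) for r in relation}

# Hypothetical instance of R(A, B).
R = {row(A=5, B=1), row(A=5, B=2), row(A=7, B=1)}
a_is_5 = lambda t: t["A"] == 5

# Case 1: the selection only needs A, and the projection keeps A.
commute = select(project(R, {"A"}), a_is_5) == project(select(R, a_is_5), {"A"})

# Case 2: selecting first is well defined...
select_first = project(select(R, a_is_5), {"B"})
# ...but projecting onto B first discards A, so the selection cannot be evaluated.
try:
    select(project(R, {"B"}), a_is_5)
    project_first_ok = True
except KeyError:
    project_first_ok = False

print(commute, project_first_ok)
```

The general pattern is that a selection can move below a projection only when the projection retains every attribute the condition mentions.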

RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We saw how we can transform declarative SQL queries into precise, compositional RA plans.

RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We'll look at how to then optimize these plans later in this lecture.

RDBMS Architecture
How is the RA "plan" executed?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We already know how to execute all the basic operators!

RA Plan Execution
We already know how to execute all the basic operators!
Natural Join / Join: We saw how to use memory & IO cost considerations to pick the correct algorithm to execute a join with (BNLJ, SMJ, HJ…)!
Selection: We saw how to use indexes to aid selection. We can always fall back on scan / binary search as well.
Projection: The main operation here is finding distinct values of the projected tuples; we briefly discussed how to do this with e.g. hashing or sorting.

2. Advanced Relational Algebra

What you will learn about in this section:
  Set Operations in RA
  Fancier RA
  Extensions & Limitations

Relational Algebra (RA)
Five basic operators:
  Selection: $\sigma$
  Projection: $\Pi$
  Cartesian Product: $\times$
  Union: $\cup$
  Difference: $-$
Derived or auxiliary operators:
  Intersection, complement
  Joins (natural, equi-join, theta join, semi-join)
  Renaming: $\rho$
  Division
We'll look at union and difference now, and also at some of the derived operators.

1. Union ($\cup$) and 2. Difference ($-$)
Union: $R_1 \cup R_2$
Example: $\text{ActiveEmployees} \cup \text{RetiredEmployees}$
Difference: $R_1 - R_2$
Example: $\text{AllEmployees} - \text{RetiredEmployees}$

What about Intersection ($\cap$)?
It is a derived operator: $R_1 \cap R_2 = R_1 - (R_1 - R_2)$
Also expressed as a join!
Example: $\text{UnionizedEmployees} \cap \text{RetiredEmployees}$
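The identity can be checked on a toy instance using Python's built-in set operations. The employee names are made up, and each relation is modeled as a set of single-attribute values for brevity.

```python
# Hypothetical single-attribute relations (employee names).
unionized = {"Ann", "Bob", "Cy"}
retired = {"Bob", "Cy", "Dee"}

# R1 - (R1 - R2): remove from R1 everything that is not also in R2.
via_difference = unionized - (unionized - retired)

# The same result as a join on equality of the (only) attribute.
via_join = {a for a in unionized for b in retired if a == b}

print(via_difference == (unionized & retired), via_join == via_difference)
```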

Fancier RA

Theta Join ($\bowtie_\theta$)
A join that involves a predicate.
$R_1 \bowtie_\theta R_2 = \sigma_\theta(R_1 \times R_2)$
Here $\theta$ can be any condition.
Schemas: Students(sid, sname, gpa), People(ssn, pname, address)
SQL: SELECT * FROM Students, People WHERE θ;
RA: $\text{Students} \bowtie_\theta \text{People}$
Note that natural join is a theta join + a projection.

Equi-join ($\bowtie_{A=B}$)
A theta join where $\theta$ is an equality.
$R_1 \bowtie_{A=B} R_2 = \sigma_{A=B}(R_1 \times R_2)$
Example: $\text{Employee} \bowtie_{\text{SSN}=\text{SSN}} \text{Dependents}$
Schemas: Students(sid, sname, gpa), People(ssn, pname, address)
SQL: SELECT * FROM Students S, People P WHERE sname = pname;
RA: $S \bowtie_{\text{sname}=\text{pname}} P$
Most common join in practice!
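A theta join is a selection over a cross product, and an equi-join is the special case where the predicate is an equality. The sketch below assumes the set-of-rows model and disjoint attribute names; `theta_join` is a hypothetical helper, and the second predicate is an arbitrary non-equality condition chosen for contrast.

```python
from itertools import product

def row(**attrs):
    return frozenset(attrs.items())

def theta_join(r1, r2, theta):
    """sigma_theta(R1 x R2); assumes the schemas share no attribute names."""
    return {a | b for a, b in product(r1, r2) if theta(dict(a | b))}

students = {
    row(sid="001", sname="John", gpa=3.4),
    row(sid="002", sname="Bob", gpa=1.3),
}
people = {
    row(ssn=1234545, pname="John", address="216 Rosse"),
    row(ssn=5423341, pname="Bob", address="217 Rosse"),
}

# Equi-join: theta is an equality between one attribute from each side.
equi = theta_join(students, people, lambda t: t["sname"] == t["pname"])

# A theta join with an arbitrary predicate.
other = theta_join(
    students, people, lambda t: t["gpa"] > 3.0 and t["sname"] != t["pname"]
)

print(len(equi), len(other))
```

Unlike the natural join, the equi-join result keeps both sname and pname, since no projection is applied.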

Semijoin ($\ltimes$)
$R \ltimes S = \Pi_{A_1,\ldots,A_n}(R \bowtie S)$
Where $A_1, \ldots, A_n$ are the attributes in R.
Example: $\text{Employee} \ltimes \text{Dependents}$
Schemas: Students(sid, sname, gpa), People(ssn, pname, address)
SQL: SELECT DISTINCT sid, sname, gpa FROM Students, People WHERE sname = pname;
RA: $\text{Students} \ltimes \text{People}$

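Semijoins in Distributed Databases
Semijoins are often used to compute natural joins in distributed databases.
Employee(SSN, Name, ...) and Dependents(SSN, Dname, Age, ...) are stored at different sites, connected by a network.
Query: $\text{Employee} \bowtie_{\text{ssn}=\text{ssn}} (\sigma_{\text{age} > 71}(\text{Dependents}))$
Send less data to save network bandwidth!
  $T = \Pi_{\text{SSN}}(\sigma_{\text{age} > 71}(\text{Dependents}))$
  $R = \text{Employee} \ltimes T$
  $\text{Answer} = R \bowtie \sigma_{\text{age} > 71}(\text{Dependents})$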
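The semijoin strategy can be simulated on one machine to see which data would cross the network. The rows below are hypothetical, relations are modeled as sets of plain tuples, and the final step joins R with the age-filtered dependents so that the answer matches the target query exactly.

```python
# Hypothetical data: Employee(ssn, name) at one site, Dependents(ssn, dname, age) at another.
employee = {(1, "Ann"), (2, "Bob"), (3, "Cy")}
dependents = {(1, "Al", 80), (1, "Amy", 30), (2, "Ben", 12), (3, "Cat", 75)}

# sigma_{age > 71}(Dependents), computed at the Dependents site.
old = {d for d in dependents if d[2] > 71}

# Target query evaluated directly: Employee join sigma_{age > 71}(Dependents).
direct = {(e[0], e[1], d[1], d[2]) for e in employee for d in old if e[0] == d[0]}

# Step 1: T = Pi_SSN(sigma_{age > 71}(Dependents)); only SSNs are shipped.
t = {d[0] for d in old}
# Step 2: R = Employee semijoin T; only matching employees are shipped back.
r = {e for e in employee if e[0] in t}
# Step 3: join the reduced Employee relation with the filtered dependents.
answer = {(e[0], e[1], d[1], d[2]) for e in r for d in old if e[0] == d[0]}

print(answer == direct, len(r), len(employee))
```

In this instance only two of the three employee rows need to travel, and the saving grows as the selection becomes more restrictive.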

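RA Expressions Can Get Complex!
[Figure: an RA expression tree over the relations Person, Purchase, Person, and Product, combining projections ($\Pi_{\text{name}}$, $\Pi_{\text{ssn}}$, $\Pi_{\text{pid}}$), joins on buyer-ssn = ssn, pid = pid, and seller-ssn = ssn, and selections $\sigma_{\text{name}=\text{fred}}$ and $\sigma_{\text{name}=\text{gizmo}}$.]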

Multisets

Recall that SQL uses Multisets
Equivalent representations of a multiset X:
As a list of tuples: (1, a), (1, a), (1, b), (2, c), (2, c), (2, c), (1, d), (1, d)
As counts, where $\lambda(X)$ = "count of tuple in X" (items not listed have implicit count 0):
Tuple   λ(X)
(1, a)  2
(1, b)  1
(2, c)  3
(1, d)  2
Note: in a set, all counts are in {0, 1}.

Generalizing Set Operations to Multiset Operations
$Z = X \cap Y$ with $\lambda(Z) = \min(\lambda(X), \lambda(Y))$
Tuple   λ(X)  λ(Y)  λ(Z)
(1, a)  2     5     2
(1, b)  0     1     0
(2, c)  3     2     2
(1, d)  0     2     0
For sets, this is intersection.

Generalizing Set Operations to Multiset Operations
$Z = X \cup Y$ with $\lambda(Z) = \lambda(X) + \lambda(Y)$
Tuple   λ(X)  λ(Y)  λ(Z)
(1, a)  2     5     7
(1, b)  0     1     1
(2, c)  3     2     5
(1, d)  0     2     2
For sets, this is union.
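Python's `collections.Counter` stores exactly this tuple-to-count mapping, so the two rules can be tried directly: `&` takes the minimum of the counts and `+` adds them. The counts below are assumed for illustration, with tuples that have count 0 simply left out.

```python
from collections import Counter

# Multisets as tuple -> count maps; tuples with count 0 are omitted.
x = Counter({(1, "a"): 2, (2, "c"): 3})
y = Counter({(1, "a"): 5, (1, "b"): 1, (2, "c"): 2, (1, "d"): 2})

# Multiset intersection: lambda(Z) = min(lambda(X), lambda(Y)).
intersection = x & y

# Multiset union as defined on the slide: lambda(Z) = lambda(X) + lambda(Y).
union = x + y

print(dict(intersection))
print(dict(union))
```

Note that `Counter` also offers `|`, which takes the maximum of the counts; that is a different operation from the additive union used here.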

Operations on Multisets
All RA operations need to be defined carefully on bags:
  $\sigma_C(R)$: preserve the number of occurrences
  $\Pi_A(R)$: no duplicate elimination
  Cross-product, join: no duplicate elimination
This is important: relational engines work on multisets, not sets!
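The difference between bag and set projection can be seen on a tiny example. The rows are hypothetical; the bag result corresponds to SELECT sname, gpa and the set result to SELECT DISTINCT sname, gpa.

```python
from collections import Counter

# Hypothetical Students rows as (sid, sname, gpa) tuples.
students = [("001", "John", 3.4), ("002", "Bob", 1.3), ("003", "John", 3.4)]

# Bag projection onto (sname, gpa): duplicates are counted, not removed.
bag = Counter((sname, gpa) for _, sname, gpa in students)

# Set projection: the same rows with duplicates eliminated.
as_set = set(bag)

print(sum(bag.values()), len(as_set))
```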

RA has Limitations!
Cannot compute "transitive closure".
Example: find all direct and indirect relatives of Fred.
Name1  Name2  Relationship
Fred   Mary   Father
Mary   Joe    Cousin
Mary   Bill   Spouse
Nancy  Lou    Sister
Cannot express in RA! Need to write a C program, use a graph engine, or use modern SQL…
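Outside RA, transitive closure is typically computed by repeating a join until nothing new appears. The sketch below uses hypothetical (Name1, Name2) pairs and treats them as directed edges, which is a simplification. Each pass of the loop is one self-join, and the number of passes depends on the data, which is why no single fixed RA expression covers every instance.

```python
def transitive_closure(edges):
    """Repeat a self-join until no new pairs appear (a fixpoint)."""
    closure = set(edges)
    while True:
        # One join step: (a, b) and (b, d) together yield (a, d).
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# Hypothetical (Name1, Name2) pairs, read as directed edges.
relatives = {("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")}

reachable = sorted(b for a, b in transitive_closure(relatives) if a == "Fred")
print(reachable)
```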