April 6th – relational algebra


CSE 344, April 6th – relational algebra

Assorted Minutiae
HW2 is out (due Wednesday): git pull upstream master
OQ1 is due tonight
OQ2 and OQ3 are out; both are due next Friday
Azure accounts will be created over the weekend; they are needed for HW3

Relational algebra
Remember from last week: SQL queries are combinations of functions on tables. Each function receives tables as input and produces a table as output.

Relational Algebra
Set-at-a-time algebra, which manipulates relations
In SQL we say what we want
In RA we can express how to get it
Every DBMS implementation converts a SQL query to RA in order to execute it
An RA expression is called a query plan

Basics
Relations and attributes
Functions that are applied to relations: they return relations and can be composed together
Composed expressions are often displayed using a tree rather than linearly
Use Greek symbols: σ, π, δ, etc.

Sets vs. Bags
Sets: {a,b,c}, {a,d,e,f}, { }, ...
Bags: {a, a, b, c}, {b, b, b, b, b}, ...
Relational Algebra has two flavors:
Set semantics = standard Relational Algebra
Bag semantics = extended Relational Algebra
DB systems implement bag semantics (Why?)
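
To make the set/bag distinction concrete, here is a minimal Python sketch. It models a set with the built-in set type and a bag with collections.Counter; the variable names are illustrative and do not come from the lecture.

```python
from collections import Counter

# A set keeps at most one copy of each element.
as_set = {"a", "a", "b", "c"}

# A bag (multiset) remembers how many copies of each element it holds.
as_bag = Counter(["a", "a", "b", "c"])

# Bag union adds multiplicities; set union ignores repeats.
bag_union = as_bag + Counter(["a"])
set_union = as_set | {"a"}

print(sorted(as_set))   # the duplicate "a" is gone
print(as_bag["a"])      # multiplicity of "a" in the bag
print(bag_union["a"])   # multiplicity after bag union
```

Under these assumptions, eliminating duplicates is extra work that a set must always do and a bag can skip.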

Relational Algebra Operators
RA:
Union ∪, intersection ∩, difference -
Selection σ
Projection π
Cartesian product ×, join ⨝
(Rename ρ)
Extended RA:
Duplicate elimination δ
Grouping and aggregation γ
Sorting τ
All operators take one or more relations as input and return another relation

Union and Difference
R1 ∪ R2
R1 – R2
Only make sense if R1, R2 have the same schema
What do they mean over bags?

What about Intersection?
Derived operator using minus: R1 ∩ R2 = R1 – (R1 – R2)
Derived using join: R1 ∩ R2 = R1 ⨝ R2
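
The difference-based identity can be checked quickly under set semantics. The sketch below uses Python sets of tuples as stand-ins for two relations with the same schema; the data is made up for illustration.

```python
# Two relations with the same schema, modeled as sets of tuples (hypothetical data).
R1 = {(1, "x"), (2, "y"), (3, "z")}
R2 = {(2, "y"), (3, "z"), (4, "w")}

# R1 - R2 keeps the tuples only in R1; removing those from R1 leaves the shared tuples.
via_difference = R1 - (R1 - R2)

print(via_difference == (R1 & R2))
```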

Selection: σc(R)
Returns all tuples which satisfy a condition
Examples: σSalary > 40000 (Employee), σname = “Smith” (Employee)
The condition c can use =, <, <=, >, >=, <>, combined with AND, OR, NOT
This is SQL’s WHERE clause, not SELECT (sorry)

Employee
SSN      Name   Salary
1234545  John   20000
5423341  Smith  60000
4352342  Fred   50000
σSalary > 40000 (Employee)
SSN      Name   Salary
5423341  Smith  60000
4352342  Fred   50000
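
As a rough illustration of selection, the Python sketch below represents Employee as a list of dictionaries and σ as a filter. The helper name select is an assumption made for this example, not something defined in the lecture.

```python
employee = [
    {"SSN": 1234545, "Name": "John", "Salary": 20000},
    {"SSN": 5423341, "Name": "Smith", "Salary": 60000},
    {"SSN": 4352342, "Name": "Fred", "Salary": 50000},
]

def select(relation, condition):
    """σ_condition(relation): keep the tuples for which condition is true."""
    return [row for row in relation if condition(row)]

# σ Salary > 40000 (Employee)
high_paid = select(employee, lambda row: row["Salary"] > 40000)
for row in high_paid:
    print(row)
```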

Projection: π A1,…,An (R)
Eliminates columns
Example: project social-security number and names: πSSN, Name (Employee) → Answer(SSN, Name)
This is SQL’s SELECT
Different semantics over sets or bags! Why?

Employee
SSN      Name  Salary
1234545  John  20000
5423341  John  60000
4352342  John  20000
π Name,Salary (Employee)
Bag semantics:
Name  Salary
John  20000
John  60000
John  20000
Set semantics:
Name  Salary
John  20000
John  60000
Which is more efficient?
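
The difference between the two projection semantics can be sketched in a few lines of Python. The rows below are assumed for illustration (three employees who share a name, two of whom share a salary), and the helper names are hypothetical.

```python
# Assumed example rows: projecting away SSN leaves two identical tuples.
employee = [
    {"SSN": 1234545, "Name": "John", "Salary": 20000},
    {"SSN": 5423341, "Name": "John", "Salary": 60000},
    {"SSN": 4352342, "Name": "John", "Salary": 20000},
]

def project_bag(relation, attributes):
    """Bag semantics: one output tuple per input tuple, duplicates kept."""
    return [tuple(row[a] for a in attributes) for row in relation]

def project_set(relation, attributes):
    """Set semantics: same as the bag version, followed by duplicate elimination."""
    return set(project_bag(relation, attributes))

bag_result = project_bag(employee, ["Name", "Salary"])
set_result = project_set(employee, ["Name", "Salary"])
print(len(bag_result), len(set_result))
```

The set version has to do everything the bag version does and then remove duplicates, which is one way to think about the efficiency question.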

Composing RA Operators
Patient
no  name  zip    disease
1   p1    98125  flu
2   p2    98125  heart
3   p3    98120  lung
4   p4    98120  heart
πzip,disease(Patient)
zip    disease
98125  flu
98125  heart
98120  lung
98120  heart
σdisease=‘heart’(Patient)
no  name  zip    disease
2   p2    98125  heart
4   p4    98120  heart
πzip,disease(σdisease=‘heart’(Patient))
zip    disease
98125  heart
98120  heart
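
Because every operator returns a relation, operators nest like ordinary function calls. The sketch below shows this with two hypothetical helpers, select and project, applied to a Patient relation whose rows are assumed for the example.

```python
# Assumed Patient rows for illustration.
patient = [
    {"no": 1, "name": "p1", "zip": 98125, "disease": "flu"},
    {"no": 2, "name": "p2", "zip": 98125, "disease": "heart"},
    {"no": 3, "name": "p3", "zip": 98120, "disease": "lung"},
    {"no": 4, "name": "p4", "zip": 98120, "disease": "heart"},
]

def select(relation, condition):
    """σ: keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """π under bag semantics: keep only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in relation]

# π zip,disease (σ disease='heart' (Patient))
answer = project(select(patient, lambda r: r["disease"] == "heart"), ["zip", "disease"])
print(answer)
```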

Cartesian Product: R1 × R2
Each tuple in R1 is paired with each tuple in R2
Rare in practice; mainly used to express joins

Cross-Product Example
Employee
Name  SSN
John  999999999
Tony  777777777
Dependent
EmpSSN     DepName
999999999  Emily
777777777  Joe
Employee × Dependent
Name  SSN        EmpSSN     DepName
John  999999999  999999999  Emily
John  999999999  777777777  Joe
Tony  777777777  999999999  Emily
Tony  777777777  777777777  Joe

Natural Join: R1 ⨝ R2
Meaning: R1 ⨝ R2 = πA(σθ(R1 × R2))
Where:
Selection σθ checks equality of all common attributes (i.e., attributes with the same names)
Projection πA eliminates duplicate common attributes
This is another example of rewriting one operator using another
Attributes with the same name must, of course, have the same type

Natural Join Example
R
A  B
X  Y
X  Z
Y  Z
Z  V
S
B  C
Z  U
V  W
Z  V
R ⨝ S = πABC(σR.B=S.B(R × S))
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
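
The definition of natural join as a projection over a selection over a cross product can be followed literally in code. This is a minimal sketch under bag semantics, not how a real DBMS executes a join; the function name and the sample rows for R(A,B) and S(B,C) are assumptions made for the example.

```python
from itertools import product

def natural_join(r, s):
    """R ⨝ S computed as π_A(σ_θ(R × S)), with tuples modeled as dicts."""
    joined = []
    for t1, t2 in product(r, s):                  # R × S
        common = t1.keys() & t2.keys()            # attributes with the same name
        if all(t1[a] == t2[a] for a in common):   # σ_θ: equality on common attributes
            joined.append({**t1, **t2})           # π_A: merging keeps one copy of each common attribute
    return joined

# Assumed sample data.
R = [{"A": "X", "B": "Y"}, {"A": "X", "B": "Z"}, {"A": "Y", "B": "Z"}, {"A": "Z", "B": "V"}]
S = [{"B": "Z", "C": "U"}, {"B": "V", "C": "W"}, {"B": "Z", "C": "V"}]

result = natural_join(R, S)
for row in result:
    print(row["A"], row["B"], row["C"])
```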

Natural Join Example 2
AnonPatient P
age  zip    disease
54   98125  heart
20   98120  flu
Voters V
name   age  zip
Alice  54   98125
Bob    20   98120
P ⨝ V
age  zip    disease  name
54   98125  heart    Alice
20   98120  flu      Bob

Theta Join: R1 ⨝θ R2 = σθ(R1 × R2)
A join that involves a predicate
Here θ can be any condition
No projection in this case!
For our voters/patients example, with AnonPatient(age, zip, disease) and Voters(name, age, zip):
P ⨝ P.zip = V.zip and P.age >= V.age - 1 and P.age <= V.age + 1 V
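
A theta join with the zip-and-age-range predicate can be sketched as a filtered cross product. The helper names are hypothetical, attribute names are prefixed with P. and V. so that no projection is needed, and the third voter is an invented row added to show a non-equality match.

```python
def theta_join(r, s, theta):
    """R ⨝θ S = σθ(R × S): pair every tuple of R with every tuple of S, keep pairs satisfying theta."""
    return [{**t1, **t2} for t1 in r for t2 in s if theta(t1, t2)]

patients = [
    {"P.age": 54, "P.zip": 98125, "P.disease": "heart"},
    {"P.age": 20, "P.zip": 98120, "P.disease": "flu"},
]
voters = [
    {"V.name": "Alice", "V.age": 54, "V.zip": 98125},
    {"V.name": "Bob", "V.age": 20, "V.zip": 98120},
    {"V.name": "Carol", "V.age": 21, "V.zip": 98120},  # hypothetical extra voter
]

def close_match(p, v):
    """Same zip code and ages within one year of each other."""
    return p["P.zip"] == v["V.zip"] and v["V.age"] - 1 <= p["P.age"] <= v["V.age"] + 1

matches = theta_join(patients, voters, close_match)
for row in matches:
    print(row["P.disease"], row["V.name"])
```

With this data the 20-year-old patient matches two voters, so the predicate narrows the candidates without identifying one person uniquely.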

Equijoin: R1 ⨝θ R2 = σθ(R1 × R2)
A theta join where θ is an equality predicate
By far the most used variant of join in practice
What is the relationship with natural join?

Equijoin Example
AnonPatient P
age  zip    disease
54   98125  heart
20   98120  flu
Voters V
name  age  zip
p1    54   98125
p2    20   98120
P ⨝ P.age=V.age V
P.age  P.zip  P.disease  V.name  V.age  V.zip
54     98125  heart      p1      54     98125
20     98120  flu        p2      20     98120

Join Summary
Theta-join: R ⨝θ S = σθ(R × S)
Join of R and S with a join condition θ
Cross-product followed by selection θ
No projection
Equijoin: R ⨝θ S = σθ(R × S)
Join condition θ consists only of equalities
Natural join: R ⨝ S = πA(σθ(R × S))
Equality on all fields with the same name in R and in S
Projection πA drops all redundant attributes

So Which Join Is It?
When we write R ⨝ S we usually mean an equijoin, but we often omit the equality predicate when it is clear from the context

More Joins
Outer join
Include tuples with no matches in the output
Use NULL values for missing attributes
Does not eliminate duplicate columns
Variants: left outer join, right outer join, full outer join

Outer Join Example
AnonPatient P
age  zip    disease
54   98125  heart
20   98120  flu
33   98120  lung
AnonJob J
job      age  zip
lawyer   54   98125
cashier  20   98120
P ⟕ J
P.age  P.zip  P.disease  J.job    J.age  J.zip
54     98125  heart      lawyer   54     98125
20     98120  flu        cashier  20     98120
33     98120  lung       null     null   null
Draw out the symbols for the other joins
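
A left outer join can be sketched as an ordinary join plus a padding step for unmatched left tuples. In the Python below, None stands in for NULL, the helper name is hypothetical, the join condition (equal age and zip) is an assumption, and the data rows are assumed for the example.

```python
def left_outer_join(r, s, theta, s_attributes):
    """Keep every tuple of r; pad with None where no tuple of s satisfies theta."""
    result = []
    for t1 in r:
        partners = [t2 for t2 in s if theta(t1, t2)]
        if partners:
            result.extend({**t1, **t2} for t2 in partners)
        else:
            # No match: fill the attributes of s with None (NULL).
            result.append({**t1, **dict.fromkeys(s_attributes)})
    return result

patients = [
    {"P.age": 54, "P.zip": 98125, "P.disease": "heart"},
    {"P.age": 20, "P.zip": 98120, "P.disease": "flu"},
    {"P.age": 33, "P.zip": 98120, "P.disease": "lung"},
]
jobs = [
    {"J.job": "lawyer", "J.age": 54, "J.zip": 98125},
    {"J.job": "cashier", "J.age": 20, "J.zip": 98120},
]

def same_age_and_zip(p, j):
    return p["P.age"] == j["J.age"] and p["P.zip"] == j["J.zip"]

rows = left_outer_join(patients, jobs, same_age_and_zip, ["J.job", "J.age", "J.zip"])
for row in rows:
    print(row)
```

A right outer join swaps the roles of the two inputs, and a full outer join pads unmatched tuples from both sides.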

Some Examples
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,qty,price)
Name of supplier of parts with size greater than 10:
πsname(Supplier ⨝ Supply ⨝ (σpsize>10 (Part)))
Name of supplier of red parts or parts with size greater than 10:
πsname(Supplier ⨝ Supply ⨝ (σpsize>10 (Part) ∪ σpcolor=‘red’ (Part)))
πsname(Supplier ⨝ Supply ⨝ (σpsize>10 ∨ pcolor=‘red’ (Part)))
Can be represented as trees as well
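
The two forms of the second query (union of two selections versus one selection with a disjunction) can be compared on a toy database. This sketch uses Python's sqlite3 module with an in-memory database; all rows are invented for the example, and SQL's NATURAL JOIN stands in for ⨝.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier(sno INTEGER, sname TEXT, scity TEXT, sstate TEXT);
CREATE TABLE Part(pno INTEGER, pname TEXT, psize INTEGER, pcolor TEXT);
CREATE TABLE Supply(sno INTEGER, pno INTEGER, qty INTEGER, price INTEGER);
-- Hypothetical rows, for illustration only.
INSERT INTO Supplier VALUES (1,'Acme','Seattle','WA'), (2,'Bolt','Tacoma','WA'), (3,'Cog','Portland','OR');
INSERT INTO Part VALUES (10,'gear',12,'blue'), (11,'nut',5,'red'), (12,'pin',3,'green');
INSERT INTO Supply VALUES (1,10,100,20), (2,11,50,5), (3,12,70,2);
""")

# Union of two selections on Part.
union_form = """
SELECT DISTINCT sname
FROM Supplier NATURAL JOIN Supply NATURAL JOIN
     (SELECT * FROM Part WHERE psize > 10
      UNION
      SELECT * FROM Part WHERE pcolor = 'red') AS p
"""

# A single selection with a disjunction.
or_form = """
SELECT DISTINCT sname
FROM Supplier NATURAL JOIN Supply NATURAL JOIN Part
WHERE psize > 10 OR pcolor = 'red'
"""

union_names = sorted(row[0] for row in conn.execute(union_form))
or_names = sorted(row[0] for row in conn.execute(or_form))
print(union_names, or_names)
```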

Representing RA Queries as Trees
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,qty,price)
πsname(Supplier ⨝ Supply ⨝ (σpsize>10 (Part)))
Tree, from the root down:
Answer
  πsname
    ⨝
      Supplier
      ⨝
        Supply
        σpsize>10
          Part

Relational Algebra Operators
RA:
Union ∪, intersection ∩, difference -
Selection σ
Projection π
Cartesian product ×, join ⨝
(Rename ρ)
Extended RA:
Duplicate elimination δ
Grouping and aggregation γ
Sorting τ
All operators take one or more relations as input and return another relation

Extended RA: Operators on Bags
Duplicate elimination δ: turns bags into sets (no other arguments)
Grouping γ: takes in a relation and a list of grouping operations (e.g., aggregates). Returns a new relation. Can also perform renames at the same time
Sorting τ: takes in a relation, a list of attributes to sort on, and an order. Returns a new relation
We discussed last lecture the FIVE RA operators: union, difference, selection, projection, join
Three more operators that make sense only on bags (even lists). We will talk only about Grouping

Using Extended RA Operators
SELECT city, sum(quantity)
FROM sales
GROUP BY city
HAVING count(*) > 100
Plan, from the bottom up (T1, T2 = temporary tables):
sales(product, city, quantity)
γ city, sum(quantity)→q, count(*)→c produces T1(city,q,c)
σ c > 100 produces T2(city,q,c)
π city, q produces Answer
Here γ groups the tuples of sales by city and computes, for each city, sum(quantity) as q and count(*) as c
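
The three steps of this plan (γ, then σ, then π) can be traced with plain Python. The sales rows are invented so that exactly one city has more than 100 sales, and gamma_city is a hypothetical helper specialized to this one query.

```python
from collections import defaultdict

# Hypothetical data: 101 Seattle sales of quantity 1, and 2 Portland sales of quantity 5.
sales = [("widget", "Seattle", 1)] * 101 + [("widget", "Portland", 5)] * 2

def gamma_city(rows):
    """γ city, sum(quantity)→q, count(*)→c over sales(product, city, quantity)."""
    totals = defaultdict(lambda: [0, 0])   # city -> [sum of quantity, row count]
    for _product, city, quantity in rows:
        totals[city][0] += quantity
        totals[city][1] += 1
    return [(city, q, c) for city, (q, c) in totals.items()]

t1 = gamma_city(sales)                        # T1(city, q, c)
t2 = [row for row in t1 if row[2] > 100]      # σ c > 100, giving T2(city, q, c)
answer = [(city, q) for city, q, _c in t2]    # π city, q
print(answer)
```

The count c is computed by γ even though it never reaches the answer, because the HAVING condition needs it.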

Typical Plan for a Query (1/2)
SELECT fields FROM R, S, … WHERE condition
How do we translate a general SELECT-FROM-WHERE query? It becomes a sequence of select-project-join operations (a SELECT-PROJECT-JOIN query).
Plan, from the root down:
Answer
  πfields
    σselection condition
      ⨝join condition
        ⨝join condition
          R
          S
        …

Typical Plan for a Query (2/2)
SELECT fields FROM R, S, … WHERE condition GROUP BY fields HAVING condition
How do we translate a SELECT-FROM-WHERE-GROUP BY-HAVING query?
Plan, from the root down:
πfields
  σhaving condition
    γfields, sum/count/min/max(fields)
      σwhere condition
        ⨝join condition
          …
          …

How about Subqueries?
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’
  and not exists
    (SELECT *
     FROM Supply P
     WHERE P.sno = Q.sno and P.price > 100)
Correlation! The subquery refers to Q.sno from the outer query.

How about Subqueries? De-Correlation
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
Correlated query:
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’
  and not exists
    (SELECT *
     FROM Supply P
     WHERE P.sno = Q.sno and P.price > 100)
De-correlated query:
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’
  and Q.sno not in
    (SELECT P.sno
     FROM Supply P
     WHERE P.price > 100)

How about Subqueries? Un-nesting
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
De-correlated query:
SELECT Q.sno
FROM Supplier Q
WHERE Q.sstate = ‘WA’
  and Q.sno not in
    (SELECT P.sno
     FROM Supply P
     WHERE P.price > 100)
Un-nested query (EXCEPT = set difference):
(SELECT Q.sno
 FROM Supplier Q
 WHERE Q.sstate = ‘WA’)
EXCEPT
(SELECT P.sno
 FROM Supply P
 WHERE P.price > 100)

How about Subqueries? Finally…
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
(SELECT Q.sno
 FROM Supplier Q
 WHERE Q.sstate = ‘WA’)
EXCEPT
(SELECT P.sno
 FROM Supply P
 WHERE P.price > 100)
Plan, from the root down:
−
  πsno
    σsstate=‘WA’
      Supplier
  πsno
    σprice > 100
      Supply
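
The three formulations of the subquery (correlated NOT EXISTS, de-correlated NOT IN, and un-nested EXCEPT) can be run side by side on a small in-memory SQLite database. The rows are invented for the example, and the sketch assumes sno is never NULL, since NOT IN behaves differently when the subquery returns NULLs. SQLite does not accept parentheses around the operands of EXCEPT, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier(sno INTEGER, sname TEXT, scity TEXT, sstate TEXT);
CREATE TABLE Supply(sno INTEGER, pno INTEGER, price INTEGER);
-- Hypothetical rows, for illustration only.
INSERT INTO Supplier VALUES (1,'s1','Seattle','WA'), (2,'s2','Tacoma','WA'), (3,'s3','Portland','OR');
INSERT INTO Supply VALUES (1,10,50), (2,10,150), (3,11,200);
""")

correlated = """
SELECT Q.sno FROM Supplier Q
WHERE Q.sstate = 'WA'
  AND NOT EXISTS (SELECT * FROM Supply P
                  WHERE P.sno = Q.sno AND P.price > 100)
"""

decorrelated = """
SELECT Q.sno FROM Supplier Q
WHERE Q.sstate = 'WA'
  AND Q.sno NOT IN (SELECT P.sno FROM Supply P WHERE P.price > 100)
"""

unnested = """
SELECT Q.sno FROM Supplier Q WHERE Q.sstate = 'WA'
EXCEPT
SELECT P.sno FROM Supply P WHERE P.price > 100
"""

results = [sorted(row[0] for row in conn.execute(q))
           for q in (correlated, decorrelated, unnested)]
print(results)
```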

Summary of RA and SQL
SQL = a declarative language where we say what data we want to retrieve
RA = an algebra where we say how we want to retrieve the data
Theorem: SQL and RA can express exactly the same class of queries
RDBMS translate SQL → RA, then optimize RA

Summary of RA and SQL
SQL (and RA) cannot express ALL queries that we could write in, say, Java
Example: Parent(p,c): find all descendants of ‘Alice’
No RA query can compute this! This is called a recursive query
Next lecture: Datalog is an extension that can compute recursive queries
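
To see why the descendants query needs recursion, here is a small Python sketch that computes it by iterating to a fixpoint. Each pass of the loop corresponds to a join, a projection, and a union, all of which RA can express; what a single RA expression cannot express is "repeat until nothing changes". The Parent rows and the function name are invented for the example.

```python
# Parent(p, c) as a set of (parent, child) pairs; hypothetical data.
parent = {
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
    ("Eve", "Frank"),
}

def descendants(parent_relation, ancestor):
    """All descendants of ancestor, found by repeating one join step until no new names appear."""
    found = {c for p, c in parent_relation if p == ancestor}
    while True:
        # One more generation: children of anyone already found.
        new = {c for p, c in parent_relation if p in found} - found
        if not new:
            return found
        found |= new

print(sorted(descendants(parent, "Alice")))
```

The number of loop iterations depends on the data (the depth of the family tree), which is why no fixed-size RA expression covers every input.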