Relational Algebra Friday, 11/14/2003.


Outline. Relational Algebra: 5.2, 5.3, 5.4. Reading assignment: extensible hash tables (13.4.5, 13.4.6) and linear hash tables (13.4.7, 13.4.8).

Some History. The relational data model was invented by Ted Codd, circa 1970. Main idea: data = mathematical relations; queries = mathematical formulas (“abstract SQL”). Elegant and clean, but is it practical?

History Continued. State of the art in databases in the 1970s: the COBOL/CODASYL approach. Data = a file of records, with pointers. No queries, only pointer navigation. Very efficient, but queries and programs were hard to write.

More History. To prove itself, the relational model needed to show how to implement mathematical “queries” efficiently. Three ideas: (1) an intermediate language, relational algebra; (2) an efficient implementation of each operator; (3) optimization. First proof of concept: System R (1979).

Relational Algebra. Operates on relations, i.e. sets (later we discuss how to extend this to bags). Five operators: union (∪), difference (–), selection (σ), projection (Π), Cartesian product (×). Derived or auxiliary operators: intersection, complement; joins (natural, equi-join, theta join, semi-join); renaming (ρ).

1. Union and 2. Difference. Union: R1 ∪ R2. Example: ActiveEmployees ∪ RetiredEmployees. Difference: R1 – R2. Example: AllEmployees – RetiredEmployees.

What about Intersection? It is a derived operator: R1 ∩ R2 = R1 – (R1 – R2). It can also be expressed as a join (will see later). Example: UnionizedEmployees ∩ RetiredEmployees.
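The union, difference and derived-intersection slides can be sketched with Python sets standing in for relations. The employee names are made-up sample data, and each tuple has a single attribute for brevity.

```python
# Relations modeled as Python sets of tuples (hypothetical sample data).
active = {("Ann",), ("Bob",)}
retired = {("Cid",), ("Dee",)}
unionized = {("Ann",), ("Dee",)}

# Union: ActiveEmployees ∪ RetiredEmployees
all_employees = active | retired

# Difference: AllEmployees – RetiredEmployees
not_retired = all_employees - retired

# Intersection as a derived operator: R1 ∩ R2 = R1 – (R1 – R2)
derived_intersection = unionized - (unionized - retired)
```

Here `derived_intersection` matches Python's built-in `unionized & retired`, which is the identity the slide states.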

3. Selection. Returns all tuples which satisfy a condition. Notation: σ c (R). Examples: σ Salary > 40000 (Employee); σ name = “Smith” (Employee). The condition c can use =, <, ≤, >, ≥, <>.

Find all employees with salary more than $40,000: σ Salary > 40000 (Employee)
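A minimal sketch of selection, assuming a relation is represented as a list of dicts (one dict per tuple). The `select` helper and the sample rows are illustrative, not part of the lecture.

```python
def select(relation, condition):
    """sigma_c(R): keep exactly the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]

# Hypothetical Employee instance.
employee = [
    {"SSN": "999999999", "Name": "John", "Salary": 20000},
    {"SSN": "777777777", "Name": "Tony", "Salary": 60000},
    {"SSN": "555555555", "Name": "Smith", "Salary": 45000},
]

# sigma Salary > 40000 (Employee)
well_paid = select(employee, lambda t: t["Salary"] > 40000)
```

Selection never changes the schema; it only filters rows.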

4. Projection. Eliminates columns, then removes duplicates. Notation: Π A1,…,An (R). Example: project social-security number and names: Π SSN, Name (Employee). Output schema: Answer(SSN, Name).

Π SSN, Name (Employee)
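Set-semantics projection can be sketched as follows, again assuming a list-of-dicts representation. The `project` helper and the sample data are hypothetical; the point is that duplicates produced by dropping columns are removed.

```python
def project(relation, attributes):
    """Pi_{A1..An}(R): keep the listed columns, then remove duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # duplicate elimination (set semantics)
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical Employee instance with two employees named John.
employee = [
    {"SSN": "999999999", "Name": "John", "Salary": 20000},
    {"SSN": "777777777", "Name": "Tony", "Salary": 60000},
    {"SSN": "555555555", "Name": "John", "Salary": 45000},
]

ssn_name = project(employee, ["SSN", "Name"])  # three rows, SSN is unique
names = project(employee, ["Name"])            # two rows, one John removed
```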

5. Cartesian Product. Each tuple in R1 with each tuple in R2. Notation: R1 × R2. Example: Employee × Dependents. Very rare in practice; mainly used to express joins.
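A sketch of the Cartesian product using `itertools.product`. Because Employee and Dependents both have an SSN attribute, this version prefixes every attribute with its relation name; that naming scheme is an assumption made here, not something the slide prescribes.

```python
from itertools import product

def cartesian_product(r1, r2, name1, name2):
    """R1 x R2: pair every tuple of r1 with every tuple of r2.

    Attributes are prefixed with the relation name to avoid clashes.
    """
    return [
        {**{f"{name1}.{k}": v for k, v in t1.items()},
         **{f"{name2}.{k}": v for k, v in t2.items()}}
        for t1, t2 in product(r1, r2)
    ]

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]

pairs = cartesian_product(employee, dependents, "Employee", "Dependents")
```

The result has |R1| · |R2| tuples, which is why the product is rarely evaluated directly.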

Renaming. Changes the schema, not the instance. Notation: ρ B1,…,Bn (R). Example: ρ LastName, SocSocNo (Employee). Output schema: Answer(LastName, SocSocNo).

Renaming Example
  Employee
    Name  SSN
    John  999999999
    Tony  777777777
  ρ LastName, SocSocNo (Employee)
    LastName  SocSocNo
    John      999999999
    Tony      777777777
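The renaming example can be reproduced with a short sketch. It assumes rows are dicts whose key order is the schema order (guaranteed for dicts in Python 3.7+); `rename` is a hypothetical helper.

```python
def rename(relation, new_names):
    """rho_{B1..Bn}(R): replace attribute names positionally, keep the values."""
    return [dict(zip(new_names, row.values())) for row in relation]

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]

# rho LastName, SocSocNo (Employee)
renamed = rename(employee, ["LastName", "SocSocNo"])
```

Only the attribute names change; the values in each tuple are the same as in Employee.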

Natural Join. Notation: R1 ⋈ R2. Meaning: R1 ⋈ R2 = Π A (σ C (R1 × R2)), where the selection σ C checks equality of all common attributes and the projection Π A eliminates the duplicate common attributes.

Natural Join Example
  Employee
    Name  SSN
    John  999999999
    Tony  777777777
  Dependents
    SSN        Dname
    999999999  Emily
    777777777  Joe
  Employee ⋈ Dependents = Π Name, SSN, Dname (σ SSN=SSN2 (Employee × ρ SSN2, Dname (Dependents)))
    Name  SSN        Dname
    John  999999999  Emily
    Tony  777777777  Joe
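A minimal sketch of the natural join on the Employee and Dependents instances from this slide. Rather than materializing the product, renaming and projecting, the hypothetical `natural_join` helper fuses the three steps: it compares the common attributes and merges matching tuples so each common attribute appears once.

```python
def natural_join(r, s):
    """R join S: match tuples that agree on all common attributes."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**t1, **t2}  # common attributes appear only once in the merged dict
        for t1 in r
        for t2 in s
        if all(t1[a] == t2[a] for a in common)
    ]

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]

joined = natural_join(employee, dependents)
```

With no common attributes the condition is vacuously true and the function degenerates to a Cartesian product.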

Natural Join example: tables R(A, B), S(B, C) and R ⋈ S(A, B, C), with values drawn from X, Y, Z, V, U, W.

Natural Join. Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S? Given R(A, B, C), S(D, E), what is R ⋈ S? Given R(A, B), S(A, B), what is R ⋈ S?

Theta Join. A join that involves a predicate: R1 ⋈ θ R2 = σ θ (R1 × R2). Here θ can be any condition, e.g. a comparison using =, <, ≤, >, ≥ or ≠.

Equi-join. A theta join where θ is an equality: R1 ⋈ A=B R2 = σ A=B (R1 × R2). Example: Employee ⋈ SSN=SSN Dependents. The most useful join in practice.
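Theta join and its equality special case can be sketched as a filtered product. The sketch assumes the two relations have disjoint attribute names, so Dependents' SSN is called SSN2 here, as in the natural-join example; the Age and MinAge columns are invented to show a non-equality predicate.

```python
def theta_join(r1, r2, theta):
    """R1 join_theta R2 = sigma_theta(R1 x R2), for disjoint attribute names."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if theta(t1, t2)]

employee = [
    {"Name": "John", "SSN": "999999999", "Age": 40},
    {"Name": "Tony", "SSN": "777777777", "Age": 25},
]
dependents = [
    {"SSN2": "999999999", "Dname": "Emily", "MinAge": 30},
    {"SSN2": "777777777", "Dname": "Joe", "MinAge": 30},
]

# Equi-join: theta is an equality between two attributes.
equi = theta_join(employee, dependents, lambda e, d: e["SSN"] == d["SSN2"])

# General theta join: any predicate over the pair of tuples.
older = theta_join(employee, dependents, lambda e, d: e["Age"] > d["MinAge"])
```

Unlike the natural join, an equi-join keeps both compared columns (SSN and SSN2) in the output.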

Semijoin. R ⋉ S = Π A1,…,An (R ⋈ S), where A1, …, An are the attributes in R. Example: Employee ⋉ Dependents.

Semijoins in Distributed Databases. Semijoins are used in distributed databases. Employee(SSN, Name, ...) and Dependents(SSN, Dname, Age, ...) are stored at different sites, connected by a network. Query: Employee ⋈ ssn=ssn (σ age>71 (Dependents)). Plan: T = Π SSN (σ age>71 (Dependents)); R = Employee ⋉ T; Answer = R ⋈ σ age>71 (Dependents).
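The semijoin plan can be traced in a short sketch. Both "sites" are plain Python lists here and the sample rows are invented; the final join reapplies the age filter to Dependents so that the result matches the original query. The idea is that only the small relation T and the reduced relation R would have to cross the network.

```python
def natural_join(r, s):
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]

def semijoin(r, s):
    """R semijoin S: the tuples of R that have at least one join partner in S."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    keys = {tuple(t[a] for a in common) for t in s}
    return [t for t in r if tuple(t[a] for a in common) in keys]

employee = [
    {"SSN": "111", "Name": "Ann"},
    {"SSN": "222", "Name": "Bob"},
    {"SSN": "333", "Name": "Cid"},
]
dependents = [
    {"SSN": "111", "Dname": "Eve", "Age": 80},
    {"SSN": "111", "Dname": "Fay", "Age": 10},
    {"SSN": "222", "Dname": "Gus", "Age": 30},
]

# At the Dependents site: T = Pi_SSN(sigma_age>71(Dependents)), shipped to Employee.
old = [d for d in dependents if d["Age"] > 71]
t = [{"SSN": ssn} for ssn in sorted({d["SSN"] for d in old})]

# At the Employee site: R = Employee semijoin T, shipped back.
r = semijoin(employee, t)

# Final join with the filtered dependents.
answer = natural_join(r, old)

# Reference result: the query evaluated directly.
direct = natural_join(employee, old)
```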

Complex RA Expressions. Operator tree over the leaves Person, Purchase, Person, Product, using Π name, Π ssn, Π pid, σ name=fred, σ name=gizmo and the joins ⋈ buyer-ssn=ssn, ⋈ pid=pid, ⋈ seller-ssn=ssn.

Operations on Bags. A bag = a set with repeated elements. Relational engines work on bags, not sets! All operations need to be defined carefully on bags: {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}; {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}; σ C (R) preserves the number of occurrences; Π A (R): no duplicate elimination; Cartesian product, join: no duplicate elimination.
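Bag union and difference can be checked with `collections.Counter`, which stores a multiplicity per element. Under the usual definition, bag union adds multiplicities and bag difference subtracts them, dropping anything that would fall to zero or below; `Counter`'s `+` and `-` behave exactly this way.

```python
from collections import Counter

# Bag union adds multiplicities.
union = Counter("abbc") + Counter("abbbeff")

# Bag difference subtracts multiplicities and drops non-positive counts,
# so c (2 - 3) and d (0 - 1) do not appear in the result.
difference = Counter("abbbcc") - Counter("bcccd")

# Bag projection: no duplicate elimination, every input row yields an output row.
rows = [("John", 1), ("John", 2), ("Tony", 3)]
names = [name for name, _ in rows]
```

`sorted(union.elements())` lists the bag contents with repetitions, which makes the results easy to compare with the slide.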

Logical Operators in the Bag Algebra. Relational algebra (on bags): union, intersection, difference; selection σ; projection Π; join ⋈; duplicate elimination δ; grouping γ; sorting τ.

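Example. SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100. Plan, bottom up: γ city, sum(price)→p, count(*)→c (sales) produces T(city, p, c); then σ p > 100; then Π city, c. As one expression: Π city, c (σ p > 100 (γ city, sum(price)→p, count(*)→c (sales))).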
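The three-step plan for the GROUP BY / HAVING query can be sketched one operator at a time. The `sales` rows are invented sample data; the intermediate relation follows the slide's schema T(city, p, c).

```python
from collections import defaultdict

# Hypothetical sales instance.
sales = [
    {"city": "Seattle", "price": 60},
    {"city": "Seattle", "price": 70},
    {"city": "Portland", "price": 50},
]

# gamma city, sum(price)->p, count(*)->c (sales)  ==>  T(city, p, c)
totals = defaultdict(lambda: [0, 0])
for row in sales:
    totals[row["city"]][0] += row["price"]
    totals[row["city"]][1] += 1
t = [{"city": city, "p": p, "c": c} for city, (p, c) in totals.items()]

# sigma p > 100  (the HAVING clause)
having = [row for row in t if row["p"] > 100]

# Pi city, c  (the SELECT list)
result = [{"city": row["city"], "c": row["c"]} for row in having]
```

The HAVING condition becomes an ordinary selection because the grouping operator has already turned `sum(price)` into a plain attribute `p`.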

Physical Operators. SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city='seattle' AND Q.phone > '5430000'. Logical tree: Π buyer (σ city='seattle' ∧ phone>'5430000' (Purchase ⋈ buyer=name Person)). Query plan: the logical tree, plus an implementation choice at every node, e.g. (Simple Nested Loops) for the join, (Table scan) for Purchase and (Index scan) for Person, plus a scheduling of operations. Some operators are from relational algebra, and others (e.g., scan) are not.

Architecture of a Database Engine. SQL query → Parse Query → Select Logical Plan → Select Physical Plan → Query Execution. Plan selection is the query optimization step: it works on a logical plan and hands a physical plan to query execution.

Question in Class Logical operator: Product(pname, cname) ⋈ Company(cname, city) Propose three physical operators for the join, assuming the tables are in main memory:

Question in Class. Product(pname, cname) ⋈ Company(cname, city), with 1,000,000 products and 1,000 companies. How much time do the following physical operators take if the data is in main memory? Nested loop join: time = ; sort and merge (merge-join): time = ; hash join: time =
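The three physical operators named in the question can be sketched as in-memory Python functions. Rows are tuples, Product is (pname, cname) and Company is (cname, city); the function names and sample rows are illustrative only, and real engines implement these operators quite differently.

```python
from collections import defaultdict

def nested_loop_join(products, companies):
    # Compare every product with every company.
    return [(pname, cname, city)
            for pname, cname in products
            for cname2, city in companies
            if cname == cname2]

def sort_merge_join(products, companies):
    # Sort both inputs on cname, then merge the two sorted runs.
    ps = sorted(products, key=lambda p: p[1])
    cs = sorted(companies, key=lambda c: c[0])
    out, i, j = [], 0, 0
    while i < len(ps) and j < len(cs):
        if ps[i][1] < cs[j][0]:
            i += 1
        elif ps[i][1] > cs[j][0]:
            j += 1
        else:
            key = ps[i][1]
            j_end = j
            while j_end < len(cs) and cs[j_end][0] == key:
                j_end += 1  # run of companies sharing this cname
            while i < len(ps) and ps[i][1] == key:
                for k in range(j, j_end):
                    out.append((ps[i][0], key, cs[k][1]))
                i += 1
            j = j_end
    return out

def hash_join(products, companies):
    # Build a hash table on the smaller input, probe with the larger one.
    by_cname = defaultdict(list)
    for cname, city in companies:
        by_cname[cname].append(city)
    return [(pname, cname, city)
            for pname, cname in products
            for city in by_cname.get(cname, [])]

products = [("gizmo", "GizmoWorks"), ("gadget", "GizmoWorks"),
            ("camera", "Canon"), ("orphan", "Nowhere")]
companies = [("Canon", "Tokyo"), ("GizmoWorks", "Seattle")]

results = [sorted(join(products, companies))
           for join in (nested_loop_join, sort_merge_join, hash_join)]
```

All three return the same tuples; they differ only in how many comparisons they need to find them.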

Cost Parameters. In database systems the data is on disk, not in main memory. The cost of an operation = total number of I/Os. Cost parameters: B(R) = number of blocks for relation R; T(R) = number of tuples in relation R; V(R, a) = number of distinct values of attribute a.

Cost Parameters. Clustered table R: blocks consist only of records from this table; B(R) ≈ T(R) / blockSize. Unclustered table R: its records are placed on blocks with other tables; B(R) ≈ T(R). When a is a key, V(R, a) = T(R). When a is not a key, V(R, a) can be smaller than T(R).
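To make the three parameters concrete, the following sketch computes them for a small in-memory table. It assumes a clustered table and takes blockSize to be the number of tuples that fit in one block; `cost_parameters` and the sample rows are hypothetical.

```python
import math

def cost_parameters(rows, attribute, tuples_per_block):
    """Return (B, T, V) for a clustered table held as a list of dicts."""
    t = len(rows)                              # T(R): number of tuples
    b = math.ceil(t / tuples_per_block)        # B(R) ~ T(R) / blockSize
    v = len({row[attribute] for row in rows})  # V(R, a): distinct values of a
    return b, t, v

product = [
    {"pname": "gizmo", "cname": "GizmoWorks"},
    {"pname": "gadget", "cname": "GizmoWorks"},
    {"pname": "camera", "cname": "Canon"},
    {"pname": "lens", "cname": "Canon"},
    {"pname": "tripod", "cname": "Canon"},
]

by_cname = cost_parameters(product, "cname", tuples_per_block=2)
by_pname = cost_parameters(product, "pname", tuples_per_block=2)
```

Here pname is a key, so V equals T, while cname has only two distinct values.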

Cost. Cost of an operation = number of disk I/Os needed to read the operands and compute the result. The cost of writing the result to disk is not included on the following slides. Question: what is the cost of sorting a table with B blocks? Answer:

Scanning Tables. The table is clustered: table scan, if we know where the blocks are; index scan, if we have a sparse index to find the blocks. The table is unclustered: may need one read for each record.

Sorting While Scanning. Sometimes it is useful to have the output sorted. Three ways to scan it sorted: if there is a primary or secondary index on it, use it during the scan; if it fits in memory, sort there; if not, use multi-way merge sort.

Cost of the Scan Operator
  Clustered relation, table scan: unsorted B(R); sorted 3B(R)
  Clustered relation, index scan: unsorted B(R); sorted B(R) or 3B(R)
  Unclustered relation: unsorted T(R); sorted T(R) + 2B(R)
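The cost table can be turned into a small lookup function. One reading is assumed here and is not stated on the slide: for a sorted index scan on a clustered relation, the cost is taken to be B(R) when the index already delivers the required order and 3B(R) otherwise. The function and parameter names are hypothetical.

```python
def scan_cost(b, t, clustered, want_sorted, index_gives_order=False):
    """Number of disk I/Os for a scan, following the slide's cost table.

    b = B(R), t = T(R). index_gives_order is an assumption of this sketch:
    it selects between the slide's "B(R) or 3B(R)" for a sorted index scan.
    """
    if not clustered:
        # One read per record, plus 2B(R) more if the output must be sorted.
        return t + 2 * b if want_sorted else t
    if not want_sorted:
        return b
    return b if index_gives_order else 3 * b

# Example: a relation with 1,000 blocks and 20,000 tuples.
costs = {
    "clustered, unsorted": scan_cost(1000, 20000, True, False),
    "clustered, sorted": scan_cost(1000, 20000, True, True),
    "clustered, sorted via index": scan_cost(1000, 20000, True, True, index_gives_order=True),
    "unclustered, unsorted": scan_cost(1000, 20000, False, False),
    "unclustered, sorted": scan_cost(1000, 20000, False, True),
}
```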