CS222: Principles of Data Management Notes #13 Set operations, Aggregation Instructor: Chen Li.

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
Implementation of relational operations
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Implementation of Relational Operations CS186, Fall 2005 R&G - Chapter 14 First comes thought; then organization of that thought, into ideas and plans;
DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Evaluation of Relational Operators 198:541. Relational Operations  We will consider how to implement: Selection ( ) Selects a subset of rows from relation.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 Implementation of Relational Operations: Joins.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
Lec3/Database Systems/COMP4910/031 Evaluation of Relational Operations Chapter 14.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
CS4432: Database Systems II Query Processing- Part 2.
1 Database Systems ( 資料庫系統 ) December 19, 2004 Lecture #12 By Hao-hua Chu ( 朱浩華 )
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Query Processing CS 405G Introduction to Database Systems.
Lecture 17: Query Execution Tuesday, February 28, 2001.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
Implementation of Relational Operations R&G - Chapter 14 First comes thought; then organization of that thought, into ideas and plans; then transformation.
Database Management Systems 1 Raghu Ramakrishnan Evaluation of Relational Operations Chpt 14.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Fan Qi Database Lab 1, com1 #01-08 CS3223 Tutorial 8.
SQL and Query Execution for Aggregation. Example Instances Reserves Sailors Boats.
Fan Qi Database Lab 1, com1 #01-08 CS3223 Tutorial 5.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Chapter 12: Query Processing
Lecture 16: Relational Operators
Evaluation of Relational Operations: Other Operations
Relational Operations
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
CS222: Principles of Data Management Notes #09 Indexing Performance
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
Database Query Execution
Query Processing.
Introduction to Database Systems
Lecture 2- Query Processing (continued)
One-Pass Algorithms for Database Operations (15.2)
Overview of Query Evaluation
Implementation of Relational Operations
Lecture 13: Query Execution
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Database Administration
Yan Huang - CSCI5330 Database Implementation – Query Processing
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #12 Set operations, Aggregation, Cost Estimation Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #08 Comparisons of Indexes and Indexing Performance Instructor: Chen Li.
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #03 Row/Column Stores, Heap Files, Buffer Manager, Catalogs Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall 2018 Notes #11 Join!
CS222P: Principles of Data Management UCI, Fall 2018 Notes #12 Set operations, Aggregation, Cost Estimation Instructor: Chen Li.
Presentation transcript:

CS222: Principles of Data Management Notes #13 Set operations, Aggregation Instructor: Chen Li

Relational Set Operations Intersection and cross-product special cases of join. Union (distinct) and Except similar; we’ll do union. Sorting-based approach to union: Sort both relations (on combination of all attributes). Scan sorted relations and merge them. Alternative: Merge runs from Pass 0+ for both relations (!). Hash-based approach to union (from Grace): Partition both R and S using hash function h1. For each S-partition, build in-memory hash table (using h2), then scan corresponding R-partition, adding truly new S tuples to hash table while discarding duplicates. 19

Sets versus Bags UNION, INTERSECT, and DIFFERENCE (EXCEPT or MINUS) use the set semantics. UNION ALL just combine two relations without considering any correlation. 19

Sets versus Bags Example: R = {1, 1, 1, 2, 2, 2, 2, 3} S = {1, 2, 2} R UNION S = {1, 2, 3} R UNION ALL S = {1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3} R INTERSECT S = {1, 2} R EXCEPT S = {3} Notice that some DBs such as MySQL don't support some of these operations. 19

Aggregate Operations (AVG, MIN, ...) Without grouping: In general, requires scanning the full relation. Given an index whose search key includes all attributes in the SELECT or WHERE clauses, can do an index-only scan. With grouping: Sort on the group-by attributes, then scan sorted result and compute the aggregate for each group. (Can improve on this by combining sorting and aggregate computations.) Or, similar approach via hashing on the group-by attributes. Given tree index whose search key includes all attributes in SELECT, WHERE and GROUP BY clauses, can do an index-only scan; if group-by attributes form prefix of search key, can retrieve data entries (tuples) in the group-by order. 20

Aggregation State Handling State: init( ), next(value), finalize( )  agg. value Consider the following query SELECT E.dname, avg(E.sal) FROM Emp E GROUP BY E.dname (dname count sum) Emp Toy 2 35K ename sal dname h(dname) mod 4 Joe 10K Toy Shoe 2 33K h Sue 20K Shoe Mike 13K Shoe Chris 25K Toy Aggregate state (per unfinished group): Count: # values Min: min value Max: max value Sum: sum of values Avg: count, sum of values Zoe 50K Book … … … 3

Operator Buffering Considerations If several operations are executing concurrently, estimating the “right” number of buffer pages is tough. (Intra- vs. inter-query considerations, too!) Repeated access patterns can interact with buffer pool’s page replacement policy as well. e.g., Inner relation scanned repeatedly in Simple Nested Loop Join. With enough buffer pages to hold the entire inner, replacement policy does not matter. Otherwise MRU best, LRU worst (sequential flooding)! Memory management for multi-user DBMSs is still a challenging problem (too many “knobs” today). 21