CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.

Slides:



Advertisements
Similar presentations
Examples of Physical Query Plan Alternatives
Advertisements

Implementation of relational operations
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Optimization Goal: Declarative SQL query
1 Overview of Query Evaluation Chapter Objectives  Preliminaries:  Core query processing techniques  Catalog  Access paths to data  Index matching.
1 Relational Query Optimization Module 5, Lecture 2.
Relational Query Optimization 198:541. Overview of Query Optimization  Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
Evaluation of Relational Operators 198:541. Relational Operations  We will consider how to implement: Selection ( ) Selects a subset of rows from relation.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
Overview of Query Evaluation R&G Chapter 12 Lecture 13.
Query Optimization: Transformations May 29 th, 2002.
Query Optimization II R&G, Chapters 12, 13, 14 Lecture 9.
CS186 Final Review Query Optimization.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. –Each operator typically implemented using a `pull’ interface:
Query Optimization R&G, Chapter 15 Lecture 16. Administrivia Homework 3 available today –Written exercise; will be posted on class website –Due date:
1 Implementation of Relational Operations: Joins.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
1 Overview of Query Evaluation Chapter Overview of Query Evaluation  Plan : Tree of R.A. ops, with choice of alg for each op.  Each operator typically.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
Lec3/Database Systems/COMP4910/031 Evaluation of Relational Operations Chapter 14.
Database Systems/comp4910/spring20031 Evaluation of Relational Operations Why does a DBMS implements several algorithms for each algebra operation? What.
1 Database Systems ( 資料庫系統 ) December 3, 2008 Lecture #10.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
Implementation of Database Systems, Jarek Gryz1 Relational Query Optimization Chapters 12.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Relational Operator Evaluation. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Introduction to Query Optimization
Evaluation of Relational Operations: Other Operations
CS222: Principles of Data Management Notes #13 Set operations, Aggregation Instructor: Chen Li.
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
Relational Operations
Implementing Relational Operators Query Evaluation
CS222P: Principles of Data Management Notes #11 Selection, Projection
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Overview of Query Evaluation
Relational Query Optimization
Overview of Query Evaluation
Overview of Query Evaluation
Overview of Query Evaluation
Implementation of Relational Operations
Relational Query Optimization
CS222: Principles of Data Management Notes #11 Selection, Projection
Evaluation of Relational Operations: Other Techniques
Database Systems (資料庫系統)
Overview of Query Evaluation
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #12 Set operations, Aggregation, Cost Estimation Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #10 Selection, Projection Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall 2018 Notes #12 Set operations, Aggregation, Cost Estimation Instructor: Chen Li.
Relational Algebra Chpt 4a Xintao Wu Raghu Ramakrishnan
Presentation transcript:

CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li

Relational Set Operations Intersection and cross-product special cases of join. Union (distinct) and Except similar; we’ll do union. Sorting-based approach to union: Sort both relations (on combination of all attributes). Scan sorted relations and merge them. Alternative: Merge runs from Pass 0+ for both relations (!). Hash-based approach to union (from Grace): Partition both R and S using hash function h1. For each S-partition, build in-memory hash table (using h2), then scan corresponding R-partition, adding truly new S tuples to hash table while discarding duplicates. 19

Sets versus Bags UNION, INTERSECT, and DIFFERENCE (EXCEPT or MINUS) use the set semantics. UNION ALL just combine two relations without considering any correlation. 19

Sets versus Bags Example: R = {1, 1, 1, 2, 2, 2, 2, 3} S = {1, 2, 2} R UNION S = {1, 2, 3} R UNION ALL S = {1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3} R INTERSECT S = {1, 2} R EXCEPT S = {3} Notice that some DBs such as MySQL don't support some of these operations. 19

Aggregate Operations (AVG, MIN, ...) Without grouping: In general, requires scanning the full relation. Given an index whose search key includes all attributes in the SELECT or WHERE clauses, can do an index-only scan. With grouping: Sort on the group-by attributes, then scan sorted result and compute the aggregate for each group. (Can improve on this by combining sorting and aggregate computations.) Or, similar approach via hashing on the group-by attributes. Given tree index whose search key includes all attributes in SELECT, WHERE and GROUP BY clauses, can do an index-only scan; if group-by attributes form prefix of search key, can retrieve data entries (tuples) in the group-by order. 20

Aggregation State Handling State: init( ), next(value), finalize( )  agg. value Consider the following query SELECT E.dname, avg(E.sal) FROM Emp E GROUP BY E.dname (dname count sum) Emp Toy 2 35K ename sal dname h(dname) mod 4 Joe 10K Toy Shoe 2 33K h Sue 20K Shoe Mike 13K Shoe Chris 25K Toy Aggregate state (per unfinished group): Count: # values Min: min value Max: max value Sum: sum of values Avg: count, sum of values Zoe 50K Book … … … 3

Operator Buffering Considerations If several operations are executing concurrently, estimating the “right” number of buffer pages is tough. (Intra- vs. inter-query considerations, too!) Repeated access patterns can interact with buffer pool’s page replacement policy as well. e.g., Inner relation scanned repeatedly in Simple Nested Loop Join. With enough buffer pages to hold the entire inner, replacement policy does not matter. Otherwise MRU best, LRU worst (sequential flooding)! Memory management for multi-user DBMSs is still a challenging problem (too many “knobs” today). 21

Next: Costs of Query Plans

Component Roles Query Parser Query optimizer (often w/2 steps) Parse and analyze SQL query Produce data structure capturing SQL statement and the “objects” that it refers to in the system catalogs Query optimizer (often w/2 steps) Rewrite query logically (e.g., views/unnesting) Perform cost-based optimization (a la System R) Goal is to pick a “good” query plan considering Physical table structures Available access paths (indexes) Data statistics (if known) Cost model (for relational operations) (Cost differences can be orders of magnitude!)

Component Roles (cont.) Plan Executor + Relational Operators Runtime side of query processing Usually based on a pull-based “tree of iterators” model, e.g.: Nodes are relational operators (actually physical implementations of relational operators/combos) (1) getNextTuple( ) (2) (4) (3)

A Motivating Example Algebra Tree: R.bid=100 AND S.rating>5 Reserves Sailors sid=sid bid=100 rating > 5 sname A Motivating Example SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 Cost: 500+500*1000 = 500,500 I/Os (Not the worst possible plan! ) Misses several opportunities: selections could have been `pushed’ earlier, no use is made of any available indexes, etc. Goal of optimization: To consider more efficient plans that compute the same logical answer. Sailors Reserves sid=sid bid=100 rating > 5 sname (Block Nested Loops) (On-the-fly) Plan: 4

Alternative Plan 1 (No Indexes) Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) temp T2) (Sort-Merge Join) Alternative Plan 1 (No Indexes) Main difference: push selects. With 5 buffers, cost of plan: Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution), with cost = 1010. Scan Sailors (500) + write temp T2 (250 pages, if 10 ratings), cost = 750. Join by sort T1 (2*2*10 = 40), sort T2 (2*4*250 = 2000), merge (10+250). Total: 4060 page I/Os. If we use block NL join, join cost = 10+10*250, total cost = 4270. If we “push” projections, T1 has only sid, T2 only sid and sname: T1 now fits in 3 pages, so block NL cost drops lower, … (etc.) 5

Alternative Plan 2 (Using Indexes) sname (On-the-fly) Alternative Plan 2 (Using Indexes) rating > 5 (On-the-fly) With clustered index on Reserves.bid , 100 boats: 100,000/100 = 1000 tuples on 1000/100 = 10 pages. INL join with pipelining (outer not materialized). (Index Nested Loops, sid=sid with pipelining ) (Use hash index; do bid=100 Sailors not write result to temp) Reserves (Projecting out unnecessary fields from outer doesn’t help.) Join column sid is a key for Sailors. At most one matching Sailors tuple, so unclustered index on sid is OK. Decision not to push rating>5 before the join based on the availability of the sid index on Sailors. Cost: Selection of Reserves tuples (10 I/Os); then, for each one, must get matching Sailors tuple (1000*1.2); total = 1210 I/Os. 6