Dr. Kalpakis CMSC 661, Principles of Database Systems Query Execution [15]


- Query processing
  - Query processing involves
    - Compilation
      - Parsing, to construct a parse tree
      - Optimization
        - Query rewrite, to generate a logical query plan
        - Physical plan generation, to make a physical query plan
    - Execution
  - Query plans are expression trees whose nodes are operators
    - Extended relational algebra operators for logical query plans
    - Physical operators for physical query plans
      - Implement extended relational algebra operators
      - Additional useful tasks

- Types of operators
  - Physical operators are classified according to
    - The #passes over the tuples of their input
      - One-pass
      - Pass-and-a-half
      - Two-pass
      - Multi-pass
    - Their cardinality (number of input relations) and the #input tuples required at once
      - Unary
        - Tuple-at-a-time: select, project
        - Full-relation: grouping/aggregates, duplicate elimination
      - Binary: join, union, difference, intersection

- Physical operators as iterators
  - Physical operators are viewed as iterators, with methods
    - Open(): initializes any structures for the operator
    - Next(): returns the next tuple in the stream of output tuples
    - Close(): destroys any structures created
  - The iterator model lets operators avoid materializing their output, unless doing so is necessary or useful
    - It still allows for complete materialization, by doing all the work in the Open() method
  - Some tasks fit naturally into the iterator model, some need tricks
    - Sorting (multi-way merge-sort) does most of its work in the Open() method
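
The Open/Next/Close protocol above can be sketched in a few lines of Python. This is only an illustration: the class names `Scan` and `Select`, the helper `run`, and the use of `None` as the end-of-stream marker are assumptions made for the example, not part of the slides.

```python
class Scan:
    """Leaf iterator over an in-memory list that stands in for a stored relation."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = None

    def open(self):
        self.pos = 0  # initialize the cursor

    def next(self):
        if self.pos >= len(self.tuples):
            return None  # end of stream (assumed convention)
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        self.pos = None  # release the cursor


class Select:
    """Tuple-at-a-time selection that pulls tuples from a child iterator."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        # Keep pulling from the child until a tuple qualifies or the stream ends.
        while True:
            t = self.child.next()
            if t is None or self.predicate(t):
                return t

    def close(self):
        self.child.close()


def run(op):
    """Drain an operator tree through Open/Next/Close and collect its output."""
    op.open()
    out = []
    while (t := op.next()) is not None:
        out.append(t)
    op.close()
    return out
```

For example, `run(Select(Scan(rows), lambda t: t[0] > 3))` streams `rows` through the selection without ever materializing an intermediate result.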

- Operator model
  - Measure the cost of an operator in terms of the block I/Os performed
  - Assume that the output of each operator is immediately consumed by some other operator
  - The cost of an operator uses the parameters
    - Available #memory buffers (blocks) M
    - For each argument (bag or set of tuples) R
      - The #blocks B(R)
      - The #tuples T(R)
      - The #distinct values V(R,X) of the tuples of R on the attribute-list X
  - To estimate the cost of an operator we need to know whether
    - R is clustered, i.e. whether it occupies about B(R) blocks or not
    - An index used on R is a clustering index, i.e. tuples with the same value on the indexed attributes occupy as few blocks as possible
  - It is safe to assume that the output relations of operators are clustered
    - Input relations will often be assumed to be clustered as well

- Operators for scanning relations
  - Read the contents of relation R
    - Table-scan: reads the tuples of R from disk, using a system map of its blocks
    - Index-scan: reads the tuples using an index to locate the blocks that contain its tuples
    - Sort-scan: returns its tuples in sorted order
  - What is the cost of each of these operators?
    - Remember that we can

- One-pass tuple-at-a-time unary operators
  - Obvious algorithms for
    - Project
    - Select
  - Have cost of B(R) regardless of M

- One-pass full-relation unary operators
  - Duplicate elimination
  - Similarly for grouping/aggregation
    - Keep sufficient data for each distinct group, and output the aggregate value for each group at Close()
    - Assumes that the info for all the groups fits in M-1 buffers

- One-pass binary operators
  - Given relations R and S, with S being the smaller of the two
    - Assume S fits in M-1 memory buffers
  - Bag union
    - Read each tuple from S and output it; repeat for R
  - Set union
    - Algorithm
      1. Read each tuple of S and build an in-memory dictionary for S
      2. Output all of S
      3. Read each tuple t of R and output it if t is not in the dictionary for S
  - Set intersection and difference are similar
    - Care is needed when computing set difference, since R-S and S-R are not the same
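
The three steps of the one-pass set union can be sketched as follows. The function name `one_pass_set_union` is made up for the example, a Python `set` plays the role of the in-memory dictionary, and both inputs are assumed to be duplicate-free (they are sets, not bags).

```python
def one_pass_set_union(r, s):
    """Set union of R and S, assuming S (the smaller input) fits in memory."""
    seen = set(s)        # step 1: build the in-memory dictionary for S
    yield from seen      # step 2: output all of S
    for t in r:          # step 3: stream R once, one tuple at a time
        if t not in seen:
            yield t
```

Only S is held in memory; R is read once and never stored, which is why the cost stays at B(R)+B(S).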

- One-pass binary operators
  - Bag intersection
    - Each tuple appears in the result a number of times equal to the minimum of its #occurrences in the two input relations
    - Algorithm
      - Modify the algorithm for set union to
        - Build a dictionary for the tuples of S that maintains the #occurrences of each tuple of S
        - Read each tuple t of R, and
          - If t is in the dictionary, output t and decrement its count
          - Remove t from the dictionary if its count is 0
  - Bag difference, Cartesian product, and natural join are similar
  - The cost of all these operators is B(S)+B(R)
  - What happens if M is "wrong"?
    - Thrashing or unnecessary passes
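
A minimal sketch of the bag-intersection algorithm, assuming tuples are hashable Python values and S fits in memory. The function name is hypothetical, and `collections.Counter` is used here as the dictionary of occurrence counts.

```python
from collections import Counter


def one_pass_bag_intersection(r, s):
    """Bag intersection of R and S, assuming S (the smaller input) fits in memory."""
    counts = Counter(s)           # dictionary of S tuples with their #occurrences
    for t in r:                   # stream R once
        if t in counts:
            yield t               # t still has an unmatched occurrence in S
            counts[t] -= 1
            if counts[t] == 0:
                del counts[t]     # no occurrences of t left to match
```

Each tuple is output once per matched pair of occurrences, so it appears min(#occurrences in R, #occurrences in S) times.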

- Nested-loop join
  - Natural join R(X,Y) ⋈ S(Y,Z)
  - Tuple-based nested-loop join
      for each tuple t_S in S do begin
        for each tuple t_R in R do begin
          if t_S and t_R join to make t then output t
        end
      end
  - S is called the outer relation and R the inner relation of the join
  - Cost is T(R)T(S)
    - Expensive, since it examines every pair of tuples in the two relations
  - Fits the iterator framework easily
    - Reopen the scan of R for each tuple of S!
  - Requires no indices and can be used with any kind of join condition
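
The tuple-based loop can be written directly in Python. In this sketch the relations are in-memory lists, R holds (x, y) pairs and S holds (y, z) pairs; the names `nested_loop_join` and `natural_join_on_y` are assumptions made for the example. Passing the join test as a function reflects the point that any join condition can be used.

```python
def nested_loop_join(s, r, join):
    """Tuple-based nested-loop join with S as the outer and R as the inner relation."""
    for t_s in s:                 # outer relation: scanned once
        for t_r in r:             # inner relation: rescanned for every outer tuple
            t = join(t_s, t_r)
            if t is not None:
                yield t


def natural_join_on_y(t_s, t_r):
    """Join test for R(X, Y) and S(Y, Z): returns (x, y, z), or None if Y differs."""
    x, y_r = t_r
    y_s, z = t_s
    return (x, y_r, z) if y_r == y_s else None
```

The inner relation must be re-iterable (a list here), mirroring the need to reopen the scan of R for each tuple of S.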

- Block-based nested-loop join
  - Read a block instead of a single tuple at a time
    - Join all tuples in the pair of blocks at hand
  - If S fits in M-1 buffers, make it the inner relation
    - Cost now is B(R)+B(S)
  - If neither S nor R fits in memory and S (the smaller) is the outer relation, then the cost is

- Two-pass algorithms
  - Input relations may be too large for the one-pass algorithms to handle
  - Consider two-pass algorithms
    - Multi-pass algorithms can be obtained easily by induction/recursion from two-pass algorithms
  - Focus on two-pass algorithms that are based on
    - Sorting
    - Hashing
    - Indexing

- Two-pass duplicate elimination using sorting
  - Algorithm
    - Partition R into at most M sublists, each of size at most M blocks; sort each sublist and write it to disk
    - Read one block from each sorted sublist
    - For each tuple t still in memory do
      - Output t and then delete all its occurrences from any blocks in memory
      - If a block becomes empty, load it with another block from the same sorted sublist
      - Delete t from each newly loaded block as well
  - Cost is 2B(R) + B(R) = 3B(R)
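
The two passes can be sketched as below. This is a simplified in-memory model, not an external-memory implementation: blocks are Python lists, the sorted sublists stay in memory instead of being written to disk, and `heapq.merge` stands in for the buffer-per-sublist merge of the second pass. The function name and the `ValueError` check are assumptions made for the example.

```python
import heapq


def two_pass_distinct(blocks, m):
    """Sort-based duplicate elimination over a list of blocks, using m buffers."""
    # Pass 1: sort runs of up to m blocks each into sorted sublists.
    sublists = []
    for i in range(0, len(blocks), m):
        run = sorted(t for block in blocks[i:i + m] for t in block)
        sublists.append(run)
    if len(sublists) > m:
        raise ValueError("relation too large for two passes with m buffers")

    # Pass 2: merge the sublists, always taking the least remaining tuple,
    # and output it only if it differs from the previous output.
    out = []
    for t in heapq.merge(*sublists):
        if not out or out[-1] != t:
            out.append(t)
    return out
```

Because the merge visits tuples in sorted order, all copies of a tuple arrive consecutively, so comparing against the last output is enough to drop duplicates.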

- Two-pass grouping and aggregation using sorting
  - Similar to the duplicate elimination algorithm
    - Sort based on the grouping attributes X only
    - Compute the aggregate value for each distinct value of X, in sorted order
    - Instead of deleting all other occurrences of a tuple t as in duplicate elimination, update the aggregate information for the group t[X]

- Two-pass set-union using sorting
  - Two-pass set union algorithm using sorting
    - Make sorted sublists for both R and S
    - Use a buffer for each sorted sublist of either R or S
    - Load each buffer with the 1st block of its sublist
    - Repeatedly, for each tuple t in memory
      - Output t
      - Delete t from the memory buffers
      - If a buffer empties, reload it with the next block from the same sublist
      - Delete all occurrences of t from each newly loaded buffer as well

- Two-pass intersection and difference using sorting
  - Modify when t is output in the 2-pass sort-based set-union algorithm
    - Set-intersection: output t when it appears in both R and S
    - Set-difference R-S: output t when it appears in R but not in S
    - Bag-intersection/difference: keep track of the #occurrences g(t,R) of t in each relation R while making the sorted sublists
      - For intersection, output t min(g(t,S), g(t,R)) times
      - For difference R-S, output t max(g(t,R)-g(t,S), 0) times
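
The two counting rules for bags can be checked with Python's `collections.Counter`, whose `&` operator takes the minimum of the counts and whose `-` operator subtracts counts and drops non-positive results. The sketch below illustrates only the output rule, not the sorted-sublist I/O pattern; the function names are made up for the example.

```python
from collections import Counter


def bag_intersection(r, s):
    """Each t appears min(g(t,R), g(t,S)) times."""
    return sorted((Counter(r) & Counter(s)).elements())


def bag_difference(r, s):
    """R-S: each t appears max(g(t,R) - g(t,S), 0) times."""
    return sorted((Counter(r) - Counter(s)).elements())
```

With R = {a, a, a, b} and S = {a, b, b, c}, the intersection keeps one a and one b, while R-S keeps two copies of a.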

- Simple sort-based join
  - Algorithm
    - Sort R and S
    - Load two input buffers with the 1st block of R and S
    - For each least value y of the join attributes Y in memory
      - Find all tuples from R and S with value y on Y, and output their join
        - Use all remaining memory to read qualifying tuples from R and S
      - Delete all tuples with value y on attributes Y from memory
      - Reload either of the two input buffers with the next block from its corresponding relation
  - The algorithm is a good choice when there are a lot of joinable tuples for each value y
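
The merge step of the simple sort-based join can be sketched as follows, again on in-memory lists rather than disk blocks. R is assumed to hold (x, y) pairs and S to hold (y, z) pairs; the function name `sort_join` is hypothetical.

```python
def sort_join(r, s):
    """Join R(X, Y) with S(Y, Z) on Y by sorting both inputs and merging."""
    r = sorted(r, key=lambda t: t[1])   # sort R on Y
    s = sorted(s, key=lambda t: t[0])   # sort S on Y
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        y = min(r[i][1], s[j][0])       # least join value still unprocessed
        # Find the run of tuples with value y in each relation.
        i_end = i
        while i_end < len(r) and r[i_end][1] == y:
            i_end += 1
        j_end = j
        while j_end < len(s) and s[j_end][0] == y:
            j_end += 1
        # Output the join of the two runs (empty if y is missing on one side).
        for x, _ in r[i:i_end]:
            for _, z in s[j:j_end]:
                out.append((x, y, z))
        i, j = i_end, j_end             # discard all tuples with value y
    return out
```

The two runs for a given y correspond to the tuples that must be held in memory together, which is why many joinable tuples per value put pressure on the remaining buffers.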

- More efficient sort-based join
  - The previous algorithm can be modified by
    - Writing the sorted sublists to disk instead of the sorted relations
    - Allocating one buffer to each sorted sublist
  - Advantage
  - Disadvantage
    - The algorithm is bad when there is a very large number of tuples with a common value for the join attributes

- Two-pass algorithms using hashing
  - Data does not fit in memory
  - Partition the data into M buckets using a hash function, so that
    - A bucket, or a pair of buckets with the same hash (one from each relation), fits in memory
  - Perform the required operation in one pass
    - On a single bucket, for unary operators
    - On a pair of buckets, for binary operators
  - Duplicate elimination, set union, difference, intersection, and join can all be done this way
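
As an illustration of the partition-then-process pattern, here is a sketch of hash-based duplicate elimination. It keeps the buckets in memory rather than writing them to disk, uses Python's built-in `hash` as the hash function, and follows the slide in using M buckets; the function name is an assumption made for the example.

```python
def two_pass_hash_distinct(tuples, m):
    """Hash-based duplicate elimination: partition into m buckets, then one pass per bucket."""
    # Pass 1: partition. Equal tuples always hash to the same bucket.
    buckets = [[] for _ in range(m)]
    for t in tuples:
        buckets[hash(t) % m].append(t)

    # Pass 2: run the one-pass algorithm on each bucket independently.
    for bucket in buckets:
        seen = set()              # assumed to fit in memory for a single bucket
        for t in bucket:
            if t not in seen:
                seen.add(t)
                yield t
```

Since duplicates can never land in different buckets, removing duplicates within each bucket removes them from the whole relation.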

- Index-based selection
  - Equality-selection can be evaluated using an index (index-scan)
  - Cost for
    - Clustering index is
    - Non-clustering index is
    - Where h is the #block I/Os to access the index
      - Typically h is <5 for either B-tree or hash indexes
  - Additional selection predicates can be evaluated using certain indexes

- Index-based join
  - Use an index to access the tuples of the inner relation in tuple-based nested-loop join
    - Cost is approximately T(R)B(S)/V(S,Y) or T(R)T(S)/V(S,Y), depending on whether the index is clustering or not
  - If the index allows for sorted access to the tuples of a relation, e.g. a B-tree index
    - Use the 2nd phase ("merge phase") of the sort-based join algorithms
      - Since the relation(s) are already accessible in sorted order
    - Cost is B(R)+B(S) if both indexes are clustering
  - Other operations (duplicate elimination, set difference/union/intersection, grouping and aggregation) can be done similarly
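
To get a feel for the two approximate cost expressions, the small helper below evaluates them for given statistics. The function name and the sample numbers are invented for illustration; like the approximation on the slide, it ignores the cost of reading R and of the index lookups themselves.

```python
def index_join_cost(t_r, b_s, t_s, v_s_y, clustering):
    """Approximate block I/Os for an index-based join probing S once per tuple of R."""
    # Each probe retrieves about B(S)/V(S,Y) blocks with a clustering index,
    # or about T(S)/V(S,Y) blocks (one per matching tuple) otherwise.
    per_probe = (b_s if clustering else t_s) / v_s_y
    return t_r * per_probe


# Hypothetical statistics: T(R)=10000, B(S)=500, T(S)=5000, V(S,Y)=100.
clustered = index_join_cost(10_000, 500, 5_000, 100, clustering=True)
unclustered = index_join_cost(10_000, 500, 5_000, 100, clustering=False)
```

With these made-up numbers the clustering index gives about 50,000 block I/Os against about 500,000 without it, a factor of T(S)/B(S) = 10.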

- Buffer management
  - Physical operators use some #buffers M
  - The DBMS buffer manager allocates available buffers to operators/queries from a buffer pool in
    - Physical memory directly
    - Virtual memory managed by the operating system
  - The buffer pool is limited, so careful management is needed to avoid
    - Thrashing
    - Unnecessary performance degradation
  - Buffer replacement strategies
    - LRU
    - FIFO
    - MRU
    - Clock
    - System control: pinned blocks
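
A toy buffer pool with LRU replacement and pinning might look like the sketch below. Everything here is an assumption made for illustration: the class name `LRUBufferPool`, the `read_block` callback that stands in for a disk read, and the choice to raise an error when every buffer is pinned.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy buffer pool with m frames, LRU replacement, and pinned blocks."""

    def __init__(self, m, read_block):
        self.m = m
        self.read_block = read_block   # callback standing in for a disk read
        self.frames = OrderedDict()    # block id -> contents, least recently used first
        self.pinned = set()
        self.io_count = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.m:
            self._evict()
        self.io_count += 1                      # a miss costs one block I/O
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]

    def _evict(self):
        # Evict the least recently used block that is not pinned.
        for victim in self.frames:
            if victim not in self.pinned:
                del self.frames[victim]
                return
        raise RuntimeError("all buffers are pinned")

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)
```

Swapping the victim-selection loop in `_evict` is all it takes to experiment with FIFO or MRU instead of LRU.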

- Buffer manager & physical operators
  - Considering the algorithm of a physical operator
    - How does it respond to changes in the number M of available buffers?
      - When M is not what was initially assumed?
      - When M changes during the execution of the operator?
        - More buffers become available
        - A used buffer is taken away from the operator by the buffer manager
    - How can the operator affect the decisions of the buffer manager? Can it
      - Suggest a replacement strategy?
      - Suggest a range of buffers needed for good performance?
      - Provide a function to forecast its performance for any parameter value M?
      - Pin blocks?

- Multi-pass algorithms
  - The two-pass algorithms discussed generalize to k passes using recursion/induction
    - Sort-based algorithms
    - Hashing-based algorithms

- Models of parallel machines
  - Parallel machines with p processors can be

- Parallelism for physical operators
  - The idea is to distribute the input relations to the processors so that each one gets about 1/p-th of the input
  - Each processor works on its given data
  - Ensure that the communication overhead and synchronization barriers are under control
    - Sending a block over the network is typically faster than a block I/O

- Parallel sort-based join
  - Partition and ship data to processors
    - Total #blocks shipped
    - #block I/Os per processor
  - Each processor
    - Stores the tuples it receives at cost
    - Applies sort-based join on its fragment of the data
  - Total #block I/Os per processor
  - A speedup by a factor of about p!