Greedy Algorithm for Selecting a Join Order

Presentation transcript:

Greedy Algorithm for Selecting a Join Order
The "greediness" is based on the idea that we want to keep the intermediate relations as small as possible at each level of the tree.
BASIS: Start with the pair of relations whose estimated join size is smallest. The join of these relations becomes the current tree.
INDUCTION: Among all relations not yet included in the current tree, find the one that, when joined with the current tree, yields the relation of smallest estimated size. The new current tree has the old current tree as its left argument and the selected relation as its right argument.

Example
The basis step is to find the pair of relations that have the smallest join. This honor goes to the join T ⋈ U, with a cost of 1000. Thus, T ⋈ U is the "current tree." We now consider whether to join R or S into the tree next. We compare the sizes of (T ⋈ U) ⋈ R and (T ⋈ U) ⋈ S. The latter, with a size of 2000, is better than the former, with a size of 10,000. Thus, we pick (T ⋈ U) ⋈ S as the new current tree. Now there is no choice; we must join R at the last step, leaving us with a total cost of 3000, the sum of the sizes of the two intermediate relations (1000 + 2000).
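The basis and induction steps, applied to the example above, can be sketched in Python. This is a minimal illustration and not a definitive implementation: the function name `greedy_join_order`, the `est_size` callback, and the `SIZES` table are all hypothetical. The sizes for T ⋈ U, (T ⋈ U) ⋈ R, and (T ⋈ U) ⋈ S come from the example; the other pairwise sizes are assumed values chosen only so that T ⋈ U is the smallest pair.

```python
from itertools import combinations


def greedy_join_order(relations, est_size):
    """Return (order, cost) for a left-deep join tree chosen greedily.

    est_size (a hypothetical helper) maps a frozenset of relation names
    to the estimated size of their join. cost is the sum of the sizes
    of the intermediate relations; the final result is not counted.
    """
    # BASIS: start with the pair whose estimated join size is smallest.
    first = min(combinations(relations, 2),
                key=lambda pair: est_size(frozenset(pair)))
    order = list(first)
    current = frozenset(first)
    remaining = [r for r in relations if r not in current]

    cost = 0
    while remaining:
        # The current tree feeds another join, so it is an intermediate.
        cost += est_size(current)
        # INDUCTION: add the relation that keeps the next result smallest.
        if len(remaining) == 1:
            best = remaining[0]  # no choice left
        else:
            best = min(remaining, key=lambda r: est_size(current | {r}))
        remaining.remove(best)
        current = current | {best}
        order.append(best)
    return order, cost


# Estimated join sizes. Entries marked "assumed" are illustrative only.
SIZES = {
    frozenset("TU"): 1000,        # from the example
    frozenset("RTU"): 10_000,     # from the example
    frozenset("STU"): 2000,       # from the example
    frozenset("RS"): 5000,        # assumed
    frozenset("RT"): 1_000_000,   # assumed
    frozenset("RU"): 10_000,      # assumed
    frozenset("ST"): 2000,        # assumed
    frozenset("SU"): 1_000_000,   # assumed
}

order, cost = greedy_join_order(["R", "S", "T", "U"], SIZES.__getitem__)
print(order, cost)
```

Under these assumed sizes the sketch picks T and U first, then S, then R, which corresponds to the tree ((T ⋈ U) ⋈ S) ⋈ R with a cost of 3000.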

Pipelining Versus Materialization
Materialization: The naive way to execute a query plan is to order the operations appropriately:
–An operation is not performed until the argument(s) below it have been performed, and
–The result of each operation is stored on disk until it is needed by another operation.
This strategy is called materialization.
Pipelining: A more subtle, and generally more efficient, way to execute a query plan is to interleave the execution of several operations.
–The tuples produced by one operation are passed directly to the operation that uses them, without ever storing the intermediate tuples on disk.
–Pipelining is typically implemented by a network of iterators.
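The difference between the two strategies can be illustrated with Python generators standing in for iterators. This is only a sketch under simplifying assumptions: the operator names (`scan`, `select`, `project`, `materialize`) are hypothetical, and an in-memory list stands in for a temporary file on disk.

```python
def scan(rows):
    """Leaf iterator: hand out stored tuples one at a time."""
    for row in rows:
        yield row


def select(pred, child):
    """Pass along only the tuples of the child that satisfy pred."""
    for row in child:
        if pred(row):
            yield row


def project(cols, child):
    """Keep only the listed column positions of each tuple."""
    for row in child:
        yield tuple(row[c] for c in cols)


def materialize(child):
    """Compute the child's whole result and store it.

    The list is an assumption standing in for a file written to disk.
    """
    return list(child)


R = [(1, "a"), (2, "b"), (3, "c")]

# Pipelined: each tuple flows from scan through select to project
# as it is requested; no intermediate result is stored.
pipelined = project([1], select(lambda r: r[0] > 1, scan(R)))

# Materialized: the selection is fully computed and stored first,
# then read back by the projection.
temp = materialize(select(lambda r: r[0] > 1, scan(R)))
materialized = project([1], scan(temp))

pipelined_result = list(pipelined)
materialized_result = list(materialized)
print(pipelined_result, materialized_result)
```

Both plans produce the same tuples; the pipelined one simply never builds `temp`.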

Pipelining Example (I)
Let us consider physical query plans for the expression: (R(w,x) ⋈ S(x,y)) ⋈ U(y,z)
Assumptions:
1. R occupies 5000 blocks; S and U each occupy 10,000 blocks.
2. The intermediate result R ⋈ S occupies k blocks for some k. We can estimate k, based on the number of x-values in R and S and the size of (w,x,y) tuples compared to the (w,x) tuples of R and the (x,y) tuples of S. However, we want to see what happens as k varies, so we leave this constant open.
3. Both joins will be implemented as hash-joins, either one-pass or two-pass, depending on k.
4. There are 101 buffers available.

Example (II)

Example (III)
First, consider the join R ⋈ S. Neither relation fits in main memory, so we need a two-pass hash-join. If the smaller relation R is partitioned into the maximum-possible 100 buckets on the first pass, then each bucket for R occupies 50 blocks. If R's buckets have 50 blocks, then the second pass of the hash-join R ⋈ S uses 51 buffers, leaving 50 buffers to use for the join of the result of R ⋈ S with U.
Now, suppose that k ≤ 49; that is, the result of R ⋈ S occupies at most 49 blocks.
–Then we can pipeline the result of R ⋈ S into 49 buffers, organize them for lookup as a hash table, and we have one buffer left to read each block of U in turn.
–We may thus execute the second join as a one-pass join.
The total number of disk I/O's is:
–45,000 to perform the two-pass hash-join of R and S.
–10,000 to read U in the one-pass hash-join of (R ⋈ S) ⋈ U.
–The total is 55,000 disk I/O's.
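The buffer accounting behind this case can be checked with a few lines of Python. The variable names are hypothetical, and the 3(B(R) + B(S)) cost for a two-pass hash-join is the standard textbook estimate, assumed here because it reproduces the 45,000 figure in the slide.

```python
import math

M = 101                               # buffers available
B_R, B_S, B_U = 5000, 10_000, 10_000  # relation sizes in blocks

# First pass: one buffer reads input, the other M - 1 each hold a bucket.
num_buckets = M - 1
r_bucket_blocks = math.ceil(B_R / num_buckets)

# Second pass: one R bucket in memory plus one buffer to read S.
second_pass_buffers = r_bucket_blocks + 1
buffers_left = M - second_pass_buffers

# One-pass join with U: one buffer reads U, the rest hold R join S.
max_k_for_one_pass = buffers_left - 1

# Two-pass hash-join cost (assumed formula): read, write, and re-read.
cost_join_rs = 3 * (B_R + B_S)
total_io = cost_join_rs + B_U         # U is read once

print(r_bucket_blocks, second_pass_buffers, buffers_left,
      max_k_for_one_pass, cost_join_rs, total_io)
```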

Example (IV)
Now, suppose k > 49, but k ≤ 5000. We can still pipeline the result R ⋈ S, but we need to use another strategy, in which this relation is joined with U in a 50-bucket, two-pass hash-join.
1. Before we start on R ⋈ S, we hash U into 50 buckets of 200 blocks.
2. Next, we perform a two-pass hash-join of R and S using 51 buffers as before, but as each tuple of R ⋈ S is generated, we place it in one of the 50 remaining buffers, which are used to help form the 50 buckets for the join of R ⋈ S with U.
3. Finally, we join R ⋈ S with U bucket by bucket. Since k ≤ 5000, the buckets of R ⋈ S will be of size at most 100 blocks, so this join is feasible. The fact that buckets of U are of size 200 blocks is not a problem. We are using buckets of R ⋈ S as the build relation and buckets of U as the probe relation in the one-pass joins of buckets.
The number of disk I/O's for this pipelined join is:
a) 20,000 to read U and write its tuples into buckets.
b) 45,000 to perform the two-pass hash-join R ⋈ S.
c) k to write out the buckets of R ⋈ S.
d) k + 10,000 to read the buckets of R ⋈ S and U in the final join.
The total cost is thus 75,000 + 2k.

Example (V)
Last, let us consider what happens when k > 5000. Now, we cannot perform a two-pass join in the 50 buffers available if the result of R ⋈ S is pipelined. So,
–Compute R ⋈ S using a two-pass hash-join and materialize the result on disk.
–Join R ⋈ S with U, also using a two-pass hash-join.
Note that since B(U) = 10,000, we can perform a two-pass hash-join using 100 buckets regardless of how large k is. Technically, U should appear as the left argument of its join if we decide to make U the build relation for the hash-join.
The number of disk I/O's for this plan is:
1. 45,000 for the two-pass join of R and S.
2. k to store R ⋈ S on disk.
3. 30,000 + 3k for the two-pass hash-join of U with R ⋈ S.
The total cost is thus 75,000 + 4k.
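The three cases can be collected into a single cost function of k. The sketch below is an illustration only: the name `plan_cost` is hypothetical, the thresholds 49 and 5000 are specific to this example (101 buffers, B(R) = 5000, B(S) = B(U) = 10,000), and the boundary k = 5000 is placed in the middle case on the assumption that 50 buckets of exactly 100 blocks still fit in memory.

```python
B_R, B_S, B_U = 5000, 10_000, 10_000  # relation sizes in blocks


def plan_cost(k):
    """Disk I/O's for (R join S) join U when R join S occupies k blocks."""
    join_rs = 3 * (B_R + B_S)  # two-pass hash-join of R and S: 45,000

    if k <= 49:
        # Pipeline R join S into memory and read U once (one-pass join).
        return join_rs + B_U
    if k <= 5000:
        # Pipeline R join S into 50 buckets and join with U bucket by bucket:
        # 2*B(U) to bucket U, k to write buckets, k + B(U) to read both back.
        return 2 * B_U + join_rs + k + (k + B_U)
    # Materialize R join S (k writes), then a two-pass hash-join with U.
    return join_rs + k + 3 * (B_U + k)


for k in (49, 50, 5000, 5001):
    print(k, plan_cost(k))
```

The costs are 55,000 for the first case, 75,000 + 2k for the second, and 75,000 + 4k for the third, so the cost jumps at k = 50 and again just past k = 5000.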

Example (VI)

Notation for Physical Plans
It can also be SortScan(R,L) if a sort-based join is preferred.

Notation for Physical Plans