Choosing an Order for Joins


Presentation transcript:

1 Choosing an Order for Joins

2 What is the best way to join n relations?
SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z
Candidate join algorithms: Hash-Join, Sort-Join, Index-Join
[Figure: relations A, B, C, D to be joined]

3 Issues to consider for 2-way Joins
Are the join attributes sorted or indexed?
Can either relation fit into memory?
  Yes: evaluate in a single pass
  No: requires $\log_M B$ passes, where M is the number of buffer pages and B is the number of pages occupied by the smaller of the two relations
Algorithms: nested loop, sort, hash, index
Smaller relation as left argument, why?

4 Nested Loop Join
L is the left (outer) relation; R is the right (inner) relation
Useful if there is no index and the relations are not sorted on the join attribute
Read as many pages of L as possible into memory
Single pass over L, ⌈B(L)/(M-1)⌉ passes over R
Read R one tuple at a time
[Figure: chunks of L held in memory while R is read one tuple at a time]
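The loop structure above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a disk-based implementation: relations are assumed to be lists of dicts, and the hypothetical parameter `chunk_size` stands in for the M-1 buffer pages given to L. The function also counts how many passes over R are made.

```python
def block_nested_loop_join(outer, inner, outer_key, inner_key, chunk_size):
    """Join two lists of dicts, loading the outer relation chunk by chunk.

    chunk_size is an assumed stand-in for the (M-1) buffer pages used for L.
    Returns the joined rows and the number of full passes made over inner.
    """
    result = []
    inner_passes = 0
    for start in range(0, len(outer), chunk_size):
        chunk = outer[start:start + chunk_size]  # as much of L as fits in memory
        inner_passes += 1                        # one full scan of R per chunk of L
        for r in inner:                          # read R one tuple at a time
            for l in chunk:
                if l[outer_key] == r[inner_key]:
                    result.append({**l, **r})
    return result, inner_passes


L = [{"x": i} for i in range(5)]
R = [{"y": 1, "v": "a"}, {"y": 3, "v": "b"}, {"y": 9, "v": "c"}]
rows, passes = block_nested_loop_join(L, R, "x", "y", chunk_size=2)
```

With five outer tuples and a chunk size of two, R is scanned three times, matching the ⌈B(L)/(M-1)⌉ pass count.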

5 Sort Merge Join
Useful if either relation is already sorted
Result is sorted on the join attribute
Divide each relation into chunks of M pages
Sort each chunk in memory and write it to disk
Merge the sorted chunks
[Figure: L or R is read into memory, written out as sorted chunks, then merged into a sorted file]
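The merge step can be illustrated with a small in-memory sketch. It makes a simplifying assumption: Python's built-in `sorted` replaces the external chunk-sort-and-merge phase, so only the final merge of two sorted inputs is shown. Relations are again assumed to be lists of dicts.

```python
def sort_merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    # Assumption: in-memory sort instead of sorting chunks and merging from disk.
    left = sorted(left, key=lambda t: t[left_key])
    right = sorted(right, key=lambda t: t[right_key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right tuples sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left tuple with this key against the whole run.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


left_rel = [{"x": 2, "a": "p"}, {"x": 1, "a": "q"}, {"x": 2, "a": "r"}]
right_rel = [{"y": 2, "b": "u"}, {"y": 3, "b": "v"}, {"y": 2, "b": "w"}]
joined = sort_merge_join(left_rel, right_rel, "x", "y")
```

The output comes back ordered on the join attribute, which is the property a later ORDER BY or GROUP BY could reuse.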

6 Hash Join
Hash the tuples of L and R into their respective buckets, using the join attribute as the key
Join by matching tuples in the corresponding buckets of L and R (L1 with R1, ..., Ln with Rn)
Use L as the build relation and R as the probe relation
[Figure: L and R are hashed into buckets; bucket Li is built in memory and probed with Ri]
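A minimal sketch of the build and probe phases follows. It is a simplification of the scheme on the slide: the whole build relation is assumed to fit in memory, so the step that partitions both relations into on-disk buckets is omitted. Function and parameter names are illustrative only.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key):
    """In-memory hash join: build a hash table on one input, probe with the other."""
    buckets = defaultdict(list)
    for b in build:                      # build phase over L
        buckets[b[build_key]].append(b)
    result = []
    for p in probe:                      # probe phase over R
        for b in buckets.get(p[probe_key], []):
            result.append({**b, **p})
    return result


build_rel = [{"x": 1, "a": "p"}, {"x": 2, "a": "q"}]
probe_rel = [{"y": 2, "b": "u"}, {"y": 2, "b": "v"}, {"y": 7, "b": "w"}]
matches = hash_join(build_rel, probe_rel, "x", "y")
```

Using the smaller relation as the build side keeps the hash table small, which is one reason to prefer the smaller relation as the left argument.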

7 Index Join
The join attribute is indexed in R
Match tuples in R by performing an index lookup for each tuple in L
With a clustered B-tree index, a sort-join can be performed using only the index
[Figure: each tuple of L triggers an index lookup into R]
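The lookup pattern can be sketched as below. As an assumption for illustration, a sorted list of keys searched with `bisect` stands in for a B-tree index on R; a real system would use its own index structure. The helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(relation, key):
    """Stand-in for a B-tree: sorted keys plus the row positions they point to."""
    entries = sorted((t[key], pos) for pos, t in enumerate(relation))
    keys = [k for k, _ in entries]
    positions = [p for _, p in entries]
    return keys, positions


def index_join(left, right, left_key, index):
    """For each tuple of L, look up matching tuples of R through the index."""
    keys, positions = index
    result = []
    for l in left:
        lo = bisect_left(keys, l[left_key])
        hi = bisect_right(keys, l[left_key])
        for pos in positions[lo:hi]:     # all index entries with an equal key
            result.append({**l, **right[pos]})
    return result


R_rel = [{"y": 5, "b": "a"}, {"y": 3, "b": "b"}, {"y": 5, "b": "c"}]
L_rel = [{"x": 5}, {"x": 4}]
found = index_join(L_rel, R_rel, "x", build_sorted_index(R_rel, "y"))
```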

8 Ordering N-way Joins
Joins are commutative and associative: (A ⋈ B) ⋈ C = A ⋈ (B ⋈ C)
Choose the order that minimizes the sum of the sizes of the intermediate results
  Likely to be efficient in both I/O and computation
  If the result of A ⋈ B is smaller than that of B ⋈ C, then choose (A ⋈ B) ⋈ C
Alternative criteria: disk accesses, CPU, response time, network

9 Ordering N-way Joins
Choose the shape of the join tree
  Equivalent to the number of ways to parenthesize an n-way join
  Recurrence: $T(1) = 1$, $T(n) = \sum_{i=1}^{n-1} T(i)\,T(n-i)$, so $T(6) = 42$
Choose a permutation of the leaves: n! possibilities
For n = 6, the number of join trees is 42 × 6! = 30,240

10 Ordering N-way Joins
T(1) = 1
T(2) = 1
T(3) = 1 + 1 = 2: a(bc) or (ab)c
T(4) = 2 + 1 + 2 = 5: a(b(cd)), a((bc)d), (ab)(cd), (a(bc))d, ((ab)c)d
T(5) = 5 + 2 + 2 + 5 = 14
T(6) = 14 + 5 + 4 + 5 + 14 = 42
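The recurrence and the resulting tree counts can be checked with a short script. The function names `tree_shapes` and `join_trees` are illustrative and not taken from the slides.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def tree_shapes(n):
    """Number of ways to parenthesize an n-way join: T(n) = sum of T(i) * T(n - i)."""
    if n == 1:
        return 1
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))


def join_trees(n):
    """Tree shapes times the n! ways to assign relations to the leaves."""
    return tree_shapes(n) * factorial(n)


shapes = [tree_shapes(n) for n in range(1, 7)]
total_for_six = join_trees(6)
```

Here `shapes` reproduces the sequence 1, 1, 2, 5, 14, 42 and `total_for_six` is 30,240.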

11 Shape of Join Tree
[Figure: a left-deep tree and a bushy tree over relations A, B, C, D]

12 Shape of Join Tree
A left-deep (right-deep) tree is a join tree in which the right-hand side (left-hand side) of every join is a base relation, not an intermediate join result
A bushy tree is neither left-deep nor right-deep
Considering only left-deep trees is usually good enough
  Smaller search space
  Tend to be efficient for certain join algorithms (order relations from smallest to largest)
  Allows for pipelining
  Avoids materializing intermediate results on disk

13 Searching for the best join plan
Exhaustively enumerating all possible join orders is not feasible (n!)
Dynamic programming: use the best plan for a (k-1)-way join to compute the best k-way join
Greedy heuristic algorithm
Iterative dynamic programming

14 Dynamic Programming
The best way to join k relations is drawn from k plans in which the left argument is the least-cost plan for joining k-1 relations
BestPlan(A,B,C,D,E) = min of (
  BestPlan(A,B,C,D) ⋈ E,
  BestPlan(A,B,C,E) ⋈ D,
  BestPlan(A,B,D,E) ⋈ C,
  BestPlan(A,C,D,E) ⋈ B,
  BestPlan(B,C,D,E) ⋈ A )
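The BestPlan recurrence can be sketched as a table filled in by subset size. The cost model here is an assumption made for illustration: a plan's cost is the sum of the estimated sizes of the join results it produces, and a result size is the product of the base cardinalities and of the selectivities of all join predicates inside the subset. The cardinalities and selectivities in the example are made-up numbers.

```python
from itertools import combinations


def estimated_size(relations, cardinality, selectivity):
    """Assumed estimate: product of cardinalities times applicable selectivities."""
    size = 1.0
    for r in relations:
        size *= cardinality[r]
    for pair, s in selectivity.items():
        if pair <= relations:            # predicate applies if both sides are present
            size *= s
    return size


def best_left_deep_plan(cardinality, selectivity):
    """Dynamic programming over subsets; returns (cost, join order) for all relations."""
    names = sorted(cardinality)
    best = {frozenset([r]): (0.0, (r,)) for r in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            size = estimated_size(s, cardinality, selectivity)
            candidates = []
            for r in subset:             # r is the right argument joined last
                sub_cost, sub_order = best[s - {r}]
                candidates.append((sub_cost + size, sub_order + (r,)))
            best[s] = min(candidates)
    return best[frozenset(names)]


card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(["A", "B"]): 0.01, frozenset(["B", "C"]): 0.1}
cost, order = best_left_deep_plan(card, sel)
```

In this example the cheapest plan joins B with C first and A last, because B ⋈ C produces the smallest intermediate result.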

15 Complexity
Finds the optimal join order, but must evaluate all 2-way, 3-way, …, n-way joins (n choose k subsets for each k)
Time O(n·2^n), space O(2^n)
Exponential complexity, but joins on more than 10 relations are rare

16 Dynamic Programming
Choosing the best join order or algorithm for each subset of relations independently may not give the best overall plan
  Sort-join produces sorted output that may be useful later (e.g., for ORDER BY/GROUP BY, or when sorted on the join attribute)
  Pipelining with nested-loop join
  Availability of indexes

17 Heuristic Approaches
Dynamic programming may still be too expensive
Sample heuristics:
  Join from smallest to largest relation
  Perform the most selective join operations first
  Use index-joins if available
  Precede Cartesian product with selection
Most systems use a hybrid of heuristic and cost-based optimization
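One possible greedy heuristic, in the spirit of keeping intermediate results small, is sketched below. It is only one of several reasonable greedy strategies and is not necessarily the one the slides have in mind. The size estimate (product of cardinalities and applicable selectivities) and the example statistics are assumptions for illustration.

```python
from itertools import combinations


def estimated_size(relations, cardinality, selectivity):
    """Assumed estimate: product of cardinalities times applicable selectivities."""
    size = 1.0
    for r in relations:
        size *= cardinality[r]
    for pair, s in selectivity.items():
        if pair <= relations:
            size *= s
    return size


def greedy_join_order(cardinality, selectivity):
    """Start with the cheapest pair, then keep adding the relation that
    gives the smallest estimated intermediate result."""
    names = sorted(cardinality)
    first = min(
        combinations(names, 2),
        key=lambda p: estimated_size(frozenset(p), cardinality, selectivity),
    )
    order = list(first)
    remaining = [r for r in names if r not in order]
    while remaining:
        nxt = min(
            remaining,
            key=lambda r: estimated_size(frozenset(order + [r]), cardinality, selectivity),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


card = {"A": 1000, "B": 100, "C": 10, "D": 50}
sel = {
    frozenset(["A", "B"]): 0.01,
    frozenset(["B", "C"]): 0.1,
    frozenset(["C", "D"]): 0.02,
}
greedy_order = greedy_join_order(card, sel)
```

The greedy choice examines only O(n^2) candidates instead of every subset, but unlike dynamic programming it is not guaranteed to find the cheapest order.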