Download presentation
Presentation is loading. Please wait.
Published byCalvin Hubbard Modified over 9 years ago
1
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 15 – Query Optimization
2
ICOM 6005Dr. Manuel Rodriguez Martinez2 Query Optimization Read : –Chapter 12, sec 12.4 –Chapter 15 –SAC+79 Pages Purpose: –Study different algorithms to optimize queries submitted to the DBMS
3
ICOM 6005Dr. Manuel Rodriguez Martinez3 Introduction SQL query gets translated into relational algebra expression Relational algebra expression is represented as tree –This is what DBMS “understands” how to process –Expression becomes a plan once we identify access methods for each operator Relational algebra expression might have an equivalent expression –Example: R(A,B,C), S(A, D, F) But, each expression might have different cost How do we find the cheapest expression?
4
ICOM 6005Dr. Manuel Rodriguez Martinez4 Relational DBMS Architecture Disk Space Management Buffer Management File and Access Methods Relational Operators Query Optimizer Query Parser Client API Client DB Execution Engine Concurrency and Recovery
5
ICOM 6005Dr. Manuel Rodriguez Martinez5 Query Optimizer Module in DBMS in charge of finding cheapest available plan to execute a query Building one is not easy! Optimizer searches for plans and compares then based on cost Cost can be: –Resource usage –Response time –Power consumption –Number of I/Os –Network transmission cost
6
ICOM 6005Dr. Manuel Rodriguez Martinez6 Query Plans Query plan specifies the operations to be executed Tree of operators –Each operator corresponds to a relational operator Leaf nodes usually represent base tables R S R S T S A,B S A>2
7
ICOM 6005Dr. Manuel Rodriguez Martinez7 Executing Query Plans Plans generated by the optimizer are fed to the execution engine Plans support iterator interface –Open – initialize the operator –Next – get next tuple from operator –Close – de-allocate resources from operator Execution engine invokes each method Invocation triggers cascade of calls –Each operator call the corresponding methods on child nodes –Example: open on join, causes call to open on outer table and call to open on inner table. Same for next and close.
8
ICOM 6005Dr. Manuel Rodriguez Martinez8 Pipelined vs Materialized Execution Pipelined –The output tuple from one operator immediately becomes input tuple to its parent operator in the tree Materialized –The output tuples from one operator must be stored to disk first (into a temporary table) –Once the operator finishes, its parent operator can access the materialed tuples Most execution engines use pipelined –Saving in I/O can be substantial! Some operator cannot be pipelined –Sorting, projections with duplicate elimination Query optimizer must be aware of this issue!
9
ICOM 6005Dr. Manuel Rodriguez Martinez9 Generation of Query Plans Optimizer generates query plan after a search finds the optimal one –According to some criteria Search is a search by construction –Alterative plans are built and compared –Cheapest one is kept Two major algorithms exist –Dynamic programming (SAC+79) Exhaustive search of plan space Finds the optimal –Randomized Algorithm Random search of plan space Quickly finds sub-optimal but good plan Optimization philosophy find a good plan quick, avoid bad ones!
10
ICOM 6005Dr. Manuel Rodriguez Martinez10 Left-Deep Join vs Bushy Plans Query Optimizer generate two major types of plans –Left-deep plans –Bushy plans Left-deep plans –Every join has a base table as the inner join table –Use in commercial systems (first in System R) –Good for dynamic programming –Good for optimizing resource usage Bushy plans –Joins might have intermediate tables as input to the join –Good for randomized search –Use in research prototypes for distributed databases –Good for optimizing response time
11
ICOM 6005Dr. Manuel Rodriguez Martinez11 Left-deep plans R S R S T R S T U Each join always has base table as inner table
12
ICOM 6005Dr. Manuel Rodriguez Martinez12 Bushy Plans R S T R S R S T U U V
13
ICOM 6005Dr. Manuel Rodriguez Martinez13 Left-Deep vs Bushy Plans Bushy plans –Enable parallelism in operator evaluation –Operator can execute at different rates Good in distributed environments –More complicate to build (harder optimizer) Left-deep plans –Joins are run in sequence Susceptible to bottleneck at some operator –Simpler to build (easier optimizer) –Good for single site systems Everything runs on the same machine Commercial DBMS systems use Left-deep plans
14
ICOM 6005Dr. Manuel Rodriguez Martinez14 Cost of a plan The cost of plan depends on the metric you wish to optimize Resource usage (CPU + I/O + Network) –Cost is the sum of the resources used by each operator Response time –Cost of the slowest path in the tree Number of I/Os –Cost is the sum of the I/Os generated by each operator Network cost –Cost is sum of cost in moving data between operators
15
ICOM 6005Dr. Manuel Rodriguez Martinez15 Organization of an optimizer Query Execution Engine Query Parser Parse Tree Query Plan SQL Query Catalog Manager Catalog Query Optimizer Plan Generator Cost Estimator
16
ICOM 6005Dr. Manuel Rodriguez Martinez16 Generating Alternatives Relational equivalences are used by the optimizer to generate different operators that do the same Selection equivalences: –Cascading selections –Commutative selections
17
ICOM 6005Dr. Manuel Rodriguez Martinez17 Equivalence Rules Projections –Cascading projections Joins –Commutative rule –Associative rule
18
ICOM 6005Dr. Manuel Rodriguez Martinez18 Equivalence Rules(2) Commute selections and projections Pushing selections Decomposing selections
19
ICOM 6005Dr. Manuel Rodriguez Martinez19 System R Optimizer Based on left-deep plans and dynamic programming –Most commercial systems use a System R type of optimizer Cost is based on resource usage –Cost = CPU Cost + I/O Cost –Given a plan P, cost of P is computed as Cost(P) = operatorCost(P.root) + Cost(P.root.leftChild) + Cost(P.root.rightChild)
20
ICOM 6005Dr. Manuel Rodriguez Martinez20 Estimating Cost of Operators Key feature for this is selectivity factors, selectivity, and join costs Example: –R has no index |R| = 100,000, ||R|| = 5000 –S has un-clustered Index on Join attribute |S| = 70,000, ||S|| = 2500 –What algorithm shall be use for join? Chose between: BNLJ, INLJ, e GHJ with 20 B –What is the cost? R S
21
ICOM 6005Dr. Manuel Rodriguez Martinez21 System R Search Algorithm Idea: –Build every possible plan and keep track of Cheapest plan (overall) Cheapest plans that bring data in sorted order (called interesting orders) Dynamic Programming (divide-and-conquer) –To find plan for n-way join you First find singe table plans Then find plans for all (n-1)-way join and find a plan to join missing table with an (n-1)-way join Plan for smaller joins are saved on a table
22
ICOM 6005Dr. Manuel Rodriguez Martinez22 System R Search Algorithm (2) Process: –For an n-way join between tables R1, R2, …, Rn: Find the access path to access each table –Plans access to get R1, R2, …, Rn »This includes application of selection and projections for each table Find the access path to compute 2-way joins –2-way joins for all possible pairs of tables Find the access path to compute 3-way joins –Add a table to a 2-way join (forms all possible 3-way joins) Find the access path to compute 4-way joins –Add a table to a 3-way join (forms all possible 4-way joins) … Find the access path to compute the n-way join –Add a table to a n-1 way join (forms all possible n-way joins)
23
ICOM 6005Dr. Manuel Rodriguez Martinez23 System R Search Algorithm (2) Plan SystemROptimizer(R1, R2, …, Rn){ for (int i = 0; i < n; ++i){ // single table access paths optPlan(Ri) = selectPlan(Ri); } for (int i=2; I < n; ++i){ // join access paths, start with 2 tables, then 3, …, for all S {R1, R2, R3, …Rn} s.t. |S| == i { // S is the next set to join bestPlan = dummy plan with infinite cost; for all Rj, Sj s.t. S = Sj {Rj} { // Sj & Rj are pieces of S P = joinPlan(optPlan(Sj), optPlan(Rj)) if (cost(P) < cost(bestPlan)){ bestPlan = P; } optPlan(Sj) = bestPlan; } S = {R1, R2, …, Rn); return optPlan(S); }
24
ICOM 6005Dr. Manuel Rodriguez Martinez24 Illustration How does the algorithm works for this case Tables –R - |R| = 100,000, ||R|| = 8,000 –S - |S| = 90,000, ||S|| = 6,000 –T - |T| = 120,000 ||T|| = 10,000 –U - |U| = 80,000 ||U|| = 4,000 All tables are stored in heap files DBMS has: Blocked nested loops join, Hash Join and 25 free buffers What is the best plan for query: –R S T U
25
ICOM 6005Dr. Manuel Rodriguez Martinez25 Illustration (2) How does the algorithm works for this case Tables –R - |R| = 100,000, ||R|| = 8,000 R is stored on a clustered B+tree matching join attribute with T –S - |S| = 90,000, ||S|| = 6,000 |R| is stored on –T - |T| = 120,000 ||T|| = 10,000 DBMS has: Blocked nested loops join, Indexed- nested loops join, and Hash Join and 3 free buffers What is the best plan for query: –R S T
26
ICOM 6005Dr. Manuel Rodriguez Martinez26 Illustration (3) How does the algorithm works for this case Tables –R - |R| = 100,000, ||R|| = 8,000 R is stored on a clustered B+tree matching join attribute with T –S - |S| = 90,000, ||S|| = 6,000 |R| is stored on –T - |T| = 120,000 ||T|| = 10,000 DBMS has: Blocked nested loops join, Indexed- nested loops join, and Hash Join and 25 free buffers What is the best plan for query: – A>3 B = ‘NY’ (R) S T –If SF A>3 =.10 and SF B= ‘NY ’ = 0.05 and A>3 matches index on R.
27
ICOM 6005Dr. Manuel Rodriguez Martinez27 Issues with SystemR optimizer Algorithm performs exhaustive search of left-deep plans Dynamic Programming is ill-suited for optimization of response time –Principle of optimality is not observed Difficult (but not impossible) to modify for bushy plans –Search space is huge –Need pruning techniques to cut on the number of plans stored Do we need exhaustive search? –Optimal plan vs sub-optimal that is good and quick to find Disaster avoidance – More important to avoid bad plans!!!
28
ICOM 6005Dr. Manuel Rodriguez Martinez28 Alternative approaches Randomized Query Optimization –Use randomized algorithms to build and search plans –Good for bushy plans Rule-based Query Optimization –Use rules to guide the search and better prune space –Good to apply special cases and pruning Parametric Query Optimization –Add run-time parameters to really capture the reality of the system Multiple-query Optimization –Optimizer takes 2 or more queries at a time for optimization
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.