1 Query Compilation: Evaluating Logical Query Plan, Physical Query Plan
Source: our textbook, slides by Hector Garcia-Molina.

2 Outline
- Convert SQL query to a parse tree
  - Semantic checking: attributes, relation names, types
- Convert to a logical query plan (relational algebra expression)
  - Deal with subqueries
- Improve the logical query plan
  - Use algebraic transformations
  - Group together certain operators
  - Evaluate logical plan based on estimated size of relations
- Convert to a physical query plan
  - Search the space of physical plans
  - Choose order of operations
  - Complete the physical query plan

3 Estimating Sizes of Relations
- Used in two places:
  - to help decide between competing logical query plans
  - to help decide between competing physical query plans
- Notation review:
  - T(R): number of tuples in relation R
  - B(R): minimum number of blocks needed to store R
  - V(R,a): number of distinct values in R of attribute a

4 Desiderata for Estimation Rules
1. Give accurate estimates
2. Are easy (fast) to compute
3. Are logically consistent: estimated size should not depend on how the relation is computed
Here we describe some simple heuristics. All we really need is a scheme that properly ranks competing plans.

5 Estimating Size of Projection
- This can be exactly computed
- Every tuple changes size by a known amount

6 Estimating Size of Selection
- Suppose the selection condition is A = c, where A is an attribute and c is a constant
- A reasonable estimate of the number of tuples in the result is:
  - T(R)/V(R,A), i.e., original number of tuples divided by number of different values of A
- Good approximation if values of A are evenly distributed
- Also a good approximation in some other, common, situations (see textbook)

7 Estimating Size of Selection (cont'd)
- If condition is A < c:
  - a good estimate is T(R)/3; intuition is that usually you ask about something that is true of less than half the tuples
- If condition is A ≠ c:
  - a good estimate is T(R)
- If condition is the AND of several equalities and inequalities, apply the estimates in series: each conjunct scales the running estimate by its own fraction (1/V(R,A) for an equality, 1/3 for an inequality)

8 Example
- Consider relation R(a,b,c) with 10,000 tuples and 50 different values for attribute a
- Consider selecting all tuples from R with a = 10 and b < 20
- Estimate of number of resulting tuples is 10,000*(1/50)*(1/3) = 67
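
The series estimate for an AND condition can be sketched in a few lines of Python. This is only an illustration of the rules on the preceding slides; the function and constant names are made up for the example and do not come from the slides or the textbook.

```python
# Illustrative sketch of the selection-size heuristics (names are hypothetical).

def equality_selectivity(v_attr: int) -> float:
    """Fraction of tuples expected to satisfy A = c, i.e. 1 / V(R,A)."""
    return 1 / v_attr

# Rule of thumb from the slides: an inequality A < c keeps about a third of R.
INEQUALITY_SELECTIVITY = 1 / 3

def estimate_conjunction(t_r: int, selectivities: list[float]) -> float:
    """Apply the per-condition fractions in series to T(R)."""
    size = float(t_r)
    for fraction in selectivities:
        size *= fraction
    return size

# R has 10,000 tuples and V(R,a) = 50; condition is a = 10 AND b < 20.
estimate = estimate_conjunction(
    10_000, [equality_selectivity(50), INEQUALITY_SELECTIVITY]
)
print(round(estimate))  # 67
```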

9 Estimating Size of Selection (cont'd)
If condition has the form C1 OR C2, use:
1. sum of estimate for C1 and estimate for C2, or
2. minimum of T(R) and the previous, or
3. assuming C1 and C2 are independent, T(R)*(1 - (1 - f1)*(1 - f2)), where f1 is the fraction of R satisfying C1 and f2 is the fraction of R satisfying C2

10 Example
- Consider relation R(a,b) with 10,000 tuples and 50 different values for a
- Consider selecting all tuples from R with a = 10 or b < 20
- Estimate for a = 10 is 10,000/50 = 200
- Estimate for b < 20 is 10,000/3 = 3333
- Estimate for combined condition is
  - 200 + 3333 = 3533, or
  - 10,000*(1 - (1 - 1/50)*(1 - 1/3)) ≈ 3467
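
As a rough check of the OR rules, the following Python sketch computes both the capped-sum estimate and the independence-based estimate for this example. The function names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch of the OR-condition estimates (names are hypothetical).

def estimate_or_capped_sum(t_r: float, f1: float, f2: float) -> float:
    """Sum of the two individual estimates, capped at T(R)."""
    return min(t_r, t_r * f1 + t_r * f2)

def estimate_or_independent(t_r: float, f1: float, f2: float) -> float:
    """T(R) * (1 - (1 - f1) * (1 - f2)), assuming C1 and C2 are independent."""
    return t_r * (1 - (1 - f1) * (1 - f2))

t_r = 10_000
f_eq = 1 / 50   # a = 10, with V(R,a) = 50
f_lt = 1 / 3    # b < 20, inequality rule of thumb

capped = estimate_or_capped_sum(t_r, f_eq, f_lt)
independent = estimate_or_independent(t_r, f_eq, f_lt)
print(round(capped), round(independent))  # 3533 3467
```

The independence formula never exceeds T(R), which is why it needs no explicit cap.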

11 Estimating Size of Natural Join
- Assume join is on a single attribute Y
- Some possibilities:
  1. R and S have disjoint sets of Y values, so size of join is 0
  2. Y is the key of S and a foreign key of R, so size of join is T(R)
  3. All the tuples of R and S have the same Y value, so size of join is T(R)*T(S)
- We need some assumptions…

12 Common Join Assumptions
- Containment of Value Sets: if R and S both have attribute Y and V(R,Y) ≤ V(S,Y), then every value of Y in R appears as a value of Y in S
  - true if Y is a key of S and a foreign key of R
- Preservation of Value Sets: after the join, a non-join attribute of R has the same number of values as it does in R
  - true if Y is a key of S and a foreign key of R

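13 Join Estimation Rule
- Expected number of tuples in the result is
  - T(R)*T(S) / max(V(R,Y), V(S,Y))
- Why? Suppose V(R,Y) ≤ V(S,Y).
  - There are T(R) tuples in R
  - By containment of value sets, each of them has a 1/V(S,Y) chance of joining with a given tuple of S, so it creates T(S)/V(S,Y) new tuples on average
  - Summing over all tuples of R gives T(R)*T(S)/V(S,Y); the case V(S,Y) ≤ V(R,Y) is symmetric, so the larger value count ends up in the denominator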

14 Example
- Suppose we have
  - R(a,b) with T(R) = 1000 and V(R,b) = 20
  - S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100
  - U(c,d) with T(U) = 5000 and V(U,c) = 500
- What is the estimated size of R ⋈ S ⋈ U?
  - First join R and S (on attribute b): estimated size of result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,000; by preservation of value sets, the number of values of c in X is the same as in S, namely 100
  - Then join X with U (on attribute c): estimated size of result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,000

15 Example (cont'd)
- If the joins are done in the opposite order, we still get the same estimated answer
- This is due to the preservation of value sets assumption
- This is desirable: we don't want the estimate to depend on how the result is computed
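
The order-independence claim can be checked with a small Python sketch of the join rule. The representation used here is an assumption made for the example: a relation is a pair (T, V), where V maps attribute names to distinct-value counts, and only attributes whose V is given on the slide are listed. For a join attribute the result is given the smaller of the two counts (containment of value sets); other attributes keep their counts (preservation of value sets).

```python
# Illustrative sketch of the natural-join size rule (representation is hypothetical).

def estimate_join(left, right):
    """Return (T, V) for the natural join of two (T, V) relation summaries."""
    t_l, v_l = left
    t_r, v_r = right
    shared = v_l.keys() & v_r.keys()          # join attributes
    size = t_l * t_r
    for y in shared:
        size /= max(v_l[y], v_r[y])           # divide once per join attribute
    v_out = {**v_l, **v_r}                    # preservation of value sets
    for y in shared:
        v_out[y] = min(v_l[y], v_r[y])        # containment of value sets
    return size, v_out

R = (1000, {"b": 20})
S = (2000, {"b": 50, "c": 100})
U = (5000, {"c": 500})

size_rs_first, _ = estimate_join(estimate_join(R, S), U)   # (R join S) join U
size_su_first, _ = estimate_join(R, estimate_join(S, U))   # R join (S join U)
print(size_rs_first, size_su_first)  # 400000.0 400000.0
```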

16 More About Natural Join
- If there are multiple join attributes, the previous rule generalizes:
  - divide T(R)*T(S) by the larger of V(R,y) and V(S,y), once for each join attribute y
- Consider the natural join of a series of relations:
  - the containment and preservation of value sets assumptions ensure that the same estimated size is achieved no matter what order the joins are done in

17 Summary of Estimation Rules
- Projection: exactly computable
- Product: exactly computable
- Selection: reasonable heuristics
- Join: reasonable heuristics
- The other operators are harder to estimate…

18 Additional Estimation Heuristics
- Union:
  - bag: exactly computable (sum)
  - set: estimate as larger plus half the smaller
- Intersection: estimate as half the smaller
- Difference: estimate R - S as T(R) - T(S)/2
- Duplicate elimination: T(R)/2 or product of all the V(R,a)'s, whichever is smaller
- Grouping: T(R)/2 or product of V(R,a) for all grouping attributes a, whichever is smaller
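
These rules of thumb translate directly into code. The Python sketch below is only illustrative: the function names are invented, and the sample statistics (T(R) = 5000, T(S) = 2000, V(R,a) = 50, V(R,b) = 100) are borrowed from the R and S used later in the deck.

```python
# Illustrative sketch of the remaining size heuristics (names are hypothetical).
from math import prod

def est_bag_union(t_r, t_s):
    return t_r + t_s                               # exact for bags

def est_set_union(t_r, t_s):
    return max(t_r, t_s) + min(t_r, t_s) / 2       # larger plus half the smaller

def est_intersection(t_r, t_s):
    return min(t_r, t_s) / 2                       # half the smaller

def est_difference(t_r, t_s):
    return t_r - t_s / 2                           # R - S

def est_distinct(t_r, value_counts):
    """Duplicate elimination or grouping: the smaller of T(R)/2 and the
    product of V(R,a) over the relevant attributes."""
    return min(t_r / 2, prod(value_counts))

print(est_set_union(5000, 2000))       # 6000.0
print(est_intersection(5000, 2000))    # 1000.0
print(est_difference(5000, 2000))      # 4000.0
print(est_distinct(5000, [50, 100]))   # 2500.0
```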

19 Estimating Size Parameters
- Estimating the size of a relation depends on knowing T(R) and the V(R,a)'s
- Estimating the cost of a physical algorithm also depends on knowing B(R)
- How can the query compiler learn them?
  - Scan the relation to learn T and the V's, and then calculate B
  - It can also keep a histogram of the values of attributes, which makes estimates of join results more accurate
  - Statistics are recomputed periodically: after some time, after some number of updates, or if the DB administrator thinks the optimizer isn't choosing good plans
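
A single scan is enough to collect T and the V's. The sketch below assumes, purely for illustration, that a relation is a list of dictionaries and that B is derived from T and a known number of tuples per block; neither detail comes from the slides.

```python
# Illustrative sketch of gathering statistics in one scan (assumed representation:
# a relation is a list of dicts; tuples_per_block is a hypothetical parameter).
from math import ceil

def gather_stats(rows, tuples_per_block):
    """Return (T, B, V) where V maps each attribute to its distinct-value count."""
    seen = {}                                  # attribute -> set of values seen
    for row in rows:
        for attr, value in row.items():
            seen.setdefault(attr, set()).add(value)
    t = len(rows)
    b = ceil(t / tuples_per_block)             # minimum blocks if tightly packed
    v = {attr: len(values) for attr, values in seen.items()}
    return t, b, v

rows = [{"a": i % 3, "b": i} for i in range(10)]
t, b, v = gather_stats(rows, tuples_per_block=4)
print(t, b, v)  # 10 3 {'a': 3, 'b': 10}
```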

20 Heuristics to Reduce Cost of LQP
- For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation
- At this point, "cost" only refers to sizes of intermediate relations (we don't yet know about number of disk I/O's)
- Sum of sizes of all intermediate relations is the heuristic: if this sum is smaller after the transformation, then incorporate it

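21 Example: Should δ Be Moved Below the Join?
- R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 100
- S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
- Initial logical query plan: δ(σ_{a=10}(R ⋈ S))
- Modified logical query plan (move selection down): δ(σ_{a=10}(R) ⋈ S)
- Should δ be moved below the join?
[Figure: the candidate plan trees annotated with estimated sizes; of the two totals being compared, only "vs. 1100" survives in this transcript.]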

23 Deriving a Physical Query Plan
- To convert a logical query plan into a physical query plan, choose:
  - an order and grouping for sets of joins, unions, and intersections
  - an algorithm for each operator (e.g., nested-loop join vs. hash join)
  - additional operators (scanning, sorting, etc.) that are needed for the physical plan but not explicitly in the logical plan
  - how to pass arguments (store intermediate result on disk vs. pipeline one tuple or buffer at a time)
- Physical query plans are evaluated by their estimated cost…

24 Cost of Evaluating an Expression
- Measure by number of disk I/O's
- Influenced by:
  - operators in the chosen logical query plan
  - sizes of intermediate results
  - physical operators used to implement the logical operators
  - ordering of groups of similar operators (e.g., joins)
  - argument passing method

25 Enumerating Physical Plans
- Baseline approach is exhaustive search, but it is not practical (too many options)
- Heuristic selection: make a sequence of choices based on heuristics
- Various other approaches based on ideas from AI and algorithm analysis to search a space of possibilities
- Compare plans by counting number of disk I/O's

26 Some Heuristics
- To implement selection on R with condition A = c: if R has an index on A, then use index-scan
- To implement join when one argument R has an index on the join attribute(s): use index-join with R in the inner loop
- To implement join when one argument R is sorted on the join attribute(s): choose sort-join over hash-join
- To implement union or intersection of > 2 relations: group smallest relations first

28 Choosing Order for Joins
- Suppose we have > 2 relations to be joined (naturally)
- Pay attention to asymmetry:
  - one-pass algorithm: left argument is smaller and is stored in a main-memory data structure
  - nested-loop algorithm: left argument is used in the outer loop
  - index-join: right argument has the index
- Common point: these algorithms work better if the left argument is the smaller one

29 Choosing Join Order (cont'd)
- Template for the tree is given below
- Choices are which relations go where
[Figure: two join trees of the same shape with different leaf assignments: R, S, U, W vs. U, S, R, V.]

30 Choosing Join Order (cont'd)
- How do we decide on the leaves?
  - Try all possibilities. Not a good idea: there are n! choices, where n is the number of relations to be joined
  - Use dynamic programming, a technique from analysis of algorithms. Works well for relatively small values of n
  - Heuristic approach with a greedy algorithm: works faster but doesn't always find the best ordering
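
One possible greedy strategy is sketched below in Python: start with the pair of relations whose estimated join is smallest, then repeatedly add the relation that keeps the next intermediate result smallest. This is a sketch under assumptions, not the algorithm prescribed by the slides; the (T, V) representation and all names are hypothetical, and the statistics are those of the earlier R, S, U example.

```python
# Illustrative greedy join ordering (a sketch; names and representation are hypothetical).
from itertools import combinations

def join_stats(x, y):
    """Estimated (T, V) of the natural join of two (T, V) relation summaries."""
    t_x, v_x = x
    t_y, v_y = y
    shared = v_x.keys() & v_y.keys()
    size = t_x * t_y
    for a in shared:
        size /= max(v_x[a], v_y[a])
    v = {**v_x, **v_y}
    for a in shared:
        v[a] = min(v_x[a], v_y[a])
    return size, v

def greedy_order(relations):
    """Return (join order, estimated final size), keeping intermediates small."""
    remaining = dict(relations)
    # Step 1: pick the pair with the smallest estimated join size.
    first, second = min(
        combinations(sorted(remaining), 2),
        key=lambda p: join_stats(remaining[p[0]], remaining[p[1]])[0],
    )
    order = [first, second]
    current = join_stats(remaining.pop(first), remaining.pop(second))
    # Step 2: repeatedly add the relation giving the smallest next result.
    while remaining:
        nxt = min(sorted(remaining),
                  key=lambda n: join_stats(current, remaining[n])[0])
        current = join_stats(current, remaining.pop(nxt))
        order.append(nxt)
    return order, current[0]

relations = {
    "R": (1000, {"b": 20}),
    "S": (2000, {"b": 50, "c": 100}),
    "U": (5000, {"c": 500}),
}
order, size = greedy_order(relations)
print(order, size)  # ['S', 'U', 'R'] 400000.0
```

Here the greedy choice starts with S ⋈ U (estimated 20,000 tuples) rather than R ⋈ S (40,000), so the intermediate result is smaller even though the final estimate is the same.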

32 Remaining Steps
- Choose algorithms for remaining operators
- Decide when intermediate results will be materialized (stored on disk in their entirety) or pipelined (created only in main memory, in pieces)

33 Choosing Selection Method
- Suppose the selection condition is the AND of several equalities and inequalities, each involving an attribute and a constant
  - Ex: a = 10 AND b < 20
- Decide between these algorithms:
  - do a table scan and "filter" each tuple to check for the condition
  - do an index scan on one attribute (which one?) and "filter" each retrieved tuple to check for the remaining parts of the condition
- Compare number of disk I/O's

34 Disk I/O Costs
- Table scan:
  - B(R) if R is clustered
  - T(R) if R is not clustered
- Index scan on an attribute that is part of an equality:
  - B(R)/V(R,a) if the index is clustering
  - T(R)/V(R,a) if the index is not clustering
- Index scan on an attribute that is part of an inequality:
  - B(R)/3 if the index is clustering
  - T(R)/3 if the index is not clustering

35 Example
- Assumptions about R(x,y,z):
  - 5000 tuples
  - 200 blocks
  - V(R,x) = 100
  - V(R,y) = 500
  - R is clustered
  - index on x is not clustering
  - index on y is not clustering
  - index on z is clustering
- Select tuples satisfying x=1 AND y=2 AND z<5
- Choices and their costs:
  1. table scan: B(R) = 200
  2. index scan on x: T(R)/V(R,x) = 50
  3. index scan on y: T(R)/V(R,y) = 10
  4. index scan on z: B(R)/3 = 67
- The index scan on y is cheapest; the retrieved tuples are then filtered on x=1 AND z<5
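
The comparison in this example can be reproduced with a short Python sketch. The function names are invented for illustration, fractional block counts are rounded up, and the non-clustering inequality case (T(R)/3) is an assumption made by analogy with the equality case.

```python
# Illustrative sketch of the disk-I/O estimates for selection (names are hypothetical).
from math import ceil

def table_scan_cost(b, t, clustered):
    return b if clustered else t

def index_equality_cost(b, t, v, clustering):
    return ceil((b if clustering else t) / v)

def index_inequality_cost(b, t, clustering):
    # T(R)/3 for a non-clustering index is assumed by analogy with the equality rule.
    return ceil((b if clustering else t) / 3)

T, B = 5000, 200            # R(x,y,z): 5000 tuples in 200 blocks
V = {"x": 100, "y": 500}

costs = {
    "table scan": table_scan_cost(B, T, clustered=True),
    "index scan on x": index_equality_cost(B, T, V["x"], clustering=False),
    "index scan on y": index_equality_cost(B, T, V["y"], clustering=False),
    "index scan on z": index_inequality_cost(B, T, clustering=True),
}
best = min(costs, key=costs.get)
print(costs)  # {'table scan': 200, 'index scan on x': 50, 'index scan on y': 10, 'index scan on z': 67}
print(best)   # index scan on y
```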

36 Choosing Join Method
- If we have good estimates of relation statistics (T(R), B(R), V(R,a)'s) and the number of main memory buffers available, use formulas from Ch. 15 regarding sort-join, hash-join, and index-join
- Otherwise, apply these principles:
  - try one-pass join
  - try nested-loop join
  - sort-join is good if one argument is already sorted on the join attribute(s) or there are multiple joins on the same attribute, so the cost of sorting can be amortized over additional join(s)
  - if joining R and S, R is small, and S has an index on the join attribute, then use index-join
  - if none of the above apply, use hash-join

37 Materialization vs. Pipelining
- Materialization: perform operations in series and write intermediate results to disk
- Pipelining: interleave execution of several operations. Tuples produced by one operation are passed directly to the operations that use them as input, bypassing the disk
  - saves on disk I/O's
  - requires more main memory
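
The difference can be imitated in Python with generators, which hand over one tuple at a time, versus lists, which hold a whole intermediate result. This is only an in-memory analogy (a list stands in for an intermediate relation written to disk), and the operator names and the read counter are hypothetical.

```python
# Illustrative analogy for pipelining vs. materialization (all names are hypothetical).
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
reads = {"count": 0}                      # counts tuples produced by the scan

def table_scan(relation: List[Row]) -> Iterator[Row]:
    for row in relation:
        reads["count"] += 1
        yield row

def select(pred: Callable[[Row], bool], source: Iterable[Row]) -> Iterator[Row]:
    for row in source:
        if pred(row):
            yield row

R = [{"a": i % 5, "b": i} for i in range(100)]

# Pipelined: the selection pulls tuples from the scan one at a time, so the
# first result is available after scanning only as far as the first match.
pipeline = select(lambda r: r["a"] == 3, table_scan(R))
first = next(pipeline)
reads_after_first = reads["count"]

# Materialized: the scan output is stored in full before the selection starts.
reads["count"] = 0
intermediate = list(table_scan(R))        # stands in for writing to disk
materialized_result = list(select(lambda r: r["a"] == 3, intermediate))
reads_materialized = reads["count"]

print(first, reads_after_first)                      # {'a': 3, 'b': 3} 4
print(len(materialized_result), reads_materialized)  # 20 100
```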

38 Notation for Physical Query Plan
When converting a logical query plan (tree) to a physical query plan (tree):
- leaves of LQP (stored relations) become scan operators
- internal nodes of LQP (operators) become one or more physical operations (algorithms)
- edges of LQP are marked as "pipeline" or "materialize"
  - "materialize" choice implies a scan of the intermediate relation

39 Operators for Leaves
- TableScan(R): all blocks holding tuples of R are read in arbitrary order
- SortScan(R,L): all tuples of R are read in order, sorted according to attributes in L
- IndexScan(R,C): tuples of R satisfying C are retrieved through an index on attribute A; C is a comparison condition involving A
- IndexScan(R,A): all tuples of R are retrieved through an index on A

40 Physical Operators for Selection
- If there is no index on the attribute in the condition C, then use the Filter(C) operator
- If the relation is on disk, then we must precede the Filter with TableScan or SortScan
- If the condition has the form A op c AND D, then use the physical operators IndexScan(R, A op c) followed by Filter(D)

41 Example Physical Query Plans
- For σ_{x=1 AND y=2 AND z<5}(R): IndexScan(R, y=2) followed by Filter(x=1 AND z<5)
- For R ⋈ S ⋈ U: a two-pass hash-join (101 buffers) of TableScan(R) and TableScan(S), whose result is materialized and then joined with TableScan(U) by a second two-pass hash-join (101 buffers)