Completing the Physical-Query-Plan and Chapter 16 Summary (16.7-16.8). CS257, Spring 2009. Professor Tsau Lin. Students: Suntorn Sae-Eung, Donavon Norwood.



Outline
16.7 Completing the Physical-Query-Plan
  I. Choosing a Selection Method
  II. Choosing a Join Method
  III. Pipelining Versus Materialization
  IV. Pipelining Unary Operations
  V. Pipelining Binary Operations
  VI. Notation for Physical Query Plans
  VII. Ordering the Physical Operations
16.8 Summary of Chapter 16

Before Completing the Physical-Query-Plan
By this point, the query has already been:
- Parsed and preprocessed (16.1)
- Converted to a logical query plan (16.3)
- Annotated with estimated costs for its operations (16.4)
- Compared against alternative plans by cost-based plan selection (16.5)
- Given a join order, chosen by weighing the costs of the join operations (16.6)

Completing the Physical-Query-Plan
Three topics relate to turning a logical plan into a complete physical plan:
1. Choosing physical implementations, such as selection and join methods
2. Deciding how to handle intermediate results (materialized or pipelined)
3. Notation for physical-query-plan operators

I. Choosing a Selection Method
Algorithm for each selection operator:
1. Can we use an existing index on an attribute? If yes, use an index-scan; otherwise use a table-scan.
2. After retrieving all tuples that satisfy the condition used in (1), filter them with the remaining selection conditions.

Choosing a Selection Method (cont.)
Recall: cost of a query = number of disk I/O's.
How the costs of various plans for a σ_C(R) operation are estimated:
1. Cost of the table-scan algorithm
   a) B(R) if R is clustered
   b) T(R) if R is not clustered
2. Cost of a plan that picks an equality term (e.g. a = 10) with an index-scan
   a) B(R) / V(R, a) for a clustering index
   b) T(R) / V(R, a) for a nonclustering index
3. Cost of a plan that picks an inequality term (e.g. b < 20) with an index-scan
   a) B(R) / 3 for a clustering index
   b) T(R) / 3 for a nonclustering index

Example
Selection: σ_{x=1 AND y=2 AND z<5}(R)
- The parameters of R(x, y, z) are T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500.
- Relation R is clustered.
- The indexes on x and y are nonclustering; only the index on z is clustering.

Example (cont.)
Selection options:
1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered.
2. Use the index on x = 1, then filter on y, z. Cost is T(R) / V(R, x) = 5000 / 100 = 50 disk I/O's, since the index is nonclustering and about 50 tuples are retrieved.
3. Use the index on y = 2, then filter on x, z. Cost is T(R) / V(R, y) = 5000 / 500 = 10 disk I/O's, again with a nonclustering index.
4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R) / 3 = 67.

Example (cont.)
Costs:
- option 1 = 200
- option 2 = 50
- option 3 = 10
- option 4 = 67
The lowest cost is option 3. Therefore, the preferred physical plan
1. retrieves all tuples with y = 2,
2. then filters them on the remaining two conditions (on x and z).
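The arithmetic of this example can be checked with a short Python sketch. It only encodes the cost formulas from the slides; the function names and the `options` dictionary are illustrative assumptions, not part of any DBMS.

```python
from math import ceil

def table_scan_cost(B, T, clustered):
    # B(R) if R is clustered, T(R) otherwise
    return B if clustered else T

def index_eq_cost(B, T, V, clustering_index):
    # Equality term: B(R)/V(R,a) with a clustering index, T(R)/V(R,a) otherwise
    return ceil((B if clustering_index else T) / V)

def index_range_cost(B, T, clustering_index):
    # Inequality term: B(R)/3 with a clustering index, T(R)/3 otherwise
    return ceil((B if clustering_index else T) / 3)

# Parameters of R(x, y, z) taken from the example
T_R, B_R = 5000, 200
V = {"x": 100, "y": 500}

options = {
    "table-scan, filter x, y, z": table_scan_cost(B_R, T_R, clustered=True),
    "index on x = 1, filter y, z": index_eq_cost(B_R, T_R, V["x"], clustering_index=False),
    "index on y = 2, filter x, z": index_eq_cost(B_R, T_R, V["y"], clustering_index=False),
    "index on z < 5, filter x, y": index_range_cost(B_R, T_R, clustering_index=True),
}

best = min(options, key=options.get)
for name, cost in options.items():
    print(f"{cost:>4}  {name}")
print("cheapest:", best)
```

The sketch reproduces the four costs 200, 50, 10, and 67 and selects the plan that uses the index on y.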

II. Choosing a Join Method
Determine the costs associated with each join algorithm:
1. A one-pass join or a nested-loop join is a reasonable choice when we expect enough buffers to be devoted to the join.
2. A sort-join is preferred when the arguments are already sorted on the join attribute, or when two or more joins are on the same attribute, such as (R(a, b) ⋈ S(a, c)) ⋈ T(a, d). Here, sorting R and S on a produces the result of R ⋈ S sorted on a, so it can be used directly in the next join.

Choosing a Join Method (cont.)
3. An index-join suits a join with a good chance of using an index created on the join attribute, such as R(a, b) ⋈ S(b, c) with an index on S.b.
4. A hash-join is the best choice for relations that are neither sorted nor indexed and that need a multipass join.
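The four guidelines can be read as a rough decision procedure. The Python sketch below is one possible encoding: the order in which the tests are applied, the function name `choose_join_method`, and its parameters are all assumptions made for illustration, and a real optimizer would compare estimated costs instead of applying fixed rules.

```python
def choose_join_method(smaller_blocks, buffers, *,
                       inputs_sorted=False, index_on_join_attr=False):
    """Pick a join method from coarse properties of the arguments.

    smaller_blocks: B() of the smaller argument
    buffers: number of main-memory buffers available (M)
    """
    # One-pass join: the smaller argument fits in M - 1 buffers,
    # leaving one buffer to read the other argument.
    if smaller_blocks <= buffers - 1:
        return "one-pass join"
    # Already-sorted arguments (or a later join on the same attribute)
    # make a sort-join attractive.
    if inputs_sorted:
        return "sort-join"
    # An index on the join attribute opens the way to an index-join.
    if index_on_join_attr:
        return "index-join"
    # Otherwise fall back to a multipass hash-join.
    return "hash-join"

print(choose_join_method(50, 101))                             # small argument
print(choose_join_method(5000, 101, inputs_sorted=True))       # sorted inputs
print(choose_join_method(5000, 101, index_on_join_attr=True))  # usable index
print(choose_join_method(5000, 101))                           # nothing to exploit
```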

III. Pipelining Versus Materialization
- Materialization (the naive way): store the intermediate result of each operation on disk.
- Pipelining (the more efficient way):
  - Interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them.
  - Keep the intermediate result of each operation in buffers in main memory.

IV. Pipelining Unary Operations
- Unary operations are either tuple-at-a-time or full-relation.
- Selection and projection are the best candidates for pipelining.
[Figure: R flows through an input buffer, a unary operation, and an output buffer, which feeds the input buffer of the next unary operation; label: M-1 buffers.]

Pipelining Unary Operations (cont.)
- Pipelined unary operations are implemented by iterators.
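As a minimal sketch of the idea, Python generators can stand in for iterators: each operator pulls one tuple at a time from its child, so no intermediate relation is ever stored. The operator names `table_scan`, `select`, and `project`, and the sample relation, are assumptions made for illustration.

```python
def table_scan(relation):
    # Leaf operator: hand out the stored tuples one at a time.
    for t in relation:
        yield t

def select(pred, child):
    # Tuple-at-a-time selection: pass on only tuples satisfying pred.
    for t in child:
        if pred(t):
            yield t

def project(attrs, child):
    # Tuple-at-a-time projection: keep only the listed attributes.
    for t in child:
        yield {a: t[a] for a in attrs}

# Hypothetical sample data for R(x, y, z)
R = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 5, "z": 9},
    {"x": 4, "y": 2, "z": 1},
]

# The selection feeds the projection directly; nothing is materialized.
plan = project(["x", "z"], select(lambda t: t["y"] == 2, table_scan(R)))
result = list(plan)
print(result)
```

Nothing runs until the consumer asks for the first tuple, which mirrors how a parent iterator drives its children.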

V. Pipelining Binary Operations
- Binary operations: ∪, ∩, −, ⋈, ×
- The results of binary operations can also be pipelined.
- Use one buffer to pass the result to its consumer, one block at a time.
- The extended example that follows shows the tradeoffs and opportunities.

Example
Consider a physical query plan for the expression (R(w, x) ⋈ S(x, y)) ⋈ U(y, z).
Assumptions:
- R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
- The intermediate result R ⋈ S occupies k blocks for some k.
- Both joins will be implemented as hash-joins, either one-pass or two-pass depending on k.
- There are 101 buffers available.

Example (cont.)
- First consider the join R ⋈ S. Neither relation fits in the buffers.
- We need a two-pass hash-join that partitions R into 100 buckets (the maximum possible), so each bucket has 50 blocks.
- The second pass of the hash-join uses 51 buffers, leaving the remaining 50 buffers for joining the result of R ⋈ S with U.

Example (cont.)
Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies at most 49 blocks.
Steps:
1. Pipeline the result of R ⋈ S into 49 buffers.
2. Organize those tuples for lookup as a hash table.
3. Use the one remaining buffer to read each block of U in turn.
4. Execute the second join as a one-pass join.

Example (cont.)
The total number of disk I/O's is 55,000:
- 45,000 for the two-pass hash-join of R and S
- 10,000 to read U for the one-pass hash-join of (R ⋈ S) ⋈ U

Example (cont.)
Case 2: suppose k > 49 but k ≤ 5,000. We can still pipeline, but we need another strategy, in which the intermediate result is joined with U in a 50-bucket, two-pass hash-join. Steps:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.
2. Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.

Example (cont.)
The number of disk I/O's is:
- 20,000 to read U and write its tuples into buckets
- 45,000 for the two-pass hash-join R ⋈ S
- k to write out the buckets of R ⋈ S
- k + 10,000 to read the buckets of R ⋈ S and U in the final join
The total cost is 75,000 + 2k.

Example (cont.)
Comparing the I/O's of case 1 and case 2:
- k ≤ 49 (case 1): the number of disk I/O's is 55,000.
- 49 < k ≤ 5,000 (case 2):
  - k = 50: 75,000 + (2 × 50) = 75,100
  - k = 51: 75,000 + (2 × 51) = 75,102
  - k = 52: 75,000 + (2 × 52) = 75,104
Notice: the number of I/O's jumps discontinuously, from 55,000 to 75,100, as k increases from 49 to 50.

Example (cont.)
Case 3: k > 5,000. We cannot perform a two-pass join in the 50 available buffers if the result of R ⋈ S is pipelined. Steps:
1. Compute R ⋈ S using a two-pass hash-join and store the result on disk.
2. Join the result of (1) with U, also using a two-pass hash-join.

Example (cont.)
The number of disk I/O's is:
- 45,000 for the two-pass hash-join of R and S
- k to store R ⋈ S on disk
- 30,000 + 3k for the two-pass hash-join of U with R ⋈ S
The total cost is 75,000 + 4k.

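Example (cont.)
In summary, the cost of the physical plan as a function of the size k of R ⋈ S:

Range of k         Pipeline or materialize   Final join              Total disk I/O's
k ≤ 49             Pipeline                  One-pass                55,000
50 ≤ k ≤ 5,000     Pipeline                  50-bucket, two-pass     75,000 + 2k
k > 5,000          Materialize               Two-pass                75,000 + 4k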
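The three cases can be folded into one function of k. The Python sketch below uses the block counts and the 101-buffer assumption of the example; the function name `plan_cost` is illustrative. It reproduces the totals 55,000, 75,000 + 2k, and 75,000 + 4k.

```python
B_R, B_S, B_U = 5_000, 10_000, 10_000  # block counts from the example

def plan_cost(k):
    """Disk I/O's for (R join S) join U when R join S occupies k blocks."""
    # A two-pass hash-join costs 3 * (B(R) + B(S)) = 45,000 I/O's.
    join_rs = 3 * (B_R + B_S)
    if k <= 49:
        # Case 1: pipeline R join S into memory, read U once.
        return join_rs + B_U
    if k <= 5_000:
        # Case 2: pre-hash U into 50 buckets (read + write = 2 * B(U)),
        # write k blocks of buckets, then read k + B(U) in the final join.
        return 2 * B_U + join_rs + k + (k + B_U)
    # Case 3: materialize R join S (k writes), then run a two-pass
    # hash-join of U with the stored result: 3 * (B(U) + k).
    return join_rs + k + 3 * (B_U + k)

for k in (49, 50, 51, 5_000, 5_001):
    print(f"k = {k:>5}: {plan_cost(k):>6} disk I/O's")
```

The printed values show two jumps in cost: one between k = 49 and k = 50, and another between k = 5,000 and k = 5,001.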

VI. Notation for Physical Query Plans
Several types of operators:
1. Operators for leaves
2. (Physical) operators for selection
3. (Physical) sort operators
4. Other relational-algebra operations
In practice, each DBMS uses its own internal notation for physical query plans.

Notation for Physical Query Plans (cont.)
1. Operators for leaves
Each leaf operand R of the logical-query-plan tree is replaced by a scan operator:
- TableScan(R): read all blocks holding tuples of R.
- SortScan(R, L): read the tuples of R in sorted order according to the attribute list L.
- IndexScan(R, C): use an index on attribute A to retrieve the tuples satisfying a condition C of the form Aθc.
- IndexScan(R, A): retrieve the entire relation R via an index on R.A. This behaves like TableScan but may be more efficient if R is not clustered.

Notation for Physical Query Plans (cont.)
2. (Physical) operators for selection
The logical operator σ_C(R) is often combined with the access method for R.
- If there is no index on R, or no index on an attribute mentioned in condition C:
  - Replace σ_C(R) by the operator Filter(C).
  - Use TableScan(R) or SortScan(R, L) to access R.
- If condition C has the form Aθc AND D for some condition D, and there is an index on R.A, then we may:
  - Use the operator IndexScan(R, Aθc) to access R, and
  - Use Filter(D) in place of the selection σ_C(R).
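The two rules can be sketched in Python as a function that returns the operators of the physical plan from the leaf upward. The `Term` class and `physical_selection` function are hypothetical names; the sketch uses the first indexed term it finds, whereas a cost-based optimizer would compare the alternatives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """One conjunct of the selection condition, e.g. y = 2."""
    attr: str
    op: str
    const: object

    def __str__(self):
        return f"{self.attr} {self.op} {self.const}"

def physical_selection(relation, terms, indexed_attrs):
    """Return the plan for a conjunctive selection as a bottom-up operator list."""
    for i, term in enumerate(terms):
        if term.attr in indexed_attrs:
            # Index available: IndexScan on this term, Filter on the rest.
            rest = terms[:i] + terms[i + 1:]
            plan = [f"IndexScan({relation}, {term})"]
            if rest:
                plan.append("Filter(" + " AND ".join(str(t) for t in rest) + ")")
            return plan
    # No usable index: scan the whole relation and filter on everything.
    condition = " AND ".join(str(t) for t in terms)
    return [f"TableScan({relation})", f"Filter({condition})"]

terms = [Term("x", "=", 1), Term("y", "=", 2), Term("z", "<", 5)]
with_index = physical_selection("R", terms, indexed_attrs={"y"})
without_index = physical_selection("R", terms, indexed_attrs=set())
print(with_index)
print(without_index)
```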

Notation for Physical Query Plans (cont.)
3. (Physical) sort operators
- Sorting can occur at several points in a physical plan. For a stored relation, we use the notation SortScan(R, L).
- It is common to use an explicit operator Sort(L) to sort a relation that is not stored.
- Sort(L) can also be applied at the top of the physical-query-plan tree if the result must be sorted because of an ORDER BY clause (τ).

Notation for Physical Query Plans (cont.)
4. Other relational-algebra operations
Each remaining operator is described by text and symbols that indicate:
- The operation performed, e.g. join or grouping.
- The necessary parameters, e.g. the condition of a theta-join or the list of elements in a grouping.
- A general strategy for the algorithm, e.g. sort-based, hash-based, or index-based.
- A decision about the number of passes to be used, e.g. one-pass, two-pass, or multipass.
- The anticipated number of buffers the operation will require.

Notation for Physical Query Plans (cont.)
Example of a physical query plan: the plan from the example for the case k > 5,000.
[Figure: plan tree. Elements shown: TableScan; two-pass hash-join; materialization (drawn as a double line); Store operator.]

Notation for Physical Query Plans (cont.)
Another example: the physical query plan from the example for the case k ≤ 49.
[Figure: plan tree. Elements shown: TableScan; two-pass hash-join; pipelining; different buffer needs; Store operator.]

Notation for Physical Query Plans (cont.)
The physical query plan for the selection example:
- Use the index on the condition y = 2 first.
- Filter with the remaining conditions afterward.

VII. Ordering of Physical Operations
- The physical query plan (PQP) is represented as a tree, whose structure implies an order of operations.
- Still, the order of evaluation of interior nodes may not always be clear:
  - Iterators are used in a pipelined manner.
  - The execution times of the various nodes overlap, so a strict "ordering" of nodes makes no sense.

Ordering of Physical Operations (cont.)
Three rules summarize the ordering of events in a PQP tree:
1. Break the tree into subtrees at each edge that represents materialization. Execute one subtree at a time.
2. Order the execution of the subtrees:
   - Bottom-up
   - Left-to-right
3. All nodes of each subtree are executed simultaneously.
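The three rules can be sketched as a tree traversal. In the Python sketch below, the `Node` class, its `materialized` flag (marking that the edge to the parent is a materialization), and the function `execution_order` are assumptions made for illustration. Each inner list is one subtree whose nodes run together as a pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    materialized: bool = False  # True if this node's output is stored on disk

def execution_order(root):
    """Split the plan at materialization edges; list subtrees in execution order."""
    order = []

    def collect(node, group):
        # Gather the nodes pipelined together with `node`.
        group.append(node.name)
        for child in node.children:  # children visited left to right
            if child.materialized:
                run_subtree(child)   # feeding subtree finishes first (bottom-up)
            else:
                collect(child, group)

    def run_subtree(sub_root):
        group = []
        collect(sub_root, group)
        order.append(group)

    run_subtree(root)
    return order

# The plan of the example for k > 5,000: R join S is materialized.
plan = Node("two-pass hash-join with U", [
    Node("two-pass hash-join of R and S",
         [Node("TableScan(R)"), Node("TableScan(S)")],
         materialized=True),
    Node("TableScan(U)"),
])

for step, group in enumerate(execution_order(plan), start=1):
    print(step, group)
```

For this plan the join of R and S, together with its two scans, is executed first; the final join and the scan of U form the second group.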

Summary of Chapter 16
In this part of the presentation I will talk about the main topics of Chapter 16.

COMPILATION OF QUERIES
- Compilation means turning a query into a physical query plan, which can be implemented by the query engine.
- Steps of query compilation:
  - Parsing
  - Semantic checking
  - Selection of the preferred logical query plan
  - Generating the best physical plan

THE PARSER
- Parsing is the first step of SQL query processing.
- The parser generates a parse tree.
- Nodes in the parse tree correspond to SQL constructs.
- It is similar to the parser in a compiler for a programming language.

VIEW EXPANSION
- A critical part of query compilation.
- Replaces each view reference in the query tree with the tree for the expression that defines the view.
- Provides opportunities for query optimization.

SEMANTIC CHECKING
- Checks the semantics of an SQL query.
- Examines the parse tree.
- Checks:
  - Attributes
  - Relation names
  - Types
- Resolves attribute references.

CONVERSION TO A LOGICAL QUERY PLAN
- Converts a semantically checked parse tree to an algebraic expression.
- The conversion is straightforward, except that subqueries need special treatment.
- A two-argument selection can be used as an intermediate form for subqueries.

ALGEBRAIC TRANSFORMATION
- There are many ways to transform a logical query plan into a better plan using algebraic transformations.
- The laws used for these transformations:
  - Commutative and associative laws
  - Laws involving selection
  - Pushing selections
  - Laws involving projection
  - Laws about joins and products
  - Laws involving duplicate elimination
  - Laws involving grouping and aggregation

ESTIMATING SIZES OF RELATIONS
- When selecting the best logical plan, the estimated sizes of intermediate relations stand in for the true running time.
- The two factors that matter most in estimating the size of a relation:
  - The size of each relation (number of tuples)
  - The number of distinct values for each attribute of each relation
- Some systems use histograms.

COST BASED OPTIMIZING
- The best physical query plan is the least costly plan.
- Factors that decide the cost of a query plan:
  - The order and grouping of operations such as joins, unions, and intersections.
  - The join algorithm used, e.g. nested-loop join or hash-join.
  - Scanning and sorting operations.
  - Storing intermediate results.

PLAN ENUMERATION STRATEGIES
Common approaches for searching the space of physical plans for the best one:
- Dynamic programming: tabulate the best plan for each subexpression.
- Selinger-style dynamic programming: also record the sort order of the results as part of the table.
- Greedy approaches: make a series of locally optimal decisions.
- Branch-and-bound: start with a poor plan and use its cost as a bound while searching for better plans.

LEFT-DEEP JOIN TREES
- Left-deep join trees are binary trees with a single spine down the left edge and with leaves as right children.
- Restricting the search to left-deep join trees when picking a grouping and order for the join of several relations reduces the number of plans that must be considered for the best physical plan.
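To see how much the restriction shrinks the search space, the following Python sketch counts join trees for n relations. It assumes the standard counting argument: a left-deep tree is fixed by the order of its leaves (n! choices), while an arbitrary binary tree also has a shape, and the number of shapes T(n) satisfies T(1) = 1 and T(n) = Σ T(i) · T(n − i). The function names are illustrative.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def tree_shapes(n):
    """Number of binary tree shapes with n leaves."""
    if n == 1:
        return 1
    # Split the n leaves between a left subtree (i) and a right subtree (n - i).
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))

def left_deep_plans(n):
    # One shape only; the leaves can be ordered in n! ways.
    return factorial(n)

def all_plans(n):
    # Every shape combined with every ordering of the leaves.
    return tree_shapes(n) * factorial(n)

for n in range(2, 7):
    print(f"n = {n}: left-deep = {left_deep_plans(n):>4}, all trees = {all_plans(n):>6}")
```

For six relations there are 720 left-deep trees against 30,240 trees of all shapes.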

PHYSICAL PLANS FOR SELECTION
- A selection can be broken into an index-scan of the relation, followed by a filter operation.
- The filter examines the tuples retrieved by the index-scan.
- It passes only those tuples that meet the remaining portions of the selection condition.

PIPELINING VERSUS MATERIALIZING
- Ideally, an operator consumes the result of another operator as it is produced, with the result passed through main memory.
- Controlling the flow of data between operators in this way implements "pipelining".
- Sometimes an intermediate result should instead be moved out of main memory to save space for other operators.
- This technique is called "materialization".
- The physical-query-plan generator should consider both pipelining and materialization.


Reference
[1] H. Garcia-Molina, J. Ullman, and J. Widom, "Database Systems: The Complete Book," second edition, Prentice Hall, New Jersey, 2008.