Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.

Query Execution Optimizing Performance

Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional processing to supply all of the missing details. In practice, an SQL query is translated internally into a relational algebra expression. One advantage of using relational algebra is that it makes alternative forms of a query easier to explore. The different algebraic expressions for a query are called logical query plans. We will focus first on the methods for executing the operations of the relational algebra. Then we will focus on how transform logical query plans.

Preview Parsing: read SQL, output relational algebra tree Query rewrite: Transform tree to a form, which is more efficient to evaluate Physical plan generation: select implementation for each operator in tree, and for passing results up the tree. In this chapter we will focus on the implementation for each operator.

Relational Algebra recap RA union, intersection and difference correspond to UNION, INTERSECT, and EXCEPT in SQL Selection  corresponds to the WHERE -clause in SQL Projection  corresponds to SELECT -clause Product  corresponds to FROM -clause Join’s corresponds to JOIN, NATURAL JOIN, and OUTER JOIN in the SQL2 standard Duplicate elimination  corresponds to DISTINCT in SELECT -clause Grouping  corresponds to GROUP BY Sorting  corresponds to ORDER BY

A graphical picture A query plan is a decomposition of the original SQL into an expression tree. What is the initial plan? Is it the best? 5 π title, birthdate σ year=1996 AND gender = “F” AND starName = name × MovieStarStarsIn SELECT title, birthdate FROM MoviewStar, StarsIN WHERE year=1996 AND gender='F' AND starName=name;

A better plan 6 π title, birthdate  starName = name MovieStarStarsIn σ gender = “F” σ year=1996

Relational algebra for real SQL Keep in mind the following fact: - A relation in algebra is a set, while a relation in SQL is probably a bag - In short, a bag allows duplicates. - Not surprisingly, this can effect the cost of related operations.

Bag union, intersection, and difference Card(t,R) means the number of occurrences of tuple t in relation R Card(t, R  S) = Card(t,R) + Card(t,S) Card(t,R  S) = min{Card(t,R), Card(t,S)} Card(t,R–S) = max{Card(t,R)–Card(t,S), 0} Example: R= {A,B,B}, S = {C,A,B,C} R  S = {A,A,B,B,B,C,C} R  S = {A,B} R – S = {B}

Physical operators Physical query plans are built from physical operators. - In most cases, the physical operators are direct implementations of the relational algebra operators. However, there are several other physical operators for various supporting tasks. 1.Table-scan (the most basic operation we want to perform in a physical query plan) 2.Index-scan (E.g. if we have an index on some relation R we can retrieve the blocks of R by using the index) 3.Sort-scan (takes a relation R and a specification of the attributes on which the sort is to be made, and produces R in sorted order)

Computational Model When comparing algorithms for the same operations we do not consider the cost of writing the output. - Because the cost of writing the output on the disk depends on the size of the result, not on the way the result was computed. In other words, it is the same for any computational alternative. - Also, we can often pipeline the result to other operators when the result is constructed in main memory. So a final output phase may not even be required.

Iterators for Implementation of Physical Operators This is a group of three functions that allow a consumer of the result of a physical operation to get the result one tuple at a time. An iterator consists of three parts: Open: Initializes data structures. Doesn’t return tuples GetNext: Returns next tuple & adjusts the data structures Close: Cleans up afterwards We assume these to be overloaded names of methods.

Iterator for tablescan operator Open(R) { b := the first block of R; t := the first first tuple of block b; Found := TRUE; } GetNext(R) { IF (t is past the last tuple on block b) { increment b to the next block; IF (there is no next block) { Found := FALSE; RETURN; } ELSE /*b is a new block*/ t := first tuple on block b; } oldt := t; /*Now we are ready to return t and increment*/ increment t to the next tuple of b; RETURN oldt; } Close(R) {}

Iterator for Bag Union of R and S Open(R,S) { R.open(); CurRel := R; } GetNext(R,S) { IF (CurRel = R) { t := R.GetNext(); IF(Found) /*R is not exhausted*/ RETURN t; ELSE /*R is exhausted*/ { S.Open(); CurRel := S; } /*Here we read from S*/ RETURN S.GetNext(); /*If s is exhausted Found will be set to FALSE by S.GetNext */ } Close(R,S) { R.Close(); S.Close() }

Iterator for sort-scan In an iterator for sort-scan –Open has to do all of 2PMMS, except the merging –GetNext outputs the next tuple from the merging phase

Algorithms for implementing RA-operators Classification of algorithms –Sort based methods –Hash based methods –Index based methods Degree of difficultness of algorithms –One pass (when one relation can fit into main memory) –Two pass (when no relation can fit in main memory, but again the relations are not very extremely large) –Multi pass (when the relations are very extremely large) Classification of operators –Tuple-at-a-time, unary operations( ,  ) –Full-relation, unary operations ( ,  ) –Full-relation, binary operations (union, join,…)

Cost parameters In order to evaluate or estimate the cost of query resolution, we need to identify the relevant parameters. Typical cost parameters include: - R = the relation on disk - M = number of main memory buffers available (1buffer = 1block) - B(R) = number of blocks of R - T(R) = number of tuples of R - V(R, a) = number of distinct values in column a of R - V(R, L) = number of different tuples in R (where L is a list of attributes or columns) Simple cost estimate: - Basic scan: B(R) - 2PMMS: 3B(R) Recall that final output is not counted

One pass, tuple-at-a-time Selection and projection Cost = B(R) or T(R) (if the relation is not clustered) Space requirement: M  1 block Principle: –Read one block (or one tuple if the relation is not clustered) at a time –Filter in or out the tuples of this block.

One pass, unary full-relation operations Duplicate elimination: for each tuple decide: seen before: ignore new: output Principle: –It is the first time we have seen this tuple, in which case we copy it to the output. –We have seen the tuple before,in which case we must not output this tuple. We need a Main Memory hash-table to be efficient. Requirement: 

Grouping: Accumulate the information on groups in main memory. Need to keep one entry for each value of the grouping attributes, through a main memory search structure (hash table). Then, we need for each group to keep an aggregated value (or values if the query asks for more than one aggregation). –For MIN/MAX we keep the min or max value seen so far for the group. –For COUNT aggregation we keep a counter which is incremented each time we encounter a tuple belonging to the group. –For SUM, we add the value if the tuple belongs to the group. –For AVG? MM requirement. Typically, a (group) tuple will be smaller than a tuple of the input relation, Typically, the group number will be smaller than the number of tuples in the input relation. This is their number: How you would do an iterator for grouping? One pass, unary full-relation operations

One pass, binary operators Requirement: min(B(R),B(S)) ≤ M Exception: bag union Cost: B(R) + B(S) Assume R is larger than S. How to perform the operations below: –Set union, set intersection, set difference –Bag intersection, bag difference –Cartesian product, natural join All these operators require reading the smaller of the relations into main memory using there a search scheme (e.g. main memory hash table) for easy search and insertion.

Set Union Let R and S be sets. We read S into M-1 buffers of main memory. All these tuples are also copied to the output. We then read each block of R into the M th buffer, one at a time. For each tuple t of R we see if t is in S, and if not, we copy t to output.

Set Intersection Let R and S be sets or bags. The result will be set. We read S into M-1 buffers of main memory. We then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S, and if so, we copy t to output. At the same time we delete t from S in Main Memory.

Set Difference Let R and S be sets. Since difference is not a commutative operator, we must distinguish between R-S and S-R. Read S into M-1 buffers of main memory. Then read each block of R into the M th buffer, one at a time. To compute R-S: for each tuple t of R we see if t is not in S, and if so, we copy t to output. To compute S-R: for each tuple t of R we see if t is is in S, we delete t from S in such a case. At the end we output those tuples of S that remain.

Bag Intersection Let R and S be bags. Read S into M-1 buffers of main memory. Also, associate with each tuple a count, which initially measures the number of times the tuple occurs in S. Then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S. If not we ignore it. Otherwise, if the counter is more than zero, we output t and decrement the counter.

Bag Difference We read S into M-1 buffers of main memory. Also, we associate with each tuple a count, which initially measures the number of times the tuple occur in S. We then read each block of R into the M-th buffer, one at a time. To compute S-R: for each tuple t of R we see if t is is in S, we decrement its counter. At the end we output those tuples of S that remain with counter positive. To compute R-S: we may think of the counter c for tuple t as having c reasons to not output t. Now, when we process a tuple of R we check to see if that tuple appears in S. If not we output t. Otherwise, we check to see the counter c of t. If it is 0 we output t. If not, we don’t output t, and we decrement c.

Product We read S into M-1 buffers of main memory. No special structure is needed. We then read each block of R into the M-th buffer, one at a time. And combine each tuple with all the tuples of S.

Natural Join We read S into M-1 buffers of main memory and build a search structure where the search key is the shared attributes of R and S. We then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S, and if so, we copy t to output.

Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.

Similar presentations

Presentation on theme: "Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.

Similar presentations

Presentation on theme: "Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional."— Presentation transcript:

Similar presentations

About project

Feedback