Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park

Outline  Overview(already covered)  Measures of Query Cost(already covered)  Selection Operation(already covered)  Sorting(already covered)  Join Operation  Other Operations  Evaluation of Expressions

Join Operation  Several different algorithms to implement joins  Nested-loop join  Block nested-loop join  Indexed nested-loop join  Merge-join  Hash-join  Choice based on cost estimate  Examples use the following information  Number of records of student5,000takes10,000  Number of blocks of student100takes400

Nested-Loop Join (1/2)  To compute the theta join r  s for each tuple t r in r do begin for each tuple t s in s do begin test pair (t r, t s ) to see if they satisfy the join condition  if they do, add t r t s to the result end end  r is called the outer relation and s the inner relation of the join  Requires no indices and can be used with any kind of join condition  Expensive since it examines every pair of tuples in the two relations

Nested-Loop Join (2/2)  In the worst case, if the buffer can hold only one block of each relation, the estimated cost is n r  b s + b r block accesses  If the smaller relation fits entirely in memory, use that as the inner relation. The cost is reduced to b r + b s block accesses  Assuming worst case memory availability, cost estimate is  5000  400 + 100 = 2,000,100 block accesses with student as outer relation  10000  100 + 400 = 1,000,400 block accesses with takes as outer relation  If smaller relation (student) fits entirely in memory, the cost estimate will be 500 block accesses  Block nested-loop join algorithm (next slide) is preferable

Block Nested-Loop Join  Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation for each block B r of r do begin for each block B s of s do begin for each tuple t r in B r do begin for each tuple t s in B s do begin Check if (t r, t s ) satisfy the join condition if they do, add t r t s to the result end end end end  Worst case estimate: b r  b s + b r block accesses

Indexed Nested-Loop Join  Index lookups can replace file scans if  Join is an equi-join or natural join and  An index is available on the inner relation’s join attribute  For each tuple t r in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple t r  Cost of the join: b r + n r  c  Where c is the cost of a single selection using the join condition  If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation

Merge-Join (1/2)  Sort both relations on their join attribute (if not already sorted on the join attributes)  Merge the sorted relations to join them

Merge-Join (2/2)  Can be used only for equi-joins and natural joins  Each block needs to be read only once  Thus number of block accesses for merge-join is b r + b s + the cost of sorting if relations are unsorted

Hash-Join (1/2)  Applicable for equi-joins and natural joins  A hash function h is used to partition tuples of both relations  h maps JoinAttrs values to {0,1,…,n} where JoinAttrs denotes the common attributes of r and s used in the natural join  r 0, r 1,..., r n denote partitions of r tuples  s 0, s 1..., s n denote partitions of s tuples  r tuples in r i need only to be compared with s tuples in s i

Hash-Join (2/2)

Other Operations  Duplicate elimination can be implemented via sorting or hashing  On sorting, duplicates will come adjacent to each other  Hashing is similar – duplicates will come into the same bucket  Projection is implemented by performing projection on each tuple, which gives a relation that could have duplicate records, and then removing duplicate records  Aggregation can be implemented in a manner similar to duplicate elimination  Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group

Evaluation of Expressions  So far, we have studied how individual operations are carried out; now we consider how to evaluate an expression containing multiple operations  The obvious way is simply to evaluate one operation at a time, in an appropriate order; the result of each evaluation is materialized in a temporary relation for subsequent use  A disadvantage to this method is the need to build the temporary relation, which (unless they are small) must be written to disk  An alternative approach is to evaluate several operations simultaneously in a pipeline with the results of one operation passed on to the next, without the need to store a temporary relation

Materialization  Evaluate one operation at a time, starting at the lowest-level; store intermediate results into temporary relations to evaluate next-level operations  Consider the expression: ∏ name (  building = “Watson” (department) instructor)

Pipelining  Evaluate several operations simultaneously, passing the results of one operation to the next  Consider the previous example again:  Don’t store result of  building=“Watson” (department)  Instead, pass tuples directly to the join  Similarly, don’t store result of join, pass tuples directly to the projection  Much cheaper than materialization  Pipelining may not always be possible

Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

Similar presentations

Presentation on theme: "Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

Similar presentations

Presentation on theme: "Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park."— Presentation transcript:

Similar presentations

About project

Feedback