Query Processing II
Yan Huang - CSCI5330 Database Implementation – Access Methods
1/14/2005

Join Operation
Several different algorithms can be used to implement joins:
- Block nested-loop join
- Indexed nested-loop join
- Merge-join
- Hash-join

Block Nested-Loop Join
r: outer relation; s: inner relation

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                check if (tr, ts) satisfies the join condition;
                if it does, add tr • ts to the result
            end
        end
    end
end

Cost = b_r * b_s + b_r, assuming one buffer block for each relation (b_r and b_s are the numbers of blocks in r and s).

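The pseudocode above can be sketched in runnable form. This is a minimal in-memory illustration, not a disk-based implementation: a block is modeled as a Python list of tuples, and the helper to_blocks and the function name block_nested_loop_join are hypothetical names chosen for this example.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple          # one tuple of attribute values
Block = List[Row]    # assumption: a block is modeled as a list of tuples


def to_blocks(rows: Sequence[Row], rows_per_block: int) -> List[Block]:
    """Split a relation into fixed-size blocks (hypothetical helper)."""
    return [list(rows[i:i + rows_per_block])
            for i in range(0, len(rows), rows_per_block)]


def block_nested_loop_join(
    r_blocks: List[Block],
    s_blocks: List[Block],
    condition: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Yield tr + ts for every pair of tuples that satisfies the condition."""
    for block_r in r_blocks:          # outer relation, one block at a time
        for block_s in s_blocks:      # inner relation, rescanned per outer block
            for tr in block_r:
                for ts in block_s:
                    if condition(tr, ts):
                        yield tr + ts
```

Because the condition is an arbitrary function of the two tuples, the same loop handles equality and non-equality join conditions alike, for example condition=lambda tr, ts: tr[0] == ts[0] for an equi-join on the first attribute.
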
[Figure: Block Nested-Loop Join, relation r]

Block Nested-Loop Join
- Requires no indices
- Can be used with any join condition
- Cost = b_r * b_s + b_r
- Which relation should be the outer relation?

Allocate More Buffers to Outer Loop
Cost?
[Figure: relations s and r]

Allocate More Buffers to Inner Loop
Cost?
How should M buffers be allocated among the inner loop, the outer loop, and the result?
[Figure: relations r and s]

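One common way to answer the buffer-allocation question is sketched below. It assumes the usual textbook allocation, which the slides themselves do not state: with M buffer blocks, M - 2 blocks hold a chunk of the outer relation, one block holds the inner relation, and one block buffers the result. The function name bnl_cost is hypothetical, and the model counts block transfers only.

```python
import math


def bnl_cost(b_outer: int, b_inner: int, m: int = 3) -> int:
    """Block transfers for block nested-loop join with m buffer blocks.

    Assumption: m - 2 blocks hold a chunk of the outer relation, one block
    holds the inner relation, and one block buffers the result.
    """
    if m < 3:
        raise ValueError("need at least 3 buffer blocks")
    outer_chunks = math.ceil(b_outer / (m - 2))   # inner is scanned once per chunk
    return outer_chunks * b_inner + b_outer


print(bnl_cost(100, 400))          # 40100: smaller relation as outer, m = 3
print(bnl_cost(400, 100))          # 40400: larger relation as outer, m = 3
print(bnl_cost(100, 400, m=12))    # 4100: ten outer blocks per chunk
```

With m = 3 the function reduces to b_r * b_s + b_r. In this simple model, giving the extra blocks to the inner relation instead does not lower the count unless the whole inner relation fits in memory, since every inner block is still read once per outer chunk.
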
Indexed Nested-Loop Join
- Index available on the inner relation’s join attribute
  Cost = b_r + n_r * c, where n_r is the number of tuples in r and c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
- Index available on the outer relation’s join attribute
  We do not use it. Can we?
[Figure: relations r and s]

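A minimal sketch of the idea, assuming an in-memory dictionary as a stand-in for the index on the inner relation's join attribute (a real system would use a B+-tree or hash index on disk). The names build_index and indexed_nested_loop_join are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple


def build_index(s: Iterable[Row], key_pos: int) -> Dict[Hashable, List[Row]]:
    """Stand-in for an index on s's join attribute (an in-memory dict here)."""
    index: Dict[Hashable, List[Row]] = defaultdict(list)
    for ts in s:
        index[ts[key_pos]].append(ts)
    return index


def indexed_nested_loop_join(
    r: Iterable[Row],
    s_index: Dict[Hashable, List[Row]],
    r_key_pos: int,
) -> Iterator[Row]:
    """Scan the outer relation once and probe the inner index per tuple."""
    for tr in r:                                    # b_r block reads in total
        for ts in s_index.get(tr[r_key_pos], []):   # one lookup per tuple: cost c
            yield tr + ts
```

The inner relation is never scanned in full: each outer tuple triggers one index probe, which is where the n_r * c term in the cost formula comes from.
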
Indices available on join attributes of both r and s
Which one should be the outer loop?
[Figure: relations r and s]

Exercise
Compute depositor ⋈ customer, with depositor as the outer relation.
- customer has a secondary B+-tree index on customer-name
- Blocking factor: 20 keys per index node
- customer: 400 blocks, 10,000 tuples
- depositor: 100 blocks, 5,000 tuples
Block nested loops: 400 * 100 + 100 = 40,100 disk accesses
Indexed nested loops: 100 + 5,000 * 5 = 25,100 disk accesses. The CPU cost is likely to be less than that of block nested-loop join.

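The arithmetic in this exercise can be checked with a short script. The sketch assumes that the per-lookup cost c = 5 comes from an estimated B+-tree height of 4 (about log base 20 of 10,000, rounded up) plus one access to fetch the tuple; the slide gives only the value 5, so that derivation is an assumption. The function names are hypothetical.

```python
import math


def index_lookup_cost(n_indexed_tuples: int, keys_per_node: int) -> int:
    """Assumed cost c: estimated B+-tree height plus one access for the tuple."""
    height = math.ceil(math.log(n_indexed_tuples, keys_per_node))
    return height + 1


def indexed_nl_cost(b_outer: int, n_outer: int, c: int) -> int:
    """Cost = b_r + n_r * c for indexed nested-loop join."""
    return b_outer + n_outer * c


c = index_lookup_cost(10_000, 20)          # customer: 10,000 tuples, 20 keys per node
indexed = indexed_nl_cost(100, 5_000, c)   # depositor: 100 blocks, 5,000 tuples
block_nested = 100 * 400 + 100             # b_r * b_s + b_r, depositor as outer
print(c, indexed, block_nested)            # prints: 5 25100 40100
```
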
Merge-Join
- Sort both relations on their join attribute (if not already sorted on the join attributes).
- Merge the sorted relations to join them.
  - The join step is similar to the merge stage of the sort-merge algorithm.
  - Every pair with the same value on the join attribute must be matched.
- Cost = b_r + b_s, plus the cost of sorting if the relations are unsorted

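The merge step can be sketched as follows. This is an in-memory illustration that assumes both inputs are already sorted on the join attribute; the function name merge_join and the key-position parameters are hypothetical. Note how a run of equal keys on one side is paired with the whole run of equal keys on the other side, so that every matching pair is produced.

```python
from typing import Iterator, Sequence, Tuple

Row = Tuple


def merge_join(
    r: Sequence[Row],
    s: Sequence[Row],
    r_key: int = 0,
    s_key: int = 0,
) -> Iterator[Row]:
    """Equi-join two inputs that are already sorted on their join attributes."""
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = r[i][r_key], s[j][s_key]
        if kr < ks:
            i += 1                      # r is behind: advance r
        elif kr > ks:
            j += 1                      # s is behind: advance s
        else:
            # Find the run of tuples with this key on each side.
            i_end = i
            while i_end < len(r) and r[i_end][r_key] == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == kr:
                j_end += 1
            # Match every pair within the two runs.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    yield tr + ts
            i, j = i_end, j_end
```

Each input is traversed once, which corresponds to the b_r + b_s term in the cost. A disk-based version additionally needs the run of equal keys to be available in memory while it is being paired.
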
Merge-Join
But there is a problem!
[Figure: two relations, labeled s and r. One holds the records (A, 1), (A, 2), (A, 3), (B, 1), (B, 2), (C, 1), (D, 1); the other holds (A, 5), (A, 3), (A, 1), (A, 7), (A, 2), (C, 1), (C, 2), (C, 3).]
We assume records with the same value are in the same block!

Merge-Join
- Can be used only for equi-joins and natural joins
- Why not for inequality, greater-than, or less-than joins?

Exercise
Compute depositor ⋈ customer, with depositor as the outer relation.
- customer has a secondary B+-tree index on customer-name
- Blocking factor: 20 keys per index node
- customer: 400 blocks, 10,000 tuples
- depositor: 100 blocks, 5,000 tuples
Merge join: log(400) + log(100) + 400 + 100 = 516 (taking each log as base 2, rounded up: 9 + 7 + 400 + 100)

Hybrid Merge-join
If one relation is sorted, and the other has a secondary B+-tree index on the join attribute:
- Merge the sorted relation with the leaf entries of the B+-tree.
- Sort the result on the addresses of the unsorted relation’s tuples.
- Scan the unsorted relation in physical address order and merge with the previous result, to replace addresses by the actual tuples.
  - A sequential scan is more efficient than random lookups.

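The three steps above can be sketched in memory. The sketch makes several assumptions that are not in the slides: the unsorted relation s is a list whose positions play the role of physical addresses, the B+-tree leaf level is a list of (key, address) pairs sorted by key, and the function name hybrid_merge_join is hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple


def hybrid_merge_join(
    r_sorted: Sequence[Row],                # sorted on its join attribute
    s_heap: Sequence[Row],                  # unsorted; list position = address
    s_leaf_entries: Sequence[Tuple],        # (key, address) pairs sorted by key
    r_key: int = 0,
) -> List[Row]:
    # Step 1: merge sorted r with the leaf entries, producing (address, tr).
    pairs = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_leaf_entries):
        kr, ks = r_sorted[i][r_key], s_leaf_entries[j][0]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            j_end = j
            while j_end < len(s_leaf_entries) and s_leaf_entries[j_end][0] == kr:
                j_end += 1
            while i < len(r_sorted) and r_sorted[i][r_key] == kr:
                for _, addr in s_leaf_entries[j:j_end]:
                    pairs.append((addr, r_sorted[i]))
                i += 1
            j = j_end

    # Step 2: sort the intermediate result on the addresses of s's tuples.
    pairs.sort(key=lambda p: p[0])

    # Step 3: one sequential pass over s replaces addresses by actual tuples.
    result = []
    k = 0
    for addr, ts in enumerate(s_heap):
        while k < len(pairs) and pairs[k][0] == addr:
            result.append(pairs[k][1] + ts)
            k += 1
    return result
```

The output comes back in address order of s rather than in join-attribute order, which is the price paid for reading s sequentially instead of following each index entry with a random lookup.
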
[Figure: Hybrid Merge-join, relations r and s]