Published by Riitta Toivonen. Modified over 5 years ago.
1
Join Implementation
How is it done?
Copyright © Curt Hill
2
Introduction
The join should already be familiar from relational algebra.
We now consider how the join works when using either a BTree or a hash index.
3
Ways to Think of a Join
We have usually considered the join as a sequence of algebra operations:
  Cartesian product
  Selection
  Optional projection
This is not always the best way, especially from an implementation perspective.
The best alternative is the zipper view.
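The algebra view can be sketched directly in code. This is a minimal illustration, not how a database engine would do it; the sample rows and the function name algebra_join are made up for the example, and each row is assumed to be a (naid, value) pair.

```python
from itertools import product

# Hypothetical sample rows: (naid, value).
faculty = [(1024, "a"), (1092, "b"), (1233, "c")]
schedule = [(1024, "r"), (1024, "s"), (1092, "v")]

def algebra_join(left, right):
    """Join as Cartesian product, then selection, then projection."""
    pairs = product(left, right)                            # Cartesian product
    matches = (p for p in pairs if p[0][0] == p[1][0])      # selection on naid
    return [(l[0], l[1], r[1]) for l, r in matches]         # projection

print(algebra_join(faculty, schedule))
```

Every one of the 3 x 3 = 9 combinations is built before the selection discards most of them, which is the cost the zipper view avoids.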
4
The Zipper Approach
Consider two files:
  Faculty
    The key is naid
  Schedule
    The key is dept, number, section
    Another candidate key is naid, time
The join field is naid.
If both files are sorted on the join field, the join resembles a match merge.
5
Sorted
The default index for a table is usually a BTree.
With a BTree, the leaves are in the primary key’s sorted order.
If the join field is not the primary key, then either the table may be sorted on that field or a secondary key may be used.
Now consider the match merge.
6
Match Merge
The match merge is the means of updating a sorted master file with sorted transactions.
Of course, both must be sorted on the same kind of key.
This has been well understood since the 1950s or before.
The action to perform is based on the relationship of the master key to the transaction key.
7
Actions
Read in one item from both, then do the following until done:
  Transaction = Master
    Update the master
    Get a new transaction
  Master < Transaction
    Write the old master
    Read a new master
  Transaction < Master
    Declare an error
    Read a new transaction
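The three cases above can be sketched as a single loop. This is only an illustration under stated assumptions: records are (key, value) pairs held in lists rather than files, master keys are unique, and the names match_merge and apply_update are hypothetical. The slide does not say what happens when one file runs out, so the end-of-file handling here (write remaining masters, report remaining transactions as errors) is also an assumption.

```python
def match_merge(masters, transactions, apply_update):
    """One pass over masters and transactions, both sorted by key."""
    new_masters, errors = [], []
    m_iter, t_iter = iter(masters), iter(transactions)
    master = next(m_iter, None)
    trans = next(t_iter, None)
    while master is not None and trans is not None:
        if trans[0] == master[0]:
            # Transaction = Master: update the master, get a new transaction.
            master = (master[0], apply_update(master[1], trans[1]))
            trans = next(t_iter, None)
        elif master[0] < trans[0]:
            # Master < Transaction: write the old master, read a new master.
            new_masters.append(master)
            master = next(m_iter, None)
        else:
            # Transaction < Master: declare an error, read a new transaction.
            errors.append(trans)
            trans = next(t_iter, None)
    # Assumed end-of-file handling (not stated on the slide).
    while master is not None:
        new_masters.append(master)
        master = next(m_iter, None)
    while trans is not None:
        errors.append(trans)
        trans = next(t_iter, None)
    return new_masters, errors

# Hypothetical data: account balances and signed adjustments.
masters = [(1, 100), (2, 50), (4, 10)]
transactions = [(1, 5), (1, -20), (3, 7), (4, 1), (5, 9)]
updated, rejected = match_merge(masters, transactions, lambda bal, delta: bal + delta)
print(updated)   # [(1, 85), (2, 50), (4, 11)]
print(rejected)  # [(3, 7), (5, 9)]
```

Each record in either input is read exactly once, which is the property the join borrows.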
8
Revisited
The idea of the match merge is to make one pass through a master file and a transaction file to do an update.
  Contrast this with the Cartesian product.
This only works if both are sorted, and by the same key.
The same thing will work in a database if both tables have an index on the join field.
We will consider the SQL for creating indices later.
9
Zipper Join Picture
Faculty     Schedule
1024 a      1024 r
            1024 s
            1024 t
1092 b      1092 v
            1092 w
1233 c
1279 d      1279 x
            1279 y
            1279 z
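The picture can be traced in code. The sketch below assumes the picture pairs Faculty rows a, b, c, d with Schedule rows r through z on naid, that both lists are already sorted on naid, and that naid is unique in Faculty (it is the key there). The function name zipper_join is hypothetical.

```python
def zipper_join(faculty, schedule):
    """Inner join of two lists sorted on naid; faculty naid values are unique."""
    result = []
    i = j = 0
    while i < len(faculty) and j < len(schedule):
        f_key, s_key = faculty[i][0], schedule[j][0]
        if f_key == s_key:
            result.append((f_key, faculty[i][1], schedule[j][1]))
            j += 1  # stay on this faculty row; more schedule rows may match it
        elif f_key < s_key:
            i += 1  # faculty row has no (more) matches
        else:
            j += 1  # schedule row matches no faculty row
    return result

faculty = [(1024, "a"), (1092, "b"), (1233, "c"), (1279, "d")]
schedule = [(1024, "r"), (1024, "s"), (1024, "t"), (1092, "v"),
            (1092, "w"), (1279, "x"), (1279, "y"), (1279, "z")]

for row in zipper_join(faculty, schedule):
    print(row)
```

Faculty row 1233 c produces no output because nothing in Schedule matches it, and both lists are walked only once.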
10
Inners and Outers
The last picture suggests two types of joins: inner and outer.
What we have considered so far is the inner join.
  Only things that match on the key are worth considering.
However, those things in either relation that match nothing in the other are also interesting.
  These are the outer joins.
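A full outer join can be produced by the same one-pass walk, emitting unmatched rows padded with None instead of skipping them. This is a sketch under assumptions: both inputs are sorted on the join key, keys are unique in the left input, and the data and the name zipper_full_outer_join are hypothetical.

```python
def zipper_full_outer_join(left, right):
    """Full outer join; both sorted on key, left keys unique."""
    result = []
    i = j = 0
    left_matched = False  # has left[i] matched any right row yet?
    while i < len(left) and j < len(right):
        l_key, r_key = left[i][0], right[j][0]
        if l_key == r_key:
            result.append((l_key, left[i][1], right[j][1]))
            left_matched = True
            j += 1
        elif l_key < r_key:
            if not left_matched:
                result.append((l_key, left[i][1], None))  # left row with no match
            i += 1
            left_matched = False
        else:
            result.append((r_key, None, right[j][1]))      # right row with no match
            j += 1
    while i < len(left):                                   # leftover left rows
        if not left_matched:
            result.append((left[i][0], left[i][1], None))
        i += 1
        left_matched = False
    while j < len(right):                                  # leftover right rows
        result.append((right[j][0], None, right[j][1]))
        j += 1
    return result

left = [(1024, "a"), (1233, "c")]
right = [(1024, "r"), (1024, "s"), (1100, "q"), (1300, "k")]
print(zipper_full_outer_join(left, right))
```

Dropping the rows that contain None gives back the inner join; keeping only one side's unmatched rows gives a left or right outer join.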
11
Continuing
If both files are sorted on the join field, the previously mentioned zipper join is the best one to use.
However, if the join field is not the primary key, sorting the relation on this field may be expensive.
Especially so if:
  The outer join is larger than the inner join
  The number of joined records is small compared to either relation size
12
Hash Join
Recall that a Cartesian product makes all possible combinations of records from two relations.
  This could mean reading all of the blocks multiple times.
  That is exactly what we want to avoid.
A hash join partitions the two relations into pieces based on a hash function.
It then joins only the partitions that reacted similarly to the hash function, that is, those with the same hash value.
Of course, this only works on equi-joins.
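The partitioning idea can be sketched as follows. This is an illustration, not a production algorithm: the relations are in-memory lists of (key, value) pairs, the hash function is simply Python's built-in hash taken modulo a hypothetical partition count, and each pair of matching partitions is joined with a plain nested loop.

```python
def partition(rows, num_partitions):
    """Split rows into partitions by hashing the join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts

def partitioned_join(left, right, num_partitions=4):
    """Equi-join that only compares rows whose keys hashed to the same partition."""
    result = []
    left_parts = partition(left, num_partitions)
    right_parts = partition(right, num_partitions)
    for l_part, r_part in zip(left_parts, right_parts):
        for l_key, l_val in l_part:
            for r_key, r_val in r_part:
                if l_key == r_key:  # same partition does not guarantee same key
                    result.append((l_key, l_val, r_val))
    return result

left = [(1024, "a"), (1092, "b"), (1233, "c"), (1279, "d")]
right = [(1024, "r"), (1092, "v"), (1279, "x"), (1025, "q")]
print(sorted(partitioned_join(left, right)))
```

Rows with equal keys always land in the same partition, so no match is lost, while rows in different partitions are never compared. The equality test is still needed because unequal keys can share a partition, and the approach cannot help with range conditions, which is why it is limited to equi-joins.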
13
Process
Hash the smaller of the two files on the join field.
Read in the other file.
  Hash each key into a bucket.
  The only candidates for equality are in that bucket.
Produce the output.
  Smaller but still substantial.
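The process above can be sketched with a Python dictionary standing in for the hash table. This is a minimal in-memory illustration; the dictionary does the hashing internally and keeps one bucket per distinct key, the sample rows are made up, and the name hash_join is hypothetical.

```python
from collections import defaultdict

def hash_join(small, large):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    buckets = defaultdict(list)
    for key, value in small:              # build phase: hash the smaller file
        buckets[key].append(value)
    result = []
    for key, value in large:              # probe phase: read the other file
        for small_value in buckets.get(key, ()):  # only candidates for equality
            result.append((key, small_value, value))
    return result

# Neither input needs to be sorted.
small = [(1024, "a"), (1092, "b"), (1233, "c")]
large = [(1092, "v"), (1024, "r"), (1279, "x"), (1024, "s")]
print(hash_join(small, large))
# [(1092, 'b', 'v'), (1024, 'a', 'r'), (1024, 'a', 's')]
```

Each input is read once, and no sort or index on the join field is required, which is the advantage over the zipper join when the join field is not the primary key.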