Published by Sugiarto Hartanto. Modified over 5 years ago.
1
Slides adapted from Donghui Zhang, UC Riverside
JOIN PROCESSING
E0 261
Prasad Deshpande, Jayant Haritsa
Computer Science and Automation, Indian Institute of Science
2
Today’s Paper
Computing joins in systems with “large” amounts of main memory: from √|S| to |S|
ACM Transactions on Database Systems, Vol. 11, No. 3, Sep 1986
Basic Issues
IO and computational costs;
How to use available memory to minimize these costs.
3
External Sort-Merge Join
Average run length = 2*√|S|
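Runs that average about twice the memory size are the classic result of replacement selection on randomly ordered input. The following is a minimal sketch of that technique, assuming it is the run-formation method the slide has in mind; the function name `replacement_selection` and the use of a Python list as the input "file" are illustrative choices, not from the slides.

```python
import heapq
from itertools import islice

def replacement_selection(records, memory_size):
    """Split `records` into sorted runs while holding at most
    `memory_size` records in memory (hypothetical helper)."""
    it = iter(records)
    # Fill memory; each heap entry is (run_number, key).
    heap = [(0, key) for key in islice(it, memory_size)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no == len(runs):
            runs.append([])              # first record of a new run
        runs[run_no].append(key)
        for nxt in islice(it, 1):        # read at most one replacement record
            # A record smaller than the key just written cannot extend
            # the current run, so it is tagged for the next run.
            target = run_no if nxt >= key else run_no + 1
            heapq.heappush(heap, (target, nxt))
    return runs
```

Already sorted input yields a single run, and reverse-sorted input yields runs of exactly `memory_size` records; random input falls in between.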
4
External Sort-Merge Join (cont.)
Optimization: omit the final pass of merge sort by pipelining the sorted output into the join;
If buffer size ≥ √|S|, R and S can be sorted by reading each of them twice;
E.g. page size = 8 KB, each relation has 10,000 pages (80 MB), buffer size = 100 pages (< 1 MB): two passes are enough.
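The arithmetic behind the example can be checked with a short sketch. It assumes the simplest model: the first pass writes runs of one buffer-full each, and the merge pass needs one buffer page per run. It ignores the output buffer and the longer runs that replacement selection would give. The function name `two_pass_sort_possible` is hypothetical.

```python
import math

def two_pass_sort_possible(relation_pages, buffer_pages):
    """Return True if one run-formation pass plus one merge pass suffice
    (simplified model, see the assumptions above)."""
    runs = math.ceil(relation_pages / buffer_pages)  # runs written by pass 1
    return runs <= buffer_pages                      # one buffer page per run in pass 2

# Slide example: 10,000 pages per relation, 100 buffer pages.
print(two_pass_sort_possible(10_000, 100))
```

With 10,000 pages and 100 buffer pages, pass 1 produces 100 runs, which is exactly what a 100-page buffer can merge in one pass.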
5
Cost of Sort-Merge
Make use of any extra memory beyond √|S| to save IO
6
Classic Hash Join
Works when the smaller relation R fits in memory.
Build an in-memory hash table for the smaller relation;
For each record in the larger relation, probe the hash table.
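The build and probe steps can be sketched in a few lines of Python. This is an illustration only: records are assumed to be tuples, and the key-extractor parameters `key_r` and `key_s` are hypothetical names introduced here.

```python
from collections import defaultdict

def classic_hash_join(R, S, key_r, key_s):
    """Equi-join R and S, assuming all of R fits in memory."""
    # Build phase: hash every record of the smaller relation R.
    table = defaultdict(list)
    for r in R:
        table[key_r(r)].append(r)
    # Probe phase: look up each record of the larger relation S.
    for s in S:
        for r in table.get(key_s(s), ()):
            yield r, s

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (3, "y"), (2, "z")]
print(list(classic_hash_join(R, S, lambda r: r[0], lambda s: s[0])))
```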
7
Simple Hash Join
for each logical bucket j
    for each record r in R
        if r is in bucket j then insert r into the hash table;
    for each record s in S
        if s is in bucket j then probe the hash table;
Classic hash join is a special case, with one bucket;
Optimization: write the tuples not in bucket j to disk, so that later passes read only the leftover tuples;
Works well when memory is large (nearly as large as |R|).
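A minimal Python sketch of the multi-pass loop, including the leftover-tuples optimization, might look as follows. Python lists stand in for the files written to disk, buckets are assigned with `hash(key) % n_buckets`, and the names `simple_hash_join`, `n_buckets`, and `key` are assumptions made for illustration.

```python
def simple_hash_join(R, S, n_buckets, key=lambda t: t[0]):
    """One pass per logical bucket; n_buckets=1 is the classic hash join."""
    out = []
    for j in range(n_buckets):
        table = {}
        r_rest, s_rest = [], []      # stand-ins for the leftover files on disk
        for r in R:
            if hash(key(r)) % n_buckets == j:
                table.setdefault(key(r), []).append(r)   # build
            else:
                r_rest.append(r)     # passed over, kept for a later pass
        for s in S:
            if hash(key(s)) % n_buckets == j:
                out.extend((r, s) for r in table.get(key(s), ()))  # probe
            else:
                s_rest.append(s)
        # The next pass scans only the tuples not yet processed.
        R, S = r_rest, s_rest
    return out
```

Each pass shrinks both inputs, which is why the method is attractive only when few passes are needed, that is, when memory is close to |R|.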
8
Simple Hash Number of passes =
9
Cost of Simple Hash
10
GRACE Hash Join
partition R into n buckets so that each bucket fits in memory;
partition S into n buckets;
for each bucket j do
    for each record r in Rj do insert into a hash table;
    for each record s in Sj do probe the hash table.
Works well when memory is small.
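The two phases (partition both relations, then join bucket by bucket) can be sketched as below. In-memory lists stand in for the partition files on disk, and the function name `grace_hash_join` and its `key` parameter are hypothetical.

```python
def grace_hash_join(R, S, n, key=lambda t: t[0]):
    """Partition R and S into n buckets, then join matching buckets."""
    # Phase 1: partition both relations with the same hash function.
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for r in R:
        r_parts[hash(key(r)) % n].append(r)
    for s in S:
        s_parts[hash(key(s)) % n].append(s)
    # Phase 2: each R bucket is assumed to fit in memory; build and probe.
    out = []
    for r_j, s_j in zip(r_parts, s_parts):
        table = {}
        for r in r_j:
            table.setdefault(key(r), []).append(r)
        for s in s_j:
            out.extend((r, s) for r in table.get(key(s), ()))
    return out
```

Because matching records always hash to the same bucket number, bucket j of R only ever needs to be compared with bucket j of S.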
11
Grace Hash
12
Cost of Grace Hash
13
Hybrid Hash Join
Hybrid of simple hash join and GRACE;
When partitioning R, keep the records of the first bucket in memory as a hash table;
When partitioning S, for records of the first bucket, probe the hash table directly;
Saving: no need to write R1 and S1 to disk or read them back into memory.
Works well for both large and small memory.
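A sketch of the idea in Python is shown below. It simplifies the real algorithm by giving the in-memory bucket the same hash range as the others, whereas an actual implementation would size the first bucket to use whatever memory is left after the partition output buffers. Lists stand in for disk files, and the names `hybrid_hash_join`, `n`, and `key` are illustrative assumptions.

```python
def hybrid_hash_join(R, S, n, key=lambda t: t[0]):
    """Bucket 0 is joined during partitioning; buckets 1..n-1 as in GRACE."""
    first = {}                                # in-memory hash table for R's first bucket
    r_parts = [[] for _ in range(n)]          # index 0 stays empty
    s_parts = [[] for _ in range(n)]
    for r in R:
        j = hash(key(r)) % n
        if j == 0:
            first.setdefault(key(r), []).append(r)   # never written to disk
        else:
            r_parts[j].append(r)
    out = []
    for s in S:
        j = hash(key(s)) % n
        if j == 0:
            out.extend((r, s) for r in first.get(key(s), ()))  # probe at once
        else:
            s_parts[j].append(s)
    # Remaining buckets: build and probe one bucket at a time.
    for j in range(1, n):
        table = {}
        for r in r_parts[j]:
            table.setdefault(key(r), []).append(r)
        for s in s_parts[j]:
            out.extend((r, s) for r in table.get(key(s), ()))
    return out
```

With `n = 1` everything lands in the first bucket and the sketch degenerates to the classic in-memory hash join.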
14
Hybrid Hash Join
15
Hybrid Hash Join Cost q=
16
Comparison
Hybrid dominates simple hash;
Hybrid dominates GRACE hash;
GRACE dominates sort-merge in terms of computation cost.
17
Handle Partition Overflow
Case 1, overflow on disk: an R partition is larger than memory size (note: the size of the S partitions does not matter).
    Solution (a): create small partitions first and combine them before the join;
    Solution (b): recursive partitioning.
Case 2, overflow in memory: the in-memory hash table of R becomes too large.
    Solution: revise the partitioning scheme and keep a smaller partition in memory.
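Solution (b) of Case 1 can be sketched as a recursive function that keeps splitting any R partition that exceeds memory, using a different hash function at each level so that the records actually spread out. Everything here is an assumption for illustration: the names `partition_until_fits` and `level_hash`, the integer join keys, and the digit-based hash.

```python
def level_hash(k, level, fanout):
    # Hypothetical per-level hash for integer keys:
    # the level-th digit of k in base `fanout`.
    return (k // fanout ** level) % fanout

def partition_until_fits(R, capacity, key=lambda t: t[0],
                         fanout=4, level=0, max_level=8):
    """Return a list of partitions of R, each with at most `capacity`
    records where the key distribution allows it."""
    # Stop when the partition fits, or give up after max_level splits
    # (a single heavily duplicated key can never be split by hashing).
    if len(R) <= capacity or level == max_level:
        return [R]
    parts = [[] for _ in range(fanout)]
    for r in R:
        parts[level_hash(key(r), level, fanout)].append(r)
    result = []
    for p in parts:
        if p:
            result.extend(partition_until_fits(p, capacity, key,
                                               fanout, level + 1, max_level))
    return result
```

The S relation would have to be partitioned with the same sequence of hash functions so that matching records still end up in corresponding partitions.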
18
Conclusions
Addressed the equi-join problem in the external-memory environment;
With the decreasing cost of memory, hash-based join is better than nested-loop and sort-merge joins;
Proposed three hash-based algorithms (simple hash join, GRACE join and hybrid join), of which the hybrid hash join is the best.
19
END JOINS E0 261