Download presentation
Presentation is loading. Please wait.
1
Inspector Joins IC-65 Advances in Data Management Systems 1 Inspector Joins By Shimin Chen, Anastassia Ailamaki, Phillip, and Todd C. Mowry VLDB 2005 Rammohan Narendula
2
Inspector Joins IC-65 Advances in Data Management Systems 2 Introduction Query execution is I/O bound- so most of the research concentrates on main memory Goal- reduce no. of page faults thus reduce no. of disk I/Os However, hash join is a special class of techniques where hash-join becomes CPU bound given sufficient I/O bandwidth and employing Advanced I/O techniques (I/O prefetching) Goal- reduce no. of cache misses
3
Inspector Joins IC-65 Advances in Data Management Systems 3 Exploiting Information about Data Ability to improve query depends on information quality General stats on relations are inadequate –May lead to incorrect decisions for specific queries –Especially true for join queries Previous approaches exploiting dynamic information –Collecting information from previous queries Multi-query optimization [Sellis’88] Materialized views [Blakeley et al. 86] Join indices [Valduriez’87] –Dynamic re-optimization of query plans [Kabra&DeWitt’98] [Markl et al. 04] This study exploits the inner structure of hash joins
4
Inspector Joins IC-65 Advances in Data Management Systems 4 Exploiting Multi-Pass Structure of Hash Joins Idea: –Examine the actual data in I/O partitioning phase –Extract useful information to improve join phase I/O Partitioning Join Extra information greatly helps phase 2 Inspection
5
Inspector Joins IC-65 Advances in Data Management Systems 5 Using Extracted Information Enable a new join phase algorithm –Reduce the primary performance bottleneck in hash joins i.e. Poor CPU cache performance –Optimized for multi-processor systems Choose the most suitable join phase algorithm for special input cases I/O Partitioning decide Cache Partitioning Cache Prefetching Simple Hash Join Inspection Join Phase New Algorithm Extracted Information
6
Inspector Joins IC-65 Advances in Data Management Systems 6 Outline Motivation Previous hash join algorithms Hash join performance on SMP systems Inspector join Experimental results Conclusions
7
Inspector Joins IC-65 Advances in Data Management Systems 7 Hash Table Join Phase: (simple hash join) –Build hash table, then probe hash table GRACE Hash Join I/O Partitioning Phase: –Divide input relations into partitions with a hash function Build Probe Build Probe Random memory accesses cause poor CPU cache performance Over 70% execution time stalled on cache misses!
8
Inspector Joins IC-65 Advances in Data Management Systems 8 Cache Partitioning Recursively produce cache-sized partitions after I/O partitioning Avoid cache misses when joining cache-sized partitions Overhead of re-partitioning Build Probe Memory-sized Partitions Cache-sized Partitions
9
Inspector Joins IC-65 Advances in Data Management Systems 9 Cache Prefetching Reduce impact of cache misses –Exploit available memory bandwidth –Overlap cache misses and computations –Insert cache prefetch instructions into code Still incurs the same number of cache misses Hash Table Probe Build
10
Inspector Joins IC-65 Advances in Data Management Systems 10 Outline Motivation Previous hash join algorithms Hash join performance on SMP systems Inspector join Experimental results Conclusions
11
Inspector Joins IC-65 Advances in Data Management Systems 11 Hash Joins on SMP Systems Previous studies mainly focus on uni-processors Memory bandwidth is precious –It becomes the bottleneck in cache-prefetching techniques Each processor joins a pair of partitions in join phase Main Memory Shared bus Cache CPU Cache CPU Cache CPU Cache CPU Build 1 Probe 1 Build 4 Probe 4 Build 2 Probe 2 Build 3 Probe 3
12
Inspector Joins IC-65 Advances in Data Management Systems 12 Inspector Joins Extracted information: summary of matching relationships –Every K contiguous pages in a build partition forms a sub-partition –Tells which sub-partition(s) every probe tuple matches Build Partition Sub-partition 0 Sub-partition 1 Sub-partition 2 Probe Partition I/O Partitioning Join Summary of Matching Relationship
13
Inspector Joins IC-65 Advances in Data Management Systems 13 Cache-Stationary Join Phase Recall cache partitioning: re-partition cost I/O Partitioning Join Build Partition Probe Partition Hash Table CPU Cache We want to achieve zero copying Copying cost
14
Inspector Joins IC-65 Advances in Data Management Systems 14 Cache-Stationary Join Phase Joins a sub-partition and its matching probe tuples Sub-partition is small enough to fit in CPU cache Cache prefetching for the remaining cache misses Zero copying for generating recursive cache-sized partitions I/O Partitioning Join Build Partition Probe Partition Hash Table CPU Cache Sub-partition 0 Sub-partition 1 Sub-partition 2
15
Inspector Joins IC-65 Advances in Data Management Systems 15 Filters in I/O Partitioning How to extract the summary efficiently? Extend filter scheme in commercial hash joins Conventional single-filter scheme –Represent all build join keys –Filter out probe tuples having no matches Build Relation Filter Mem-sized Partitions Construct Test I/O Partitioning Join Probe Relation
16
Inspector Joins IC-65 Advances in Data Management Systems 16 Background: Bloom Filter A bit vector –A key is hashed d (e.g. d=3) times and represented by d bits Construct: for every build join key, set its 3 bits in vector Test: given a probe join key, check if all its 3 bits are 1 –Discard the tuple if some bits are 0 –May have false positives 00011100011001000001 Bit 0 =H 0 (key)Bit 1 =H 1 (key)Bit 2 =H 2 (key) Filter
17
Inspector Joins IC-65 Advances in Data Management Systems 17 Multi-Filter Scheme Single filter: a probe tuple entire build relation Our goal: a probe tuple sub-partitions Construct a filter for every sub-partition Replace a single large filter with multiple small filters Single Filter Build Relation Partition 0 Partition 1 Partition 2 Sub0,0 Sub0,1 Sub0,2 Sub1,0 Sub1,1 Sub1,2 Sub2,0 Sub2,1 Sub2,2 Multi-Filter I/O Partitioning Join
18
Inspector Joins IC-65 Advances in Data Management Systems 18 Testing Multi-Filters When partitioning the probe relation Test a probe tuple against all the filters of a partition Tells which sub-partition(s) the tuple may have matches Store summary of matching relationships in partitions –This information is used to extract probe tuples in the order of partition IDs. A special array is constructed using count sort technique for this purpose. Probe Relation Partition 0 Partition 1 Partition 2 Multi- Filter Test I/O Partitioning Join
19
Inspector Joins IC-65 Advances in Data Management Systems 19 Cont’d… Extracting probe tuple information for every sub-partition using counting sort –One array for each sub partition. Size of the array is number of matching probe tuples for that partition. –The tuples are never visited or copied in the coutning sort. Joining pair of build and probe sub-partitions
20
Inspector Joins IC-65 Advances in Data Management Systems 20 Minimizing Cache Misses for Testing Filters Single filter scheme: –Compute 3 bit positions –Test 3 bits Multi-filter scheme: if there are S sub-partitions in a partition –Compute 3 bit positions –Test the same 3 bits for every filter, altogether 3*S bits May cause 3*S cache misses ! Test Probe Relation Partition 0 Partition 1 Partition 2 Multi- Filter 001 111 011 S filters
21
Inspector Joins IC-65 Advances in Data Management Systems 21 Vertical Filters for Testing Bits at the same position are contiguous in memory 3 cache misses instead of 3*S cache misses! Horizontal vertical conversion after partitioning build relation – Very small overhead in practice Probe Relation Partition 0 Partition 1 Partition 2 Test 0 0 1 1 1 1 0 1 1 S filters Contiguous in memory I/O Partitioning Join
22
Inspector Joins IC-65 Advances in Data Management Systems 22 Outline Motivation Previous hash join algorithms Hash join performance on SMP systems Inspector join Experimental results Conclusions
23
Inspector Joins IC-65 Advances in Data Management Systems 23 Experimental Setup Relation schema: 4-byte join attribute + fixed length payload No selection, no projection 50MB memory per CPU available for the join phase Same join algorithm run on every CPU joining different partitions Detailed cycle-by-cycle simulations –A shared-bus SMP system with 1.5GHz processors –Memory hierarchy is based on Itanium 2 processor
24
Inspector Joins IC-65 Advances in Data Management Systems 24 Partition Phase Wall-Clock Time I/O partitioning can take advantage of multiple CPUs –Cut input relations into equal-sized chunks –Partition one chunk on every CPU –Concatenate outputs from all CPUs Enhanced cache partitioning: cache partitioning + advanced prefetching Inspection incurs very small overhead –Ratio of execution time with best algo- 0.88 to 0.94 –Mainly computation cost of converting horizontal filters to vertical and testing GRACE Cache prefetching Cache partitioning Enhanced cache partitioning Inspector join 500MB joins 2GB 100B tuples, 4B keys 50% probe tuples no matches A build matches 2 probe tuples Number of CPUs used
25
Inspector Joins IC-65 Advances in Data Management Systems 25 Join Phase Aggregate Time Inspector join achieves significantly better performance when 8 or more CPUs are used –Because of local optimization + catch prefetching –1.7-2.1X speedups over cache prefetching Memory B/W becomes bottleneck when more no of processors are used –1.6-2.0X speedups over enhanced cache partitioning 500MB joins 2GB 100B tuples, 4B keys 50% probe tuples no matches A build matches 2 probe tuples Number of CPUs used GRACE Cache prefetching Cache partitioning Enhanced cache partitioning Inspector join
26
Inspector Joins IC-65 Advances in Data Management Systems 26 Results on Choosing Suitable Join Phase Case #1: a large number of duplicate build join keys –Choose enhanced cache partitioning –When a probe tuple on average matches 4 or more sub-partitions Case #2: nearly sorted input relations –Surprisingly: cache-stationary join is very good I/O Partitioning decide Cache Partitioning Cache Prefetching Simple Hash Join Inspection Join Phase Cache Stationary Extracted Info
27
Inspector Joins IC-65 Advances in Data Management Systems 27 Conclusions Exploit multi-pass structure for higher quality info about data Achieve significantly better cache performance –1.6X speedups over previous cache-friendly algorithms –When 8 or more CPUs are used Choose most suitable algorithms for special input cases Idea may be applicable to other multi-pass algorithms
28
Inspector Joins IC-65 Advances in Data Management Systems 28 Thank You !
29
Inspector Joins IC-65 Advances in Data Management Systems 29 Previous Algorithms on SMP Systems Join phase performance of joining a 500MB and a 2GB relations (details later in the talk) Aggregate performance degrades dramatically over 4 CPUs Reduce data movement (memory to memory, memory to cache) Wall clock timeAggregate time on all CPUs Re-partition cost Bandwidth- sharing
30
Inspector Joins IC-65 Advances in Data Management Systems 30 More Details in Paper Moderate memory space requirement for filters Summary information representation in intermediate partitions Preprocessing for cache-stationary join phase Prefetching for improving efficiency and robustness
31
Inspector Joins IC-65 Advances in Data Management Systems 31 Partition Phase Wall-Clock Time I/O partitioning can take advantage of multiple CPUs –Cut input relations into equal-sized chunks –Partition one chunk on every CPU –Concatenate outputs from all CPUs Inspection incurs very small overhead 500MB joins 2GB 100B tuples, 4B keys 50% probe tuples no matches A build matches 2 probe tuples Number of CPUs used GRACE Cache prefetching Cache partitioning Inspector join
32
Inspector Joins IC-65 Advances in Data Management Systems 32 Join Phase Aggregate Time Inspector join achieves significantly better performance when 8 or more CPUs are used –1.7-2.1X speedups over cache prefetching –1.6-2.0X speedups over enhanced cache partitioning 500MB joins 2GB 100B tuples, 4B keys 50% probe tuples no matches A build matches 2 probe tuples Number of CPUs used GRACE Cache prefetching Cache partitioning Inspector join
33
Inspector Joins IC-65 Advances in Data Management Systems 33 CPU-Cache-Friendly Hash Joins Recent studies focus on CPU cache performance –I/O partitioning gives good I/O performance –Random memory accesses cause poor CPU cache performance Cache Partitioning [Shatdal et al. 94] [Boncz et al.’99] [Manegold et al.’00] –Recursively produce cache-sized partitions from memory-sized partitions –Avoid cache misses during join phase –Pay re-partitioning cost Cache Prefetching [Chen et al. 04] –Exploit memory system parallelism –Use prefetches to overlap multiple cache misses and computations Hash Table Probe Build
34
Inspector Joins IC-65 Advances in Data Management Systems 34 Example Special Input Cases Example case #1: a large number of duplicate build join keys –Count the average number of sub-partitions a probe tuple matches –Must check the tuple against all possible sub-partitions –If too large, cache stationary join works poorly Example case #2: nearly sorted input relations –A merge-based join phase might be better? Build Partition Probe Partition Sub-partition 0 Sub-partition 1 Sub-partition 2 A probe tuple
35
Inspector Joins IC-65 Advances in Data Management Systems 35 Varying Number of Duplicates per Build Join Key Join phase aggregate performance Choose enhanced cache part –When a probe tuple on average matches 4 or more sub-partitions
36
Inspector Joins IC-65 Advances in Data Management Systems 36 Nearly Sorted Cases Sort both input relations, then randomly move 0%-5% of tuples Join phase aggregate performance Surprisingly: cache-stationary join is very good –Even better than merge join when over 1% tuples are out-of-order
37
Inspector Joins IC-65 Advances in Data Management Systems 37 Analyzing Nearly Sorted Case Partitions are also nearly sorted Probe tuples matching a sub-partition are almost contiguous Similar memory behavior as merge join No cost for sorting out-of-order tuples Build Partition Probe Partition Sub-partition 0 Sub-partition 1 Sub-partition 2 A probe tuple Nearly Sorted
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.