Database Architecture Optimized for the New Bottleneck: Memory Access Peter Boncz Data Distilleries B.V. Amsterdam The Netherlands Stefan Manegold Martin Kersten CWI Amsterdam The Netherlands
2 Contents How Memory Access works Simple Scan Experiment Consequences for DBMS –Data Structures: vertical decomposition –Algorithms: tune random memory access Partitioned Join Algorithms –Monet Experiments –Accurate Cost Models Conclusion
3 CPU Speed vs. Memory Speed Moore’s Law: CPU speed doubles every 3 years
4 Memory Access in Hierarchical Systems
5 Simple Scan Experiment
6 Consequences for DBMS Memory access is a bottleneck Prevent cache & TLB misses Cache lines must be used fully DBMS must optimize –Data structures –Algorithms (focus: join)
7 Vertical Decomposition in Monet
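As a rough illustration of vertical decomposition, the sketch below splits row-wise records into one column per attribute, so that a scan over a single attribute reads only that attribute's data. Storing each column as (oid, value) pairs, the example table, and the helper names `decompose` and `reconstruct` are assumptions made for this sketch, not details taken from the slides.

```python
# Hypothetical example table; attribute names and values are illustrative only.
rows = [
    {"id": 1, "name": "ann", "age": 31},
    {"id": 2, "name": "bob", "age": 45},
    {"id": 3, "name": "cy", "age": 27},
]


def decompose(rows):
    """Split row-wise records into one list of (oid, value) pairs per attribute."""
    columns = {}
    for oid, row in enumerate(rows):
        for attr, value in row.items():
            columns.setdefault(attr, []).append((oid, value))
    return columns


def reconstruct(columns, oid):
    """Rebuild one record by looking up the same oid in every column."""
    return {attr: col[oid][1] for attr, col in columns.items()}


columns = decompose(rows)

# A scan over one attribute touches only that column, not the full records.
total_age = sum(value for _, value in columns["age"])
```

Python lists do not control memory layout, so this only shows the logical structure; the cache benefit comes from storing each narrow column contiguously.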
8 Partitioned Joins Cluster both input relations Create clusters that fit in memory cache Join matching clusters Two algorithms: –Partitioned hash-join –Radix-Join (partitioned nested-loop)
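The partitioned hash-join outlined above could be sketched as follows: both inputs are clustered on the join key, then a hash-join runs on each pair of matching clusters. This is a minimal sketch under stated assumptions: tuples are (key, payload) pairs, clustering uses Python's built-in `hash`, and the function names and the `num_clusters` parameter are hypothetical. In practice the number of clusters would be chosen so that each cluster, with its hash table, fits in the memory cache.

```python
from collections import defaultdict


def cluster(tuples, num_clusters):
    """Group (key, payload) tuples into clusters by hash of the join key."""
    clusters = [[] for _ in range(num_clusters)]
    for key, payload in tuples:
        clusters[hash(key) % num_clusters].append((key, payload))
    return clusters


def partitioned_hash_join(left, right, num_clusters=4):
    """Equi-join two lists of (key, payload) tuples, one cluster pair at a time."""
    result = []
    left_clusters = cluster(left, num_clusters)
    right_clusters = cluster(right, num_clusters)
    for l_part, r_part in zip(left_clusters, right_clusters):
        # Build a hash table on one small cluster and probe it with the other.
        table = defaultdict(list)
        for key, payload in l_part:
            table[key].append(payload)
        for key, payload in r_part:
            for l_payload in table.get(key, []):
                result.append((key, l_payload, payload))
    return result
```

Equal keys always hash to the same cluster number, so only clusters with the same index can contain matching tuples.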
9 Partitioned Joins: Straightforward Clustering Problem: Number of clusters exceeds number of –TLB entries ==> TLB thrashing –Cache lines ==> cache thrashing Solution: Multi-pass radix-cluster
10 Partitioned Joins: Multi-Pass Radix-Cluster Multiple clustering passes Limit number of clusters per pass Avoid cache/TLB thrashing Trade memory cost for CPU cost Any data type (hashing)
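The multi-pass idea can be sketched as below: rather than scattering tuples into all clusters at once, each pass splits every existing cluster on a few more bits, so no single pass writes to more than 2**bits clusters at a time. This is an illustrative sketch, not the Monet implementation. It assumes non-negative integer keys used directly as their own hash value, and the function name and the `bits_per_pass` parameter are hypothetical.

```python
def radix_cluster(keys, bits_per_pass):
    """Cluster integer keys on their lowest sum(bits_per_pass) bits.

    Each pass refines every existing cluster on the next group of bits,
    starting with the most significant of the chosen bits.
    """
    shift = sum(bits_per_pass)
    clusters = [list(keys)]
    for bits in bits_per_pass:
        shift -= bits
        mask = (1 << bits) - 1
        next_clusters = []
        for part in clusters:
            # One pass over this cluster writes to at most 2**bits buckets.
            buckets = [[] for _ in range(1 << bits)]
            for key in part:
                buckets[(key >> shift) & mask].append(key)
            next_clusters.extend(buckets)
        clusters = next_clusters
    return clusters
```

For example, `radix_cluster(keys, [1, 2])` yields the same 8 clusters as the single pass `radix_cluster(keys, [3])`, but it reads and writes the data twice. That is the trade-off the slide names: more CPU work per tuple in exchange for fewer cache and TLB misses.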
11 Monet Experiments: Setup Platform: –SGI Origin2000 (MIPS R10000, 250 MHz) System: –Monet DBMS Data sets: –Integer join columns –Join hit-rate of 1 –Cardinalities: 15, ,000,000 Hardware event counters –to analyze cache & TLB misses
12 Monet Experiments: Radix-Cluster (64,000,000 tuples)
13 Accurate Cost Modeling: Radix-Cluster
14 Monet Experiments: Partitioned Hash-Join
15 Monet Experiments: Radix-Join
16 Monet Experiments: Overall Performance (64,000,000 tuples)
17 Conclusion Problem: –Memory access is increasingly the most important bottleneck for database performance Solutions: –Vertical decomposition improves column-wise data access –Radix-algorithms optimize join performance General: –Algorithms can be tuned to achieve optimal memory access –Detailed and accurate estimation of memory cost is possible Monet homepage:
18 Introduction: Hardware Trends CPU speed has been, is, and will be growing rapidly Main-memory access latency has hardly improved over the last decade Wider busses and new DRAM standards improve only the memory bandwidth Cache memories reduce the access latencies only if the accessed data is in the cache There is a main-memory access bottleneck and it will remain in the foreseeable future
19 Consequences for MM-DBMS: Overview Data structures: full vertical table fragmentation –Reduce record width, and thus –Optimize column-wise data access Query processing algorithms –Avoid random memory access pattern beyond cache limits –Minimize number of cache & TLB misses Example: partitioned hash-join –Create clusters that fit in memory cache –Perform hash-join on matching clusters