MonetDB/X100
Dutch-Belgian DataBase Day, University of Antwerp, 2004.12.03
Peter Boncz, Marcin Zukowski, Niels Nes

Introduction
What is X100?
- A new query processing engine developed for MonetDB

Contents
- Introduction
- CWI Database Group
- Motivation
- MonetDB/X100 Architecture Highlights
  - Optimizing CPU performance
  - Exploiting cache memories
  - Enhancing disk bandwidth
- Conclusions
- Discussion

CWI Database Group
- Database Architecture: DBMS design, implementation, evaluation
- A wide area with many sub-areas:
  - data structures
  - query processing algorithms
  - modern computer architectures
- MonetDB: 1994-2004 at CWI, an open-source high-performance DBMS
- Future: X100, MonetDB 5.0

Motivation
Multimedia retrieval:
- TREC Video: 130 hours of news footage, growing each year
- Task: search for a given text (via speech recognition) or for video similar to a given image
- 3 TB of data (!)

Motivation
Similar areas:
- data mining
- OLAP, data warehousing
- scientific applications (astronomy, biology, ...)
Challenge: process really large datasets efficiently within a DBMS.

X100 Highlights
The layers of the computer architecture (CPU, main memory, cache, disk) structure the rest of this talk.

CPU
This is where the actual data processing happens.

CPU
From CISC to hyper-pipelined:
- 1978: 8086: CISC
- 1989: 486: 2 execution units
- 1993: Pentium: 2 x 5-stage pipelined units
- 1999: Pentium3: 3 x 7-stage pipelined units
- 2000: Pentium4: 12 x 20-stage pipelined execution units
Each instruction executes in multiple steps (A -> A1, ..., An), in (multiple) pipelines.

CPU
But pipelines only stay full if the instructions are independent!
Problems:
- branches in program logic
- accessing recently modified memory
DBMSs are bad at filling pipelines [Ailamaki99, ...].

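The branch problem is easy to see in a selection operator. The sketch below is a standard illustration of the issue, not code from these slides; the function names are invented. The predicated variant turns the data-dependent branch into arithmetic, trading a little extra work per tuple for a pipeline that never stalls on mispredictions:

```c
#include <stddef.h>

/* Selection with a branch: whether the branch is taken depends on the
   data, so the CPU mispredicts often and the pipeline stalls. */
size_t sel_branch(size_t n, const int *col, int bound, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] < bound)
            out[k++] = i;
    return k;
}

/* Branch-free ("predicated") variant: the comparison result is used as
   a 0/1 value, so every iteration runs the same instruction stream. */
size_t sel_predicated(size_t n, const int *col, int bound, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (size_t)(col[i] < bound);
    }
    return k;
}
```
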
X100: vectorized processing
- tuple-at-a-time primitive:  *(int, int) : int
- vector-at-a-time primitive: *(int[], int[]) : int[]

X100: vectorized processing
Primitives process a vector at a time:
- very basic functionality
- independent loop iterations
- simple code
Two levels of optimization:
- compiler: loop pipelining
- CPU: full pipelines

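As a concrete reading of the *(int[], int[]) : int[] signature, such a primitive can be little more than a tight loop over its input arrays. A minimal sketch, assuming a plain array-based vector layout (map_mul_int_vec is an invented name, not actual X100 source):

```c
#include <stddef.h>

/* Vectorized multiplication primitive for  *(int[], int[]) : int[].
   The loop body has no branches and no per-tuple interpretation
   overhead; iterations are independent, so the compiler can
   loop-pipeline it and the CPU can keep its pipelines full. */
void map_mul_int_vec(size_t n, const int *a, const int *b, int *res) {
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] * b[i];
}
```

Because the per-tuple interpretation overhead of a classical tuple-at-a-time executor disappears, a primitive like this spends only a few CPU cycles per tuple.
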
X100: results (TPC-H Q1)
Only a few CPU cycles per tuple; MySQL, for comparison, spends ~100 cycles per tuple on such operators.

Main memory
Large, but not unlimited.

Cache
Faster, but very limited storage.

Cache Memory Bottleneck
Caches exist to hide the cost of memory access. The cost differs per level:
- L1 cache access: 1-2 cycles
- L2 cache access: 6-20 cycles
- main-memory access: 100-400 cycles
Consequences:
- random access into main memory is very expensive
- a DBMS must buffer for the CPU cache, not for RAM
- cache-conscious query processing: MonetDB research [VLDB99,00,02,04]

X100: pipelining
- vectors fill the CPU cache
- main memory is accessed only at the data sources and sinks
- MonetDB, in contrast, uses much more main-memory bandwidth
[Figure: a Project expression tree (+, -, *, constant 0.19) evaluated inside the X100 query processor; vectors flow through the CPU cache, while the X100 buffer manager moves data between RAM and disk]

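The execution model behind this picture is a pull-based operator pipeline whose unit of exchange is a cache-resident vector rather than a single tuple. A minimal sketch under that assumption (vec_t, scan_next, and project_tax are invented names, and VECLEN is an illustrative size, not X100's actual choice):

```c
#include <stddef.h>

#define VECLEN 1024   /* tuples per vector; small enough that a pipeline's
                         working set stays inside the CPU cache */

typedef struct { double val[VECLEN]; size_t n; } vec_t;

/* Data source: fills a vector from a column held by the buffer manager
   (modelled here as a plain array). This is the only operator that
   touches main memory. */
size_t scan_next(const double *col, size_t len, size_t *pos, vec_t *out) {
    size_t n = (len - *pos < VECLEN) ? len - *pos : VECLEN;
    for (size_t i = 0; i < n; i++)
        out->val[i] = col[*pos + i];
    *pos += n;
    return out->n = n;
}

/* Project(price * 0.19): consumes and produces cache-resident vectors;
   the intermediate result never round-trips through RAM. */
void project_tax(const vec_t *in, vec_t *out) {
    for (size_t i = 0; i < in->n; i++)
        out->val[i] = in->val[i] * 0.19;
    out->n = in->n;
}
```

A driver loop repeatedly calls scan_next and feeds each vector through project_tax (and any further operators); only the scan and the final sink touch RAM, everything in between stays in the cache.
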
Disk
Slow, but unlimited storage.

Disk
- random access is hopeless
- capacity grows faster than bandwidth

X100: problem - bandwidth
MonetDB/X100 is too fast for the disks: TPC-H queries need 200-600 MB/s.

Bandwidth improvements
Three ideas:
- Vertical Fragmentation (MonetDB)
- new: Lightweight Compression
- new: Cooperative Scans

Vertical fragmentation
Storing each column separately changes DBMS disk access in data-intensive applications: only the relevant data is read, which reduces the disk bandwidth requirements.

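To make the layout concrete, compare a row store with a per-column layout; the struct names below are illustrative only, not actual X100 structures:

```c
#include <stddef.h>

/* Row store ("horizontal"): reading one attribute still drags the
   whole 24-byte row through the disk and memory hierarchy. */
struct row { int orderkey; int custkey; double price; double tax; };

/* Column store ("vertical fragmentation"): one array per attribute,
   so a scan touches only the bytes of the columns it needs. */
struct columns {
    int    *orderkey;
    int    *custkey;
    double *price;
    double *tax;
    size_t  n;
};

/* A query over `price` alone reads n * sizeof(double) bytes
   instead of n * sizeof(struct row). */
double sum_price(const struct columns *c) {
    double s = 0.0;
    for (size_t i = 0; i < c->n; i++)
        s += c->price[i];
    return s;
}
```

Here a scan of price alone moves 8 bytes per tuple instead of 24, which is exactly the bandwidth saving the slide refers to.
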
Lightweight Compression
Compression is introduced not to reduce storage space but to increase effective disk bandwidth:
- thanks to the efficient code for disk-based data, only a few percent of CPU time is used
- part of the spare CPU time can be spent on decompressing data

Lightweight Compression
Rationale:
- disk-to-RAM transfers use DMA and do not need the CPU
- (de)compress only a vector at a time, when the data is actually needed
- compress on the boundary between RAM and the CPU cache
[Figure: same pipeline as before, with (de)compression inserted between RAM and the CPU cache]

Lightweight Compression
Standard compression won't do: it compresses too well and is therefore too slow (~100 MB/s).
Research question: devise lightweight (de)compression algorithms.
Results so far:
- compression factor relatively small, up to 3.5
- decompression speed: 3 GB/s (!)
- compression speed: 1 GB/s (!!!)
- perceived disk bandwidth up to 3 times higher

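The slides do not name the algorithms; one simple family that fits these numbers is frame-of-reference (FOR) coding, sketched below. This is an illustration, not the actual X100 code, and for brevity it assumes every value in a block fits in the frame (base .. base+255):

```c
#include <stdint.h>
#include <stddef.h>

/* Frame-of-reference coding: store a per-block base value plus small
   unsigned offsets. Both loops are branch-free and trivially
   (software-)pipelined, which is what makes GB/s speeds plausible. */
void for_decompress(int32_t base, const uint8_t *codes, size_t n,
                    int32_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = base + (int32_t)codes[i];   /* one add per value */
}

void for_compress(int32_t base, const int32_t *in, size_t n,
                  uint8_t *codes) {
    for (size_t i = 0; i < n; i++)
        codes[i] = (uint8_t)(in[i] - base);  /* assumes value fits the frame */
}
```

With 1-byte codes for 4-byte integers the compression factor approaches 4, consistent with the "up to 3.5" above; a real scheme additionally has to handle values that fall outside the frame.
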
Cooperative Scans
Idea: use the I/O bandwidth to satisfy multiple queries at once.
Cooperative Scans = an Active Buffer Manager that is aware of concurrent scans on the same table.
Research question: devise adaptive buffer-management strategies.
Benefits:
- I/O bandwidth is reused by multiple queries
- concurrent queries no longer fight for the disk arm

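The slides leave the strategy open; one hedged sketch of the underlying idea (pick_next_chunk and the fixed-size tables are inventions for illustration, not the actual policy): whenever a disk slot frees up, read the table chunk that the largest number of attached scans still needs, so a single physical read feeds many queries.

```c
#include <stddef.h>

#define NCHUNKS  256
#define NQUERIES  32

/* needs[q][c] != 0 means running scan q still needs table chunk c.
   Pick the not-yet-cached chunk wanted by the most scans, so one
   disk read is amortized over many concurrent queries. */
int pick_next_chunk(const int needs[NQUERIES][NCHUNKS],
                    const int cached[NCHUNKS], int nqueries) {
    int best = -1, best_score = 0;
    for (int c = 0; c < NCHUNKS; c++) {
        if (cached[c]) continue;
        int score = 0;
        for (int q = 0; q < nqueries; q++)
            score += (needs[q][c] != 0);
        if (score > best_score) { best_score = score; best = c; }
    }
    return best;   /* -1 when no chunk is needed */
}
```
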
Cooperative Scans
X100 with Cooperative Scans sustains more than 30 concurrent queries without performance degradation.

X100 summary
The original MonetDB is successful in the same application areas; however:
- sub-optimal CPU utilization
- only efficient if the problem fits in RAM
X100 improves the architecture at all levels:
- better CPU utilization
- better cache utilization
- scales to datasets that do not fit in memory
- improves I/O bandwidth using compression and cooperative scans

Example results
Performance close to hand-written C functions.

TPC-H SF-1 |  X100 | Oracle | MonetDB
Q1         | 0.54s |    30s |    9.4s
Q3         | 0.24s |    10s |    2.5s
Q6         | 0.15s |   1.5s |    2.5s
Q14        | 0.13s |     2s |    1.2s

X100 status
- first proof-of-concept implemented
- the full TPC-H benchmark executes
Future work:
- lots of engineering
- new buffer manager
- more vectorized algorithms
- memory footprint tuning (for small devices)
- SQL front-end

More information
www.cwi.nl/~boncz/x100.html
CIDR'05 paper: "MonetDB/X100: Hyper-Pipelining Query Execution"

Discussion
Questions?