Dutch-Belgium DataBase Day University of Antwerp, 2004.12.03 MonetDB/x100 Peter Boncz, Marcin Zukowski, Niels Nes.

1 Dutch-Belgium DataBase Day University of Antwerp, 2004.12.03 MonetDB/x100 Peter Boncz, Marcin Zukowski, Niels Nes

2 Introduction. What is x100? A new query-processing engine developed for MonetDB.

3 Contents: Introduction; CWI Database Group; Motivation; MonetDB/x100 Architecture Highlights (optimizing CPU performance, exploiting cache memories, enhancing disk bandwidth); Conclusions; Discussion.

4 CWI Database Group. Research area: database architecture, i.e. DBMS design, implementation, and evaluation. A wide area with many sub-areas: data structures, query-processing algorithms, modern computer architectures. MonetDB: developed at CWI 1994-2004, an open-source, high-performance DBMS. Future: X100, MonetDB 5.0.

5 Motivation: multimedia retrieval. TREC Video: 130 hours of news, growing each year. Task: search for a given text (via speech recognition) or for video similar to a given image. 3 TB of data (!)

6 Motivation: similar areas. Data mining; OLAP and data warehousing; scientific applications (astronomy, biology, …). Challenge: process really large datasets efficiently within the DBMS.

7 x100 highlights: we use the computer architecture to guide this talk.

8 CPU: actual data processing

9 CPU: from CISC to hyper-pipelined. 1986: 8086, CISC. 1990: 486, 2 execution units. 1992: Pentium, 2 x 5-stage pipelined units. 1996: Pentium3, 3 x 7-stage pipelined units. 2000: Pentium4, 12 x 20-stage pipelined execution units. Each instruction executes in multiple steps (A -> A1, …, An) in (multiple) pipelines.
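
Why deep pipelines pay off can be seen from an idealized model. This is an assumption for illustration: every step takes one cycle, there are no stalls, and there is a single pipeline. With k steps per instruction and n independent instructions:

```latex
T_{\mathrm{seq}} = n \cdot k, \qquad
T_{\mathrm{pipe}} = k + (n - 1), \qquad
\frac{T_{\mathrm{seq}}}{T_{\mathrm{pipe}}} = \frac{n k}{k + n - 1} \xrightarrow{\; n \to \infty \;} k
```

For example, with k = 20 (the Pentium4 pipeline depth above) and n = 1000, this gives 20000 versus 1019 cycles, a speedup of about 19.6. The model only holds while consecutive instructions are independent.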

10 CPU: but only if the instructions are independent! Otherwise the pipelines stall. Problems: branches in program logic; accessing recently modified memory [ailamaki99, …]. As a result, DBMSs are bad at filling pipelines.
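
A common way to remove the data-dependent branch from a selection loop is to turn the comparison result into arithmetic. The sketch below is illustrative only and is not taken from the x100 code; the function names are hypothetical. Note that an optimizing compiler may already turn the first version into branch-free code on its own.

```c
#include <stddef.h>

/* Branching version: the CPU must predict the `if` for every element;
   at around 50% selectivity it mispredicts often, and each misprediction
   flushes the pipeline. */
size_t select_lt_branch(const int *col, size_t n, int bound, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (col[i] < bound)
            out[k++] = i;
    }
    return k;
}

/* Branch-free version: always write the candidate position, then advance
   the output cursor by the 0/1 result of the comparison, so there is no
   data-dependent branch to mispredict. `out` needs room for n entries. */
size_t select_lt_nobranch(const int *col, size_t n, int bound, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (col[i] < bound);
    }
    return k;
}
```

Both functions return the number of qualifying positions and store them in `out`; they differ only in how the CPU executes them.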

11 x100: vectorized processing. *(int,int): int -> *(int[],int[]): int[]

12 x100: vectorized processing. Primitives: vector-at-a-time, very basic functionality, independent loop iterations, hence simple code. Optimization levels: compiler -> loop pipelining; CPU -> full pipelines. *(int,int): int -> *(int[],int[]): int[]
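
The change of signature from *(int,int): int to *(int[],int[]): int[] can be sketched in C as follows. The function names are hypothetical and the code is a minimal illustration of the idea, not the actual x100 primitive.

```c
#include <stddef.h>

/* Tuple-at-a-time: *(int,int): int. An interpreter calls this once per
   tuple and pays call and interpretation overhead every time. */
int mul_int(int a, int b)
{
    return a * b;
}

/* Vector-at-a-time: *(int[],int[]): int[]. One call processes n values.
   The loop iterations are independent (restrict promises no aliasing),
   so the compiler can unroll and pipeline the loop and the CPU can keep
   several multiplications in flight at once. */
void mul_int_vec(size_t n, int *restrict res,
                 const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] * b[i];
}
```

The per-call overhead is paid once per vector instead of once per tuple, and the loop body is simple enough for the compiler to optimize aggressively.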

13 x100: results (TPC-H Q1). Few CPU cycles per tuple; by comparison, MySQL spends ~100 cycles for such operators.

14 Main memory: large, but not unlimited

15 Cache: faster, but very limited storage

16 Cache  Memory Bottleneck Cache to hide memory access cost Different costs at different levels: L1 cache access: 1-2 cycles L2 cache access: 6-20 cycles main-memory access: 100-400 cycles Consequences: random access into main-memory very expensive DBMS must buffer for CPU cache, not RAM

17 Cache  Memory Bottleneck Cache to hide memory access cost Different costs at different levels: L1 cache access: 1-2 cycles L2 cache access: 6-20 cycles main-memory access: 100-400 cycles Consequences : random access into main-memory very expensive DBMS must buffer for CPU cache, not RAM  cache-conscious query processing MonetDB research [VLDB99,00,02,04]

18 x100: pipelining. Vectors fill the CPU cache; main-memory access happens only at the data sources and sinks. [Figure: X100 query processor running a Project with arithmetic operators (-, *, +) and the constant 0.19 inside the CPU cache, above RAM, the X100 buffer manager, and disk.] MonetDB uses much more main-memory bandwidth.

19 x100: pipelining. Vectors fill the CPU cache; main-memory access happens only at data input and output. [Figure: X100 query processor in the CPU cache, above RAM, the X100 buffer manager, and disk; legend: x100 vs. MonetDB.]
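
Vector-at-a-time pipelining can be sketched as a loop over chunks in which intermediate results live in a small buffer. The expression price * (1 - discount) is chosen because it resembles the TPC-H Q1 term l_extendedprice * (1 - l_discount); the function names and VECTOR_SIZE are hypothetical, and this is not the actual X100 operator code.

```c
#include <stddef.h>

#define VECTOR_SIZE 1024  /* hypothetical: small enough for a few vectors to fit in cache */

/* Primitive: res[i] = c - v[i] */
static void sub_const_vec(size_t n, double *res, double c, const double *v)
{
    for (size_t i = 0; i < n; i++)
        res[i] = c - v[i];
}

/* Primitive: res[i] = a[i] * b[i] */
static void mul_vec(size_t n, double *res, const double *a, const double *b)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] * b[i];
}

/* Computes out[i] = price[i] * (1 - discount[i]) one vector at a time.
   The intermediate `tmp` holds at most VECTOR_SIZE values, so it stays in
   the CPU cache; only the input columns and the output go through RAM. */
void discounted_price(size_t n, double *out,
                      const double *price, const double *discount)
{
    double tmp[VECTOR_SIZE];
    for (size_t off = 0; off < n; off += VECTOR_SIZE) {
        size_t len = (n - off < VECTOR_SIZE) ? n - off : VECTOR_SIZE;
        sub_const_vec(len, tmp, 1.0, discount + off);
        mul_vec(len, out + off, price + off, tmp);
    }
}
```

A column-at-a-time engine would instead materialize the full (1 - discount) column in RAM before multiplying, which is the extra main-memory traffic the slide attributes to MonetDB.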

20 Disk: slow, but unlimited storage

21 Disk: random access is hopeless; disk size grows faster than disk bandwidth.

22 x100: the bandwidth problem. MonetDB/x100 is too fast for disks: TPC-H queries need 200-600 MB/s.

23 Bandwidth improvements. Three ideas: vertical fragmentation (MonetDB); new: lightweight compression; new: cooperative scans.

24 Vertical fragmentation. DBMS disk access in data-intensive applications: only the relevant data (columns) is read, which reduces disk-bandwidth requirements.
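
The bandwidth saving from vertical fragmentation can be illustrated by comparing a row layout with a column layout for a query that touches one attribute. The struct and field names below are hypothetical, loosely modeled on the TPC-H lineitem table (l_comment is a 44-character field there); this is not MonetDB's storage code.

```c
#include <stddef.h>

/* Row layout (NSM): all attributes of a tuple are stored together. */
struct lineitem_row {
    int    orderkey;
    int    quantity;
    double price;
    double discount;
    char   comment[44];
};

/* Column layout (vertical fragmentation): one array per attribute. */
struct lineitem_cols {
    size_t  n;
    int    *orderkey;
    int    *quantity;
    double *price;
    double *discount;
    char  (*comment)[44];
};

/* SUM(quantity) over rows: every tuple's full width passes through the
   memory hierarchy (and through disk, for disk-resident data). */
long sum_quantity_rows(const struct lineitem_row *rows, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += rows[i].quantity;
    return s;
}

/* SUM(quantity) over columns: only the quantity array is read. */
long sum_quantity_cols(const struct lineitem_cols *t)
{
    long s = 0;
    for (size_t i = 0; i < t->n; i++)
        s += t->quantity[i];
    return s;
}
```

On common 64-bit platforms `sizeof(struct lineitem_row)` is 72 bytes (68 bytes of fields plus padding), so the row scan moves 72 bytes per tuple while the column scan moves 4.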

25 Lightweight compression. Compression is introduced not to reduce storage space but to increase disk bandwidth: thanks to efficient code, queries on disk-based data use only a few percent of CPU time, and part of this spare time can be spent decompressing data.

26 Lightweight compression. Rationale: the disk -> RAM transfer uses DMA and does not need the CPU; (de)compress only vector-at-a-time, when the data is needed. [Figure: X100 query processor in the CPU cache, above RAM, the X100 buffer manager, and disk.] Compression happens at the boundary between the CPU cache and RAM.

27 Lightweight compression. Standard compression won't do: it compresses too well and is therefore too slow (100 MB/s). Research question: devise lightweight (de)compression algorithms. Results so far: compression factor relatively small, up to 3.5; decompression speed 3 GB/s (!); compression speed 1 GB/s (!!!); perceived bandwidth 3 times higher.
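
To show what "lightweight" can mean, here is a simplified frame-of-reference scheme, swapped in purely for illustration; the slides do not say which algorithms x100 uses. It assumes a hypothetical block format: one 32-bit base value plus one 8-bit offset per value, so every value must lie within 255 of the block minimum. Decompression is a single tight loop with independent iterations, which is what makes multi-GB/s speeds plausible.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: base = minimum of the block, codes[i] = in[i] - base.
   Each int32 value shrinks from 4 bytes to 1. A real scheme would need an
   escape path for values outside [base, base + 255]. Requires n >= 1.
   Returns 1 on success, 0 if some value does not fit the 8-bit range. */
int for_compress(size_t n, const int32_t *in, int32_t *base, uint8_t *codes)
{
    int32_t min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min) min = in[i];
    for (size_t i = 0; i < n; i++) {
        int64_t d = (int64_t)in[i] - min;
        if (d > 255) return 0;
        codes[i] = (uint8_t)d;
    }
    *base = min;
    return 1;
}

/* Decompress one vector just before it is consumed: branch-free, with
   independent iterations that the compiler can unroll or vectorize. */
void for_decompress(size_t n, int32_t base, const uint8_t *codes, int32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base + codes[i];
}
```

With a compression factor c, a disk delivering B MB/s of compressed data supplies roughly c x B MB/s of values to the query, provided decompression keeps up; this is the sense in which the perceived bandwidth grows.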

28 Cooperative scans. Idea: use I/O bandwidth to satisfy multiple queries. Cooperative scans: an Active Buffer Manager that is aware of concurrent scans on the same table. Research question: devise adaptive buffer-management strategies. Benefits: I/O bandwidth is reused by multiple queries; concurrent queries no longer fight for the disk arm.

29 Cooperative scans. x100 with cooperative scans: more than 30 concurrent queries without performance degradation.
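
The bandwidth reuse behind cooperative scans can be illustrated with a deliberately simple model: queries attach to one circular scan that is already running. This "attach" policy is an assumption chosen for illustration, not the adaptive Active Buffer Manager strategy the slides pose as a research question, and the function name is hypothetical.

```c
#include <stddef.h>

#define MAX_QUERIES 64  /* hypothetical limit of this toy model */

/* One circular scan over `chunks` chunks. A query arriving at step
   arrival[q] starts consuming at the current cursor position, wraps
   around at the end of the table, and finishes after seeing every chunk
   once. Each busy step loads one chunk from disk, and all running
   queries consume it. Returns the number of chunk loads (0 if nq is too
   large); independent scans would need nq * chunks loads. */
size_t shared_scan_loads(size_t chunks, size_t nq, const size_t *arrival)
{
    size_t seen[MAX_QUERIES] = {0};   /* chunks consumed so far, per query */
    size_t loads = 0, finished = 0;
    if (nq > MAX_QUERIES) return 0;
    for (size_t t = 0; finished < nq; t++) {
        int any = 0;
        for (size_t q = 0; q < nq; q++)
            if (arrival[q] <= t && seen[q] < chunks) any = 1;
        if (!any) continue;           /* disk idle: no query is scanning */
        loads++;                      /* the next chunk is loaded once... */
        for (size_t q = 0; q < nq; q++)   /* ...and shared by all running queries */
            if (arrival[q] <= t && seen[q] < chunks && ++seen[q] == chunks)
                finished++;
    }
    return loads;
}
```

For a 10-chunk table, three queries arriving together need 10 loads instead of 30, and two queries arriving 5 steps apart need 15 instead of 20; queries that do not overlap in time gain nothing.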

30 x100 summary. The original MonetDB is successful in the same application areas; however, it has sub-optimal CPU utilization and is efficient only if the problem fits in RAM. x100 improves the architecture on all levels: better CPU utilization; better cache utilization; scales to non-memory-resident datasets; improves I/O bandwidth using compression and cooperative scans.

31 Example results: performance close to hand-written C functions.
TPC-H SF-1   x100    Oracle   MonetDB
Q1           0.54s   30s      9.4s
Q3           0.24s   10s      2.5s
Q6           0.15s   1.5s     2.5s
Q14          0.13s   2s       1.2s

32 x100 status. A first proof-of-concept is implemented, and the full TPC-H benchmark executes. Future work: lots of engineering; a new buffer manager; more vectorized algorithms; memory-footprint tuning (for small devices); an SQL front-end.

33 More information: www.cwi.nl/~boncz/x100.html; CIDR'05 paper: "MonetDB/X100: Hyper-pipelining query execution"

34 Discussion?