MonetDB/X100
Dutch-Belgian DataBase Day, University of Antwerp, 2004.12.03
Peter Boncz, Marcin Zukowski, Niels Nes

Introduction
What is X100?
- A new query processing engine developed for MonetDB

Contents
- Introduction
- CWI Database Group
- Motivation
- MonetDB/X100 Architecture Highlights
  - Optimizing CPU performance
  - Exploiting cache memories
  - Enhancing disk bandwidth
- Conclusions
- Discussion

CWI Database Group
- Database Architecture: DBMS design, implementation, evaluation
- A wide area with many sub-areas:
  - data structures
  - query processing algorithms
  - modern computer architectures
- MonetDB: 1994-2004 at CWI, an open-source high-performance DBMS
- Future: X100, MonetDB 5.0

Motivation
Multimedia retrieval:
- TREC Video: 130 hours of news footage, growing each year
- Task: search for a given text (via speech recognition) or for video similar to a given image
- 3 TB of data (!)

Motivation
Similar areas:
- data mining
- OLAP, data warehousing
- scientific applications (astronomy, biology, ...)
Challenge: process really large datasets efficiently within a DBMS.

X100 Highlights
The layers of the computer architecture (CPU, main memory, cache, disk) structure the rest of this talk.

CPU
This is where the actual data processing happens.

CPU
From CISC to hyper-pipelined:
- 1978: 8086: CISC
- 1989: 486: 2 execution units
- 1993: Pentium: 2 x 5-stage pipelined units
- 1999: Pentium3: 3 x 7-stage pipelined units
- 2000: Pentium4: 12 x 20-stage pipelined execution units
Each instruction executes in multiple steps (A -> A1, ..., An), in (multiple) pipelines.

CPU
But pipelines only stay full if the instructions are independent!
Problems:
- branches in program logic
- accessing recently modified memory
DBMSs are bad at filling pipelines [Ailamaki99, ...].

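The branch problem is easy to see in a selection operator. The sketch below is a standard illustration of the issue, not code from these slides; the function names are invented. The predicated variant turns the data-dependent branch into arithmetic, trading a little extra work per tuple for a pipeline that never stalls on mispredictions:

```c
#include <stddef.h>

/* Selection with a branch: whether the branch is taken depends on the
   data, so the CPU mispredicts often and the pipeline stalls. */
size_t sel_branch(size_t n, const int *col, int bound, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] < bound)
            out[k++] = i;
    return k;
}

/* Branch-free ("predicated") variant: the comparison result is used as
   a 0/1 value, so every iteration runs the same instruction stream. */
size_t sel_predicated(size_t n, const int *col, int bound, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (size_t)(col[i] < bound);
    }
    return k;
}
```
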
X100: vectorized processing
- tuple-at-a-time primitive:  *(int, int) : int
- vector-at-a-time primitive: *(int[], int[]) : int[]

X100: vectorized processing
Primitives process a vector at a time:
- very basic functionality
- independent loop iterations
- simple code
Two levels of optimization:
- compiler: loop pipelining
- CPU: full pipelines

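As a concrete reading of the *(int[], int[]) : int[] signature, such a primitive can be little more than a tight loop over its input arrays. A minimal sketch, assuming a plain array-based vector layout (map_mul_int_vec is an invented name, not actual X100 source):

```c
#include <stddef.h>

/* Vectorized multiplication primitive for  *(int[], int[]) : int[].
   The loop body has no branches and no per-tuple interpretation
   overhead; iterations are independent, so the compiler can
   loop-pipeline it and the CPU can keep its pipelines full. */
void map_mul_int_vec(size_t n, const int *a, const int *b, int *res) {
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] * b[i];
}
```

Because the per-tuple interpretation overhead of a classical tuple-at-a-time executor disappears, a primitive like this spends only a few CPU cycles per tuple.
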
X100: results (TPC-H Q1)
Only a few CPU cycles per tuple; MySQL, for comparison, spends ~100 cycles per tuple on such operators.

Main memory
Large, but not unlimited.

Cache
Faster, but very limited storage.

Cache Memory Bottleneck
Caches exist to hide the cost of memory access. The cost differs per level:
- L1 cache access: 1-2 cycles
- L2 cache access: 6-20 cycles
- main-memory access: 100-400 cycles
Consequences:
- random access into main memory is very expensive
- a DBMS must buffer for the CPU cache, not for RAM
- cache-conscious query processing: MonetDB research [VLDB99,00,02,04]

X100: pipelining
- vectors fill the CPU cache
- main memory is accessed only at the data sources and sinks
- MonetDB, in contrast, uses much more main-memory bandwidth
[Figure: a Project expression tree (+, -, *, constant 0.19) evaluated inside the X100 query processor; vectors flow through the CPU cache, while the X100 buffer manager moves data between RAM and disk]

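The execution model behind this picture is a pull-based operator pipeline whose unit of exchange is a cache-resident vector rather than a single tuple. A minimal sketch under that assumption (vec_t, scan_next, and project_tax are invented names, and VECLEN is an illustrative size, not X100's actual choice):

```c
#include <stddef.h>

#define VECLEN 1024   /* tuples per vector; small enough that a pipeline's
                         working set stays inside the CPU cache */

typedef struct { double val[VECLEN]; size_t n; } vec_t;

/* Data source: fills a vector from a column held by the buffer manager
   (modelled here as a plain array). This is the only operator that
   touches main memory. */
size_t scan_next(const double *col, size_t len, size_t *pos, vec_t *out) {
    size_t n = (len - *pos < VECLEN) ? len - *pos : VECLEN;
    for (size_t i = 0; i < n; i++)
        out->val[i] = col[*pos + i];
    *pos += n;
    return out->n = n;
}

/* Project(price * 0.19): consumes and produces cache-resident vectors;
   the intermediate result never round-trips through RAM. */
void project_tax(const vec_t *in, vec_t *out) {
    for (size_t i = 0; i < in->n; i++)
        out->val[i] = in->val[i] * 0.19;
    out->n = in->n;
}
```

A driver loop repeatedly calls scan_next and feeds each vector through project_tax (and any further operators); only the scan and the final sink touch RAM, everything in between stays in the cache.
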
Disk
Slow, but unlimited storage.

Disk
- random access is hopeless
- capacity grows faster than bandwidth

X100: problem - bandwidth
MonetDB/X100 is too fast for the disks: TPC-H queries need 200-600 MB/s.

Bandwidth improvements
Three ideas:
- Vertical Fragmentation (MonetDB)
- new: Lightweight Compression
- new: Cooperative Scans

Vertical fragmentation
Storing each column separately changes DBMS disk access in data-intensive applications: only the relevant data is read, which reduces the disk bandwidth requirements.

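To make the layout concrete, compare a row store with a per-column layout; the struct names below are illustrative only, not actual X100 structures:

```c
#include <stddef.h>

/* Row store ("horizontal"): reading one attribute still drags the
   whole 24-byte row through the disk and memory hierarchy. */
struct row { int orderkey; int custkey; double price; double tax; };

/* Column store ("vertical fragmentation"): one array per attribute,
   so a scan touches only the bytes of the columns it needs. */
struct columns {
    int    *orderkey;
    int    *custkey;
    double *price;
    double *tax;
    size_t  n;
};

/* A query over `price` alone reads n * sizeof(double) bytes
   instead of n * sizeof(struct row). */
double sum_price(const struct columns *c) {
    double s = 0.0;
    for (size_t i = 0; i < c->n; i++)
        s += c->price[i];
    return s;
}
```

Here a scan of price alone moves 8 bytes per tuple instead of 24, which is exactly the bandwidth saving the slide refers to.
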
Lightweight Compression
Compression is introduced not to reduce storage space but to increase effective disk bandwidth:
- thanks to the efficient code for disk-based data, only a few percent of CPU time is used
- part of the spare CPU time can be spent on decompressing data

Lightweight Compression
Rationale:
- disk-to-RAM transfers use DMA and do not need the CPU
- (de)compress only a vector at a time, when the data is actually needed
- compress on the boundary between RAM and the CPU cache
[Figure: same pipeline as before, with (de)compression inserted between RAM and the CPU cache]

Lightweight Compression
Standard compression won't do: it compresses too well and is therefore too slow (~100 MB/s).
Research question: devise lightweight (de)compression algorithms.
Results so far:
- compression factor relatively small, up to 3.5
- decompression speed: 3 GB/s (!)
- compression speed: 1 GB/s (!!!)
- perceived disk bandwidth up to 3 times higher

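The slides do not name the algorithms; one simple family that fits these numbers is frame-of-reference (FOR) coding, sketched below. This is an illustration, not the actual X100 code, and for brevity it assumes every value in a block fits in the frame (base .. base+255):

```c
#include <stdint.h>
#include <stddef.h>

/* Frame-of-reference coding: store a per-block base value plus small
   unsigned offsets. Both loops are branch-free and trivially
   (software-)pipelined, which is what makes GB/s speeds plausible. */
void for_decompress(int32_t base, const uint8_t *codes, size_t n,
                    int32_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = base + (int32_t)codes[i];   /* one add per value */
}

void for_compress(int32_t base, const int32_t *in, size_t n,
                  uint8_t *codes) {
    for (size_t i = 0; i < n; i++)
        codes[i] = (uint8_t)(in[i] - base);  /* assumes value fits the frame */
}
```

With 1-byte codes for 4-byte integers the compression factor approaches 4, consistent with the "up to 3.5" above; a real scheme additionally has to handle values that fall outside the frame.
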
Cooperative Scans
Idea: use the I/O bandwidth to satisfy multiple queries at once.
Cooperative Scans = an Active Buffer Manager that is aware of concurrent scans on the same table.
Research question: devise adaptive buffer-management strategies.
Benefits:
- I/O bandwidth is reused by multiple queries
- concurrent queries no longer fight for the disk arm

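The slides leave the strategy open; one hedged sketch of the underlying idea (pick_next_chunk and the fixed-size tables are inventions for illustration, not the actual policy): whenever a disk slot frees up, read the table chunk that the largest number of attached scans still needs, so a single physical read feeds many queries.

```c
#include <stddef.h>

#define NCHUNKS  256
#define NQUERIES  32

/* needs[q][c] != 0 means running scan q still needs table chunk c.
   Pick the not-yet-cached chunk wanted by the most scans, so one
   disk read is amortized over many concurrent queries. */
int pick_next_chunk(const int needs[NQUERIES][NCHUNKS],
                    const int cached[NCHUNKS], int nqueries) {
    int best = -1, best_score = 0;
    for (int c = 0; c < NCHUNKS; c++) {
        if (cached[c]) continue;
        int score = 0;
        for (int q = 0; q < nqueries; q++)
            score += (needs[q][c] != 0);
        if (score > best_score) { best_score = score; best = c; }
    }
    return best;   /* -1 when no chunk is needed */
}
```
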
Cooperative Scans
X100 with Cooperative Scans sustains more than 30 concurrent queries without performance degradation.

X100 summary
The original MonetDB is successful in the same application areas; however:
- sub-optimal CPU utilization
- only efficient if the problem fits in RAM
X100 improves the architecture at all levels:
- better CPU utilization
- better cache utilization
- scales to datasets that do not fit in memory
- improves I/O bandwidth using compression and cooperative scans

Example results
Performance close to hand-written C functions.

TPC-H SF-1 |  X100 | Oracle | MonetDB
Q1         | 0.54s |    30s |    9.4s
Q3         | 0.24s |    10s |    2.5s
Q6         | 0.15s |   1.5s |    2.5s
Q14        | 0.13s |     2s |    1.2s

X100 status
- first proof-of-concept implemented
- the full TPC-H benchmark executes
Future work:
- lots of engineering
- new buffer manager
- more vectorized algorithms
- memory footprint tuning (for small devices)
- SQL front-end

More information
www.cwi.nl/~boncz/x100.html
CIDR'05 paper: "MonetDB/X100: Hyper-Pipelining Query Execution"

Discussion
Questions?