Dutch-Belgian DataBase Day, University of Antwerp, 2004.12.03. MonetDB/X100. Peter Boncz, Marcin Zukowski, Niels Nes.

Introduction. What is X100? A new query processing engine developed for MonetDB.

Contents Introduction CWI Database Group Motivation MonetDB/x100 Architecture Highlights Optimizing CPU performance Exploiting cache memories Enhancing disk bandwidth Conclusions Discussion

CWI Database Group. Database Architecture: DBMS design, implementation and evaluation. A wide area with many sub-areas: data structures, query processing algorithms, modern computer architectures. MonetDB at CWI: open-source, high-performance DBMS. Future: X100, MonetDB 5.0.

Motivation: Multimedia retrieval. TREC Video: 130 hours of news, growing each year. Task: search for a given text (via speech recognition) or for video similar to a given image. 3 TB of data (!)

Motivation: Similar areas: data mining, OLAP, data warehousing, scientific applications (astronomy, biology, ...). Challenge: process really large datasets efficiently within a DBMS.

X100 Highlights. The computer architecture hierarchy (CPU, cache, main memory, disk) is used to guide this talk.

CPU Actual data processing

CPU: From CISC to hyper-pipelined. 1986: 8086: CISC. 1990: 486: 2 execution units. 1992: Pentium: 2 x 5-stage pipelined units. 1996: Pentium3: 3 x 7-stage pipelined units. 2000: Pentium4: 12 x 20-stage pipelined execution units. Each instruction executes in multiple steps, A -> A1, ..., An, in (multiple) pipelines.

CPU: But only if the instructions are independent! Otherwise the pipelines stall. Problems: branches in program logic; accessing recently modified memory [ailamaki99, ...]. DBMSs are bad at filling pipelines.
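
To make the branch problem concrete, here is a minimal C sketch (my own example, not from the slides): both functions select the positions where col[i] < bound, but the second replaces the data-dependent branch with arithmetic, so a deeply pipelined CPU is not stalled by mispredictions.

    #include <stddef.h>

    /* branching version: one hard-to-predict branch per tuple */
    size_t select_branching(const int *col, size_t n, int bound, size_t *out) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++) {
            if (col[i] < bound)      /* mispredicted ~50% of the time on random data */
                out[k++] = i;
        }
        return k;
    }

    /* predicated version: the comparison result is used as data, not as control flow */
    size_t select_predicated(const int *col, size_t n, int bound, size_t *out) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++) {
            out[k] = i;
            k += (col[i] < bound);   /* always executes; no control dependency */
        }
        return k;
    }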

X100: vectorized processing. From scalar to vector primitives: *(int,int): int  =>  *(int[],int[]): int[]. Primitives work a vector at a time, have very basic functionality, and consist of independent loop iterations, so the code stays simple. Optimization happens at two levels: the compiler applies loop pipelining, and the CPU keeps its execution pipelines full.
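
An illustrative C sketch of such a vectorized primitive, in the spirit of *(int[],int[]): int[] (the function name and signature are my own, not X100 source code): the loop body has no per-tuple interpretation overhead and no cross-iteration dependencies, so the compiler can software-pipeline it.

    /* multiply two integer vectors element-wise */
    void map_mul_int_vec(const int *a, const int *b, int *res, int n) {
        for (int i = 0; i < n; i++)
            res[i] = a[i] * b[i];    /* one interpretation decision per vector, not per tuple */
    }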

X100: results (TPC-H Q1). Only a few CPU cycles per tuple; by comparison, MySQL spends ~100 cycles per tuple for such operators.

Main memory Large, but not unlimited

Cache Faster, but very limited storage

Cache / Memory Bottleneck. The cache is there to hide memory-access cost, and costs differ per level: L1 cache access: 1-2 cycles; L2 cache access: 6-20 cycles; main-memory access: on the order of hundreds of cycles. Consequences: random access into main memory is very expensive, and a DBMS must buffer for the CPU cache, not for RAM => cache-conscious query processing (MonetDB research [VLDB99,00,02,04]).
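
A tiny C sketch (mine, not from the talk) of why random main-memory access is so expensive: the sequential loop streams through cache lines and is prefetch-friendly, while the permuted loop takes a cache miss on almost every access once the array exceeds the CPU cache, paying a full main-memory latency per element.

    #include <stddef.h>

    long sum_sequential(const int *a, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];                 /* ~1 cache miss per 16 ints (64-byte lines) */
        return s;
    }

    long sum_random(const int *a, const size_t *perm, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[perm[i]];           /* ~1 cache miss per element for large n */
        return s;
    }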

X100: pipelining. Vectors fill the CPU cache; main memory is accessed only at the data sources and sinks (scan input and result output). Inside the X100 query processor, a pipeline of operators (e.g. a Project with +, -, * primitives) works on cache-resident vectors, while the X100 buffer manager moves data between RAM and disk. The original MonetDB materializes whole columns instead and therefore uses much more main-memory bandwidth.
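
A hedged C sketch (my own naming, not X100 source) of such a vector-at-a-time, pull-based pipeline: each next() call produces one vector of roughly a thousand values, small enough to stay in the CPU cache while it flows from the scan through the projection to the result, so main memory is touched only when the scan reads new data and when results are emitted.

    #include <stddef.h>

    #define VECTOR_SIZE 1000

    typedef struct Operator Operator;
    struct Operator {
        /* produce the next vector into out; returns number of values (0 = end) */
        size_t (*next)(Operator *self, int *out);
        Operator *child;
        void *state;
    };

    /* Project(child * 2): consume one vector from the child, produce one vector */
    static size_t project_mul2_next(Operator *self, int *out) {
        int in[VECTOR_SIZE];
        size_t n = self->child->next(self->child, in);
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * 2;        /* cache-resident, compiler-pipelined loop */
        return n;
    }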

Disk: slow, but (practically) unlimited storage.

Disk: random access is hopeless, and capacity grows faster than bandwidth.

X100: problem - bandwidth. MonetDB/X100 is too fast for disks: TPC-H queries need scan bandwidth of hundreds of MB/s.

Bandwidth improvements. Three ideas: Vertical Fragmentation (MonetDB); new: Lightweight Compression; new: Cooperative Scans.

Vertical fragmentation. For DBMS disk access in data-intensive applications, only the relevant columns are read, which reduces disk bandwidth requirements, as the sketch below illustrates.
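
An illustrative C sketch (not from the talk) contrasting row-wise and column-wise (vertically fragmented) layouts: with the column layout, a scan that only needs 'price' reads just the price array from disk instead of dragging every other attribute of each tuple along with it.

    /* row-wise: one struct per tuple, all attributes interleaved on disk */
    struct lineitem_row {
        int    orderkey;
        int    quantity;
        double price;
        char   returnflag;
    };

    /* column-wise: one contiguous array (file) per attribute */
    struct lineitem_columns {
        int    *orderkey;
        int    *quantity;
        double *price;       /* a SUM(price) scan touches only this file */
        char   *returnflag;
    };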

Lightweight Compression. Compression is introduced not to reduce storage space but to increase effective disk bandwidth: because the query processing code is so efficient, scans of disk-based data use only a few percent of CPU time, and part of the spare CPU time can be spent on decompressing data.

Lightweight Compression. Rationale: the disk-to-RAM transfer uses DMA and does not need the CPU, and data is (de)compressed only a vector at a time, when it is actually needed. (De)compression therefore happens on the CPU cache / RAM boundary: the X100 buffer manager keeps compressed data in RAM, and the query processor decompresses vectors into the cache as it consumes them.

Lightweight Compression. Standard compression won't do: it compresses too well and is therefore too slow (~100 MB/s). Research question: devise lightweight (de)compression algorithms. Results so far: compression ratios are relatively small, up to 3.5x; decompression speed ~3 GB/s (!); compression speed ~1 GB/s (!!!); perceived disk bandwidth is up to 3 times higher.
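
As an example of what "lightweight" means, here is a generic frame-of-reference decompression sketch in C (my own example; the slide does not name the actual X100 algorithms): values in a block are stored as small unsigned deltas from a per-block base, so decompression is a tight, branch-free loop that runs at RAM speed.

    #include <stdint.h>
    #include <stddef.h>

    void for_decompress(const uint8_t *deltas, size_t n, int32_t base, int32_t *out) {
        for (size_t i = 0; i < n; i++)
            out[i] = base + (int32_t)deltas[i];   /* one add per value, easy to pipeline */
    }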

Cooperative Scans. Idea: use I/O bandwidth to satisfy multiple queries at once. Cooperative Scans introduce an Active Buffer Manager that is aware of concurrent scans on the same table. Research question: devise adaptive buffer-management strategies. Benefits: I/O bandwidth is re-used by multiple queries, and concurrent queries no longer fight for the disk arm. A naive sketch of the idea follows.
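
A very simplified C sketch (mine; the real adaptive policy is exactly the research question above): a newly started scan on a table attaches to an already-running scan instead of issuing its own reads, consumes chunks from the shared cursor position onward, and later wraps around to pick up the part of the table it missed, so one sequential disk pass can feed many queries.

    typedef struct {
        int  table_id;
        long next_chunk;       /* position of the shared sequential cursor */
        long num_chunks;
        int  active_scans;     /* queries currently attached to this scan */
    } SharedScan;

    /* returns the chunk a newly attached query should start consuming from */
    long attach_scan(SharedScan *s) {
        s->active_scans++;
        return s->next_chunk;  /* start "in the middle", wrap around later */
    }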

Cooperative Scans x100 and Cooperative Scans: >30 queries without performance degradation

X100 summary. The original MonetDB is successful in the same application areas; however, it has sub-optimal CPU utilization and is only efficient if the problem fits in RAM. X100 improves the architecture on all levels: better CPU utilization, better cache utilization, scaling to non-memory-resident datasets, and higher effective I/O bandwidth through compression and cooperative scans.

Example results. Performance close to hand-written C functions.

TPC-H SF-1    X100     Oracle    MonetDB
Q1            0.54s    30s       9.4s
Q3            0.24s    10s       2.5s
Q6            0.15s    1.5s      2.5s
Q14           0.13s    2s        1.2s

X100 status. A first proof-of-concept is implemented, and the full TPC-H benchmark executes. Future work: lots of engineering, a new buffer manager, more vectorized algorithms, memory-footprint tuning (for small devices), and an SQL front-end.

More information: CIDR'05 paper, "MonetDB/X100: Hyper-Pipelining Query Execution".

Discussion ?