1 © Stavros Harizopoulos 2006 Performance Tradeoffs in Read-Optimized Databases: from a Data Layout Perspective. Stavros Harizopoulos, MIT CSAIL. Modified by Jianlin Feng. Massachusetts Institute of Technology

2 Read-optimized databases
Column stores: Sybase IQ, MonetDB, CStore
–Layout: (1, 2, …), (Joe, Sue, …), (45, 37, …) stored separately
Row stores: SQL Server, DB2, Oracle
–Layout: (1, Joe, 45, …), (2, Sue, 37, …) stored together
Read optimizations: materialized views, multiple indices, compression
How does column-orientation affect performance?

3 Rows vs. columns
Row data: single file
–(1, Joe, 45), (2, Sue, 37), …
–Project to obtain (Joe, 45)
Column data: 3 files
–(1, 2, …), (Joe, Sue, …), (45, 37, …)
–Seek between files, then reconstruct (Joe, 45)
Study performance tradeoffs solely in data storage
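
The project and reconstruct steps on this slide can be illustrated with a minimal Python sketch. It is not the storage manager from the study: in-memory lists stand in for the single row file and the three column files, and the names `row_scan`, `column_scan`, and the attribute names `id`, `name`, `age` are hypothetical.

```python
# Two tuples taken from the values shown on the slide.
rows = [(1, "Joe", 45), (2, "Sue", 37)]

# Row store: one "file" holding whole tuples back to back.
row_file = list(rows)

# Column store: one "file" per attribute, values kept in tuple order.
# The attribute names are assumptions made for this illustration.
column_files = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}


def row_scan(row_file, wanted_positions):
    """Read every full tuple, then project the wanted positions."""
    return [tuple(t[i] for i in wanted_positions) for t in row_file]


def column_scan(column_files, wanted_names):
    """Read only the wanted column files, then reconstruct tuples by position."""
    return list(zip(*(column_files[name] for name in wanted_names)))


row_result = row_scan(row_file, wanted_positions=[1, 2])
col_result = column_scan(column_files, wanted_names=["name", "age"])
print(row_result)
print(col_result)
```

Both scans return the same tuples. The difference is in what is read: the row scan touches every attribute of every tuple, while the column scan never touches the unused `id` file.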

4 Target Questions (1)
As the number of columns accessed by a query increases, how does that affect the performance of a column store?
How is performance affected by the use of disk and L2 cache prefetching?
On a modern workstation, under what workloads are column and row stores I/O bound?

5 Target Questions (2)
How do parameters such as selectivity, number of projected attributes, tuple width, and compression affect column store performance?
How are the relative performance tradeoffs of column and row stores affected by the presence of competition for I/O and memory bandwidth along with CPU cycles from competing queries?

6 Performance study: Methodology
Built both a row- and a column-oriented storage manager from scratch
Measure their performance with an identical set of relational operators
–i.e., no column-wise optimization
Mainly consider sequential scans on the fact table in a star schema
Analyze time spent in CPU, disk, and memory

7 Performance Considerations in Read-Optimized Databases
An important goal is to minimize the number of bytes read from the disk when scanning a relation.
For a given access plan, there are two ways to achieve the goal
–Minimize unnecessary data read: dense-pack each data page
–Store data in a compressed form

8 Implementing a Read-Optimized Engine
Block-iterator operators
–Single-threaded, C++, Linux AIO
No buffer pool
–Use filesystem, bypass OS cache
Three major components
–Disk Storage for Columns and Rows
–Row and Column Table Scanners
–Query Engine and I/O Architecture
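
As a rough illustration of block-iterator operators, the sketch below chains scan, select, and project so that each operator passes a block of tuples to the next instead of one tuple at a time. The engine in the study is written in C++. This Python version, its operator names, and the `BLOCK_SIZE` value are assumptions made for illustration only.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, ...]
BLOCK_SIZE = 4  # hypothetical number of tuples per block


def scan(table: List[Row]) -> Iterator[List[Row]]:
    """Produce the table as consecutive blocks of at most BLOCK_SIZE tuples."""
    for start in range(0, len(table), BLOCK_SIZE):
        yield table[start:start + BLOCK_SIZE]


def select(blocks: Iterator[List[Row]],
           predicate: Callable[[Row], bool]) -> Iterator[List[Row]]:
    """Filter each block, skipping blocks that end up empty."""
    for block in blocks:
        kept = [row for row in block if predicate(row)]
        if kept:
            yield kept


def project(blocks: Iterator[List[Row]],
            positions: List[int]) -> Iterator[List[Row]]:
    """Keep only the requested attribute positions of every tuple."""
    for block in blocks:
        yield [tuple(row[i] for i in positions) for row in block]


# Ten made-up tuples (a1, a2, a3); the predicate keeps those with a3 == 0.
table = [(i, i * 10, i % 3) for i in range(10)]
plan = project(select(scan(table), lambda r: r[2] == 0), positions=[0, 1])
result = [row for block in plan for row in block]
print(result)
```

Working a block at a time amortizes the per-call overhead of each operator over many tuples, which is the usual motivation for this design.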

10 Compression methods
Dictionary
–Example: … ‘low’ … ‘high’ … ‘low’ … ‘normal’ … is stored as … 00 … 10 … 00 … 01 …
Bit-pack
–Pack several attributes inside a 4-byte word
–Use as many bits as the maximum value requires
Delta
–Base value per page
–Arithmetic differences
No Run-Length Encoding
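
The three schemes listed on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the encoding used in the study: the dictionary codes follow the slide's example ('low' = 00, 'normal' = 01, 'high' = 10), the bit order inside a word is an arbitrary choice, and the delta encoder takes the first value of the page as the base, which is only one possible reading of "base value per page".

```python
def dictionary_encode(values, dictionary):
    """Replace each value by its small integer code."""
    return [dictionary[v] for v in values]


def bit_pack(codes, bits):
    """Pack fixed-width codes into 32-bit words, first code in the lowest bits."""
    per_word = 32 // bits
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for slot, code in enumerate(codes[start:start + per_word]):
            word |= code << (slot * bits)
        words.append(word)
    return words


def bit_unpack(words, bits, count):
    """Recover the first `count` codes from the packed words."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    codes = []
    for word in words:
        for slot in range(per_word):
            if len(codes) == count:
                return codes
            codes.append((word >> (slot * bits)) & mask)
    return codes


def delta_encode(page):
    """Store one base value per page plus each value's difference from it."""
    base = page[0]  # assumption: the base is the first value on the page
    return base, [v - base for v in page]


def delta_decode(base, deltas):
    return [base + d for d in deltas]


dictionary = {"low": 0, "normal": 1, "high": 2}
codes = dictionary_encode(["low", "high", "low", "normal"], dictionary)
bits = max(codes).bit_length()  # as many bits as the maximum code requires
words = bit_pack(codes, bits)
base, deltas = delta_encode([1000, 1003, 1001, 1010])
print(codes, bits, words, base, deltas)
```

Here four string values end up in a single 4-byte word, and the delta-encoded page stores small differences that could themselves be bit-packed.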

12 I/O Architecture
Use the Asynchronous I/O (AIO) interface to implement
–A non-blocking prefetching mechanism
–Using the libaio library on Linux 2.6
AIO performs reads at the granularity of an I/O unit of 128KB
Depth of prefetching
–How many I/O units to prefetch
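
The idea of a prefetch depth can be sketched without libaio. The Python example below swaps in a thread pool for Linux AIO, so it shows only the control flow (keep up to `depth` reads of one I/O unit each in flight while the consumer processes the unit that just completed), not the actual mechanism used in the study. The function names and the temporary test file are hypothetical.

```python
import os
import tempfile
from collections import deque
from concurrent.futures import ThreadPoolExecutor

IO_UNIT = 128 * 1024  # I/O unit size from the slide


def read_unit(path, offset):
    """Read one I/O unit starting at the given byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(IO_UNIT)


def prefetching_scan(path, depth):
    """Yield the file in I/O units, keeping up to `depth` reads in flight."""
    offsets = iter(range(0, os.path.getsize(path), IO_UNIT))
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        # Issue the first `depth` reads before consuming anything.
        for offset in offsets:
            pending.append(pool.submit(read_unit, path, offset))
            if len(pending) == depth:
                break
        while pending:
            data = pending.popleft().result()
            # Refill before yielding so the next read overlaps with processing.
            next_offset = next(offsets, None)
            if next_offset is not None:
                pending.append(pool.submit(read_unit, path, next_offset))
            yield data


# Usage: scan a temporary file of a little over three I/O units.
payload = os.urandom(3 * IO_UNIT + 1000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
units = list(prefetching_scan(tmp.name, depth=2))
os.remove(tmp.name)
print(len(units), sum(len(u) for u in units))
```

A larger `depth` lets more data arrive per visit to a file, which matters for a column store that alternates between several column files.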

13 Platform
CPU: 3.2GHz
L2: 1MB
RAM: 1GB
DISKS (striped): 180 MB/sec, direct IO
Memory: 3.2 GB/sec
Prefetching: 100ms read, 10ms seek
L2 cache prefetching: read 128 bytes

14 Workload
LINEITEM (wide): 150 bytes, 52 bytes
–60m rows → 9.5 GB
ORDERS (narrow): 32 bytes, 12 bytes
–60m rows → 1.9 GB
Query
–SELECT a1, a2, a3, … WHERE a1 yields variable selectivity
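
A back-of-the-envelope estimate shows why the number of selected bytes matters for this workload. The sketch below uses only the table sizes on this slide and the 180 MB/sec disk bandwidth from the platform slide. It assumes a purely I/O-bound sequential scan with no seeks and no CPU cost, reads MB and GB as powers of ten, and the 25-byte selection is an arbitrary example value, so the results are rough lower bounds rather than measurements.

```python
ROWS = 60_000_000            # rows per table, from the slide
DISK_BYTES_PER_SEC = 180e6   # 180 MB/sec from the platform slide


def row_scan_io_seconds(table_bytes):
    """A row scan reads the whole table regardless of the attributes selected."""
    return table_bytes / DISK_BYTES_PER_SEC


def column_scan_io_seconds(selected_bytes_per_tuple):
    """A column scan reads only the selected attributes of every tuple."""
    return ROWS * selected_bytes_per_tuple / DISK_BYTES_PER_SEC


lineitem_row = row_scan_io_seconds(9.5e9)      # LINEITEM, 9.5 GB
orders_row = row_scan_io_seconds(1.9e9)        # ORDERS, 1.9 GB
lineitem_col_25 = column_scan_io_seconds(25)   # hypothetical 25 selected bytes

print(round(lineitem_row, 1), round(orders_row, 1), round(lineitem_col_25, 1))
```

Under these assumptions a full LINEITEM row scan needs roughly 53 seconds of disk time, while a column scan that selects 25 bytes per tuple needs roughly 8 seconds.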

15 Wide tuple: 10% selectivity
[Plot: time (sec) vs. selected bytes per tuple. Series: Row, Row (CPU only), Column, Column (CPU only). Annotations: 25B, 10B, 69B; int 4B; text; char 1B.]
Large prefetch hides disk seeks in columns

16 Wide tuple: 10% sel. (CPU)
[Plot: time (sec) vs. # attributes selected. Series: row store, column store.]
Row-CPU suffers from memory stalls

17 Wide tuple: 10% sel. (CPU), 0.1%
[Plot: time (sec) vs. # attributes selected. Series: row store, column store.]
Column-CPU efficiency with lower selectivity

18 Narrow tuple: 10% selectivity
[Plots: time (sec) vs. selected bytes per tuple, and time (sec) vs. # attributes selected. Series: row store, column store.]
Memory stalls disappear in narrow tuples
Compression: similar to narrow (not shown)

19 Varying prefetch size
[Plot, no competing disk traffic: time (sec) vs. selected bytes per tuple. Series: Row (any prefetch size), Column 48 (x 128KB), Column 16, Column 8, Column 2.]
No prefetching hurts columns in single scans

20 Varying prefetch size
[Plots: time (sec) vs. selected bytes per tuple, with no competing disk traffic and with competing disk traffic.]
No prefetching hurts columns in single scans
Under competing traffic, columns outperform rows for any prefetch size
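
One way to see why small prefetch sizes hurt a column scan is a simplified cost model, sketched below in Python. It is an assumption made for illustration and not the analysis from the study: the scanner is taken to pay one 10ms seek every time it finishes a prefetched chunk and moves to another column file, while a row scan reads a single file sequentially and pays no such seeks. The 180 MB/sec bandwidth, 10ms seek, and 128KB I/O unit come from the slides. The 16 selected bytes per tuple is a hypothetical value.

```python
IO_UNIT_BYTES = 128 * 1024    # I/O unit, from the slides
DISK_BYTES_PER_SEC = 180e6    # from the platform slide
SEEK_SEC = 0.010              # from the platform slide


def row_scan_seconds(total_bytes):
    """Single sequential file: transfer time only, for any prefetch size."""
    return total_bytes / DISK_BYTES_PER_SEC


def column_scan_seconds(total_bytes, prefetch_depth):
    """Transfer time plus one assumed seek per prefetched chunk."""
    chunk_bytes = prefetch_depth * IO_UNIT_BYTES
    switches = total_bytes / chunk_bytes
    return total_bytes / DISK_BYTES_PER_SEC + switches * SEEK_SEC


selected_bytes = 60_000_000 * 16  # hypothetical: 16 selected bytes per tuple
estimates = {depth: column_scan_seconds(selected_bytes, depth)
             for depth in (2, 8, 16, 48)}
for depth, seconds in estimates.items():
    print(depth, round(seconds, 1))
```

In this model the seek term shrinks in proportion to the prefetch depth, so the estimate for depth 48 approaches the pure transfer time while depth 2 is dominated by seeks.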

21 Conclusions
Given enough space for prefetching, columns outperform rows in most workloads
Competing traffic favors columns
Memory-bandwidth bottleneck in rows
Future work
–Column scanners, random I/O, write performance

22 References
Stavros Harizopoulos, Velen Liang, Daniel Abadi, and Samuel Madden. Performance Tradeoffs in Read-Optimized Databases. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, September 2006.

