© Stavros Harizopoulos 2006 Performance Tradeoffs in Read-Optimized Databases Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi,

Presentation transcript:

© Stavros Harizopoulos 2006, Massachusetts Institute of Technology
Slide 1: Performance Tradeoffs in Read-Optimized Databases
Stavros Harizopoulos, MIT CSAIL
Joint work with: Velen Liang, Daniel Abadi, and Sam Madden

Slide 2: Read-optimized databases
Column stores (Sybase IQ, MonetDB, CStore): each attribute is stored separately, e.g. (1, 2, …), (Joe, Sue, …), (45, 37, …)
Row stores (SQL Server, DB2, Oracle): whole tuples are stored together, e.g. (1, Joe, 45, …), (2, Sue, 37, …)
Read optimizations: materialized views, multiple indices, compression
How does column-orientation affect performance?

Slide 3: Rows vs. columns
Row data: a single file of whole tuples (1 Joe 45, 2 Sue 37, …); project to produce the output (Joe 45)
Column data: 3 files, one per attribute (1, 2, …; Joe, Sue, …; 45, 37, …); seek between the files and reconstruct the output (Joe 45)
Study performance tradeoffs solely in data storage

Slide 4: Performance study
Methodology
– Built storage manager from scratch
– Sequential scans
– Analyze CPU, disk, memory
Findings
– Columns are generally more I/O efficient
– Competing traffic favors columns
– Conditions where columns are CPU-constrained
– Conditions where rows are MemBW-constrained

Slide 5: Talk outline
System architecture
Workload and Experiments
Analysis
Conclusions

Slide 6: System architecture
Block-iterator operators
– Single-threaded, C++, Linux AIO
No buffer pool
– Use filesystem, bypass OS cache
Compression
Dense-pack
[Diagram labels: 60% full, 100% full]

Slide 7: Compression methods
Dictionary
Bit-pack
– Pack several attributes inside a 4-byte word
– Use as many bits as max-value
Delta
– Base value per page
– Arithmetic differences
[Diagram: the values ‘low’, ‘high’, ‘low’, ‘normal’ are encoded as 00, 10, 00, 01]
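The three methods on this slide can be illustrated with a minimal Python sketch. This is not the authors' C++ storage manager: the function names are hypothetical, codes are assigned in order of first appearance (so they differ from the 00/10/00/01 assignment in the diagram), and the delta scheme is assumed to store differences between consecutive values, which is only one possible reading of "arithmetic differences".

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer code (order of first appearance)."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return codes, encoded


def bits_needed(max_value):
    """Bits required to represent every integer in 0..max_value (at least 1)."""
    return max(1, max_value.bit_length())


def bit_pack(values, bits):
    """Pack fixed-width codes into 4-byte (32-bit) words, lowest bits first."""
    per_word = 32 // bits
    words = []
    for i in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[i:i + per_word]):
            word |= v << (slot * bits)
        words.append(word)
    return words


def bit_unpack(words, bits, count):
    """Recover the first `count` codes from the packed words."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for slot in range(per_word):
            if len(out) == count:
                return out
            out.append((word >> (slot * bits)) & mask)
    return out


def delta_encode(page):
    """Keep one base value per (non-empty) page plus consecutive differences.

    Assumption: differences are taken between neighbouring values.
    """
    base = page[0]
    diffs = [b - a for a, b in zip(page, page[1:])]
    return base, diffs


def delta_decode(base, diffs):
    """Rebuild the page by accumulating the differences onto the base."""
    out = [base]
    for d in diffs:
        out.append(out[-1] + d)
    return out
```

With three distinct values, two bits per code are enough, so sixteen codes fit in one 4-byte word. Delta encoding pays off when neighbouring values are close, because the differences then need fewer bits than the raw values.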

Slide 8: Storage engine
Query: SELECT name, age WHERE age > 40
Row scanner: a single scan node (S) applies the predicate(s) and emits tuples (Joe 45, …)
Column scanner: one scan node (S) per column; the age scanner applies predicate #1 and emits (#POS, 45) pairs, and the name scanner adds the matching name values to produce (Joe 45, …)
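The difference between the two scanners can be sketched in a few lines of Python. This is an illustrative reading of the diagram, not the block-iterator implementation described in the talk; all function names are hypothetical, and data lives in in-memory lists rather than files.

```python
def row_scan(rows, predicate, project):
    """Row scanner: read each whole tuple, apply the predicate, then project."""
    return [tuple(row[i] for i in project) for row in rows if predicate(row)]


def scan_with_predicate(column, predicate):
    """First column scanner: apply the predicate and emit (position, value) pairs."""
    return [(pos, value) for pos, value in enumerate(column) if predicate(value)]


def fetch_at(column, positions):
    """Later column scanner: read only the values at the qualifying positions."""
    return [column[pos] for pos in positions]


def column_query(name_col, age_col, min_age):
    """SELECT name, age WHERE age > min_age, evaluated over two separate columns."""
    matches = scan_with_predicate(age_col, lambda age: age > min_age)
    positions = [pos for pos, _ in matches]
    names = fetch_at(name_col, positions)
    ages = [age for _, age in matches]
    return list(zip(names, ages))


# Example data taken from the slides: (id, name, age).
rows = [(1, "Joe", 45), (2, "Sue", 37)]
name_col = [r[1] for r in rows]
age_col = [r[2] for r in rows]

row_result = row_scan(rows, lambda r: r[2] > 40, project=(1, 2))
col_result = column_query(name_col, age_col, 40)
```

Both paths return the same tuples. The row scanner touches every attribute of every tuple, while the column path touches the name column only at positions that passed the age predicate.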

Slide 9: Platform
CPU: 3.2GHz
L2 cache: 1MB; L2 cache prefetching reads 128 bytes
RAM: 1GB, 3.2 GB/sec
Disks (striped): 180 MB/sec, direct IO
Prefetching: 100ms read, 10ms seek

Slide 10: Workload
LINEITEM (wide)
– 60m rows → 9.5 GB
ORDERS (narrow)
– 60m rows → 1.9 GB
Tuple widths shown: 150 bytes and 50 bytes; 32 bytes and 12 bytes
Query
– SELECT a1, a2, a3, … WHERE a1 yields variable selectivity
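A back-of-the-envelope estimate shows why the number of selected bytes per tuple matters for a sequential scan. The sketch below is a simplification, not the model from the talk: it assumes a row scan reads the full tuple width, a column scan reads only the selected bytes, and it ignores seeks, page overhead, compression, and CPU cost. The 25 selected bytes are an arbitrary illustrative choice, and the helper names are hypothetical.

```python
ROWS = 60_000_000              # 60m rows, from the workload slide
WIDE_TUPLE_BYTES = 150         # wide tuple width, from the workload slide
DISK_BYTES_PER_SEC = 180_000_000  # 180 MB/sec, from the platform slide


def row_scan_bytes(n_rows, tuple_width):
    """A row scan reads every byte of every tuple."""
    return n_rows * tuple_width


def column_scan_bytes(n_rows, selected_bytes_per_tuple):
    """A column scan reads only the files of the selected attributes."""
    return n_rows * selected_bytes_per_tuple


def transfer_seconds(n_bytes, bytes_per_sec):
    """Pure transfer time at the given sequential bandwidth."""
    return n_bytes / bytes_per_sec


row_time = transfer_seconds(
    row_scan_bytes(ROWS, WIDE_TUPLE_BYTES), DISK_BYTES_PER_SEC)
col_time = transfer_seconds(
    column_scan_bytes(ROWS, 25), DISK_BYTES_PER_SEC)
```

Under these assumptions the row scan takes the same transfer time regardless of how many attributes the query selects, while the column scan's transfer time grows linearly with the selected bytes per tuple.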

Slide 11: Wide tuple: 10% selectivity
[Plot: time (sec) vs. selected bytes per tuple; series: Row, Row (CPU only), Column, Column (CPU only); attribute labels: int 4B, text, char 1B, 25B, 10B, 69B]
Large prefetch hides disk seeks in columns

Slide 12: Wide tuple: 10% sel. (CPU)
[Plot: time (sec) vs. # attributes selected; series: row store, column store]
Row-CPU suffers from memory stalls

Slide 13: Wide tuple: 10% sel. (CPU)
[Plot: time (sec) vs. # attributes selected; series: row store, column store; additional selectivity label: 0.1%]
Column-CPU efficiency with lower selectivity

Slide 14: Narrow tuple: 10% selectivity
[Plots: time (sec) vs. selected bytes per tuple and vs. # attributes selected; series: row store, column store]
Memory stalls disappear in narrow tuples
Compression: similar to narrow (not shown)

Slide 15: Varying prefetch size
[Plot, no competing disk traffic: time (sec) vs. selected bytes per tuple; series: Row (any prefetch size), Column 48 (x 128KB), Column 16, Column 8, Column 2]
No prefetching hurts columns in single scans

Slide 16: Varying prefetch size
[Plots: time (sec) vs. selected bytes per tuple; panels: no competing disk traffic, with competing disk traffic]
No prefetching hurts columns in single scans
Under competing traffic, columns outperform rows for any prefetch size

Slide 17: Analysis
Central parameter in analysis: cycles per disk byte (cpdb)
What can it model:
– More / fewer disks
– More / fewer CPUs
– CPU / disk competing traffic
Trends in cpdb:
– 10 → 30 from 1995 to 2006
– Further increase with multicore chips
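One way to read the cpdb parameter is as the ratio of available CPU cycles per second to sequential disk bytes per second. The sketch below assumes that reading; the function name and the `n_cpus` / `n_disks` scaling arguments are hypothetical, and the 180 MB/sec figure from the platform slide is treated as the total bandwidth of the disk array.

```python
def cpdb(cpu_hz, disk_bytes_per_sec, n_cpus=1, n_disks=1):
    """Cycles per disk byte: CPU cycles available for each byte read from disk.

    Assumption: cycles and bandwidth scale linearly with CPU and disk count.
    """
    return (cpu_hz * n_cpus) / (disk_bytes_per_sec * n_disks)


# Platform figures from the talk: 3.2GHz CPU, 180 MB/sec disks.
platform = cpdb(3.2e9, 180e6)

# Adding disks lowers cpdb; adding CPUs raises it.
twice_the_disks = cpdb(3.2e9, 180e6, n_disks=2)
twice_the_cpus = cpdb(3.2e9, 180e6, n_cpus=2)
```

Competing disk traffic reduces the bandwidth a scan actually sees and so raises its effective cpdb, while competing CPU work lowers it, which is how a single ratio can stand in for all three scenarios listed on the slide.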

Slide 18: Analysis
[Plot: speedup of cols over rows as a function of tuple width and cycles per disk byte (cpdb), at 10% selectivity and 50% projection; most region labels were lost in extraction, one surviving label reads 0.8]
Rows favored by narrow tuples and low cpdb
– Disk-bound workloads have higher cpdb

Slide 19: See our paper for the rest
CPU time breakdowns, L2 prefetcher
Disk prefetching implementation
Compression results
Non-pipelined column scanner
Analysis

Slide 20: Conclusions
Given enough space for prefetching, columns outperform rows in most workloads
Competing traffic favors columns
Memory-bandwidth bottleneck in rows
Future work
– Column scanners, random I/O, write performance

Slide 21: Thank you
db.csail.mit.edu/projects/cstore

Slide 22: Analysis
Parameter: what it can model
SizeFile, TupleWidth: various DB schemas
MemBytesCycle: memory bus speed
f: # of selected attributes
I: CPU work
cpdb (cycles per disk byte): more / fewer disks, more / fewer CPUs, CPU / disk competing traffic