OLTP on NVM: YMMV @andy_pavlo.

Presentation transcript:

OLTP on NVM: YMMV @andy_pavlo

Prison Life GOOD EVIL Washing Dishes Not Fighting Repentant Cafeteria Thievery Shankings Making Pruno When you’re in prison, you have to make a choice about how you want to live your life in there. On one hand, you can try to lead a good life. That means helping out with washing dishes, not getting in fights with other inmates, and generally having a conciliatory demeanor. You want the parole board to think that you are truly repentant about your crimes so that you can get out sooner. The problem with this approach is that the other inmates will see this and then you’ll probably get beat up. On the other end of the spectrum, you can continue to live a hard life in prison. This means doing things like stealing forks and other contraband from the cafeteria and then using them to make shanks to stab other inmates and take their cigarettes. You can also use the food that you steal from the cafeteria to start brewing pruno, or prison wine, in your cell. The best pruno is usually made from “fruit” and ketchup. Oh, and just as a word of advice, you want to avoid making pruno from potatoes. A lot of prisoners think that they’re going to make pruno vodka with them, but you just end up with a bout of botulism. The advantage of leading this lifestyle is that you are obviously not going to get beat up by the other prisoners, because they’re going to be afraid of you, but it also means that the parole board is going to be hard on you and you’re going to end up serving your full sentence. The trick that I learned is that you actually want to be in the middle. You want a little bit from both lifestyles. Maybe you still steal stuff from the cafeteria, but you only take the stuff that will help you make pruno, and then you share your pruno with the other prisoners. That will keep you in good graces with the parole board but also keep you from getting beaten with a piece of soap inside of a tube sock while taking a shower.

NVM OLTP DRAM SSD/HDD Lightweight CC Logical Logging Snapshots Heavyweight CC ARIES Logging Making Pruno What’s really remarkable about this philosophy is that it’s the same decision that we face when deciding what kind of DBMS architecture to use for running OLTP workloads on non-volatile memory devices. On one end of the spectrum we have DRAM-oriented systems that get really good performance using a lightweight concurrency control scheme and logical logging. But they have a longer recovery time when the system crashes, since you need to load the last snapshot that you took from disk and replay the log to get back to the state of the database that you had before the crash. Then at the other end, you have the disk-oriented systems backed by a solid-state drive or a spinning disk. With these systems, you have to use a heavyweight concurrency control scheme because any txn at any time could try to access a record that’s not in the buffer pool, and the DBMS has to go out to disk to get it. They also employ a heavyweight recovery scheme, something like ARIES, which has to write a lot of information about changes to the database out to disk. All of this takes a lot of time, so while you are waiting for a disk-oriented DBMS to process transactions you could still make a small batch of pruno in your server room. But with NVM, it’s not as black and white. The devices are going to be much, much faster than an SSD, so you probably don’t want to use the heavyweight mechanisms found in a disk-oriented system, but they’re not quite as fast as DRAM, so we can’t adopt all of the components of a DRAM-oriented system.

Overview Understand the performance characteristics of NVM to develop an optimal DBMS architecture for OLTP workloads. And that’s what our current research as part of the Big Data ISTC is all about. We’re trying to understand the performance characteristics of next-generation NVM devices so that we can design a new DBMS architecture, one that will likely borrow ideas from both the main memory-oriented DBMSs and the traditional, disk-oriented DBMSs. Now, much of this work is very preliminary. We have only been running on Intel’s NVM SDV for about a month and we’re still porting our software to work on it, but I want to give you a glimpse of what we’ve done so far and our current thoughts on where we are heading with the new system.

Intel NVM Emulator Instrumented motherboard that slows down access to the memory controller. Two execution interfaces: NUMA (NVM-only) PMFS (DRAM+NVM)

NUMA Interface – NVM-Only Virtual CPU where all memory access uses the NVM portion of DRAM. No change to application code.
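To make this concrete, here is a minimal sketch (not from the talk) of what it means for a process to draw its memory from the emulated NVM. It assumes the emulator exposes the slowed-down region as a separate NUMA node (node 1 is a made-up id) and uses libnuma; in practice no application change is needed at all, since the whole DBMS can simply be bound to that node with numactl --membind.

    /* Sketch: allocate a region directly on the emulated NVM node.
     * The node id (1) is an assumption; `numactl --membind=<node>`
     * achieves the same effect with no code change. Build with -lnuma. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA support not available\n");
            return 1;
        }
        const int nvm_node = 1;               /* assumed id of the NVM node */
        size_t size = 64UL * 1024 * 1024;     /* 64 MB region */
        char *region = numa_alloc_onnode(size, nvm_node);
        if (region == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }
        memset(region, 0, size);              /* loads/stores now see NVM latency */
        numa_free(region, size);
        return 0;
    }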

PMFS Interface – DRAM+NVM Special filesystem designed for byte-addressable NVM. Avoids overhead of traditional filesystems.
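As a rough illustration (the mount point and file name are assumptions, not from the slides), a DBMS on PMFS can memory-map a file and then read and write it with ordinary loads and stores, skipping the block-I/O path of a conventional filesystem:

    /* Sketch: byte-addressable access to NVM through a memory-mapped
     * file on PMFS. The path /mnt/pmfs/table.heap is hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t size = 4096;
        int fd = open("/mnt/pmfs/table.heap", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, (off_t)size) != 0) { perror("ftruncate"); return 1; }

        char *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(base, "tuple data");    /* update-in-place: an ordinary store */
        msync(base, size, MS_SYNC);    /* make the change durable */

        munmap(base, size);
        close(fd);
        return 0;
    }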

DBMS Architectures Disk-oriented. Main memory-oriented.

Disk-oriented DBMS Pessimistic assumption that the data a txn needs is not in memory. Based on the design assumptions made in the 1970s. Ingres (Berkeley) System R (IBM)

[Architecture diagram: Application → disk-oriented DBMS with its buffer pool in DRAM; database files and the write-ahead log (WAL) stored on PMFS.]

Memory-oriented DBMS Assume that all data fits in memory. Avoid the overhead of concurrency control + recovery. SmallBase (AT&T) Hekaton (Microsoft) H-Store/VoltDB (Me & others…)

[Architecture diagram: Application → memory-oriented DBMS with its data in memory accessed through the NUMA interface, and the command log (CMD Log) written to PMFS.]
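The command log in this architecture is logical: instead of recording physical page changes in the style of ARIES, the DBMS persists just the stored procedure name and its parameters for each committed transaction. A minimal sketch of that idea follows (the log path and record format are my own assumptions, not H-Store's actual format):

    /* Sketch of logical (command) logging: append one record per
     * committed txn containing only the procedure name and parameters.
     * The log path and text format are illustrative assumptions. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int log_fd = -1;

    static void log_open(void) {
        log_fd = open("/mnt/pmfs/command.log",
                      O_CREAT | O_WRONLY | O_APPEND, 0644);
    }

    /* Append "<txn_id> <procedure> <params>\n" and force it to stable storage. */
    static int log_command(long txn_id, const char *proc, const char *params) {
        char rec[256];
        int n = snprintf(rec, sizeof(rec), "%ld %s %s\n", txn_id, proc, params);
        if (n < 0 || n >= (int)sizeof(rec)) return -1;
        if (write(log_fd, rec, (size_t)n) != n) return -1;
        return fsync(log_fd);          /* group commit would batch these syncs */
    }

    int main(void) {
        log_open();
        log_command(42, "UpdateBalance", "acct=7 delta=-100");
        close(log_fd);
        return 0;
    }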

Experimental Evaluation Compare the DBMS architectures on the two NVM interfaces. Yahoo! Cloud Serving Benchmark (YCSB): 10 million records (~10GB) 8x database-to-memory ratio Variable skew What I want to share with you are two sets of experiments that we’ve done to evaluate the performance of this new version of H-Store. We’re going to compare the performance of H-Store with the MMAP storage manager against an installation of MySQL that we’ve tuned for OLTP workloads. We’re going to use the YCSB benchmark with 10 million records. Each record is about 1KB, so that comes out to about 10GB. For H-Store, we’re going to allow the system to allocate enough memory from PMFS to store the entire database. For MySQL, we’re going to set the buffer pool size such that only an eighth of the database fits in DRAM. This ensures that both systems are reading and writing to PMFS enough to exercise the NVM.
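The "variable skew" knob controls how concentrated the key accesses are; YCSB uses a Zipfian request distribution for this. Below is a small, deliberately naive sketch of such a skewed key generator (the constants and function names are illustrative, not YCSB's own code); theta near 1 gives high skew and smaller theta approaches a uniform workload.

    /* Sketch: Zipfian key sampling to mimic YCSB-style access skew.
     * O(n) inverse-transform sampling, for illustration only. Build with -lm. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N_KEYS 10000000L        /* ~10M records, as in the experiments */

    static double zeta;             /* normalization constant */

    static void zipf_init(long n, double theta) {
        zeta = 0.0;
        for (long i = 1; i <= n; i++) zeta += 1.0 / pow((double)i, theta);
    }

    static long zipf_sample(long n, double theta) {
        double u = (double)rand() / RAND_MAX, cum = 0.0;
        for (long i = 1; i <= n; i++) {
            cum += (1.0 / pow((double)i, theta)) / zeta;
            if (u <= cum) return i - 1;
        }
        return n - 1;
    }

    int main(void) {
        double theta = 0.99;        /* high skew; smaller theta -> less skew */
        zipf_init(N_KEYS, theta);
        for (int i = 0; i < 5; i++)
            printf("key %ld\n", zipf_sample(N_KEYS, theta));
        return 0;
    }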

Evaluated Systems. NVM-Only: H-Store (v2014), MySQL (v5.5). NVM+DRAM: H-Store + Anti-Caching (v2014), MySQL (v5.5).

[Plot: YCSB read-only workload on the NUMA interface (NVM-only) at 2x latency relative to DRAM; throughput (txn/sec) vs. skew amount (high → low) for H-Store and MySQL.]

[Plot: YCSB read-only workload on the PMFS interface (NVM+DRAM) at 2x latency relative to DRAM; throughput (txn/sec) vs. skew amount (high → low) for H-Store with anti-caching and MySQL.]

[Plot: YCSB write-heavy workload on the NUMA interface (NVM-only) at 2x latency relative to DRAM; throughput (txn/sec) vs. skew amount (high → low) for H-Store and MySQL.]

[Plot: YCSB write-heavy workload on the PMFS interface (NVM+DRAM) at 2x latency relative to DRAM; throughput (txn/sec) vs. skew amount (high → low) for H-Store with anti-caching and MySQL.]

Discussion NVM latency did not make a big difference in performance. Logging is a major bottleneck for DBMS performance on NVM; it also wears out the device quickly. MySQL wastes NVM space.

N-STORE nstore.cs.cmu.edu

N-Store First DBMS for an NVM-only operating environment. OLTP/OLAP hybrid Column-store that supports fast in-place updates. The first possible architecture that we are considering is to keep DRAM in the equation and build a hybrid system that can support high-performance OLTP transactions and longer-running analytical queries in the same DBMS. To do this, we would use DRAM as a place to store hot data in a row-oriented format. Over time, the system will migrate tuples into column-oriented storage on the NVM. This process will be completely transparent to the application. This is sort of the same idea proposed by SAP HANA, except that they keep everything in in-memory data structures. One interesting aspect of this approach is that we could explore the development of new types of indexes that store keys in different ways based on which storage layer the record resides in. For example, the keys that correspond to in-memory records would be stored in a regular B-tree, but the data that is out on the NVM could be stored in a different data structure that is more amenable to compression but slower to update.
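To sketch that hybrid-index idea in code (all names and the array-based tiers below are hypothetical stand-ins, not N-Store's actual structures): lookups probe a small, easily updatable hot tier in DRAM first, and only fall back to a sorted, read-mostly cold tier that stands in for a compressed structure on NVM.

    /* Sketch of a two-tier "hybrid" index lookup. The arrays stand in
     * for a DRAM B-tree (hot) and a compressed NVM structure (cold). */
    #include <stdbool.h>
    #include <stdio.h>

    #define HOT_MAX 4
    static long hot_keys[HOT_MAX] = {7, 42, -1, -1};   /* DRAM tier (mutable) */
    static long cold_keys[] = {3, 10, 25, 99, 150};    /* NVM tier (sorted)   */
    #define COLD_N (sizeof(cold_keys) / sizeof(cold_keys[0]))

    static bool hot_lookup(long key) {
        for (int i = 0; i < HOT_MAX; i++)
            if (hot_keys[i] == key) return true;
        return false;
    }

    static bool cold_lookup(long key) {                /* binary search */
        size_t lo = 0, hi = COLD_N;
        while (lo < hi) {
            size_t mid = (lo + hi) / 2;
            if (cold_keys[mid] == key) return true;
            if (cold_keys[mid] < key) lo = mid + 1; else hi = mid;
        }
        return false;
    }

    /* Probe the hot (DRAM) tier first, then the cold (NVM) tier. */
    static bool hybrid_lookup(long key) {
        return hot_lookup(key) || cold_lookup(key);
    }

    int main(void) {
        printf("42  -> %d\n", hybrid_lookup(42));      /* hot hit  */
        printf("99  -> %d\n", hybrid_lookup(99));      /* cold hit */
        printf("500 -> %d\n", hybrid_lookup(500));     /* miss     */
        return 0;
    }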

Justin DeBrabant Joy Arulraj Rajesh Sankaran Subramanya Dulloor Andy Pavlo Mike Stonebraker Stan Zdonik Jeff Parkhurst

END @ANDY_PAVLO