Failure-Atomic Slotted Paging for Persistent Memory

Slides:



Advertisements
Similar presentations
Crash Recovery John Ortiz. Lecture 22Crash Recovery2 Review: The ACID properties  Atomicity: All actions in the transaction happen, or none happens 
Advertisements

IDA / ADIT Lecture 10: Database recovery Jose M. Peña
Better I/O Through Byte-Addressable, Persistent Memory
Recovery CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
CSCI 3140 Module 8 – Database Recovery Theodore Chiasson Dalhousie University.
Crash Recovery. Review: The ACID properties A A tomicity: All actions in the Xaction happen, or none happen. C C onsistency: If each Xaction is consistent,
Recovery 10/18/05. Implementing atomicity Note, when a transaction commits, the portion of the system implementing durability ensures the transaction’s.
Modularizing B+-trees: Three-Level B+-trees Work Fine Shigero Sasaki* and Takuya Araki NEC Corporation * currently with 1st Nexpire Inc.
Resolving Journaling of Journal Anomaly in Android I/O: Multi-Version B-tree with Lazy Split Wook-Hee Kim 1, Beomseok Nam 1, Dongil Park 2, Youjip Won.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
Concurrency Control. Objectives Management of Databases Concurrency Control Database Recovery Database Security Database Administration.
Introduction: Memory Management 2 Ideally programmers want memory that is large fast non volatile Memory hierarchy small amount of fast, expensive memory.
ThyNVM Enabling Software-Transparent Crash Consistency In Persistent Memory Systems Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and.
Transactional Recovery and Checkpoints. Difference How is this different from schedule recovery? It is the details to implementing schedule recovery –It.
1 Transactional Flash Vijayan Prabhakaran, Thomas L. Rodeheffer, Lidong Zhou Microsoft Research, Silicon Valley Presented by Sudharsan.
NVWAL: Exploiting NVRAM in Write-Ahead Logging
The very Essentials of Disk and Buffer Management.
File System Consistency
Persistent Memory (PM)
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.

Hathi: Durable Transactions for Memory using Flash
CS 540 Database Management Systems
DURABILITY OF TRANSACTIONS AND CRASH RECOVERY
CS422 Principles of Database Systems Failure Recovery
Free Transactions with Rio Vista
Managing Multi-User Databases
Module 11: File Structure
Transactions and Reliability
Transactional Recovery and Checkpoints
J. Xu et al, FAST ‘16 Presenter: Rohit Sarathy April 18, 2017
Enforcing the Atomic and Durable Properties
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Applications (15-415) DBMS Internals- Part XIII Lecture 22, November 15, 2016 Mohammad Hammoud.
File Processing : Recovery
Faster Data Structures in Transactional Memory using Three Paths
Database Management Systems (CS 564)
Cache Memory Presentation I
Better I/O Through Byte-Addressable, Persistent Memory
Lecture 11: DMBS Internals
Lecture 10: Buffer Manager and File Organization
Persistency for Synchronization-Free Regions
Disk Storage, Basic File Structures, and Buffer Management
CRASH RECOVERY (CHAPTERS 14, 16) (Joint Collaboration with Prof
Database Implementation Issues
Database Recovery Techniques
Disk storage Index structures for files
Introduction to Database Systems
Database Applications (15-415) DBMS Internals- Part XIII Lecture 25, April 15, 2018 Mohammad Hammoud.
Free Transactions with Rio Vista
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Outline Introduction Background Distributed DBMS Architecture
Chapter 13: Data Storage Structures
DATABASE IMPLEMENTATION ISSUES
Contents Memory types & memory hierarchy Virtual memory (VM)
Lecture 20: Intro to Transactions & Logging II
CS222p: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
ICOM 5016 – Introduction to Database Systems
Database Recovery 1 Purpose of Database Recovery
File Organization.
Database Applications (15-415) DBMS Internals- Part XIII Lecture 24, April 14, 2016 Mohammad Hammoud.
Database Implementation Issues
SQL Statement Logging for Making SQLite Truly Lite
Concurrency Control.
Chapter 13: Data Storage Structures
Chapter 13: Data Storage Structures
Database Implementation Issues
Presentation transcript:

Failure-Atomic Slotted Paging for Persistent Memory ASPLOS’17, Failure-Atomic Slotted Paging for Persistent Memory Jihye Seo, Wook-Hee Kim, Woongki Baek, Beomseok Nam, Sam H. Noh UNIST (Ulsan National Institute of Science and Technology)

Contents Introduction Slotted Page Structure in DBMS Leveraging Persistent Memory in DBMS Slotted Page Structure in DBMS Failure-Atomic Slotted Paging Failure-atomic in-place commit scheme Failure-atomic slot-header logging scheme Evaluation Conclusion

Persistent Memory (PM) Persistent memory is expected to replace both DRAM & NAND NAND STT-MRAM PCM DRAM Non-volatility o x Read (ns) 2.5 X 104 5 - 30 20 – 70 10 Write (ns) 2 X 105 10 - 100 150 - 220 Byte-addressable Density 185.8 Gbit/cm2 0.36 Gbit/cm2 13.5 Gbit/cm2 9.1 Gbit/cm2 K. Suzuki and S. Swanson. “A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014”, IMW 2015 Non-volatile High performance Persistent Memory

How can DBMS benefit from persistent memory? Traditional Database Management System Rely on redundant write operations Copies to volatile memory first and then to persistent secondary storage Query Volatile (DRAM) Buffer Cache Block Device Storage update(EAST) N O R T H N O R T H DB File WAL File Volatile Memory

How can DBMS benefit from persistent memory? Traditional Database Management System Rely on redundant write operations Copies to volatile memory first and then to persistent secondary storage Query Volatile (DRAM) Buffer Cache Block Device Storage update(EAST) N O R T H E O R T H E A R T H E A S T H E A S T N O R T H DB File WAL File

How can DBMS benefit from persistent memory? Traditional Database Management System Rely on redundant write operations Copies to volatile memory first and then to persistent secondary storage Query Volatile (DRAM) Buffer Cache Block Device Storage update(EAST) commit E A S T E A S T N O R T H DB File WAL File

How can DBMS benefit from persistent memory? Traditional Database Management System Rely on redundant write operations Copies to volatile memory first and then to persistent secondary storage Query Volatile (DRAM) Buffer Cache Block Device Storage update(EAST) commit checkpointing E A S T N O R T H DB File E A S T Redundant Copies required in legacy DBMS WAL File

How can DBMS benefit from persistent memory? Considering PM as main memory Do we need a Log file when DB buffer cache is non-volatile? Query Persistent (PM) Buffer Cache Block Device Storage update(EAST) commit Goal Let’s keep a single copy and eliminate any redundant write operations E A S T N O R T H DB File E A S T WAL File

How can DBMS benefit from persistent memory? Challenge How to guarantee ACID with persistent buffer caching? E.g.) Updates must be invisible until the transaction commits Query Persistent (PM) Buffer Cache Block Device Storage update(EAST) N O R T H N O R T H DB File E A R T H If system crash is caused, committed data is lost!

Contents Introduction Slotted Page Structure The most widely used database page format for variable-length records Failure-Atomic Slotted Paging Failure-atomic in-place commit scheme Failure-atomic slot-header logging scheme Evaluation Conclusion

Slotted Page Structure Slot Header # of records record offset array for the ordering of keys Record Content Area record contents (Append only) Free Space Slot Header Metadata Free space Free space 1024 Number of Records

Slotted Page Structure Slot Header # of records record offset array for the ordering of keys Record Content Area record contents (Append only) Free Space Sorted Keys 30 Slot Header Slot Header Metadata Metadata Record Offset Array Record Content Area 1 Free space 1000 Free space Free space Key = 30 1000 1024 Number of Records

Slotted Page Structure Slot Header # of records record offset array for the ordering of keys Record Content Area record contents (Append only) Free Space Sorted Keys 30 50 Slot Header Slot Header Metadata Record Offset Array Record Offset Array Record Content Area Record Content Area 1 2 1000 900 Free space Free space Key = 50 Key = 30 900 1000 1024 Number of Records

Slotted Page Structure Slot Header # of records record offset array for the ordering of keys Record Content Area record contents (Append only) Free Space 40 Sorted Keys 30 50 Slot Header Slot Header Metadata Record Offset Array Record Offset Array Record Content Area Record Content Area 1 2 3 1000 1000 1000 900 900 800 Free space Free space Free space Free space Key = 40 Key = 50 Key = 30 800 900 1000 1024 Number of Records

Contents Introduction Slotted Page Structure Failure-Atomic Slotted Paging Failure-atomic in-place commit scheme Failure-atomic slot-header logging scheme Evaluation Conclusion

Failure-atomic In-place Commit Scheme Single write operation guarantees the failure-atomicity for DB When a record (key = 20) is inserted 900 1000 Slot Header 900 1000 Free space Record Content Area Key = 10 Key = 30 2 invisible Free space Key = 20 800 ① Writing the record into the record content area Since the slot header is not updated, the dirty record is invisible.

Failure-atomic In-place Commit Scheme Single write operation guarantees the failure-atomicity for DB When a record (key = 20) is inserted 900 1000 Slot Header 900 Free space Record Content Area Key = 10 Key = 30 3 1000 visible 800 Free space Free space Key = 20 800 ② Writing to the slot header Slot header is used as a commit mark PM is expected to guarantee failure-atomic writes of 8-bytes. How do we atomically update the slot header?

Failure-atomic In-place Commit Scheme Hardware Transactional Memory (HTM) We guarantee failure-atomic cache line write operation using HTM Persistent Memory Dirty Record of Slotted Page Slot Header DB Buffer Cache Write combining store buffer (64B) in Private L1 Cache Insert XBEGIN Slot header XEND 3 CLWBs/ MFENCE CLWB/ MFENCE

What if multiple pages are modified? A DB transaction may modify multiple records in multiple tables With HTM, we guarantee the failure-atomic write operations for only a single cache line When a transaction updates multiple pages, in-place commit scheme is not enough for inserting data atomically Bank account Transaction Seller_account += 100; Buyer_account -= 100; Sales_item_in_stop -= 1; Sales item

Failure-atomic Slot-header Logging Scheme Example: Move data with key 20 from page A to page B Page A 800 900 1000 Slot Header 900 800 Record Content Area Key = 10 Key = 30 Key = 20 Free space 1000 3 Page B 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2

Failure-atomic Slot-header Logging Scheme Example: Move data with key 20 from page A to page B Page A 800 900 1000 Slot Header 900 800 Record Content Area Key = 10 Key = 30 Key = 20 Free space 1000 3 Page B ① Writing the record 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2 Key = 20 invisible

Failure-atomic Slot-header Logging Scheme Example: Move data with key 20 from page A to page B Page A ② Updating the slot header A 800 900 1000 Slot Header 900 1000 Record Content Area Key = 10 Key = 30 Key = 20 Free space 2 invisible Page B 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2 Key = 20 invisible

Failure-atomic Slot-header Logging Scheme Example: Move data with key 20 from page A to page B Page A 800 900 1000 Slot Header 900 1000 Record Content Area Key = 10 Key = 30 Key = 20 Free space 2 invisible Page B ③ Updating the slot header B 800 900 1000 Slot Header 800 900 Record Content Area Key = 50 Key = 40 Free space 3 1000 Key = 20 visible

Failure-atomic Slot-header Logging Scheme Same example in system failure Page A 800 900 1000 Slot Header 900 800 Record Content Area Key = 10 Key = 30 Key = 20 Free space 1000 3 Page B 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2

Failure-atomic Slot-header Logging Scheme Same example in system failure Page A 800 900 1000 Slot Header 900 800 Record Content Area Key = 10 Key = 30 Key = 20 Free space 1000 3 Page B ① Writing the record 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2 Key = 20 invisible

Failure-atomic Slot-header Logging Scheme Same example in system failure Page A ② Updating the slot header A 800 900 1000 Slot Header 900 1000 Record Content Area Key = 10 Key = 30 Key = 20 Free space 2 invisible If system crash is caused in the second step, data whose key is 20 is lost! Page B 900 1000 Slot Header 1000 900 Record Content Area Key = 50 Key = 40 Free space 2 Key = 20 invisible

Failure-atomic Slot-header Logging Scheme We propose small slot-header logging w/o duplicating the records Persistent (PM) Buffer Cache A 3 20 10 30 B 2 20 50 40 dirty record Slot Header Log A 2 B 3 commit

Failure-atomic Slot-header Logging Scheme We propose small slot-header logging w/o duplicating the records Persistent (PM) Buffer Cache A 3 20 10 30 10 30 B 2 20 50 40 20 50 40 dirty record Slot Header Log A 2 B 3 commit

Failure-atomic Slot-header Logging Scheme We propose small slot-header logging w/o duplicating the records How slot-header logging works CLWB/ MFENCE Persistent Memory Dirty Slotted Page A Dirty Slotted Page B Slot Header of Page A Slot Header of Page B DB Buffer Cache Slot-Header Log A B 2 3 commit Insert() logHeader() logHeader() commit() checkpoint() Recovery Ignore dirty records Ignore slot-header logs Do checkpointing

Contents Introduction Slotted Page Structure Failure-Atomic Slotted Paging Failure-atomic in-place commit scheme Failure-atomic slot-header logging scheme Evaluation Conclusion

Experimental Environment Testbed Intel Xeon Haswell-EX E7-8860 v3 (2.2GHz) 256 GB DDR3 Memory We implemented Failure-Atomic Slotted Paging in SQLite 3.8 FASH : Failure-Atomic Slot-Header logging FAST : Failure-Atomic Slot-header with in-place commiT with HTM(RTM) We compared FASH and FAST with NVWAL* NVWAL proposed for use in hybrid memory (DRAM+PM) We used software persistent memory emulator - Quartz for read latency For write latency, we injected additional delay after clflush instruction * NVWAL: Exploiting NVRAM in Write-Ahead-Logging. ASPLOS, 2016.

Experimental Environment Differences between FAST, FASH and NVWAL NVWAL FASH FAST Single page update Differential logging Slot-header logging In-place commit Multiple page update Buffer cache In DRAM In PM Log DRAM Persistent Storage Hybrid memory architecture PM-only architecture Persistent Memory DB File WAL File Volatile Buffer Cache Persistent Buffer Cache VS

Breakdown of Time Spent for B-tree Insertion 2.1x 2.6x Both failure-atomic slotted paging schemes outperform NVWAL NVWAL incurs considerable commit overhead due to multiple copies

Breakdown of Time Spent for B-tree Insertion 1.2x Both failure-atomic slotted paging schemes outperform NVWAL NVWAL incurs considerable commit overhead due to multiple copies FAST is faster than FASH FAST doesn’t have logging overhead if overflow or underflow does not happen

Insertion Performance FAST and FASH consistently outperform NVWAL FAST and FASH do not duplicate write operations for records NVWAL generates large log frames for large records FASH calls more clflush instructions for small record sizes The reason is that with smaller records, the slotted-page can hold more records FAST calls about 3 clflush instructions when the record is smaller than 64 bytes The slot-header size of FAST must be less than 64bytes.

Insertion Performance FAST and FASH consistently outperform NVWAL FAST and FASH do not duplicate write operations for records NVWAL generates large log frames for large records FASH calls more clflush instructions for small record sizes The reason is that with smaller records, the slotted-page can hold more records FAST calls about 3 clflush instructions when the record is smaller than 64 bytes The slot-header size of FAST must be less than 64bytes

Transaction Throughput 31% 15% 33% Real world effect – mobibench Compared to NVWAL, the transaction throughput of FASH and FAST are 31% and 33% higher, respectively These results show database transactions are less sensitive to PM latency

Contents Introduction Slotted Page Structure Failure-Atomic Slotted Paging Failure-atomic in-place commit scheme Failure-atomic slot-header logging scheme Evaluation Conclusion

Conclusion We proposed a novel “failure-atomic slotted paging” for PM In-place commit scheme Minimizes redundant write operations Slot-header logging scheme Eliminates unnecessary redundant copies Failure-atomic slotted paging outperforms NVWAL In-place commit scheme shows optimal performance Only 3 cache line flushes for database transactions that insert a single record ( < 64B ) Slot-header logging reduces the logging overhead At least 1/4 compared to NVWAL PM-only memory systems can perform faster than hybrid memory systems that consist of both PM and DRAM

Data Intensive Computing Lab Thank You Data Intensive Computing Lab http://dicl.unist.ac.kr/ jihye@unist.ac.kr