A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions Youyou Lu 1, Jiwu Shu 1, Jia Guo 1, Shuai Li 1, Onur Mutlu 2 LightTx:

Slides:



Advertisements
Similar presentations
Paper by: Yu Li, Jianliang Xu, Byron Choi, and Haibo Hu Department of Computer Science Hong Kong Baptist University Slides and Presentation By: Justin.
Advertisements

Crash Recovery John Ortiz. Lecture 22Crash Recovery2 Review: The ACID properties  Atomicity: All actions in the transaction happen, or none happens 
CS 440 Database Management Systems Lecture 10: Transaction Management - Recovery 1.
Trading Flash Translation Layer For Performance and Lifetime
Loose-Ordering Consistency for Persistent Memory
CSCI 3140 Module 8 – Database Recovery Theodore Chiasson Dalhousie University.
Chapter 19 Database Recovery Techniques
Log-Structured Memory for DRAM-Based Storage Stephen Rumble, Ankita Kejriwal, and John Ousterhout Stanford University.
Chapter 11: File System Implementation
G Robert Grimm New York University Recoverable Virtual Memory.
Boost Write Performance for DBMS on Solid State Drive Yu LI.
ICS (072)Database Recovery1 Database Recovery Concepts and Techniques Dr. Muhammad Shafique.
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Chapter 19 Database Recovery Techniques. Slide Chapter 19 Outline Databases Recovery 1. Purpose of Database Recovery 2. Types of Failure 3. Transaction.
G Robert Grimm New York University Recoverable Virtual Memory.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Remote Backup Systems.
Solid State Drive Feb 15. NAND Flash Memory Main storage component of Solid State Drive (SSD) USB Drive, cell phone, touch pad…
A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions Youyou Lu 1, Jiwu Shu 1, Jia Guo 1, Shuai Li 1, Onur Mutlu 2 LightTx:
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
1 The Google File System Reporter: You-Wei Zhang.
Logging in Flash-based Database Systems Lu Zeping
JOURNALING VERSUS SOFT UPDATES: ASYNCHRONOUS META-DATA PROTECTION IN FILE SYSTEMS Margo I. Seltzer, Harvard Gregory R. Ganger, CMU M. Kirk McKusick Keith.
Speaker: 吳晋賢 (Chin-Hsien Wu) Embedded Computing and Applications Lab Department of Electronic Engineering National Taiwan University of Science and Technology,
Optimized Transaction Time Versioning Inside a Database Engine Intern: Feifei Li, Boston University Mentor: David Lomet, MSR.
Recovery System By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
The Design and Implementation of Log-Structure File System M. Rosenblum and J. Ousterhout.
Resolving Journaling of Journal Anomaly in Android I/O: Multi-Version B-tree with Lazy Split Wook-Hee Kim 1, Beomseok Nam 1, Dongil Park 2, Youjip Won.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings A. gupta, Y. Kim, B. Urgaonkar, Penn State ASPLOS.
Design of Flash-Based DBMS: An In-Page Logging Approach Sang-Won Lee and Bongki Moon Presented by Chris Homan.
Embedded System Lab. Jung Young Jin The Design and Implementation of a Log-Structured File System D. Ma, J. Feng, and G. Li. LazyFTL:
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 12: File System Implementation File System Structure File System Implementation.
Design of Flash-Based DBMS: An In-Page Logging Approach Sang-Won Lee and Bongki Moon Presented by RuBao Li, Zinan Li.
Operating Systems CMPSC 473 I/O Management (3) December 07, Lecture 24 Instructor: Bhuvan Urgaonkar.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
CS333 Intro to Operating Systems Jonathan Walpole.
Outline for Today Journaling vs. Soft Updates Administrative.
연세대학교 Yonsei University Data Processing Systems for Solid State Drive Yonsei University Mincheol Shin
Lecture 22 SSD. LFS review Good for …? Bad for …? How to write in LFS? How to read in LFS?
Lecture 20 FSCK & Journaling. FFS Review A few contributions: hybrid block size groups smart allocation.
JOURNALING VERSUS SOFT UPDATES: ASYNCHRONOUS META-DATA PROTECTION IN FILE SYSTEMS Margo I. Seltzer, Harvard Gregory R. Ganger, CMU M. Kirk McKusick Keith.
Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group.
XIP – eXecute In Place Jiyong Park. 2 Contents Flash Memory How to Use Flash Memory Flash Translation Layers (Traditional) JFFS JFFS2 eXecute.
Application-Managed Flash
 The emerged flash-memory based solid state drives (SSDs) have rapidly replaced the traditional hard disk drives (HDDs) in many applications.  Characteristics.
1 Transactional Flash Vijayan Prabhakaran, Thomas L. Rodeheffer, Lidong Zhou Microsoft Research, Silicon Valley Presented by Sudharsan.
CPSC 426: Building Decentralized Systems Persistence
Jun-Ki Min. Slide Purpose of Database Recovery ◦ To bring the database into the last consistent stat e, which existed prior to the failure. ◦
File System Consistency

Remote Backup Systems.
FlashTier: A Lightweight, Consistent and Durable Storage Cache
Failure-Atomic Slotted Paging for Persistent Memory
Jonathan Walpole Computer Science Portland State University
Shiqin Yan, Huaicheng Li, Mingzhe Hao,
HashKV: Enabling Efficient Updates in KV Storage via Hashing
CS 632 Lecture 6 Recovery Principles of Transaction-Oriented Database Recovery Theo Haerder, Andreas Reuter, 1983 ARIES: A Transaction Recovery Method.
Address-Value Delta (AVD) Prediction
Overview Continuation from Monday (File system implementation)
Overview: File system implementation (cont)
Recovery System.
Parallel Garbage Collection in Solid State Drives (SSDs)
File System Implementation
Remote Backup Systems.
Design Tradeoffs for SSD Performance
The Design and Implementation of a Log-Structured File System
Presentation transcript:

A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions Youyou Lu 1, Jiwu Shu 1, Jia Guo 1, Shuai Li 1, Onur Mutlu 2 LightTx: 1 Tsinghua University 2 Carnegie Mellon University

Problem: Flash-based SSDs support transactions naturally (with out-of-place updates) but inefficiently: – Only a limited set of isolation levels are supported (inflexible) – Identifying transaction status is costly (heavyweight) Goal: a lightweight design to support flexible transactions Observations and Key Ideas: – Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering => (Flexibility) make commit protocol page-independent – Transactions have birth and death, and the near-logged update way enables efficient tracking => (Lightweight) track recently updated flash blocks, and retire the dead transactions Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead Executive Summary 2

3 FTL (Flash Translation Layer) Address mapping, garbage collection, wear leveling Out-of-place Update (address mapping) Pages are updated to new physical pages instead of overwriting original pages Internal Parallelism New pages are allocated from different pkgs/planes Page metadata (OOB): ( )Bytes SSD Basics

4 Simultaneous updates and FTL ordering (Out-of-place update) pages for the same LBA can be updated simultaneously (Ordering in mapping table) Only when the mapping table is updated, the write is visible to the external Near-logged update way Pages are allocated from blocks over different parallel units Pages are sequentially allocated from each block Block Two Observations

Outline Executive Summary Background Traditional Software Transactions Existing Hardware Transactions LightTx Design Evaluation Conclusions 5

HDD SSD Traditional S/W Transaction 6 Logical View FTL Mapping Table Transaction: Atomicity and Durability Software Transaction – Duplicate writes – Synchronization for ordering Data AreaLog Area Data AreaLog Area

7 We have both old and new versions in the SSD (out-of-place update). Why shall we write the log? Why not support transactions inside the SSD?

8 Existing H/W Approaches Atomic-Write [HPCA’11] – Log-structured FTL – Commit protocol: Tag the last page “1”, while the others “0” – Limited Parallelism: one tx at a time – High mapping persistence overhead: persistence on each commit SCC/BPCC (Cyclic commit protocols) [OSDI’08] – Commit Protocol: Link all flash pages in a cyclic list by keeping pointers in page metadata – High overhead in differentiate broken cyclic lists for partial erased committed txs and aborted txs – SCC forces aborted pages erased before writing the new one – BPCC delays the erase of pages to its previous aborted pages are erased – Limited Parallelism: txs without overlapped accesses are allowed [HPCA’11] Beyond block i/o: Rethinking traditional storage primitives [OSDI’08] Transactional flash

Problems: Tx support is inflexible (limited parallelism) – Cannot meet the flexible demands from software – Cannot fully exploit the internal parallelism of SSDs Tx state tracking causes high overhead in the device Our Goal: A lightweight design to support flexible transactions 9

Outline Executive Summary Background LightTx Design Design Overview Page Independent Commit Protocol Zone-based Transaction State Tracking Evaluation Conclusions 10

Goal Flexible Page-independent commit protocol: support simultaneous updates, to enable flexible isolation level choices in the system Lightweight Zone-based transaction state tracking scheme: track only blocks that have live txs and retire the dead ones, to reduce lower the cost A lightweight design to support flexible transactions 11

Page-independent Commit Protocol Observations: – Simultaneous Updates (Out-of-place update) – Version order (FTL mapping table) A BC D E Version number Page How to support this? – Extend the storage interface – Make commit protocol page-independent 12

Design Overview Transaction Primitive – BEGIN( TxID ) – COMMIT( TxID ) – ABORT( TxID ) – WRITE( TxID, LBA, len …) 13

Page-independent Commit Protocol Transactional metadata: – TxID – TxCnt: (00…0N) – TxVer: commit sequence Keep it in the page metadata of each flash page T0, 0 T0, 5 T1, 0 T1, 2 T3, 1 A BC D E Version number Page 14

Zone-based Transaction State Tracking Transaction Lifetime Writes appended in the free flash blocks – Retire the dead: write back the mapping table, and remove the dead from tx state maintenance – Track the recently updated flash blocks Can we write back the mapping back for each commit? -Ordering cost (waiting for mapping table persistence) -Mapping persistent is not atomic 15

Block Zones – Free block: all pages are free – Available block: pages are available for allocation – Unavailable block: all pages have been written to but some pages belong to (1) a live tx, or (2) a dead tx but has at least one page in some available block – Checkpointed block: all pages have been written to and all pages belong to dead txs Respectively, we have Free, Available, Unavailable and Checkpointed Zones. 16

Checkpoint – Periodically write back the mapping table (making the txs dead) – And, sliding the zones (available + unavailable) Zone Sliding – Check all blocks in available and unavailable zones Move the block to the checkpointed zone if the block is checkpointed Move the block to the unavailable zone if the block is unavailable – Pre-allocate free blocks to the available zone – Garbage collection is only performed on the checkpointed zone 17

(1) Available Zone Updating Tx4, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, Tx3 Tx4 Tx Tx3 18

(2) Zone Sliding Tx4, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, Tx3 Tx4 Tx5 Tx4, 5Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0Tx8, Tx6 Tx7 Tx8 Tx Tx8, 0 Abort Tx Tx Tx Tx4 7-3 Tx7, 2 Tx8, 3 Tx9, Tx7 Tx Tx4, 0 Tx5,

(3) Zone Sliding Tx4, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, Tx3 Tx4 Tx5 Tx4, 5Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0Tx8, Tx6 Tx7 Tx8 Tx Tx8, 0 Abort Tx Tx Tx Tx4 7-3 Tx7, 2 Tx8, 3 Tx9, Tx7 Tx Tx4, 0 Tx5, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, 0 Tx4, 5Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0 Tx8, 0 Tx4, 0 Tx5, 0 20

(4) System Failure Tx4, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, Tx3 Tx4 Tx5 Tx4, 5Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0Tx8, Tx6 Tx7 Tx8 Tx Tx8, 0 Abort Tx Tx Tx Tx4 7-3 Tx7, 2 Tx8, 3 Tx9, Tx7 Tx Tx4, 0 Tx5, 0 Tx3, 0 Tx4, 0 Tx3, 0 Tx3, 4 Tx5, 0 Tx4, 5Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0 Tx8, 0 Tx4, 0 Tx5, Tx10 Tx Tx Tx11, 0 Tx10, 0 Tx11, 3 Tx11, 0 Tx9,

Recovery Tx3, 0 Tx4, 0 Tx4, 5 Tx3, 0 Tx3, 4 Tx5, 0 Tx6, 0 Tx7, 0 Tx9, 0 Tx8, 0 Tx7, 2 Tx8, 3 Tx11, 0 Tx10, 0 Tx11, 3 Tx11, 0 Tx9, 0 Tx3, 0 Tx4, 0 Tx4, 5 Tx3, 0 Tx3, 4 Tx5, 0 Tx7, 0 Tx9, 0 Tx8, Tx7 Tx8 Tx Tx10 Tx Scan the available zone Scan the unavailable zone 22

Recovery – Scan the available zone If TxCnt matches, completed tx If not, add the tx to the pending list – Scan the unavailable zone If TxID in the pending list, check TxCnt again. If TxCnt matches, completed tx If txID not in the pending list, discard it If TxCnt still doesn’t match, uncompleted tx – Replay with the sequence of TxVer 23

Outline Executive Summary Background LightTx Design Evaluation Conclusions 24

Experimental Setup SSD simulator – SSD add-on from Microsoft on DiskSim – Parameters from Samsung K9F8G08UXM NAND flash Trace – TPC-C benchmark: DBT2 on PostgreSQL

Flexibility 26 (1) For a given isolation level, LightTx provides as good or better tx throughput than other protocols. (2) In LightTx, no-page-conflict and serialization isolation improve throughput by 19.6% and 20.6% over strict isolation.

Garbage Collection Cost 27 (1) LightTx significantly outperforms SCC/BPCC when abort ratio is not zero. (2)Garbage collection overhead in SCC/BPCC goes extremely high when abort ratio goes up.

Recovery Time and Persistence Overhead 28 LightTx achieves fast recovery with low mapping persistence overhead.

Outline Executive Summary Background LightTx Design Evaluation Conclusions 29

Conclusion 30 Problem: Flash-based SSDs support transactions naturally (with out-of-place updates) but inefficiently: – Only a limited set of isolation levels are supported (inflexible) – Identifying transaction status is costly (heavyweight) Goal: a lightweight design to support flexible transactions Observations and Key Ideas: – Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering => (Flexibility) make commit protocol page-independent – Transactions have birth and death, and the near-logged update way enables efficient tracking => (Lightweight) track recently updated flash blocks, and retire the dead transactions Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead

A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions Youyou Lu 1, Jiwu Shu 1, Jia Guo 1, Shuai Li 1, Onur Mutlu 2 LightTx: 1 Tsinghua University 2 Carnegie Mellon University Thanks 31

32 Backup Slides

Existing Approaches (1) Atomic Writes – Log-structured FTL – Transaction state: [HPCA’11] Beyond block i/o: Rethinking traditional storage primitives + No logging, no commit record + No tx state maintenance cost - Poor parallelism - Mapping persistence overhead 33

Existing Approaches (2) A BC D E Version number Page SCC/BPCC (Cyclic commit protocol) – Use pointers in the page metadata to put all pages in a cycle for each tx [OSDI’08] Transactional flash 34

SCC – Block erase forced for aborted pages A BC D E Version number Page - Low garbage collection efficiency: lots of data moves due to forced block erase 35

SRS(D3) = {C2, D2} SRS(E3) = {D2} SRS(D4) = {C2, D2, D3} A BC D E Version number Page BPCC – SRS: Straddle Responsibility Set – Erasable only after SRS is empty - Complex and costly SRS updates - Low garbage collection efficiency: wait until SRS is empty 36

+ No logging, no commit record + Improved parallelism - Limited parallelism - High tx state maintenance overhead + No logging, no commit record + No tx state maintenance overhead - Poor parallelism - Mapping persistence overhead Problems: Tx support is inflexible (limited parallelism) – Cannot meet the flexible demands from software – Cannot fully exploit the internal parallelism of SSDs Tx state tracking causes high overhead in the device A lightweight design to support flexible transactions Atomic Writes SCC/BPCC 37