Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group.


1 Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group

2 Introduction. SSDs expose the same block-level APIs as disks: a lost opportunity. Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases.

3 Idea: Transactional Flash (TxFlash). An SSD with new features. Addressing: a linear array of pages. Supports read and write operations. Supports a simple transactional construct: each transaction consists of a series of write operations, with atomicity, isolation, and durability.

4 Why is this useful? A transaction abstraction is required in many places: file system journals, etc. Each application implements its own, which raises issues of complexity, redundant work, and reliability of the implementation. It would be great if the storage layer provided a transactional API.

5 Previous Work: disk-based. Copy-on-write + logging. Fragmentation → poor read performance. Checkpointing and cleaning: cleaning cost. SSDs mitigate these problems: SSDs already do CoW for flash-related reasons, and random read accesses are fast.

6 Outline: Introduction; The Case for TxFlash; Commit Protocols; Implementation; Evaluation; Conclusion

7 TxFlash Architecture & APIs. WriteAtomic(p1…pn): p1…pn are in a transaction; it is followed by write(p1)…write(pn) and provides atomicity, isolation, and durability. Abort: aborts the in-progress transaction. While a transaction is in progress, conflicting writes must not be issued. Core of TxFlash
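The interface on this slide can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class name `TxFlashSketch` and its methods are hypothetical, the model tracks a single transaction at a time, and it leaves conflict avoidance to the caller, as the slide requires. It says nothing about how the real device stores data.

```python
class TxFlashSketch:
    """Toy model of WriteAtomic/Abort (hypothetical names, not the device API)."""

    def __init__(self):
        self.pages = {}       # committed contents: page number -> data
        self.pending = {}     # writes received for the in-progress transaction
        self.declared = None  # pages named by the current WriteAtomic, if any

    def write_atomic(self, page_numbers):
        # Assumption: only one transaction may be in progress at a time.
        if self.declared is not None:
            raise RuntimeError("a transaction is already in progress")
        self.declared = set(page_numbers)
        self.pending = {}

    def write(self, page, data):
        if self.declared is None or page not in self.declared:
            self.pages[page] = data  # plain, non-transactional write
            return
        self.pending[page] = data
        if set(self.pending) == self.declared:
            # Every declared page has arrived: make them visible together.
            self.pages.update(self.pending)
            self.declared, self.pending = None, {}

    def abort(self):
        # Drop the in-progress transaction; committed pages are untouched.
        self.declared, self.pending = None, {}

    def read(self, page):
        return self.pages.get(page)
```

In this model a reader never sees a partial transaction: the old contents stay visible until the last declared page has been written.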

8 Simple Interface. WriteAtomic: multi-page writes; useful for file systems. Not full-fledged transactions: no reads within a transaction. Reduces complexity. Backward compatible.

9 Flash is good for this purpose. Copy-on-write: already supported by the FTL. Fast random reads. High concurrency: multiple flash chips inside. New device: a new interface is more likely.

10 Outline: Introduction; The Case for TxFlash; Commit Protocols; Implementation; Evaluation; Conclusion

11 Traditional Commit. First write to a log: intention record (data, page# & version#, transaction ID) … intention record, then a commit record. A transaction is committed if and only if its commit record exists. Intention records → modify the original data. Once the modifications are done, the records can be garbage collected.
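The commit rule on this slide (a transaction is committed exactly when its commit record exists) can be sketched as a replay over the log. The record layout below (dictionaries with `kind`, `tx`, `page`, `version`, `data` keys) and the function name are assumptions made for illustration only.

```python
def committed_updates(log):
    """Return {page: data} for transactions whose commit record is in the log.

    `log` is a list of dicts; this layout is a hypothetical stand-in for the
    intention and commit records named on the slide.
    """
    committed = {rec["tx"] for rec in log if rec["kind"] == "commit"}
    latest = {}  # page -> (version, data) of the newest committed intention
    for rec in log:
        if rec["kind"] != "intention" or rec["tx"] not in committed:
            continue  # intentions without a commit record are ignored
        page = rec["page"]
        if page not in latest or rec["version"] > latest[page][0]:
            latest[page] = (rec["version"], rec["data"])
    return {page: data for page, (_, data) in latest.items()}
```

An intention whose transaction never wrote its commit record contributes nothing, which is why the commit record sits on the critical path.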

12 Traditional Commit on SSDs. Optimizations: all writes can be issued in parallel; the original data is not updated, only the remap table. Problem: the commit record adds extra latency after the other writes. Garbage collection is complicated: it must know whether all the updates have completed.

13 New Proposal (1): Simple Cyclic Commit. No commit record. Intention records of the same transaction use next links to form a cycle: (data, page# & version#, next page# & version#). A transaction is committed if and only if all its intention records are written. A flash page (4KB) and its metadata (128B) are co-located.
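A minimal sketch of the cycle idea, assuming a hypothetical `Intention` record and helper names: each record points at the next one in the transaction, the last points back at the first, and the commit test simply checks that every record on the cycle reached storage. It deliberately ignores the ambiguity that the following slides address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    page: int
    version: int
    next_page: int
    next_version: int
    data: bytes


def make_cycle(writes):
    """Link (page, version, data) tuples so that the last points to the first."""
    n = len(writes)
    records = []
    for i, (page, version, data) in enumerate(writes):
        nxt_page, nxt_version, _ = writes[(i + 1) % n]
        records.append(Intention(page, version, nxt_page, nxt_version, data))
    return records


def is_committed(start, stored):
    """True if following next links from `start` closes the cycle.

    `stored` maps (page, version) to the Intention records that were written.
    """
    first = (start.page, start.version)
    key, seen = first, set()
    while key not in seen:
        rec = stored.get(key)
        if rec is None:
            return False  # a record on the cycle is missing
        seen.add(key)
        key = (rec.next_page, rec.next_version)
    return key == first
```

A single-page transaction links to itself, so it counts as committed as soon as that one page is written.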

14 Problem

15 Solution: Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page

16 Operations. Initialization: set version# to 0 and the next-link to self. Transaction. Garbage collection: for any uncommitted intention; for a committed page if a newer version is committed. Recovery: scan all pages, then look for cycles.
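The recovery step can be sketched as follows. The version comparison is an inference from the slide 15 invariant rather than something the slides spell out: if a record's next link names an older version than the newest one stored for that page, the record must belong to a committed transaction, because an uncommitted intention would have been erased before the newer write was issued. The function name and the `stored` layout are hypothetical.

```python
def recover(stored):
    """Report, per page, whether its highest stored version is committed.

    `stored` maps (page, version) -> (next_page, next_version), as found by
    scanning the metadata of every page. Layout assumed for illustration.
    """
    highest = {}
    for page, version in stored:
        highest[page] = max(version, highest.get(page, -1))

    def committed(page):
        key = (page, highest[page])
        visited = set()
        while True:
            visited.add(key)
            nxt_page, nxt_version = stored[key]
            top = highest.get(nxt_page, -1)
            if top > nxt_version:
                return True   # link target was superseded, so it was committed
            if top < nxt_version:
                return False  # link target was never written
            key = (nxt_page, nxt_version)
            if key in visited:
                return True   # walked all the way around the cycle

    return {page: committed(page) for page in highest}
```

A page whose highest version turns out to be uncommitted would fall back to its previous committed version; that fallback is left out of this sketch.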

17 New Proposal (2): Back Pointer Cyclic Commit. Another way to deal with the ambiguity. Intention record: (data, page# & version#, next-link, link to last committed version).

18 A3 is a straddler of A2. This adds some complexity to garbage collection and recovery.

19 Protocol Comparison

20 Outline: Introduction; The Case for TxFlash; Commit Protocols; Implementation; Evaluation; Conclusion

21 Implementation. Simulator: DiskSim → trace-driven SSD simulator (USENIX '08) → modifications for TxFlash. Supports transactions of maximum size 4MB. Pseudo-device driver for recording traces. TxExt3: employs TxFlash for the Ext3 file system. Transaction: Ext3 journal commit.

22 Experimental Setup. TxFlash device: 32GB (8x 4GB flash packages); 4 I/O operations within every flash package; 15% of space reserved for garbage collection. Workloads on top of Ext3: IOzone, a micro benchmark (no sync writes); Linux-build (no sync writes); Maildir (sync writes); TPC-B, simulating 10,000 credit-debit-like operations on the TxExt3 file system (sync writes); synthetic workloads.

23 Cyclic commit vs. Traditional commit

24 Unlike in database logging, transaction sizes are large: there is no sync, and data are included

25 Simple cyclic commit has a high cost if there are aborts

26

27 TxFlash vs. SSD. Remove WriteAtomic from the traces. Use the SSD simulator. The SSD does not provide any transaction guarantees (so it should have better performance).

28 Space comparison: TxFlash needs 25% more main memory than the SSD. 4+1 MB per 4GB flash → 40 MB for the 32GB TxFlash device.
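The arithmetic on this slide can be checked in a few lines. The reading of "4+1 MB" as 4 MB that a plain SSD already needs plus 1 MB added by TxFlash is an assumption, chosen because it matches the 25% figure; the function name is hypothetical.

```python
def metadata_memory_mb(packages=8, base_mb=4, extra_mb=1):
    """Total main memory and relative overhead for the 32GB device.

    Assumes each 4GB package needs base_mb (plain SSD) + extra_mb (TxFlash).
    """
    total_mb = packages * (base_mb + extra_mb)  # 8 * 5 MB
    overhead = extra_mb / base_mb               # extra relative to the base
    return total_mb, overhead
```

With the defaults this gives 40 MB in total and an overhead of 0.25, consistent with the numbers on the slide.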

29 End-to-end performance. TxFlash: run the pseudo-device driver on a real SSD; the performance is close to that of TxFlash. Ext3: use an SSD as the journal. The SSD cache is disabled in both cases.

30

31 Summary. TxFlash: adds a transaction interface to the SSD. Cyclic commit protocols. A nice solution for file system journaling.

