
1 DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings A. Gupta, Y. Kim, B. Urgaonkar, Penn State ASPLOS 2009 Shimin Chen, Big Data Reading Group

2 Introduction Goal: improve the performance of flash-based devices for workloads with random writes New proposal: DFTL (Demand-based FTL)  (FTL: flash translation layer)  The FTL maintains a mapping table: virtual → physical address

3 Outline Introduction Background on FTL Design of DFTL Experimental Results Summary

4 Basics of Flash Memory OOB (out-of-band) area:  ECC  Logical page number  State: erased/valid/invalid

5 Flash Translation Layer Maintain the mapping:  Virtual address (exposed to the upper level) → physical address (on flash) Use a small, fast SRAM to store this mapping Hide the erase operation from the layers above  Avoid in-place updates  Update a clean page instead  Perform garbage collection and erasure Note:  The OOB area holds the physical → virtual mapping  So the FTL's virtual → physical mapping can be rebuilt (at restart), as sketched below
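A minimal sketch of the rebuild-at-restart step, assuming each page's OOB record is exposed as a dict with "state" and "lpn" fields (an illustrative layout, not the device's actual OOB format):

    def rebuild_mapping(oob_by_ppn):
        # oob_by_ppn[ppn] is that page's OOB record, e.g. {"state": "valid", "lpn": 7}
        mapping = {}
        for ppn, oob in enumerate(oob_by_ppn):
            if oob["state"] == "valid":      # erased/invalid pages carry no live mapping
                mapping[oob["lpn"]] = ppn    # invert the physical -> virtual info kept in OOB
        return mapping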

6 Page-Level FTL Keep a page-to-page mapping table Pro: can map any logical page to any physical page  Efficient flash page utilization Con: the mapping table is large  E.g., 16GB flash with 2KB flash pages requires 32MB of SRAM  As flash size increases, SRAM size must scale  Too expensive!
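A back-of-envelope check of the slide's figure; the 4-byte size per mapping entry is an assumption chosen to match the stated 32 MB:

    flash_bytes = 16 * 2**30          # 16 GB flash
    page_bytes  = 2 * 2**10           # 2 KB flash page
    entry_bytes = 4                   # assumed bytes per LPN -> PPN entry
    num_pages   = flash_bytes // page_bytes
    print(num_pages, num_pages * entry_bytes // 2**20, "MB")   # 8388608 pages, 32 MB of SRAM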

7 Block-Level FTL Keep a block-to-block mapping Pro: small  Mapping table size reduced by a factor of (block size / page size) ≈ 64 Con: the page offset within a block is fixed  Garbage collection overheads grow
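A minimal sketch of block-level translation, assuming 128 KB blocks of 2 KB pages (64 pages per block, hence the roughly 64x smaller table); the example map entries are illustrative:

    PAGES_PER_BLOCK = 64                      # 128 KB block / 2 KB page
    block_map = {0: 5, 1: 9}                  # logical block -> physical block (example entries)

    def translate(lpn):
        lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
        return block_map[lbn] * PAGES_PER_BLOCK + offset   # the offset cannot be remapped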

8 Hybrid FTLs (a generic description) Data blocks: block-level mapping Log/update blocks: page-level mapping LPN: Logical Page Number
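An illustrative sketch (not the paper's code) of how a generic hybrid FTL resolves an LPN: pages recently rewritten into log blocks are found through a small page-level map, everything else through the block-level map of the data blocks:

    PAGES_PER_BLOCK = 64
    log_map = {}          # LPN -> PPN, only for pages currently living in log blocks
    data_block_map = {}   # logical block -> physical data block

    def lookup(lpn):
        if lpn in log_map:                            # most recent copy sits in a log block
            return log_map[lpn]
        lbn, offset = divmod(lpn, PAGES_PER_BLOCK)    # otherwise use the block-level mapping
        return data_block_map[lbn] * PAGES_PER_BLOCK + offset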

9 Operations in Hybrid FTLs Updates to data blocks: written to log blocks  The log region is small (e.g., 3% of total flash size) Garbage collection (GC)  When no free log blocks are available, invoke GC to merge log blocks with data blocks

10 Full Merge can be Recursive, thus Expensive Often results from random writes

11 Outline Introduction Background on FTL Design of DFTL Experimental Results Summary

12 DFTL Idea Avoid expensive full merges entirely  Do not use log blocks at all Idea:  Use page-level mapping  Keep the full mapping on flash to reduce SRAM use  Exploit temporal locality in workloads  Dynamically load/unload page-level mappings into/out of SRAM

13 DFTL Architecture Global mapping table

14 DFTL Address Translation Global mapping table Case 1: the requested LPN hits in the cached mapping table (CMT)  Retrieve the mapping directly; done

15 DFTL Address Translation Global mapping table Case 2: a miss in the cached mapping table (CMT), and the CMT is not full  Look up the GTD, read the translation page, fill in the CMT entry, then go to case 1

16 DFTL Address Translation Global mapping table Case 3: a miss in the cached mapping table (CMT), and the CMT is full  Select a CMT entry to evict (~LRU), write it back if dirty, then go to case 2 (see the sketch below)
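A minimal sketch of the three translation cases above, assuming a plain-LRU CMT, a fixed number of entries per translation page, an already-initialized GTD, and a tiny in-memory stand-in for the flash device; names and sizes are illustrative, not the authors' implementation:

    from collections import OrderedDict

    ENTRIES_PER_TPAGE = 512     # assumed LPN -> PPN entries per translation page
    CMT_CAPACITY = 4096         # assumed number of mappings the SRAM cache can hold

    class Flash:                # in-memory stand-in: translation pages appended out of place
        def __init__(self):
            self.tpages = []
        def read_tpage(self, loc):
            return list(self.tpages[loc])
        def append_tpage(self, entries):
            self.tpages.append(list(entries))
            return len(self.tpages) - 1       # new location, caller must update the GTD

    cmt = OrderedDict()         # SRAM: LPN -> (PPN, dirty)
    gtd = {}                    # SRAM: translation-page number -> location on flash

    def translate(lpn, flash):
        if lpn in cmt:                                        # case 1: hit in the CMT
            cmt.move_to_end(lpn)
            return cmt[lpn][0]
        if len(cmt) >= CMT_CAPACITY:                          # case 3: evict (~LRU) first
            victim, (vppn, dirty) = cmt.popitem(last=False)
            if dirty:                                         # write back into its translation page
                tno = victim // ENTRIES_PER_TPAGE
                tpage = flash.read_tpage(gtd[tno])
                tpage[victim % ENTRIES_PER_TPAGE] = vppn
                gtd[tno] = flash.append_tpage(tpage)
        tno = lpn // ENTRIES_PER_TPAGE                        # case 2: fetch the mapping on demand
        tpage = flash.read_tpage(gtd[tno])
        cmt[lpn] = (tpage[lpn % ENTRIES_PER_TPAGE], False)
        return cmt[lpn][0]

With temporal locality most requests take case 1; the expensive path, a dirty eviction in case 3, is exactly the 2-reads-plus-1-write worst case quantified on the next slide.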

17 Address Translation Cost Worst case cost (case 3):  2 translation page reads  1 translation page write Temporal locality:  More hits, fewer misses, fewer evictions The CMT may hold multiple dirty mappings belonging to the same translation page  Their updates can be batched into one write-back

18 Data Read Address translation: LPN → PPN Read the data page at PPN

19 Writes Current data block  The updated data page is appended into the current data block Current translation block  The updated translation page is appended into the current translation block This continues until the number of free blocks falls below GC_threshold, which triggers garbage collection
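A minimal, self-contained sketch of the out-of-place write path described above; the block size, threshold value, and the garbage_collect stub are illustrative placeholders:

    PAGES_PER_BLOCK = 64
    GC_THRESHOLD = 3            # illustrative value, not the paper's setting
    free_blocks = 100
    current_data_block = []     # pages appended into the currently open data block
    cmt = {}                    # LPN -> (PPN, dirty); dirty mappings reach the current
                                # translation block only when they are written back

    def garbage_collect():      # placeholder, see the garbage collection slides
        pass

    def write(lpn, data):
        global free_blocks, current_data_block
        if len(current_data_block) == PAGES_PER_BLOCK:    # block full: open a fresh one
            current_data_block = []
            free_blocks -= 1
        current_data_block.append(data)
        ppn = ("current", len(current_data_block) - 1)    # illustrative physical address
        cmt[lpn] = (ppn, True)                            # translation page updated lazily
        if free_blocks < GC_THRESHOLD:                    # invoke GC when space runs low
            garbage_collect()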

20 Garbage Collection Select a victim block [15] Kawaguchi et al. 1995

21 Garbage Collection If the selected victim block is a translation block  Copy its valid pages to a free translation block  Update the GTD (global translation directory) If the selected victim block is a data block  Copy its valid pages to a free data block  Update the page-level translation for each copied data page If the mapping is cached, update the CMT entry (if so, done) Otherwise locate the translation page, update it, and change the GTD  Batch-update opportunities when multiple page-level translations fall in the same translation page
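A sketch of the two GC cases, assuming the same simple append-only, dict-based state as the earlier sketches; victim selection and the actual erase are omitted:

    ENTRIES_PER_TPAGE = 512     # assumed entries per translation page
    translation_flash = []      # translation pages, appended out of place
    data_flash = []             # data pages, appended out of place
    gtd = {}                    # translation-page number -> index in translation_flash
    cmt = {}                    # LPN -> (PPN, dirty)

    def collect_translation_block(valid_tpages):
        # victim is a translation block: copy its valid pages and repoint the GTD
        for tpage_no, entries in valid_tpages:
            translation_flash.append(list(entries))
            gtd[tpage_no] = len(translation_flash) - 1

    def collect_data_block(valid_pages):
        # victim is a data block: copy its valid pages, then fix each mapping
        for lpn, data in valid_pages:
            data_flash.append(data)
            new_ppn = len(data_flash) - 1
            if lpn in cmt:                                # mapping is cached: update it, done
                cmt[lpn] = (new_ppn, True)
            else:                                         # locate the translation page via the GTD
                tno = lpn // ENTRIES_PER_TPAGE
                entries = list(translation_flash[gtd[tno]])
                entries[lpn % ENTRIES_PER_TPAGE] = new_ppn
                translation_flash.append(entries)         # write the updated page out of place
                gtd[tno] = len(translation_flash) - 1

Grouping valid pages whose mappings fall in the same translation page lets the final branch be amortized into a single translation-page write, which is the batch-update opportunity noted on the slide.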

22 Benefits Page-level mapping:  No expensive full merge operations  Better random write performance as a result But random writes are still worse than sequential writes  More CMT misses, more translation page writes  Data pages in a block are more scattered, so GC costs are higher: fewer opportunities for batch updates

23 Outline Introduction Background on FTL Design of DFTL Experimental Results Summary

24 FTL Schemes Implemented FlashSim simulator  The authors' enhancement of DiskSim Schemes compared: a block-based FTL, a state-of-the-art hybrid FTL (FAST), DFTL, and an idealized page-level FTL

25 Experimental Setup Model: 32GB flash memory, 2KB page, 128KB block  Timing parameters are given in Table 1

26 Traces Used in Experiments

27 Block Erases Baseline: idealized page-level FTL

28 Extra Read/Write Operations 63% CMT hit rate for the Financial trace

29 Response Times (from tech report)

30 CDF

31 Address translation overhead shows up

32 CDF FAST has a long tail

33 Figure 10. Microscopic analysis

34 Summary Demand-based page-level FTL Two-level page table:  (Flash) Translation pages: LPN → PPN entries  (SRAM) Global translation directory: locations of the translation pages Mapping cache (CMT) in SRAM

