DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings
A. Gupta, Y. Kim, B. Urgaonkar (Penn State), ASPLOS 2009
Presented by Shimin Chen, Big Data Reading Group
Introduction
- Goal: improve the performance of flash-based devices for workloads with random writes
- New proposal: DFTL (Demand-based FTL), where FTL stands for flash translation layer
- The FTL maintains a mapping table: virtual → physical address
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
Basics of Flash Memory
- Each page has an OOB (out-of-band) area storing (a minimal sketch follows):
  - ECC
  - Logical page number
  - State: erased / valid / invalid
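To make the layout concrete, here is a minimal Python sketch of a flash page and its OOB area; the field names and types are assumptions chosen for illustration, not a device's actual format.

```python
from dataclasses import dataclass
from enum import Enum

class PageState(Enum):
    ERASED = "erased"    # clean and programmable
    VALID = "valid"      # holds the current copy of some logical page
    INVALID = "invalid"  # superseded by a newer copy; reclaimed by erasure

@dataclass
class OOB:
    ecc: bytes           # error-correcting code protecting the data area
    lpn: int             # logical page number recorded alongside the data
    state: PageState

@dataclass
class FlashPage:
    data: bytes          # e.g., a 2KB data area
    oob: OOB             # out-of-band metadata
```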
Flash Translation Layer (FTL)
- Maintains the mapping: virtual address (exposed to the upper level) → physical address (on flash)
- Uses a small, fast SRAM to store this mapping
- Hides the erase operation from the layers above by avoiding in-place updates:
  - An update is written to a clean page
  - Garbage collection and erasure reclaim the old copies
- Note: the OOB area holds the physical → virtual mapping, so the FTL's virtual → physical mapping can be rebuilt at restart (sketched below)
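A minimal sketch (the representation of the flash and its OOB contents is an assumption) of how the virtual → physical table can be rebuilt at restart by scanning each physical page's OOB area:

```python
def rebuild_mapping(flash):
    """flash: iterable of (ppn, oob) pairs, where oob is a dict holding
    the 'lpn' and 'state' recorded in that physical page's OOB area."""
    mapping = {}
    for ppn, oob in flash:
        if oob["state"] == "valid":
            mapping[oob["lpn"]] = ppn   # invert the physical -> virtual link stored on flash
    return mapping

# Example: physical pages 0 and 1 hold logical pages 7 and 3; page 2 is an old, invalid copy.
flash = [(0, {"lpn": 7, "state": "valid"}),
         (1, {"lpn": 3, "state": "valid"}),
         (2, {"lpn": 7, "state": "invalid"})]
print(rebuild_mapping(flash))   # {7: 0, 3: 1}
```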
Page-Level FTL
- Keeps a page-to-page mapping table
- Pro: can map any logical page to any physical page, so flash pages are used efficiently
- Con: the mapping table is large
  - E.g., a 16GB flash with 2KB pages requires 32MB of SRAM (worked out below)
  - As flash size increases, the SRAM size must scale with it: too expensive!
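A quick back-of-the-envelope check of the SRAM requirement; the 4-byte entry size is an assumption (the slide only gives the 32MB result):

```python
flash_size = 16 * 2**30          # 16GB flash
page_size  = 2 * 2**10           # 2KB pages
entry_size = 4                   # assumed bytes per LPN -> PPN entry

num_pages  = flash_size // page_size      # 8,388,608 pages to map
table_size = num_pages * entry_size       # 33,554,432 bytes
print(num_pages, table_size // 2**20)     # 8388608 32  (i.e., 32MB of SRAM)
```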
Block-Level FTL
- Keeps a block-to-block mapping
- Pro: small; the mapping table shrinks by a factor of (block size / page size), roughly 64x
- Con: a page's offset within its block is fixed, so garbage collection overheads grow (sketch below)
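A minimal sketch of block-level translation (names are assumptions): only block numbers are remapped, so a logical page's offset inside its block is fixed, and updating a single page in place is not possible without relocating the block's contents.

```python
BLOCK_SIZE = 128 * 1024
PAGE_SIZE = 2 * 1024
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE   # 64: the table-size reduction factor

block_map = {0: 5}   # logical block 0 currently lives in physical block 5

def translate(lpn: int) -> int:
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)   # offset within the block is fixed
    return block_map[lbn] * PAGES_PER_BLOCK + offset

print(translate(3))   # logical page 3 -> physical page 5*64 + 3 = 323
```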
Hybrid FTLs (a generic description)
- Data blocks: block-level mapping
- Log/update blocks: page-level mapping
- LPN: logical page number
Operations in Hybrid FTLs
- An update to a data block is written to a log block
  - The log region is small (e.g., 3% of total flash size)
- Garbage collection (GC): when no free log blocks are available, GC is invoked to merge log blocks with data blocks
Full Merges Can Be Recursive and Thus Expensive
- They often result from random writes
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
DFTL Idea
- Avoid expensive full merges entirely: do not use log blocks at all
- Idea: use page-level mapping
  - Keep the full mapping on flash to reduce SRAM use
  - Exploit temporal locality in workloads
  - Dynamically load / unload page-level mappings into SRAM
DFTL Architecture
- Figure: global mapping table
DFTL Address Translation
- Case 1: the requested LPN hits in the cached mapping table (CMT)
  - Done: retrieve the mapping directly
DFTL Address Translation
- Case 2: a miss in the cached mapping table (CMT), and the CMT is not full
  - Look up the GTD (global translation directory)
  - Read the translation page
  - Fill in the CMT entry
  - Go to case 1
DFTL Address Translation
- Case 3: a miss in the cached mapping table (CMT), and the CMT is full
  - Select a CMT entry to evict (approximately LRU)
  - Write back the entry if it is dirty
  - Go to case 2
Address Translation Cost
- Worst-case cost (case 3): 2 translation page reads + 1 translation page write
- Temporal locality helps: more hits, fewer misses, fewer evictions
- The CMT may hold multiple mappings that belong to a single translation page, enabling batched updates (see the sketch below)
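Putting the three cases together, here is a minimal Python sketch of the lookup path. The flash accesses are mocked with counters; constants such as ENTRIES_PER_TPAGE, the CMT capacity, and the helper names are assumptions for illustration, not the paper's implementation.

```python
from collections import OrderedDict

ENTRIES_PER_TPAGE = 512        # assumed: 2KB translation page / 4B entry
CMT_CAPACITY = 4               # tiny capacity so eviction is easy to exercise

cmt = OrderedDict()            # LPN -> (PPN, dirty), kept in LRU order (SRAM)
gtd = {}                       # translation-page number -> its physical location (SRAM)
reads = writes = 0             # extra flash operations caused by translation

def read_translation_page(tpn):
    global reads
    reads += 1
    gtd.setdefault(tpn, None)            # the GTD locates the translation page on flash
    return {}                            # mock: stands in for its LPN -> PPN entries

def write_translation_page(tpn, entries):
    global writes
    writes += 1
    gtd[tpn] = "new physical location"   # the GTD now points at the rewritten page

def translate(lpn):
    if lpn in cmt:                               # case 1: CMT hit
        cmt.move_to_end(lpn)
        return cmt[lpn][0]
    if len(cmt) >= CMT_CAPACITY:                 # case 3: miss and the CMT is full
        victim, (vppn, dirty) = cmt.popitem(last=False)   # ~LRU eviction
        if dirty:
            tpn = victim // ENTRIES_PER_TPAGE
            entries = read_translation_page(tpn)          # 1 extra read
            entries[victim] = vppn
            write_translation_page(tpn, entries)          # 1 extra write
    tpn = lpn // ENTRIES_PER_TPAGE               # case 2: fetch the mapping from flash
    entries = read_translation_page(tpn)         # 1 extra read (2 total in the worst case)
    cmt[lpn] = (entries.get(lpn), False)
    return cmt[lpn][0]
```

The worst case in this sketch is exactly the case-3 cost above: one read and one write to flush the dirty victim's translation page, plus one read to fetch the missing mapping. Because one translation page covers many consecutive LPNs, several dirty CMT entries that share a page can be written back with a single translation-page rewrite.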
Data Read
- Address translation: LPN → PPN
- Read the data page at PPN
Writes
- An updated data page is appended to the current data block
- An updated translation page is appended to the current translation block
- This continues until the number of free blocks drops below GC_threshold, which triggers garbage collection (sketch below)
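A minimal sketch of the out-of-place write path; the block sizes, helper names, and free-block bookkeeping are assumptions. Updates are appended to the current block, the mapping is redirected to the new copy, and GC is triggered once the free-block pool runs low.

```python
PAGES_PER_BLOCK = 64
GC_THRESHOLD = 2

free_blocks = list(range(8))     # physical block numbers available for allocation
mapping = {}                     # LPN -> PPN (held in the CMT / translation pages in DFTL)

class CurrentBlock:
    """The block new pages are appended to (one for data, one for translations)."""
    def __init__(self):
        self.block = free_blocks.pop()
        self.next_page = 0

    def append(self) -> int:
        if self.next_page == PAGES_PER_BLOCK:    # block full: switch to a fresh free block
            self.block = free_blocks.pop()
            self.next_page = 0
        ppn = self.block * PAGES_PER_BLOCK + self.next_page
        self.next_page += 1
        return ppn

current_data = CurrentBlock()

def write(lpn: int, data: bytes) -> None:
    ppn = current_data.append()          # never update in place: append a new copy
    # ... program `data` into page ppn, mark the old copy (mapping.get(lpn)) invalid ...
    mapping[lpn] = ppn                   # the translation update is itself appended
                                         # to the current translation block the same way
    if len(free_blocks) < GC_THRESHOLD:
        pass                             # invoke garbage collection (next slides)
```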
Garbage Collection
- Select a victim block ([15] Kawaguchi et al. 1995)
Garbage Collection
- If the selected victim is a translation block:
  - Copy its valid pages to a free translation block
  - Update the GTD (global translation directory)
- If the selected victim is a data block:
  - Copy its valid pages to a free data block
  - Update the page-level translation for each copied data page:
    - If the mapping is in the CMT, update the CMT entry (done)
    - Otherwise, locate the translation page, update it, and change the GTD
  - Batch-update opportunity when multiple page-level translations reside in the same translation page (sketch below)
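A minimal sketch of the two GC paths; the victim selection, flash I/O, and helper names are stubbed assumptions. It shows why relocating translation pages only touches the GTD, while relocating data pages may also require translation-page updates.

```python
def valid_pages(block):
    """Stub: yield (logical id, old PPN) for each valid page in the victim block."""
    return []

def copy_page(old_ppn):
    """Stub: copy the page into the current free block and return its new PPN."""
    return old_ppn

def update_translation_page(lpn, new_ppn, gtd):
    """Stub: read the translation page holding lpn, patch it, rewrite it, fix the GTD."""

def erase_block(block):
    """Stub: erase the victim and return it to the free-block pool."""

def garbage_collect(victim, is_translation_block, gtd, cmt):
    if is_translation_block:
        for tpn, old_ppn in valid_pages(victim):
            gtd[tpn] = copy_page(old_ppn)            # only the GTD entry changes
    else:
        for lpn, old_ppn in valid_pages(victim):
            new_ppn = copy_page(old_ppn)
            if lpn in cmt:
                cmt[lpn] = (new_ppn, True)           # cached mapping: fix the CMT, done
            else:
                update_translation_page(lpn, new_ppn, gtd)   # read-modify-write + GTD
        # Batch update: LPNs whose mappings share one translation page can be
        # written back with a single translation-page rewrite.
    erase_block(victim)
```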
Benefits
- Page-level mapping: no expensive full merge operations
  - Better random write performance as a result
- But random writes are still worse than sequential writes:
  - More CMT misses and more translation page writes
  - Data pages in a block are more scattered, so GC costs more: fewer opportunities for batch updates
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
FTL Schemes Implemented
- FlashSim simulator (the authors enhanced DiskSim)
- Block-based FTL
- A state-of-the-art hybrid FTL (FAST)
- DFTL
- An idealized page-based FTL
Experimental Setup
- Modeled 32GB of flash memory with 2KB pages and 128KB blocks
- Operation timings are listed in Table 1
Traces Used in Experiments
Block Erases
- Baseline: idealized page-level FTL
Extra Read/Write Operations
- 63% CMT hit rate for the Financial trace
Response Times (from tech report)
CDF
Address translation overhead shows up
CDF
- FAST has a long tail
Figure 10. Microscopic analysis
Summary
- Demand-based page-level FTL
- Two-level page table:
  - (Flash) Translation pages: LPN → PPN entries
  - (SRAM) Global translation directory (GTD): entries locating the translation pages
- Mapping cache (CMT) in SRAM