DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings
A. Gupta, Y. Kim, B. Urgaonkar (Penn State), ASPLOS 2009
Presented by Shimin Chen, Big Data Reading Group
Introduction
- Goal: improve the performance of flash-based devices for workloads with random writes
- New proposal: DFTL (Demand-based FTL), where FTL stands for flash translation layer
- The FTL maintains a mapping table: virtual → physical address
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
Basics of Flash Memory
- Each page has an OOB (out-of-band) area storing (a minimal sketch follows):
  - ECC
  - Logical page number
  - State: erased / valid / invalid
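To make the layout concrete, here is a minimal Python sketch of a flash page and its OOB area; the field names and types are assumptions chosen for illustration, not a device's actual format.

```python
from dataclasses import dataclass
from enum import Enum

class PageState(Enum):
    ERASED = "erased"    # clean and programmable
    VALID = "valid"      # holds the current copy of some logical page
    INVALID = "invalid"  # superseded by a newer copy; reclaimed by erasure

@dataclass
class OOB:
    ecc: bytes           # error-correcting code protecting the data area
    lpn: int             # logical page number recorded alongside the data
    state: PageState

@dataclass
class FlashPage:
    data: bytes          # e.g., a 2KB data area
    oob: OOB             # out-of-band metadata
```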
Flash Translation Layer (FTL)
- Maintains the mapping: virtual address (exposed to the upper level) → physical address (on flash)
- Uses a small, fast SRAM to store this mapping
- Hides the erase operation from the layers above by avoiding in-place updates:
  - An update is written to a clean page
  - Garbage collection and erasure reclaim the old copies
- Note: the OOB area holds the physical → virtual mapping, so the FTL's virtual → physical mapping can be rebuilt at restart (sketched below)
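A minimal sketch (the representation of the flash and its OOB contents is an assumption) of how the virtual → physical table can be rebuilt at restart by scanning each physical page's OOB area:

```python
def rebuild_mapping(flash):
    """flash: iterable of (ppn, oob) pairs, where oob is a dict holding
    the 'lpn' and 'state' recorded in that physical page's OOB area."""
    mapping = {}
    for ppn, oob in flash:
        if oob["state"] == "valid":
            mapping[oob["lpn"]] = ppn   # invert the physical -> virtual link stored on flash
    return mapping

# Example: physical pages 0 and 1 hold logical pages 7 and 3; page 2 is an old, invalid copy.
flash = [(0, {"lpn": 7, "state": "valid"}),
         (1, {"lpn": 3, "state": "valid"}),
         (2, {"lpn": 7, "state": "invalid"})]
print(rebuild_mapping(flash))   # {7: 0, 3: 1}
```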
Page-Level FTL
- Keeps a page-to-page mapping table
- Pro: can map any logical page to any physical page, so flash pages are used efficiently
- Con: the mapping table is large
  - E.g., a 16GB flash with 2KB pages requires 32MB of SRAM (worked out below)
  - As flash size increases, the SRAM size must scale with it: too expensive!
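A quick back-of-the-envelope check of the SRAM requirement; the 4-byte entry size is an assumption (the slide only gives the 32MB result):

```python
flash_size = 16 * 2**30          # 16GB flash
page_size  = 2 * 2**10           # 2KB pages
entry_size = 4                   # assumed bytes per LPN -> PPN entry

num_pages  = flash_size // page_size      # 8,388,608 pages to map
table_size = num_pages * entry_size       # 33,554,432 bytes
print(num_pages, table_size // 2**20)     # 8388608 32  (i.e., 32MB of SRAM)
```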
Block-Level FTL
- Keeps a block-to-block mapping
- Pro: small; the mapping table shrinks by a factor of (block size / page size), roughly 64x
- Con: a page's offset within its block is fixed, so garbage collection overheads grow (sketch below)
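A minimal sketch of block-level translation (names are assumptions): only block numbers are remapped, so a logical page's offset inside its block is fixed, and updating a single page in place is not possible without relocating the block's contents.

```python
BLOCK_SIZE = 128 * 1024
PAGE_SIZE = 2 * 1024
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE   # 64: the table-size reduction factor

block_map = {0: 5}   # logical block 0 currently lives in physical block 5

def translate(lpn: int) -> int:
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)   # offset within the block is fixed
    return block_map[lbn] * PAGES_PER_BLOCK + offset

print(translate(3))   # logical page 3 -> physical page 5*64 + 3 = 323
```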
Hybrid FTLs (a generic description)
- Data blocks: block-level mapping
- Log/update blocks: page-level mapping
- LPN: logical page number
Operations in Hybrid FTLs
- An update to a data block is written to a log block
  - The log region is small (e.g., 3% of total flash size)
- Garbage collection (GC): when no free log blocks are available, GC is invoked to merge log blocks with data blocks
Full Merges Can Be Recursive and Thus Expensive
- They often result from random writes
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
DFTL Idea
- Avoid expensive full merges entirely: do not use log blocks at all
- Idea: use page-level mapping
  - Keep the full mapping on flash to reduce SRAM use
  - Exploit temporal locality in workloads
  - Dynamically load / unload page-level mappings into SRAM
DFTL Architecture
- Figure: global mapping table
DFTL Address Translation
- Case 1: the requested LPN hits in the cached mapping table (CMT)
  - Done: retrieve the mapping directly
DFTL Address Translation
- Case 2: a miss in the cached mapping table (CMT), and the CMT is not full
  - Look up the GTD (global translation directory)
  - Read the translation page
  - Fill in the CMT entry
  - Go to case 1
DFTL Address Translation
- Case 3: a miss in the cached mapping table (CMT), and the CMT is full
  - Select a CMT entry to evict (approximately LRU)
  - Write back the entry if it is dirty
  - Go to case 2
Address Translation Cost
- Worst-case cost (case 3): 2 translation page reads + 1 translation page write
- Temporal locality helps: more hits, fewer misses, fewer evictions
- The CMT may hold multiple mappings that belong to a single translation page, enabling batched updates (see the sketch below)
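Putting the three cases together, here is a minimal Python sketch of the lookup path. The flash accesses are mocked with counters; constants such as ENTRIES_PER_TPAGE, the CMT capacity, and the helper names are assumptions for illustration, not the paper's implementation.

```python
from collections import OrderedDict

ENTRIES_PER_TPAGE = 512        # assumed: 2KB translation page / 4B entry
CMT_CAPACITY = 4               # tiny capacity so eviction is easy to exercise

cmt = OrderedDict()            # LPN -> (PPN, dirty), kept in LRU order (SRAM)
gtd = {}                       # translation-page number -> its physical location (SRAM)
reads = writes = 0             # extra flash operations caused by translation

def read_translation_page(tpn):
    global reads
    reads += 1
    gtd.setdefault(tpn, None)            # the GTD locates the translation page on flash
    return {}                            # mock: stands in for its LPN -> PPN entries

def write_translation_page(tpn, entries):
    global writes
    writes += 1
    gtd[tpn] = "new physical location"   # the GTD now points at the rewritten page

def translate(lpn):
    if lpn in cmt:                               # case 1: CMT hit
        cmt.move_to_end(lpn)
        return cmt[lpn][0]
    if len(cmt) >= CMT_CAPACITY:                 # case 3: miss and the CMT is full
        victim, (vppn, dirty) = cmt.popitem(last=False)   # ~LRU eviction
        if dirty:
            tpn = victim // ENTRIES_PER_TPAGE
            entries = read_translation_page(tpn)          # 1 extra read
            entries[victim] = vppn
            write_translation_page(tpn, entries)          # 1 extra write
    tpn = lpn // ENTRIES_PER_TPAGE               # case 2: fetch the mapping from flash
    entries = read_translation_page(tpn)         # 1 extra read (2 total in the worst case)
    cmt[lpn] = (entries.get(lpn), False)
    return cmt[lpn][0]
```

The worst case in this sketch is exactly the case-3 cost above: one read and one write to flush the dirty victim's translation page, plus one read to fetch the missing mapping. Because one translation page covers many consecutive LPNs, several dirty CMT entries that share a page can be written back with a single translation-page rewrite.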
Data Read
- Address translation: LPN → PPN
- Read the data page at PPN
Writes
- An updated data page is appended to the current data block
- An updated translation page is appended to the current translation block
- This continues until the number of free blocks drops below GC_threshold, which triggers garbage collection (sketch below)
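A minimal sketch of the out-of-place write path; the block sizes, helper names, and free-block bookkeeping are assumptions. Updates are appended to the current block, the mapping is redirected to the new copy, and GC is triggered once the free-block pool runs low.

```python
PAGES_PER_BLOCK = 64
GC_THRESHOLD = 2

free_blocks = list(range(8))     # physical block numbers available for allocation
mapping = {}                     # LPN -> PPN (held in the CMT / translation pages in DFTL)

class CurrentBlock:
    """The block new pages are appended to (one for data, one for translations)."""
    def __init__(self):
        self.block = free_blocks.pop()
        self.next_page = 0

    def append(self) -> int:
        if self.next_page == PAGES_PER_BLOCK:    # block full: switch to a fresh free block
            self.block = free_blocks.pop()
            self.next_page = 0
        ppn = self.block * PAGES_PER_BLOCK + self.next_page
        self.next_page += 1
        return ppn

current_data = CurrentBlock()

def write(lpn: int, data: bytes) -> None:
    ppn = current_data.append()          # never update in place: append a new copy
    # ... program `data` into page ppn, mark the old copy (mapping.get(lpn)) invalid ...
    mapping[lpn] = ppn                   # the translation update is itself appended
                                         # to the current translation block the same way
    if len(free_blocks) < GC_THRESHOLD:
        pass                             # invoke garbage collection (next slides)
```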
Garbage Collection
- Select a victim block ([15] Kawaguchi et al. 1995)
Garbage Collection
- If the selected victim is a translation block:
  - Copy its valid pages to a free translation block
  - Update the GTD (global translation directory)
- If the selected victim is a data block:
  - Copy its valid pages to a free data block
  - Update the page-level translation for each copied data page:
    - If the mapping is in the CMT, update the CMT entry (done)
    - Otherwise, locate the translation page, update it, and change the GTD
  - Batch-update opportunity when multiple page-level translations reside in the same translation page (sketch below)
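A minimal sketch of the two GC paths; the victim selection, flash I/O, and helper names are stubbed assumptions. It shows why relocating translation pages only touches the GTD, while relocating data pages may also require translation-page updates.

```python
def valid_pages(block):
    """Stub: yield (logical id, old PPN) for each valid page in the victim block."""
    return []

def copy_page(old_ppn):
    """Stub: copy the page into the current free block and return its new PPN."""
    return old_ppn

def update_translation_page(lpn, new_ppn, gtd):
    """Stub: read the translation page holding lpn, patch it, rewrite it, fix the GTD."""

def erase_block(block):
    """Stub: erase the victim and return it to the free-block pool."""

def garbage_collect(victim, is_translation_block, gtd, cmt):
    if is_translation_block:
        for tpn, old_ppn in valid_pages(victim):
            gtd[tpn] = copy_page(old_ppn)            # only the GTD entry changes
    else:
        for lpn, old_ppn in valid_pages(victim):
            new_ppn = copy_page(old_ppn)
            if lpn in cmt:
                cmt[lpn] = (new_ppn, True)           # cached mapping: fix the CMT, done
            else:
                update_translation_page(lpn, new_ppn, gtd)   # read-modify-write + GTD
        # Batch update: LPNs whose mappings share one translation page can be
        # written back with a single translation-page rewrite.
    erase_block(victim)
```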
Benefits
- Page-level mapping: no expensive full merge operations
  - Better random write performance as a result
- But random writes are still worse than sequential writes:
  - More CMT misses and more translation page writes
  - Data pages in a block are more scattered, so GC costs more: fewer opportunities for batch updates
Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
FTL Schemes Implemented
- FlashSim simulator (the authors enhanced DiskSim)
- Block-based FTL
- A state-of-the-art hybrid FTL (FAST)
- DFTL
- An idealized page-based FTL
Experimental Setup
- Modeled 32GB of flash memory with 2KB pages and 128KB blocks
- Operation timings are listed in Table 1
Traces Used in Experiments
Block Erases
- Baseline: idealized page-level FTL
Extra Read/Write Operations
- 63% CMT hit rate for the Financial trace
Response Times (from tech report)
CDF
Address translation overhead shows up
CDF
- FAST has a long tail
Figure 10. Microscopic analysis
Summary
- Demand-based page-level FTL
- Two-level page table:
  - (Flash) Translation pages: LPN → PPN entries
  - (SRAM) Global translation directory (GTD): entries locating the translation pages
- Mapping cache (CMT) in SRAM