DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings. A. Gupta, Y. Kim, B. Urgaonkar, Penn State. ASPLOS 2009.

DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings. A. Gupta, Y. Kim, B. Urgaonkar, Penn State. ASPLOS 2009. Shimin Chen, Big Data Reading Group.

Introduction
- Goal: improve the performance of flash-based devices for workloads with random writes
- New proposal: DFTL (Demand-based FTL)
  - FTL: flash translation layer
  - The FTL maintains a mapping table: virtual → physical address

Outline: Introduction, Background on FTL, Design of DFTL, Experimental Results, Summary

Basics of Flash Memory
- The OOB (out-of-band) area of each page stores:
  - ECC
  - Logical page number
  - State: erased / valid / invalid
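As a concrete, hedged illustration of this slide, a per-page OOB record might look like the struct below; the field sizes and names are assumptions for a 2 KB-page SLC device, not taken from the paper.

```c
#include <stdint.h>

/* Per-page state kept in the OOB (spare) area of NAND flash. */
enum page_state { PAGE_ERASED = 0xFF, PAGE_VALID = 0x0F, PAGE_INVALID = 0x00 };

/* Illustrative layout only: real devices define their own spare-area format. */
struct oob_area {
    uint32_t lpn;          /* logical page number held by this physical page */
    uint8_t  state;        /* erased / valid / invalid                       */
    uint8_t  ecc[12];      /* error-correcting code for the 2 KB data area   */
    uint8_t  reserved[47]; /* pad to a 64-byte spare area (assumed size)     */
};
```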

Flash Translation Layer (FTL)
- Maintains the mapping: virtual address (exposed to the upper level) → physical address (on flash)
- Uses a small, fast SRAM to store this mapping
- Hides erase operations from the layers above by:
  - Avoiding in-place updates
  - Redirecting each update to a clean page
  - Performing garbage collection and erasure
- Note:
  - The OOB area holds the physical → virtual mapping
  - So the FTL's virtual → physical mapping can be rebuilt at restart
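As a sketch of the rebuild-at-restart note, the program below scans every physical page's simplified OOB record and repopulates a logical → physical table in memory. The array sizes and the in-memory stand-in for flash are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PHYS_PAGES 1024u
#define INVALID_PPN    0xFFFFFFFFu

struct oob { uint32_t lpn; uint8_t valid; };   /* simplified OOB record        */

static struct oob flash_oob[NUM_PHYS_PAGES];   /* stand-in for on-flash OOB    */
static uint32_t   l2p[NUM_PHYS_PAGES];         /* rebuilt logical -> physical  */

/* Rebuild the FTL's logical-to-physical table by scanning OOB areas, as the
 * slide notes can be done at restart. */
static void rebuild_mapping(void) {
    memset(l2p, 0xFF, sizeof l2p);             /* all entries -> INVALID_PPN   */
    for (uint32_t ppn = 0; ppn < NUM_PHYS_PAGES; ppn++) {
        if (flash_oob[ppn].valid)
            l2p[flash_oob[ppn].lpn] = ppn;     /* invert physical -> logical   */
    }
}

int main(void) {
    flash_oob[7] = (struct oob){ .lpn = 3, .valid = 1 };
    rebuild_mapping();
    return l2p[3] == 7 ? 0 : 1;                /* trivial self-check           */
}
```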

Page-Level FTL
- Keeps a page-to-page mapping table
- Pro: can map any logical page to any physical page
  - Efficient flash page utilization
- Con: the mapping table is large
  - E.g., 16 GB flash with 2 KB pages requires 32 MB of SRAM
  - As flash size increases, SRAM size must scale with it: too expensive
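The 32 MB figure comes from dividing the flash capacity by the page size and multiplying by a 4-byte mapping entry (the usual assumption for this example); the small program below just redoes that arithmetic.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t flash_bytes = 16ULL << 30;   /* 16 GB flash              */
    uint64_t page_bytes  = 2ULL  << 10;   /* 2 KB pages               */
    uint64_t entry_bytes = 4;             /* one PPN per logical page */

    uint64_t pages = flash_bytes / page_bytes;   /* 8M logical pages  */
    uint64_t table = pages * entry_bytes;        /* table size, bytes */
    printf("%llu entries -> %llu MB of SRAM\n",
           (unsigned long long)pages,
           (unsigned long long)(table >> 20));   /* prints 32         */
    return 0;
}
```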

Block-Level FTL
- Keeps a block-to-block mapping table
- Pro: the table is small
  - Mapping table size is reduced by a factor of (block size / page size), ~64×
- Con: a page's offset within its block is fixed
  - Garbage collection overheads grow
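A minimal sketch of block-level translation under the geometry used later in the talk (128 KB blocks of 2 KB pages, i.e., 64 pages per block): only the block number is remapped, while the page offset inside the block stays fixed. The table size and names are illustrative.

```c
#include <stdint.h>
#include <assert.h>

#define PAGES_PER_BLOCK 64u              /* 128 KB block / 2 KB page          */
#define NUM_BLOCKS      512u

static uint32_t block_map[NUM_BLOCKS];   /* logical block -> physical block   */

/* Block-level translation: the offset within the block cannot change, which
 * is why updates force costly copy/merge work later. */
static uint32_t translate(uint32_t lpn) {
    uint32_t lbn    = lpn / PAGES_PER_BLOCK;
    uint32_t offset = lpn % PAGES_PER_BLOCK;
    return block_map[lbn] * PAGES_PER_BLOCK + offset;
}

int main(void) {
    block_map[2] = 7;   /* logical block 2 currently lives in physical block 7 */
    assert(translate(2 * PAGES_PER_BLOCK + 5) == 7 * PAGES_PER_BLOCK + 5);
    return 0;
}
```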

Hybrid FTLs (a generic description)
- Data blocks: block-level mapping
- Log/update blocks: page-level mapping
- LPN: Logical Page Number

Operations in Hybrid FTLs
- An update to a data block is written to a log block
  - The log region is small (e.g., 3% of total flash size)
- Garbage collection (GC)
  - When no free log blocks are available, GC is invoked to merge log blocks with data blocks
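The sketch below models the write path these bullets describe for a generic hybrid FTL: updates go into a small log region, and when no free log space remains, a stubbed-out merge is invoked. All sizes and names are assumptions; this is not FAST or any specific scheme.

```c
#include <stdint.h>

#define NUM_LOG_BLOCKS   4u              /* small log region (assumed size)   */
#define PAGES_PER_BLOCK 64u

static uint32_t log_block_used[NUM_LOG_BLOCKS];  /* pages consumed per log block */

/* Placeholder for the expensive log/data merge the next slide discusses. */
static void merge_log_blocks(void) {
    for (uint32_t i = 0; i < NUM_LOG_BLOCKS; i++)
        log_block_used[i] = 0;           /* after merging, log blocks are free */
}

/* Every update to a data block is redirected into a log block; when the log
 * region fills up, garbage collection merges log blocks into data blocks. */
static void hybrid_write(uint32_t lpn) {
    (void)lpn;
    for (uint32_t i = 0; i < NUM_LOG_BLOCKS; i++) {
        if (log_block_used[i] < PAGES_PER_BLOCK) {
            log_block_used[i]++;         /* append the update to this log block */
            return;
        }
    }
    merge_log_blocks();                  /* no free log space: invoke GC        */
    log_block_used[0]++;
}

int main(void) {
    for (uint32_t w = 0; w < 1000; w++) hybrid_write(w);
    return 0;
}
```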

Full Merges Can Be Recursive and Thus Expensive
- Often caused by random writes

Outline: Introduction, Background on FTL, Design of DFTL, Experimental Results, Summary

DFTL Idea
- Avoid expensive full merges entirely
  - Do not use log blocks at all
- Idea:
  - Use page-level mapping
  - Keep the full mapping on flash to reduce SRAM use
  - Exploit temporal locality in workloads
  - Dynamically load / unload page-level mappings into / out of SRAM

DFTL Architecture
- (Figure: the global mapping table lives on flash in translation pages; a Global Translation Directory (GTD) in SRAM tracks where the translation pages are, and a Cached Mapping Table (CMT) in SRAM holds recently used mappings)

DFTL Address Translation
- Case 1: the requested LPN hits in the cached mapping table (CMT)
  - Retrieve the mapping directly; done

DFTL Address Translation
- Case 2: a miss in the CMT, and the CMT is not full
  - Look up the GTD
  - Read the translation page from flash
  - Fill in the CMT entry
  - Go to Case 1

DFTL Address Translation
- Case 3: a miss in the CMT, and the CMT is full
  - Select a CMT entry to evict (approximately LRU)
  - If the evicted entry is dirty, write it back to its translation page
  - Go to Case 2
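Putting the three cases together, here is a self-contained sketch of the lookup path with a tiny CMT managed by plain LRU; flash I/O is modeled as counters, the GTD lookup is treated as a free SRAM operation, and the PPNs are faked. The paper itself uses a segmented-LRU CMT and lazy, batched write-back, which this sketch simplifies.

```c
#include <stdint.h>
#include <stdio.h>

#define CMT_SIZE 4u            /* tiny cached mapping table, for illustration */

struct cmt_entry { uint32_t lpn, ppn; uint8_t valid, dirty; uint64_t last_use; };

static struct cmt_entry cmt[CMT_SIZE];
static uint32_t tp_reads, tp_writes;  /* modeled flash I/O on translation pages */
static uint64_t now;

/* Case 2/3 slow path: consult the GTD (an SRAM lookup, not counted) and read
 * the mapping's translation page from flash. The returned PPN is faked. */
static uint32_t read_translation_page(uint32_t lpn) {
    tp_reads++;
    return lpn + 100000u;
}

static uint32_t dftl_translate(uint32_t lpn) {
    now++;
    /* Case 1: hit in the cached mapping table (CMT). */
    for (uint32_t i = 0; i < CMT_SIZE; i++)
        if (cmt[i].valid && cmt[i].lpn == lpn) { cmt[i].last_use = now; return cmt[i].ppn; }

    /* Case 3: miss with a full CMT -> evict an (approximately) LRU entry. */
    uint32_t victim = 0; int found_free = 0;
    for (uint32_t i = 0; i < CMT_SIZE; i++) {
        if (!cmt[i].valid) { victim = i; found_free = 1; break; }
        if (cmt[i].last_use < cmt[victim].last_use) victim = i;
    }
    if (!found_free && cmt[victim].dirty)
        tp_writes++;   /* write the evicted mapping back to its translation page */

    /* Case 2: fetch the missed mapping and install it in the CMT. */
    cmt[victim] = (struct cmt_entry){ lpn, read_translation_page(lpn), 1, 0, now };
    return cmt[victim].ppn;
}

int main(void) {
    /* Entries become dirty only when data pages are written (not modeled here). */
    for (uint32_t i = 0; i < 64; i++) dftl_translate(i % 8);
    printf("translation-page reads=%u, write-backs=%u\n", tp_reads, tp_writes);
    return 0;
}
```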

Address Translation Cost
- Worst-case cost (Case 3): 2 translation page reads + 1 translation page write
  - (one read-modify-write of a translation page to write back the evicted dirty entry, plus one read to fetch the missed mapping)
- Temporal locality helps: more hits, fewer misses, fewer evictions
- The CMT may hold multiple mappings from the same translation page
  - Their updates can be batched into one write-back

Data Read
- Address translation: LPN → PPN
- Read the data page at PPN

Writes
- Current data block: the updated data page is appended to the current data block
- Current translation block: the updated translation page is appended to the current translation block
- This continues until the number of free blocks drops below GC_threshold, which triggers garbage collection
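A minimal sketch of the out-of-place append described here, with one "current" data block and a free-block threshold that triggers a stubbed garbage collector; the geometry, the free-block accounting, and `garbage_collect` are placeholders, and translation pages would follow the same pattern in their own current translation block.

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 64u
#define TOTAL_BLOCKS   256u
#define GC_THRESHOLD     8u

static uint32_t current_block;     /* block receiving appended data pages     */
static uint32_t next_offset;       /* next free page slot in current_block    */
static uint32_t free_blocks = TOTAL_BLOCKS - 1;

static void garbage_collect(void) { free_blocks++; }  /* stub: reclaims a block */

/* Out-of-place write: append the new version of the page to the current data
 * block, move to a fresh block when it fills, and trigger GC when the pool of
 * free blocks runs low. */
static uint32_t write_data_page(uint32_t lpn) {
    (void)lpn;
    if (next_offset == PAGES_PER_BLOCK) {        /* current block is full      */
        current_block++;
        free_blocks--;
        next_offset = 0;
    }
    if (free_blocks < GC_THRESHOLD)
        garbage_collect();
    return current_block * PAGES_PER_BLOCK + next_offset++;   /* new PPN       */
}

int main(void) {
    for (uint32_t i = 0; i < 10000; i++) write_data_page(i % 128);
    return 0;
}
```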

Garbage Collection
- Select a victim block (policy from [15], Kawaguchi et al., 1995)

Garbage Collection
- If the selected victim is a translation block:
  - Copy its valid pages to a free translation block
  - Update the GTD (global translation directory)
- If the selected victim is a data block:
  - Copy its valid pages to a free data block
  - Update the page-level translation for each copied data page:
    - Possibly just update the CMT entry (if it is cached, done)
    - Otherwise locate the translation page, update it, and change the GTD
  - Batch-update opportunities arise when multiple page-level translations live in the same translation page
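The sketch below walks one victim block and counts the bookkeeping work this slide lists: GTD updates for a translation-block victim, and CMT or translation-page updates for a data-block victim. The structures are stand-ins, `cmt_contains` always answers "no" so the slow path is exercised, and the batching of updates to a shared translation page is deliberately omitted.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGES_PER_BLOCK 8u     /* tiny geometry for illustration              */

enum block_kind { DATA_BLOCK, TRANSLATION_BLOCK };

struct page  { uint32_t lpn; bool valid; };
struct block { enum block_kind kind; struct page pages[PAGES_PER_BLOCK]; };

static uint32_t gtd_updates, cmt_updates, tp_updates;

/* Stand-in for the CMT lookup: whether this LPN's mapping is cached in SRAM.
 * Always "no" here, so the translation-page path below is exercised. */
static bool cmt_contains(uint32_t lpn) { (void)lpn; return false; }

/* GC of one victim block, following the slide: a translation-block victim only
 * needs its valid pages copied and the GTD redirected; a data-block victim also
 * needs the page-level mapping of every copied page fixed up. */
static void collect_victim(struct block *victim) {
    for (uint32_t i = 0; i < PAGES_PER_BLOCK; i++) {
        if (!victim->pages[i].valid) continue;
        /* "Copy" the valid page to the current block of the same kind (not modeled). */
        if (victim->kind == TRANSLATION_BLOCK) {
            gtd_updates++;               /* GTD now points at the new copy         */
        } else if (cmt_contains(victim->pages[i].lpn)) {
            cmt_updates++;               /* cached mapping updated in SRAM; done   */
        } else {
            tp_updates++; gtd_updates++; /* rewrite the translation page, fix GTD  */
        }
        victim->pages[i].valid = false;
    }
    /* The victim block would now be erased and returned to the free pool. */
}

int main(void) {
    struct block b = { DATA_BLOCK, { { 5, true }, { 9, true } } };
    collect_victim(&b);
    return (tp_updates == 2) ? 0 : 1;
}
```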

Benefits
- Page-level mapping: no expensive full merge operations
  - Better random write performance as a result
- But random writes are still worse than sequential writes:
  - More CMT misses, hence more translation page writes
  - Data pages within a block are more scattered, so GC costs are higher and there are fewer opportunities for batched updates

Outline: Introduction, Background on FTL, Design of DFTL, Experimental Results, Summary

FTL Schemes Implemented
- FlashSim simulator (the authors' enhancement of DiskSim)
- Schemes compared:
  - Block-based FTL
  - A state-of-the-art hybrid FTL (FAST)
  - DFTL
  - An idealized page-based FTL

Experimental Setup
- Modeled flash: 32 GB, 2 KB pages, 128 KB blocks
- Operation timings are listed in Table 1 of the paper

Traces Used in Experiments

Block Erases (baseline: the idealized page-level FTL)

Extra Read/Write Operations (63% CMT hit ratio for the Financial trace)

Response Times (from tech report)

Response-Time CDFs
- The address translation overhead shows up
- FAST has a long tail

Figure 10: Microscopic analysis

Summary
- Demand-based page-level FTL
- Two-level page table:
  - On flash: translation pages holding LPN → PPN entries
  - In SRAM: a global translation directory holding the locations of the translation pages
- Plus a mapping cache (CMT) in SRAM