Lecture 22 SSD

LFS review
- Good for …? Bad for …?
- How to write in LFS? How to read in LFS?

Disk after Creating Two Files

Garbage Collection in LFS
- General operation: pick M segments and compact their live data into N new segments
- Mechanism: how do we know whether data in a segment is valid? Is an inode the latest version? Is a data block the latest version?
- Policy: when, and which segments, to compact?

Determining Data Block Liveness
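To make the mechanism concrete, here is a minimal Python sketch of the liveness check described in the LFS paper: the segment summary records, for each data block, which file (inode number) and offset it belongs to, and the block is live only if the current inode (found via the imap) still points at that disk address. The dict-based segment summary, imap, and read_inode helper are hypothetical stand-ins, not LFS's on-disk structures.

```python
from collections import namedtuple

Inode = namedtuple("Inode", ["block_ptrs"])

def block_is_live(addr, segment_summary, imap, read_inode):
    """Return True if the data block at disk address `addr` is still the
    latest version of the (inode number, offset) recorded for it."""
    inum, offset = segment_summary[addr]     # segment summary: addr -> (file number, offset)
    inode = read_inode(imap[inum])           # imap: inode number -> inode address in the log
    return inode.block_ptrs[offset] == addr  # live iff the inode still points at addr

# Tiny example: inode 5's block 0 now lives at address 42, so the copy at 7 is dead.
inodes_on_disk = {100: Inode(block_ptrs=[42])}
imap = {5: 100}
summary = {7: (5, 0), 42: (5, 0)}
print(block_is_live(7, summary, imap, inodes_on_disk.get))    # False
print(block_is_live(42, summary, imap, inodes_on_disk.get))   # True
```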

Crash Recovery
- Start recovery from the last checkpoint
- Checkpoint often: more (random) I/O to the fixed checkpoint region during normal operation
- Checkpoint rarely: recovery takes longer
- LFS checkpoints every 30 seconds
- Two crash scenarios to handle: a crash while writing to the log, and a crash while updating a checkpoint region

Metadata Journaling
1/2. Data write: write data to its final location; wait for completion (the wait is optional; steps 1 and 2 may be issued concurrently).
1/2. Journal metadata write: write the begin block (TxB) and metadata to the log; wait for the writes to complete.
3. Journal commit: write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction (including data) is now committed.
4. Checkpoint metadata: write the contents of the metadata update to their final locations within the file system.
5. Free: later, mark the transaction free in the journal superblock.
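Below is a minimal, in-memory sketch of the ordering the protocol above enforces. The Disk class and its methods (write, barrier, mark_journal_free) are hypothetical stand-ins for a real block device; barrier() models "wait for the previous writes to complete".

```python
# A toy in-memory model of ordered metadata journaling (illustrative only).
class Disk:
    def __init__(self, journal_start=1000):
        self.blocks = {}
        self.journal_start = journal_start
        self.free_journal_txs = set()

    def write(self, addr, block):
        self.blocks[addr] = block

    def barrier(self):
        pass  # real device: flush the write cache / wait for completion

    def mark_journal_free(self, tx_id):
        self.free_journal_txs.add(tx_id)


def journaled_metadata_update(disk, data_blocks, metadata_blocks, tx_id):
    """data_blocks / metadata_blocks: dicts of {final_address: contents}."""
    # Steps 1/2: data write + journal write (TxB and metadata) may be issued together
    for addr, blk in data_blocks.items():
        disk.write(addr, blk)                          # data goes to its final location
    journal = [("TxB", tx_id)] + list(metadata_blocks.items())
    for i, entry in enumerate(journal):
        disk.write(disk.journal_start + i, entry)      # begin block + metadata into the log
    disk.barrier()                                     # wait for steps 1/2 to complete

    # Step 3: journal commit -- durable only once TxE is on disk
    disk.write(disk.journal_start + len(journal), ("TxE", tx_id))
    disk.barrier()

    # Step 4: checkpoint the metadata to its final locations
    for addr, blk in metadata_blocks.items():
        disk.write(addr, blk)
    disk.barrier()

    # Step 5: free the transaction in the journal superblock
    disk.mark_journal_free(tx_id)


d = Disk()
journaled_metadata_update(d, {10: "data"}, {3: "inode", 7: "bitmap"}, tx_id=1)
```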

Checkpoint
- In journaling: write the contents of the update to their final locations within the file system.
- In LFS: checkpoint regions are located at a special, fixed position on disk; a checkpoint region contains the addresses of all imap blocks, the current time, the address of the last segment written, etc.

Checkpoint Strategy
- Keep two checkpoint regions (CRs); only overwrite one at a time.
- To update a CR, LFS first writes out a header (with a timestamp), then the body of the CR, and finally one last block (also with a timestamp).
- Use the timestamps to identify the newest consistent CR: if the system crashes during a CR update, LFS detects this by seeing an inconsistent pair of timestamps.
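A small sketch of how the two-CR scheme can be used at recovery time, assuming each checkpoint region is modeled as a (header timestamp, body, trailer timestamp) tuple; the field layout is illustrative, not Sprite LFS's actual on-disk format.

```python
def choose_checkpoint(cr_a, cr_b):
    """Return the newest CR whose header and trailer timestamps match."""
    def consistent(cr):
        head_ts, _body, tail_ts = cr
        return head_ts == tail_ts          # a torn update leaves mismatched timestamps

    candidates = [cr for cr in (cr_a, cr_b) if consistent(cr)]
    if not candidates:
        raise RuntimeError("no consistent checkpoint region found")
    return max(candidates, key=lambda cr: cr[0])   # newest consistent one wins

# Example: CR A was torn mid-update (timestamps disagree), so CR B is chosen.
cr_a = (200, "imap addrs ...", 150)   # inconsistent: crash during update
cr_b = (180, "imap addrs ...", 180)   # consistent
print(choose_checkpoint(cr_a, cr_b)[0])   # 180
```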

Roll-forward
- Scan BEYOND the last checkpoint to recover as much data as possible, using information from the segment summary blocks
- If a new inode is found in a segment summary block, update the inode map (read from the checkpoint) so the new data blocks become part of the file system
- Data blocks without a new copy of their inode are an incomplete version on disk and are ignored by the file system
- Adjust utilization in the segment usage table to account for live data written after the checkpoint (utilization of post-checkpoint segments starts at 0) and for deleted or overwritten data
- Restore consistency between directory entries and inodes

Major Data Structures
- Superblock (fixed): holds static configuration information such as number of segments and segment size.
- Inode (log): locates blocks of a file; holds protection bits, modify time, etc.
- Indirect block (log): locates blocks of large files.
- Inode map (log): locates position of inodes in the log; holds time of last access plus version number.
- Segment summary (log): identifies contents of a segment (file number and offset for each block).
- Directory change log (log): records directory operations to maintain consistency of reference counts in inodes.
- Segment usage table (log): counts live bytes still left in segments; stores last write time for data in segments.
- Checkpoint region (fixed): locates blocks of the inode map and segment usage table; identifies the last checkpoint in the log.

SSD

Flash-based Solid-State Storage (SSD)
- A new form of persistent storage device
- Unlike hard drives, it has no mechanical or moving parts
- Unlike typical random-access memory, it retains information despite power loss
- Unlike hard drives and like memory, it is a random-access device
- Basics: to write a flash page, the block containing it first needs to be erased
- Wear out …

Storing a Single Bit
- Flash stores one or more bits in a single transistor:
  - single-level cell (SLC) flash: 1 bit per cell (1 or 0)
  - multi-level cell (MLC) flash: 2 bits per cell (00, 01, 10, 11)
  - triple-level cell (TLC) flash: 3 bits per cell
- SLC chips achieve higher performance and are more expensive

From Bits to Blocks and Pages
- Flash chips are organized into banks or planes
- A bank is accessed in two differently sized units:
  - Blocks (erase blocks): 128 KB or 256 KB
  - Pages: 4 KB

Basic Flash Operations
- Read (a page): flash is a random-access device; any page can be read directly
- Erase (a block): sets every bit in the block to 1; quite expensive, taking a few milliseconds to complete
- Program (a page): allowed only after the block has been erased; takes around 100s of microseconds - less expensive than erasing a block, but more costly than reading a page
- Writing is therefore expensive, and frequent erase/program cycles lead to wear out

4-page Block Status (example operation sequence on one 4-page block)
(start)      iiii   Initial: pages in block are invalid (i)
Erase()    → EEEE   State of pages in block set to erased (E)
Program(0) → VEEE   Program page 0; state set to valid (V)
Program(0) → error  Cannot re-program a page after programming
Program(1) → VVEE   Program page 1
Erase()    → EEEE   Contents erased; all pages programmable
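The same rules can be captured in a toy model (illustrative only, not device firmware): pages start invalid, an erase makes every page in the block programmable, and a page can be programmed at most once between erases.

```python
class FlashBlock:
    PAGES = 4

    def __init__(self):
        self.state = ["i"] * self.PAGES   # i = invalid, E = erased, V = valid
        self.data = [None] * self.PAGES

    def erase(self):
        self.state = ["E"] * self.PAGES   # every page becomes programmable again
        self.data = [None] * self.PAGES

    def program(self, page, value):
        if self.state[page] != "E":
            raise RuntimeError("cannot re-program a page without erasing the block")
        self.state[page] = "V"
        self.data[page] = value

    def read(self, page):
        return self.data[page]            # reads are allowed at any time


b = FlashBlock()                 # iiii
b.erase()                        # EEEE
b.program(0, "a")                # VEEE
# b.program(0, "a'")             # would raise: cannot re-program without an erase
b.program(1, "b")                # VVEE
b.erase()                        # EEEE -- contents gone, all pages programmable
```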

A Detailed Example

Flash Performance and Reliability
- Raw flash performance characteristics: reads are fastest, programs slower, erases slowest
- Reliability: the primary concern is wear out - a little bit of extra charge accrues in a cell with each erase/program cycle, eventually making it unreliable
- Disturbance: when accessing (reading or programming) a particular page within a flash block, it is possible that some bits get flipped in neighboring pages

Raw Flash → Flash-Based SSDs
- The SSD exports the standard storage interface: lots of sectors
- Inside the SSD: flash chips, RAM for caching, and the flash translation layer (FTL) - control logic that turns client reads and writes into flash operations
- The FTL needs to reduce write amplification: bytes issued to the flash chips by the FTL divided by bytes issued by the client to the SSD
- The FTL takes care of wear out (by doing wear leveling)
- The FTL takes care of disturbance (by programming pages within an erased block in order)
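Write amplification is just the ratio defined above; a toy worked example, assuming (hypothetically) that the FTL had to copy 12 KB of live data while servicing a 4 KB client write:

```python
def write_amplification(bytes_written_to_flash, bytes_written_by_client):
    # traffic the FTL sends to the flash chips / traffic the client sends to the SSD
    return bytes_written_to_flash / bytes_written_by_client

# A 4 KB client write that forces 12 KB of live-data copying means 16 KB hits flash.
print(write_amplification(16 * 1024, 4 * 1024))   # 4.0
```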

A Bad Approach: Direct Mapping
- Logical page N is mapped directly to physical page N
- Performance is bad: each overwrite of page N forces a read, erase, and re-program of N's entire block
- Wear out is uneven: frequently written logical pages quickly wear out their physical blocks
- What might be a good approach? Try to improve write performance; use the device circularly

Yeah, a blank slide

A Log-Structured FTL
- Writes are appended to the next free page of the log; this requires adding a mapping table from logical block addresses to physical pages
- Example operations:
  Write(100) with contents a1
  Write(101) with contents a2
  Write(2000) with contents b1
  Write(2001) with contents b2

The Resulting SSD
- How to read? Consult the mapping table to find the physical page holding the latest copy, then read that page (see the sketch below)
- Wear leveling: the FTL now spreads writes across all pages
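Here is a minimal page-mapped, log-structured FTL sketch tying the last two slides together: writes append to the log and update the mapping table, and reads consult the table. Class and field names are illustrative; a real FTL also handles garbage collection, OOB metadata, and caching of the map.

```python
PAGES_PER_BLOCK = 4
NUM_BLOCKS = 8

class LogFTL:
    def __init__(self):
        self.flash = [None] * (PAGES_PER_BLOCK * NUM_BLOCKS)  # physical pages
        self.mapping = {}        # logical block address -> physical page
        self.log_head = 0        # next free physical page

    def write(self, lba, data):
        # Append to the log instead of overwriting in place; because writes go
        # to successive pages, wear is naturally spread across the device.
        ppage = self.log_head
        self.flash[ppage] = (lba, data)   # store the LBA alongside the data (OOB-style)
        self.mapping[lba] = ppage         # remember where this logical block now lives
        self.log_head += 1

    def read(self, lba):
        # Reads consult the mapping table to find the latest physical page.
        ppage = self.mapping[lba]
        return self.flash[ppage][1]


ftl = LogFTL()
for lba, data in [(100, "a1"), (101, "a2"), (2000, "b1"), (2001, "b2")]:
    ftl.write(lba, data)
print(ftl.read(100), ftl.read(2000))   # a1 b1
```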

Keeping the FTL Mapping Persistent
- Record some mapping information with each page, in what is called an out-of-band (OOB) area
- When the device loses power and is restarted: scan the OOB areas and reconstruct the mapping table in memory
- For larger devices: logging and checkpointing of the mapping table
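A sketch of the scan-and-rebuild step, assuming each written physical page records its logical block address alongside the data (modeling the OOB area); later entries in the log win because they are newer. The data layout is illustrative only.

```python
def rebuild_mapping(flash_pages):
    """Scan physical pages in log order; later entries for the same LBA win."""
    mapping = {}
    for ppage, contents in enumerate(flash_pages):
        if contents is None:
            continue                     # unwritten page
        lba, _data = contents            # OOB-style record: which logical block this is
        mapping[lba] = ppage             # newer (later-in-log) copies overwrite older ones
    return mapping

print(rebuild_mapping([(100, "a1"), (100, "a1'"), (2000, "b1"), None]))
# {100: 1, 2000: 2}
```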

Garbage Collection
- Garbage example (note: the figure has a bug - "VVii" should be "VVEE")
- Determining liveness:
  - Within each block, store information about which logical blocks are stored within each page
  - Check the mapping table for each such logical block: the page is live only if the mapping still points to it

Garbage Collection Steps
1. Read the live data (pages 2 and 3) from block 0
2. Write the live data to the end of the log
3. Erase block 0 (freeing it for later use)
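The three steps above, as a standalone sketch on a page-mapped FTL. State is modeled with plain dictionaries (flash: physical page -> (LBA, data); mapping: LBA -> physical page), which are illustrative rather than a real FTL's structures; a page is live iff the mapping still points at it.

```python
PAGES_PER_BLOCK = 4

def garbage_collect(flash, mapping, victim_block, log_head):
    victim_pages = range(victim_block * PAGES_PER_BLOCK,
                         (victim_block + 1) * PAGES_PER_BLOCK)

    # Step 1: read the live data out of the victim block
    live = [flash[p] for p in victim_pages
            if p in flash and mapping.get(flash[p][0]) == p]

    # Step 2: write the live data to the end of the log, updating the mapping
    for lba, data in live:
        flash[log_head] = (lba, data)
        mapping[lba] = log_head
        log_head += 1

    # Step 3: erase the victim block, freeing it for later use
    for p in victim_pages:
        flash.pop(p, None)
    return log_head

# Block 0 holds physical pages 0-3; only pages 2 and 3 are still live.
flash = {0: (100, "a1"), 1: (101, "a2"), 2: (100, "c1"), 3: (101, "c2")}
mapping = {100: 2, 101: 3}
log_head = garbage_collect(flash, mapping, victim_block=0, log_head=4)
print(flash, mapping, log_head)
# {4: (100, 'c1'), 5: (101, 'c2')} {100: 4, 101: 5} 6
```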

Block-Based Mapping to Reduce Mapping Table Size
- Keep one mapping entry per block (chunk) instead of one per page
- Logical address: with 4 pages per block, the least significant two bits are the page offset within the block; the remaining bits form the chunk number
- Before (page mapping): 2000→4, 2001→5, 2002→6, 2003→7
- After (block mapping): a single entry, chunk 500 (2000 with the offset bits dropped) → the physical block starting at page 4
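Address translation under block-based mapping, assuming 4 pages per block so the low two bits of the logical address are the page offset (the chunk number 500 and the physical page numbers follow the example above):

```python
PAGES_PER_BLOCK = 4

def translate(lba, block_map):
    chunk = lba // PAGES_PER_BLOCK            # e.g. 2000 -> chunk 500
    offset = lba % PAGES_PER_BLOCK            # e.g. 2001 -> offset 1 (low two bits)
    first_page_of_block = block_map[chunk]    # one table entry per block, not per page
    return first_page_of_block + offset

block_map = {500: 4}          # chunk 500 lives in the physical block starting at page 4
print([translate(a, block_map) for a in (2000, 2001, 2002, 2003)])   # [4, 5, 6, 7]
```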

Problem with Block-Based Mapping
- Small writes: when one logical page in a block is overwritten, the FTL must read a large amount of live data from the old block and copy it into a new one
- What might be a good solution?
  - Page-based mapping is good at …, but bad at …
  - Block-based mapping is bad at …, but good at …

Hybrid Mapping
- Log blocks: a few blocks that are per-page mapped; call this per-page mapping the log table
- Data blocks: the remaining blocks, which are per-block mapped; call this per-block mapping the data table
- How to read and write? (a read-lookup sketch follows below)
- How to switch between per-page mapping and per-block mapping? (switch, partial, and full merges, next slides)
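A read-path sketch for hybrid mapping, as referenced above: check the small per-page log table first, then fall back to the per-block data table. The table and flash structures here are illustrative stand-ins.

```python
PAGES_PER_BLOCK = 4

def hybrid_read(lba, log_table, data_table, flash):
    if lba in log_table:                          # most recent copy is in a log block
        return flash[log_table[lba]]
    chunk, offset = divmod(lba, PAGES_PER_BLOCK)  # otherwise use the per-block mapping
    return flash[data_table[chunk] + offset]

flash = {0: "new-a", 8: "a", 9: "b", 10: "c", 11: "d"}
log_table = {2000: 0}          # page 2000 was recently overwritten into a log block
data_table = {500: 8}          # chunk 500 (pages 2000-2003) lives at physical pages 8-11
print(hybrid_read(2000, log_table, data_table, flash))   # new-a
print(hybrid_read(2001, log_table, data_table, flash))   # b
```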

Hybrid Mapping Example: overwrite each page

Switch Merge Before and After

Partial Merge Before and After

Full Merge
- The FTL must pull together pages from many other blocks to perform cleaning
- Example: imagine that pages 0, 4, 8, and 12 are written to log block A; to turn A into a data block, each of those pages must be merged back with the rest of its own block

Wear Leveling
- The FTL should try its best to spread erase/program work evenly across all the blocks of the device
- The log-structuring approach does a good initial job
- What if a block is filled with long-lived data that never gets overwritten? It stops participating in the log and the remaining blocks absorb all the wear
- Solution: periodically read all the live data out of such blocks and re-write it elsewhere, making the block available for new writes

SSD Performance
- Fast but expensive
- An SSD costs roughly 60 cents per GB; a typical hard drive costs roughly 5 cents per GB

Next
- Data Integrity and Protection
- Distributed Systems: RPC