FFS: The Fast File System

Slides:



Advertisements
Similar presentations
Chapter 12: File System Implementation
Advertisements

More on File Management
Free Space and Allocation Issues
File Systems.
CSE506: Operating Systems Block Cache. CSE506: Operating Systems Address Space Abstraction Given a file, which physical pages store its data? Each file.
Lecture 18 ffs and fsck. File-System Case Studies Local FFS: Fast File System LFS: Log-Structured File System Network NFS: Network File System AFS: Andrew.
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Chapter 12: File System Implementation
Secondary Storage Management Hank Levy. 8/7/20152 Secondary Storage • Secondary Storage is usually: –anything outside of “primary memory” –storage that.
Operating Systems COMP 4850/CISG 5550 Disks, Part II Dr. James Money.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
File Systems (1). Readings r Silbershatz et al: 10.1,10.2,
Rensselaer Polytechnic Institute CSCI-4210 – Operating Systems David Goldschmidt, Ph.D.
Disk Access. DISK STRUCTURE Sector: Smallest unit of data transfer from/to disk; 512B 2/4/8 adjacent sectors transferred together: Blocks Read/write heads.
1 Interface Two most common types of interfaces –SCSI: Small Computer Systems Interface (servers and high-performance desktops) –IDE/ATA: Integrated Drive.
CE Operating Systems Lecture 20 Disk I/O. Overview of lecture In this lecture we will look at: Disk Structure Disk Scheduling Disk Management Swap-Space.
OSes: 11. FS Impl. 1 Operating Systems v Objectives –discuss file storage and access on secondary storage (a hard disk) Certificate Program in Software.
Log-structured File System Sriram Govindan
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
CS 153 Design of Operating Systems Spring 2015 Lecture 22: File system optimizations.
Free Space Management.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 12: File System Implementation File System Structure File System Implementation.
File System Implementation
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 11: File System Implementation.
CS 153 Design of Operating Systems Spring 2015 Lecture 21: File Systems.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
Lecture 19 FFS. File-System Case Studies Local VSFS: Very Simple File System FFS: Fast File System LFS: Log-Structured File System Network NFS: Network.
CS333 Intro to Operating Systems Jonathan Walpole.
UNIX File System (UFS) Chapter Five.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
IT 344: Operating Systems Winter 2008 Module 15 BSD UNIX Fast File System Chia-Chi Teng CTB 265.
THE FILE SYSTEM Files long-term storage RAM short-term storage Programs, data, and text are all stored in files, which is stored on.
11.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles 11.5 Free-Space Management Bit vector (n blocks) … 012n-1 bit[i] =  1  block[i]
Lecture 20 FSCK & Journaling. FFS Review A few contributions: hybrid block size groups smart allocation.
Lecture Topics: 12/1 File System Implementation –Space allocation –Free Space –Directory implementation –Caching Disk Scheduling File System/Disk Interaction.
CS399 New Beginnings Jonathan Walpole. Disk Technology & Secondary Storage Management.
File System Performance CSE451 Andrew Whitaker. Ways to Improve Performance Access the disk less  Caching! Be smarter about accessing the disk  Turn.
CHAPTER 4-3 FILE SYSTEM CONSISTENCY AND EFFICIENCY.
File-System Management
File System Consistency
File System Examples Unix Fast File System (FFS)
Jonathan Walpole Computer Science Portland State University
Transactions and Reliability
Chapter 11: File System Implementation
File System Structure How do I organize a disk into a file system?
CS703 - Advanced Operating Systems
Filesystems.
Journaling File Systems
Filesystems 2 Adapted from slides of Hank Levy
Lecture 20 LFS.
Overview Continuation from Monday (File system implementation)
Introduction to Operating Systems
Jonathan Walpole Computer Science Portland State University
CSE 451: Operating Systems Autumn 2004 BSD UNIX Fast File System
Printed on Monday, December 31, 2018 at 2:03 PM.
Secondary Storage Management Brian Bershad
File System Implementation
CSE 451: Operating Systems Winter Module 15 BSD UNIX Fast File System
CSE 451 Fall 2003 Section 11/20/2003.
CSE451 Virtual Memory Paging Autumn 2002
Chapter 14: File-System Implementation
Secondary Storage Management Hank Levy
CS333 Intro to Operating Systems
Department of Computer Science
SE350: Operating Systems Lecture 12: File Systems.
File System Performance
Lecture Topics: 11/20 HW 7 What happens on a memory reference Traps
CSE 451: Operating Systems Winter Module 15 BSD UNIX Fast File System
Lecture 29: File Systems (cont'd)
The Design and Implementation of a Log-Structured File System
Presentation transcript:

FFS: The Fast File System CS 161: Lecture 13 3/30/17

The Original, Not-Fast Unix Filesystem Disk Superblock Inodes Data Inode Metadata Direct ptr Indirect ptr 2-indirect ptr Data . . . Contains file system-wide metadata like the size of disk blocks, and the start offsets for the inode section and data section Directory Name i-number Note that the number of inodes and the size of the data section is fixed.

The Original, Not-Fast Unix Filesystem Disk Design: Disk is treated like a linear array of bytes Problem: Data access incurs mechanical delays! Accessing a file’s inode and then a data block requires two seeks (or more if the block is indirectly-pointed-to) Block allocation wasn’t clever, so files in the same directory were often far apart Block size was 512 bytes, increasing penalty for poor block allocation (more disk seeks!) Result: File system provided only 4% of the sequential disk bandwidth! Superblock Inodes Inode for file X DirEntry: “X” --> inode# Data Data block for X

FFS: The Fast File System Goal: Keep the same file system abstractions (e.g., open(), read()), but improve the performance of the implementation First idea: Increase block size from 512 bytes to 4096 bytes Increases min(bytes_returned_per_seek) --> decreases number of seeks 8x as much data covered by direct blocks --> fewer indirect block accesses --> decreases number of seeks Second idea: Disk-aware file layout Consider disk geometry and mechanical delays when deciding where to put files Keep related things next to each other to reduce seeks

FFS: Data Layout Spindle R/W heads Platter

FFS: Data Layout Track Cylinder Cylinder group

FFS: Data Layout Cylinder group Inodes Data Directory allocation: Use a cylinder group with few allocated directories and many free inodes File allocation: Allocate file inodes in cylinder group of parent directory; allocate file data blocks in cylinder group of file inode Allocation policies driven by expectation of temporal locality Files in the same directory will be accessed together (e.g., source code compilation, a browser’s web cache) Providing spatial locality for data with temporal locality decreases disk seeks! Superblock Inodes Data Inode bitmap Data bitmap Note that the number of inodes and data blocks is fixed.

Sequential file scan incurs rotational latency for each block! FFS: Data Layout 1 2 3 4 5 6 Disk rotation OS wants to read 0 and 1, but disk only allows one outstanding request . . . Sequential file scan incurs rotational latency for each block! Request 0 Request 1 Return 0

FFS: Data Layout 3 4 5 6 1 2 OS wants to read 0 and 1, but disk only allows one outstanding request . . . FFS determines the number of skip blocks by empirically measuring disk characteristics Request 1 Request 0 Return 0

Block Placement Tricks: Still A Good Idea? Modern disks are more powerful than FFS-era disks Use hardware-based track buffer to cache entire track during the read of a single sector Buffer writes, and batch multiple sequential writes into single one Keep a small reserve of “extra” physical sectors so that bad sectors can be avoided (disk implements a virtual-to-physical mapping!) Modern disks don’t expose many details about geometry Only guarantee that sectors with similar sector numbers are probably “close” to each other w.r.t. access time So, modern file systems use “block groups” instead of “cylinder groups” . . . CG #0 CG #1 CG #2

Ensuring Consistency After Crashes Q: What happens to on-disk structures after an OS crash or a power outage? A: The file system must recover to a “reasonable” state Some data loss is ok . . . . . . but it’s NOT ok to have an unmountable file system! So, file system *metadata* must be consistent post-crash There’s a trade-off between performance and data loss

Crash Consistency: Creating a New File To create a new file “foo”, you need to write three things to disk: Update on-disk inode bitmap to allocate a new inode Write the new inode for “foo” to disk Write updated version of the directory that points to the new inode The order of the writes makes a difference! Suppose (1) has completed . . . how should we order (2) and (3)? Points to sadness Directory “bar” Inode #X Directory “bar” Inode #X “foo” Inode #Y Directory “bar” Inode #X “foo” Inode #Y Crash! Inode #Y Metadata DataPtrs=NULL Directory “bar” Inode #X Directory “bar” Inode #X Directory “bar” Inode #X Inode #Y Metadata DataPtrs=NULL Crash! (storage leak!) ^ Not referenced by a dirent . . . but no dirents point to junk/old stuff! After crash, we need to find unreferenced inodes and mark them as unused.

Why Is Crash Consistency Hard? We want to atomically move the file system from one state to another, but . . . A single, high-level file system operation like “create a new file” often corresponds to multiple disk writes, but the machine may crash without completing all writes The file system may issue writes to disk asynchronously, to allow processes to continue execution immediately after issuing IOs Ex: FFS would only flush data once every 30 seconds unless explicitly asked via a sync() request To improve performance, the hard disk may complete writes in a different order than they were generated! Ex: If the hard disk receives w0 (whose destination is far from the current position of the disk head), and w1 (whose destination is nearby), the disk may write w1 first

fsck: When Bad Things Happen To Good People In FFS, the fsck (“file system checker”) tool must be run after a crash to restore the file system to a consistent state “Consistent” is defined as “all metadata is in agreement (e.g., no block that is marked as unallocated is referenced by an inode)” This definition of “consistent” is NOT the same as “all writes which made it to the disk pre-crash must be reflected in the post-crash file system”—in other words, fsck may discard some write data fsck is run on the file system before the file system is mounted, i.e., before the file system can be accessed by other applications

fsck: When Bad Things Happen To Good People Ex: Appending to a file via a direct pointer requires three disk writes (1) inode (to update data pointer, file size, last update time) (2) data block bitmap (to indicate that a new data block has been allocated) (3) the data block itself Suppose that writes (1) and (3) complete, but (2) doesn’t Inconsistency: inode points to a valid data block, but the block bitmap wasn’t updated, so the new data block isn’t marked as in-use fsck resolves block bitmap inconsistencies by treating inodes as ground truth When fsck starts, it initializes the bitmap to “all unallocated” Start from root directory, fsck recursively scans all inodes, and marks all reachable data blocks as allocated in the bitmap At the end of a fsck, metadata is always consistent, but file *data* may contain junk! Ex: Writes (1) and (2) complete, but not (3) Suppose that writes (1) and (3) complete, but (2) doesn’t. There’s an inconsistency in the file system metadata: the direct pointer in the inode will point to a valid data block, but the block bitmap was not updated---remember that write (2) didn’t complete. So, the inode points to a data block which is listed as unallocated in the block bitmap! This is bad, because the file system could then allocate that block to another file, leading to junk data in the original file.

fsck: When Bad Things Happen To Good People Besides fixing the data block bitmaps, fsck does a bunch of other checks Ex: An inode’s link count refers to the number of directory entries that refer to inode (the number may be higher than 1 due to hard links) fsck traverses the entire file system, starting from the root directory, updating each inode’s link count Disadvantages of fsck Implementing fsck requires deep knowledge of the file system: the code is complicated and hard to get right fsck is very slow, because it requires multiple traversals of the entire file system! Ideally, the recovery effort should be proportional to the number of outstanding writes at the time of the crash Journaling achieves this goal! An inode’s link count might get out-of-sync if, for example, a directory is updated to include a hard link, but the inode pointed to by that hard link is not written to disk to include the new link count.

Sector Sizes From the perspective of an OS, a storage device is a big, linear array of bytes Sector: Smallest amount of data that the device can read or write in a single operation Up until 2011, hard drives used a sector size of 512 bytes Sync (helps with disk head timing timing alignment) Address mark (contains sector # and cylinder #; helps detect “misstepped” head in wrong location) Inter-sector gap Error-correcting code Data (512 bytes) 15 bytes 50 bytes

Sector Sizes As areal bit densities increased (i.e., as a 512 byte sector decreased in size), media defects of a given size caused increasingly larger damage A standard 512 byte sector could usually only correct defects of up to 50 bytes Smaller sector Post-2011, all commodity hard disks had transitioned to 4K sector sizes with 100 bytes of ECC (but 15 bytes of preamble, as in 512 byte sectors) 100 bytes of ECC per 4K sector can correct up to 400 bytes of error

Older/poorly-written software assumes a sector size of 512 bytes, so modern hard drives provide a backwards compatibility mode Two definitions for a sector Physical: Smallest amount of data that the device can read or write in a single operation Logical: Smallest amount of data that software can ask the device to read or write A drive with 4K physical sectors can advertise a logical sector size of 512 bytes This hack will keep cranky software happy . . . . . . but a write that is smaller than the physical sector size will result in a read-modify-write :-( Poor use of space :-(

Suppose that physical sectors are 4K bytes, logical sectors are 512 bytes, and the OS wants to write 512 bytes . . . 4KB physical sector in cache 512 B logical sector to write Step 1: Hard disk reads entire 4 KB physical sector into on-disk cache Step 2: Disk updates the logical sector in the cache Step 3: Disk rewrites modified physical sector to the platter

Write amplification: A single write requires additional reads or writes! To avoid write amplification, the OS should avoid writes that are smaller than the physical sector size The OS should also ensure that its data structures are properly aligned Ex: The file system should allocate data blocks such that: The size of each data block is evenly divisible by the size of a physical sector The start of each data block is aligned with the start of a physical sector Ex: Each swap file entry should be aligned with the start of a physical sector