
Lecture 20 FSCK & Journaling

FFS Review
A few contributions: hybrid block size, groups, smart allocation.

Hybrid Block Size: Blocks + Fragments
Big blocks: fast. Small blocks: space efficient.
FFS splits regular blocks into fragments when less than a block is needed.
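A rough sketch of the space savings. It assumes 4KB blocks split into four 1KB fragments (hypothetical sizes; FFS made both configurable) and that only the file's tail is stored in fragments. Real FFS also requires a file's fragments to sit contiguously inside one block, which is not modeled here.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes
FRAG_SIZE = 1024   # assumed fragment size: 4 fragments per block


def layout(file_size: int) -> tuple[int, int]:
    """Return (full_blocks, tail_fragments) used to store file_size bytes."""
    full_blocks, tail = divmod(file_size, BLOCK_SIZE)
    tail_frags = -(-tail // FRAG_SIZE)  # ceiling division
    return full_blocks, tail_frags


def wasted_bytes(file_size: int, use_fragments: bool = True) -> int:
    """Bytes allocated but unused (internal fragmentation)."""
    if not use_fragments:
        blocks = -(-file_size // BLOCK_SIZE)
        return blocks * BLOCK_SIZE - file_size
    full_blocks, tail_frags = layout(file_size)
    return full_blocks * BLOCK_SIZE + tail_frags * FRAG_SIZE - file_size


for size in (500, 4096, 9000):
    print(size, layout(size), wasted_bytes(size), wasted_bytes(size, use_fragments=False))
```

Under these assumptions, the tail of a 9000-byte file needs one 1KB fragment instead of a third 4KB block, which cuts the waste from 3288 bytes to 216.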

Groups and Allocation
With groups, each inode has its data blocks near it.
File inodes: allocate in the same group as the parent directory.
Dir inodes: allocate in a new group with fewer allocated inodes than the average group.
First data block: allocate near the inode.
Other data blocks: allocate near the previous block.
Large file data blocks: after 48KB, go to a new group. Move to another group (with fewer used blocks than the average) every subsequent 1MB.
[Figure: the disk divided into groups, each laid out as S B D I]
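A minimal sketch of these placement rules, assuming each group is summarized only by its counts of used inodes and used blocks. The tie-breaks and the fallback group are hypothetical choices, not FFS's exact code; the 48KB and 1MB thresholds come from the slide.

```python
from dataclasses import dataclass

FIRST_CHUNK = 48 * 1024  # first 48KB of a file stays near its inode (from the slide)
CHUNK = 1024 * 1024      # afterwards, switch groups every 1MB (from the slide)


@dataclass
class Group:
    used_inodes: int
    used_blocks: int


def pick_dir_group(groups: list[Group]) -> int:
    """New directory inode: a group with fewer used inodes than average."""
    avg = sum(g.used_inodes for g in groups) / len(groups)
    below = [i for i, g in enumerate(groups) if g.used_inodes < avg]
    # Tie-break (hypothetical): least-loaded candidate; group 0 if all are equal.
    return min(below, key=lambda i: groups[i].used_inodes) if below else 0


def pick_file_group(parent_dir_group: int) -> int:
    """New file inode: same group as its parent directory."""
    return parent_dir_group


def chunk_index(byte_offset: int) -> int:
    """-1 for the first 48KB (kept near the inode), then 0, 1, 2, ... per 1MB chunk."""
    if byte_offset < FIRST_CHUNK:
        return -1
    return (byte_offset - FIRST_CHUNK) // CHUNK


def pick_chunk_group(groups: list[Group], current: int) -> int:
    """Start of a new chunk: move to another group with fewer used blocks than average."""
    avg = sum(g.used_blocks for g in groups) / len(groups)
    for i, g in enumerate(groups):
        if i != current and g.used_blocks < avg:
            return i
    return (current + 1) % len(groups)  # hypothetical fallback


groups = [Group(10, 500), Group(2, 100), Group(5, 900)]
print(pick_dir_group(groups))  # group 1 has the fewest used inodes
print(chunk_index(40 * 1024), chunk_index(48 * 1024), chunk_index(2 * 1024 * 1024))
```

A file switches groups whenever `chunk_index` changes, so a large file is spread across groups in 1MB pieces and cannot fill up the group its directory lives in.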

Redundancy?
Definition: if A and B are two pieces of data, and knowing A eliminates some or all of the values B could take, there is redundancy between A and B.
Superblock: a field contains the total number of blocks in the FS. Inode: a field contains a pointer to a data block. Is there redundancy between these fields? Why?
Yes. If the total number of blocks is N, pointers to block N or beyond are invalid.

Redundancy in FFS
Dir entries AND inode table.
Dir entries AND inode link count.
Data bitmap AND inode pointers.
Inode file size AND inode/indirect pointers.

Redundancy Uses
Redundancy may improve: performance, reliability.
Redundancy hurts: capacity.
Redundancy implies: certain combinations of values are illegal. An illegal combination is an inconsistency.

Consistency Challenge
We may need to do several disk writes to redundant blocks. We don’t want to be interrupted between writes.
Things that interrupt us: power loss, kernel panic, reboot, user hard reset.

Partial Update
Suppose we are appending to a file and must update the following: data block, inode, and data bitmap. What if we crash after updating only some of these?
Only data: nothing bad.
Only inode: it points to garbage, and somebody else may allocate that block.
Only bitmap: lost block (space leak).
Bitmap and inode: the inode points to garbage.
Bitmap and data: lost block.
Data and inode: somebody else may allocate that block.
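The cases above can be enumerated mechanically. This is a small model, assuming the append touches exactly one data block, one inode pointer, and one bitmap bit. The first two problems are inconsistencies, because the bitmap and inode disagree. "Points to garbage" is different: the metadata is consistent, so a checker cannot tell anything is wrong.

```python
from itertools import combinations

WRITES = ("data", "inode", "bitmap")


def outcome(done: frozenset) -> list[str]:
    """Problems left on disk if only the writes in `done` reached it before a crash."""
    problems = []
    if "inode" in done and "bitmap" not in done:
        problems.append("block in use but marked free: somebody else may allocate it")
    if "bitmap" in done and "inode" not in done:
        problems.append("block marked used but unreferenced: space leak")
    if "inode" in done and "data" not in done:
        problems.append("inode points to garbage")
    return problems


for r in range(len(WRITES) + 1):
    for subset in combinations(WRITES, r):
        print(sorted(subset), outcome(frozenset(subset)) or "ok")
```

Only the all-or-nothing rows (and the harmless "data only" row) come out clean, which is the motivation for making the three writes atomic.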

fsck
FSCK = file system checker.
Strategy: after a crash, scan the whole disk for contradictions.
For example, is a bitmap block correct? Read every valid inode and indirect block. If an inode points to a block, the corresponding bit should be 1.
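A sketch of this bitmap check, assuming the inode table has already been read into memory as a list of dicts (a hypothetical in-memory format, not an on-disk one). The bitmap is rebuilt from scratch using the inodes as the source of truth, then compared with what is on disk.

```python
def check_bitmap(bitmap: list[int], inodes: list[dict]) -> tuple[list[int], list[int]]:
    """Rebuild the data bitmap from the inodes, fsck-style.

    Returns (mismatched block numbers, rebuilt bitmap).
    Each inode is {"valid": bool, "blocks": [...]}, where "blocks" lists every
    block the inode uses: its data blocks and any indirect blocks.
    """
    rebuilt = [0] * len(bitmap)
    for inode in inodes:
        if not inode["valid"]:
            continue                     # free inodes own nothing
        for blk in inode["blocks"]:
            if 0 <= blk < len(rebuilt):  # out-of-range pointers are a separate check
                rebuilt[blk] = 1
    mismatches = [b for b, (old, new) in enumerate(zip(bitmap, rebuilt)) if old != new]
    return mismatches, rebuilt


bitmap = [1, 1, 0, 1, 0]
inodes = [
    {"valid": True, "blocks": [0, 2]},  # block 2 is in use, but its bit is 0
    {"valid": False, "blocks": [3]},    # free inode: block 3's bit is stale
    {"valid": True, "blocks": [1]},
]
mismatches, bitmap = check_bitmap(bitmap, inodes)
print(mismatches)  # [2, 3]
```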

fsck
Other checks:
Do the superblocks match?
Does the number of dir entries naming each inode equal its link count?
Do different inodes ever point to the same block?
Do directories contain “.” and “..”?
…
How to solve problems?

Examples
Two dir entries point to an inode with link_count = 1: make the link_count 2.
An inode has link_count = 1 but no dir entry points to it: link it under lost+found/.
Data and inode are written, but not the bitmap: update the bitmap.
Two inodes point to the same block: duplicate the block.
An inode points to block N or beyond: remove the pointer.
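A sketch of three of these repairs, assuming the inodes and directory entries have already been loaded into plain Python structures. The `lost+found/#<inode>` naming is a hypothetical choice. The bitmap and duplicate-block cases are left out to keep it short.

```python
from collections import Counter


def fsck_repair(inodes: dict, entries: list, total_blocks: int) -> list[str]:
    """Apply three of the fixes above; return a log of what was changed.

    inodes:  {inode_number: {"links": int, "blocks": [block numbers]}}
    entries: [(path, inode_number)], one per directory entry
    """
    log = []
    refs = Counter(ino for _, ino in entries)
    for ino, node in inodes.items():
        # Inode points to block N or beyond: remove the bad pointers.
        bad = [b for b in node["blocks"] if not 0 <= b < total_blocks]
        if bad:
            node["blocks"] = [b for b in node["blocks"] if 0 <= b < total_blocks]
            log.append(f"inode {ino}: dropped pointers {bad}")
        # Allocated inode with no directory entry: link it under lost+found/.
        if refs[ino] == 0:
            entries.append((f"lost+found/#{ino}", ino))  # hypothetical naming scheme
            refs[ino] = 1
            log.append(f"inode {ino}: linked under lost+found/")
        # Link count must equal the number of directory entries naming the inode.
        if node["links"] != refs[ino]:
            log.append(f"inode {ino}: link count {node['links']} -> {refs[ino]}")
            node["links"] = refs[ino]
    return log


inodes = {
    2: {"links": 1, "blocks": [10]},      # two dir entries, link_count 1
    3: {"links": 1, "blocks": [11, 99]},  # block 99 is past the end of a 64-block FS
    4: {"links": 1, "blocks": []},        # no dir entry at all
}
entries = [("a", 2), ("b", 2), ("c", 3)]
for line in fsck_repair(inodes, entries, total_blocks=64):
    print(line)
```

Each fix produces a consistent state, but nothing guarantees it is the state the user intended, which is the limitation discussed next.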

fsck
It’s not always obvious how to patch the file system back together. We don’t know the “correct” state, just a consistent one.
Too slow: fsck has to scan the entire disk.

Regaining Consistency After Crash
Solution 1: reformat the disk.
Solution 2: guess (fsck).
Solution 3: do fancy bookkeeping before the crash.

Journaling Goals
It’s OK to do some recovery work after a crash, but not to read the entire disk.
Don’t just get to a consistent state; get to a “correct” state.
Known as write-ahead logging in database systems.

Atomicity
Concurrency definition: operations in critical sections are not interrupted by operations on other critical sections.
Persistence definition: collections of writes are not interrupted by crashes. We get all new or all old data.

Basic Idea
Before overwriting the disk, write down a little note.
Upon a crash, check the note.
Example: the ext3 file system, with a journal.
[Figure: ext3 disk layout: Group 1, Group 2, …, Group N, Journal]

Data Journaling
Before writing the inode (I[v2]), bitmap (B[v2]), and data block (Db) to their final locations, write them to the log/journal:
TxB (transaction begin): information about the pending update, e.g., the final addresses of the blocks, the transaction ID, and a checksum.
Middle three blocks (I[v2], B[v2], Db): physical logging, i.e., the exact contents of the blocks.
TxE (transaction end): marks the end; also contains the transaction ID and checksum.
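One way to picture these journal records is shown below. The record layout is hypothetical and not ext3's on-disk format, and `zlib.crc32` is an arbitrary choice of checksum. Recovery can treat a transaction as complete only if TxB and TxE agree and the logged blocks match the checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass
class TxB:
    tid: int          # transaction ID
    addrs: list[int]  # final on-disk address of each logged block
    checksum: int     # checksum over the logged block contents


@dataclass
class TxE:
    tid: int
    checksum: int


def make_transaction(tid: int, updates: list[tuple[int, bytes]]) -> list:
    """Journal records, in log order, for updates = [(final_addr, contents), ...]."""
    contents = [data for _, data in updates]
    checksum = zlib.crc32(b"".join(contents))
    return [TxB(tid, [addr for addr, _ in updates], checksum), *contents, TxE(tid, checksum)]


def is_complete(records: list) -> bool:
    """Recovery check: TxB and TxE agree and the logged contents match the checksum."""
    begin, *contents, end = records
    return (begin.tid == end.tid
            and begin.checksum == end.checksum == zlib.crc32(b"".join(contents)))


tx = make_transaction(7, [(12, b"I[v2]"), (3, b"B[v2]"), (900, b"Db")])
torn = tx[:3] + [b"garbage"] + tx[4:]  # Db never made it to the log
print(is_complete(tx), is_complete(torn))
```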

Sequence of Operations (V1)
1. Journal write: Write the transaction, including a transaction-begin block, all pending data and metadata updates, and a transaction-end block, to the log; wait for these writes to complete.
2. Checkpoint: Write the pending metadata and data updates to their final locations in the file system.

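How to write the journal?
Write a set of blocks: e.g., TxB, I[v2], B[v2], Db, TxE.
Issue one block at a time: too slow.
Issue all five blocks at once: unsafe. The disk may reorder the writes, so TxE could reach the disk before Db; after a crash, the transaction would look complete but contain garbage.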

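Write in two steps
First write TxB and the transaction contents; once they complete, write TxE.
To make the write of TxE atomic, make it a single 512-byte block (the disk guarantees that a 512-byte sector write either happens completely or not at all).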
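The difference between one batch and two steps can be checked by brute force. This model assumes that the disk may persist the blocks of one batch in any order, and that a crash leaves some prefix of that order on disk. An "unsafe" state contains TxE but is missing some logged block.

```python
from itertools import permutations

CONTENTS = ("TxB", "I[v2]", "B[v2]", "Db")
ALL = frozenset(CONTENTS + ("TxE",))


def crash_states(orders):
    """Every set of blocks that could be on disk if we crash partway through an order."""
    return {frozenset(order[:k]) for order in orders for k in range(len(order) + 1)}


def unsafe(states):
    """States that look committed (TxE present) but are missing logged blocks."""
    return {s for s in states if "TxE" in s and s != ALL}


# One batch: the disk may persist all five blocks in any order.
one_batch = crash_states(permutations(CONTENTS + ("TxE",)))
# Two steps: the four content blocks in any order, then TxE only after they complete.
two_steps = crash_states(order + ("TxE",) for order in permutations(CONTENTS))

print(len(unsafe(one_batch)), len(unsafe(two_steps)))
```

With a single batch, every subset that contains TxE and lacks some other block is reachable (15 states). With the barrier before TxE, none are.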

Sequence of Operations (V2)
1. Journal write: Write the contents of the transaction (including TxB, metadata, and data) to the log; wait for these writes to complete.
2. Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction is said to be committed.
3. Checkpoint: Write the contents of the update (metadata and data) to their final on-disk locations.

Recovery
A crash could happen at any time.
If the crash happens before step 2 completes: skip the pending update.
If the crash happens after step 2 completes: replay the transaction from the log.
What if the crash happens during checkpointing?
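A toy model of the V2 protocol and its recovery, assuming the journal and the disk are plain in-memory structures and that a crash can be injected at a named point. The record tuples and crash-point names are hypothetical. Recovery redoes every committed transaction and ignores the rest.

```python
class Crash(Exception):
    """Simulated power loss at a chosen point."""


def journaled_update(journal, disk, tid, updates, crash_at=None):
    """V2 protocol for updates = [(addr, contents), ...]; crash_at picks a crash point."""
    # 1. Journal write: TxB and the block contents (then wait for completion).
    journal.append(("TxB", tid))
    journal.extend(("BLK", tid, addr, data) for addr, data in updates)
    if crash_at == "before_commit":
        raise Crash()
    # 2. Journal commit: TxE as a single sector write (then wait); now committed.
    journal.append(("TxE", tid))
    if crash_at == "after_commit":
        raise Crash()
    # 3. Checkpoint: write the contents to their final locations.
    for i, (addr, data) in enumerate(updates):
        if crash_at == "mid_checkpoint" and i == 1:
            raise Crash()
        disk[addr] = data


def recover(journal, disk):
    """Redo logging: replay committed transactions in log order, skip the rest."""
    committed = {rec[1] for rec in journal if rec[0] == "TxE"}
    for rec in journal:
        if rec[0] == "BLK" and rec[1] in committed:
            disk[rec[2]] = rec[3]


def trial(crash_at):
    journal, disk = [], {1: "I[v1]", 2: "B[v1]", 3: "old data"}
    try:
        journaled_update(journal, disk, 1, [(1, "I[v2]"), (2, "B[v2]"), (3, "Db")], crash_at)
    except Crash:
        pass
    recover(journal, disk)  # replaying already-checkpointed blocks is harmless
    return disk


for point in ("before_commit", "after_commit", "mid_checkpoint", None):
    print(point, trial(point))
```

In this model, a crash during checkpointing is handled the same way as a crash right after the commit: recovery writes the logged blocks again. Each replayed write stores a full block image, so repeating it is idempotent.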

Batching Log Updates
The basic protocol could add a lot of extra disk traffic.
Suppose we create two files in the same directory: we would write the same inode block (and bitmap and directory blocks) to the log over and over.
Solution: buffer all updates into a global transaction.
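A sketch of a global transaction buffer, assuming updates are tracked per block address. The block addresses are hypothetical, and both new inodes are assumed to sit in the same inode-table block. Repeated updates to one block are merged in memory, so each dirty block is logged once per commit.

```python
class GlobalTransaction:
    """Buffer metadata updates in memory; log each dirty block once per commit."""

    def __init__(self):
        self.dirty = {}  # block address -> latest contents

    def update(self, addr, data):
        self.dirty[addr] = data  # a later update to the same block replaces the earlier one

    def commit(self, journal):
        """Append one transaction (TxB, each dirty block once, TxE); return blocks logged."""
        blocks = sorted(self.dirty.items())
        journal.append(("TxB", len(blocks)))
        journal.extend(("BLK", addr, data) for addr, data in blocks)
        journal.append(("TxE",))
        self.dirty.clear()
        return len(blocks)


# Hypothetical block addresses for creating two files in one directory.
INODE_BITMAP, INODE_TABLE_BLOCK, DIR_DATA = 5, 20, 300

tx = GlobalTransaction()
for name in ("a", "b"):
    tx.update(INODE_BITMAP, f"bitmap after creating {name}")
    tx.update(INODE_TABLE_BLOCK, f"inodes after creating {name}")
    tx.update(DIR_DATA, f"dir entries after creating {name}")

journal = []
print(tx.commit(journal))  # 3 blocks logged instead of 6
```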

Making The Log Finite
What if the log is full?
Problems: recovery takes longer, since it must replay everything in the log; and no further transactions can happen.
Make the journal circular: free the space after a transaction is checkpointed.

Sequence of Operations (V3)
1. Journal write: Write the contents of the transaction (containing TxB and the contents of the update) to the log; wait for these writes to complete.
2. Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction is now committed.
3. Checkpoint: Write the contents of the update to their final locations within the file system.
4. Free: Some time later, mark the transaction free in the journal by updating the journal superblock.
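A minimal ring-buffer model of the circular journal. It assumes transactions are checkpointed and freed in the order they were logged. `head` plays the role of the start-of-log value that step 4 records in the journal superblock, in this sketch only. Raising an error when full is a simplification: a real system would force a checkpoint instead.

```python
class CircularJournal:
    """Journal space as a ring buffer of `capacity` blocks.

    head and tail are ever-increasing block counters; a counter maps to the
    physical journal block counter % capacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0   # start of the oldest transaction not yet freed
        self.tail = 0   # where the next transaction will be written
        self.live = []  # (tid, start, nblocks), oldest first

    def used(self):
        return self.tail - self.head

    def append(self, tid, nblocks):
        """Reserve space for a transaction; return its first physical journal block."""
        if self.used() + nblocks > self.capacity:
            raise RuntimeError("journal full: checkpoint and free old transactions first")
        start = self.tail
        self.tail += nblocks
        self.live.append((tid, start, nblocks))
        return start % self.capacity

    def free_oldest(self):
        """Step 4 (Free): once the oldest transaction is checkpointed, reclaim its space."""
        tid, start, nblocks = self.live.pop(0)
        self.head = start + nblocks
        return tid


j = CircularJournal(8)
print(j.append(1, 5), j.append(2, 3))
try:
    j.append(3, 2)
except RuntimeError as err:
    print(err)
j.free_oldest()        # transaction 1 has been checkpointed
print(j.append(3, 2))  # wraps around to the start of the journal
```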

Metadata Journaling
With data journaling, every block is written twice: once to the journal and once to its final location.
Besides data journaling, there is also ordered journaling (metadata journaling): user data is not written to the journal.
When should Db be written to disk?

Sequence of Operations (V4)
1/2. Data write: Write data to its final location; wait for completion (the wait is optional).
1/2. Journal metadata write: Write the begin block and metadata to the log; wait for the writes to complete.
Steps 1 and 2 may be issued concurrently, but both must complete before step 3.
3. Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction (including data) is now committed.
4. Checkpoint metadata: Write the contents of the metadata update to their final locations within the file system.
5. Free: Later, mark the transaction free in the journal superblock.
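The ordering rules of V4 can be written as a checker over a proposed write schedule. The model assumes the schedule is a list of batches: writes within a batch may complete in any order, and a barrier separates consecutive batches. The block names are hypothetical labels for the single-append example.

```python
def ordered_mode_ok(batches: list[set[str]]) -> bool:
    """Check a write schedule for one append under ordered (metadata) journaling.

    Names: "Db" is the data block at its final location, "J:..." is a copy in
    the journal, and "I[v2]"/"B[v2]" are metadata checkpoint writes to their
    final locations.
    """
    done = set()
    for batch in batches:
        if "J:Db" in batch:
            return False  # user data is not journaled in this mode
        if "TxE" in batch and not {"Db", "TxB", "J:I[v2]", "J:B[v2]"} <= done:
            return False  # commit before the data and logged metadata are durable
        if batch & {"I[v2]", "B[v2]"} and "TxE" not in done:
            return False  # checkpoint before the transaction committed
        done |= batch
    return True


v4 = [{"Db", "TxB", "J:I[v2]", "J:B[v2]"}, {"TxE"}, {"I[v2]", "B[v2]"}]
data_late = [{"TxB", "J:I[v2]", "J:B[v2]"}, {"TxE"}, {"Db"}, {"I[v2]", "B[v2]"}]
print(ordered_mode_ok(v4), ordered_mode_ok(data_late))  # True False
```

`data_late` is rejected because a crash after TxE but before Db would leave a committed inode pointing at garbage. Writing Db before the commit is what makes "the transaction (including data) is now committed" true.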

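Tricky Case: Block Reuse
Suppose a directory block is logged (it is metadata), the directory is then deleted, and its block is reused as the data block Db of a new file foobar. Since user data is not journaled, replaying the log after a crash overwrites the Db of foobar with the old directory contents.
Solutions:
Never reuse blocks until the delete of said blocks is checkpointed out of the journal.
Add a new type of record to the journal, a revoke record; recovery never replays revoked blocks.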
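A sketch of replay with revoke records, using hypothetical record tuples. Recovery first scans the log for committed revoke records, then replays only the block images that were not revoked. The rule that a revoke in transaction T covers the block in every transaction up to and including T is an assumption of this sketch.

```python
def replay(journal, disk):
    """Recovery with revoke records.

    Records, in log order: ("TxB", tid), ("BLK", tid, addr, data),
    ("REVOKE", tid, addr), ("TxE", tid). A committed revoke for addr in
    transaction T stops replay of addr from any transaction with tid <= T.
    """
    committed = {rec[1] for rec in journal if rec[0] == "TxE"}
    # Pass 1: collect revoked block addresses from committed transactions.
    revoked = {}
    for rec in journal:
        if rec[0] == "REVOKE" and rec[1] in committed:
            revoked[rec[2]] = max(rec[1], revoked.get(rec[2], rec[1]))
    # Pass 2: replay committed block images that were not revoked afterwards.
    for rec in journal:
        if rec[0] == "BLK" and rec[1] in committed:
            _, tid, addr, data = rec
            if addr in revoked and tid <= revoked[addr]:
                continue
            disk[addr] = data


journal = [
    ("TxB", 1), ("BLK", 1, 1000, "entries of dir foo"), ("TxE", 1),  # dir block logged
    ("TxB", 2), ("REVOKE", 2, 1000), ("TxE", 2),                     # foo deleted
]
disk = {1000: "user data of foobar"}  # block 1000 reused; its data is not journaled
replay(journal, disk)
print(disk[1000])
```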

Data Journaling Timeline

Metadata Journaling Timeline

Other Approaches
Soft Updates
COW: copy-on-write
BBC: backpointer-based consistency
Optimistic crash consistency

Journaling
Reduces recovery time from O(size of the disk volume) to O(size of the log).

Next: LFS