1
Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift
2
Applications require data
- Use the file system to reliably store data
- Both hardware and software can fail
Typical solution
- Large clusters for availability
- Reliability through replication
[Diagram: GFS master with replicated slave nodes]
3
Replication is infeasible for desktop environments
Wouldn't RAID work?
- Can only tolerate hardware failures
FS crashes are more severe
- Services/applications are killed
- Requires OS reboot and recovery
Need: better reliability in the event of file system failures
[Diagram: application and file system inside the OS, backed by a RAID controller and disks]
4
- Motivation
- Background
- Restartable file systems
- Advantages and limitations
- Conclusions
5
Exception paths not tested thoroughly
- Exceptions: failed I/O, bad arguments, null pointer
- On errors: call panic, BUG, BUG_ON
- After failure: data becomes inaccessible
Reason for no recovery code
- Hard to apply corrective measures
- Not straightforward to add recovery
6
ReiserFS:

```c
int journal_mark_dirty(/* ... */)
{
    struct reiserfs_journal_cnode *cn = NULL;
    /* ... */
    if (!cn) {
        cn = get_cnode(p_s_sb);
        if (!cn) {
            reiserfs_panic(p_s_sb, "get_cnode failed!\n");
        }
    }
    /* ... */
}

void reiserfs_panic(struct super_block *sb, ...)
{
    /* ... */
    BUG();
    /* this is not actually called, but makes reiserfs_panic() "noreturn" */
    panic("REISERFS: panic %s\n", error_buf);
}
```

File systems already detect failures
Recovery: simplified by a generic recovery mechanism
7
1. Code to recover from all failures
   - Not feasible in reality
2. Restart on failure
   - Previous work has taken this approach
FSes need: stateful & lightweight recovery

|           | Heavyweight                         | Lightweight            |
|-----------|-------------------------------------|------------------------|
| Stateless | Nooks/Shadow, Xen, Minix, L4, Nexus | SafeDrive, Singularity |
| Stateful  | CuriOS, EROS                        | (this work)            |
8
Goal: build a lightweight & stateful solution to tolerate file-system failures
Solution: a single generic recovery mechanism for any file system failure (sketched below)
1. Detect failures through assertions
2. Clean up resources used by the file system
3. Restore file-system state to the point before the crash
4. Continue to service new file system requests
FS failures: completely transparent to applications
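The four steps above form one recovery path. Below is a minimal user-space C sketch of that control flow; the fs_state struct and the functions detect_failure, cleanup_fs_resources, restore_checkpoint, and resume_requests are hypothetical names for illustration, not Membrane's actual kernel interface.

```c
/* Hypothetical sketch of the generic recovery path described above.
 * All names are illustrative; the real mechanism lives inside the kernel. */
#include <stdbool.h>
#include <stdio.h>

struct fs_state {
    int epoch;          /* last committed checkpoint */
    bool crashed;       /* set by an assertion/panic hook */
};

static bool detect_failure(const struct fs_state *fs) {
    /* Step 1: assertions (panic/BUG/BUG_ON) are turned into a fail-stop flag. */
    return fs->crashed;
}

static void cleanup_fs_resources(struct fs_state *fs) {
    /* Step 2: release in-kernel objects (locks, memory) held by the dead FS. */
    printf("cleaning up resources of crashed FS\n");
    (void)fs;
}

static void restore_checkpoint(struct fs_state *fs) {
    /* Step 3: roll in-memory and on-disk state back to the last epoch. */
    printf("restoring state from checkpoint %d\n", fs->epoch);
    fs->crashed = false;
}

static void resume_requests(struct fs_state *fs) {
    /* Step 4: replay logged operations and let applications continue. */
    printf("resuming service at epoch %d\n", fs->epoch);
}

int main(void) {
    struct fs_state fs = { .epoch = 7, .crashed = true };
    if (detect_failure(&fs)) {
        cleanup_fs_resources(&fs);
        restore_checkpoint(&fs);
        resume_requests(&fs);
    }
    return 0;
}
```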
9
Transparency
- Multiple applications using the FS upon a crash
- Intertwined execution
Fault-tolerance
- Handle a gamut of failures
- Transform them into fail-stop failures
Consistency
- OS and FS could be left in an inconsistent state
10
FS consistency required to prevent data loss
- Not all FSes support crash-consistency
- FS state is constantly modified by applications
Approach (sketched below):
- Periodically checkpoint FS state
- Mark dirty blocks as copy-on-write
- Ensure each checkpoint is atomically written
- On crash: revert back to the last checkpoint
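A minimal sketch of the checkpoint idea, assuming a hypothetical in-memory block table: dirty blocks of the current epoch are written to copy-on-write shadow copies so the previous checkpoint stays intact, and a checkpoint is committed (conceptually atomically) by publishing the shadows and bumping an epoch counter.

```c
/* Illustrative user-space model of epoch-based checkpointing with COW blocks.
 * The block table, epoch counter, and commit step are all hypothetical. */
#include <stdio.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   16

struct block {
    char stable[BLKSZ];   /* contents as of the last committed checkpoint */
    char shadow[BLKSZ];   /* copy-on-write buffer for the current epoch */
    int  dirty;           /* shadow holds newer data than stable */
};

static struct block table[NBLOCKS];
static int committed_epoch = 0;

static void write_block(int i, const char *data) {
    /* Writes go to a shadow copy, never overwriting the checkpointed version. */
    strncpy(table[i].shadow, data, BLKSZ - 1);
    table[i].dirty = 1;
}

static void commit_checkpoint(void) {
    /* Conceptually atomic: publish all shadows, then advance the epoch. */
    for (int i = 0; i < NBLOCKS; i++)
        if (table[i].dirty) {
            memcpy(table[i].stable, table[i].shadow, BLKSZ);
            table[i].dirty = 0;
        }
    committed_epoch++;
}

static void revert_to_checkpoint(void) {
    /* On crash: discard shadows; the stable copies are the last checkpoint. */
    for (int i = 0; i < NBLOCKS; i++)
        table[i].dirty = 0;
}

int main(void) {
    write_block(0, "epoch1-data");
    commit_checkpoint();                 /* checkpoint 1 is now durable */
    write_block(0, "epoch2-data");
    revert_to_checkpoint();              /* simulated crash before commit */
    printf("epoch %d, block 0 = %s\n", committed_epoch, table[0].stable);
    return 0;
}
```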
11
[Timeline: applications issue open("file"), write(), read(), write(), close() through the VFS to the file system across Epoch 0 and Epoch 1, with a checkpoint between the epochs and a crash during Epoch 1; completed and in-progress operations are marked.]
1. Periodically create checkpoints
2. File system crash
3. Unwind in-flight processes
4. Move to the most recent checkpoint
5. Replay completed operations
6. Re-execute unwound processes
12
File systems are constantly modified
- Hard to identify a consistent recovery point
Naïve solution: prevent any new FS operation and call sync
- Inefficient and unacceptable overhead
13
[Diagram: application requests pass through the VFS layer to the file systems (ext3, VFAT), which write to disk through the page cache]
- File systems write to disk through the page cache
- All requests go through the VFS layer
- Control requests to the FS and dirty pages to disk
14
[Diagram, three phases: Regular operation, At Checkpoint, After Checkpoint. At a checkpoint, Membrane stops requests from entering the file system and holds dirty pages in the page cache; once the checkpoint completes, requests and disk writes resume.]
15
File systems with a built-in crash-consistency mechanism
- Journaling or snapshotting
Seamlessly integrate with these mechanisms
- Need FSes to indicate the beginning and end of a transaction (sketched below)
- Works for data and ordered journaling modes
- Need to combine writeback mode with COW
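A sketch of how an FS might signal transaction boundaries to the checkpoint layer; the callbacks begin_epoch_tx/end_epoch_tx and the checkpoint_allowed test are made-up names used only to illustrate the "indicate beginning and end of a transaction" requirement.

```c
/* Hypothetical illustration: the FS brackets its journal transactions so the
 * checkpoint layer knows where a journal-consistent cut can be taken. */
#include <stdio.h>

static int open_transactions = 0;

static void begin_epoch_tx(void) {
    open_transactions++;           /* FS tells the checkpoint layer: tx started */
}

static void end_epoch_tx(void) {
    open_transactions--;           /* FS tells the checkpoint layer: tx complete */
}

static int checkpoint_allowed(void) {
    /* A checkpoint is only taken between transactions, so the on-disk image
     * corresponds to a consistent journal state. */
    return open_transactions == 0;
}

int main(void) {
    begin_epoch_tx();
    printf("during tx: checkpoint allowed? %d\n", checkpoint_allowed());
    end_epoch_tx();
    printf("after tx:  checkpoint allowed? %d\n", checkpoint_allowed());
    return 0;
}
```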
16
Log operations at the VFS level
- Need not modify existing file systems
- Operations: open, close, read, write, symlink, unlink, seek, etc.
Logs are thrown away after each checkpoint
What about logging writes? (log-record sketch below; see also the next slide)
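A minimal sketch of VFS-level operation logging; struct op_record and vfs_log_op are hypothetical names, and real records would also capture return values so completed operations can be replayed deterministically.

```c
/* Illustrative op log kept above the file system; discarded at each checkpoint. */
#include <stdio.h>
#include <string.h>

enum op_type { OP_OPEN, OP_READ, OP_WRITE, OP_CLOSE };

struct op_record {
    enum op_type type;
    char path[64];
    long offset;
    long count;
};

#define LOG_CAP 128
static struct op_record op_log[LOG_CAP];
static int log_len = 0;

static void vfs_log_op(enum op_type t, const char *path, long off, long cnt) {
    if (log_len < LOG_CAP) {
        op_log[log_len].type = t;
        strncpy(op_log[log_len].path, path, sizeof op_log[log_len].path - 1);
        op_log[log_len].offset = off;
        op_log[log_len].count = cnt;
        log_len++;
    }
}

static void discard_log_at_checkpoint(void) {
    log_len = 0;    /* once a checkpoint commits, old records are not needed */
}

int main(void) {
    vfs_log_op(OP_OPEN, "/tmp/file", 0, 0);
    vfs_log_op(OP_WRITE, "/tmp/file", 0, 4096);
    printf("records before checkpoint: %d\n", log_len);
    discard_log_at_checkpoint();
    printf("records after checkpoint:  %d\n", log_len);
    return 0;
}
```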
17
Page stealing: mainly used for replaying writes
- Goal: reduce the overhead of logging writes
- Solution: grab the data from the page cache during recovery (toy sketch below)
[Diagram: a write(fd, buf, offset, count) before the crash leaves its data in the page cache; during recovery the data is pulled back from the page cache; after recovery the file system and page cache are consistent again]
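The write record itself need not carry the data: at replay time the bytes are "stolen" from the page cache, which still holds the dirty page. A toy sketch, with a fake single-page cache standing in for the kernel's page cache.

```c
/* Toy model of page stealing: the log stores only (offset, count); the data
 * is recovered from a (simulated) page cache page at replay time. */
#include <stdio.h>

#define PAGE_SIZE 64

struct cached_page {
    long offset;
    char data[PAGE_SIZE];
};

/* Stand-in for the real page cache, which survives the FS crash. */
static struct cached_page cache = { .offset = 0, .data = "hello, membrane" };

struct write_record {            /* what the op log actually keeps */
    long offset;
    long count;
};

static void replay_write(const struct write_record *rec) {
    if (rec->offset == cache.offset) {
        /* Steal the payload from the page cache instead of the log. */
        printf("replaying write of %ld bytes: \"%.*s\"\n",
               rec->count, (int)rec->count, cache.data);
    }
}

int main(void) {
    struct write_record rec = { .offset = 0, .count = 15 };
    replay_write(&rec);          /* prints the data recovered from the cache */
    return 0;
}
```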
18
19
20
Setup
21
22
23
Restart ext2 during a random-read microbenchmark
24
| Data (MB) | Recovery Time (ms) |
|-----------|--------------------|
| 10        | 12.9               |
| 20        | 13.2               |
| 40        | 16.1               |
25
Improves tolerance to file system failures
- Builds trust in new file systems (e.g., ext4, btrfs)
Quick-fix bug patching
- Developers transform corruptions into restarts
- Restart instead of extensive code restructuring
Encourages more integrity checks in FS code
- Assertions could be seamlessly transformed into restarts (illustrated below)
- File systems become more robust to failures/crashes
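As an illustration of turning existing integrity checks into restart points, the macro below is a hypothetical stand-in for panic/BUG_ON: instead of halting the machine, a failed check hands control to a recovery hook. FS_ASSERT and request_fs_restart are made-up names, not part of any real kernel API.

```c
/* Hypothetical example: an assertion macro that triggers FS restart rather
 * than a kernel panic. request_fs_restart() is an illustrative recovery hook. */
#include <stdio.h>
#include <stdlib.h>

static void request_fs_restart(const char *why) {
    /* In Membrane-style recovery this would unwind, restore the checkpoint,
     * and replay; here it just reports and ends the demo. */
    printf("restartable assert failed: %s -- restarting FS\n", why);
    exit(0);
}

#define FS_ASSERT(cond)                          \
    do {                                         \
        if (!(cond))                             \
            request_fs_restart(#cond);           \
    } while (0)

int main(void) {
    void *cn = NULL;                 /* mimics the ReiserFS get_cnode() failure */
    FS_ASSERT(cn != NULL);           /* triggers a restart instead of panic() */
    printf("not reached\n");
    return 0;
}
```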
26
Only tolerates fail-stop failures
Not address-space based
- Faults could corrupt other kernel components
FS restart may be visible to applications
- e.g., inode numbers could change after a restart
[Diagram: before the crash, create("file1"), write("file1", 4k), and stat("file1") see file1 with one inode number; after crash recovery the re-executed create assigns a different inode number (12 vs. 15 in the example), so a later stat observes an inode# mismatch]
27
Failures are inevitable in file systems
- Learn to cope with them, not hope to avoid them
Generic recovery mechanism for FS failures
- Improves FS reliability and availability of data
Users: install new FSes with confidence
Developers: ship FSes faster, as not all exception cases are now show-stoppers
28
Questions and Comments
Advanced Systems Lab (ADSL)
University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl