Tolerating File-System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau.


Tolerating File-System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison

File Systems in Today’s World
Modern file systems are complex
– Tens of thousands of lines of code (e.g., XFS: 45K LOC)
Storage stack is also getting deeper
– Hypervisor, network, logical volume manager
Need to handle a gamut of failures
– Memory allocation, disk faults, bit flips, system crashes
Preserve integrity of its meta-data and user data
5/13/2015 Tolerating File-System Mistakes with EnvyFS

File System Bugs
Bug reports for Linux 2.6 series from Bugzilla
– ext3: 64, JFS: 17, ReiserFS: 38
– Some are FS corruption causing permanent data loss
FS bugs broadly classified into two categories
– “fail-stop”: System immediately crashes
  Solutions: Nooks [Swift04], CuriOS [David08]
– “fail-silent”: Accidentally corrupt on-disk state
  Many such bugs uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b]

Bugs are inevitable in file systems
Challenge: how to cope with them?

N-Version File Systems
Based on N-version programming [Avizienis77]
– NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06]
EnvyFS: Simple software layer
– Store data in N child file systems
– Operations performed on all children
Rely on a simple software layer
Challenge: reducing overheads while retaining reliability
– SubSIST: Novel Single Instance Store
[Figure: stack diagram showing the Application, the EnvyFS layer, Child 1, Child 2, …, Child N, the SIS layer, the disk driver, and the disk]

Results
Robustness
– Traditional file systems handle few corruptions (< 4%)
– EnvyFS 3 tolerates 98.9% of single file system mistakes
Performance
– Desktop workloads: EnvyFS 3 has comparable performance
– I/O intensive workloads:
  Normal mode: EnvyFS 3 + SubSIST has acceptable performance
  Under memory pressure: EnvyFS 3 + SubSIST has large overheads
Potential as a debugging tool for FS developers
– Pinpoint the source of a “fail-silent” bug in ext3

Outline
– Introduction
– Building reliable file systems
– Reducing overheads with SubSIST
– Evaluation
– Conclusion

N-Version Systems
Development process:
1. Producing the specification of software
2. Implementing N versions of the software
3. Creating N-version layer
– Executes different versions
– Determines the consensus result
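The role of the N-version layer (execute every version, then determine the consensus result) can be sketched as a strict-majority vote. This is a minimal illustration, not the EnvyFS code: `run_n_versions` and `NoMajorityError` are hypothetical names, and treating a raised exception as just another outcome to vote on is an assumption made here.

```python
from collections import Counter


class NoMajorityError(Exception):
    """Raised when no outcome is shared by a strict majority of versions."""


def run_n_versions(versions):
    """Run every version and return the outcome a strict majority agrees on.

    `versions` is a sequence of zero-argument callables returning hashable
    results. An exception is recorded as an ("error", name) outcome so that
    a failing version simply loses the vote (an assumption of this sketch).
    """
    if not versions:
        raise ValueError("at least one version is required")
    outcomes = []
    for version in versions:
        try:
            outcomes.append(("ok", version()))
        except Exception as exc:
            outcomes.append(("error", type(exc).__name__))
    winner, votes = Counter(outcomes).most_common(1)[0]
    if 2 * votes <= len(outcomes):
        raise NoMajorityError(outcomes)
    return winner
```

With three versions, one wrong or failing version is outvoted by the other two; if all three disagree, the layer has no consensus and must report an error.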

1. Producing Specification
Our own specification?
– Impractical: Requires wide scale changes to file systems
– Specifications take years to get accepted
Can we leverage existing specification?
– Yes, can leverage VFS, but there are some issues
VFS not precise for N-versioning purpose
– Needs to handle cases where specification is not precise
– e.g., Ordering directory entries, inode number allocation

Imprecise VFS Specification
Ordering directory entries
Issue:
– No specified return order
– Can’t blindly compare entries
Solution:
– Read all entries from a directory (dir: test in our case) from all FSes
– Match entries from FSes
– Return majority results
[Figure: Readdir on dir "test" through the EnvyFS layer; FS X, FS Y, and FS Z each hold File 1, File 2, and File 3 but return them in a different order]
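The readdir solution (read every entry from every child, match them, return the majority) can be sketched as an order-insensitive vote. This is only an illustration: `merge_readdir` is a hypothetical name, and matching entries by name alone is a simplifying assumption.

```python
from collections import Counter


def merge_readdir(listings):
    """Return the entry names reported by a strict majority of children.

    `listings` holds one list of names per child file system, each in
    whatever order that child happened to return them.
    """
    needed = len(listings) // 2 + 1
    votes = Counter()
    for names in listings:
        votes.update(set(names))  # one vote per child per name
    return sorted(name for name, count in votes.items() if count >= needed)
```

Because the vote is taken per name over complete listings, the differing return orders of the children no longer matter, and an entry that only one child reports (or omits) is outvoted.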

Imprecise VFS Specification (cont)
Inode number allocation
– Inode numbers returned through system calls
– Each child file system issues different inode numbers
– Possible solution: Force file systems to use same algorithm?
– Our solution: Issue inode numbers at EnvyFS layer
Inode Mapping Table not persistently stored
[Figure: Stat on File 1 in dir "test"; the three child file systems (FS X, FS Y, FS Z) hold File 1 under inode numbers 10, 36, and 65, and the EnvyFS inode mapping table maps one virtual inode number to the per-child numbers]
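Issuing inode numbers at the EnvyFS layer amounts to keeping an in-memory table from one virtual number to the per-child numbers. The sketch below is an assumption-laden illustration: `InodeMappingTable` and its methods are hypothetical names, and the numbers 10, 36, and 65 are the per-child inode numbers of File 1 shown on the slide.

```python
import itertools


class InodeMappingTable:
    """In-memory map between virtual inode numbers and per-child numbers."""

    def __init__(self, num_children, first_virtual=1):
        self._num_children = num_children
        self._next_virtual = itertools.count(first_virtual)
        self._virtual_to_child = {}
        self._child_to_virtual = [dict() for _ in range(num_children)]

    def register(self, child_inodes):
        """Return the virtual number for a file, allocating one if needed."""
        child_inodes = tuple(child_inodes)
        if len(child_inodes) != self._num_children:
            raise ValueError("need one inode number per child")
        known = self._child_to_virtual[0].get(child_inodes[0])
        if known is not None:
            return known
        virtual = next(self._next_virtual)
        self._virtual_to_child[virtual] = child_inodes
        for index, inode in enumerate(child_inodes):
            self._child_to_virtual[index][inode] = virtual
        return virtual

    def child_inodes(self, virtual):
        """Per-child inode numbers behind one virtual inode number."""
        return self._virtual_to_child[virtual]

    def virtual_for(self, child_index, inode):
        """Virtual number that hides a given child's inode number."""
        return self._child_to_virtual[child_index][inode]
```

Since the table is not stored persistently, a design like this has to rebuild entries as files are looked up after each mount.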

2. Implementing N versions of FS
Painful process
– High cost of development, long time delays
Lucky! Hard work already done for us
– 30 different disk based file systems in Linux 2.6
Which file systems to use?
– ext3, JFS, ReiserFS in a three-version FS
– Others should work without modifications

3. Creating N-Version Layer
N-Version layer (EnvyFS)
– Inserted beneath VFS
– Simple design to avoid bugs
Example: Reading a file
– Allocate N data buffers
– Read data block from the disk
– Compare: data, return code, file position
– Return: data, return code
Issues:
– Allocate memory for each read operation
– Extra copy from allocated buffer to application
– Comparison overheads
[Figure: the Application issues Read (file, 1 block) through the VFS layer to the EnvyFS layer (comparators, wrappers, inode mapping table), which forwards the read to ext3, JFS, and ReiserFS above the disk and returns err, D]

Reading a File in EnvyFS
Solution:
– Same application buffer for all FS
– TCP-like checksums for data comparison
– Compare: checksums, return code, file position
– Read data until majority
[Figure: the same read path as before, with each child (ext3, JFS, ReiserFS) reading into the application buffer in turn and the EnvyFS layer keeping one checksum per child file system]
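The optimized read path can be sketched as follows: every child reads into the same buffer, only a checksum of each result is kept, and reading stops once a majority agrees. This is a hedged illustration, not the EnvyFS code. The 16-bit ones' complement sum in the style of RFC 1071 stands in for the slide's "TCP-like" checksum, and `read_until_majority` and `make_child` are hypothetical names.

```python
from collections import Counter


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def read_until_majority(children, buffer: bytearray):
    """Let children read into one shared buffer until a majority agrees.

    Each child is a callable that fills `buffer` and returns
    (return_code, file_position). The read that completes the majority is
    the last one to have written the buffer, so the buffer then holds the
    agreed data without any extra copy.
    """
    needed = len(children) // 2 + 1
    votes = Counter()
    for read_into in children:
        return_code, position = read_into(buffer)
        key = (internet_checksum(bytes(buffer)), return_code, position)
        votes[key] += 1
        if votes[key] >= needed:
            return return_code, position
    raise OSError("no majority among child file systems")


def make_child(data: bytes, position: int):
    """Hypothetical stand-in for a child file system's read routine."""
    def read_into(buffer: bytearray):
        buffer[:] = data  # overwrite the shared buffer with this child's data
        return len(data), position
    return read_into
```

In the common case where the first two of three children agree, the third child is never read, which is what "read data until majority" buys.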

Outline
– Introduction
– Building reliable file systems
– Reducing overheads with SubSIST
– Evaluation
– Conclusion

Case for Single Instance Storage (SIS)
Ideal: One disk per FS
Practical: One disk for all FS
Overheads
– Effective storage space: 1/N
– N times more I/O (read/write)
Challenge: Maintain diversity while minimizing overheads
[Figure: Application, VFS layer, EnvyFS layer, and FS 1, FS 2, …, FS N above a disk request queue; the ideal setup uses Disk 1, Disk 2, …, Disk N, the practical setup uses one disk split into Part 1, Part 2, …, Part N]

SubSIST: Single Instance Store
Variant of a Single Instance Store
– Selectively merges data blocks
Block addressable SIS
– Exports virtual disks to FSes
– Manages mapping, free space info.
– Not persistently stored on disk
EnvyFS writes through N file systems
– N data blocks merged to 1 data block
– Content hashes not stored persistently
– Meta-data blocks not merged
– Blocks are merged across file systems (inter FS), not within one file system (intra FS)
[Figure: Application, VFS layer, EnvyFS layer, and FS 1, FS 2, …, FS N writing to Vdisk 1, Vdisk 2, …, Vdisk N; SubSIST (read cache, CHash layer, free space management) stores the N copies of a data block D once on the disk and keeps each meta-data block M separate]
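The merging rules above can be sketched as a small in-memory block store: data blocks with equal content hashes share one physical block across virtual disks, meta-data blocks are never merged, and a virtual disk never merges with itself. This is an illustrative sketch only. `SingleInstanceStore` and its methods are hypothetical names, SHA-1 is an assumed choice of content hash, and the explicit `is_metadata` flag is a simplification of however the real layer tells block types apart.

```python
import hashlib


class SingleInstanceStore:
    """Toy block-addressable single instance store (all state in memory)."""

    def __init__(self):
        self._blocks = {}    # physical id -> block contents
        self._owners = {}    # physical id -> set of vdisks sharing it
        self._by_hash = {}   # content hash -> physical id (data blocks only)
        self._mapping = {}   # (vdisk, virtual block) -> physical id
        self._next_id = 0

    def write(self, vdisk, vblock, data, is_metadata=False):
        self._release(vdisk, vblock)
        digest = None if is_metadata else hashlib.sha1(data).digest()
        phys = self._by_hash.get(digest) if digest is not None else None
        # Allocate a new block for meta-data, for unseen content, or when
        # this vdisk already shares the candidate (no intra-FS merging).
        if phys is None or vdisk in self._owners[phys]:
            phys = self._next_id
            self._next_id += 1
            self._blocks[phys] = bytes(data)
            self._owners[phys] = set()
            if digest is not None:
                self._by_hash.setdefault(digest, phys)
        self._owners[phys].add(vdisk)
        self._mapping[(vdisk, vblock)] = phys

    def read(self, vdisk, vblock):
        return self._blocks[self._mapping[(vdisk, vblock)]]

    def physical_blocks(self):
        return len(self._blocks)

    def _release(self, vdisk, vblock):
        """Drop the old mapping on overwrite and free unshared blocks."""
        phys = self._mapping.pop((vdisk, vblock), None)
        if phys is None:
            return
        self._owners[phys].discard(vdisk)
        if not self._owners[phys]:
            data = self._blocks.pop(phys)
            del self._owners[phys]
            digest = hashlib.sha1(data).digest()
            if self._by_hash.get(digest) == phys:
                del self._by_hash[digest]
```

With N = 3, one write of a data block through all three children consumes one physical block instead of three, while each child's meta-data keeps its own block, which preserves the diversity of on-disk structures.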

Handling Data Block Corruptions?
Corruption to data in a single FS
– Due to bugs, bit flips, storage stack
– Corrupt data blocks not merged
– All other N-1 data blocks merged
– Corrupt data block fixed at next read
× Corruption to data block inside disk
Single copy of data
– Different code paths
– Different on-disk structures
[Figure: Application, VFS layer, EnvyFS layer, and FS 1, FS 2, …, FS N writing data block D to Vdisk 1, Vdisk 2, …, Vdisk N through SubSIST (read cache, CHash layer, free space management) onto one disk]
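The single-FS corruption case can be traced with a short sketch: a corrupted copy hashes differently, so it is not merged with the other N-1 copies, and the next read outvotes it and repairs it. This is an illustration under stated assumptions. `physical_copies` and `read_and_repair` are hypothetical names, SHA-1 is an assumed content hash, and repairing by overwriting the disagreeing copy with the majority content is an assumption about how "fixed at next read" works.

```python
import hashlib
from collections import Counter


def _digest(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


def physical_copies(copies):
    """Number of distinct blocks a content-hash store would keep."""
    return len({_digest(block) for block in copies})


def read_and_repair(copies):
    """Vote across the children's copies and overwrite the losers in place.

    `copies` is a mutable list with one block per child file system.
    Returns the majority content and the indexes of the repaired children.
    """
    votes = Counter(_digest(block) for block in copies)
    winner, count = votes.most_common(1)[0]
    if 2 * count <= len(copies):
        raise OSError("no majority among child file systems")
    good = next(block for block in copies if _digest(block) == winner)
    repaired = [i for i, block in enumerate(copies) if _digest(block) != winner]
    for i in repaired:
        copies[i] = good  # rewritten copy now merges with the others again
    return good, repaired
```

A corruption that hits the single merged block inside the disk is different: all children then read the same bad content, so no vote can expose it, which is the case the slide marks with ×.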

Outline
– Introduction
– Building reliable file systems
– Reducing overheads with SubSIST
– Evaluation
  – Reliability
  – Performance
– Conclusion

Reliability Evaluation: Fault Injection
Robustness of EnvyFS in recovering from a child file system’s mistake?
Corruption: bugs in FS / storage stack
Types of disk blocks
– superblock, inode, block bitmap, file data, …
Perform different file ops
– mount, stat, creat, unlink, read, …
Report user visible results
All results are applicable with SubSIST except corruption to data blocks
Type-aware fault injection [Prabhakaran05]
[Figure: VFS, EnvyFS layer, and ext3, JFS, and ReiserFS above a pseudo device driver that corrupts block B on its way between the block driver and the disk]

Result Matrix: ext3
[Figure: matrix of user-visible outcomes.
Columns (operations): path traversal, SET-1 (stat, …), SET-2 (chmod), read, readlink, getdirentries, creat, link, mkdir, rename, symlink, write, truncate, rmdir, unlink, mount, SET-3 (fsync), umount.
Rows (corrupted block types): INODE, DIR, BMAP, IMAP, INDIRECT, DATA, SUPER, JSUPER, GDESC.
Legend: Normal, Data loss, N/A, Cannot mount, Ops fail, Data corrupt, Crash, Read-only, Depends.]

[Figure: the ext3 result matrix (operations versus corrupted block types), with the legend entries Data loss, N/A, and Cannot mount shown]
ext3 stores many superblock copies, but does not handle superblock corruption

[Figure: the ext3 result matrix (operations versus corrupted block types), with the legend entries Data loss, N/A, Cannot mount, Ops fail, and Crash shown]
In addition to operations failing, inode corruption leads to data loss
Unlink: system crash during unmount


Result Matrix: EnvyFS 3 (corruption injected into ext3)
[Figure: the same operations and ext3 block types as the ext3 result matrix, for EnvyFS 3 running over ext3, ReiserFS, and JFS. Legend: Normal, N/A. Annotation: Kernel panic in ext3.]
EnvyFS 3 works in every scenario

Potential for Bug Isolation
ext3, over time:
– Unlink on corrupt inode: ext3_lookup (bug), then ext3_unlink
– Unmount (panic)
EnvyFS 3, over time:
– Unlink on corrupt inode: ext3_lookup (bug)
– ext3 inode does not match others
– Further ops not issued
In typical use, a problem is noticed only on panic
In EnvyFS 3, a problem is noticed the first time a child file system returns wrong results

Result Matrix: JFS
[Figure: matrix of user-visible outcomes.
Columns (operations): path traversal, SET-1, SET-2, read, readlink, getdirentries, creat, link, mkdir, rename, symlink, write, truncate, rmdir, unlink, mount, SET-3, umount.
Rows (corrupted block types): INODE, DIR, BMAP, IMAP, INTERNAL, DATA, SUPER, JSUPER, JDATA, AGGR-INODE, IMAPDESC, IMAPCNTL.
Legend: Normal, Data loss, N/A, Cannot mount, Ops fail, Data corrupt, Crash, Read-only, Depends.]

Result Matrix: EnvyFS 3 (corruption injected into JFS)
[Figure: the same operations and JFS block types as the JFS result matrix, for EnvyFS 3 running over ext3, ReiserFS, and JFS. Legend: Normal, N/A, Crash. Annotation: Kernel panic in EnvyFS 3.]

Performance Evaluation: OpenSSH Benchmark
Experimental setup
– AMD Opteron 2.2 GHz Processor
– 2 GB RAM
– 80 GB Hitachi Deskstar 7200-rpm SATA disk
– Linux
– 4 GB disk partition for each file system
OpenSSH Benchmark
– CPU intensive
– Copy, untar and make
[Figure: bar chart of elapsed time (in seconds) for each file system]
Performance of EnvyFS 3 is comparable to a single file system (3% overhead)

Postmark Benchmark
I/O Intensive
– Mimics busy mail server workload
– Transaction: creates, deletes, reads, appends, …
Postmark Configuration
– 2500 files
– File size: 4 KB to 40 KB
– No. of transactions: 10K and 100K
[Figure: bar chart of elapsed time (in seconds), with three groups of bars annotated as follows]
– EnvyFS 3: 3.3x; + SubSIST: -32%
– EnvyFS 3: 8x; + SubSIST: 4x
– EnvyFS 3: 1.7x; + SubSIST: 11.5%

Summary of Results
Robustness
– Traditional file systems vulnerable to corruptions
– EnvyFS 3 tolerates almost all mistakes in one FS
Performance
– Desktop workloads: EnvyFS 3 has comparable performance
– I/O intensive workloads:
  Regular operations: EnvyFS 3 + SubSIST has acceptable performance
  Memory pressure: EnvyFS 3 + SubSIST has large overhead

Outline
– Introduction
– Building reliable file systems
– Reducing overheads with SubSIST
– Evaluation
– Conclusion

Conclusion
Bugs/mistakes are inevitable in any software
– Must cope, not just hope to avoid
EnvyFS: N-version approach to tolerating FS bugs
– Built using existing specification and file systems
SubSIST: single instance store
– Decreases overheads while retaining reliability

Thank You!
Advanced Systems Lab (ADSL)
University of Wisconsin-Madison