Download presentation
Presentation is loading. Please wait.
Published byCecil Parrish Modified over 9 years ago
1
Tolerating File-System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison
2
File Systems in Today’s World Modern file systems are complex – Tens of thousands of lines of code (e.g., XFS 45K LOC) Storage stack is also getting deeper – Hypervisor, network, logical volume manager Need to handle a gamut of failures – Memory allocation, disk faults, bit flips, system crashes Preserve integrity of its meta-data and user data 5/13/20152Tolerating File-System Mistakes with EnvyFS
3
File System Bugs Bug reports for Linux 2.6 series from Bugzilla – ext3: 64, JFS: 17, ReiserFS: 38 – Some are FS corruption causing permanent data loss FS bugs broadly classified into two categories – “fail-stop”: System immediately crashes Solutions: Nooks [ Swift 04 ], CuriOS [David08] – “fail-silent”: Accidentally corrupt on-disk state Many such bugs uncovered [ Prabhakaran05, Gunawi08, Yang04, Yang06b ] 5/13/20153Tolerating File-System Mistakes with EnvyFS
4
5/13/20154Tolerating File-System Mistakes with EnvyFS Bugs are inevitable in file systems Challenge: how to cope with them?
5
Based on N-version programming [ Avizienis77 ] – NFS servers [ Rodrigues01 ], databases [ Vandiver07 ], security [ Cox06 ] N-Version File Systems EnvyFS: Simple software layer –Store data in N child file systems –Operations performed on all children Rely on a simple software layer Challenge: reducing overheads while retaining reliability –SubSIST: Novel Single Instance Store 5/13/2015Tolerating File-System Mistakes with EnvyFS5 EnvyFS layer Child 1 Child 2 Child N Disk driver Disk … SIS layer Application
6
Results Robustness – Traditional file systems handle few corruptions (< 4%) – EnvyFS 3 tolerates 98.9% of single file system mistakes Performance – Desktop workloads: EnvyFS 3 has comparable performance – I/O intensive workloads: Normal mode: EnvyFS 3 + SubSIST acceptable performance Under memory pressure: EnvyFS 3 + SubSIST large overheads Potential as a debugging tool for FS developers – Pinpoint the source of “fail-silent” bug in ext3 5/13/20156Tolerating File-System Mistakes with EnvyFS
7
Outline Introduction Building reliable file systems Reducing overheads with SubSIST Evaluation Conclusion 5/13/20157Tolerating File-System Mistakes with EnvyFS
8
N-Version Systems Development process: 1.Producing the specification of software 2.Implementing N versions of the software 3.Creating N-version layer —Executes different versions —Determines the consensus result 5/13/20158Tolerating File-System Mistakes with EnvyFS
9
1. Producing Specification Our own specification ? – Impractical: Requires wide scale changes to file systems – Specifications take years to get accepted Can we leverage existing specification ? – Yes, can leverage VFS, but there are some issues VFS not precise for N-versioning purpose – Needs to handle cases where specification is not precise – e.g., Ordering directory entries, inode number allocation 5/13/2015Tolerating File-System Mistakes with EnvyFS9
10
Imprecise VFS Specification Ordering directory entries Issue: – No specified return order – Can’t blindly compare entries Solution: – Read all entries from a directory (dir: test in our case) from all FSes – Match entries from FSes – Return majority results 5/13/201510Tolerating File-System Mistakes with EnvyFS FS X FS Y FS Z EnvyFS layer … File 1 File 2 File 3 Dir: test File 2 File 3 File 1 Dir: test File 1File 2File 3 Readdir: test No Entries File 3 File 1 File 2 File 1 File 2 File 3 File 1 File 2 File 3 Dir: test
11
Virt #FS 1FS 3FS 2 ?? File 1 | 36 Imprecise VFS Specification (cont) Inode number allocation – Inode numbers returned through system calls – Each child file system issues different inode numbers – Possible solution: Force file systems to use same algorithm? – Our solution: Issue inode numbers at EnvyFS layer 5/13/201511Tolerating File-System Mistakes with EnvyFS FS X FS Y FS Z EnvyFS layer Dir: test File 1 | 10File 1 |65 File 1 10 File 2 15 File 3 16 File 2 04 File 3 44 File 1 36 File 1 | 15103665 Inode Mapping Table 15Stat: File 1 File 3 99 File 1 65 File 2 43 Inode Numbers Inode Mapping Table not persistently stored
12
2. Implementing N versions of FS Painful process – High cost of development, long time delays Lucky! Hard work already done for us – 30 different disk based file systems in Linux 2.6 Which file systems to use? – ext3, JFS, ReiserFS in a three-version FS – Others should work without modifications 5/13/201512Tolerating File-System Mistakes with EnvyFS
13
3. Creating N-Version Layer 5/13/201513 N-Version layer (EnvyFS) – Inserted beneath VFS – Simple design to avoid bugs Example: Reading a file – Allocate N data buffers – Read data block from the disk – Compare: data, return code, file position – Return: data, return code Issues: – Allocate memory for each read operation – Extra copy from allocated buffer to application – Comparison overheads ComparatorsWrappers Inode Mapping Table Application VFS layer … ext3JFS ReiserFS EnvyFS Layer Read (file, 1 block) Read (…) FFF pos: x D D D D D D err = Disk D err, D Tolerating File-System Mistakes with EnvyFS
14
Reading a File in EnvyFS Solution: – Same application buffer for all FS – TCP-like checksums for data comparison – Compare: checksums, return code, file position – Read data until majority 5/13/201514Tolerating File-System Mistakes with EnvyFS ComparatorsWrappers Inode Mapping Table Application VFS layer … ext3JFS ReiserFS EnvyFS Layer Read (file, 1 block) Read (…) FFF pos: x D D D D D err = FS 1 #FS 2 #FS N #… 435 … 436 Checksums Disk D err, D Read (…) D pos: x
15
Outline Introduction Building reliable file systems Reducing overheads with SubSIST Evaluation Conclusion 5/13/201515Tolerating File-System Mistakes with EnvyFS
16
Part 1Part 2Part N … … Disk 1Disk 2Disk N Disk Case for Single Instance Storage (SIS) Ideal: One disk per FS Practical: One disk for all FS Overheads – Effective storage space: 1/N – N times more I/O (Read/write) Challenge: Maintain diversity while minimizing overheads 5/13/201516Tolerating File-System Mistakes with EnvyFS EnvyFS layer … FS 1 FS 2 FS N Application VFS layer Disk Req. Queue 1 1 2 N 12N
17
SubSIST: Single Instance Store Variant of an Single Instance Store – Selectively merges data blocks Block addressable SIS – Exports virtual disks to FSes – Manages mapping, free space info. – Not persistently stored on disk EnvyFS writes through N file systems – N data blocks merged to 1 data block – Content hashes not stored persistently – Meta-data blocks not merged – Inter FS blocks and not intra FS 5/13/201517Tolerating File-System Mistakes with EnvyFS EnvyFS layer … FS 1 FS 2 FS N Application VFS layer Vdisk 1 Disk Vdisk 2Vdisk N Read CacheCHash Layer Free Space Management SubSIST DDMMMD DDD D
18
FS 1 D Disk Handling Data Block Corruptions? Corruption to data in a single FS – Due to bugs, bit flips, storage stack – Corrupt data blocks not merged – All other N-1 data blocks merged – Corrupt data block fixed at next read ×Corruption to data block inside disk Single copy of data – Different code paths – Different on-disk structures 5/13/201518Tolerating File-System Mistakes with EnvyFS EnvyFS layer … FS 2 FS N Application VFS layer Vdisk 1Vdisk 2Vdisk N Read CacheCHash Layer Free Space Management SubSIST DD D D DD D DDD D
19
Outline Introduction Building reliable file systems Reducing overheads with SubSIST Evaluation – Reliability – Performance Conclusion 5/13/201519Tolerating File-System Mistakes with EnvyFS
20
Robustness of EnvyFS in recovering from a child file system’s mistake? Disk BB B EnvyFS layer Block Driver B Reliability Evaluation: Fault Injection Corruption: bugs in FS / storage stack Types of disk blocks – superblock, inode, block bitmap, file data, … Perform different file ops – mount, stat, creat, unlink, read, … Report user visible results All results are applicable with SubSIST except corruption to data blocks 5/13/2015Tolerating File-System Mistakes with EnvyFS20 ext 3 JFS ReiserFS Pseudo Device Driver VFS B B B B Type-aware fault injection [P rabhakaran05 ]
21
ext3 path traversal SET-1 ( stat, … ) SET-2 ( chmod ) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 ( fsync ) umount INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Result Matrix Normal Data loss N/A Cannot mount Ops fail Data corrupt Crash Read-only e Depends E 5/13/201521Tolerating File-System Mistakes with EnvyFS
22
ext3 path traversal SET-1 ( stat, … ) SET-2 ( chmod ) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 ( fsync ) umount INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Data loss N/A Cannot mount Ext3 stores many superblock copies; but, does not handle superblock corruption 5/13/2015Tolerating File-System Mistakes with EnvyFS22 E
23
ext3 path traversal SET-1 ( stat, … ) SET-2 ( chmod ) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 ( fsync ) umount INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Data loss N/A Cannot mount Ops fail Crash In addition to operations failing, inode corruption leads to data loss Unlink: system crash during unmount 5/13/2015Tolerating File-System Mistakes with EnvyFS23 E
24
ext3 path traversal SET-1 ( stat, … ) SET-2 ( chmod ) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 ( fsync ) umount INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Normal Data loss N/A Cannot mount Ops fail Data corrupt Crash Read-only e Depends E 5/13/201524Tolerating File-System Mistakes with EnvyFS
25
Kernel panic in ext3 path traversal SET-1 ( stat, … ) SET-2 ( chmod ) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 ( fsync ) umount Normal N/A INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC EnvyFS 3 works in every scenario 5/13/2015Tolerating File-System Mistakes with EnvyFS25 EnvyFS 3 E R J EnvyFS
26
Potential for Bug Isolation ext3EnvyFS 3 Time Unlink on corrupt inode: - ext3_lookup (bug) - ext3_unlink Unmount (panic) Time Unlink on corrupt inode: - ext3_lookup (bug) -ext3 inode does not match others -Further ops not issued In typical use, a problem is noticed only on panic In EnvyFS 3, a problem is noticed the first time child file system returns wrong results 5/13/2015Tolerating File-System Mistakes with EnvyFS26
27
JFS path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Normal Data loss N/A Cannot mount Ops fail Data corrupt Crash Read-only a Depends J 5/13/201527Tolerating File-System Mistakes with EnvyFS
28
EnvyFS 3 path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Normal N/A Crash Kernel panic in EnvyFS 3 E R J EnvyFS 5/13/201528Tolerating File-System Mistakes with EnvyFS
29
Experimental setup – AMD Opteron 2.2 GHz Processor – 2GB RAM – 80 GB Hitachi Deskstar 7200-rpm SATA disk – Linux 2.6.12 – 4GB disk partition for each file system OpenSSH Benchmark Performance Evaluation 5/13/201529Tolerating File-System Mistakes with EnvyFS Elapsed Time (in Seconds) File Systems CPU Intensive OpenSSH 4.5 -- Copy, untar and make Performance of EnvyFS 3 is comparable to a single file system 3 % overhead
30
I/O Intensive – Mimics busy mail server workload – Transaction: creates, deletes, reads, appends, … Postmark Configuration – 2500 files – File size: 4Kb – 40Kb – No. of transactions: 10K and 100K Postmark Benchmark 5/13/201530Tolerating File-System Mistakes with EnvyFS Elapsed Time (in Seconds) 129.0 39.0 26.4 14.7 9.6 29 107 34 851 430 128 243 78 406 271 EnvyFS 3 : 3.3x + SubSIST: -32% EnvyFS 3 : 8x + SubSIST: 4x EnvyFS 3 : 1.7x + SubSIST: 11.5%
31
Summary of Results Robustness – Traditional file systems vulnerable to corruptions – EnvyFS 3 tolerates almost all mistakes in one FS Performance – Desktop workloads: EnvyFS 3 has comparable performance – I/O intensive workloads: Regular Operations: EnvyFS 3 + SubSIST acceptable performance Memory pressure: EnvyFS 3 + SubSIST has large overhead 5/13/201531Tolerating File-System Mistakes with EnvyFS
32
Outline Introduction Building reliable file systems Reducing overheads with SubSIST Evaluation Conclusion 5/13/201532Tolerating File-System Mistakes with EnvyFS
33
Conclusion Bugs/mistakes are inevitable in any software – Must cope, not just hope to avoid EnvyFS: N-version approach to tolerating FS bugs – Built using existing specification and file systems SubSIST: single instance store – Decreases overheads while retaining reliability 5/13/201533Tolerating File-System Mistakes with EnvyFS
34
Thank You! Advanced Systems Lab (ADSL) University of Wisconsin-Madison http://www.cs.wisc.edu/adsl 5/13/201534Tolerating File-System Mistakes with EnvyFS
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.