1 SQCK: A Declarative File System Checker Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin.

Presentation transcript:

1 SQCK: A Declarative File System Checker
Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin – Madison
OSDI ’08 – December 9th, 2008

2/25 Corrupt file systems
- File systems
  - Store massive amounts of data
  - Must be reliable
- Corrupted file system images
  - Due to hardware errors, file system bugs, etc.
  - Need to be repaired as soon as possible

3/25 Who should repair?
- Does journaling (write-ahead logging) help?
  - No, it only helps with crashes
- Does the file system repair itself online?
  - No, it does not have enough machinery
- Fsck: the last line of defense
  - It’s a “must have” utility
    - XFS: “no need fsck ever”, but it deploys an fsck in the end
  - Must be fully reliable

4/25 But … fsck is complex
- Fsck has a big task
  - Turn any corrupt image into a consistent image
  - E.g., check whether a data block is shared by two inodes
- How are checkers implemented?
  - Written in C → hard to reason about
  - Large and complex
    - Ext2 fsck: 150 checks in 16 KLOC
    - XFS fsck: 340 checks in 22 KLOC
  - Hundreds of cluttered if-check statements
- Bottom line: fsck code is “untouchable”

5/25 Two Questions
- Are current checkers really reliable?
- If not, how should we build robust checkers?

6/25 e2fsck is unreliable
- Analyze e2fsck (the ext2 file system checker)
- Findings:
  - Inconsistent repair
    - The file system becomes unreadable
  - Consistent but not “correct”
    - Fsck deletes valid directory entries
    - Fsck loses a huge number of files

7/25 SQCK
- Lesson: Complexity is the enemy of reliability
  - Big task + bad design → complexity → unreliability
  - Need a higher-level approach for simplicity
- SQCK (SQL-based Fsck)
  - Use a declarative query language to write checks
  - Put simply: write fewer lines of code
- Evaluation
  - Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C)
  - More: great flexibility and reasonable performance

8/25 Outline
- Introduction
- Analysis of e2fsck
- SQCK Design
- SQCK Evaluation
- Conclusion

9/25 Methodology
- E2fsck's task: cross-check all ext2 metadata
  - An indirect pointer should not point to the superblock
  - A subdirectory should only be accessible from one directory
- Inject a single corruption
  - Observe how e2fsck repairs that single corruption
  - Only corrupt on-disk pointers
    - Corrupt an indirect pointer to point to the superblock
    - Corrupt a directory entry to point to another directory
- Usually, a corrupt pointer is simply cleared to zero

10/25 Inconsistent (Out-of-order) Repair
[Figure: an inode's indirect pointer (*ind) points to the superblock instead of an indirect block. Two panels compare the order of the repair steps.]
- Ideal fsck: 1. Check bad indirect pointer, then 2. Check indirect content
- e2fsck: 2. Check indirect content, then 1. Check bad indirect pointer (the figure shows zeros written into the superblock)

11/25 Consistent but Incorrect Repair (1)
[Figure: directory trees with nodes /, a1, b1, a2, b2, and LF, comparing the result of an ideal fsck with the result of e2fsck.]
- Kidnapping problem!
- E2fsck does not use all available information

12/25 Result Summary
- Four problems
  - Inconsistent
  - Information-incomplete
  - Policy-inconsistent
  - Insecure
- E2fsck does not handle all corruptions
  - “Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem.”
- Not simple implementation bugs
  - Difficult to combine available information
  - Difficult to ensure correct ordering

13/25 Outline
- Introduction
- Analysis
- SQCK Design
- SQCK Evaluation
- Conclusion

14/25 Fsck Properties
- Hundreds of checks
- Complex cross-checks
- Taxonomy of checks in e2fsck:

                          Single instance   Multiple instances
  Same structure                63                 11
  Different structures          12                 35

- Must be ordered correctly
[Figure: each category illustrated with instances of struct A { int x; int y; } and struct B { m, n }.]

15/25 A Declarative Approach
- Lesson: Complexity is the enemy of reliability
- SQCK
  - Use a declarative query language (e.g., SQL). Why?
    - It is declarative: the high-level intent is clear
    - It fits cross-checking massive amounts of information
- Goals achieved
  - Simple: e2fsck in 150 queries (vs. 16 KLOC of C)
  - Reliable: each check/query is easy to understand
  - Flexible: plug in/out different queries

16/25 Using SQCK
- Take a file system image
- Load metadata into database tables
  - Temporary tables
  - Ex: InodeTable, GroupDescTable, DirEntryTable
- Run checks and repairs (in the form of queries)
- Flush any modifications, and delete the tables
[Figure: pipeline with components File system image, Scanner, Loader, Database tables, Checks + Repairs, and Flush.]
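
The four steps above can be sketched end to end. This is a minimal illustration, not SQCK's actual code: it uses Python's built-in sqlite3 module in place of MySQL, a hypothetical list of dicts (`image`) in place of a scanned ext2 image, and assumed column names (`grp`, `startBlock`, `endBlock`, `dirty`). Only the table name GroupDescTable comes from the slide.

```python
import sqlite3

# Hypothetical stand-in for a scanned file system image: one dict per block
# group. Group 1's block bitmap pointer is corrupt (it lies outside the group).
image = [
    {"grp": 0, "startBlock": 1, "endBlock": 8192, "blockBitmap": 3},
    {"grp": 1, "startBlock": 8193, "endBlock": 16384, "blockBitmap": 5},
]


def run_sqck(image):
    """Load, check and repair, flush, delete. Returns the flushed group numbers."""
    conn = sqlite3.connect(":memory:")

    # 1. Load metadata into a temporary table (column names are assumptions).
    conn.execute(
        "CREATE TEMP TABLE GroupDescTable ("
        "grp INTEGER PRIMARY KEY, startBlock INTEGER, endBlock INTEGER, "
        "blockBitmap INTEGER, dirty INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO GroupDescTable (grp, startBlock, endBlock, blockBitmap) "
        "VALUES (:grp, :startBlock, :endBlock, :blockBitmap)",
        image,
    )

    # 2. Run a check and its repair as one query: clear any block bitmap
    #    pointer outside its group and mark the row as modified.
    conn.execute(
        "UPDATE GroupDescTable SET blockBitmap = 0, dirty = 1 "
        "WHERE blockBitmap NOT BETWEEN startBlock AND endBlock"
    )

    # 3. Flush: write only the modified rows back to the image.
    dirty_rows = conn.execute(
        "SELECT grp, blockBitmap FROM GroupDescTable WHERE dirty = 1 ORDER BY grp"
    ).fetchall()
    for grp, bitmap in dirty_rows:
        image[grp]["blockBitmap"] = bitmap  # list index equals group number here

    # 4. Delete the table.
    conn.execute("DROP TABLE GroupDescTable")
    conn.close()
    return [grp for grp, _ in dirty_rows]


flushed_groups = run_sqck(image)
```

Tracking a `dirty` column lets the flush step touch only the rows that a repair actually changed, which mirrors the `T.dirty = 1` convention shown later in the deck.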

17/25 Declarative check (example 1)
- Cross-checking a single instance of a structure
- “Find block bitmap that is not located within its block group”

C (e2fsck):

    first_block = sb->s_first_data_block;
    last_block = first_block + blocks_per_group;
    for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
        if (i == fs->group_desc_count - 1)
            last_block = sb->s_blocks_count;
        if ((gd->bg_block_bitmap < first_block) ||
            (gd->bg_block_bitmap >= last_block)) {
            px.blk = gd->bg_block_bitmap;
            if (fix_problem(BB_NOT_GROUP, ...))
                gd->bg_block_bitmap = 0;
        }
        ...
    }

SQL (SQCK):

    SELECT *
    FROM GroupDescTable G
    WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
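
To see that the loop and the query express the same check, both can be run on the same data. The sketch below is illustrative only: the descriptors are made up, the columns are renamed to `startBlock` and `endBlock` (an assumption, chosen to avoid SQL keywords), and sqlite3 stands in for the database SQCK used.

```python
import sqlite3

# Hypothetical descriptors: (grp, startBlock, endBlock, blockBitmap).
# endBlock is the last block inside the group (inclusive).
groups = [
    (0, 1, 8192, 3),
    (1, 8193, 16384, 8195),
    (2, 16385, 20000, 40),  # corrupt: bitmap lies outside group 2
]


def imperative_check(groups):
    """Loop in the style of the C code: test each descriptor in turn."""
    bad = []
    for grp, start, end, bitmap in groups:
        if bitmap < start or bitmap > end:
            bad.append(grp)
    return bad


def declarative_check(groups):
    """The same check as a single query over GroupDescTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE GroupDescTable "
        "(grp INTEGER, startBlock INTEGER, endBlock INTEGER, blockBitmap INTEGER)"
    )
    conn.executemany("INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)", groups)
    rows = conn.execute(
        "SELECT G.grp FROM GroupDescTable G "
        "WHERE G.blockBitmap NOT BETWEEN G.startBlock AND G.endBlock "
        "ORDER BY G.grp"
    ).fetchall()
    conn.close()
    return [grp for (grp,) in rows]


loop_result = imperative_check(groups)
query_result = declarative_check(groups)
```

Note that SQL's BETWEEN is inclusive at both ends, so the loader has to store the last block of each group in `endBlock`, whereas the C loop compares against an exclusive `last_block`.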

18/25 Declarative check (example 2)
- Cross-checking multiple instances of the same structure
- “Find false parents (i.e., directory entries that point to a subdirectory that already belongs to another directory)”
  - Must read all directory entries in directory data blocks
  - Wrong implementation in e2fsck (the kidnapping problem)

19/25 Declarative check (example 2)

    if ((dot_state > 1) &&
        (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
        // e2fsck_get_dir_info
        // is 20 lines long
        subdir = e2fsck_get_dir_info(dirent->inode);
        ...
        if (subdir->parent) {
            if (fix_problem(LINK_DIR, ...)) {
                dirent->inode = 0;
                goto next;
            }
        } else {
            subdir->parent = ino;
        }
    }

20/25 Declarative check (example 2)

    SELECT F.*    -- returns the false parent(s)
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE
      -- P says C is its child
      P.entry_num >= 3 AND P.entry_ino = C.ino AND
      -- and C says P is its parent
      C.entry_num = 2 AND C.entry_ino = P.ino AND
      -- F also says C is its child
      F.entry_num >= 3 AND F.entry_ino = C.ino AND
      F.ino <> P.ino AND ...

[Figure: directories F, P, and C]
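
The false-parent query can be tried on a tiny directory tree. The following sketch is an illustration under assumptions: the inode numbers and tree are invented, sqlite3 stands in for the real database, and the query uses only the conditions visible on the slide (whatever followed the trailing AND is not reproduced). Entry 1 is taken to be ".", entry 2 to be "..", and entries 3 and up to be regular names, as the slide's conditions imply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, entry_ino INTEGER)"
)
# Hypothetical tree: / (2) holds a1 (11) and b1 (12); a1 holds a2 (13).
# Each row is (directory inode, entry position, inode the entry points to).
conn.executemany(
    "INSERT INTO DirEntryTable VALUES (?, ?, ?)",
    [
        (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 12),
        (11, 1, 11), (11, 2, 2), (11, 3, 13),
        (12, 1, 12), (12, 2, 2), (12, 3, 13),  # corrupt: b1 also claims a2
        (13, 1, 13), (13, 2, 11),              # a2's ".." names a1 as its parent
    ],
)

false_parents = conn.execute(
    """
    SELECT F.ino, F.entry_num
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE P.entry_num >= 3 AND P.entry_ino = C.ino  -- P says C is its child
      AND C.entry_num = 2  AND C.entry_ino = P.ino  -- C says P is its parent
      AND F.entry_num >= 3 AND F.entry_ino = C.ino  -- F also says C is its child
      AND F.ino <> P.ino                            -- but F is not P
    """
).fetchall()
```

Here the query flags entry 3 of directory 12 (b1), because a2's own ".." entry confirms a1 as the real parent. Using the child's ".." entry is the extra piece of information that separates the true parent from the false one.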

21/25 Declarative Repairs
- Running declarative checks is only part of the problem
- Must also perform declarative repairs
- A repair = an update query
  - Some repairs simply update a few fields: ... SET T.field = newValue, T.dirty = 1
- A repair = a series of queries
  - Ex: Reconnect an orphan directory to the lost+found directory
  - Combine a series of queries with C code
    - All repairs are written in SQL
    - C code is only used for connecting them
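
A multi-query repair such as the lost+found example might look like the sketch below. It is not SQCK's implementation: Python replaces the C glue code, sqlite3 replaces the real database, the inode numbers are invented, and "orphan" is simplified to mean a directory that no regular entry points to. The `dirty` column follows the SET ... T.dirty = 1 convention from the slide.

```python
import sqlite3

ROOT_INO = 2          # ext2 root directory inode
LOST_FOUND_INO = 11   # assumed inode number of lost+found in this example

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, "
    "entry_ino INTEGER, dirty INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO DirEntryTable (ino, entry_num, entry_ino) VALUES (?, ?, ?)",
    [
        (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 12),  # / holds lost+found, a1
        (11, 1, 11), (11, 2, 2),                        # lost+found
        (12, 1, 12), (12, 2, 2),                        # a1, with no entry for 13
        (13, 1, 13), (13, 2, 12),                       # orphan directory
    ],
)

# Check: directories (other than the root) that no regular entry points to.
FIND_ORPHANS = """
    SELECT DISTINCT D.ino FROM DirEntryTable D
    WHERE D.ino <> ? AND NOT EXISTS (
        SELECT 1 FROM DirEntryTable E
        WHERE E.entry_num >= 3 AND E.entry_ino = D.ino)
    ORDER BY D.ino
"""


def reconnect_orphans(conn):
    """Host-language glue: the repair itself is three SQL statements."""
    orphans = [ino for (ino,) in conn.execute(FIND_ORPHANS, (ROOT_INO,)).fetchall()]
    for ino in orphans:
        # Query 1: pick the next free entry slot in lost+found.
        (slot,) = conn.execute(
            "SELECT MAX(entry_num) + 1 FROM DirEntryTable WHERE ino = ?",
            (LOST_FOUND_INO,),
        ).fetchone()
        # Query 2: add an entry for the orphan to lost+found.
        conn.execute(
            "INSERT INTO DirEntryTable VALUES (?, ?, ?, 1)",
            (LOST_FOUND_INO, slot, ino),
        )
        # Query 3: point the orphan's ".." at lost+found and mark it dirty.
        conn.execute(
            "UPDATE DirEntryTable SET entry_ino = ?, dirty = 1 "
            "WHERE ino = ? AND entry_num = 2",
            (LOST_FOUND_INO, ino),
        )
    return orphans


reconnected = reconnect_orphans(conn)
```

Running the same function a second time finds nothing to repair, which is a cheap way to confirm that the repair left the table consistent with its own check.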

22/25 Outline
- Introduction
- Analysis
- SQCK Design
- SQCK Evaluation
- Conclusion

23/25 SQCK Evaluation
- Complexity
  - 150 queries in 1100 lines of SQL statements
  - (compared to 16,000 lines of C in e2fsck)
- Reliability
  - Passes hundreds of corruption scenarios
- Flexibility
  - Add new checks/repairs
  - Enable different versions of e2fsck
- Performance
  - Introduce some optimizations
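
The flexibility point can be pictured as a registry of named queries from which a checker configuration picks a subset. This is a hedged sketch rather than SQCK's design: the check names, the `run_checks` helper, and the column names are all hypothetical, and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

# Each check is a named query; the names are hypothetical labels.
CHECKS = {
    "bitmap_outside_group": (
        "SELECT grp FROM GroupDescTable "
        "WHERE blockBitmap NOT BETWEEN startBlock AND endBlock"
    ),
    "false_parent": (
        "SELECT F.ino FROM DirEntryTable P, DirEntryTable C, DirEntryTable F "
        "WHERE P.entry_num >= 3 AND P.entry_ino = C.ino "
        "AND C.entry_num = 2 AND C.entry_ino = P.ino "
        "AND F.entry_num >= 3 AND F.entry_ino = C.ino AND F.ino <> P.ino"
    ),
}


def run_checks(conn, enabled):
    """Run only the enabled checks; return the rows each one flags."""
    return {name: conn.execute(CHECKS[name]).fetchall() for name in enabled}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupDescTable (grp, startBlock, endBlock, blockBitmap)")
conn.execute("CREATE TABLE DirEntryTable (ino, entry_num, entry_ino)")
conn.execute("INSERT INTO GroupDescTable VALUES (0, 1, 8192, 9000)")  # corrupt

# Two configurations of the same checker: all checks, or a reduced set.
full = run_checks(conn, ["bitmap_outside_group", "false_parent"])
reduced = run_checks(conn, ["false_parent"])
```

Adding a check then means adding one entry to the registry, and a different checker version is just a different list of enabled names.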

24/25 SQCK vs. e2fsck
- Reasonable performance
  - First generation of SQCK (with MySQL)
  - Within 1.5x of e2fsck
- Future optimizations
  - Hierarchical checks
  - Concurrent queries

25/25 Conclusion
- Complexity is the enemy of reliability
- Recovery code is complex
- SQCK: Build recovery tools with a higher-level approach

26 Thank you! Questions? ADvanced Systems Laboratory