Journal-guided Resynchronization for Software RAID


Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison

RAID Consistent Update Problem
- RAID task is to maintain consistency
- Challenging in the face of crashes
  - Updates must be applied to more than one disk
- Inconsistency means window of vulnerability
  - Disk failure may lead to data loss
[Figure: disk array with parity blocks (P)]
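
The window of vulnerability can be illustrated with a small sketch. It assumes a RAID-5 style stripe whose parity is the byte-wise XOR of its data blocks. The helper `xor_blocks` and the block contents are hypothetical and exist only for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# A consistent stripe: three data blocks plus their parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# Crash after the new data block reaches disk but before parity is updated.
data[0] = b"\xaa\xbb"
stripe_consistent = xor_blocks(data) == parity

# If the disk holding data[1] now fails, rebuilding it from the
# surviving blocks and the stale parity yields the wrong contents.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Here `stripe_consistent` is false and `rebuilt` differs from the original second block, which is the data loss the slide warns about.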

High-end RAID Solution
- Consistent update with non-volatile memory
  - Logs writes in NVRAM until they reach disk
- Performance: logging to NVRAM is fast
- Reliability: data is safe in NVRAM
- Availability: recovery is fast
- But, enterprise systems are expensive

Software RAID Solutions
- Consistent update is challenging
  - Performance versus reliability trade-off
- Performance: resynchronization after crash
  - Scan entire volume to fix inconsistencies
  - Extremely slow, hours for 100s of GBs to days for TBs
  - Reliability: lengthens window of vulnerability
  - Availability: consumes array bandwidth
- Reliability: log intentions to a bitmap
  - Performance: extra writes to maintain bitmap
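
The bitmap option can be sketched as follows. This is a minimal illustration, not the Linux implementation: the class `IntentBitmap`, the region size, and the write counter are all assumptions made for the example.

```python
class IntentBitmap:
    """Tracks which regions of the array may have writes in flight."""

    def __init__(self, region_blocks):
        self.region_blocks = region_blocks
        self.dirty = set()
        self.bitmap_writes = 0  # extra I/O paid to persist the bitmap

    def before_write(self, block):
        region = block // self.region_blocks
        if region not in self.dirty:
            self.dirty.add(region)
            # The bit must reach disk before the data write is issued.
            self.bitmap_writes += 1

    def regions_to_resync(self):
        """After a crash, only these regions need to be scanned."""
        return sorted(self.dirty)

bitmap = IntentBitmap(region_blocks=1024)
for block in (5, 7, 4100):
    bitmap.before_write(block)
```

After these three writes, two regions are marked and two extra bitmap writes have been issued. A real implementation would also clear bits once writes complete; that step is omitted here.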

Cooperative Software RAID Solution
- Journaling file systems perform logging
  - Maintain file system data structure consistency
  - ext3, ReiserFS, JFS, NTFS
- Journal-guided resynchronization
  - New ext3 mode: declared mode
  - New software RAID interface: verify read
- Achieves performance, reliability, availability

Journal-guided Resync Overview
- Crash: What writes were outstanding?
  - Narrow the range of possible inconsistencies
  - Obtain information from journal (declared mode)
- Restart: journal-guided resynchronization
  - Use journal to identify outstanding writes
  - Communicate locations to RAID (verify read)
  - Check redundancy and repair inconsistencies
- Greatly reduce the time for resynchronization

Outline
- Problem
- ext3 Background and Analysis
- ext3 Declared Mode and RAID Verify Read
- Journal-guided Resynchronization
- Evaluation
- Conclusion

ext3 Modes
- Data-journaling mode
  - All data and metadata is written to the journal
- Ordered mode (default)
  - Only metadata is written to the journal
  - Strict ordering between data and metadata
- Writeback mode
  - No ordering between data and metadata

ext3 Transactions
- Updates are grouped into transactions
- Transaction states
  - Running: collect updates in memory
  - Commit: write updates to journal
  - Checkpoint: write updates to home locations

ext3 Journal Structures
- Journal superblock
  - Head and tail pointers into journal file
  - Transaction sequence number
- Descriptor block
  - List of home locations for upcoming blocks
- Commit block
  - Marks the end of a transaction
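
The three structures can be modeled in a few lines. This is a sketch for illustration only: the class and field names are hypothetical and do not reflect the ext3 on-disk format.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class JournalSuperblock:
    head: int      # pointer into the journal file
    tail: int      # pointer into the journal file
    sequence: int  # transaction sequence number

@dataclass
class DescriptorBlock:
    sequence: int
    home_locations: List[int]  # home locations for the upcoming journal blocks

@dataclass
class CommitBlock:
    sequence: int  # marks the end of transaction `sequence`

def is_committed(journal: List[Union[DescriptorBlock, CommitBlock]], sequence: int) -> bool:
    """A transaction counts as committed only if its commit block is in the journal."""
    return any(isinstance(b, CommitBlock) and b.sequence == sequence for b in journal)

journal = [DescriptorBlock(11, [100, 101]), CommitBlock(11), DescriptorBlock(12, [200])]
```

With this journal, transaction 11 is committed and transaction 12 is not, since the crash (in this example) happened before its commit block was written.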

Data-journaling Write Analysis
- Running:
  - collect file system updates in memory
- Commit:
  - write descriptor, metadata, and data to journal, wait (bounded)
  - write commit to journal, wait (bounded)
- Checkpoint:
  - write journaled blocks to home, wait (known)
  - update superblock (known)
[Figure: superblock and journal holding DESC 11, META, DATA, and COMM 11 blocks, above a disk array with parity blocks (P)]

Data-journaling Summary
- Provides a record of all outstanding writes
- Suitable for journal-guided resynchronization
- Offers poor performance

Block Type       Write Location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        known, descriptors

Ordered Write Analysis
- Running:
  - collect file system updates in memory
  - pdflush may write data to home (unknown)
- Commit:
  - write data to home, wait (unknown)
  - write descriptor and metadata to journal, wait (bounded)
  - write commit to journal, wait (bounded)
[Figure: superblock and journal holding DESC 11, META, and COMM 11 blocks, with DATA blocks written directly to home locations on a disk array with parity blocks (P)]

Ordered Summary
- Does not provide outstanding write record
- Unsuitable for journal-guided resynchronization

Block Type       Write Location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        unknown

Declared Mode
- Variation of ordered mode
  - Only metadata is journaled, strict ordering
- Declares its intent to write to home locations
- New journal structure: declare block
  - List of home data locations for the transaction
- Space and performance overheads

Declared Write Analysis
- Running:
  - collect file system updates in memory
  - pdflush may write data to home (unknown)
- Commit:
  - write declare to journal, wait (bounded)
  - write data to home, wait (known)
  - write descriptor and metadata to journal, wait (bounded)
  - write commit to journal, wait (bounded)
[Figure: superblock and journal holding DECL 11, DESC 11, META, and COMM 11 blocks, with DATA blocks written to home locations on a disk array with parity blocks (P)]
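
The commit ordering above can be written down as an explicit sequence of phases. The sketch below is an assumption-laden illustration: `declared_commit` and its tuple layout are hypothetical, and each phase is taken to finish (the "wait" on the slide) before the next begins.

```python
def declared_commit(seq, data_locations, metadata_locations):
    """Return the ordered write phases of one declared-mode commit.

    Each phase is (target, description, block locations).
    """
    return [
        ("journal", f"DECL {seq}", list(data_locations)),       # declare intent first
        ("home", "data", list(data_locations)),                 # then write data in place
        ("journal", f"DESC {seq} + metadata", list(metadata_locations)),
        ("journal", f"COMM {seq}", []),                         # commit block ends the transaction
    ]

phases = declared_commit(11, data_locations=[500, 501], metadata_locations=[20])
```

Because the declare block is in the journal before any home data write of the commit is issued, those writes target locations that recovery can read back from the journal.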

Software RAID Verify Read
- File system must communicate possible inconsistencies to the software RAID layer
- New interface: verify read request
  - Read block and verify its redundant information
  - Repair redundant information if inconsistent
[Figure: blocks of a stripe are XORed and the result is compared with the parity block (P)]
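
A verify read for a single RAID-5 style stripe might look like the sketch below. It assumes XOR parity and an in-memory dictionary standing in for the stripe; the function name `verify_read` mirrors the slide, but its signature is an assumption, not the interface from the talk.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def verify_read(stripe, index):
    """Read data block `index`, checking and if needed repairing parity.

    `stripe` is a dict with 'data' (list of bytes) and 'parity' (bytes).
    """
    expected = xor_blocks(stripe["data"])
    if stripe["parity"] != expected:
        stripe["parity"] = expected  # repair redundancy from the data blocks
    return stripe["data"][index]

stripe = {"data": [b"\x01", b"\x02"], "parity": b"\x00"}  # stale parity
block = verify_read(stripe, 0)
```

After the call, the requested block is returned unchanged and the stripe's parity has been recomputed from its data blocks.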

Journal-guided Resynchronization
- Recovery and Resynchronization:
  - superblock write: verify read for superblock
  - checkpointing: verify reads for descriptor home locations
  - committing: verify reads for head of the journal
  - home data writes: verify reads for declared home locations
  - checkpoint committed transactions
[Figure: superblock and journal holding DECL 11, DESC 11, META, DATA, COMM 11, and DECL 12 blocks, above a disk array with parity blocks (P)]
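
The recovery steps above amount to collecting a set of block locations and issuing a verify read for each. The sketch below is one conservative reading of the slide: the record layout, `locations_to_verify`, and its parameters are hypothetical, and it verifies the home locations of every descriptor block rather than trying to work out which transactions were actually being checkpointed.

```python
def locations_to_verify(journal, superblock_location, journal_head_blocks):
    """Collect block locations that may have had writes in flight at the crash.

    `journal` is a list of (kind, sequence, locations) records in log order,
    where kind is "DECL", "DESC", or "COMM".
    """
    targets = {superblock_location}          # superblock write
    targets.update(journal_head_blocks)      # commit in progress at the journal head
    for kind, _sequence, locations in journal:
        if kind == "DECL":
            targets.update(locations)        # declared home data writes
        elif kind == "DESC":
            targets.update(locations)        # checkpoint writes to home locations
    return sorted(targets)

journal = [
    ("DECL", 11, [500, 501]),
    ("DESC", 11, [20]),
    ("COMM", 11, []),
    ("DECL", 12, [600]),
]
targets = locations_to_verify(journal, superblock_location=0,
                              journal_head_blocks=[9000, 9001])
```

The number of locations collected depends on the journal contents rather than on the size of the array, which is where the reduction in resynchronization time comes from.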

Declared Mode Evaluation
- Microbenchmarks (versus ordered mode)
  - Random write (3% slowdown)
  - Sequential write (5% slowdown)
  - Sprite create, read, unlink (4% slowdown)
- Macrobenchmarks
  - ssh Benchmark (3% speedup for unpack)
  - Postmark (from 40% speedup to 5% slowdown)
    - Speedup from globally sorted write order
  - TPC-B (20% to 5% slowdown)
    - Small transaction size increases declare overhead

Implementation Complexity
- Cooperative approach reduces complexity

Journal-guided Resynchronization
Module             Original Lines    Modified Lines    Change
Software RAID-5    3475              18                0.5 %
ext3               8621              69                0.8 %
Journaling         3472              308               8.9 %
Total              15568             395               2.5 %

Linux RAID-1 Intent Bitmap Logging
Module             Original Lines    Modified Lines    Change
Software RAID-1    3116              1193              38.3 %

Resynchronization Experiment
- Five-disk, 1 GB RAID-5 array
- Foreground process reading a set of files
- After 30 seconds, crash and restart machine
  - Resynchronization begins
  - Foreground process restarts
- Monitor foreground bandwidth and resync

Resynchronization Results
- Availability: foreground BW from 29.6 to 34.1 MB/s
- Reliability: vulnerability from 254 to 0.21 seconds
  - Reduced from O(array size) to O(journal size)
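
The relative size of these improvements follows from simple arithmetic on the reported numbers. The snippet assumes the first value in each pair belongs to the default full-volume resynchronization and the second to the journal-guided approach; the variable names are illustrative.

```python
full_resync_s, guided_resync_s = 254, 0.21  # window of vulnerability, seconds
bw_full, bw_guided = 29.6, 34.1             # foreground bandwidth, MB/s

# How many times shorter the window of vulnerability becomes.
vulnerability_reduction = full_resync_s / guided_resync_s

# Fractional gain in foreground bandwidth.
bandwidth_gain = (bw_guided - bw_full) / bw_full
```

On these figures the window of vulnerability shrinks by a factor of roughly 1200, and foreground bandwidth improves by about 15%.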

Conclusion
- RAID consistent updates are challenging
- Analyzed ext3 journaling, declared mode
  - Identifies outstanding writes after a crash
- Software RAID verify read interface
- Journal-guided Resynchronization
  - Leverages functionality, reducing complexity
  - Provides performance, reliability, and availability
- Cooperation between layers is the key

Questions? http://www.cs.wisc.edu/adsl/