Published by Nathaniel Andrews. Modified over 9 years ago.
1
Journal-guided Resynchronization for Software RAID
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison
2
RAID Consistent Update Problem
- The RAID's task is to maintain consistency
  - Challenging in the face of crashes
  - Updates must be applied to more than one disk
- Inconsistency means a window of vulnerability
  - A disk failure may lead to data loss
[Figure: RAID array of data and parity (P) blocks]
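A minimal Python sketch of the problem follows. It models one RAID-5 stripe with one-byte blocks and XOR parity, and the function names are illustrative rather than taken from any RAID implementation. If the system crashes after the data write but before the parity write, the stripe is left inconsistent. If a disk fails later, reconstruction quietly returns the wrong value.

```python
from functools import reduce
from operator import xor


def parity(blocks: list[int]) -> int:
    """XOR parity over the data blocks of one stripe."""
    return reduce(xor, blocks, 0)


def reconstruct(blocks: list[int], p: int, failed: int) -> int:
    """Rebuild the block on the failed disk from the survivors and parity."""
    survivors = [b for i, b in enumerate(blocks) if i != failed]
    return reduce(xor, survivors, p)


# A consistent stripe: four data disks plus a parity disk.
stripe = [0x11, 0x22, 0x33, 0x44]
p = parity(stripe)

# Update block 0: the data write reaches disk, then the machine crashes
# before the matching parity write is issued, so p is now stale.
stripe[0] = 0x55

# Later, disk 1 fails and is rebuilt from the stale parity.
rebuilt = reconstruct(stripe, p, failed=1)
print(hex(rebuilt), "but disk 1 actually held", hex(stripe[1]))
```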
3
High-end RAID Solution
- Consistent update with non-volatile memory (NVRAM)
  - Logs writes in NVRAM until they reach disk
- Performance: logging to NVRAM is fast
- Reliability: data is safe in NVRAM
- Availability: recovery is fast
- But enterprise systems are expensive
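The NVRAM approach is essentially write-ahead logging underneath the file system. The sketch below is a simplified model: one dict stands in for NVRAM and another for the member disks. It is not meant to describe any particular controller's firmware. Every write is logged first and then applied, and recovery just replays whatever the log still holds.

```python
class NvramRaid:
    """Toy RAID controller that logs every write in NVRAM before applying it."""

    def __init__(self) -> None:
        self.nvram: dict[int, bytes] = {}  # assumed to survive power loss
        self.disks: dict[int, bytes] = {}  # stands in for data + parity on disk

    def write(self, addr: int, data: bytes, crash_before_disk: bool = False) -> None:
        self.nvram[addr] = data  # 1. log the write (fast, and now safe)
        if crash_before_disk:
            return               # simulated crash: the disks are never updated
        self.disks[addr] = data  # 2. apply the write to the array
        del self.nvram[addr]     # 3. retire the log entry

    def recover(self) -> int:
        """Replay logged writes; cost depends on log size, not array size."""
        replayed = len(self.nvram)
        for addr, data in self.nvram.items():
            self.disks[addr] = data
        self.nvram.clear()
        return replayed


raid = NvramRaid()
raid.write(7, b"old")
raid.write(9, b"new", crash_before_disk=True)
replayed = raid.recover()
```

Recovery only touches logged entries, which is why this design keeps recovery fast.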
4
Software RAID Solutions
- Consistent update is challenging
  - Performance versus reliability trade-off
- Performance: resynchronization after a crash
  - Scan the entire volume to fix inconsistencies
  - Extremely slow: hours for hundreds of GBs, days for TBs
  - Reliability: lengthens the window of vulnerability
  - Availability: consumes array bandwidth
- Reliability: log intentions to a bitmap
  - Performance: extra writes to maintain the bitmap
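Here is a sketch of the bitmap alternative. It is a simplified model that assumes a fixed region size and uses a per-region counter in place of the on-disk bitmap: a region's bit counts as set while its counter is nonzero. Linux md's real write-intent bitmap also batches and lazily clears bits, and this sketch ignores that. The extra bitmap write before the first write to a clean region is the performance cost. The payoff is that after a crash, only dirty regions need to be resynchronized.

```python
REGION_BLOCKS = 1024  # blocks covered by one bitmap bit (assumed value)


class IntentBitmap:
    """Toy write-intent bitmap: a region is dirty while any write to it is outstanding."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        self.pending: dict[int, int] = {}  # region -> outstanding writes
        self.bitmap_writes = 0             # extra I/Os spent setting bits

    def before_write(self, block: int) -> None:
        region = block // REGION_BLOCKS
        if self.pending.get(region, 0) == 0:
            # The bit must be durable before the data/parity writes start.
            self.bitmap_writes += 1
        self.pending[region] = self.pending.get(region, 0) + 1

    def after_write(self, block: int) -> None:
        region = block // REGION_BLOCKS
        self.pending[region] -= 1
        if self.pending[region] == 0:
            del self.pending[region]  # bit may be cleared (lazily, in practice)

    def regions_to_resync(self) -> list[range]:
        """After a crash, only blocks in dirty regions need checking."""
        return [
            range(r * REGION_BLOCKS, min((r + 1) * REGION_BLOCKS, self.total_blocks))
            for r in sorted(self.pending)
        ]


bm = IntentBitmap(total_blocks=1_000_000)
bm.before_write(5)
bm.before_write(6)      # same region as block 5: no extra bitmap write
bm.before_write(5000)
bm.after_write(6)
# Crash here: writes to blocks 5 and 5000 are still outstanding.
to_check = bm.regions_to_resync()
```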
5
Cooperative Software RAID Solution
- Journaling file systems already perform logging
  - Maintain file system data structure consistency
  - e.g., ext3, ReiserFS, JFS, NTFS
- Journal-guided resynchronization
  - New ext3 mode: declared mode
  - New software RAID interface: verify read
  - Achieves performance, reliability, and availability
6
Journal-guided Resync Overview
- Crash: which writes were outstanding?
  - Narrow the range of possible inconsistencies
  - Obtain this information from the journal (declared mode)
- Restart: journal-guided resynchronization
  - Use the journal to identify outstanding writes
  - Communicate their locations to the RAID layer (verify read)
  - Check redundancy and repair inconsistencies
- Greatly reduces the time needed for resynchronization
7
Outline
- Problem
- ext3 Background and Analysis
- ext3 Declared Mode and RAID Verify Read
- Journal-guided Resynchronization
- Evaluation
- Conclusion
8
ext3 Modes
- Data-journaling mode
  - All data and metadata are written to the journal
- Ordered mode (default)
  - Only metadata is written to the journal
  - Strict ordering between data and metadata
- Writeback mode
  - No ordering between data and metadata
9
ext3 Transactions
- Updates are grouped into transactions
- Transaction states:
  - Running: collect updates in memory
  - Commit: write updates to the journal
  - Checkpoint: write updates to their home locations
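The transaction life cycle can be read as a small state machine. The Python sketch below follows the three states named on the slide. It adds a hypothetical DONE state to mark when journal space can be reclaimed, and it ignores JBD details such as several transactions checkpointing at once.

```python
from enum import Enum


class TxState(Enum):
    RUNNING = "running"        # collecting updates in memory
    COMMIT = "commit"          # writing updates to the journal
    CHECKPOINT = "checkpoint"  # writing updates to their home locations
    DONE = "done"              # hypothetical: journal space reclaimable


_NEXT = {
    TxState.RUNNING: TxState.COMMIT,
    TxState.COMMIT: TxState.CHECKPOINT,
    TxState.CHECKPOINT: TxState.DONE,
}


class Transaction:
    def __init__(self, tid: int) -> None:
        self.tid = tid
        self.state = TxState.RUNNING
        self.updates: set[int] = set()  # home block numbers touched

    def add_update(self, block: int) -> None:
        if self.state is not TxState.RUNNING:
            raise RuntimeError("only the running transaction accepts updates")
        self.updates.add(block)

    def advance(self) -> TxState:
        self.state = _NEXT[self.state]  # raises KeyError once DONE
        return self.state


tx = Transaction(11)
tx.add_update(100)
tx.advance()  # RUNNING -> COMMIT
```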
10
ext3 Journal Structures
- Journal superblock
  - Head and tail pointers into the journal file
  - Transaction sequence number
- Descriptor block
  - List of home locations for the blocks that follow it
- Commit block
  - Marks the end of a transaction
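The sketch below models these structures and the recovery-time scan that finds committed transactions. It is a simplification with hypothetical names. It assumes one descriptor per transaction and represents journaled block copies as plain strings, whereas real ext3 can use several descriptors per transaction. A transaction counts as committed only if its descriptor and all listed blocks are followed by a commit block carrying the expected sequence number.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class JournalSuperblock:
    head: int      # next free position in the journal
    tail: int      # position of the oldest live transaction
    sequence: int  # transaction ID expected at the tail


@dataclass
class Descriptor:
    tid: int
    home_blocks: list[int]  # home locations of the journaled blocks that follow


@dataclass
class Commit:
    tid: int


Record = Union[Descriptor, Commit, str]  # str models a journaled block copy


def committed_transactions(sb: JournalSuperblock, log: list[Record]) -> list[int]:
    done: list[int] = []
    expected = sb.sequence
    i = sb.tail
    while i < len(log):
        rec = log[i]
        if not (isinstance(rec, Descriptor) and rec.tid == expected):
            break                      # stale or unexpected block: end of log
        i += 1 + len(rec.home_blocks)  # skip over the journaled copies
        end = log[i] if i < len(log) else None
        if isinstance(end, Commit) and end.tid == expected:
            done.append(expected)
            expected += 1
            i += 1
        else:
            break                      # crash mid-commit: ignore this transaction
    return done


log: list[Record] = [Descriptor(11, [500, 501]), "META", "DATA", Commit(11),
                     Descriptor(12, [502]), "META"]  # crashed before COMM 12
sb = JournalSuperblock(head=len(log), tail=0, sequence=11)
print(committed_transactions(sb, log))
```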
11
Data-journaling Write Analysis
- Checkpoint:
  - Write journaled blocks to their home locations, wait (known)
  - Update the superblock (known)
- Commit:
  - Write descriptor, metadata, and data blocks to the journal, wait (bounded)
  - Write the commit block to the journal, wait (bounded)
- Running:
  - Collect file system updates in memory
[Figure: transactions in the Checkpointing, Committing, and Running states; home-location and journal blocks (META, DATA, DESC 11, COMM 11); journal superblock; RAID array with parity (P) blocks]
12
Data-journaling Summary
- Provides a record of all outstanding writes
- Suitable for journal-guided resynchronization
- Offers poor performance

Block type       Write location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        known, descriptors
13
Ordered Write Analysis
- Commit:
  - Write data to home locations, wait (unknown)
  - Write descriptor and metadata blocks to the journal, wait (bounded)
  - Write the commit block to the journal, wait (bounded)
- Running:
  - Collect file system updates in memory
  - pdflush may write data to home locations (unknown)
[Figure: transactions in the Committing and Running states; home-location and journal blocks (META, DATA, DESC 11, COMM 11); journal superblock; RAID array with parity (P) blocks]
14
Ordered Summary
- Does not provide a record of outstanding writes
- Unsuitable for journal-guided resynchronization

Block type       Write location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        unknown
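The two summaries imply a simple rule. Journal-guided resynchronization works only if every class of outstanding write is either known or confined to a bounded region. The Python sketch below transcribes the two mode tables from the slides; the dictionary layout and the function name are illustrative.

```python
# Write-location knowledge per block type, transcribed from the summary
# tables. "unknown" means nothing on disk records where an outstanding
# write may have landed.
WRITE_LOCATIONS = {
    "data-journaling": {
        "superblock": "known",
        "journal": "bounded",
        "home metadata": "known",  # listed in descriptor blocks
        "home data": "known",      # data is journaled, so also in descriptors
    },
    "ordered": {
        "superblock": "known",
        "journal": "bounded",
        "home metadata": "known",
        "home data": "unknown",    # written to home, never recorded
    },
}


def supports_journal_guided_resync(mode: str) -> bool:
    """True if every kind of outstanding write is known or bounded."""
    return all(loc != "unknown" for loc in WRITE_LOCATIONS[mode].values())
```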
16
Declared Mode
- Variation of ordered mode
  - Only metadata is journaled, with strict ordering
- Declares its intent to write to home locations
  - New journal structure: the declare block
  - Lists the home data locations for the transaction
- Space and performance overheads
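A declare block is simply a list of home data block numbers for one transaction, so its space cost can be estimated directly. The rough estimate below assumes 4 KB journal blocks, 4-byte block numbers (typical for ext3), and a hypothetical 12-byte header. The actual on-disk declare format is not given here, so these constants are assumptions.

```python
import math

BLOCK_SIZE = 4096   # bytes per journal block (assumed)
ENTRY_SIZE = 4      # bytes per 32-bit block number (assumed)
HEADER_SIZE = 12    # hypothetical declare-block header (magic, type, tid)
ENTRIES_PER_BLOCK = (BLOCK_SIZE - HEADER_SIZE) // ENTRY_SIZE


def declare_blocks_needed(num_data_blocks: int) -> int:
    """Journal blocks needed to declare this many home data locations."""
    return math.ceil(num_data_blocks / ENTRIES_PER_BLOCK)


for n in (1, 100, 1021, 5000):
    print(n, "data blocks ->", declare_blocks_needed(n), "declare block(s)")
```

With these assumptions, a transaction that writes one data block pays for a whole declare block, and so does one that writes a thousand. This matches the evaluation's note that small transactions increase the declare overhead.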
17
Declared Write Analysis
- Commit:
  - Write the declare block to the journal, wait (bounded)
  - Write data to home locations, wait (known)
  - Write descriptor and metadata blocks to the journal, wait (bounded)
  - Write the commit block to the journal, wait (bounded)
- Running:
  - Collect file system updates in memory
  - pdflush may write data to home locations (unknown)
[Figure: transactions in the Committing and Running states; home-location and journal blocks (META, DATA, DECL 11, DESC 11, COMM 11); journal superblock; RAID array with parity (P) blocks]
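Ordering is what makes the home data locations "known" here. The Python sketch below models each commit as a sequence of phases: the writes in a phase are issued together, and the wait at the end makes them durable. It then checks every possible crash point for home data writes that could be in flight while no declare block describing them is on disk. The phase encoding and function names are illustrative. Only the commit path is modeled; pdflush writes during the Running state are not.

```python
Phase = list[tuple[str, str]]  # (area, label) writes issued together, then waited on


def declared_commit(data: list[str]) -> list[Phase]:
    return [
        [("journal", "DECL")],                       # wait (bounded)
        [("home", d) for d in data],                 # wait (known)
        [("journal", "DESC"), ("journal", "META")],  # wait (bounded)
        [("journal", "COMM")],                       # wait (bounded)
    ]


def ordered_commit(data: list[str]) -> list[Phase]:
    return [
        [("home", d) for d in data],                 # wait (unknown)
        [("journal", "DESC"), ("journal", "META")],  # wait (bounded)
        [("journal", "COMM")],                       # wait (bounded)
    ]


def unrecorded_home_writes(phases: list[Phase]) -> list[str]:
    """Home data writes that could be in flight at a crash while no
    declare block describing them is durable yet."""
    declare_durable = False
    unrecorded = []
    for phase in phases:
        # A crash can happen while any write in this phase is outstanding.
        for area, label in phase:
            if area == "home" and not declare_durable:
                unrecorded.append(label)
        # The wait at the end of the phase makes its writes durable.
        if ("journal", "DECL") in phase:
            declare_durable = True
    return unrecorded
```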
18
Software RAID Verify Read
- The file system must communicate possible inconsistencies to the software RAID layer
- New interface: the verify read request
  - Read the block and verify its redundant information
  - Repair the redundant information if it is inconsistent
[Figure: RAID array with parity (P) blocks; the XOR of a stripe's data blocks is compared against its stored parity]
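Below is a minimal Python sketch of what a verify read might do for one RAID-5 stripe, with blocks modeled as integers. The class and method names are hypothetical; the slide specifies only the semantics: read, check redundancy, repair. This sketch treats the data blocks as authoritative and rewrites parity when the two disagree, which is what a full-volume parity resync typically does.

```python
from functools import reduce
from operator import xor


class Raid5Stripe:
    """One stripe: data blocks plus XOR parity (blocks modeled as ints)."""

    def __init__(self, data: list[int]) -> None:
        self.data = list(data)
        self.parity = reduce(xor, self.data, 0)

    def read(self, i: int) -> int:
        return self.data[i]  # normal read: parity is never consulted

    def verify_read(self, i: int) -> tuple[int, bool]:
        """Read block i, check the stripe's parity, and repair it if stale.
        Returns the data and whether a repair was made."""
        expected = reduce(xor, self.data, 0)
        repaired = expected != self.parity
        if repaired:
            self.parity = expected  # rewrite parity from the data blocks
        return self.data[i], repaired


s = Raid5Stripe([1, 2, 3, 4])
s.data[0] = 9            # crash: data updated, parity write lost
first = s.verify_read(0)
second = s.verify_read(0)
```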
20
Journal-guided Resynchronization
Recovery and resynchronization, for each kind of write that may have been outstanding:
- Superblock write: verify read of the superblock
- Checkpointing: verify reads of descriptor home locations
- Committing: verify reads of the head of the journal
- Home data writes: verify reads of declared home locations
- Then checkpoint the committed transactions
[Figure: journal holding DECL 11, DESC 11, META, DATA, COMM 11, DECL 12; journal superblock; RAID array with parity (P) blocks]
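The rules above amount to computing a small set of block addresses to verify. The Python sketch below shows that computation under several assumptions. Journal and home locations share one array address space. The "head of the journal" is a fixed window, `head_window`, whose size would be the largest amount of journal a single commit can write. The transaction records come from the recovery scan. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveredTx:
    tid: int
    committed: bool              # commit block found by the recovery scan
    descriptor_homes: list[int]  # home locations named in descriptor blocks
    declared_homes: list[int]    # home data locations named in the declare block


def resync_targets(super_addr: int, journal_head: int, head_window: int,
                   txs: list[RecoveredTx]) -> set[int]:
    """Array block addresses that need a verify read after a crash."""
    targets = {super_addr}  # a superblock write may have been in flight
    # Committing: in-flight journal writes can only be near the head.
    targets.update(range(journal_head, journal_head + head_window))
    for tx in txs:
        if tx.committed:
            targets.update(tx.descriptor_homes)  # checkpoint writes
        targets.update(tx.declared_homes)        # home data writes
    return targets


txs = [
    RecoveredTx(11, committed=True, descriptor_homes=[200, 201], declared_homes=[300, 301]),
    RecoveredTx(12, committed=False, descriptor_homes=[], declared_homes=[302]),
]
targets = resync_targets(super_addr=1000, journal_head=1010, head_window=8, txs=txs)
# Afterwards, normal ext3 recovery checkpoints the committed transaction 11.
```

The size of the result depends on the journal's contents, not on the size of the array.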
22
Declared Mode Evaluation
- Microbenchmarks (versus ordered mode)
  - Random write: 3% slowdown
  - Sequential write: 5% slowdown
  - Sprite create, read, unlink: 4% slowdown
- Macrobenchmarks
  - ssh benchmark: 3% speedup for unpack
  - Postmark: from a 40% speedup to a 5% slowdown
    - The speedup comes from a globally sorted write order
  - TPC-B: 5% to 20% slowdown
    - Small transaction size increases the declare overhead
23
Implementation Complexity
- The cooperative approach reduces complexity

Journal-guided Resynchronization
Module            Original lines   Modified lines   Change
Software RAID-5   3475             18               0.5%
ext3              8621             69               0.8%
Journaling        3472             308              8.9%
Total             15568            395              2.5%

Linux RAID-1 Intent Bitmap Logging
Module            Original lines   Modified lines   Change
Software RAID-1   3116             1193             38.3%
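The Change column follows directly from the line counts on this slide. Here is a quick Python check using only the slide's numbers.

```python
# (original lines, modified lines) for each module, from the table.
modules = {
    "Software RAID-5": (3475, 18),
    "ext3": (8621, 69),
    "Journaling": (3472, 308),
}

orig_total = sum(o for o, _ in modules.values())
mod_total = sum(m for _, m in modules.values())

for name, (o, m) in modules.items():
    print(f"{name}: {100 * m / o:.1f}%")
print(f"Total: {mod_total}/{orig_total} = {100 * mod_total / orig_total:.1f}%")
print(f"Software RAID-1 intent bitmap: {100 * 1193 / 3116:.1f}%")
```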
24
Resynchronization Experiment
- Five-disk, 1 GB RAID-5 array
- A foreground process reads a set of files
- After 30 seconds, crash and restart the machine
  - Resynchronization begins
  - The foreground process restarts
- Monitor foreground bandwidth and resynchronization progress
25
Resynchronization Results
- Availability: foreground bandwidth rises from 29.6 to 34.1 MB/s
- Reliability: the window of vulnerability shrinks from 254 to 0.21 seconds
- Resynchronization cost drops from O(array size) to O(journal size)
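These measurements can be converted into relative terms with a little arithmetic. The only inputs below are the numbers from the experiment.

```python
# Measured values from the resynchronization experiment.
bw_before, bw_after = 29.6, 34.1      # foreground bandwidth, MB/s
vuln_before, vuln_after = 254, 0.21   # window of vulnerability, seconds

bw_gain = (bw_after - bw_before) / bw_before
vuln_reduction = vuln_before / vuln_after

print(f"foreground bandwidth: +{bw_gain:.1%}")
print(f"window of vulnerability: about {vuln_reduction:.0f}x shorter")
```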
27
Conclusion
- RAID consistent updates are challenging
- Analyzed ext3 journaling and introduced declared mode
  - Identifies outstanding writes after a crash
- Software RAID verify read interface
- Journal-guided resynchronization
  - Leverages existing functionality, reducing complexity
  - Provides performance, reliability, and availability
- Cooperation between layers is the key
28
Questions?