WSRR 111 Coerced Cache Eviction and Discreet Mode Journaling: Dealing with Misbehaving Disks Abhishek Rajimwale, Vijay Chidambaram, Deepak Ramamurthi Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
Ordering points are essential All modern file systems depend on ordering points Journaling file systems ( ext3, ext4 ) Data before metadata Copy on write file systems ( ZFS ) Transaction data before uber-block File systems flush the disk cache to ensure correct ordering of writes WSRR 11 2
3 Problem: Flushing doesn’t work Disks can fail to flush data upon request One reason: Bugs Errors in the storage stack [FAST 08] Improper propagation of error codes [FAST 08] Inadequate failure policies [SOSP 05] Bugs in the firmware [SOSP 03]
Disks can lie! WSRR 11 4 F_FULLFSYNC From the fcntl man page in Mac OSX: Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data. Caching improves performance File systems are usually blamed for data loss
Summary We present Coerced Cache Eviction We show how to characterize a disk cache We implement CCE in the ext3 file system We find that CCE adds low overhead for most workloads WSRR 11 5
6 Outline Motivation Background Coerced Cache Eviction Cache Fingerprinting Discreet Mode Journaling Evaluation Conclusion
Background - Journaling WSRR 11 7 DDDCMM Memory Disk cache Disk surface
WSRR 11 8 Outline Motivation Background Coerced Cache Eviction Cache Fingerprinting Discreet Mode Journaling Evaluation Conclusion
WSRR 11 9 Coerced Cache Eviction Ensures that cache has been truly flushed Key idea: Write a flush workload to disk which replaces the target blocks in the disk cache, flushing target blocks to disk
Coerced Cache Eviction WSRR DDDCMM Memory Disk cache Disk surface FFFFFFFFF
Coerced Cache Eviction Desired properties: Low performance overhead High probability of flushing target blocks Need to understand the disk cache to design good workload WSRR 11 11
WSRR Outline Motivation Background Coerced Cache Eviction Cache Fingerprinting Discreet Mode Journaling Evaluation Conclusion
WSRR Cache Fingerprinting Manufacturers don’t expose details about disk caches Disk caches can vary in: Read/Write partition size Number of segments Replacement policy Poorly characterized in literature
WSRR Cache Fingerprinting Flush micro-benchmark: Write target block Write varied flush workload – measure fsync Read target – measure Vary in each workload: Number of writes Amount of data in each write Sequential/Random writes
WSRR Cache Fingerprinting For each workload Eviction fingerprint Probability of eviction is visually shown Darker region indicates higher probability Performance fingerprint Time taken to write flush workload Darker region indicates higher time to flush
WSRR Cache Fingerprinting Optimal workload selection Highest probability of eviction Lowest cost For some disks, sequential writes may be ineffective at flushing A number of random writes are required
Cache Fingerprinting WSRR ManufacturerCache (MB) Number of writes Data (MB) Time (ms) Hitachi Hitachi Samsung Samsung Western Digital Western Digital Seagate Seagate Seagate
WSRR Outline Motivation Background Coerced Cache Eviction Cache Fingerprinting Discreet Mode Journaling Evaluation Conclusion
WSRR Discreet Mode Journaling Incorporating CCE into ext3 Fingerprint the disk to find optimal flush workload Create flush zone with suitable size Modify ext3 to issue flush zone writes at each ordering point
WSRR Outline Motivation Background Coerced Cache Eviction Cache Fingerprinting Discreet Mode Journaling Evaluation Conclusion
WSRR Evaluation Performance of Discreet Mode Journaling is comparable to regular journaling for most workloads such as Postmark, Filebench webserver and openSSH Postmark
WSRR Conclusion Disk drives misbehave with respect to flush commands Coerced Cache Eviction provides a way to run file systems reliably on top of such disks, with low overhead Trust in disk is weakening – Previously flushing was trusted Cloud computing systems – Can virtualized hardware be trusted? Will coercion be more widely used?
WSRR Thank you!
WSRR Evaluation Workloads which call fsync frequently perform poorly – Each fsync causes 4 flushes Filebench varmail writes a small amount of data and calls fsync repeatedly We did a number of optimizations to improve the situation: Transactional Checksum – removes 1 CCE Use optimized CCE instead of generic CCE Incorporate Group Commit in varmail
WSRR Evaluation Performance of Filebench Varmail after optimizations: