Published by Mary Brummitt. Modified over 9 years ago.
1
RELIABILITY ANALYSIS OF ZFS CS 736 Project University of Wisconsin - Madison
2
Reliability Analysis of ZFS: Summary
- To perform a reliability analysis of ZFS
  - Test existing reliability claims
- Layered driver interface: simulating transient block corruptions at various levels in the ZFS on-disk hierarchy
- Results
  - Classes of faults handled by ZFS
  - Measure of the robustness of ZFS
  - Lessons on building a reliable, robust file system
3
Coming Up: Outline of the talk
- ZFS organization
- ZFS on-disk format
- ZFS features and specs regarding reliability
- Experimental setup and experiments
- Results and conclusions
- Future work
4
ZFS Organization: Pooled Storage Model
- A disk is a ZFS pool comprising many file systems.
[Figure labels: ZFS Pool, ZFS]
5
ZFS Organization: Object based
- Transaction-based object file system
  - Every structure is an object.
  - An operation on one or more objects is a transaction.
  - Transactions are grouped into transaction groups.
- All data and metadata blocks are checksummed.
  - No silent corruptions.
- Modifications are always copy-on-write (COW).
  - Always consistent on disk.
- All metadata is compressed; data compression is optional.
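The checksum and copy-on-write points above can be illustrated with a toy model. This is a minimal sketch, not ZFS code: `ToyStore`, `write_tree`, `cow_update`, and `read_leaf` are hypothetical names, the tree has only one level of indirection, and SHA-256 merely stands in for whatever checksum a pool is configured to use.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum for the sketch (assumption: any strong hash will do).
    return hashlib.sha256(data).hexdigest()


class ToyStore:
    """Append-only block store: an allocated block is never modified."""

    def __init__(self):
        self._blocks = []

    def allocate(self, payload):
        self._blocks.append(payload)
        return len(self._blocks) - 1  # the address is the list index

    def read(self, addr):
        return self._blocks[addr]


def write_tree(store, leaves):
    # The root block holds (address, checksum) pairs for its children,
    # so checksums live in the parent, not next to the data.
    pointers = tuple((store.allocate(d), checksum(d)) for d in leaves)
    return store.allocate(pointers)


def cow_update(store, root_addr, index, new_data):
    # Write the new leaf to a fresh address, then write a fresh root.
    # Nothing reachable from the old root is touched.
    pointers = list(store.read(root_addr))
    pointers[index] = (store.allocate(new_data), checksum(new_data))
    return store.allocate(tuple(pointers))


def read_leaf(store, root_addr, index):
    addr, expected = store.read(root_addr)[index]
    data = store.read(addr)
    if checksum(data) != expected:
        raise IOError("checksum mismatch")
    return data
```

Because the old root still points at the old leaf, a reader holding the old root address sees a consistent tree until the new root is published, which is the property the slide calls "always consistent on disk".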
6
ZFS Structures
- The entire file system is represented as:
  - Objects: dnode_phys_t
  - Object sets: dnode_phys_t[ ]
- P/L analogy: each object is a template. The bonus buffer describes specific attributes.
7
ZFS Structures: Blocks and block pointers
- Data is transferred to disk in units of blocks.
- Block pointers (blkptr_t) are used to locate, verify, and describe blocks.
- A block pointer contains checksum and compression information.
- The physical size of a block can differ from its logical size.
- Gang blocks
8
ZFS Structures: Block pointers
- Data Virtual Address (DVA): a combination of fields in blkptr_t that locates a block on disk.
- Wideness: a blkptr_t can store up to three copies of the data, each pointed to by a unique DVA. These blocks are called "ditto blocks".
  - Three for pool-wide metadata
  - Two for file-system-wide metadata
  - One for data (configurable)
[Figure: blkptr_t layout with three DVAs (vdev1, offset1, asize; vdev2, offset2, asize; vdev3, offset3, asize) followed by the fields lvl, typ, cksum, comp, psize, lsize]
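The block pointer fields and the 3-2-1 wideness rule above can be written down as a small data structure. This is a simplified sketch assuming the field names shown on the slide; the Python types, the `BlockPointer` and `DVA` class names, and the `copies_for` helper are hypothetical, and the real blkptr_t is a packed on-disk structure with more fields than are modeled here.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_DVAS = 3  # a block pointer holds at most three DVAs


@dataclass(frozen=True)
class DVA:
    vdev: int    # which virtual device holds the copy
    offset: int  # where the copy starts on that vdev
    asize: int   # allocated size of the copy


@dataclass(frozen=True)
class BlockPointer:
    dvas: Tuple[DVA, ...]  # one DVA per ditto copy
    lvl: int               # level of indirection
    typ: str               # type of object the block belongs to
    cksum: str             # checksum of the block being pointed to
    comp: str              # compression algorithm
    psize: int             # physical size (after compression)
    lsize: int             # logical size (before compression)

    def __post_init__(self):
        if not 1 <= len(self.dvas) <= MAX_DVAS:
            raise ValueError("a block pointer needs between 1 and 3 DVAs")


def copies_for(kind: str, data_copies: int = 1) -> int:
    """Number of ditto copies per block class, following the slide's 3-2-1 rule."""
    policy = {"pool_metadata": 3, "fs_metadata": 2, "data": data_copies}
    copies = policy[kind]
    if not 1 <= copies <= MAX_DVAS:
        raise ValueError("copies must be between 1 and 3")
    return copies
```

The `data_copies` parameter reflects the slide's note that the number of copies for file data is configurable, while the metadata counts are fixed in this sketch.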
9
ZFS Structures: Wideness
10
ZFS Structures: Attributes on disk
- ZAP (ZFS Attribute Processor)
  - ZAP objects are used to handle arbitrary (name, object) associations within an object set (objset).
  - Most commonly used to implement directories.
  - Also used extensively throughout the DSL.
11
Putting it all together: Objects
- Everything in ZFS is an object.
- A dnode describes and organizes the collection of blocks making up an object.
12
Putting it all together: Object Sets
- Related objects are grouped to form objsets.
- File systems, volumes, clones, and snapshots are objsets.
[Figure labels: Objects, Object set]
13
Putting it all together: DataSets
- A dataset encapsulates an objset and provides:
  - Space usage
  - Snapshot information
[Figure labels: Objects, Object set, Snapshot Information, DataSet, Space map]
14
Putting it all together: Dataset directories
- Groups datasets
- Properties such as quotas and compression
- Dataset relationships
[Figure labels: Objects, Object set, Snapshot Information, DataSet, Child Map, Properties, DataSet Directory, Space map]
15
A road less travelled: From vdev label to data
16
To sum up (moving forward)
- Layers of indirection
- End-to-end checksums, which are separated from the data
- Wideness (ditto blocks): 3 - 2 - 1
- Compression
- Copy-on-write
- Scrub facility
17
Experimental Setup: Corruption Framework
- Corrupter driver: modifies physical disk blocks
- Analyzer app: understands on-disk ZFS structures
- Consumer app: monitors ZFS responses and error codes
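The corrupter's role can be approximated in user space. The project used a layered driver on Solaris; the sketch below instead flips bytes inside one block of a seekable disk image, which is an assumption made so the example is self-contained. The function name `corrupt_block`, the 512-byte default block size, and the seeding scheme are all hypothetical.

```python
import io
import random


def corrupt_block(image, block_no, block_size=512, nbytes=8, seed=0):
    """Invert `nbytes` randomly chosen bytes inside one block of a disk image.

    `image` is any seekable binary file object opened for reading and writing.
    Returns the sorted in-block offsets that were changed.
    """
    rng = random.Random(seed)  # seeded so an experiment can be repeated
    start = block_no * block_size

    image.seek(start)
    block = bytearray(image.read(block_size))
    if len(block) != block_size:
        raise ValueError("block lies beyond the end of the image")

    offsets = rng.sample(range(block_size), nbytes)
    for off in offsets:
        block[off] ^= 0xFF  # XOR with 0xFF guarantees the byte changes

    image.seek(start)
    image.write(block)
    return sorted(offsets)
```

A small in-memory run, using `io.BytesIO` in place of a real device:

```python
image = io.BytesIO(bytes(4 * 512))       # four zero-filled blocks
changed = corrupt_block(image, block_no=2, nbytes=4)
```

Only block 2 is modified; the analyzer and consumer roles from the slide (choosing which block holds which ZFS structure, and observing how ZFS reacts) are outside the scope of this sketch.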
18
Experimental Setup: Simplification
- Setup on a Solaris 10 VM
- Only one physical vdev (disk)
- No striping, mirror, RAID…
- Initial target: pointer corruption
- Reduced sample space
- Interesting cases
- Disable compression as much as possible
19
Initial Finding
- All metadata is compressed.
- Metadata compression cannot be disabled.
- Pointer corruption is therefore not feasible.
- Perform corruptions on compressed objects instead.
- Representative of the effects of disk faults on ZFS.
20
Corruption Experiments
- TYPE: type-aware object corruptions
- TARGET (targeted on-disk objects)
  - Vdev labels [@Pool]
  - Uberblocks [@Pool]
  - Object sets
    - Meta Object Set [@Pool]
      - objset_phys_t (describing the object set)
      - Object array
    - Myfs Object Set [@FS]
      - objset_phys_t
      - Indirect blkptr objects
      - Object array
  - ZIL [@FS]
  - File data [@FS]
  - Directory data [@FS]
21
Results

Target               Detection      Recovery          Correction
vdev label           YES/Checksum   YES/Replica       NO/COW
uberblock            YES/Checksum   YES/Replica       NO/COW
MOS Object           YES/Checksum   YES/Replica       NO/COW
MOS Object Set       YES/Checksum   YES/Replica       NO/COW
FS Object            YES/Checksum   YES/Replica       NO/COW
FS Indirect Objects  YES/Checksum   YES/Replica       NO/COW
FS Object Set        YES/Checksum   YES/Replica       NO/COW
ZIL                  YES/Checksum   NO
Directory Data       YES/Checksum   NO/Configurable
File Data            YES/Checksum   NO/Configurable
22
Summary (using IRON Taxonomy)
- Detection: checksums in parent blkptrs
- Recovery: replication in parent blkptrs (ditto blocks)
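The detection and recovery path summarized above can be sketched as a read routine that checks each copy against the checksum held in the parent pointer. This is a toy model under stated assumptions, not ZFS code: `ToyBlkptr`, `read_verified`, `ChecksumError`, and the dict-backed `disk` are hypothetical, and SHA-256 stands in for the pool's configured checksum.

```python
import hashlib
from typing import Dict, NamedTuple, Tuple


class ToyBlkptr(NamedTuple):
    addresses: Tuple[int, ...]  # one address per ditto copy
    cksum: str                  # checksum of the child, stored in the parent


class ChecksumError(Exception):
    """Raised when no copy of a block matches the parent's checksum."""


def make_pointer(data: bytes, addresses: Tuple[int, ...]) -> ToyBlkptr:
    return ToyBlkptr(addresses, hashlib.sha256(data).hexdigest())


def read_verified(disk: Dict[int, bytes], ptr: ToyBlkptr):
    """Return (data, address) for the first copy that passes verification."""
    for addr in ptr.addresses:
        data = disk.get(addr)
        if data is not None and hashlib.sha256(data).hexdigest() == ptr.cksum:
            return data, addr  # detection passed; this copy is usable
    raise ChecksumError("all copies failed verification")
```

Note that the routine returns a good copy but does not rewrite the corrupted one, which matches the "Correction: NO" column in the results: in this model, as in the experiments, a bad copy stays bad until something else rewrites the block.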
23
Conclusion
- Integration of file system and volume manager
  - Saves an additional translation
- Use of one generic block pointer for checksums and replication
- The Merkle tree provides robustness
- Use of replication/compression in a commodity file system is viable
- COW can be used effectively
24
Observations/Questions
- No correction of ditto blocks: relies on COW
  - Consecutive (n = wideness) failures without a transaction group commit?
  - Snapshot corruption?
- Explicit scrubbing corrects ditto blocks in place
  - Potential for corruption?
- Space/performance hit due to redundancy/compression
  - 2% hit in terms of space/IO? (Banham & Nash)
- No page cache; uses the ARC
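As a rough illustration of the scrubbing observation above, the following sketch walks a list of pointers, verifies every copy, and overwrites bad copies in place from a good one. It is a toy model only: the `scrub` function, the `(addresses, checksum)` pointer tuples, and the dict-backed `disk` are hypothetical, SHA-256 is a stand-in checksum, and nothing here reflects how the real scrub traverses a pool.

```python
import hashlib


def _ok(data: bytes, cksum: str) -> bool:
    return hashlib.sha256(data).hexdigest() == cksum


def scrub(disk, pointers):
    """Repair bad ditto copies in place; return the addresses rewritten.

    `disk` maps address -> bytes; `pointers` is an iterable of
    (addresses, checksum) pairs, one pair per logical block.
    """
    repaired = []
    for addresses, cksum in pointers:
        good = next((disk[a] for a in addresses if _ok(disk[a], cksum)), None)
        if good is None:
            continue  # no valid copy left: nothing to repair from
        for a in addresses:
            if not _ok(disk[a], cksum):
                disk[a] = good  # in-place overwrite, unlike a COW write
                repaired.append(a)
    return repaired
```

The in-place overwrite is the step the slide questions: unlike a copy-on-write update, a failure during this write touches a live location.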
25
Future Work
- Snapshot corruptions
- Multiple-device configurations
  - Striping
  - Mirror
  - RAID-Z