Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell
Definitions
Byzantine fault \'biz-ən-tēn folt\ n (1982): a failure of a system component that produces arbitrary behavior
Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006): methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
BFI \bē-ef-'ī\ n (2006): Byzantine fault isolation
Farsite \'fär-sīt\ n (2000): serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs
Talk Outline
Context – Farsite system
Why BFT doesn’t scale
Farsite’s use of multiple BFT groups
The need for isolating Byzantine faults
Formal system specification
BFI in Farsite
Farsite System [Diagram: clients and servers]
Farsite System – Metadata [Diagram: users, clients, and a BFT group holding metadata]
Farsite System – Metadata [Diagram: users, clients, BFT group]
Using a Byzantine agreement protocol, assign sequence numbers to messages
– Prepare-commit among 2T + 1 servers
– T = tolerable faults
– R = count of replicas
– R > 3T
Deterministically update metadata
Reply to client
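The sizing rule on this slide (R > 3T, with prepare-commit among 2T + 1 servers) can be worked through numerically. The following is a minimal sketch; the function names are hypothetical and are not part of Farsite.

```python
def bft_group_size(tolerable_faults: int) -> dict:
    """Smallest group satisfying R > 3T, plus the 2T + 1 prepare-commit quorum.

    Function and key names are illustrative assumptions, not Farsite APIs.
    """
    if tolerable_faults < 0:
        raise ValueError("tolerable_faults must be non-negative")
    t = tolerable_faults
    return {"replicas": 3 * t + 1, "quorum": 2 * t + 1}


def max_tolerable_faults(replicas: int) -> int:
    """Largest T such that R > 3T still holds for a group of R replicas."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return (replicas - 1) // 3


if __name__ == "__main__":
    for r in (4, 7, 10):
        print(r, "replicas tolerate", max_tolerable_faults(r), "faults")
```

Under this rule, groups of 4, 7, and 10 replicas tolerate 1, 2, and 3 Byzantine machines respectively.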
The Cost of BFT Groups: computation, messages, message delays
Throughput vs. Scale [Plot: throughput multiple versus machine count; curves: ideal, typical, flat, BFT]
Workload Sharing [Diagram labels: workload, client, server]
BFT at Scale
Multiple BFT Groups
Tree of BFT Groups
[Diagram: namespace tree rooted at /, with nodes users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin, src, bin]
Delegation to New Group [Diagram: namespace tree rooted at /, with a subtree delegated to a new BFT group]
Pathname Resolution [Diagram: resolving /users/Alice/code/C#/bar through the namespace tree]
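Resolving a pathname across a tree of BFT groups can be pictured as walking the path one component at a time and switching groups at each delegation point. The sketch below is illustrative only: the delegation table, the group names (G0, G1, G2), and the `resolve` function are assumptions, since the slides do not show where the delegation boundaries in the example tree actually fall.

```python
def resolve(path: str, delegations: dict[str, str]) -> list[str]:
    """Return the chain of BFT groups visited while resolving `path`.

    `delegations` maps a directory path to the group that manages the
    subtree rooted there; it must contain an entry for "/".
    (Hypothetical data structure, not Farsite's actual representation.)
    """
    chain = [delegations["/"]]
    prefix = ""
    for component in (c for c in path.split("/") if c):
        prefix += "/" + component
        group = delegations.get(prefix)
        # A delegation point hands authority to a different group.
        if group is not None and group != chain[-1]:
            chain.append(group)
    return chain


if __name__ == "__main__":
    # Hypothetical delegation points for the example namespace.
    table = {"/": "G0", "/users/Alice": "G1", "/users/Alice/code/C#": "G2"}
    print(resolve("/users/Alice/code/C#/bar", table))
```

With this hypothetical table, the lookup of /users/Alice/code/C#/bar depends on three groups, which is why a fault in an ancestor group can affect files managed elsewhere.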
Machine Failures at Scale
Group Failures at Scale
System Failure at Scale
Quantitative Fault Analysis
Example system
– File system distributed among interacting BFT groups
Simplifying assumptions
– Files are partitioned evenly among BFT groups
– Machine failures are independent
– Machine fault probability =
Evaluate: operational fault rate
– Probability that an operation on a randomly selected file exhibits a fault
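The analysis above can be approximated with a short calculation. This is a sketch under stated assumptions, not the authors' model: it treats a group as faulty when more than T = ⌊(R − 1)/3⌋ of its R machines fail independently, assumes that without BFI any faulty group can affect any operation, and assumes that with ideal BFI only the file's own group matters. The machine fault probability used in the usage example (0.001) is an assumed value chosen for illustration.

```python
from math import comb


def group_fault_probability(p: float, replicas: int) -> float:
    """Probability that more than T of R machines are faulty (binomial tail).

    p is the per-machine fault probability; faults are independent.
    """
    t = (replicas - 1) // 3
    return sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k)
        for k in range(t + 1, replicas + 1)
    )


def op_fault_rate_no_bfi(g: float, groups: int) -> float:
    """Assumption: without isolation, one faulty group taints every operation."""
    return 1 - (1 - g) ** groups


def op_fault_rate_ideal_bfi(g: float) -> float:
    """Assumption: with ideal isolation, only the file's own group matters."""
    return g


if __name__ == "__main__":
    p = 0.001  # assumed machine fault probability, for illustration only
    for replicas in (4, 7, 10):
        g = group_fault_probability(p, replicas)
        for groups in (1_000, 10_000, 100_000):
            print(replicas, groups,
                  op_fault_rate_no_bfi(g, groups), op_fault_rate_ideal_bfi(g))
```

Under these assumptions the no-BFI rate grows with the number of groups, while the ideal-BFI rate stays at the single-group fault probability regardless of scale.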
Operational Faults vs. System Scale [Plot: operational fault rate (log scale) versus system scale (count of BFT groups); curves: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI]
BFI versus no BFI
[Table: throughput reduction (computation, messages) for 4-member BFT groups with BFI versus 10-member BFT groups without BFI; cell values as extracted: 4 32 10 60% %]
BFI via Formal Specification [Diagram: a semantic spec (state, actions) and a distributed-system spec (state, actions) linked by refinement; faults are added to both specs, and the distributed-system spec is improved]
Farsite Semantic Spec [Diagram: namespace tree rooted at / (code, bin, C++, emacs, tools, src, a.h, a.cpp, a.exe, a.obj, cl.exe) with open handles and pending operations (open, read, move)]
Farsite Distributed-System Spec
Farsite Refinement [Diagram: namespace tree rooted at / (code, bin, C++, emacs, tools, src, a.h, a.cpp, a.exe, a.obj, cl.exe) with open handles and pending operations (del, read, move)]
Actions are State Transitions [Diagram: state containing /, a.cpp, open handles, and pending operations]
Proving Refinement Inductively [Diagram: state containing /, a.cpp, open handles, and pending operations]
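The inductive structure of a refinement argument (the initial distributed state maps to an initial semantic state, and every distributed step maps to a semantic step or to no visible change) can be illustrated on a toy model. The sketch below is not the Farsite TLA+ specification and is not a proof: it is a bounded exhaustive check over a made-up system in which the semantic state is a count of open handles and the distributed state is a tuple of per-server counts. All names are hypothetical.

```python
from itertools import product


def abstract_steps(n: int) -> set:
    """Semantic spec: an open adds a handle, a close removes one."""
    steps = {n + 1}
    if n > 0:
        steps.add(n - 1)
    return steps


def concrete_steps(state: tuple):
    """Distributed spec: per-server opens and closes, plus handle migration."""
    for i, count in enumerate(state):
        yield state[:i] + (count + 1,) + state[i + 1:]
        if count > 0:
            yield state[:i] + (count - 1,) + state[i + 1:]
    # Internal step with no semantic effect: move a handle between servers.
    for i, j in product(range(len(state)), repeat=2):
        if i != j and state[i] > 0:
            moved = list(state)
            moved[i] -= 1
            moved[j] += 1
            yield tuple(moved)


def refine(state: tuple) -> int:
    """Refinement mapping: interpret distributed state in semantic terms."""
    return sum(state)


def check_refinement(initial: tuple, max_depth: int) -> bool:
    """Check the base case, then every step reachable within max_depth."""
    if refine(initial) != 0:
        return False  # base case: must map to the semantic initial state
    frontier, seen = {initial}, {initial}
    for _ in range(max_depth):
        next_frontier = set()
        for s in frontier:
            for t in concrete_steps(s):
                a, b = refine(s), refine(t)
                # Each concrete step must be a semantic step or a stutter.
                if b != a and b not in abstract_steps(a):
                    return False
                if t not in seen:
                    seen.add(t)
                    next_frontier.add(t)
        frontier = next_frontier
    return True


if __name__ == "__main__":
    print(check_refinement((0, 0), max_depth=4))
```

The migration step shows why stuttering matters: the distributed system takes a step that the semantic spec cannot observe, and the refinement mapping absorbs it.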
Refinement with Byzantine Faults [Diagram: namespace tree rooted at / (code, bin, C++, emacs, tools, src, a.h, a.cpp, a.exe, a.obj, cl.exe) with open handles and pending operations (del, read, move)]
Semantic Fault Specification [Diagram: namespace tree with tainted files showing garbled contents]
Safety
– A tainted file may have arbitrary contents and attributes
– A tainted file may appear not linked into namespace
– A tainted file may pretend not to have children it actually has
– A tainted file may pretend to have children that do not exist
– A tainted file may pretend another tainted file is a child or parent
Liveness
– Operations involving a tainted file may not complete
Distributed-System Improvements
Maintain redundant info across BFT group boundaries
Augment messages with info that justifies correctness
Ensure unambiguous chains of authority over data
Carefully order messages and state updates for operations involving multiple BFT groups
Summary of BFI Methodology
Formally specify your system
– Semantic spec: user’s view of system
– Distributed-system spec: designer’s view of system
– Refinement interprets distributed-system spec in semantic terms
Modify distributed-system spec to express Byzantine faults
Simultaneously
– Strategically weaken semantic spec to describe faults
– Improve distributed-system spec to quarantine faults
Refinement lets you know when you are done
Conclusions
BFT groups have negative throughput scaling
Scalable systems can be built from multiple BFT groups
System scale increases the probability of non-maskable Byzantine faults
If faults are not isolated, a single faulty group can corrupt the entire system
BFI is a methodology for isolating Byzantine faults
BFI uses formal system specification
Improves fault tolerance without hurting throughput, unlike increasing BFT group size
Contact Information
Backup Slides
Farsite Spec Stats
Semantic specification
– 1800 lines of TLA+
– 114 definitions
Distributed-system specification
– 11,500 lines of TLA+
– 775 definitions
Why so big?
– Windows file-system semantics are complex
– Scalability and strong consistency
– Byzantine fault isolation