Published by Daniela Alaina Adams. Modified over 9 years ago.
1
Byzantine Fault Isolation in the Farsite Distributed File System
John R. Douceur and Jon Howell
2
Definitions
Byzantine fault \'biz-ən-tēn folt\ n (1982) : a failure of a system component that produces arbitrary behavior
Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006) : methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
BFI \bē-ef-'ī\ n (2006) : Byzantine fault isolation
Farsite \'fär-sīt\ n (2000) : serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs
3
Talk Outline
Context – Farsite system
Why BFT doesn’t scale
Farsite’s use of multiple BFT groups
The need for isolating Byzantine faults
Formal system specification
BFI in Farsite
4
Farsite System
[Figure: clients and servers]
5
Farsite System – Metadata
[Figure: users, clients, and a BFT group holding the metadata]
6
Farsite System – Metadata
[Figure: users, clients, BFT group]
Using Byzantine agreement protocol, assign sequence numbers to messages
– Prepare-commit among 2T + 1 servers
– T = tolerable faults
– R = count of replicas
– R > 3T
Deterministically update metadata
Reply to client
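The sizing rule on this slide (R > 3T, with prepare-commit among 2T + 1 servers) can be sketched in a few lines. This is only an illustration of the arithmetic; the function names are made up for the example and are not taken from Farsite.

```python
def tolerable_faults(r):
    """Largest T that satisfies R > 3T for a group of R replicas."""
    return (r - 1) // 3


def min_replicas(t):
    """Smallest replica count R that satisfies R > 3T."""
    return 3 * t + 1


def quorum_size(t):
    """Number of servers involved in prepare-commit: 2T + 1."""
    return 2 * t + 1


# Group sizes that appear elsewhere in the talk: 4, 7, and 10 members.
for r in (4, 7, 10):
    t = tolerable_faults(r)
    print(f"R={r}: tolerates T={t} faults, prepare-commit among {quorum_size(t)} servers")
```

Under this rule a 4-member group tolerates one faulty machine, and each additional tolerated fault costs three more replicas.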
7
The Cost of BFT Groups
                      computation   messages   message delays
single server              1            2             2
4-member BFT group         4           32             5
8
Throughput vs. Scale
[Figure: throughput multiple (0 to 7) vs. machine count (1 to 7); curves: ideal, typical, flat, BFT]
9
Workload Sharing
[Figure: workload shared between client and server]
10
BFT at Scale
11
Multiple BFT Groups
12
Tree of BFT Groups
13
[Figure: namespace tree with nodes /, users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin]
14
Delegation to New Group
[Figure: namespace tree with nodes /, users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin]
15
Pathname Resolution
[Figure: namespace tree with nodes /, users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin; resolved path: /users/Alice/code/C#/bar]
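As a rough illustration of pathname resolution across a tree of BFT groups, the sketch below walks the example path from the root and records each group that owns part of it. The delegation map and the group names (G0, G1, G2) are hypothetical: the slide does not say which group owns which directory.

```python
# Hypothetical delegation map: path prefix -> BFT group that owns that subtree.
# These assignments are invented for illustration only.
DELEGATIONS = {
    "/": "G0",
    "/users/Alice": "G1",
    "/users/Alice/code/C#": "G2",
}


def groups_on_path(path):
    """Return the BFT groups consulted, in order, while resolving path from the root."""
    visited = [DELEGATIONS["/"]]
    prefix = ""
    for part in (p for p in path.split("/") if p):
        prefix += "/" + part
        owner = DELEGATIONS.get(prefix)
        # A new owner at this prefix means the subtree was delegated to another group.
        if owner is not None and owner != visited[-1]:
            visited.append(owner)
    return visited


print(groups_on_path("/users/Alice/code/C#/bar"))
```

Every group on the returned list takes part in the resolution, so the correctness of a lookup depends on each of them unless faults are isolated.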
16
Machine Failures at Scale
17
Group Failures at Scale
18
System Failure at Scale
19
Quantitative Fault Analysis
Example system
– File system distributed among interacting BFT groups
Simplifying assumptions
– Files are partitioned evenly among BFT groups
– Machine failures are independent
Machine fault probability = 0.001
Evaluate: operational fault rate
– Probability that an operation on a randomly selected file exhibits a fault
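The analysis can be approximated with a short calculation. The model below is an assumption, not the authors' stated method: a group of R machines is taken to be faulty when more than T = (R - 1) / 3 of its machines fail; without isolation, any faulty group is assumed to fault every operation; with ideal isolation, only the group holding the file matters.

```python
from math import comb

P_MACHINE = 0.001  # machine fault probability, from the slide


def group_fault_probability(r, p=P_MACHINE):
    """Probability that more than T of R independent machines are faulty (assumed model)."""
    t = (r - 1) // 3
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(t + 1, r + 1))


def rate_without_bfi(r, groups, p=P_MACHINE):
    """Operational fault rate if any one faulty group can affect every file (assumed model)."""
    return 1 - (1 - group_fault_probability(r, p)) ** groups


def rate_with_ideal_bfi(r, p=P_MACHINE):
    """Operational fault rate if only the file's own group can affect it (assumed model)."""
    return group_fault_probability(r, p)


for groups in (1, 100, 10_000, 100_000):
    print(groups, f"{rate_without_bfi(4, groups):.2e}", f"{rate_with_ideal_bfi(4):.2e}")
```

Under these assumptions a 4-member group is faulty with probability of about 6 × 10^-6, and at 100,000 groups the unisolated rate comes out near 0.45.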
20
Operational Faults vs. System Scale
[Figure: operational fault rate (10^-7 to 10^0, log scale) vs. system scale (count of BFT groups, 1 to 100,000); curves: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI; annotated values: 6 × 10^-6, 0.45, 6 × 10^-6, 3 × 10^-5]
21
BFI versus no BFI
22
                                    computation   messages
4-member BFT groups with BFI             4            32
10-member BFT groups without BFI        10           200
throughput reduction:                   60%          84%
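The two percentages follow from the per-operation costs on this slide, assuming throughput is inversely proportional to cost. That proportionality is an assumption made for this sketch, and the function name is illustrative.

```python
def throughput_reduction(cost_small_group, cost_large_group):
    """Fraction of throughput lost when per-operation cost rises from small to large."""
    return 1 - cost_small_group / cost_large_group


# Costs from the slide: 4-member groups with BFI vs. 10-member groups without BFI.
computation = throughput_reduction(4, 10)
messages = throughput_reduction(32, 200)
print(f"computation: {computation:.0%}, messages: {messages:.0%}")
```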
23
BFI via Formal Specification
[Figure: semantic spec and distributed-system spec, each with state, actions, and added faults, linked by refinement; callouts: "Improved!" and "NEW"]
24
Farsite Semantic Spec
[Figure: namespace tree (/, tools, C++, emacs, cl.exe, code, src, bin, a.h, a.cpp, a.exe, a.obj) with open handles and pending operations (open, read, move)]
25
Farsite Distributed-System Spec
26
Farsite Refinement
[Figure: namespace tree (/, tools, C++, emacs, cl.exe, code, src, bin, a.h, a.cpp, a.exe, a.obj) with open handles and pending operations (del, read, move)]
27
Actions are State Transitions
[Figure: /, a.cpp, open handles, pending operations]
28
Proving Refinement Inductively
[Figure: /, a.cpp, open handles, pending operations]
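The inductive argument can be illustrated with a toy model checker: show that the initial distributed state refines the initial semantic state, then show that every distributed step maps to a legal semantic step or to no change. Everything below is an assumption made for illustration (a two-group system, files as names, create, delete, and migrate actions) and is far simpler than the TLA+ specifications the talk describes.

```python
from itertools import product

NAMES = ("a.cpp", "a.h")
GROUPS = 2
INIT = (frozenset(),) * GROUPS  # distributed state: one file set per group


def refine(dstate):
    """Refinement function: the semantic state is the union of all groups' files."""
    return frozenset().union(*dstate)


def semantic_step(s, t):
    """A semantic action creates or deletes one file; no change is a stutter."""
    return len(s ^ t) <= 1


def put(dstate, g, files):
    """Return a copy of dstate in which group g holds the given file set."""
    return dstate[:g] + (files,) + dstate[g + 1:]


def distributed_steps(dstate):
    """Correct actions: create, delete, and migrate a file between groups."""
    for g, name in product(range(GROUPS), NAMES):
        if name in dstate[g]:
            removed = put(dstate, g, dstate[g] - {name})
            yield removed  # delete
            for h in range(GROUPS):
                if h != g:
                    yield put(removed, h, removed[h] | {name})  # migrate
        elif all(name not in files for files in dstate):
            yield put(dstate, g, dstate[g] | {name})  # create


def steps_with_fault(dstate):
    """Correct actions plus a crude fault: a group loses all of its files at once."""
    yield from distributed_steps(dstate)
    for g in range(GROUPS):
        yield put(dstate, g, frozenset())


def check_refinement(init, steps):
    """Base case plus inductive step, checked over every reachable state."""
    if refine(init) != frozenset():
        return False
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for t in steps(s):
            if not semantic_step(refine(s), refine(t)):
                return False
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True


print(check_refinement(INIT, distributed_steps))
print(check_refinement(INIT, steps_with_fault))
```

In this toy the fault-free system passes the check and the faulty one does not, which loosely mirrors why the semantic spec has to be weakened once faults are added to the distributed-system spec.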
29
Refinement with Byzantine Faults
[Figure: namespace tree (/, tools, C++, emacs, cl.exe, code, src, bin, a.h, a.cpp, a.exe, a.obj) with open handles and pending operations (del, read, move)]
30
Refinement with Byzantine Faults
[Figure: namespace tree (/, tools, C++, emacs, cl.exe, code, src, bin, a.h, a.cpp, a.exe, a.obj) with open handles and pending operations (del, read, move)]
31
Semantic Fault Specification
[Figure: namespace tree (/, tools, C++, emacs, cl.exe, code, src, bin, a.h, a.cpp, a.exe, a.obj, foo, bar) in which tainted files show garbled contents]
Safety
– A tainted file may have arbitrary contents and attributes
– A tainted file may appear not linked into namespace
– A tainted file may pretend not to have children it actually has
– A tainted file may pretend to have children that do not exist
– A tainted file may pretend another tainted file is a child or parent
Liveness
– Operations involving a tainted file may not complete
32
Distributed-System Improvements
– Maintain redundant info across BFT group boundaries
– Augment messages with info that justifies correctness
– Ensure unambiguous chains of authority over data
– Carefully order messages and state updates for operations involving multiple BFT groups
33
Summary of BFI Methodology
Formally specify your system
– Semantic spec: user’s view of system
– Distributed-system spec: designer’s view of system
– Refinement interprets distributed-system spec in semantic terms
Modify distributed-system spec to express Byzantine faults
Simultaneously
– Strategically weaken semantic spec to describe faults
– Improve distributed-system spec to quarantine faults
Refinement lets you know when you are done
34
Conclusions
BFT groups have negative throughput scaling
Scalable systems can be built from multiple BFT groups
System scale increases the probability of non-maskable Byzantine faults
If faults are not isolated, a single faulty group can corrupt the entire system
BFI is a methodology for isolating Byzantine faults
– BFI uses formal system specification
– Improves fault tolerance without hurting throughput, unlike increasing BFT group size
35
Contact Information
JohnDo@microsoft.com
Howell@microsoft.com
http://research.microsoft.com/farsite
36
Backup Slides
37
Farsite Spec Stats
Semantic specification
– 1800 lines of TLA+
– 114 definitions
Distributed-system specification
– 11,500 lines of TLA+
– 775 definitions
Why so big?
– Windows file-system semantics are complex
– Scalability and strong consistency
– Byzantine fault isolation