Published by Caitlyn Daggett. Modified over 9 years ago.
1
Abstracting out Byzantine Behavior
Peter Druschel, Andreas Haeberlen, Petr Kouznetsov
Max Planck Institute for Software Systems
P. Kouznetsov, 2006
2
Why Distributed ≠ Centralized?
- Failures: a process can deviate from its specification
- Some problems cannot be solved in a fault-tolerant way (even if just one process might fail)
3
Crash failures
- Crash fault-tolerant consensus cannot be achieved in an asynchronous system [FLP85]
- A process crashes = it prematurely halts all its activities
4
Abstracting out crash failures
- Failure detectors [Chandra and Toueg, 1996]
- Engineering side: can be specified and implemented independently of algorithms
- Theory side: can be used for comparing and classifying problems (the weakest failure detectors)
5
Using failure detectors
- Eventually strong FD <>S [Chandra and Toueg, 1996]: outputs a list of suspected processes
- There is a time after which:
  - every crashed process is suspected by every correct process
  - some correct process is never suspected by any correct process
- Consensus is solvable with <>S and a majority of correct processes
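The two <>S properties can be checked on a recorded run. The sketch below is only an illustration: the trace format and the name `eventually_strong` are assumptions, and "there is a time after which" is approximated by "from some step until the end of a finite trace".

```python
def eventually_strong(trace, correct, crashed):
    """Check the two <>S properties on a finite trace.

    trace   -- list of snapshots, each {observer: set of suspected processes}
    correct -- set of correct processes (hypothetical input, known to the checker)
    crashed -- set of crashed processes
    """
    def holds_from(t):
        suffix = trace[t:]
        # Every crashed process is suspected by every correct process.
        complete = all(crashed <= snap[p] for snap in suffix for p in correct)
        # Some correct process is never suspected by any correct process.
        accurate = any(
            all(q not in snap[p] for snap in suffix for p in correct)
            for q in correct
        )
        return complete and accurate

    # "There is a time after which": some non-empty suffix satisfies both.
    return any(holds_from(t) for t in range(len(trace)))


trace = [
    {"a": {"b"}, "b": set()},   # early mistakes are allowed
    {"a": {"c"}, "b": {"c"}},   # later, only the crashed process c is suspected
]
print(eventually_strong(trace, correct={"a", "b"}, crashed={"c"}))
```

A real failure detector cannot know the sets `correct` and `crashed`; only an offline checker like this one can.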
6
Using failure detectors, contd.
- Abstracting out the majority assumption: quorum failure detector Σ [DFG, 2004]
- Outputs a list of processes, called a quorum
- Every two quorums (output at any processes at any times) intersect
- There is a time after which every output quorum contains only correct processes
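To see why Σ abstracts out the majority assumption, note that any two majorities of the same process set intersect. A minimal sketch that checks this by enumeration (the helper names `majority_quorums` and `pairwise_intersect` are made up for this example):

```python
from itertools import combinations


def majority_quorums(processes):
    """All subsets of `processes` that contain a strict majority."""
    procs = sorted(processes)
    smallest = len(procs) // 2 + 1
    return [
        frozenset(q)
        for size in range(smallest, len(procs) + 1)
        for q in combinations(procs, size)
    ]


def pairwise_intersect(quorums):
    """True if every two quorums share at least one process."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = majority_quorums({"p1", "p2", "p3", "p4", "p5"})
print(len(quorums), pairwise_intersect(quorums))
```

This covers only the intersection property. The second property, that quorums eventually contain only correct processes, depends on the run and is not modeled here.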
7
The weakest failure detector
- <>S is necessary to solve consensus [CHT, 1996]
- Σ is the weakest FD to implement an RW register [DFG, 2004]
- => (<>S, Σ) is the weakest FD to solve consensus
8
State machine replication [Lamport, 1984; Schneider, 1993;…]
[Figure: clients send requests to servers; servers return a response]
9
State machine replication
Client:
  broadcast request to all servers
  wait until a response is received
Server:
  repeat forever
    if there are unserved requests
      use consensus (<>S, Σ) to agree on the order in which the requests are served
      send the results of served requests to the clients
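The server loop above can be sketched as a toy, single-process simulation. Everything here is an assumption made for illustration: `Replica`, `agree_on_order`, and `run_round` are hypothetical names, the replicated state is a simple counter, and consensus is replaced by a deterministic sort over the union of pending requests, which no real distributed system could compute this way.

```python
class Replica:
    """A deterministic state machine: the state is a running total."""

    def __init__(self):
        self.state = 0

    def apply(self, request):
        client, amount = request
        self.state += amount
        return (client, self.state)  # response sent back to the client


def agree_on_order(pending):
    # Stand-in for consensus (NOT a real protocol): every replica
    # deterministically derives the same order from the same set.
    return sorted(pending)


def run_round(replicas, pending_per_replica):
    """Serve all unserved requests in one agreed order on every replica."""
    union = set().union(*pending_per_replica)
    order = agree_on_order(union)
    return [[replica.apply(req) for req in order] for replica in replicas]


r1, r2 = Replica(), Replica()
# The two replicas received the same requests in different orders.
responses = run_round(
    [r1, r2],
    [[("c1", 5), ("c2", -2)], [("c2", -2), ("c1", 5)]],
)
print(responses[0] == responses[1], r1.state, r2.state)
```

The point of the sketch is only that identical order plus deterministic `apply` yields identical states and responses.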
10
Useful abstractions
- SMR (totally ordered broadcast) = reliable broadcast + consensus [Toueg, Hadzilacos, 1993]
- Consensus = (<>S, Σ)
11
Detectable Byzantine failures
[Figure: failure classes Crash, Mute, Ignorant, Byzantine failures, Detectable Byzantine]
12
Byzantine failure detectors
- BFDs are parameterized with the specification of the correct system behavior
- The output of a BFD depends solely on detectable failures: no information about steps performed by correct processes can be extracted (necessary to distinguish algorithms from BFDs)
13
Byzantine FD abstraction
[Figure: components BFD, Automaton Ai, Network, Monitoring algorithms (PeerReview, HotDep 2006), Enforcing algorithms (SMR), Application]
14
State machine replication: classics
Client:
  broadcast requests to all servers
  wait until a response is received
Server:
  repeat forever
    if there are unserved requests
      use consensus to agree on the order in which the requests are served
      send the results of served requests to the clients
(!) a single malicious process can ignore correct requests and inject bogus requests
15
BFT state machine replication [Doudou et al, 2005]
- Reliable broadcast + weak interactive consistency
- WIConsistency:
  - every correct process proposes a value and decides on a set of values
  - the decided set contains at least one value proposed by a correct process
  - no two correct processes decide differently
- SMR can be implemented using RB and WIConsistency
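The WIConsistency properties can be written as a checker over the outcome of one run. This is a sketch under assumptions: the function name `check_wic` and the dictionary-based inputs are invented here, and termination is modeled simply as "every correct process has an entry in `decisions`".

```python
def check_wic(proposals, decisions, correct):
    """Check weak interactive consistency on the outcome of a single run.

    proposals -- {process: proposed value}
    decisions -- {process: decided set of values}
    correct   -- set of correct processes (known only to the checker)
    """
    # Every correct process decides on a set of values.
    if not all(p in decisions for p in correct):
        return False
    decided = [frozenset(decisions[p]) for p in correct]
    correct_values = {proposals[p] for p in correct}
    # The decided set contains at least one value proposed by a correct process.
    validity = all(d & correct_values for d in decided)
    # No two correct processes decide differently.
    agreement = len(set(decided)) <= 1
    return validity and agreement


proposals = {"p1": "a", "p2": "b", "p3": "bogus"}  # p3 is Byzantine
good = {"p1": {"a", "bogus"}, "p2": {"a", "bogus"}}
bad = {"p1": {"bogus"}, "p2": {"bogus"}}
print(check_wic(proposals, good, {"p1", "p2"}), check_wic(proposals, bad, {"p1", "p2"}))
```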
16
The question
- SMR = RB + WIConsistency?
- No: (<>S_B, Σ_B) can implement SMR but cannot implement WIConsistency
- => WIConsistency > SMR
17
<>S_B [MR97, DS98, KMM03]
- Outputs a list of processes suspected to be mute
- There is a time after which:
  - every mute process is suspected by every correct process
  - some correct process is never suspected by any correct process
18
Byzantine quorum FD Σ_B
- Outputs a list of processes, called a quorum
- Every two quorums (output at any two correct processes at any times) share at least one correct process
- There is a time after which every output quorum contains only correct processes
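One familiar way to obtain quorums that share a correct process is the threshold construction: with n = 3f+1 processes and quorums of size 2f+1, any two quorums overlap in at least f+1 processes, so at least one of them is correct. The sketch below checks this by brute force. The threshold construction is an assumption brought in for illustration (Σ_B itself is defined only by its properties), and `share_correct_process` is a made-up name.

```python
from itertools import combinations


def share_correct_process(n, f, quorum_size):
    """True if, for every choice of f faulty processes, every two quorums
    of the given size have at least one correct process in common."""
    procs = range(n)
    quorums = [set(q) for q in combinations(procs, quorum_size)]
    # Checking faulty sets of size exactly f covers the worst case.
    for faulty in combinations(procs, f):
        correct = set(procs) - set(faulty)
        for a, b in combinations(quorums, 2):
            if not (a & b & correct):
                return False
    return True


print(share_correct_process(n=4, f=1, quorum_size=3))  # quorums of size 2f+1
print(share_correct_process(n=4, f=1, quorum_size=2))  # quorums too small
```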
19
SMR using (<>S_B, Σ_B)
- (<>S_B, Σ_B) can be used to implement a BFT replication system
- Adaptation of BFT [Castro, Liskov, 1999]:
  - wait until acks are received from 2f+1 processes => wait until acks are received from Σ_B
  - if the primary replica is timed out then initiate a view change => if the primary replica is in <>S_B then initiate a view change
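The two substitutions can be sketched as guard conditions. This is not code from BFT or from the talk: the function names are hypothetical, and the failure detector outputs are passed in as plain sets.

```python
def enough_acks_threshold(acks, f):
    """Original rule: acks from at least 2f+1 distinct processes."""
    return len(acks) >= 2 * f + 1


def enough_acks_quorum(acks, quorum):
    """Adapted rule: acks from every process in the current Σ_B output."""
    return quorum <= acks


def should_change_view(primary, suspected):
    """Adapted rule: a view change starts when <>S_B suspects the primary,
    instead of when a local timer on the primary expires."""
    return primary in suspected


acks = {"r1", "r2", "r3"}
print(enough_acks_threshold(acks, f=1))
print(enough_acks_quorum(acks, quorum={"r1", "r3"}))
print(should_change_view("r0", suspected={"r0"}))
```

With this shape the protocol no longer mentions f or timeouts directly; both assumptions move into the failure detectors.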
20
WIConsistency using (<>S_B, Σ_B)?
- Assume an algorithm exists
- Let processes in Q be correct and the rest crash initially
- E: Q decides on V (a set of values proposed by Q)
- E': an extension of E in which some pi not in Q decides V
- E'': an extension of E in which all processes with proposals in V are faulty and pi is correct
- => contradiction: in E'' the correct process pi decides a set that contains no value proposed by a correct process
21
Related work
- State machine replication [Lamport 84, 89; Schneider, 1990; Doudou et al., 2005;…]
- Failure detectors [Chandra, Toueg, 1991; Chandra et al., 1992; Delporte et al., 2003;…]
- Byzantine quorum systems [Malkhi, Reiter, 1997]
- Byzantine failure detection [MR97; DS98; KMM03; AMPR01; BAR, 2005; …]
22
Conclusions
- Byzantine FD abstraction does make sense!
- BFT state machine replication using (<>S_B, Σ_B)
- BFT SMR is strictly weaker than WIConsistency
- Is the lower bound tight?
- How to implement Byzantine FDs?
23
Monitoring: PeerReview [HKD06]
- BFD produces three types of indications for the application layer: trusted, suspected, and exposed
- Completeness:
  - Eventually, every detectably ignorant node is forever suspected by every correct node
  - Eventually, every detectably malicious node is exposed by every correct node
- Accuracy:
  - No correct node is forever suspected by a correct node
  - No node is exposed by a correct node, unless it is detectably malicious
24
PeerReview approach
- Nodes locally observe message traffic and classify other nodes as trusted, suspected, or exposed
- Quick overview:
  - Every node keeps a log of all its local inputs and outputs
  - Crypto techniques ensure that the log is accurate and linear
  - Nodes can audit each other's logs at any time
  - To check for faulty behavior, auditors replay the contents of the log
  - In case of misbehavior, auditors produce evidence that can be verified independently by other nodes
- Eventually complete and accurate!
[Figure: layers State machine (e.g. NFS), Application, Network, and PeerReview detector, with detector output {trusted, suspected, exposed}]
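One common way to make a log linear and tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates only that idea and should not be read as the PeerReview log format: the entry layout, the names `Log`, `entry_hash`, and `audit`, and the all-zero initial hash are assumptions, and signatures are omitted entirely.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed initial "previous hash" for the first entry


def entry_hash(prev_hash, seq, payload):
    """Hash that binds an entry to its position and to all earlier entries."""
    data = prev_hash + seq.to_bytes(8, "big") + payload
    return hashlib.sha256(data).digest()


class Log:
    """Append-only log of (sequence number, payload, chained hash)."""

    def __init__(self):
        self.entries = []

    def append(self, payload):
        prev = self.entries[-1][2] if self.entries else GENESIS
        seq = len(self.entries)
        digest = entry_hash(prev, seq, payload)
        self.entries.append((seq, payload, digest))
        return digest


def audit(entries):
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev = GENESIS
    for index, (seq, payload, digest) in enumerate(entries):
        if seq != index or entry_hash(prev, seq, payload) != digest:
            return False
        prev = digest
    return True


log = Log()
for message in (b"recv m1", b"send m2", b"recv m3"):
    log.append(message)
print(audit(log.entries))

tampered = list(log.entries)
tampered[1] = (1, b"send FORGED", tampered[1][2])
print(audit(tampered))
```

An auditor that also holds a hash previously committed to by the node can detect a node that rewrites its whole log; that step needs signatures and is not shown.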
25
Typical consensus algorithm
repeat
  round++
  c = round mod n
  if p = c then
    try to “lock” the current estimate
  help in locking
    until a decided value is received from c, or c is suspected by <>S
until a decided value is received
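The coordinator rotation in this loop can be illustrated with a deliberately reduced toy. It keeps only the `c = round mod n` rotation and the "move on when c is suspected" rule. The locking phase, message passing, and false suspicions are all left out, and the failure detector is assumed to suspect exactly the crashed processes. The name `rotating_coordinator` is hypothetical.

```python
def rotating_coordinator(estimates, crashed, max_rounds=100):
    """Return (round, decided value) for a toy run.

    estimates -- list of initial estimates, indexed by process id 0..n-1
    crashed   -- set of crashed process ids (assumed to be suspected at once)
    """
    n = len(estimates)
    for rnd in range(1, max_rounds + 1):
        c = rnd % n  # coordinator of this round
        if c in crashed:
            # c is suspected by the failure detector: give up on this round.
            continue
        # A live coordinator locks and broadcasts its estimate (not modeled);
        # here every process simply adopts it as the decision.
        return rnd, estimates[c]
    raise RuntimeError("no live coordinator within max_rounds")


print(rotating_coordinator([10, 20, 30], crashed={1}))
```

In the toy, round 1 is coordinated by process 1, which has crashed, so the decision comes from process 2 in round 2.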