Published by Neal McCormick. Modified over 9 years ago.
1
Byzantine Fault Tolerance
CS 425: Distributed Systems, Fall 2011
Material derived from slides by I. Gupta and N. Vaidya
2
Reading List
L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM TOPLAS, 1982.
M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999.
3
Byzantine Generals Problem
A sender wants to send a message to n-1 other peers.
Fault-free nodes must agree.
If the sender is fault-free, the fault-free nodes agree on its message.
Up to f nodes may fail.
5
Byzantine Generals Algorithm
[Figure, slides 5-10, faulty peer: sender S sends value v to peers 1, 2 and 3, one of which is faulty. Each peer relays the value it received to the other peers; the faulty peer may relay arbitrary values (shown as ?). Each good peer ends up with the vector [v, v, ?].]
Majority vote results in correct result at good peers.
[Figure, slides 11-15, faulty source: S sends different values v, w and x to the three peers. The peers relay what they received, so each good peer ends up with the vector [v, w, x].]
Vote result identical at good peers.
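The two figure sequences can be replayed with a minimal sketch, assuming n = 4 and f = 1 as in the slides. The names `majority`, `run_round`, and the fallback value `DEFAULT` are hypothetical and not from the slides; the fallback stands in for whatever deterministic rule the good peers share when no majority exists.

```python
from collections import Counter

DEFAULT = "default"  # hypothetical fallback used when no strict majority exists


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def run_round(sent, relay):
    """Simulate one send round plus one relay round.

    sent[p] is the value the sender gave to peer p.
    relay(q, p, v) is what peer q forwards to peer p after receiving v.
    Returns the value each peer decides on.
    """
    peers = sorted(sent)
    decisions = {}
    for p in peers:
        received = [sent[p]] + [relay(q, p, sent[q]) for q in peers if q != p]
        decisions[p] = majority(received)
    return decisions


def lying_relay(q, p, v):
    # Peer 3 is faulty and forwards garbage; peers 1 and 2 forward honestly.
    return "?" if q == 3 else v


def honest_relay(q, p, v):
    return v


# Faulty peer: good peers 1 and 2 each see [v, v, ?] and decide v.
faulty_peer = run_round({1: "v", 2: "v", 3: "v"}, lying_relay)

# Faulty source: every peer sees v, w and x once, so all fall back to DEFAULT.
faulty_source = run_round({1: "v", 2: "w", 3: "x"}, honest_relay)
```

In the faulty-source case the decided value is arbitrary; what matters is that every good peer applies the same deterministic vote to the same vector and therefore reaches the same result.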
16
Known Results
Need 3f + 1 nodes to tolerate f failures.
Need Ω(n²) messages in general.
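The 3f + 1 bound can be turned into simple arithmetic. The helper names `min_nodes` and `max_faults` below are hypothetical, chosen only for illustration.

```python
def min_nodes(f):
    """Smallest number of nodes that tolerates f Byzantine failures."""
    return 3 * f + 1


def max_faults(n):
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3
```

For example, four nodes tolerate one failure, and adding a fifth or sixth node does not help: a second failure needs seven nodes.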
17
Ω(n²) Message Complexity
Each message is at least 1 bit.
Ω(n²) bits of “communication complexity” to agree on just a 1-bit value.
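As a rough illustration only, the send-then-relay pattern from the f = 1 figures already uses a quadratic number of messages. The function name `relay_round_messages` is hypothetical, and this counts one particular algorithm rather than proving the lower bound.

```python
def relay_round_messages(n):
    """Messages in one send round plus one all-to-all relay round among peers."""
    first_round = n - 1                # sender -> each of the n-1 peers
    relay_round = (n - 1) * (n - 2)    # each peer -> every other peer
    return first_round + relay_round   # equals (n - 1) ** 2
```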
18
Practical Byzantine Fault Tolerance
Computer systems provide crucial services.
Computer systems fail:
– Crash-stop failure
– Crash-recovery failure
– Byzantine failure
Examples: natural disaster, malicious attack, hardware failure, software bug, etc.
Need highly available service.
Replicate to increase availability.
19
Challenges
[Figure: Client issuing Request A and Request B.]
20
Requirements
All replicas must handle the same requests despite failures.
Replicas must handle requests in identical order despite failures.
21
Challenges
[Figure: Client with numbered requests, 1: Request A and 2: Request B.]
22
State Machine Replication
[Figure: Client requests 1: Request A and 2: Request B, repeated in the same order at every replica.]
How to assign sequence numbers to requests?
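A minimal sketch of why sequence numbers are enough for deterministic replicas to agree, assuming each replica simply applies requests in sequence order. The `Replica` class and its fields are hypothetical and leave out everything about how the sequence numbers are agreed on.

```python
class Replica:
    """Applies requests strictly in sequence-number order."""

    def __init__(self):
        self.state = []      # requests executed so far, in order
        self.next_seq = 1    # next sequence number to execute
        self.pending = {}    # requests that arrived ahead of their turn

    def deliver(self, seq, request):
        self.pending[seq] = request
        # Execute every request whose predecessors have all been executed.
        while self.next_seq in self.pending:
            self.state.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


# Two replicas receive the same numbered requests in different network order.
a, b = Replica(), Replica()
a.deliver(1, "Request A")
a.deliver(2, "Request B")
b.deliver(2, "Request B")
b.deliver(1, "Request A")
```

Both replicas end with the same state because execution order depends only on the sequence numbers, not on arrival order.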
23
Primary Backup Mechanism
[Figure: View 0, with Client, 1: Request A and 2: Request B.]
What if the primary is faulty?
Agreeing on sequence number
Agreeing on changing the primary (view change)
24
Normal Case Operation
Three-phase algorithm:
– PRE-PREPARE picks order of requests
– PREPARE ensures order within views
– COMMIT ensures order across views
Replicas remember messages in log.
Messages are authenticated:
– {.}_σk denotes a message sent by k
25
Pre-prepare Phase
[Figure: replicas are Primary (Replica 0), Replica 1, Replica 2 and Replica 3; one replica is marked Fail. Request m arrives and the primary sends {PRE-PREPARE, v, n, m}_σ0.]
26
Prepare Phase
[Figure, slides 26-28: replicas are Primary (Replica 0), Replica 1, Replica 2 and Replica 3; one replica is marked Fail. After Request m and PRE-PREPARE, a replica that accepted the PRE-PREPARE sends {PREPARE, v, n, D(m), 1}_σ1.]
Collect PRE-PREPARE + 2f matching PREPARE.
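The collection rule for this phase can be sketched as a predicate over a replica's log. This is an illustration under stated assumptions: the `Prepare` record and the `is_prepared` function are hypothetical names, signatures are assumed to have been verified already, and "matching" is taken to mean the same view, sequence number and digest from distinct replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prepare:
    view: int
    seq: int
    digest: str   # D(m), the digest of the request
    replica: int  # id of the replica that sent this PREPARE


def is_prepared(pre_prepare, prepares, f):
    """True once a PRE-PREPARE plus 2f matching PREPAREs are logged.

    pre_prepare is the accepted (view, seq, digest) tuple, or None.
    """
    if pre_prepare is None:
        return False
    # Count distinct senders so one replica cannot be counted twice.
    senders = {
        p.replica
        for p in prepares
        if (p.view, p.seq, p.digest) == pre_prepare
    }
    return len(senders) >= 2 * f
```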
29
Commit Phase
[Figure: replicas are Primary (Replica 0), Replica 1, Replica 2 and Replica 3; one replica is marked Fail. After Request m, PRE-PREPARE and PREPARE, a replica sends {COMMIT, v, n, D(m)}_σ2.]
30
Commit Phase (2)
[Figure: replicas are Primary (Replica 0), Replica 1, Replica 2 and Replica 3; one replica is marked Fail. Message flow: Request m, PRE-PREPARE, PREPARE, COMMIT.]
Collect 2f+1 matching COMMIT: execute and reply.
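The commit rule can be sketched the same way, here with plain tuples. The function name `can_execute` and the tuple layout `(view, seq, digest, sender)` are assumptions made for illustration; signature checks and the actual execution and reply steps are left out.

```python
def can_execute(commits, view, seq, digest, f):
    """True once 2f+1 matching COMMITs from distinct replicas are logged.

    commits is an iterable of (view, seq, digest, sender) tuples.
    """
    senders = {
        sender
        for (v, n, d, sender) in commits
        if (v, n, d) == (view, seq, digest)
    }
    return len(senders) >= 2 * f + 1


# With f = 1, three matching COMMITs are enough even if a fourth replica is silent.
log = [(0, 1, "d", 0), (0, 1, "d", 1), (0, 1, "d", 2)]
ready = can_execute(log, view=0, seq=1, digest="d", f=1)
```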
31
View Change
Provide liveness when primary fails:
– Timeouts trigger view changes
– Select new primary (= view number mod 3f+1)
Brief protocol:
– Replicas send VIEW-CHANGE message along with the requests they have prepared so far
– New primary collects 2f+1 VIEW-CHANGE messages
– Constructs information about committed requests in previous views
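The primary-selection rule is a one-line computation. The sketch below assumes exactly 3f+1 replicas numbered from 0; the function name `primary_for_view` is hypothetical.

```python
def primary_for_view(view, f):
    """Replica id of the primary in the given view, with 3f+1 replicas."""
    return view % (3 * f + 1)


# With f = 1 (four replicas) the primary rotates 0, 1, 2, 3, 0, ...
rotation = [primary_for_view(v, f=1) for v in range(6)]
```

Because the role rotates through all replicas, at most f consecutive views can have a faulty primary.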
32
View Change Safety
Goal: No two different committed requests with the same sequence number across views.
[Figure: Quorum for Committed Certificate (m, v, n) and View Change Quorum.]
At least one correct replica has Prepared Certificate (m, v, n).
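The "at least one correct replica" claim follows from quorum intersection: two quorums of 2f+1 out of 3f+1 replicas share at least f+1 members, and at most f of those can be faulty. A brute-force check for small f, with the hypothetical name `min_quorum_overlap`:

```python
from itertools import combinations


def min_quorum_overlap(f):
    """Smallest intersection between any two quorums of size 2f+1 out of 3f+1."""
    n = 3 * f + 1
    q = 2 * f + 1
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)
```

The enumeration grows quickly with f, so this is only meant as a sanity check for small cases; the general argument is that the overlap is at least 2(2f+1) - (3f+1) = f+1.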
33
Related Works
Fault Tolerance
– Fail Stop Fault Tolerance: Paxos (1989, TR); VS Replication (PODC 1988)
– Byzantine Fault Tolerance
  – Byzantine Agreement: Rampart (TPDS 1995); SecureRing (HICSS 1998); PBFT (OSDI ’99); BASE (TOCS ’03)
  – Byzantine Quorums: Malkhi-Reiter (JDC 1998); Phalanx (SRDS 1998); Fleet (ToKDI ’00); Q/U (SOSP ’05)
  – Hybrid Quorum: HQ Replication (OSDI ’06)