Building Dependable Distributed Systems, Copyright Wenbing Zhao


1 Building Dependable Distributed Systems, Copyright Wenbing Zhao
Chapter 7
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University

2 Outline
Byzantine generals problem, by Leslie Lamport, Robert Shostak, and Marshall Pease
Practical Byzantine fault tolerance, by Miguel Castro and Barbara Liskov, OSDI'99
Zyzzyva
4/4/2019 EEC688/788: Secure & Dependable Computing

3 The Byzantine Generals Problem
An abstract model of a computer system that may have faulty components.
Faulty components may send conflicting information to different parts of the system.
The scenario: Byzantine generals must reach agreement in the presence of traitors.
The generals must reach consensus among themselves, i.e., they all agree on one action: retreat or attack.

4 The Byzantine Generals Scenario
[Figure: a commanding general and several lieutenant generals, one of them a traitor, each with a Byzantine army division surrounding an enemy city.]
Question: why do we insist that each lieutenant general share with the others what it has received from the commanding general?

5 Byzantine Generals Problem
A commanding general must send an order to his n-1 lieutenants such that:
IC1. All loyal lieutenants obey the same order.
IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
IC1 is the agreement clause; IC2 is the validity clause. Together, IC1 and IC2 are called the interactive consistency conditions.
Without IC2 there is a trivial solution: all lieutenants always decide on the same fixed action (attack or retreat), regardless of the order.

6 Byzantine Agreement Protocol
Assumptions:
Every message that is sent is delivered correctly: traitors cannot interfere with messages they do not send.
The receiver of a message knows who sent it: traitors cannot spoof messages.
The absence of a message can be detected: traitors cannot prevent agreement by not sending.
=> Synchronous system + no spoofing

7 Byzantine Agreement Protocol (f=1)
Round 1: the commander sends a value to each of the lieutenants.
Round 2: each lieutenant sends the value it received to its peers.
At the end of round 2, each lieutenant checks whether there is a majority opinion (attack or retreat). If there is, we have a solution.
The question is: how many generals are needed to tolerate f traitors?
Note: the cases with f >= 2 are much more complicated and expensive (oral message algorithms).
Why synchronous? We assume that in each round every message reaches its target reliably and in time.
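The two rounds can be sketched in a few lines of Python. This is only an illustration for one commander and three lieutenants (f=1); the function names and the `relay` callback are hypothetical, and each lieutenant is assumed to count the commander's value together with its peers' reports.

```python
from collections import Counter

def om1_decisions(sent_by_commander, relay):
    """Return each lieutenant's decision after the two rounds.

    sent_by_commander[i] is the value the commander sent to lieutenant i.
    relay(i, j, v) is what lieutenant i tells lieutenant j it received,
    where v is the value lieutenant i actually received (a traitor may lie).
    """
    n = len(sent_by_commander)
    decisions = []
    for j in range(n):
        # Round 1 value, plus the round 2 reports from every peer.
        votes = [sent_by_commander[j]]
        votes += [relay(i, j, sent_by_commander[i]) for i in range(n) if i != j]
        decisions.append(Counter(votes).most_common(1)[0][0])
    return decisions

def honest(i, j, v):
    return v

def lieutenant_2_lies(i, j, v):
    return "retreat" if i == 2 else v

# Traitorous commander, three loyal lieutenants.
print(om1_decisions(["attack", "retreat", "attack"], honest))
# Loyal commander, lieutenant 2 is the traitor.
print(om1_decisions(["attack", "attack", "attack"], lieutenant_2_lies))
```

In both runs the loyal lieutenants end up with the same majority value, which is what IC1 requires; in the second run that value is also the loyal commander's order, as IC2 requires.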

8 Unsolvable Situations – N=3, f=1
[Figure: two scenarios with N=3. In the first, the commander sends Attack to both lieutenants and one lieutenant tells the other "He said Retreat". In the second, the commander sends Attack to one lieutenant and Retreat to the other, who reports "He said Retreat".]
A lieutenant cannot be sure whether the commander or the other lieutenant is lying.

9 Byzantine Agreement Protocol (f=1)
[Figure: a commander and three lieutenants (N=4, f=1); each lieutenant relays "He said Attack" or "He said Retreat" to its peers and decides by majority.]
If there are f traitors, then there must be at least 3f + 1 generals in total for IC1 and IC2 to hold.

10 Byzantine Agreement Protocol (f=1)
Under our assumptions, if messages are digitally signed and signatures cannot be forged, only N=2f+1 generals are needed to tolerate f traitors.
The commander can still send different information to different lieutenants, but a lieutenant cannot lie about what the commander has told him.
In asynchronous systems, N=2f+1 is not sufficient:
We have to stop after collecting f+1 inputs, because the f traitors could simply refrain from sending.
Unfortunately, f of those inputs might come from traitors.

11 Introduction to BFT Paper
Industry and government increasingly rely on online information services.
Malicious attacks are becoming more serious and more successful.
Software errors are increasing with the size and complexity of software.
The paper presents a "practical" algorithm for state machine replication that works in asynchronous systems like the Internet.

12 Assumptions
Asynchronous distributed system.
The network may fail to deliver messages, delay them, duplicate them, or deliver them out of order.
Faulty nodes may behave arbitrarily.
Node failures are independent.
The adversary cannot delay correct nodes indefinitely.
All messages are cryptographically signed by their sender, and these signatures cannot be subverted by the adversary.

13 Service Properties
A (deterministic) service is replicated among ≥ 3f+1 processors and is resilient to ≤ f failures.
Safety: all non-faulty replicas are guaranteed to process the same requests in the same order.
Liveness: clients eventually receive replies to their requests.

14 Optimal Resiliency
Imagine non-faulty processors trying to agree upon a piece of data by telling each other what they believe the data to be.
A non-faulty processor must be sure about a piece of data before it can proceed.
There are n-1 other replicas in total, and f of them may refuse to send messages, so each processor must be ready to proceed after having received (n-1)-f messages.

15 Optimal Resiliency
But what if f of the (n-1)-f messages come from faulty replicas?
To avoid confusion, the majority of the messages must come from non-faulty nodes, i.e., (n-f-1)/2 ≥ f, which rearranges to n ≥ 3f+1.
=> A total of ≥ 3f+1 replicas is needed.
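The arithmetic can be checked with a short Python sketch. The helper names are hypothetical. The quorum size 2f+1 is the one the protocol uses; the overlap calculation is a standard consequence of n = 3f+1 rather than something stated on the slide.

```python
def min_replicas(f):
    """Smallest n satisfying (n - f - 1) / 2 >= f."""
    return 3 * f + 1

def quorum(f):
    """Number of matching messages the protocol waits for."""
    return 2 * f + 1

def min_quorum_overlap(f):
    """Fewest replicas that any two quorums must share when n = 3f + 1."""
    return 2 * quorum(f) - min_replicas(f)

for f in (1, 2, 3):
    n = min_replicas(f)
    assert (n - f - 1) / 2 >= f          # the inequality holds at n = 3f + 1
    assert ((n - 1) - f - 1) / 2 < f     # and fails with one replica fewer
    print(f, n, quorum(f), min_quorum_overlap(f))
```

Any two quorums share at least f+1 replicas, so they always have at least one non-faulty replica in common.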

16 BFT Algorithm in a Nutshell
[Figure: a client, a primary, and three backups; the client accepts a result once f + 1 replies match (OK).]

17 Replicas and Views
Set of replicas (R): |R| ≥ 3f + 1
[Figure: replicas R0, R1, ... across successive views.]
For view v, the primary p is assigned such that p = v mod |R|.
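A one-line Python sketch of the primary assignment shows how the role rotates through the replicas as views advance. The function name is hypothetical.

```python
def primary_for_view(v, num_replicas):
    """Replica id of the primary in view v: p = v mod |R|."""
    return v % num_replicas

# With |R| = 4 (f = 1), the primary rotates round-robin over the replicas.
print([primary_for_view(v, 4) for v in range(6)])  # [0, 1, 2, 3, 0, 1]
```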

18 Safeguards
If the client does not receive replies soon enough, it broadcasts the request to all replicas.
If the request has already been processed, the replicas simply re-send the reply (replicas remember the last reply message they sent to each client).
If the primary does not multicast the request to the group, it will eventually be suspected to be faulty by enough replicas to cause a view change.

19 Normal Case Operation
The client sends <REQUEST, o, t, c> to the primary.
o: operation
t: timestamp
c: client
Timestamps are totally ordered such that later requests have higher timestamps than earlier ones.

20 Normal Case Operation
When the primary p receives a client request m, it starts a three-phase protocol.
The three phases are: pre-prepare, prepare, and commit.

21 Pre-Prepare Phase
The primary multicasts <<PRE-PREPARE, v, n, d>, m> to every backup.
v: view number
n: sequence number
d: digest of the message, D(m)
m: message
Note that d is the digest of the message m (i.e., the hash value of m). It is NOT the digital signature of the pre-prepare message. The digital signature is not explicitly shown in the messages illustrated here and later.

22 Prepare Phase
A backup accepts the PRE-PREPARE message only if:
The signatures are valid and the digest matches m.
It is in view v.
It has not accepted a PRE-PREPARE for the same v and n containing a different digest.
The sequence number is within the accepted bounds.
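The acceptance checks can be sketched as a small Python class. Everything here is illustrative: the `Backup` class, the `low`/`high` bounds, the use of SHA-256 as the digest function D, and the `signature_ok` flag standing in for real signature verification are all assumptions, not part of the protocol description. Re-delivery of an identical pre-prepare is treated as acceptable.

```python
import hashlib
from dataclasses import dataclass, field

def digest(m: bytes) -> str:
    """Stand-in for D(m); SHA-256 is an assumption."""
    return hashlib.sha256(m).hexdigest()

@dataclass
class Backup:
    view: int
    low: int    # hypothetical lower bound on sequence numbers (exclusive)
    high: int   # hypothetical upper bound on sequence numbers (inclusive)
    accepted: dict = field(default_factory=dict)  # (v, n) -> digest

    def accept_pre_prepare(self, v, n, d, m, signature_ok=True):
        if not signature_ok or d != digest(m):
            return False              # bad signature or digest mismatch
        if v != self.view:
            return False              # not in view v
        if self.accepted.get((v, n), d) != d:
            return False              # same (v, n) already bound to another digest
        if not (self.low < n <= self.high):
            return False              # sequence number out of bounds
        self.accepted[(v, n)] = d
        return True

b = Backup(view=0, low=0, high=100)
m = b"transfer 10"
print(b.accept_pre_prepare(0, 1, digest(m), m))                # True
print(b.accept_pre_prepare(0, 1, digest(b"other"), b"other"))  # False
```

The second call is rejected because sequence number 1 in view 0 is already bound to a different request, which is what stops a faulty primary from assigning one sequence number to two requests.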

23 Prepare Phase
If backup i accepts the pre-prepare message, it enters the prepare phase by multicasting <PREPARE, v, n, d, i> to all other replicas and adds both messages to its log. Otherwise it does nothing.
A replica (including the primary) accepts prepare messages and adds them to its log, provided that:
The signatures are correct.
The view number matches the current view.
The sequence number is within the accepted bounds.
Again, d is the message digest, NOT a digital signature.

24 Prepare Phase
At replica i, prepared(m, v, n, i) = true iff its log contains the pre-prepare for m and 2f PREPARE messages from different backups that match the pre-prepare.
When prepared becomes true, replica i multicasts <COMMIT, v, n, d, i> to the other replicas.
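A minimal Python sketch of the prepared predicate follows. It assumes the log is a set of tuples, with ("PRE-PREPARE", v, n, d) and ("PREPARE", v, n, d, i) as hypothetical encodings of the two message types, and it assumes signatures were already checked when the messages were logged.

```python
def prepared(log, v, n, d, f):
    """True once the log holds the pre-prepare and 2f matching prepares
    from different backups for (v, n, d)."""
    if ("PRE-PREPARE", v, n, d) not in log:
        return False
    # Collect distinct senders of PREPARE messages matching the pre-prepare.
    senders = {entry[4] for entry in log if entry[:4] == ("PREPARE", v, n, d)}
    return len(senders) >= 2 * f

log = {("PRE-PREPARE", 0, 1, "d1"), ("PREPARE", 0, 1, "d1", 1)}
print(prepared(log, 0, 1, "d1", f=1))   # False: only one matching prepare
log.add(("PREPARE", 0, 1, "d1", 2))
print(prepared(log, 0, 1, "d1", f=1))   # True: pre-prepare plus 2f prepares
```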

25 Agreement Achieved
If the primary is non-faulty, then all 2f+1 non-faulty replicas agree on the sequence number.
If the primary is faulty, then:
Either ≥f+1 non-faulty replicas (a majority) agree on some other sequence and the rest realize that the primary is faulty,
Or all non-faulty replicas will suspect that the primary is faulty.
When a faulty primary is replaced, the minority of confused non-faulty replicas is brought up to date by the majority.

26 Commit Phase
Replicas accept commit messages and insert them in their log, provided they are properly signed.
The committed and committed-local predicates are defined as:
committed(m, v, n) = true iff prepared(m, v, n, i) is true for all i in some set of f+1 non-faulty replicas.
committed-local(m, v, n, i) = true iff prepared(m, v, n, i) is true and replica i has accepted 2f+1 commit messages from different replicas that match the pre-prepare for m.
If committed-local(m, v, n, i) is true for some non-faulty replica i, then committed(m, v, n) is true.

27 Commit Phase
Replica i executes the operation requested by m after committed-local(m, v, n, i) = true and i's state reflects the sequential execution of all requests with lower sequence numbers.
The PRE-PREPARE and PREPARE phases of the protocol ensure agreement on the total order of requests within a view.
The PREPARE and COMMIT phases ensure total ordering across views.
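The execution rule (execute only after committed-local holds and all lower sequence numbers have been executed) can be sketched as follows. The `Executor` class is hypothetical, and the sketch assumes sequence numbers start at 1 with no gaps.

```python
class Executor:
    """Executes requests strictly in sequence-number order."""

    def __init__(self):
        self.last_executed = 0
        self.ready = {}      # n -> operation that is committed-local but not yet run
        self.executed = []   # operations in the order they were executed

    def on_committed_local(self, n, op):
        self.ready[n] = op
        # Run every request whose predecessors have all been executed.
        while self.last_executed + 1 in self.ready:
            self.last_executed += 1
            self.executed.append(self.ready.pop(self.last_executed))

ex = Executor()
ex.on_committed_local(2, "b")   # held back: request 1 has not executed yet
ex.on_committed_local(1, "a")   # releases both 1 and 2
print(ex.executed)              # ['a', 'b']
```

Commits may arrive out of order, so a request that commits early simply waits until its predecessors have executed.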

28 Normal Operation Reply
Each replica sends the reply <REPLY, v, t, c, i, r> directly to the client.
v = current view number
t = timestamp of the corresponding request
c = client id
i = replica number
r = result of executing the requested operation
The client waits for f+1 replies with valid signatures from different replicas, and with the same t and r, before accepting the result r.
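The client-side rule can be sketched in Python as below. The function name and the tuple layout are hypothetical; signatures are assumed to have been verified already, and results are assumed to be hashable values.

```python
from collections import defaultdict

def accepted_result(replies, f):
    """replies: iterable of (replica_id, t, r) with valid signatures.
    Return r once f+1 different replicas agree on (t, r), else None."""
    voters = defaultdict(set)
    for replica_id, t, r in replies:
        voters[(t, r)].add(replica_id)
        if len(voters[(t, r)]) >= f + 1:
            return r
    return None

# f = 1: one faulty reply cannot outvote two matching correct ones.
print(accepted_result([(0, 5, "ok"), (3, 5, "bad"), (1, 5, "ok")], f=1))  # ok
```

Since at most f replicas are faulty, f+1 matching replies must include at least one from a non-faulty replica.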

29 Normal Case Operation: Summary
[Figure: message flow through the request, pre-prepare, prepare, commit, and reply phases among client C, primary 0, backups 1 and 2, and faulty replica 3 (marked X).]

30 Garbage Collection
Used to discard messages from the log.
For the safety condition to hold, messages must be kept in a replica's log until the replica knows that the requests have been executed by at least f+1 non-faulty replicas.
This is achieved using checkpoints, which are taken when a request whose sequence number n is divisible by some constant is executed.

31 Garbage Collection
When replica i produces a checkpoint, it multicasts a message <CHECKPOINT, n, d, i> to the other replicas.
n is the sequence number of the last request whose execution is reflected in the state.
d is the digest of the checkpoint.
Each replica collects checkpoint messages in its log until it has 2f+1 of them for sequence number n with the same digest d.
This creates a stable checkpoint, and the replica discards all pre-prepare, prepare, and commit messages with sequence numbers less than or equal to n.
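A sketch of checkpoint collection and log truncation in Python follows. The `CheckpointTracker` class, the `truncate` helper, and the dictionary-based log are hypothetical; the sketch assumes the messages have already been authenticated.

```python
from collections import defaultdict

class CheckpointTracker:
    def __init__(self, f):
        self.f = f
        self.votes = defaultdict(set)   # (n, d) -> ids of replicas that sent it
        self.stable = 0                 # sequence number of the last stable checkpoint

    def on_checkpoint(self, n, d, i):
        """Record <CHECKPOINT, n, d, i>; return the current stable checkpoint."""
        self.votes[(n, d)].add(i)
        if len(self.votes[(n, d)]) >= 2 * self.f + 1 and n > self.stable:
            self.stable = n
        return self.stable

def truncate(log, stable):
    """Drop logged protocol messages at or below the stable checkpoint.
    log maps sequence number -> list of messages (an assumed layout)."""
    return {n: msgs for n, msgs in log.items() if n > stable}

tracker = CheckpointTracker(f=1)
for replica in (0, 1, 3):
    stable = tracker.on_checkpoint(100, "state-digest", replica)
print(stable)                                               # 100
print(truncate({99: ["..."], 100: ["..."], 101: ["..."]}, stable))
```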

32 View Changes
View changes are triggered by timeouts, which prevent backups from waiting indefinitely for requests to execute.
If the timer of a backup expires in view v, the backup starts a view change to move to view v+1 by:
No longer accepting messages (other than checkpoint, view-change, and new-view messages).
Multicasting a VIEW-CHANGE message.
Questions: What prompts a view change? When does the timer start? When does it stop?

33 View Changes
The VIEW-CHANGE message is defined as <VIEW-CHANGE, v+1, n, C, P, i>, where
n = the sequence number of the last stable checkpoint s known to i
C = the 2f + 1 checkpoint messages for s
P = a set of sets Pm
Pm = a PRE-PREPARE message plus the matching PREPARE messages, for each request m that prepared at i with a sequence number higher than n

34 View Change - Primary
The primary p of view v+1 receives 2f valid VIEW-CHANGE messages.
It multicasts a <NEW-VIEW, v+1, V, O> message to all other replicas, where
V = the set of 2f valid VIEW-CHANGE messages
O = the set of reissued PRE-PREPARE messages
It then moves to view v+1.

35 View Changes - Backups
A backup accepts the NEW-VIEW message by checking V and O.
It sends PREPARE messages for everything in O. These PREPARE messages carry view v+1.
It then moves to view v+1.
How O is computed: the primary creates a new pre-prepare message for view v+1 for each sequence number n between min-s and max-s, where min-s is the sequence number of the latest stable checkpoint in V and max-s is the highest sequence number in a prepare message in V. There are two cases: (1) there is at least one set in the P component of some view-change message in V with sequence number n, or (2) there is no such set. In the first case, the primary creates a new message <PRE-PREPARE, v+1, n, d>, where d is the request digest in the pre-prepare message for sequence number n with the highest view number in V. In the second case, it creates a new pre-prepare message <PRE-PREPARE, v+1, n, d^null>, where d^null is the digest of a special null request; a null request goes through the protocol like other requests, but its execution is a no-op.
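The construction of O can be sketched in Python as follows. The data layout is an assumption made for illustration: each view-change message is a dict with key "n" (its stable checkpoint) and key "P" (a list of (view, sequence number, digest) triples, one per prepared request). `NULL_DIGEST` is a placeholder for d^null, and the range is taken to run from min-s + 1 to max-s inclusive.

```python
NULL_DIGEST = "d-null"   # placeholder for the digest of the null request

def reissued_pre_prepares(new_view, view_changes):
    """Build O for the NEW-VIEW message from the view-change messages in V."""
    min_s = max(vc["n"] for vc in view_changes)
    entries = [p for vc in view_changes for p in vc["P"]]
    max_s = max([seq for _, seq, _ in entries], default=min_s)
    out = []
    for n in range(min_s + 1, max_s + 1):
        candidates = [(view, d) for view, seq, d in entries if seq == n]
        if candidates:
            # Case 1: reuse the digest prepared in the highest view.
            d = max(candidates, key=lambda c: c[0])[1]
        else:
            # Case 2: fill the gap with a null request.
            d = NULL_DIGEST
        out.append(("PRE-PREPARE", new_view, n, d))
    return out

V = [
    {"n": 10, "P": [(0, 11, "a"), (1, 13, "c")]},
    {"n": 10, "P": [(0, 11, "a")]},
    {"n": 8,  "P": [(0, 13, "b")]},
]
for msg in reissued_pre_prepares(2, V):
    print(msg)
```

Here sequence number 12 has no prepared request in V and is filled with a null request, so the new view has no gap in the sequence.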

36 Events Before the View Change
Before the view change we have two groups of non-faulty replicas: the Confused minority and the Agreed majority.
A non-faulty replica becomes Confused when the faulty replicas keep it from agreeing on a sequence number for a request.
It cannot process this request, so it will time out, causing the replica to vote for a new view.

37 Events Before the View Change
The Confused minority replicas send a VIEW-CHANGE message and drop off the network.
The Agreed majority replicas continue working as long as the faulty replicas help with agreement.
The two groups can go out of sync, but the majority keeps working until the faulty replicas cease helping with agreement.

38 System State: Faulty Primary
Is an erroneous view change possible?
[Figure: system state with a faulty primary P. The adversary controls f faulty replicas; the non-faulty replicas are split into a Confused minority (≤f replicas) and an Agreed majority (≥f+1 replicas).]
The Confused minority and the faulty replicas together number ≤2f replicas: NOT enough to change views.

39 Events Before the View Change
Given ≥f+1 non-faulty replicas that are trying to agree, the faulty replicas can either help or hinder.
If they help, then agreement on request ordering is achieved and, with the faulty replicas' help, the clients get ≥f+1 matching replies for all requests.
If they hinder, then the ≥f+1 non-faulty replicas will time out and demand a new view.
When the new majority is in favor of a view change, we can proceed to the new view.
Can we have a confused majority while the remaining minority of correct replicas keeps working? No, because the minority has no way to get 2f+1 matching messages to do anything.

40 System State: Faulty Primary
Is it possible to continue processing requests?
[Figure: the same system state with a faulty primary P: a Confused minority (≤f non-faulty replicas), an Agreed majority (≥f+1 non-faulty replicas), and f faulty replicas controlled by the adversary.]
YES: the Agreed majority together with the f faulty replicas gives ≥2f+1 replicas, enough for agreement.

41 System State: Faulty Primary
[Figure: system state before and after the faulty replicas cease helping with agreement.]
While the faulty replicas help, the Agreed majority (≥f+1 non-faulty replicas) plus the f faulty replicas give ≥2f+1 replicas, enough for agreement.
Once the faulty replicas cease helping with agreement, the Agreed majority also times out. The resulting Confused majority of 2f+1 non-faulty replicas is large enough to move to a new view independently.

42 Liveness
Replicas must move to a new view if they are unable to execute a request.
To avoid starting a view change too soon, a replica that multicasts a view-change message for view v+1 waits for 2f+1 view-change messages for view v+1 and then starts a timer that expires after some time T.
If the timer expires before the replica receives a new-view message, it starts the view change for view v+2.
This time the replica waits 2T before starting a view change from v+2 to v+3.

43 Liveness
If a replica receives f+1 valid view-change messages from other replicas for views greater than its current view, it sends a view-change message for the smallest view in the set, even if its timer has not expired.
Faulty replicas cannot cause a view change by sending view-change messages, since a view change will happen only if at least f+1 replicas send view-change messages.
The above techniques guarantee liveness unless message delays grow faster than the timeout period indefinitely.
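Two of the liveness rules can be sketched in Python: the doubling timeout and the f+1 rule for joining a view change early. Both function names are hypothetical, as is the dict that maps each replica id to the view named in its latest valid view-change message.

```python
def view_change_timeout(base_timeout, attempt):
    """Wait T for the first attempt, 2T for the next, and so on (attempt is 0-based)."""
    return base_timeout * 2 ** attempt

def view_to_join(current_view, view_change_views, f):
    """Return the smallest higher view if at least f+1 replicas have asked
    for views greater than current_view, else None."""
    higher = [v for v in view_change_views.values() if v > current_view]
    if len(higher) >= f + 1:
        return min(higher)
    return None

print([view_change_timeout(5, k) for k in range(3)])   # [5, 10, 20]
print(view_to_join(3, {0: 5, 1: 4}, f=1))              # 4
print(view_to_join(3, {0: 5}, f=1))                    # None
```

A single replica asking for a higher view is ignored, since with at most f faulty replicas only f+1 such messages prove that a non-faulty replica wants the change.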

44 Zyzzyva: Speculative BFT
A replica speculatively executes a request as soon as it receives a valid pre-prepare message.
Commitment of a request is moved to the client.
A request is said to have completed at the issuing client if the corresponding reply can be safely delivered.
If a request completes at a client, the request will eventually be committed at the server replicas.
The prepare and commit phases are reduced to a single phase.
The view change has one additional phase.

45 Zyzzyva: Speculative BFT
History hash: helps the client determine whether its request has been ordered appropriately.
A server replica maintains a history hash for each request ordered, and appends the history hash $h_s = H(h_{s-1}, d_s)$ to the reply for the request with sequence number $s$.
$d_s$: digest of the request with sequence number $s$
$h_{s-1}$: previous history hash; $h_s$: new history hash with request $s$ executed
$H()$: hash function
Prefix concept: $h_{s_i}$ is a prefix of $h_{s_j}$ if $s_j > s_i$ and there exists a set of requests with sequence numbers $s_i+1, s_i+2, \ldots, s_j$ and digests $d_{s_i+1}, d_{s_i+2}, \ldots, d_{s_j}$ such that $h_{s_i+1} = H(h_{s_i}, d_{s_i+1})$, $h_{s_i+2} = H(h_{s_i+1}, d_{s_i+2})$, ..., $h_{s_j} = H(h_{s_j-1}, d_{s_j})$.
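The history hash chain and the prefix check can be sketched in Python. The choice of SHA-256 for H, the empty initial hash, and the function names are assumptions made for illustration.

```python
import hashlib

def H(prev_hash: bytes, request_digest: bytes) -> bytes:
    """h_s = H(h_{s-1}, d_s); SHA-256 over the concatenation is an assumption."""
    return hashlib.sha256(prev_hash + request_digest).digest()

def history_hashes(digests, h0=b""):
    """Return [h_0, h_1, ..., h_k] for request digests d_1..d_k."""
    hs = [h0]
    for d in digests:
        hs.append(H(hs[-1], d))
    return hs

def is_prefix(h_i, h_j, digests_between):
    """True if folding the digests d_{i+1}..d_j into h_i yields h_j."""
    if not digests_between:
        return False          # the definition requires j > i
    h = h_i
    for d in digests_between:
        h = H(h, d)
    return h == h_j

ds = [hashlib.sha256(r).digest() for r in (b"req1", b"req2", b"req3")]
hs = history_hashes(ds)
print(is_prefix(hs[1], hs[3], ds[1:3]))          # True
print(is_prefix(hs[1], hs[3], [ds[2], ds[1]]))   # False: different order
```

Because each hash folds in the previous one, two replicas report the same history hash only if they have ordered the same requests in the same order, which is what the client checks when it compares replies.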

46 Zyzzyva: Speculative BFT
Safety: any two requests that have completed must have been assigned two different sequence numbers. Furthermore, if the two sequence numbers are i and j and i < j, the history hash $h_i$ must be a prefix of $h_j$.
Liveness: if a non-faulty client issues a request, the request eventually completes.

47 Zyzzyva: Agreement Protocol
A client maintains a completion timer for each request.
A request may complete at the client in one of two cases:
Case 1: the client receives 3f+1 matching replies => all replicas have executed the request in the same total order.
Case 2: the client has received at least 2f+1 (but fewer than 3f+1) matching replies when the timer expires. The client then initiates another round of message exchange with the server replicas before the request is declared complete.
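The client's decision between the two cases can be sketched as a small Python function. The function name and the returned labels are hypothetical. The "unresolved" outcome (fewer than 2f+1 matching replies at timeout) is not covered by the two cases and is included only so that the function is total.

```python
def client_step(matching_replies, f, timer_expired):
    """Decide what the client does given the number of matching replies."""
    if matching_replies >= 3 * f + 1:
        return "complete"       # case 1: all replicas agree
    if timer_expired and matching_replies >= 2 * f + 1:
        return "send-commit"    # case 2: start the extra round
    if timer_expired:
        return "unresolved"     # neither case applies (not covered here)
    return "wait"               # keep collecting replies until the timer fires

f = 1
print(client_step(4, f, timer_expired=False))   # complete
print(client_step(3, f, timer_expired=True))    # send-commit
print(client_step(3, f, timer_expired=False))   # wait
```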

48 Zyzzyva: Agreement Protocol, Case 1

49 Zyzzyva: Agreement Protocol, Case 2

50 Zyzzyva: Case 2
Commit message: contains a commit certificate, which consists of:
A list of 2f+1 replica ids.
The signed component of the spec-response from each of the 2f+1 replicas.
Local-commit: a replica sends a local-commit message when it receives a valid commit message from the client, after verifying the history hash.
When the client receives 2f+1 or more local-commit messages, the request is completed and can be delivered.

51 Zyzzyva: View Change Protocol
A view change is triggered when:
A sufficient number of backups have timed out the current primary, or
The client receives spec-response messages with different sequence numbers or different history hash values.
In the second case, the client broadcasts a POM (proof of misbehavior) message to all replicas.
A replica that receives a POM initiates a view change.
A replica also rebroadcasts the POM upon receiving one, to speed up the view change.

52 Zyzzyva: View Change Protocol
What is special in Zyzzyva:
There is at most one round of message exchange for agreement during normal operation.
For case 2, this round is equivalent to the prepare phase (or commit phase), and replicas would hold a commit certificate.
For case 1, replicas would not possess a commit certificate.
Impact on view change:
An additional round of message exchange is needed: "I hate the primary".
The condition for including a request in the new-view message must change.
The PBFT view change protocol cannot be used because liveness may be lost:
A faulty primary could force f non-faulty replicas to suspect it, while cooperating with the other f+1 non-faulty replicas.
The f suspecting replicas abandon the view, and the f faulty replicas stop sending spec-response messages.
The client cannot complete any request: it receives only f+1 matching replies.

53 Zyzzyva: View Change Protocol
PBFT view change: a replica abandons the current view as soon as it suspects the primary, i.e., it stops participating in agreement for the current view.
Zyzzyva view change: a replica makes sure a view change will actually take place before abandoning the current view. This is accomplished with the "I hate the primary" message exchange: a replica abandons the current view only when it receives f+1 "I hate the primary" messages.

54 Zyzzyva: View Change Protocol
Zyzzyva view change, dealing with case 1:
In its view-change message, a replica includes all order-req messages received since the latest stable checkpoint or the most recent commit certificate.
The new primary adopts a request-to-sequence-number binding if there are f+1 or more matching order-req messages for it.
The primary may see multiple such sets for the same sequence number (for different requests): it can select any of the bindings, because none of those requests could have completed.
A backup should accept the new primary's decision on the ordering.

