Byzantine fault tolerance

Byzantine fault tolerance
Jinyang Li, with PBFT slides from Liskov

What we’ve learnt so far: tolerating fail-stop failures
A traditional RSM tolerates benign failures:
- Node crashes
- Network partitions
An RSM with 2f+1 replicas can tolerate f simultaneous crashes.

A traditional RSM
[Diagram: the client sends Req-x to primary N0; N0 forwards <vid=0, seqno=1, req-x> to backups N1 and N2, collects their replies, and returns Reply-x; Req-y is handled the same way as <vid=0, seqno=2, req-y>.]
What must N2 check before executing <vid=0, seqno=1, req>?

A reconfigurable RSM (lab 6)
[Diagram: the client's Req-x is ordered as <vid=0, seqno=1, req-x>; a view change to vid=1 runs prepare (vid=1, myn=N0:1), accept, and decide with value (vid=1, N0:1, {N0,N1}, lastseq=1), after which the retransmitted Req-x is ordered as <vid=1, seqno=2, req-x>.]

Byzantine fault tolerance
Nodes fail arbitrarily: they lie, they collude.
Causes: malicious attacks, software errors.
The seminal work is PBFT: Practical Byzantine Fault Tolerance, M. Castro and B. Liskov, SOSP 1999.

What does PBFT achieve?
- Sequential consistency (linearizability), if …
- … no more than f faults occur in a 3f+1-replica RSM
What does that mean in a practical sense?

Practical attacks that PBFT prevents (or not)
Prevents consistency attacks:
- E.g., a bad node fools clients into accepting a stale bank balance
- Protection holds only when <= f nodes fail; with enough bad nodes, an attacker can fool clients into accepting an arbitrary bank balance, not just a stale one
- If an attacker hacks into one server, how likely is he to hack into the rest?
Does not prevent attacks like:
- Turning a machine into a botnet node
- Stealing SSNs from servers

Why doesn’t traditional RSM work with Byzantine nodes?
Cannot rely on the primary to assign seqnos:
- A malicious primary can assign the same seqno to different requests!
Cannot use Paxos for view change:
- Paxos uses a majority accept-quorum to tolerate f benign faults out of 2f+1 nodes
- Does the intersection of two quorums always contain at least one honest node?
- A bad node tells different things to different quorums! E.g., it tells N1 accept=val1 and tells N2 accept=val2

Paxos under Byzantine faults
[Diagram: N0 sends prepare (vid=1, myn=N0:1) and gets back OK, val=null; nh=N0:1 is recorded.]

Paxos under Byzantine faults
[Diagram: N0 sends accept (vid=1, myn=N0:1, val=xyz), gets OK, and decides Vid1=xyz; N1 is cut off from this exchange (X).]

Paxos under Byzantine faults
[Diagram: N1 sends prepare (vid=1, myn=N1:1) and gets back OK, val=null, even though N0 has already decided Vid1=xyz.]

Paxos under Byzantine faults
[Diagram: N1 sends accept (vid=1, myn=N1:1, val=abc), gets OK, and decides Vid1=abc while N0 has already decided Vid1=xyz. Agreement conflict!]

PBFT main ideas
- Static configuration (the same 3f+1 nodes)
- To deal with a malicious primary: use a 3-phase protocol to agree on sequence numbers
- To deal with loss of agreement: use a bigger quorum (2f+1 out of 3f+1 nodes)
- Need to authenticate communications

BFT requires a 2f+1 quorum out of 3f+1 nodes
[Diagram: four servers, one unresponsive (X); the client's write A reaches the other three, whose state becomes A.]
For liveness, the quorum size must be at most N - f.

BFT Quorums
[Diagram: a later write B reaches a different quorum of three servers, including the previously unresponsive one; the two quorums must overlap for the earlier write A to be visible.]
For correctness, any two quorums must intersect in at least one honest node:
(N - f) + (N - f) - N >= f + 1, i.e. N >= 3f + 1.
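
To make the arithmetic concrete, here is a small Python sketch (illustrative only, not from the slides) that checks both constraints for candidate values of N, f, and quorum size:

```python
# Hypothetical check of the two quorum constraints above, for a system
# of n replicas tolerating f Byzantine faults.

def quorum_ok(n: int, f: int, quorum: int) -> bool:
    live = quorum <= n - f            # liveness: quorum reachable despite f silent nodes
    safe = 2 * quorum - n >= f + 1    # correctness: any two quorums share an honest node
    return live and safe

if __name__ == "__main__":
    for f in range(1, 4):
        n, q = 3 * f + 1, 2 * f + 1                  # PBFT's choice
        print(f"f={f}: N={n}, quorum={q}, ok={quorum_ok(n, f, q)}")
        # With only 3f replicas, no quorum size satisfies both constraints.
        print(f"f={f}: N={3*f}, some quorum works:",
              any(quorum_ok(3 * f, f, q2) for q2 in range(1, 3 * f + 1)))
```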

PBFT Strategy
- The primary runs the protocol in the normal case
- Replicas watch the primary and do a view change if it fails

Replica state
- A replica id i (between 0 and N-1): replica 0, replica 1, …
- A view number v#, initially 0; the primary is the replica with id i = v# mod N
- A log of <op, seq#, status> entries, where status = pre-prepared, prepared, or committed
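
A minimal sketch of this per-replica state in Python (hypothetical names; a real replica keeps considerably more bookkeeping):

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PRE_PREPARED = 1
    PREPARED = 2
    COMMITTED = 3

@dataclass
class LogEntry:
    op: bytes        # the client operation
    seq: int         # sequence number assigned by the primary
    status: Status

@dataclass
class Replica:
    replica_id: int                           # i, between 0 and N-1
    n: int                                    # N = 3f + 1 replicas
    view: int = 0                             # v#, initially 0
    log: dict = field(default_factory=dict)   # seq# -> LogEntry

    def primary(self) -> int:
        return self.view % self.n             # primary is replica v# mod N

    def is_primary(self) -> bool:
        return self.replica_id == self.primary()
```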

Normal Case
- The client sends its request to the primary, or to all replicas

Normal Case
- The primary sends a pre-prepare message to all; pre-prepare contains <v#, seq#, op>
- It records the operation in its log as pre-prepared
- Keep in mind that the primary might be malicious: it may send different seq#s for the same op to different replicas, or use a duplicate seq# for an op
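
For illustration, an honest primary's side of this step might look like the following (hypothetical helper names, continuing the sketch above; `send` stands in for an authenticated channel). A faulty primary can deviate in exactly the ways listed above, which is what the later phases catch.

```python
def primary_on_request(replica: Replica, op: bytes, send) -> int:
    """Honest primary: assign the next seq#, log it, pre-prepare to all."""
    seq = max(replica.log, default=0) + 1
    replica.log[seq] = LogEntry(op=op, seq=seq, status=Status.PRE_PREPARED)
    msg = ("PRE-PREPARE", replica.view, seq, op)
    for r in range(replica.n):
        if r != replica.replica_id:
            send(r, msg)                  # same <v#, seq#, op> to every replica
    return seq
```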

Normal Case
- Replicas check the pre-prepare and, if it is ok:
  - Record the operation in the log as pre-prepared
  - Send prepare messages to all; prepare contains <i, v#, seq#, op>
- All-to-all communication
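
The replica-side check guards against exactly the primary misbehavior listed above: a wrong view, or a seq# already bound to a different op. A hypothetical sketch, continuing the one above (`broadcast` sends to all replicas):

```python
def on_pre_prepare(replica: Replica, v: int, seq: int, op: bytes, broadcast):
    """Backup handling of a PRE-PREPARE from the current primary."""
    if v != replica.view:
        return                             # wrong view: ignore
    entry = replica.log.get(seq)
    if entry is not None and entry.op != op:
        return                             # primary reused this seq# for a different op
    replica.log[seq] = LogEntry(op=op, seq=seq, status=Status.PRE_PREPARED)
    broadcast(("PREPARE", replica.replica_id, v, seq, op))
```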

Normal Case
- Replicas wait for 2f+1 matching prepares
  - Record the operation in the log as prepared
  - Send a commit message to all; commit contains <i, v#, seq#, op>
- Trust the group, not the individuals
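
Counting the 2f+1 matching prepares is a simple quorum-collection step; a hypothetical sketch follows (in a real replica these vote sets would live in the replica state and match on digests rather than raw ops):

```python
from collections import defaultdict

prepare_votes = defaultdict(set)           # (v, seq, op) -> replica ids seen

def on_prepare(replica: Replica, f: int, sender: int, v: int, seq: int,
               op: bytes, broadcast):
    prepare_votes[(v, seq, op)].add(sender)
    entry = replica.log.get(seq)
    if (entry is not None and entry.status == Status.PRE_PREPARED
            and len(prepare_votes[(v, seq, op)]) >= 2 * f + 1):
        entry.status = Status.PREPARED
        broadcast(("COMMIT", replica.replica_id, v, seq, op))
```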

Normal Case
- Replicas wait for 2f+1 matching commits
  - Record the operation in the log as committed
  - Execute the operation
  - Send the result to the client

Normal Case
- The client waits for f+1 matching replies
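
The client-side rule works because, with at most f faulty replicas, any result reported by f+1 distinct replicas must include at least one honest node. A hypothetical sketch:

```python
from collections import Counter

def accepted_result(replies, f: int):
    """replies: iterable of (replica_id, result). Return a result vouched
    for by f+1 distinct replicas, or None if there is no such result yet."""
    counts, seen = Counter(), set()
    for replica_id, result in replies:
        if replica_id in seen:
            continue                        # count each replica at most once
        seen.add(replica_id)
        counts[result] += 1
        if counts[result] >= f + 1:
            return result
    return None
```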

BFT
[Diagram: normal-case message flow between the client, the primary, and replicas 2-4: Request, Pre-Prepare, Prepare, Commit, Reply.]

View Change
- Replicas watch the primary
- Request a view change
- Commit point: when 2f+1 replicas have prepared

View Change
- Replicas watch the primary and request a view change:
  - Send a do-viewchange request to all
  - The new primary requires 2f+1 requests and sends new-view with this certificate
- The rest is similar
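
The same quorum-collection pattern drives the view change; a rough, hypothetical sketch in which the candidate primary for the new view waits for 2f+1 do-viewchange requests before announcing new-view:

```python
from collections import defaultdict

viewchange_votes = defaultdict(set)         # new_view -> ids requesting it

def on_do_viewchange(replica: Replica, f: int, sender: int, new_view: int,
                     broadcast):
    viewchange_votes[new_view].add(sender)
    is_new_primary = replica.replica_id == new_view % replica.n
    if is_new_primary and len(viewchange_votes[new_view]) >= 2 * f + 1:
        certificate = sorted(viewchange_votes[new_view])   # the 2f+1 requests
        replica.view = new_view
        broadcast(("NEW-VIEW", new_view, certificate))
```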

Additional Issues
- State transfer
- Checkpoints (garbage collection of the log)
- Selection of the primary
- Timing of view changes

Possible improvements
- Lower latency for writes (4 messages): replicas respond at prepare; the client waits for 2f+1 matching responses
- Fast reads (one round trip): the client sends to all; replicas respond immediately

BFT Performance

Table 2: Andrew 100: elapsed time in seconds

Phase    BFS-PK     BFS   NFS-std
1          25.4     0.7       0.6
2        1528.6    39.8      26.9
3          80.1    34.1      30.7
4          87.5    41.3      36.7
5        2935.1   265.4     237.1
total    4656.7   381.3     332.0

M. Castro and B. Liskov, Proactive Recovery in a Byzantine-Fault-Tolerant System, OSDI 2000

PBFT inspires much follow-on work
- BASE: Using abstraction to improve fault tolerance, R. Rodrigues et al., SOSP 2001
- R. Kotla and M. Dahlin, High Throughput Byzantine Fault Tolerance, DSN 2004
- J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 2007
- Abd-El-Malek et al., Fault-scalable Byzantine fault-tolerant services, SOSP 2005
- J. Cowling et al., HQ replication: a hybrid quorum protocol for Byzantine fault tolerance, OSDI 2006
- Zyzzyva: Speculative Byzantine fault tolerance, SOSP 2007
- Tolerating Byzantine faults in database systems using commit barrier scheduling, SOSP 2007
- Low-overhead Byzantine fault-tolerant storage, SOSP 2007
- Attested append-only memory: making adversaries stick to their word, SOSP 2007

A2M’s main insight
The main worry in PBFT is that malicious nodes lie differently to different replicas.
[Diagram: as a result, one replica decides Seq1=req-x while another decides Seq1=req-y.]

A2M’s proposal
Introduce a trusted abstraction: attested append-only memory.
A2M properties:
- Trusted: an attacker can corrupt the RSM implementation, but not A2M itself
- Prevents malicious nodes from telling different lies to different replicas

A2M’s abstraction
A2M implements a trusted log:
- Append: append a value to the log
- Lookup: look up the value at position i
- End: look up the value at the end of the log
- Truncate: garbage-collect old values
- Advance: skip positions in the log
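
A toy Python rendering of this interface (illustrative only; the real A2M lives in a small trusted base and attests each entry, e.g. with digests or signatures, so a replica cannot show different histories to different peers):

```python
import hashlib

class A2MLog:
    """Toy attested append-only log."""

    def __init__(self):
        self.entries = {}             # position -> (value, cumulative digest)
        self.next_pos = 1
        self.digest = b""             # running hash binding the whole history

    def append(self, value: bytes) -> int:
        pos = self.next_pos
        self.digest = hashlib.sha256(self.digest + value).digest()
        self.entries[pos] = (value, self.digest)
        self.next_pos += 1
        return pos

    def lookup(self, pos: int):
        return self.entries.get(pos)            # attested (value, digest) or None

    def end(self):
        return self.lookup(self.next_pos - 1)   # value at the end of the log

    def truncate(self, up_to: int):
        for pos in [p for p in self.entries if p <= up_to]:
            del self.entries[pos]               # garbage-collect old values

    def advance(self, new_pos: int, value: bytes) -> int:
        assert new_pos >= self.next_pos         # positions only move forward
        self.next_pos = new_pos                 # skip positions in the log
        return self.append(value)
```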

What A2M achieves
Smaller quorum sizes for BFT:
- Correctness and liveness with 2f+1 nodes if <= f Byzantine faults, or
- Correctness if <= 2f nodes fail, and liveness if <= f nodes fail