BChain: High-Throughput BFT Protocols


BChain: High-Throughput BFT Protocols Haibin Zhang (UConn) haibin.zhang@uconn.edu

Research Interests: Security and reliability; connecting theory and practice; applied cryptography; systems and distributed systems; cloud storage and cloud computing.

This Talk: State Machine Replication (SMR); Crash Fault-Tolerant (CFT); Byzantine Fault-Tolerant (BFT); BChain, a high-throughput BFT SMR protocol.

Client-Server Model. Scenario 1: With a single server. [Figure: several clients connected to one server/state machine]

Client-Server Model. Scenario 2: With replicated servers. [Figure: several clients connected to replicated servers/state machines]

State Machine Replication (SMR): Replicas maintain the same state. Replicas start in the same state. Operations are deterministic. Replicas execute operations in the same order. Replicas send replies to clients. Clients vote on replica replies.
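The properties on this slide can be illustrated with a minimal sketch. The key-value state machine, the `vote` helper, and the f + 1 matching-replies threshold are assumptions chosen for illustration (the threshold is the usual rule in BFT protocols such as PBFT); the slide itself only says that clients vote on replies.

```python
from collections import Counter


class KVStateMachine:
    """A deterministic state machine: same ops in the same order give the same state."""

    def __init__(self):
        self.state = {}

    def apply(self, op):
        kind, key, value = op
        if kind == "put":
            self.state[key] = value
            return "ok"
        return self.state.get(key)  # any other kind is treated as "get"


def vote(replies, f):
    """Return the reply reported by at least f + 1 replicas, else None (assumed rule)."""
    reply, count = Counter(replies).most_common(1)[0]
    return reply if count >= f + 1 else None


f = 1
replicas = [KVStateMachine() for _ in range(3 * f + 1)]
ops = [("put", "x", 1), ("put", "x", 2), ("get", "x", None)]

# Every replica starts in the same state and executes the same ordered log.
replies = []
for replica in replicas:
    last = None
    for op in ops:
        last = replica.apply(op)
    replies.append(last)

replies[0] = 99  # hypothetical Byzantine replica lies to the client
result = vote(replies, f)  # the three correct replicas outvote the liar
```

With f = 1, a single lying replica cannot produce f + 1 matching wrong replies, so the client still accepts the correct value.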

Crash Fault-Tolerant SMR. Example: Paxos, SMR for crash failures [Lamport, ACM TOCS 1998]; going back to the 1980s. The “most” important backbone architecture. Each major service: BigTable, Chubby, Spanner, Azure, Amazon Web Services, Ceph, IBM SAN, VMware NSX, …

BFT SMR = BFT Protocols. Traditionally important. Powerful: tolerates arbitrary failures and attacks. Spans systems, distributed systems, theory, crypto, security, … Recently gained prominence: cryptocurrencies (blockchains); secure and reliable cloud; Pub/Sub, SDN, storage, …

Leader-Based (BFT) SMR: The primary (one of the replicas) orders the operations. Other replicas follow the order. Other replicas monitor the primary and perform a view change if the primary fails or behaves maliciously.

Leader-Based SMR: Broadcast- vs. Chain-Based. Broadcast-based SMR: CFT, e.g., Paxos; BFT, e.g., PBFT [Castro and Liskov, ACM TOCS 2002; earlier version OSDI 1999] and Zyzzyva [Kotla et al., SOSP 07]. Reasonable performance + robust against attacks. Chain-based SMR: CFT: Chain replication [Renesse and Schneider, OSDI 04]; BFT: BChain (this talk) [Duan, Meling, Peisert, and Zhang, OPODIS 2014]. Better performance + no such protocols until BChain.

Some Efforts towards Chain-Based BFT. Aliph-Chain [Aublin et al., ACM TOCS 2015; earlier version EuroSys 2010]: a sub-protocol of a BFT protocol; only works in the failure-free case; has to switch to a (slower) backup BFT protocol, and the switch is slow. Byzantine Chain Replication [Renesse, Ho, and Schiper, OPODIS 12]: relies on a trusted data center; Olympus (to help achieve liveness).

BChain: Fully fledged BFT. High throughput. Failure handling: avoids view changes (for most failure scenarios). Proactive security: reconfigures faulty replicas. Not as robust as broadcast-based BFT under certain performance attacks.

BChain Overview: BChain-3 and BChain-5. Chaining: failure-free case. Re-chaining: normal case; there are failures but the primary is correct. Reconfiguration: may or may not be needed. View change: the primary is faulty.

BChain-3: 3f+1 replicas, arranged in a chain. Two sets: A, the agreement set (2f+1 replicas), and B, the backup set (for failure reconfiguration).
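The set sizes on this slide can be worked through with a short sketch. The function name `split_chain` and the choice to take set A as the first 2f+1 chain positions are assumptions for illustration; the slide only gives the sizes.

```python
def split_chain(chain, f):
    """Split a BChain-3 style chain of 3f+1 replicas into sets A and B.

    Assumption: set A is the first 2f+1 positions of the chain and
    set B is the remaining f positions.
    """
    if len(chain) != 3 * f + 1:
        raise ValueError("BChain-3 expects exactly 3f+1 replicas")
    agreement_size = 2 * f + 1
    return chain[:agreement_size], chain[agreement_size:]


# f = 1 gives n = 4: three replicas in A, one in B.
set_a, set_b = split_chain([1, 2, 3, 4], f=1)

# f = 2 gives n = 7: five replicas in A, two in B.
set_a2, set_b2 = split_chain(list(range(1, 8)), f=2)
```

In both cases set B holds exactly f replicas, which is the number of faults the system is sized to tolerate.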

BChain-3: Chaining. Failure-free case. The client sends a request to the head.

BChain-3: Chaining. The head assigns a sequence number and sends a <chain> message. Replicas in A execute the request and send <chain> messages.

BChain-3: Chaining. The proxy tail sends a <reply> to the client and commits. An <ack> message is sent backward to the head. Set A replicas verify the <ack> message and commit. Replicas that have committed the request forward <chain> messages to set B.
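The failure-free message flow described on the chaining slides can be traced with a small simulation. This is only a sketch under stated assumptions: the proxy tail is taken to be the last replica of set A, forwarding to set B is modeled as the proxy tail passing <chain> down the rest of the chain, and sequence numbers, signatures, and MACs are left out. The function name `run_chaining` is hypothetical.

```python
def run_chaining(chain, f):
    """Trace one request through a BChain-3 style chain in the failure-free case.

    Returns (events, committed), where each event is (sender, receiver, type).
    """
    set_a = chain[: 2 * f + 1]
    set_b = chain[2 * f + 1 :]
    head, proxy_tail = set_a[0], set_a[-1]  # assumed: proxy tail is last in A

    events = [("client", head, "request")]

    # <chain> travels forward through set A; each replica executes the request.
    for sender, receiver in zip(set_a, set_a[1:]):
        events.append((sender, receiver, "chain"))

    # The proxy tail replies to the client and commits.
    events.append((proxy_tail, "client", "reply"))
    committed = [proxy_tail]

    # <ack> travels backward to the head; each receiver commits.
    backward = set_a[::-1]
    for sender, receiver in zip(backward, backward[1:]):
        events.append((sender, receiver, "ack"))
        committed.append(receiver)

    # Committed replicas forward <chain> to set B (assumed: along the chain).
    previous = proxy_tail
    for backup in set_b:
        events.append((previous, backup, "chain"))
        previous = backup

    return events, committed


events, committed = run_chaining([1, 2, 3, 4], f=1)
for event in events:
    print(event)
```

The trace shows why the pattern is cheap: each replica sends to one neighbor instead of broadcasting to all others.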

BChain-3: Re-chaining. Normal case: there are failures but the head is correct. Much faster than a view change or protocol switch.

BChain-3: Re-chaining. Each replica monitors its successor: it sets a timer when sending <chain> and suspects its successor if it does not receive an <ack> in time. When there are “reported” failures: Re-chaining: the head reassigns the order of the chain. Reconfiguration: replicas in set B get reconfigured.
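The timer rule on this slide can be sketched with logical timestamps. The function name `suspicions`, the dictionaries of send and ack times, and the single shared timeout are all assumptions for illustration; the sketch only detects expired timers and does not model how the head chooses which accusation to act on or how it reorders the chain.

```python
def suspicions(chain, sent_at, acked_at, timeout):
    """Return (accuser, accused) pairs for replicas whose <ack> timer expired.

    sent_at:  {replica: time it sent <chain> to its successor}
    acked_at: {replica: time it received the matching <ack>}; missing = never
    """
    pairs = []
    for replica, successor in zip(chain, chain[1:]):
        if replica not in sent_at:
            continue  # this replica never forwarded <chain>, so no timer runs
        ack_time = acked_at.get(replica)
        if ack_time is None or ack_time - sent_at[replica] > timeout:
            pairs.append((replica, successor))
    return pairs


chain = [1, 2, 3]

# All acks arrive within the timeout: nobody is suspected.
no_failures = suspicions(chain, {1: 0, 2: 1}, {1: 4, 2: 3}, timeout=5)

# Replica 3 stays silent, so neither 2 nor 1 ever sees an <ack>.
silent_tail = suspicions(chain, {1: 0, 2: 1}, {}, timeout=5)
```

In the second case both predecessors time out, which shows why a raw timeout alone does not identify the faulty replica and some further rule at the head is needed.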

BChain-3: Re-chaining Algorithm

BChain-3: Re-chaining Type I: Faulty replica (in yellow, replica 4) did not send <ack> (or <chain>) in time

BChain-3: Re-chaining Type II: Faulty replica (in yellow, replica 3) tries to frame its correct successor (replica 4).

BChain-3: Reconfiguration. When a replica is moved to set B, it is replaced with a new one. Faulty replicas are reconfigured before they are moved back to set A. Replicas in set A keep running without waiting.

BChain-3 Summary: 3f+1 replicas. Reconfiguration needed. Re-chaining algorithms are simple. Proofs are very complex.

BChain-5: 5f+1 replicas; the Byzantine quorum is 3f+1. Re-chaining: when a replica suspects its successor, both are moved to set B. No reconfiguration needed (but you still can!).
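The BChain-5 re-chaining rule stated on this slide (accuser and accused both move to set B) can be sketched as a list operation. Several details are assumptions, not taken from the slides: set A is modeled as the first 3f+1 chain positions (matching the quorum size given), set B as the remaining 2f, "moved to set B" as "moved to the end of the chain", and cases involving the head are rejected because they belong to the faulty-primary path. The function name `rechain_bchain5` is hypothetical.

```python
def rechain_bchain5(chain, f, accuser, accused):
    """Move both the accuser and its accused successor to the end of the chain.

    Returns (set_a, set_b) after re-chaining, under the assumptions above.
    """
    if len(chain) != 5 * f + 1:
        raise ValueError("BChain-5 expects exactly 5f+1 replicas")
    if chain[0] in (accuser, accused):
        raise ValueError("head involvement is outside this sketch")
    if chain.index(accused) != chain.index(accuser) + 1:
        raise ValueError("a replica can only accuse its direct successor")

    remaining = [r for r in chain if r not in (accuser, accused)]
    new_chain = remaining + [accuser, accused]
    agreement_size = 3 * f + 1  # assumed size of set A
    return new_chain[:agreement_size], new_chain[agreement_size:]


# f = 1, n = 6: replica 2 suspects its successor, replica 3.
set_a, set_b = rechain_bchain5([1, 2, 3, 4, 5, 6], f=1, accuser=2, accused=3)
```

Since at least one of the two moved replicas is faulty (either the accused misbehaved or the accuser framed it), each such step removes a faulty replica from set A without having to decide which one it was.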

BChain Optimizations: Almost all signatures can be replaced with MACs. A hybrid protocol: signatures are needed only for re-chaining. In its most applicable case for BChain-3 (f=1 and n=4): no reconfiguration needed; all MACs.
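As a reminder of what a MAC is, here is a minimal sketch using Python's standard `hmac` module. The key, the message layout, and the helper names `make_mac` and `verify_mac` are illustrative assumptions and do not come from the BChain implementation.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid timing side channels."""
    return hmac.compare_digest(make_mac(key, message), tag)


# Hypothetical pairwise key shared by a replica and its successor.
pair_key = b"shared-key-between-replica-1-and-2"
chain_message = b"chain|seq=7|op=put x 1"  # illustrative encoding only

tag = make_mac(pair_key, chain_message)
accepted = verify_mac(pair_key, chain_message, tag)
tampered = verify_mac(pair_key, b"chain|seq=7|op=put x 2", tag)
```

A MAC can be checked only by a holder of the shared key, so it cannot serve as transferable evidence to third parties. That is a general property of MACs; the slide itself does not state why signatures are kept for re-chaining.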

Implementation and Evaluation Failure-free: BChain is almost as efficient as Aliph-Chain

Implementation and Evaluation Under failures: BChain quickly recovers steady state performance

BFT. Three most important security goals: Integrity (= Safety), Availability (= Liveness), Confidentiality. BFT naturally achieves Integrity and Availability. What about BFT with Confidentiality?

Cloud Storage: BFT + Confidentiality? Achieving confidentiality in replicated state machines is difficult. Why? Replication increases reliability, but replication reduces confidentiality (an attacker can, say, take control of the weakest replica).

CBFT and CP-BFT. CBFT: confidential BFT; separating agreement from execution. CP-BFT: causality-preserving BFT; a confidentiality notion that only exists in BFT/atomic broadcast/distributed systems. [Yin et al., SOSP 03] [Duan and Zhang, SRDS 16] [Reiter and Birman, TOPLAS 94] [Reiter and Zhang, 2016]

CP-BFT Scenario: A trading service trades stocks. A client issues a request to purchase stock shares. A corrupt replica could collude with a corrupt client to issue a request for the same stock. If the new request is processed earlier than the original request, it may adjust the demand for the stock.

MACS Related Projects UC Isolated Computation UC Keystone Secure and Fault-Tolerant Keystone and Swift

Thank you!