P. Kouznetsov, 2006 Abstracting out Byzantine Behavior Peter Druschel Andreas Haeberlen Petr Kouznetsov Max Planck Institute for Software Systems.

Presentation transcript:

2 Why Distributed ≠ Centralized?
• Failures: a process can deviate from its specification
• Some problems cannot be solved in a fault-tolerant manner (even if just one process might fail)

3 Crash failures
• Crash fault-tolerant consensus cannot be achieved in an asynchronous system [FLP85]
• A process crashes = it prematurely halts all its activities

4 Abstracting out crash failures
• Failure detectors [Chandra and Toueg, 1996]
  - Engineering side: can be specified and implemented independently of algorithms
  - Theory side: can be used for comparing and classifying problems (the weakest failure detectors)

5 Using failure detectors
Eventually strong FD ◊S [Chandra and Toueg, 1996]: outputs a list of suspected processes. There is a time after which:
  - every crashed process is suspected by every correct process
  - some correct process is never suspected by any correct process
• Consensus is solvable with ◊S and a majority of correct processes
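A minimal sketch of how an eventually strong failure detector is commonly approximated in practice: heartbeats with per-peer timeouts that grow after every false suspicion. The class name `HeartbeatDetector`, the logical clock `now`, and the doubling rule are assumptions made for illustration; the two properties on the slide hold for such a detector only if message delays are eventually bounded.

```python
class HeartbeatDetector:
    """Suspects a peer when no heartbeat arrived within its current timeout.

    Hypothetical illustration: time is a logical counter supplied by the caller.
    """

    def __init__(self, peers, initial_timeout=2):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        self.last_heard[peer] = now
        if peer in self.suspected:
            # False suspicion: withdraw it and be more patient with this peer.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def tick(self, now):
        # Suspect every peer whose last heartbeat is older than its timeout.
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


fd = HeartbeatDetector(["p1", "p2"])
fd.on_heartbeat("p1", 1)
fd.on_heartbeat("p2", 1)
early = fd.tick(4)        # both peers are late, so both are suspected
fd.on_heartbeat("p1", 5)  # p1 was only slow: suspicion withdrawn, timeout doubled
later = fd.tick(8)        # p2 stays silent (crashed) and remains suspected
```

A crashed peer stops sending heartbeats and is eventually suspected forever; a slow but correct peer sees its timeout grow until it is no longer suspected.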

6 Using failure detectors, contd.
• Abstracting out the majority assumption: quorum failure detector Σ [DFG, 2004] outputs a list of processes, called a quorum
  - Every two quorums (output at any processes at any times) intersect
  - There is a time after which every output quorum contains only correct processes
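The intersection property can be checked mechanically on a finite set of quorums. The sketch below uses majority quorums as one example family that satisfies it; the helper names `majority_quorums` and `pairwise_intersect` are hypothetical and not taken from the slides.

```python
from itertools import combinations


def majority_quorums(processes):
    """All subsets containing a strict majority of the processes."""
    k = len(processes) // 2 + 1
    return [frozenset(q) for q in combinations(sorted(processes), k)]


def pairwise_intersect(quorums):
    """True if every two quorums have at least one process in common."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = majority_quorums({"p1", "p2", "p3", "p4", "p5"})
ok = pairwise_intersect(quorums)
bad = pairwise_intersect([frozenset({"p1", "p2"}), frozenset({"p3", "p4"})])
```

Any two majorities of the same set overlap, which is why a majority of correct processes can stand in for Σ; two disjoint sets, as in the last line, violate the property.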

7 The weakest failure detector
• ◊S is necessary to solve consensus [CHT, 1996]
• Σ is the weakest FD to implement a RW register [DFG, 2004]
=> (◊S, Σ) is the weakest FD to solve consensus

8 State machine replication [Lamport, 1984; Schneider, 1993;…]
[Figure: clients send requests to servers and receive responses]

9 State machine replication
Client:
  broadcast request to all servers
  wait until a response is received
Server:
  repeat forever
    if there are unserved requests
      use consensus (◊S, Σ) to agree on the order in which the requests are served
      send the results of served requests to the clients
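The server loop above can be sketched as follows. This is a single-process illustration under strong assumptions: consensus is replaced by a deterministic stand-in (`agree_on_order` simply sorts by request id), and requests are hypothetical tuples `(request_id, op, key, value)` against a toy key-value state machine.

```python
class Replica:
    """A deterministic key-value state machine (hypothetical example)."""

    def __init__(self):
        self.state = {}

    def apply(self, request):
        _request_id, op, key, value = request
        if op == "put":
            self.state[key] = value
            return "ok"
        return self.state.get(key)  # any other op is treated as "get"


def agree_on_order(pending):
    # Stand-in for consensus: every replica obtains the same sequence.
    # Request ids are assumed unique, so sorting never compares payloads.
    return sorted(pending)


def serve(replicas, pending_per_replica):
    """Order the union of unserved requests, then apply it at every replica."""
    order = agree_on_order(set().union(*pending_per_replica))
    return [[replica.apply(req) for req in order] for replica in replicas]


replicas = [Replica(), Replica(), Replica()]
pending = [
    {(2, "get", "x", None), (1, "put", "x", 7)},
    {(1, "put", "x", 7)},
    {(2, "get", "x", None)},
]
results = serve(replicas, pending)
```

Because every replica applies the same requests in the same order to a deterministic state machine, all replicas return the same responses and end in the same state, even though each one initially saw a different subset of the requests.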

10 Useful abstractions
• SMR (totally ordered broadcast) = reliable broadcast + consensus [Toueg, Hadzilacos, 1993]
• Consensus = (◊S, Σ)

11 Detectable Byzantine failures
[Figure: diagram of failure classes with labels: crash, mute, ignorant, detectable Byzantine, Byzantine failures]

12 Byzantine failure detectors
• BFDs are parameterized with the specification of the correct system behavior
• The output of a BFD depends solely on detectable failures: no information about steps performed by correct processes can be extracted (necessary to distinguish algorithms from BFDs)

13 Byzantine FD abstraction
[Figure: layered diagram with labels: BFD, automaton Ai, network, monitoring algorithms (PeerReview, HotDep 2006), enforcing algorithms (SMR), application]

14 State machine replication: classics
Client:
  broadcast requests to all servers
  wait until a response is received
Server:
  repeat forever
    if there are unserved requests
      use consensus to agree on the order in which the requests are served
      send the results of served requests to the clients
(!) a single malicious process can ignore correct requests and inject bogus requests

15 BFT state machine replication [Doudou et al, 2005]
reliable broadcast + weak interactive consistency
WIConsistency: every correct process proposes a value and decides on a set of values
• the decided set contains at least one value proposed by a correct process
• no two correct processes decide differently
SMR can be implemented using RB and WIConsistency
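The two WIConsistency properties can be written as a checker over a finished run. This is only a sketch of the specification, not of an algorithm: the function name `satisfies_wic` and the dictionary-based encoding of a run are assumptions made for illustration.

```python
def satisfies_wic(proposals, correct, decisions):
    """Check the two WIConsistency properties on a finished run.

    proposals: process -> proposed value
    correct:   set of correct processes
    decisions: process -> decided set of values (only processes that decided)
    """
    correct_decisions = [frozenset(decisions[p]) for p in correct if p in decisions]
    # No two correct processes decide differently.
    agreement = len(set(correct_decisions)) <= 1
    # Every decided set contains a value proposed by some correct process.
    correct_values = {proposals[p] for p in correct}
    validity = all(d & correct_values for d in correct_decisions)
    return agreement and validity


proposals = {"p1": "a", "p2": "b", "p3": "z"}  # p3 is Byzantine in this run
correct = {"p1", "p2"}
good = satisfies_wic(proposals, correct, {"p1": {"a", "z"}, "p2": {"a", "z"}})
only_bogus = satisfies_wic(proposals, correct, {"p1": {"z"}, "p2": {"z"}})
diverged = satisfies_wic(proposals, correct, {"p1": {"a"}, "p2": {"b"}})
```

The second run fails because the decided set holds only the Byzantine process's value; the third fails because the correct processes decide different sets.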

16 The question
• SMR = RB + WIConsistency?
• No: (◊S_B, Σ_B) can implement SMR but cannot implement WIConsistency
=> WIConsistency > SMR

17 ◊S_B [MR97, DS98, KMM03]
Outputs a list of processes suspected to be mute. There is a time after which:
• every mute process is suspected by every correct process
• some correct process is never suspected by any correct process

18 Byzantine quorum FD Σ_B
Outputs a list of processes, called a quorum
• Every two quorums (output at any two correct processes at any times) share at least one correct process
• There is a time after which every output quorum contains only correct processes
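One well-known family of quorums with this property, assuming n = 3f+1 processes of which at most f are faulty, is the set of all subsets of size 2f+1: any two such subsets share at least f+1 processes, so at least one shared process is correct. The brute-force check below is a sketch under that assumption; the function names are hypothetical.

```python
from itertools import combinations


def share_correct_process(quorums, faulty):
    """True if every two quorums have a common process outside `faulty`."""
    return all((a & b) - faulty for a, b in combinations(quorums, 2))


def threshold_quorums_ok(n, f, quorum_size):
    """Check all quorums of the given size against every choice of f faulty processes."""
    processes = range(n)
    quorums = [frozenset(q) for q in combinations(processes, quorum_size)]
    return all(
        share_correct_process(quorums, frozenset(bad))
        for bad in combinations(processes, f)
    )


four_nodes = threshold_quorums_ok(n=4, f=1, quorum_size=3)   # 2f+1 out of 3f+1
seven_nodes = threshold_quorums_ok(n=7, f=2, quorum_size=5)  # 2f+1 out of 3f+1
too_small = threshold_quorums_ok(n=7, f=2, quorum_size=4)    # plain majority
```

With n = 7 and f = 2, plain majorities of size 4 are not enough: two of them may overlap in a single process, and that process may be faulty.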

19 SMR using (◊S_B, Σ_B)
• (◊S_B, Σ_B) can be used to implement a BFT replication system
• Adaptation of BFT [Castro, Liskov, 1999]:
  - wait until acks are received from 2f+1 processes => wait until acks are received from a quorum output by Σ_B
  - if the primary replica times out, then initiate a view change => if the primary replica is in the output of ◊S_B, then initiate a view change

20 WIConsistency using (◊S_B, Σ_B)?
Assume an algorithm exists
• Let processes in Q be correct and the rest crash initially
• E: processes in Q decide on V (a set of values proposed by Q)
• E’: an extension of E in which some pi not in Q decides V
• E’’: an extension of E in which all processes whose values are in V are faulty and pi is correct
=> contradiction: pi decides a set that contains no value proposed by a correct process

21 Related work
• State machine replication [Lamport 84, 89; Schneider, 1990; Doudou et al., 2005;…]
• Failure detectors [Chandra, Toueg, 1991; Chandra et al., 1992; Delporte et al., 2003;…]
• Byzantine quorum systems [Malkhi, Reiter, 1997]
• Byzantine failure detection [MR97; DS98; KMM03; AMPR01; BAR, 2005; …]

22 Conclusions
The Byzantine FD abstraction does make sense!
• BFT state machine replication using (◊S_B, Σ_B)
• BFT SMR is strictly weaker than WIConsistency
• Is the lower bound tight?
• How to implement Byzantine FDs?

23 Monitoring: PeerReview [HKD06]
The BFD produces three types of indications for the application layer: trusted, suspected, and exposed.
Completeness:
• Eventually, every detectably ignorant node is forever suspected by every correct node
• Eventually, every detectably malicious node is exposed by every correct node
Accuracy:
• No correct node is forever suspected by a correct node
• No node is exposed by a correct node, unless it is detectably malicious

24 PeerReview approach
• Nodes locally observe message traffic and classify other nodes as trusted, suspected, or exposed
• Quick overview:
  - Every node keeps a log of all its local inputs and outputs
  - Crypto techniques ensure that the log is accurate and linear
  - Nodes can audit each other's logs at any time
  - To check for faulty behavior, auditors replay the contents of the log
  - In case of misbehavior, auditors produce evidence that can be verified independently by other nodes
• Eventually complete and accurate!
[Figure: layered diagram with labels: state machine (e.g. NFS), application, network, PeerReview detector; the detector outputs {trusted, suspected, exposed}]
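One common way to make a log tamper-evident and linear is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below illustrates that idea only; it is not PeerReview's actual log format, it omits the signatures a real system would need to bind a node to its log, and the names `TamperEvidentLog`, `entry_hash`, and `audit` are hypothetical.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value of the chain


def entry_hash(prev_hash, seq, payload):
    """Hash of one entry, chained to the hash of the previous entry."""
    data = prev_hash + seq.to_bytes(8, "big") + payload.encode("utf-8")
    return hashlib.sha256(data).digest()


class TamperEvidentLog:
    def __init__(self):
        self.entries = []  # list of (payload, hash) pairs

    def append(self, payload):
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = entry_hash(prev, len(self.entries), payload)
        self.entries.append((payload, h))
        return h  # a real system would sign this and hand it to peers


def audit(entries, committed_top_hash):
    """Recompute the chain and compare it with a previously committed top hash."""
    prev = GENESIS
    for seq, (payload, h) in enumerate(entries):
        if entry_hash(prev, seq, payload) != h:
            return False
        prev = h
    return prev == committed_top_hash


log = TamperEvidentLog()
log.append("recv m1 from p2")
top = log.append("send m2 to p3")

honest = audit(log.entries, top)
forged = list(log.entries)
forged[0] = ("recv m9 from p2", forged[0][1])  # rewrite history, keep old hash
tampered = audit(forged, top)
truncated = audit(log.entries[:1], top)        # hide the last entry
```

An auditor holding the committed top hash detects both a rewritten entry and a truncated log. Replaying the logged inputs against the reference state machine, the other half of the audit described above, is not shown here.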

25 Typical consensus algorithm
repeat
  round++
  c = round mod n
  if p = c then try to “lock” the current estimate
  help in locking
    until a decided value is received from c, or c is suspected by ◊S
until a decided value is received
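The round structure above relies on a rotating coordinator: round r is led by process r mod n, and processes move on when the failure detector suspects that coordinator. The sketch below models only this schedule, with locking and message exchange left out; `suspected_in_round` is a hypothetical stand-in for the failure detector output, and `max_rounds` is added so that the loop always ends.

```python
def first_unsuspected_round(n, suspected_in_round, max_rounds=100):
    """Return (round, coordinator) for the first round whose coordinator
    is not suspected, or None if no such round occurs within max_rounds."""
    for round_no in range(1, max_rounds + 1):
        c = round_no % n  # coordinator of this round
        if c not in suspected_in_round(round_no):
            return round_no, c
    return None


# Processes 1 and 2 are suspected at first; later only process 1 is.
outcome = first_unsuspected_round(
    n=3,
    suspected_in_round=lambda r: {1, 2} if r < 4 else {1},
)
never = first_unsuspected_round(n=2, suspected_in_round=lambda r: {0, 1}, max_rounds=10)
```

Since an eventually strong failure detector eventually stops suspecting some correct process, the rotation eventually reaches a round whose coordinator is trusted by every correct process.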