Copyright 2006 Koren & Krishna ECE655/ByzGen.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655.

Presentation transcript:

Byzantine Failures

Failure Types
• Fail-Stop: Fails by stopping. No output is produced.
• Consistent Output: All users see the same wrong output.
• Byzantine Failure: No consistency is guaranteed: different users may see different values of the output.
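The three failure types can be contrasted with a small sketch. The function names, the numeric sensor value, and the specific wrong values are assumptions made for illustration; they are not taken from the slides.

```python
# Hypothetical fault models for a sensor whose true reading is `true_value`.
# `reader` is the index of the processor that reads the sensor.

def fail_stop(true_value, reader):
    """Fail-stop: the sensor has stopped, so no output is produced."""
    return None

def consistent_output(true_value, reader):
    """Consistent output: every reader sees the same wrong value."""
    return true_value + 1

def byzantine(true_value, reader):
    """Byzantine: the value seen may differ from one reader to the next."""
    return true_value + reader

def observe(fault, true_value=10, readers=3):
    """Collect what each reader sees under the given fault model."""
    return [fault(true_value, r) for r in range(readers)]

views = {f.__name__: observe(f) for f in (fail_stop, consistent_output, byzantine)}
```

Only in the Byzantine case do the readers hold different values, which is what makes agreement among them non-trivial.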

Original Motivation
• A computer system uses sensor inputs to control some process.
• Is there a way to ensure that the output of a sensor is seen consistently by all the functional processors despite the sensor exhibiting Byzantine failure?
• The problem was originally uncovered at NASA Langley and studied by contractors from SRI International in 1978.

Byzantine Generals Problem
• The Byzantine army is besieging a city, each division commanded by its own general. The commander-in-chief (c-in-c) coordinates the divisions.
• There could be traitors among these commanders.
• The c-in-c has two alternatives:
   » Attack
   » Retreat
• If the actions taken by the divisions are not consistent with one another, the army will be defeated.
• Commands are sent in the form of two-party oral messages. Assume messages are not lost.

Requirements for Success
• The decision algorithm must satisfy the following conditions of interactive consistency:
   » IC1. All loyal generals must agree on the same plan of action.
   » IC2. If the c-in-c is loyal, the loyal generals must obey his order.
• It is NOT an aim of the algorithm to identify the traitors.
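The two interactive consistency conditions can be written as predicates over the final decisions. This is a minimal sketch: the function names and the dictionary representation of decisions are assumptions for illustration only.

```python
def satisfies_ic1(decisions, loyal):
    """IC1: all loyal divisional generals decide on the same plan."""
    return len({decisions[g] for g in loyal}) <= 1

def satisfies_ic2(decisions, loyal, commander_loyal, order):
    """IC2: if the c-in-c is loyal, every loyal general follows his order."""
    if not commander_loyal:
        return True  # IC2 places no constraint when the c-in-c is a traitor
    return all(decisions[g] == order for g in loyal)

# Example: C is a traitor, so only A and B are checked.
decisions = {"A": "attack", "B": "attack", "C": "retreat"}
ok = (satisfies_ic1(decisions, {"A", "B"})
      and satisfies_ic2(decisions, {"A", "B"}, True, "attack"))
```

Note that neither predicate looks at what the traitors decide, and neither tries to identify them.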

Traitors’ Aim
• One or more of the generals (including the c-in-c) could be a traitor.
• The aim of the traitors is to cause the Byzantine army to be defeated by violating the conditions for victory.

Impossibility Result
• Try to solve the problem with the c-in-c and two divisional generals, A and B.
• Case 1. The c-in-c is a traitor: he tells A to attack and B to retreat.
• Case 2. The c-in-c is loyal but A is a traitor: the c-in-c tells both A and B to attack; however, A tells B that his orders from the c-in-c are to retreat. What should B do?
• In both cases B sees the c-in-c's order contradicted by A's report and cannot tell which of the two is the traitor.
• Conclusion: The problem cannot be solved for a 3-node system with one Byzantine failure.
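The three-node argument can be checked by brute force in a restricted setting. The sketch below assumes a single round of exchange and that A and B both apply the same deterministic rule mapping (order heard from the c-in-c, order reported by the other general) to a decision. These restrictions are assumptions of the sketch; the general result holds for any algorithm.

```python
from itertools import product

ORDERS = ("attack", "retreat")

def rule_survives(rule):
    """Check one decision rule against both traitor scenarios."""
    # Loyal c-in-c sends `order` to both; the traitorous peer may report anything.
    for order in ORDERS:
        for reported in ORDERS:
            if rule[(order, reported)] != order:
                return False  # IC2 violated: a loyal general disobeys a loyal c-in-c
    # Traitorous c-in-c sends `a` to A and `b` to B; both relay honestly.
    for a, b in product(ORDERS, repeat=2):
        if rule[(a, b)] != rule[(b, a)]:
            return False  # IC1 violated: the loyal generals A and B disagree
    return True

views = list(product(ORDERS, repeat=2))            # the 4 possible local views
rules = [dict(zip(views, choice))                  # all 16 deterministic rules
         for choice in product(ORDERS, repeat=len(views))]
survivors = [r for r in rules if rule_survives(r)]
```

IC2 forces each general to follow the order heard directly from the c-in-c, and that is exactly what a traitorous c-in-c exploits to break IC1, so `survivors` comes out empty.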

Byzantine Generals Algorithm
• General Approach:
   » The c-in-c sends his order to each of the divisional generals.
   » Each divisional general recursively uses the Byzantine Generals algorithm to disseminate to his colleagues the order he received from the c-in-c.
   » The key algorithm parameter is m, the maximum number of traitors to be allowed for.

Algorithm OM(0)
• The c-in-c sends his order to each divisional general.
• Each divisional general obeys the order he receives from the c-in-c.

Algorithm OM(m)
• Step 1. The c-in-c sends his order to each divisional general.
• Step 2. Each divisional general, acting as the commander, uses OM(m-1) to disseminate to the other divisional generals the order he got from the c-in-c. At the end of this step, each divisional general has a vector containing:
   » (a) The order he received from the c-in-c, and
   » (b) The order disseminated by every other divisional general.
• Step 3. Each divisional general follows the majority decision from the vector obtained in Step 2.
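The recursion in OM(0) and OM(m) can be sketched as a small simulation. Everything named here is an assumption for illustration: generals are integers with the c-in-c as 0, a traitor's behaviour is supplied by a hypothetical `lie(sender, receiver, order)` function, and ties in the majority vote fall back to a default of "retreat".

```python
from collections import Counter

ORDERS = ("attack", "retreat")

def majority(values, default="retreat"):
    """Most common value; fall back to `default` on a tie (an assumption)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]

def om(m, commander, lieutenants, order, traitors, lie):
    """Return {lieutenant: value he settles on for this commander's order}."""
    # Step 1: the commander sends his order; a traitor may send anything.
    received = {g: lie(commander, g, order) if commander in traitors else order
                for g in lieutenants}
    if m == 0:
        return received  # OM(0): obey what was received
    # Step 2: each lieutenant relays what he received, using OM(m-1).
    relayed = {g: om(m - 1, g, [h for h in lieutenants if h != g],
                     received[g], traitors, lie)
               for g in lieutenants}
    # Step 3: majority over own value plus the values relayed by the others.
    return {g: majority([received[g]] +
                        [relayed[h][g] for h in lieutenants if h != g])
            for g in lieutenants}

def split_lie(sender, receiver, order):
    """Hypothetical traitor: tells odd-numbered generals one thing, even another."""
    return ORDERS[receiver % 2]

loyal_cinc = om(1, 0, [1, 2, 3], "attack", {3}, split_lie)    # general 3 is the traitor
traitor_cinc = om(1, 0, [1, 2, 3], "attack", {0}, split_lie)  # the c-in-c is the traitor
```

With four generals and one traitor, the loyal generals 1 and 2 both settle on "attack" in the first run, and all three divisional generals settle on the same value in the second. Running `om(1, 0, [1, 2], "attack", {2}, split_lie)` with only three generals leaves general 1 with a tie, so he does not follow the loyal c-in-c's order.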

Claim
• If the total number of generals is N >= 3m+1, running OM(m) will ensure that the interactive consistency conditions are satisfied.
• If the total number of generals is N < 3m+1, no algorithm exists that can ensure the interactive consistency conditions are satisfied for this fault model.
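The bound N >= 3m+1 is easy to turn into arithmetic. The helper names below are assumptions for illustration.

```python
def min_generals(m):
    """Smallest total number of generals N for which OM(m) tolerates m traitors."""
    return 3 * m + 1

def max_traitors(n):
    """Largest m such that n >= 3m + 1."""
    return (n - 1) // 3

# (N, tolerable traitors) for a few army sizes
table = [(n, max_traitors(n)) for n in (3, 4, 6, 7, 10)]
```

For example, three generals tolerate no traitor at all, four to six generals tolerate one, and seven are needed before two traitors can be tolerated.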

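Proof of Correctness
• Induction proof: the induction basis, m=0, is obvious.
• Suppose the result holds for all m up to M, and consider m=M+1, with N >= 3m+1.
• Case 1. The c-in-c is a traitor: There are up to m-1 traitors among the N-1 divisional generals. Since N-1 >= 3m > 3(m-1)+1, the induction hypothesis applies to each run of OM(m-1) in Step 2, so all loyal generals obtain the same vector and take the same majority decision.
• Case 2. The c-in-c is loyal: The c-in-c sent a consistent order to everyone. There are up to m traitors among the N-1 divisional generals. Since N-1 >= 3m > 2m, the loyal divisional generals outnumber the traitors; the original paper uses a separate lemma to show that every loyal general then obtains the c-in-c's order as the majority value.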

Modification: Signed Messages
• Orders are sent by means of signed messages.
• Assumptions about signed messages:
   » A loyal general’s signature cannot be forged.
   » Anyone can authenticate a general’s signature.

Signed Messages Algorithm
• Step 1. The c-in-c sends a signed order to the divisional generals.
• Step 2. When a divisional general receives an order, he
   • Adds it to a vector of copies of this order, V.
   • If this order carries fewer than m signatures of divisional generals (not counting the c-in-c's), he then
      » Adds his own signature to the order
      » Sends the order (augmented with his signature) to every divisional general who has not signed it.
• Step 3. After all generals have been heard from (or a timeout has passed), each divisional general decides on a course of action from the orders in V. [If V contains conflicting signed orders, the c-in-c is unmasked as a traitor; use a pre-agreed default action.]
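The signed-messages steps can be sketched as a message-queue simulation. Several things here are assumptions rather than part of the slides: signatures are modelled simply as a tuple of signer names that nobody can alter, a general records each distinct order only once (as in the original paper's formulation), traitorous divisional generals can only withhold relays via a hypothetical `withhold(sender, receiver)` predicate, and the default action is "retreat".

```python
from collections import deque

def sm(m, lieutenants, signed_orders, traitors=frozenset(),
       withhold=lambda sender, receiver: True, default="retreat"):
    """signed_orders: {lieutenant: order the c-in-c signed and sent to him}."""
    seen = {g: set() for g in lieutenants}  # the set V kept by each general
    # Each queued message: (receiver, order, divisional generals who signed it).
    queue = deque((g, order, ()) for g, order in signed_orders.items())
    while queue:
        receiver, order, signers = queue.popleft()
        if order in seen[receiver]:
            continue  # this order is already recorded in V
        seen[receiver].add(order)
        if len(signers) < m:  # fewer than m divisional signatures so far
            for peer in lieutenants:
                if peer == receiver or peer in signers:
                    continue
                if receiver in traitors and withhold(receiver, peer):
                    continue  # a traitor may refuse to relay
                queue.append((peer, order, signers + (receiver,)))
    # Step 3: a single order is obeyed; none or several means use the default.
    return {g: next(iter(v)) if len(v) == 1 else default
            for g, v in seen.items()}

# Traitorous c-in-c signs conflicting orders; the relays expose him.
exposed = sm(1, ["A", "B", "C"], {"A": "attack", "B": "retreat", "C": "attack"})
# Loyal c-in-c, traitor A withholds relays; only three generals in total.
three = sm(1, ["A", "B"], {"A": "attack", "B": "attack"}, traitors={"A"})
```

In the first run every divisional general ends up holding both orders and falls back to the default. In the second run B still follows the loyal c-in-c with only three generals in total, which the oral-message algorithm cannot guarantee.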