Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008. Lecture 5: Synchronous Uniform Consensus.


Principles of Reliable Distributed Systems
Lecture 5: Synchronous Uniform Consensus
Spring 2008, Prof. Idit Keidar

Today’s Material
Distributed Algorithms, Nancy Lynch
  – Ch. 6
Distributed Computing, Attiya and Welch
  – Ch. 5

Reminder: State Machine Replication (SMR)
[Figure: Client A, Client B, atomic broadcast]

Replica Coordination Requirements
Agreement: all replicas receive all client requests
  – What happens when a replica (server) fails?
  – What happens when a client fails?
Order: replicas process requests in the same order

Uniform Atomic Broadcast
Uniform Reliable Broadcast
  – Validity: if a correct process broadcasts m, then all correct processes eventually deliver m
  – Uniform Agreement: if any process delivers m, then all correct processes eventually deliver m
  – Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
Uniform Total Order
  – If any two processes deliver both m and m’, they deliver them in the same order
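The Uniform Total Order property can be checked mechanically on recorded delivery logs. The sketch below is illustrative only: the function names and the log format (one list of delivered messages per process, crashed processes included) are assumptions, and it relies on Integrity, so each message appears at most once per log.

```python
from itertools import combinations

def same_relative_order(log_a, log_b):
    """True if the messages present in both logs appear in the same order in each."""
    common = set(log_a) & set(log_b)
    return [m for m in log_a if m in common] == [m for m in log_b if m in common]

def uniform_total_order(logs):
    """logs: {process name: messages in delivery order}.

    'Uniform' means the logs of processes that later crashed are checked too.
    """
    return all(same_relative_order(a, b)
               for a, b in combinations(logs.values(), 2))
```

For example, a process that delivered only m1 and m3 is consistent with one that delivered m1, m2, m3, while two processes that delivered m1 and m2 in opposite orders violate the property.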

Today’s Problem: Uniform Consensus
Each process has an input and should decide on an output (one-shot problem)
Uniform Agreement: every two decisions are the same
Validity: every decision is an input of one of the processes
Termination: eventually all correct processes decide
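As a rough illustration, the three properties can be written as checks over the outcome of one finished run. The function name and the argument shapes are assumptions made for this sketch; Termination is an eventual property, so here it is only checked at the end of a run that has already stopped.

```python
def check_uniform_consensus(inputs, decisions, correct):
    """Check one finished run against the Uniform Consensus properties.

    inputs:    {pid: input value} for every process
    decisions: {pid: decided value} for every process that decided,
               including processes that crashed after deciding
    correct:   set of pids that never crash in this run
    """
    values = set(decisions.values())
    return {
        # Uniform: decisions of faulty processes count as well.
        "uniform_agreement": len(values) <= 1,
        "validity": values <= set(inputs.values()),
        "termination": set(correct) <= set(decisions),
    }
```

Note that a decision made by a process just before it crashes still has to match everyone else's, which is what separates Uniform Agreement from plain Agreement.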

(Uniform) Consensus versus (Uniform) Atomic Broadcast
From Atomic Broadcast to Consensus
From Consensus to Atomic Broadcast
  – Homework question
From now on, we will focus mainly on consensus, and keep in mind that it suffices for Atomic Broadcast and SMR
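The slide does not spell out the direction from Atomic Broadcast to Consensus. One standard construction, which may differ from the one presented in the lecture, is for every process to broadcast its input and decide on the first value it delivers. The sketch below uses a hypothetical in-memory `ToyAtomicBroadcast` in which a single shared log plays the role of the total order; it ignores failures entirely.

```python
class ToyAtomicBroadcast:
    """Hypothetical stand-in for atomic broadcast: one shared log is the total order."""

    def __init__(self):
        self.log = []

    def broadcast(self, sender, value):
        self.log.append((sender, value))

    def delivered(self, pid):
        # Every process observes the same sequence of deliveries.
        return list(self.log)


def consensus_from_abcast(inputs):
    """inputs: {pid: input value}. Returns {pid: decision}."""
    ab = ToyAtomicBroadcast()
    for pid, value in inputs.items():
        ab.broadcast(pid, value)
    # Each process decides the value carried by the first message it delivers.
    return {pid: ab.delivered(pid)[0][1] for pid in inputs}
```

Total order gives agreement because everyone's first delivery is the same message, and Integrity gives validity because that message was broadcast by some process with its input.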

Today’s Model(s)
Round-based synchronous
Static set P = {p_1, …, p_n} of processes
Fault tolerance:
  1. Crash failures, reliable links
  2. Message loss

Synchronous Round-Based Model
Synchronous rounds:
  – Send messages to any set of processes;
  – Receive messages from this round;
  – Do local processing (possibly decide, halt)

Model 1: Round-Based Failstop
If process p_i does not crash in round r, and p_j does not crash in or before round r, then any message sent by p_i to p_j in round r is received by p_j in round r
Note: if p_i crashes in a round, then any subset of the messages p_i sends in this round can be lost

Round-Based Failstop Model
If no message from p_j is received, then p_j is suspected
If p_i fails in round r, then any subset of the messages p_i sends in r may arrive
If p_i is suspected in round r, p_i fails in round r or r-1
  – No further messages from p_i will arrive
[Figure: rounds 1 and 2 for p1, p2, p3. p1 crashes in round 2; p2 receives p1’s round 2 message; p3 suspects p1 in round 2.]

t-Resilient Algorithm
t is a threshold on the number of potential failures
  – The algorithm is correct as long as no more than t processes fail
In the following algorithm, 0 ≤ t < n
We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t
We’d like t to be big (robust algorithm)
  – But f will usually be small (failures are rare)

Notation
P = {p_1, …, p_n} is the set of processes
init_i is p_i’s initial value (input)
The decide action determines the output
Show code for process p_i
Local variables of p_i are denoted: v_i, Alive_i

t-Resilient Failstop Uniform Consensus Algorithm
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
    send v_i to all
    receive round k messages
    for all p_j
        if (received v_j) then v_i = min(v_i, v_j)
        otherwise p_j is suspected
    if ( (∀ p_j ∈ Alive_i : received v_j = v_i) && !decided ) then decide v_i
    for all p_j
        if (suspect p_j) then Alive_i = Alive_i \ {p_j}
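To experiment with the algorithm, the following is a small single-threaded simulation of the round-based failstop model. It is a sketch under stated assumptions, not the lecture's code: the function name `run_consensus`, the 0-based process ids, and the `crashes` format are all invented here. A process that crashes in round k sends its round-k message only to the listed recipients and does no local processing in that round or later.

```python
def run_consensus(inits, t, crashes=None):
    """Simulate t+2 rounds of the failstop uniform consensus algorithm.

    inits:   list of input values; process i has input inits[i]
    crashes: {pid: (round, recipients)} (assumed format): pid crashes in that
             round and its message of that round reaches only `recipients`
    Returns {pid: (decided value, round of decision)}.
    """
    crashes = crashes or {}
    procs = range(len(inits))
    v = list(inits)
    alive = [set(procs) for _ in procs]
    decided = {}
    for k in range(1, t + 3):
        # Send phase: work out which round-k messages are delivered.
        inbox = [{} for _ in procs]
        for i in procs:
            crash_round, reached = crashes.get(i, (None, None))
            if crash_round is not None and crash_round < k:
                continue  # crashed in an earlier round: sends nothing
            targets = reached if crash_round == k else procs
            for j in targets:
                inbox[j][i] = v[i]
        # Receive and local processing, only for processes still running.
        for i in procs:
            if i in crashes and crashes[i][0] <= k:
                continue
            received = inbox[i]              # always contains i's own message
            v[i] = min(received.values())    # v_i = min over v_i and received v_j
            suspected = set(procs) - set(received)
            # Decide if every member of Alive_i sent exactly v_i this round.
            if i not in decided and all(received.get(j) == v[i] for j in alive[i]):
                decided[i] = (v[i], k)
            alive[i] -= suspected
    return decided
```

With inputs 3, 1, 2 and no failures, every process decides 1 in round 2. If process 1 crashes in round 1 and its message reaches only process 0, the survivors decide 1 in round 3, consistent with the f+2 bound for f = 1.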

Proof: Validity
Lemma: For every process p_i, v_i always holds the initial value init_j of some process p_j.

Proof: Uniform Agreement
Lemma:
  – If there exist a value v, a round r, and a process p_i such that
  – all processes that are in Alive_i at the beginning of round r send v in round r,
  – then v is the only possible decision value from r onward.

Proof: Uniform Agreement (Cont’d)
From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward.
Now look at the first round in which some process decides.

Proof: Termination
After a round r in which no process fails, all processes have the same v_i forever.
  – Because all receive the same messages in r,
  – By induction…
Consider a run where f processes fail.
  – In f+2 rounds, there is at least one failure-free round followed by a round in which Alive_i does not change for a correct process p_i.
  – Thus, after at most f+2 rounds, there is a round in which Alive_i does not change and all received values are the same.

How Long Does it Take?
Early-deciding: in a run with f failures, decision is reached by the end of round f+2
This is optimal
  – For Uniform Consensus, but not for Consensus
  – As long as f < t-1

Deciding vs. Stopping (Halting)
The algorithm is not early-stopping:
  – It continues running for t+2 rounds
  – Even after reaching a decision
Homework question: can you change the algorithm to be early-stopping?
  – Stop (halt) after f+k rounds in runs with t ≥ f ≥ 0 failures, for some constant k

Model 2: Message Loss
Aka “Two Generals Problem”

Example: Coordinated Attack
[Figure: generals A and B, with the message “Let’s attack”]

Model
Two generals (processes)
  – Do not fail
Synchronous
  – Processing and communication
Lossy communication

The Coordinated Attack Problem
Requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack, they must not attack
  – If both are ready to attack, then they must attack
Motivation: atomic transaction commit in distributed databases [Gray 78]

Coordinated Attack is 2-Process Uniform Consensus
Agreement: if both generals decide, they decide the same
Termination: every general eventually decides
Validity: if both inputs are “not ready”, the decision is “no attack”; if both inputs are “ready”, then the decision is “attack”

A Simple Solution
General A sends vote (“yes” or “no”)
General B responds with his vote
If both say yes, they attack
Otherwise they do not
Aka 2-phase commit
Problems?

Not so Fast…
Any number of messengers can be captured (message loss)
Agreement impossible
Proof on the board
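A small sketch makes the problem with the simple vote exchange concrete. The function name and the rule for a missing message are assumptions introduced here: a general attacks only if his own vote is “yes” and he has heard a “yes” from the other side, and B replies only if A's vote arrived.

```python
def simple_solution(vote_a, vote_b, a_to_b_arrives=True, b_to_a_arrives=True):
    """Returns (A attacks, B attacks) for one run of the vote exchange.

    Assumption for this sketch: a general who never hears the other's
    vote does not attack.
    """
    b_heard = vote_a if a_to_b_arrives else None
    # B can only respond if A's message got through.
    a_heard = vote_b if (a_to_b_arrives and b_to_a_arrives) else None
    a_attacks = vote_a == "yes" and a_heard == "yes"
    b_attacks = vote_b == "yes" and b_heard == "yes"
    return a_attacks, b_attacks
```

When both vote “yes” and B's reply is lost, B attacks alone, which violates agreement. Adding further acknowledgements only moves the problem to the last message sent, which is the intuition behind the impossibility claim.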

Coordinated Attack Definition: Take II
Revised requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack, they must not attack
  – If both are ready to attack and no messages are lost, then they must attack
Note: this is not an assumption about the model. It’s a conditional requirement that has to hold only in runs in which no messages are lost.
Proof on the board!