Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2009. Lecture 8: Failure Detectors.

Material
– Chandra and Toueg, Unreliable Failure Detectors for Reliable Distributed Systems.
– Mostefaoui and Raynal, Solving Consensus using Chandra-Toueg's Unreliable Failure Detectors: A General Approach.
– Keidar and Rajsbaum, On the Cost of Fault-Tolerant Consensus When There are no Faults: A Tutorial.

Reminder: Consensus
Each process has an input and should irrevocably decide an output.
– Agreement: all correct processes decide the same value.
– Validity: the decision is the input of some process.
– Termination: eventually, every correct process decides.

Asynchronous Model
No bounds on message delays or processing times.
Good for unpredictable settings, e.g., the Internet.

Asynchronous Model with Crash Failures
– Asynchronous:
  – Messages can be delayed arbitrarily. (Safety or liveness?)
  – Processes take steps at asynchronous times; there are no clocks.
– Crash failures: a process that crashes at any point in a run is faulty in that run.
– Reliable links.

Fault-Tolerant Asynchronous Consensus is Impossible
Every asynchronous fault-tolerant consensus algorithm has a fair run in which no process decides [FLP85].
– Fair run: if some event can happen (is enabled) long enough, then this event happens.
  – E.g., with reliable links, every sent message is eventually delivered.
– Note: fairness is a condition on the environment, not on the consensus protocol.

Key Difficulty
Distinguishing a slow process from a faulty one: when should we time out?

So What Should We Do?
Use the synchronous model?
– Always possible: messages never take more than 2 days.
– Use long rounds (conservative timeouts) to ensure that all messages arrive on time.
In practice, average latency can be less than 1/100 of the maximum latency [Cardwell, Savage, Anderson 2000], [Bakr-Keidar 2002], so a conservative timeout is very long compared with a typical message delay.

Motivation: Choosing a Model
Example network:
– 99% of packets arrive within 10 µsec.
– Upper bound of 1000 µsec on message latency.
What round duration would we choose for a round-based synchronous system?
– Implication?
We would like to choose a timeout of 10 µsec, but without violating safety…

The Middle Ground
We can choose timeouts that usually hold:
– During long stable periods, delays and processing times are bounded, as in the synchronous model.
– Some periods are unstable, as in the asynchronous model.
We can design algorithms that always ensure safety, but ensure liveness only at stable times.

How Do We Model This?
Assume that in each run there is a Global Stabilization Time (GST) after which the system is stable. GST is unbounded and unknown.

Eventual Synchrony (ES) Model [Dwork, Lynch, Stockmeyer 88]
– Processes have clocks with bounded drift.
– There are upper bounds:
  – Δ on message delay, and
  – Φ on processing time.
– GST, the global stabilization time:
  – Until GST, the system is unstable: the bounds do not hold.
  – After GST, the system is stable: the bounds hold.
  – GST is unbounded and unknown.
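To make the message-delay part of the ES model concrete, here is a toy Python delay model. It is only a sketch: the function name `es_delay`, the uniform sampling, and the factor 100 used as a stand-in for "no bound" before GST are illustrative assumptions, and it models only Δ (not Φ or clock drift).

```python
import random


def es_delay(send_time: float, gst: float, delta: float, rng: random.Random) -> float:
    """Sample a message delay under a toy Eventual Synchrony model.

    Before GST the bound does not hold: delays are drawn from [0, 100 * delta],
    an arbitrary stand-in for "unbounded". From GST on, every delay is at most delta.
    """
    if send_time < gst:
        return rng.uniform(0.0, 100.0 * delta)
    return rng.uniform(0.0, delta)
```

An algorithm designed for this model must stay safe for any delays sampled before `gst`, and may rely on the Δ bound only for liveness.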

Eventual Synchrony (ES) in Practice
For Δ and Φ, choose bounds that hold with high probability.
Stability forever?
– Model: assume yes, for a clean model.
– In practice, there is no need for it: stability has to last "long enough" for the given algorithm to terminate.
Does this make it a bad model?

A Note on "Good Models"
– Accurate: analysis yields truths about the analyzed system/object.
– Tractable: analysis is possible.
Accurate and tractable models are hard to define:
– Need to abstract away issues that do not affect the phenomena of interest.
– Include exactly those attributes that do.

Why Assume Stability Forever?
[Figure: in the real world, one stable period was too short (no decision yet) and a later one was long enough (decided); the model instead has a single stable period that lasts forever.]
Algorithms that work in the real world work in the model, and vice versa!

Time-Free Algorithms
Describe algorithms using a failure detector abstraction [Chandra, Toueg 96].
Goal: abstract away time, and get simpler algorithms.
What makes a good abstraction?
– Implementable in the abstracted model (ES).
– Sufficient for applications (consensus).

The Failure Detector Abstraction [Chandra, Toueg 96]
Each process has a local failure detector oracle.
– Typically outputs the list of processes suspected to have crashed at any given time.
[Figure: algorithm instances A1, …, An communicate over the network, each with its own FD module; e.g., A1's FD outputs {p3, p7} while An's FD outputs {p3}.]

A Natural Failure Detector Implementation in the Eventual Synchrony Model
Send heartbeat messages at regular times.
Implement the failure detector using timeouts:
– When expecting a message that process i should send at time t, wait until t + Δ + clock skew before suspecting i.
– Whenever a message from i arrives, unsuspect i.
In stable periods, the bounds Δ and Φ always hold, hence there are no false suspicions.
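The heartbeat scheme above can be sketched in Python. This is a minimal, single-observer sketch under stated assumptions: the class `HeartbeatFD` and its parameters `period`, `delta` (assumed to absorb both message delay and processing time) and `skew` are hypothetical, and time is a plain number supplied by the caller rather than a real clock.

```python
from typing import Dict, Iterable, Set


class HeartbeatFD:
    """Timeout-based failure detector sketch for the Eventual Synchrony model."""

    def __init__(self, peers: Iterable[int], period: float, delta: float,
                 skew: float, start: float = 0.0) -> None:
        self.period, self.delta, self.skew = period, delta, skew
        # Each peer is expected to send its next heartbeat at start + period.
        self.expected_send: Dict[int, float] = {p: start + period for p in peers}

    def on_heartbeat(self, peer: int, send_time: float) -> None:
        # A message from `peer` arrived: unsuspect it by moving its expected
        # send time past the heartbeat just received.
        self.expected_send[peer] = max(self.expected_send[peer], send_time + self.period)

    def suspected(self, now: float) -> Set[int]:
        # Suspect a peer whose heartbeat, expected to be sent at time t,
        # has not arrived by t + delta + skew.
        return {p for p, t in self.expected_send.items()
                if now > t + self.delta + self.skew}
```

For example, with `period=10`, `delta=5`, `skew=1`, a peer whose heartbeat sent at time 10 has not arrived is suspected from time 16 on, and is unsuspected as soon as that heartbeat arrives. Before GST the chosen bounds may be violated, so false suspicions are possible; once they hold, a live peer always meets its deadline.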

The Resulting Failure Detector Is ◊P (Eventually Perfect)
– Strong Completeness: from some point on, every faulty process is suspected by every correct process.
– Eventual Strong Accuracy: from some point on, no correct process is suspected.
Is it implementable in asynchronous systems?
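On a finite prefix of a run, "from some point on" can only be approximated. The Python sketch below assumes a hypothetical trace format in which `history[q][j]` is the suspect list of correct process q at step j, and reports the earliest step from which both ◊P properties held until the end of the prefix (or `None`). It checks a recorded trace; it does not implement ◊P.

```python
from typing import Dict, List, Optional, Set

# Hypothetical trace format: history[q][j] is the suspect list of correct
# process q at step j; all correct processes have lists of equal length.
History = Dict[int, List[Set[int]]]


def stable_from(ok: List[bool]) -> Optional[int]:
    """Earliest index k such that ok[j] holds for every j >= k, or None."""
    k = len(ok)
    while k > 0 and ok[k - 1]:
        k -= 1
    return k if k < len(ok) else None


def diamond_p_stabilization(history: History, faulty: Set[int]) -> Optional[int]:
    """Earliest step from which strong completeness and strong accuracy both
    hold until the end of the recorded prefix."""
    correct = set(history)
    steps = len(next(iter(history.values())))
    ok = [
        all(
            faulty <= history[q][j]              # every faulty process suspected
            and not (history[q][j] & correct)    # no correct process suspected
            for q in correct
        )
        for j in range(steps)
    ]
    return stable_from(ok)
```

For instance, if the correct processes 1 and 2 both suspect exactly the faulty process 3 from step 2 onward, after some earlier wrong suspicions, the function returns 2.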

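Build a Fair Run Without Failures s.t. There Is No Time After Which q Does Not Suspect p
– σ0: at t0, q does not suspect p; then p crashes. By strong completeness, q suspects p at some time t1.
– σ0': identical to σ0, except that p does not crash and its messages are delayed. q cannot distinguish σ0' from σ0, so q suspects p at t1.
– σ1: extend σ0' by delivering p's messages. Since p is correct, by eventual strong accuracy q stops suspecting p at some time t2.
Are we done? No: σ1 is fair, but q eventually stops suspecting p.
– σ2: run σ1 up to t2, then p crashes; q suspects p at some time t3.
– σ2': same as σ2, except that p does not crash and its messages are delayed; again, q suspects p at t3.
Continue by induction to build an infinite fair run in which p is correct, yet q suspects p at t1, t3, t5, …, violating eventual strong accuracy.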

The Failure Detector Abstraction
Asynchronous model with failure detectors:
– A higher-level abstraction than the ES model: forget about Δ and Φ.
– Each process has a failure detector oracle.
[Figure: algorithm A1 queries its FD, which outputs {p3, p7}; the ES model is hidden inside the FD implementation.]

Weaker Failure Detector: ◊S (Eventually Strong)
– Strong Completeness.
– Eventual Weak Accuracy: there exists some correct process that, from some point on, is not suspected by any correct process.
  – Processes do not know who this process is.
[Figure: Joanne, Josh, Joe, and John hold different suspect lists ("I suspect Josh and Joe", "I suspect Joe and John", "I suspect Joe", "I suspect Josh, John"), but nobody suspects Joanne.]

Some Notes on ◊S
– ◊P is a subset of ◊S: every failure detector of class ◊P is also of class ◊S.
– ◊S is strictly weaker than ◊P (sometimes a homework question).
– ◊S is equivalent to the weakest failure detector for consensus.
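The gap between ◊S and ◊P shows up on a single snapshot. In the Python sketch below, the trace format (`history[q][j]` is the suspect list of correct process q at step j) and the function names are hypothetical, and only the last recorded step is inspected as a stand-in for "from some point on".

```python
from typing import Dict, List, Set

# Hypothetical trace format: history[q][j] is the suspect list of correct
# process q at step j. Only the last step is inspected.
History = Dict[int, List[Set[int]]]


def final_lists(history: History) -> List[Set[int]]:
    return [lists[-1] for lists in history.values()]


def looks_like_diamond_p(history: History, faulty: Set[int]) -> bool:
    correct = set(history)
    # Strong completeness and strong accuracy on the final snapshot.
    return all(faulty <= s and not (s & correct) for s in final_lists(history))


def looks_like_diamond_s(history: History, faulty: Set[int]) -> bool:
    correct = set(history)
    last = final_lists(history)
    completeness = all(faulty <= s for s in last)
    # Weak accuracy: SOME correct process is suspected by no correct process.
    weak_accuracy = any(all(p not in s for s in last) for p in correct)
    return completeness and weak_accuracy
```

With correct processes 1 and 2 and faulty process 3, if process 1 keeps suspecting {2, 3} while process 2 suspects only {3}, the trace passes the ◊S check (nobody suspects process 1) but fails the ◊P check (process 2 is correct yet suspected).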

Model
– n processes 1, …, n.
– t < n/2 of them can crash; this is optimal (we will show it later).
– Reliable links between correct processes.
– Asynchronous, with a ◊S oracle.

◊S-based Consensus: MR Algorithm [Mostefaoui, Raynal 99]
Asynchronous rounds:
– Each process locally progresses through rounds r = 1, 2, 3, …
– Different processes can progress at different times.
Rotating coordinator:
– Process ((r-1) mod n)+1 is the coordinator of round r.
Each round consists of two phases.

MR Algorithm
val ← input; est ← ⊥
|| for r = 1, 2, … do
     coord ← ((r-1) mod n)+1
     (Phase 1)
     if I am coord, then send (r, val) to all
     wait for ((r, v) from coord OR suspect coord (by ◊S))
     if receive v from coord then est ← v else est ← ⊥
     (Phase 2)
     send (r, est) to all
     wait for (r, e) from n-t processes
     if any non-⊥ value e received then val ← e
     if all received e's have the same non-⊥ value v then
        send ("decide", v) to all
        return(v)
   od
|| Upon receive ("decide", v): forward to all; return(v)
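The pseudocode can be exercised with a small lock-step simulation. This is a sketch under strong simplifying assumptions, not MR's asynchronous execution: there are no crashes, all processes move through rounds together, and a hypothetical adversary given as two callbacks decides who suspects the coordinator (`suspects`) and which n-t phase-2 messages each process receives first (`quorum`). `BOT` stands for ⊥, and `run_mr` is an illustrative name.

```python
from typing import Callable, Dict, List, Optional, Tuple

BOT = None  # stands for ⊥


def run_mr(inputs: List[int], t: int,
           suspects: Callable[[int, int], bool],
           quorum: Callable[[int, int], List[int]],
           max_rounds: int = 50) -> Optional[Tuple[int, Dict[int, int]]]:
    """Lock-step, failure-free simulation of MR rounds.

    Processes are indexed 0..n-1 (process k in the slides is index k-1).
    Returns (round, decisions) or None if nobody decides within max_rounds.
    """
    n = len(inputs)
    assert 0 <= t < n / 2, "MR needs t < n/2"
    val = list(inputs)
    for r in range(1, max_rounds + 1):
        coord = (r - 1) % n  # process ((r-1) mod n)+1 in 1-based numbering
        # Phase 1: adopt the coordinator's val, or ⊥ if it is suspected.
        est = [BOT if (i != coord and suspects(r, i)) else val[coord]
               for i in range(n)]
        # Phase 2: each process waits for (r, est) from n-t processes.
        decided: Dict[int, int] = {}
        for i in range(n):
            senders = quorum(r, i)
            assert len(set(senders)) == n - t
            received = [est[j] for j in senders]
            non_bot = {e for e in received if e is not BOT}
            assert len(non_bot) <= 1  # Lemma 1: one non-⊥ value per round
            if non_bot:
                val[i] = next(iter(non_bot))
                if BOT not in received:  # all received values equal v
                    decided[i] = val[i]
        if decided:
            v = next(iter(decided.values()))
            assert all(d == v for d in decided.values())
            # Deciders send ("decide", v); forwarding lets everyone return v.
            return r, {i: v for i in range(n)}
    return None
```

With inputs [3, 7, 5] and t = 1, a run without suspicions decides 3 (process 1's input) in round 1. If, in every round, one process suspects the coordinator and its ⊥ lands in every quorum, `run_mr` never decides and returns `None`, matching the "one suspicion per round" slide.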

Failure-Free Suspicion-Free Run
[Figure: coordinator 1 sends (1, v1) to processes 1, 2, …, n; all have est = v1, all decide v1, and (decide, v1) is sent.]

Coordinator is Suspected
[Figure: coordinator 1's (1, v1) is delayed; the processes suspect it and send (1, ⊥), so all have est = ⊥; in round 2, (2, v2) is delayed as well; no decision.]

One Suspicion Per Round is Enough for FLP
[Figure: coordinator 1 sends (1, v1); a single process suspects it, sets est = ⊥, and sends (1, ⊥), while other messages are delayed so that this ⊥ is among the n-t messages each process waits for; no process decides in the round.]

Phase 1 Rationale
Ensure that for every p_i: est_i ∈ {val_coord, ⊥}.
– Do all processes have the same est?
Progress:
– Why does the 1st phase terminate?

Phase 2 Rationale
Ensure Agreement:
– If process p_i decides v during round r, and process p_j progresses to round r+1, then p_j does so with val_j = v.
Progress:
– Why does the 2nd phase terminate?

Phase 2 Rationale (Cont'd)
The 2nd phase ends upon receiving (r, est) from n-t processes, which is a majority since t < n/2.
Why is the majority important?
– Every two majority sets intersect.
– If one process gets n-t messages with v, then every other correct process gets at least one message with v.
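The intersection argument can be checked by brute force for small n. For two sets A and B of n-t processes each, |A ∩ B| ≥ 2(n-t) - n = n - 2t, which is positive exactly when t < n/2. A minimal Python sketch (the function name `quorums_intersect` is hypothetical):

```python
from itertools import combinations


def quorums_intersect(n: int, t: int) -> bool:
    """True iff every two sets of n - t processes (out of n) share a process."""
    quorums = [set(c) for c in combinations(range(n), n - t)]
    return all(a & b for a, b in combinations(quorums, 2))
```

For n = 4, any two sets of 3 processes intersect (t = 1), but with t = 2 the sets {0, 1} and {2, 3} are disjoint; this is the same split used later to show that t ≥ n/2 is impossible.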

Possible Scenarios in Phase 2
– p_i gets only v:
  – p_i decides.
  – All other processes get v at least once.
– p_i gets only ⊥:
  – All other processes get ⊥ at least once.
  – Nobody decides in this round.
– p_i gets both v and ⊥:
  – Some other process might decide v.
  – p_i sets val_i to v.
Can p_i get two different values v and v'?

Validity Proof
For every i, val_i and est_i always store the initial value of some process, or ⊥.
By induction on the length of the execution:
– Initially, for every process i, val_i stores i's initial value, and est_i is ⊥.
– Subsequently, they can only change to store a val_j or est_j value sent by some process j.

Lemmas
Lemma 1: If in some round r, two messages (r, v) and (r, v') are sent such that v ≠ ⊥ and v' ≠ ⊥, then v = v'.
Lemma 2: If in some round r, n-t processes send (r, v), then for every round r' > r, if a message (r', v') with v' ≠ ⊥ is sent, then v = v'.
– Hint: n-t > n/2.

Agreement Proof
Assume by contradiction that two different decisions, v ≠ v', are made. Let r (r') be the first round in which some process i (i') decides v (v') upon receiving n-t (r, v) ((r', v')) messages. By Lemma 1, r ≠ r', and by Lemma 2, neither r > r' nor r' > r. A contradiction.

Termination Proof Steps
– Progress: until some process decides, no process is ever "stuck" in a round forever.
– First decision: some correct process eventually decides.
– Subsequent decisions: if some correct process decides, then all correct processes eventually decide.

What Do We Need "Decide" For?
val ← input; est ← ⊥
|| for r = 0, 1, 2, … do
     coord ← (r mod n)+1
     if I am coord, then send (r, val) to all
     wait for ((r, v) from coord OR suspect coord (by ◊S))
     if receive v from coord then est ← v else est ← ⊥
     send (r, est) to all
     wait for (r, e) from n-t processes
     if any non-⊥ e received then val ← e
     if all received e's have the same non-⊥ value v then
        send ("decide", v) to all
        return(v)
   od

Why Send "Decide"?
[Figure: coordinator 1 sends (1, v1); one process suspects 1, sets est = ⊥, and sends (1, ⊥), and some messages are delayed; one process decides, while the others do not decide in round 1.]

Disseminating the Decision
OK, so we need the 1st "decide". Why forward it to all?
Hint: reliable broadcast.

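Why Forward "Decide"?
[Figure: n = 4, t = 1. In round 1, some process suspects coordinator 1 and sends (1, ⊥), so not every process decides; a "decide" message is sent, and in round 2, (2, v1) is sent. Two processes are marked X (crashed, or halted after deciding), and process 4 is stuck: it cannot collect n-t messages.]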
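A small Python sketch of why forwarding helps, under a hypothetical scenario that mirrors the figure: n = 4, process 1 sends "decide" only to process 2 and then crashes. The helper `who_decides` is illustrative and not part of MR. Without forwarding, only process 2 learns the decision; once it returns, processes 3 and 4 are left with fewer than n-t = 3 participants. With forwarding, the relay behaves like reliable broadcast.

```python
from typing import Set


def who_decides(n: int, first_recipients: Set[int], crashed: Set[int],
                forward: bool) -> Set[int]:
    """Processes (ids 1..n) that eventually receive ("decide", v).

    Hypothetical scenario: the decider sends "decide" only to
    `first_recipients` and then crashes; processes in `crashed` take no
    further steps. If `forward` is True, each process relays "decide" to
    all processes the first time it receives it.
    """
    received = {p for p in first_recipients if p not in crashed}
    pending = list(received)
    while forward and pending:
        pending.pop()  # this process relays "decide" to every process
        for q in range(1, n + 1):
            if q not in crashed and q not in received:
                received.add(q)
                pending.append(q)
    return received
```

Here `who_decides(4, {2}, {1}, forward=False)` yields only {2}, whereas with `forward=True` every correct process (2, 3, and 4) receives the decision.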

How Long Does It Take?
The algorithm can take unbounded time.
– What if no failures occur?
Is this inevitable? Can we say more than "a decision is reached eventually"?

Performance Metric
Number of communication steps in well-behaved runs.
Well-behaved:
– No failures.
– Stable (synchronous) from the beginning.
– With a failure detector: no false suspicions.
Motivation: the common case.

The Algorithm's Running Time in Well-Behaved Runs
In round 1, the coordinator is correct and not suspected by any process.
All processes decide at the end of phase two of round 1:
– Decision in two communication steps.
– Halting (stopping) takes three steps.
– Same as in the synchronous model for uniform consensus.

Back to Our Example
Example network:
– 99% of packets arrive within 10 µsec.
– Upper bound of 1000 µsec on message latency.
Now we can choose a timeout of 10 µsec without violating safety!
Most of the time, the algorithm will be just as fast as a synchronous uniform consensus algorithm.
– We did pay a price in resilience, though (t < n/2).
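A back-of-the-envelope comparison, assuming that each communication step in a well-behaved run costs roughly the typical 10 µsec latency, while a synchronous algorithm must size its rounds by the 1000 µsec worst case; both decide in two steps in the common case:

```latex
\begin{aligned}
T_{\text{timeout-based}} &\approx 2 \times 10\,\mu\mathrm{s} = 20\,\mu\mathrm{s}\\
T_{\text{synchronous}}   &\approx 2 \times 1000\,\mu\mathrm{s} = 2000\,\mu\mathrm{s}
\end{aligned}
```

In this idealized example, the common case is roughly 100 times faster; the roughly 1% of late messages may cause false suspicions and extra rounds, but never a safety violation.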

Indulgent Algorithms
A ◊S or ◊P failure detector's output can be wrong (even arbitrary) for an unbounded (but finite) prefix of a run.
An algorithm that tolerates unbounded periods of asynchrony is called indulgent [Guerraoui 98].

Observations on Indulgent Consensus Algorithms
– Every indulgent consensus algorithm also solves uniform consensus [Guerraoui 98].
– It is impossible to solve t-resilient indulgent consensus when t ≥ n/2 [Chandra, Toueg 96; Guerraoui 98].
  – Proof on the board (see next slide).

 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 33 Example: n=4, t=2 p1 p2 P Decide 0 by validity and termination p3 p4 Q p1 p2 P Decide 1 p3 p4 Q 11 22 x x x x Decide 0 by validity and termination Can all fail because |Q| ≤ t Decide 1

 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 33 Example: n=4, t=2 p1 p2 P Decide 0 by validity and termination p3 p4 Q Decide 0 by validity and termination Decide 1 All messages between P and Q arrive after decisions are already made

Alternative Weak Failure Detector: Ω (Leader)
– Outputs one trusted process.
– From some point on, all correct processes trust the same correct process.
Can easily implement ◊S.
Is the weakest failure detector for consensus [Chandra, Hadzilacos, Toueg 96].
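One way Ω can implement ◊S, as a Python sketch (the function name is hypothetical): suspect every process except the current leader. Once all correct processes trust the same correct leader, that leader is never suspected (eventual weak accuracy), and every faulty process, which cannot be the eventual leader, is suspected (strong completeness).

```python
from typing import Set


def diamond_s_from_omega(all_procs: Set[int], leader: int) -> Set[int]:
    """Suspect every process except the one the local Ω module currently trusts."""
    return all_procs - {leader}
```

For example, if Ω trusts process 2 among processes 1 to 4, the derived ◊S output is {1, 3, 4}.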

A Natural Ω Implementation
Use the ◊P implementation.
Output the lowest-id non-suspected process.
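The rule above fits in a few lines of Python. A minimal sketch, assuming at least one correct process and a local ◊P module that exposes its current suspect set; the function name and the fallback when everything is suspected are hypothetical choices. Once ◊P stabilizes, every correct process suspects exactly the faulty processes, so all correct processes output the same lowest-id correct process.

```python
from typing import Set


def omega_from_diamond_p(all_procs: Set[int], suspected: Set[int]) -> int:
    """Trust the lowest-id process that the local ◊P module does not suspect."""
    candidates = all_procs - suspected
    # Before ◊P stabilizes it may suspect everyone; falling back to the
    # lowest id is an arbitrary choice for this sketch.
    return min(candidates) if candidates else min(all_procs)
```

For example, with processes 1 to 4 and suspects {1, 3}, the trusted leader is process 2.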

Summary
Prepare for the worst:
– Safety under asynchrony.
Hope for the best:
– Liveness and good performance in common cases.
Nice, clean models:
– Eventual stability.
– Time-free abstractions: unreliable failure detectors.