Chapter 3: Motivating Self-Stabilization
Self-Stabilization, Shlomi Dolev, MIT Press, 2000. © Shlomi Dolev, All Rights Reserved.



3-2 Chapter 3: Motivating Self-Stabilization
- A self-stabilizing algorithm converges to the desired behavior from any initial state, so it also converges from an arbitrary state caused by faults.
- Why should one be interested in self-stabilizing algorithms? Because of their applicability to distributed systems.
- Example: recovering from faults in a space shuttle. Faults may cause the shuttle to malfunction for a while. Controlling it with a self-stabilizing algorithm leads to automatic recovery and lets the shuttle continue its task.

3-3 What Is a Self-Stabilizing Algorithm?
- This question is answered using the "Stabilizing Orchestra" example.
- The problem:
  - The conductor is unable to participate; harmony is achieved by players listening to their neighboring players.
  - It is a windy evening: the wind can turn some pages in the score, and the players may not notice the change.

3-4 The "Stabilizing Orchestra" Example
- Our goal: to guarantee that harmony is achieved at some point following the last undesired page turn.
- Imagine that the drummer notices that the violinist next to him is playing from a different page. Candidate solutions and their problems:
  1. The drummer turns to his neighbor's page. What if the violinist noticed the difference as well?
  2. Both the drummer and the violinist start from the beginning. What if the player next to the violinist notices the change only after the other two have synchronized?

3-5 The "Stabilizing Orchestra" Example: the Self-Stabilizing Solution
- Every player joins whichever player, among himself and his neighbors, is playing the earliest page.
- Note that the score has a bounded length. What happens if a player returns to the first page of the score before harmony is achieved? This case is discussed in detail in Chapter 6.
- In every sufficiently long period in which the wind does not turn a page, the orchestra resumes playing in synchrony.
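The orchestra rule can be illustrated with a small simulation. This is only a sketch under assumptions that the slides do not state: the players sit in a line, everyone acts in synchronous rounds, each round consists of joining the earliest page in the neighborhood and then playing one page, and the score is treated as unbounded (the wraparound case deferred to Chapter 6 is ignored). The names `step` and `play_until_harmony` are hypothetical.

```python
def step(pages):
    """One synchronous round: each player joins the earliest page among
    himself and his immediate neighbors, then plays on to the next page."""
    next_pages = []
    for i in range(len(pages)):
        neighborhood = pages[max(0, i - 1): i + 2]  # left neighbor, self, right neighbor
        next_pages.append(min(neighborhood) + 1)
    return next_pages


def play_until_harmony(pages, max_rounds=1000):
    """Run rounds until all players are on the same page.
    Returns (number of rounds used, final pages)."""
    rounds = 0
    while len(set(pages)) > 1:
        if rounds == max_rounds:
            raise RuntimeError("no harmony within max_rounds")
        pages = step(pages)
        rounds += 1
    return rounds, pages


# The wind has left five players on arbitrary pages.
rounds, final_pages = play_until_harmony([7, 3, 9, 5, 12])
print(rounds, final_pages)
```

In this sketch the earliest page spreads by one player per round, so the number of rounds is bounded by the length of the line of players.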

3-6 Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions

3-7 The Data-Link Algorithm
- The task of delivering a message is sophisticated, and a message may be corrupted or even lost along the way.
- The layers involved: network layer, data-link layer, physical layer.
- [Figure: at both the sender and the receiver, a packet from the network layer is wrapped by the data-link layer into a frame (head, packet, tail), which is handed to the physical layer.]
- The sender sends sequences of bits to the receiver.

3-8 The Alternating-Bit Algorithm
The algorithm is used to cope with the possibility of frame corruption or loss. Every message from the sender is sent repeatedly in a frame to the receiver until an acknowledgment arrives.

Sender:
    initialization
    begin
        i := 1
        bit_s := 0
        send(⟨bit_s, im_i⟩)            (* im_i is fetched *)
    end                                (* end initialization *)
    upon a timeout
        send(⟨bit_s, im_i⟩)
    upon frame arrival
    begin
        receive(FrameBit)
        if FrameBit = bit_s then       (* acknowledgment *)
        begin
            bit_s := (bit_s + 1) mod 2
            i := i + 1
        end
        send(⟨bit_s, im_i⟩)            (* im_i is fetched *)
    end

Receiver:
    initialization
    begin
        j := 1
        bit_r := 1
    end                                (* end initialization *)
    upon frame arrival
    begin
        receive(⟨FrameBit, msg⟩)
        if FrameBit ≠ bit_r then
        begin
            bit_r := FrameBit
            j := j + 1
            om_j := msg
        end
        send(bit_r)                    (* send acknowledgment *)
    end
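The alternating-bit algorithm can be exercised with a short simulation. The following is a sketch, not the book's code: it assumes FIFO channels that lose each frame independently with probability `loss`, a simple scheduler in which a timeout fires only when no frame is in transit, and 0-based message indices. The function name `run_alternating_bit` and its parameters are hypothetical.

```python
import random
from collections import deque


def run_alternating_bit(messages, loss=0.3, seed=0, max_steps=100_000):
    """Simulate sender and receiver started from their proper initial states.
    Returns the list of messages delivered by the receiver."""
    if not messages:
        return []
    rng = random.Random(seed)
    to_receiver, to_sender = deque(), deque()  # FIFO channels
    delivered = []
    i, bit_s = 0, 0   # sender state (i is a 0-based index, an assumption)
    bit_r = 1         # receiver state

    def send(channel, frame):
        # A frame is lost with probability `loss`, otherwise queued.
        if rng.random() >= loss:
            channel.append(frame)

    send(to_receiver, (bit_s, messages[i]))    # sender initialization
    for _ in range(max_steps):
        if i == len(messages):                 # every message acknowledged
            return delivered
        if to_receiver:                        # receiver: upon frame arrival
            frame_bit, msg = to_receiver.popleft()
            if frame_bit != bit_r:             # new frame: accept and deliver
                bit_r = frame_bit
                delivered.append(msg)
            send(to_sender, bit_r)             # acknowledge
        elif to_sender:                        # sender: upon frame arrival
            frame_bit = to_sender.popleft()
            if frame_bit == bit_s:             # acknowledgment for current message
                bit_s = (bit_s + 1) % 2
                i += 1
            if i < len(messages):
                send(to_receiver, (bit_s, messages[i]))
        else:                                  # sender: upon a timeout
            send(to_receiver, (bit_s, messages[i]))
    raise RuntimeError("simulation did not finish within max_steps")


print(run_alternating_bit(["m1", "m2", "m3"], loss=0.3, seed=1))
```

Started from the proper initial states, the simulated receiver delivers every message exactly once and in order despite frame loss; the remaining slides ask what happens when the starting state is not the proper one.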

3-9 The Alternating-Bit Algorithm: Sample Run
The figure shows the sender S and the receiver R, with their bits, after each event:
1. bit_s = 0, bit_r = 1: initial state.
2. bit_s = 0, bit_r = 1: upon a timeout, S resends.
3. bit_s = 0, bit_r = 0: R received m_1.
4. bit_s = 0, bit_r = 0: upon a timeout, S resends.
5. bit_s = 0, bit_r = 0: R received m_1 again.
6. bit_s = 1, bit_r = 0: S received the acknowledgment.
7. bit_s = 1, bit_r = 0: upon a timeout, S resends.
8. bit_s = 1, bit_r = 1: R received m_2.
Once the sender receives an acknowledgment, no frame with sequence number 0 exists in the system.

3-10 There Is No Data-Link Algorithm That Can Tolerate Crashes
- It is usually assumed that a crash causes the sender or the receiver to return to an initial state.
- No initialization procedure exists that can guarantee that every message fetched by the sender following the last crash will arrive at its destination.
- The next execution demonstrates this point. Denote:
  - Crash_R: a receiver crash
  - Crash_S: a sender crash
- Crash_X causes X to perform its initialization procedure.

3-11 The Pumping Technique
The idea: repeatedly crash the sender and the receiver and replay parts of the reference execution RE in order to construct a new execution E'.

Reference Execution (RE) = Crash_S, Crash_R, send_S(f_s1), receive_R(f_s1), send_R(f_r1), receive_S(f_r1), send_S(f_s2), …, receive_S(f_rk)

Constructing E' (the figure shows S, R and the frames in transit after each step):
1. S sends f_s1.
2. R receives f_s1 and sends f_r1.
3. S crashes (Crash_S) while f_r1 is in transit.
4. S sends f_s1, receives f_r1 and sends f_s2. In transit toward R: f_s1, f_s2.
5. R crashes (Crash_R).
6. R receives f_s1, sends f_r1, receives f_s2 and sends f_r2. In transit toward S: f_r1, f_r2.
7. Continuing in the same way, R receives f_s1, sends f_r1, receives f_s2, sends f_r2, …, receives f_sk and sends f_rk. In transit toward S: f_r1, f_r2, …, f_rk.
8. Now S and R crash (Crash_S, Crash_R).
9. We let S send f_si and receive f_ri (for i from 1 to k). In transit toward R: f_s1, f_s2, …, f_sk.
10. If these k frames are lost, no information about the message exists in the system.

The same steps apply with messages m_1, m_2, … queued at the sender: suppose Crash_S and Crash_R occurred, and S crashes again while f_r1, f_r2, …, f_r(k-1) are in transit. S then sends f_s1, receives f_r1, sends f_s2, receives f_r2, …, receives f_r(k-1) and sends f_sk. Continue with the same technique.

3-12 Conclusion
- It is possible to show that there is no guarantee that the k-th message will be received.
- We want to require that eventually every message fetched by the sender reaches the receiver; this calls for a self-stabilizing data-link algorithm.
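A deliberately simplified Python sketch shows why frames accumulated in transit defeat re-initialization. It is not the full pumping construction: it assumes that every frame the re-initialized sender sends is lost, so the receiver sees nothing, while the channel toward the sender already holds stale acknowledgment bits left over from before the crash. The name `sender_after_crash` is hypothetical.

```python
def sender_after_crash(messages, stale_acks):
    """Alternating-bit sender that has just re-initialized (bit_s = 0) and
    reads only stale acknowledgment bits. Returns the messages it wrongly
    considers delivered."""
    i, bit_s = 0, 0
    considered_delivered = []
    for ack_bit in stale_acks:
        if i == len(messages):
            break
        if ack_bit == bit_s:                  # looks like a valid acknowledgment
            considered_delivered.append(messages[i])
            bit_s = (bit_s + 1) % 2
            i += 1                            # the sender moves on to the next message
    return considered_delivered


# Stale acknowledgments 0, 1, 0 make the sender give up on three messages
# that the receiver never saw.
print(sender_after_crash(["m1", "m2", "m3"], [0, 1, 0]))
```

The sender cannot distinguish a stale acknowledgment from a fresh one, which is the gap the pumping argument exploits.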

3-13 Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions

3-14 Arbitrary Configuration Because of Crashes
- A combination of crashes and frame losses can bring a system to an arbitrary configuration, with the processors in arbitrary states.

3-15 Any Configuration Can Be Reached by a Sequence of Crashes
- The pumping technique is used to reach any arbitrary configuration, starting with the reference execution:
  Reference Execution (RE) = Crash_S, Crash_R, send_S(f_s1), receive_R(f_s1), send_R(f_r1), receive_S(f_r1), send_S(f_s2), …, receive_S(f_rk)
- The technique is used to accumulate a long sequence of frames.

3-16 Reaching an Arbitrary Configuration
- Our first goal: creating an execution in which RE appears i times in a row, (RE)^i.
- Denote by F_rE (F_sE) the sequence of frames sent by the receiver (sender) in RE, and by F^i_rE (F^i_sE) that sequence repeated i times.

Steps (the figure shows S, R and the frames in transit after each step):
1. First we use the pumping technique to obtain RE. In transit toward S: f_r1, f_r2, …, f_rk.
2. S sends f_s1.
3. S crashes.
4. S sends f_s1, receives f_r1, sends f_s2, receives f_r2, …, sends f_sk, receives f_rk. In transit toward R: f_s1 followed by F_sE.
5. R receives f_s1 and sends f_r1. In transit: F_sE toward R, f_r1 toward S.
6. R crashes.
7. R receives f_s1, sends f_r1, …, receives f_sk and sends f_rk. In transit toward S: f_r1 followed by F_rE.
8. S crashes.
9. S sends f_s1, receives f_r1 and sends f_s2. In transit: f_s1, f_s2 toward R, F_rE toward S.
10. S crashes.
11. S sends f_s1, receives f_r1, …, sends f_sk, receives f_rk. In transit toward R: f_s1, f_s2 followed by F_sE.
12. S received the first F_rE, crashed and received the second. In transit toward R: F_sE F_sE.
13. Continue with the same technique.

- For any finite i, the technique can be extended to reach a configuration in which F^i_sE appears in q_s,r.

3-17 Reaching an Arbitrary Configuration
- Our second goal: reaching c_a, an arbitrary configuration.
- Denote by k_1 (k_2) the number of frames in q_s,r (q_r,s) in c_a, and let i = k_1 + k_2 + 2.

Steps:
1. Using the previous technique we accumulate F^i_sE in q_s,r.
2. R replays RE k_2 + 1 times. Now q_s,r holds F^(k_1+1)_sE and q_r,s holds F^(k_2+1)_rE.
3. S replays RE using the first F_rE until it reaches its desired state S', losing the frames it sends and the leftovers of F^(k_2)_rE that are not in q_r,s. Now q_s,r holds F^(k_1+1)_sE and q_r,s holds its contents in c_a.
4. We do the same with R, which reaches its desired state R'. The system is now in the arbitrary configuration c_a, with q_s,r and q_r,s as required.

3-18 Crash-Resilient Data-Link Algorithm, with a Bound on the Number of Frames in Transit
- Crashes are not considered a severe type of fault (Byzantine faults are more severe; see Chapter 6).
- The algorithm uses the initialization procedure following the crashes of S and R.
- bound: the maximal number of frames that can be in transit.

Steps (the frame labels shown in the figure are not preserved in this transcript):
1. S crashes.
2. S, in an after-crash state, invokes a clean procedure.
3. S received …, then sends … repeatedly until it receives ….
4. Continue until S receives ….
5. When the sender receives the first … it can be sure that the only label in transit is bound+1, and it can initialize the alternating-bit algorithm with bit_s = 0 (similarly, R can initialize with bit_r = 1).
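The clean procedure can be sketched in Python, with the caveat that the frame formats are missing from the transcript, so the details here are assumptions: the sender labels its clean frames with integers 1, 2, …, bound+1, the receiver echoes each label as its acknowledgment, channels are FIFO and lossless, and the schedule is a fixed round-robin. The name `clean_channels` and its parameters are hypothetical.

```python
from collections import deque


def clean_channels(bound, stale_to_receiver, stale_to_sender, max_rounds=10_000):
    """Run the assumed clean procedure starting from channels that already
    hold stale labels. Returns the labels still in transit when the sender
    first receives an acknowledgment labeled bound + 1."""
    q_sr = deque(stale_to_receiver)   # channel sender -> receiver
    q_rs = deque(stale_to_sender)     # channel receiver -> sender
    label = 1
    for _ in range(max_rounds):
        q_sr.append(label)            # sender (re)sends its current label
        q_rs.append(q_sr.popleft())   # receiver echoes the oldest frame
        ack = q_rs.popleft()          # sender reads the oldest acknowledgment
        if ack == label:
            if label == bound + 1:
                return list(q_sr) + list(q_rs)
            label += 1                # move on to the next label
    raise RuntimeError("clean procedure did not finish within max_rounds")


# At most `bound` stale frames: only label bound + 1 remains in transit.
print(clean_channels(3, [], [1, 2, 3]))
# More stale frames than `bound`: stale labels can survive.
print(clean_channels(2, [], [1, 2, 3]))
```

The intuition in this sketch is a counting argument: each stale frame can cause at most one false advance of the sender's label, so with at most bound stale frames at least one of the bound+1 advances reflects a genuine round trip, which flushes both FIFO channels.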

3-19 Crash-Resilient Data-Link Algorithm: R Crashes
1. R crashes.
2. After the crash, bit_r = i.
3. R received msg and assigned FrameBit to bit_r (bit_r = FrameBit); it then delivers msg to the output queue.
- The problem: an extra copy of msg in the output queue.

3-20 Crash-Resilient Data-Link Algorithm: R Crashes
Can we guarantee at-most-once delivery, and exactly-once delivery after the last crash?
- The initialization of bit_r should ensure that a message fetched after the crash will be delivered.
- A solution:
  - S sends each message in a frame with label 0 until an acknowledgment arrives, and then sends the same message with label 1 until an acknowledgment arrives.
  - R delivers a message only when it arrives with label 1 immediately after label 0.
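The receiver side of this two-label solution can be sketched in Python. The class name `TwoLabelReceiver`, the `None` marker for "nothing seen since the crash", and the shape of the frames are assumptions made for illustration only.

```python
class TwoLabelReceiver:
    """Delivers a message only when label 1 arrives immediately after label 0."""

    def __init__(self):
        self.last_label = None   # no frame seen since (re)initialization
        self.delivered = []

    def crash(self):
        # A crash resets the receiver state; already delivered output is kept.
        self.last_label = None

    def on_frame(self, label, msg):
        if label == 1 and self.last_label == 0:
            self.delivered.append(msg)
        self.last_label = label
        return label             # the acknowledgment echoes the label


receiver = TwoLabelReceiver()
# A label-1 frame arriving first (for example right after a crash) is ignored;
# repeated frames with the same label do not cause extra deliveries.
for label, msg in [(1, "a"), (0, "b"), (0, "b"), (1, "b"), (1, "b")]:
    receiver.on_frame(label, msg)
print(receiver.delivered)
```

Requiring the 0-then-1 transition means a receiver that has just crashed cannot deliver a retransmitted label-1 frame a second time.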

3-21 Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions

3-22 What Is the Rationale Behind Assuming That the States of the Processors Can Be Corrupted While the Processors' Programs Cannot?
- The program is stored in a long-term memory device, which makes it possible to:
  1. Reload program statements periodically
  2. Protect the memory segment using a read-only memory device
- If the program is subject to corruption, any configuration is possible.
- The Byzantine model allows up to 1/3 of the processors to execute corrupted programs.

3-23 Safety Properties
- A distributed algorithm should satisfy both safety and liveness properties:
  - Safety ensures that bad configurations are avoided.
  - Liveness ensures that the system's goal is achieved.
- The designer of a self-stabilizing algorithm wants to ensure that even if the safety property is violated, the system execution will reach a suffix in which both properties hold.
- What use is an algorithm that doesn't ensure that a car never crashes? If the faults are severe enough to bring the algorithm to an arbitrary configuration, the car may crash no matter which algorithm is chosen.

3-24 Safety Properties (continued)
- A safety property for a car controller might be: never turn into a one-way road.
- When no specification exists for this situation, the car can continue driving on this road and crash into other cars.
- A self-stabilizing controller will recover from this illegal initial state (by turning the car around).

3-25 Processors Can Never Be Sure That a Safe Configuration Is Reached
What use is an algorithm in which the processors are never sure about the current global state?
- The question confuses the assumptions (the occurrence of transient faults) with the algorithm that is designed to fit these severe assumptions. A self-stabilizing algorithm can be designed to start in a particular (safe) state.
- A self-stabilizing algorithm is at least as good as a non-self-stabilizing one for the same task, and is in fact much better.