Self-Stabilization: An Approach for Fault-Tolerance in Distributed Systems
Stéphane Devismes
16/12/2013, MAROC'2013

Presentation transcript:


Roadmap
– Distributed Systems
– Self-Stabilization
– Competitive Self-Stabilizing k-Clustering

Distributed Systems
Machines ≈ Processes
Characteristics:
– No central control: local programs, local memories
– Asynchronous
– No global time
– Interconnected: asynchronous and FIFO message-passing

Distributed Systems
Assumptions
– Bidirectional links
– Unique IDs
– Static connected topology (≈ graph)
– Deterministic machines

Distributed Algorithm
Example: Computing a Spanning Tree
[Figure: example network; one process has input Root = true (marked R), the others have Root = false]
Distributed inputs
Distributed computations
– Local memories
– Local programs
– Message-passing
– Local decision
Distributed outputs
Global task
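The spanning tree example can be sketched as a small simulation. This is only an illustration under stated assumptions, not the algorithm shown on the slides: the function name `build_spanning_tree`, the flooding rule (a process adopts the sender of the first message it receives as its parent), and the single global FIFO queue used as one possible asynchronous schedule are all hypothetical choices.

```python
from collections import deque

def build_spanning_tree(neighbors, root):
    """Flooding-based spanning tree in a simulated message-passing network.

    neighbors: dict mapping each process id to a list of neighbor ids
               (bidirectional links, connected topology).
    root:      the unique process whose input is Root = true.
    Returns a dict parent[p]; the root is its own parent.
    """
    parent = {root: root}                  # distributed output, one entry per process
    # One global FIFO queue of in-flight messages (sender, receiver).
    # This is just one possible schedule of an asynchronous FIFO network.
    in_flight = deque((root, q) for q in neighbors[root])
    while in_flight:
        sender, receiver = in_flight.popleft()
        if receiver not in parent:         # local decision: first message wins
            parent[receiver] = sender
            in_flight.extend((receiver, q) for q in neighbors[receiver]
                             if q != sender)
    return parent

# Hypothetical 4-process network rooted at "R".
network = {
    "R": ["a", "b"],
    "a": ["R", "b", "c"],
    "b": ["R", "a"],
    "c": ["a"],
}
tree = build_spanning_tree(network, "R")
```

Each process only reads its own input and the messages it receives, yet the union of the local `parent` outputs forms a global structure: a spanning tree rooted at R.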

Classical problems
– Data exchanges: Routing, Broadcast, PIF, …
– Agreement: Consensus, Leader Election, Atomic Register, …
– Self-organization: Spanning Tree, Clustering
– Resource allocation: Mutual Exclusion, L-Exclusion, K-out-of-L-Exclusion, …

Performance Evaluation
#Messages
– O(#Processes)
Volume (in bits)
– Polynomial in #Processes
Time complexity (in rounds)
– O(Diameter)
Local space (in bits)
– O(Degree)
There are efficient solutions for most of the classical problems!
… assuming the system is fault-free

Challenges
Modern distributed systems are large-scale and made of cheap heterogeneous units, e.g.
– Internet (10 billion connected machines in 2016), Internet of Things
– Wireless Sensor Networks: message losses due to the radio medium, process crashes due to limited batteries
⇒ High probability of faults
⇒ Human intervention impossible
⇒ Need for fault-tolerant distributed algorithms

Fischer, Lynch, and Paterson, 1985
“Deterministic consensus cannot be solved in an asynchronous distributed system, even if at most one process is faulty” (no information about the fault)
Even if
– the communications are reliable
– the network is fully connected

Consensus
Input in {0,1}
Output in {0,1}
– Agreement
– Termination (for all correct processes)
– Integrity (1 write)
– Validity
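The four properties can be made concrete with a small checker that inspects one finished execution. This is a hedged sketch: the function name `check_consensus` and the data layout are hypothetical, and validity is taken in its usual form (any decided value was proposed by some process), which the slides do not spell out.

```python
def check_consensus(inputs, decisions):
    """Check one finished execution against the consensus properties.

    inputs:    dict process -> proposed bit (0 or 1).
    decisions: dict process -> list of values written by that process;
               only correct (non-crashed) processes appear as keys.
    Returns a dict property name -> bool.
    """
    decided = [vals[0] for vals in decisions.values() if vals]
    return {
        # Agreement: no two processes decide differently.
        "agreement": len(set(decided)) <= 1,
        # Termination: every correct process eventually decides.
        "termination": all(len(vals) >= 1 for vals in decisions.values()),
        # Integrity: each process writes its output at most once.
        "integrity": all(len(vals) <= 1 for vals in decisions.values()),
        # Validity (assumed definition): decided values were proposed.
        "validity": all(v in inputs.values() for v in decided),
    }

proposals = {"p1": 1, "p2": 1, "p3": 1}
good = check_consensus(proposals, {"p1": [1], "p2": [1], "p3": [1]})
bad = check_consensus(proposals, {"p1": [0], "p2": [1], "p3": []})
```

In the second execution, `p1` decides a value nobody proposed, `p1` and `p2` disagree, and `p3` never decides, so only integrity still holds.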

Strength of the result
Most distributed problems can be reduced to consensus, e.g.
– Atomic broadcast
– Atomic register
– Replicated state machine
– …

Circumventing the impossibility
Relax the hypotheses, e.g.,
– Initial crashes
– Partial synchrony assumptions
– Add information about the failures (failure detectors)
Relax the problem to be solved
– Probabilistic consensus
– Self-stabilization

Self-Stabilization [Dijkstra, 1974]
A versatile technique to tolerate arbitrary transient failures

Transient Failures
– Location: node or link
– Duration: finite
– Frequency: low
e.g.
– Node: memory corruption
– Link: message losses, message corruption, message duplication, message creation, reordering

BFS Spanning Tree [Huang & Chen, 1992]
[Figure: animated example of the BFS spanning tree construction on a network rooted at R]
In case of transient faults?
[Figure: animated example of the same network recovering after transient faults]

Definition: Closure + Convergence + Correctness
[Figure: the states of the system are split into illegitimate and legitimate states; convergence leads from illegitimate states to legitimate states; closure and correctness hold among legitimate states]
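Convergence and closure can be observed on a toy simulation. The sketch below is not the algorithm of [Huang & Chen, 1992]; it assumes a simplified rule in a shared-memory style model (the root sets its distance to 0, every other process sets it to 1 plus the minimum distance among its neighbors), synchronous rounds, and non-negative corrupted values. All function names are hypothetical.

```python
import random

def legitimate(neighbors, root, dist):
    """Legitimate state: every dist value is the true BFS distance to root."""
    true_dist, frontier = {root: 0}, [root]
    while frontier:
        nxt = []
        for p in frontier:
            for q in neighbors[p]:
                if q not in true_dist:
                    true_dist[q] = true_dist[p] + 1
                    nxt.append(q)
        frontier = nxt
    return dist == true_dist

def step(neighbors, root, dist):
    """One synchronous round: every process re-evaluates its local rule."""
    return {p: 0 if p == root else 1 + min(dist[q] for q in neighbors[p])
            for p in neighbors}

def stabilize(neighbors, root, dist, max_rounds=1000):
    """Run rounds until a legitimate state is reached; return (state, rounds)."""
    rounds = 0
    while not legitimate(neighbors, root, dist):
        if rounds == max_rounds:
            raise RuntimeError("no convergence within max_rounds")
        dist = step(neighbors, root, dist)
        rounds += 1
    return dist, rounds

network = {
    "R": ["a", "b"],
    "a": ["R", "b", "c"],
    "b": ["R", "a"],
    "c": ["a"],
}
rng = random.Random(0)
corrupted = {p: rng.randrange(10) for p in network}   # arbitrary initial state
final, rounds = stabilize(network, "R", corrupted)
still_legitimate = step(network, "R", final) == final  # closure check
```

Convergence is the fact that `stabilize` returns from an arbitrary starting state; closure is the fact that one more round leaves the legitimate state unchanged.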

Advantages of Self-Stabilization
Tolerate transient faults
Lightweight
– Low overhead
No initialization
– Large-scale networks
– Self-organization in wireless sensor networks
Tolerate (detectable) topological changes
Easy to compose:
– Collateral composition of A and B: A and B run in parallel, and B does not write into A's variables
Example
– Compose spanning tree construction and node counting along a tree

Composition: Node Counting
[Figure: animated example of node counting along a tree rooted at R; each node is labelled with a pair of values that is updated step by step]

Composition: Spanning Tree + Node Counting
[Figure: animated example of the spanning tree construction and node counting running together on a network rooted at R]
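The counting layer of such a composition can be sketched as follows. This is an illustrative assumption rather than the protocol on the slides: each process repeatedly sets its counter to 1 plus the sum of its children's counters, the tree layer's `parent` pointers are read but never written, and rounds are synchronous. The names `count_step` and `run_counting` are hypothetical.

```python
def count_step(parent, count):
    """One synchronous round of node counting along the tree.

    parent: dict process -> parent, the output of the tree layer
            (read-only here; the root is its own parent).
    count:  dict process -> current, possibly corrupted, subtree size.
    """
    new_count = {}
    for p in parent:
        children = [q for q in parent if parent[q] == p and q != p]
        new_count[p] = 1 + sum(count[q] for q in children)
    return new_count

def run_counting(parent, count):
    """Iterate until no process changes its value; return the fixed point."""
    while True:
        new_count = count_step(parent, count)
        if new_count == count:
            return count
        count = new_count

# Tree layer already stabilized; counting layer starts from garbage values.
parent = {"R": "R", "a": "R", "b": "R", "c": "a"}
corrupted = {"R": 7, "a": 0, "b": 42, "c": 3}
sizes = run_counting(parent, corrupted)
```

Because the counting layer never writes `parent`, it cannot disturb the tree layer; once the tree has stabilized, the counters are corrected from the leaves upward and the root ends up holding the number of processes.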

Drawbacks of Self-Stabilization
Temporary loss of safety
– Goal: minimize the stabilization time
– Stronger forms of self-stabilization: Fault-Containment [Ghosh et al., 1996], Superstabilization [Dolev et al., 1997], Safe Convergence [Kakugawa et al., 2002], …
No local detection of stabilization
– Permanent local checks
Overhead

Performance Evaluation
Time complexity
– Mainly, the stabilization time
Memory requirement
Overhead (Algo_Self / OptAlgo_Safe)
Required knowledge (local vs. global)

Competitive Self-Stabilizing k-Clustering
[Datta, Devismes, Heurtefeux, Larmore, Rivierre, ICDCS’2012]

k-Clustering
[Figure: example of a k-clustering with k = 2; distances inside the clusters are labelled ≤ k]
Goal: minimize the number of clusters
Finding an optimal k-clustering of an arbitrary graph is NP-hard [Garey & Johnson, 1979]
Contribution: self-stabilizing k-clustering of bounded size
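Checking a candidate set of clusterheads is easy even though finding the smallest one is hard. The sketch below assumes the usual definition, in which every node must lie within k hops of at least one clusterhead; the function name `is_k_clustering` and the example path graph are hypothetical.

```python
from collections import deque

def is_k_clustering(neighbors, clusterheads, k):
    """True if every node is within k hops of at least one clusterhead."""
    hops = {h: 0 for h in clusterheads}
    queue = deque(clusterheads)
    while queue:                       # multi-source BFS from all clusterheads
        p = queue.popleft()
        if hops[p] == k:
            continue                   # do not explore beyond k hops
        for q in neighbors[p]:
            if q not in hops:
                hops[q] = hops[p] + 1
                queue.append(q)
    return len(hops) == len(neighbors)

# Path 0 - 1 - 2 - 3 - 4, with k = 2 as in the slide example.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
center_ok = is_k_clustering(path, [2], 2)   # node 2 reaches everyone in 2 hops
end_ok = is_k_clustering(path, [0], 2)      # nodes 3 and 4 are too far from 0
```

On this path a single clusterhead at the center suffices for k = 2, which is also the minimum possible number of clusters.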

Roadmap
– Solution for tree networks
– Generalization to arbitrary connected networks
– Study of special cases: Unit Disk Graphs (UDG), Approximate Disk Graphs (ADG)

k-Clusterheads Selection: α
[Figure: animated example of the clusterhead selection on a tree, based on the values α]

Sum Up
In trees:
– O(log n + log k) space
– O(n) rounds
– #clusterheads: optimal
In arbitrary networks?

Arbitrary Networks
Any spanning tree construction, e.g., [Huang & Chen, 1992], followed by the tree k-clustering
– O(log n + log k) space
– O(n) rounds
– #clusterheads: not optimal, but bounded

In Unit Disk Graphs (UDG)?
[Figure: unit disk graph, with links of length at most 1]

Result in UDG
k+O(1)-competitive if the tree k-clustering is computed on an MIS tree
An algorithm is X-competitive if it builds a k-clustering of size at most X times the smallest possible number of k-clusters: |Clr| ≤ X·|Min|

MIS Tree
MIS: Maximal Independent Set
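As a reminder of what a maximal independent set is, here is a centralized sketch. It is not the self-stabilizing MIS tree construction of the talk: `greedy_mis` simply scans the nodes in sorted order, and `is_mis` checks independence (no two members are neighbors) and maximality (every non-member has a member among its neighbors). Both names are hypothetical.

```python
def greedy_mis(neighbors):
    """Sequential greedy MIS: scan nodes in sorted id order."""
    mis = set()
    for p in sorted(neighbors):
        if not any(q in mis for q in neighbors[p]):
            mis.add(p)                 # no neighbor selected yet: take p
    return mis

def is_mis(neighbors, candidate):
    """Check that candidate is independent and cannot be extended."""
    independent = all(q not in candidate
                      for p in candidate for q in neighbors[p])
    maximal = all(p in candidate or any(q in candidate for q in neighbors[p])
                  for p in neighbors)
    return independent and maximal

# Path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
selected = greedy_mis(path)
```

Note that maximal does not mean maximum: on this path, {0, 3} is also a maximal independent set although it is smaller than the greedy result.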

k-clustering vs MIS
(|Clr| - 1) k/2 ≤ |MIS| - 1

MIS vs CLR_opt
Let C be any cluster of CLR_opt
Let I be any independent set of C
UDG: ∀ p,q ∊ I with p ≠ q, d(p,q) > 1
[Figure: the cluster C, with a length labelled k]

Result

In Approximate Disk Graphs
7.2552λ²k + O(1)-competitiveness

Conclusion
Self-stabilization is fun!

Bibliography
– Stéphane Devismes, Franck Petit, and Vincent Villain. Autour de l'Auto-stabilisation. Partie I : Techniques généralisant l'approche. Technique et Science Informatiques (TSI), Vol. 30(7).
– Stéphane Devismes, Franck Petit, and Vincent Villain. Autour de l'Auto-stabilisation. Partie II : Techniques spécialisant l'approche. Technique et Science Informatiques (TSI), Vol. 30(7).
– Ajoy K. Datta, Lawrence L. Larmore, Stéphane Devismes, Karel Heurtefeux, and Yvan Rivierre. Self-Stabilizing Small k-Dominating Sets. International Journal of Networking and Computing, Volume 3, Issue 1.
– Ajoy K. Datta, Stéphane Devismes, Karel Heurtefeux, Lawrence L. Larmore, and Yvan Rivierre. Competitive Self-Stabilizing k-Clustering. In Proceedings of the 32nd International Conference on Distributed Computing Systems (ICDCS'12), June 18-21, 2012, Macau, China.
– Ajoy K. Datta, Stéphane Devismes, and Lawrence L. Larmore. A Self-Stabilizing O(n)-Round k-Clustering Algorithm. In Proceedings of SRDS'2009, 28th International Symposium on Reliable Distributed Systems, September 27-30, 2009, Niagara Falls, New York, USA.

Thank you!