Presented By: Md Amjad Hossain

Presentation transcript:

Performance analysis of Chandra and Toueg's first consensus algorithm with the S failure detector
Presented by: Md Amjad Hossain
12/6/2018, Advanced Operating Systems

Outline
- Introduction
- Overview of the Algorithm
- Experimental Setup
- Experimental Results
- Conclusion and Future Work
- Code Defense

Introduction
Consensus problem:
- Each correct process proposes a value.
- All correct processes decide on the same value, and that value must be one that some process proposed.
System model: asynchronous.
- To decide, each process waits to receive the values of all other processes, but some of them may have crashed.
- An asynchronous system has no bound on message delay, so a slow process cannot be told apart from a crashed one. A failure detector is therefore needed that can suspect processes of having crashed.
Failure detector S:
- Strong completeness: eventually every process that crashes is permanently suspected by every correct process.
- Weak accuracy: some correct process is never suspected.
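The two properties of S can be illustrated with a small check over one snapshot of the suspect sets. This is only a sketch: the real properties are statements about a whole run ("eventually", "never"), while the code below looks at a single final state. The function names, the list-of-sets layout, and the choice of which process has crashed are assumptions made for the illustration.

```python
def strong_completeness(suspects, crashed):
    """True if every crashed process is suspected by every correct process.

    suspects[p] is the set Dp of process p (hypothetical layout);
    crashed is the set of processes assumed to have crashed.
    """
    correct = [p for p in range(len(suspects)) if p not in crashed]
    return all(c in suspects[p] for p in correct for c in crashed)


def weak_accuracy(suspects, crashed):
    """True if at least one correct process is suspected by no correct process."""
    correct = [p for p in range(len(suspects)) if p not in crashed]
    return any(all(q not in suspects[p] for p in correct) for q in correct)


# Illustrative snapshot: four processes, process 3 assumed crashed.
snapshot = [{2, 3}, {3}, {0, 3}, {0, 2}]
print(strong_completeness(snapshot, crashed={3}))
print(weak_accuracy(snapshot, crashed={3}))
```

In this snapshot every correct process suspects process 3, and process 1 appears in nobody's set, so both checks pass even though processes 0 and 2 are suspected wrongly. S allows such false suspicions as long as one correct process stays unsuspected.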

Overview of the Algorithm
- Tolerates up to n-1 crashes.
- Satisfies uniform agreement.
- Runs in three phases.
- Relies on weak accuracy: some correct process is never suspected.

Overview of the Algorithm
- Never-suspected process: 1
- Completely connected topology
In each round:
- Send new values to all.
- Wait to receive values from all correct processes.
Example with four processes (L marks an entry that holds no value yet; Decide is still empty in every snapshot):
- Process 0, Dp = {2, 3}: Vp goes {L, L, L, L} -> {2, L, L, L} -> {2, 0, L, L}. It sends (2, L, L, L).
- Process 1, Dp = {3}: Vp goes {L, L, L, L} -> {L, 0, L, L}. It sends (L, 0, L, L).
- Process 2, Dp = {0, 3}: Vp goes {L, L, L, L} -> {L, L, 3, L}.
- Process 3, Dp = {0, 2}: Vp goes {L, L, L, L} -> {L, L, L, 2}.
Process 0 sends its values to all and waits for a value only from Process 1, because it suspects Processes 2 and 3. Once it receives the value from Process 1, it moves to the next round.
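The example above can be replayed with a small lockstep simulation of the three phases. This is a sketch, not the presenter's C++ implementation. It assumes that no process actually crashes, that the suspect sets stay fixed, and that each process uses only the messages of processes outside its Dp (a legal asynchronous run in which the other messages arrive too late). `None` stands in for the L entries, and the function name is hypothetical.

```python
BOTTOM = None  # stands in for the "L" (no value yet) entries on the slide


def run_consensus(proposals, suspects):
    """Lockstep sketch of the three phases with fixed suspect sets.

    proposals[p] is the value proposed by process p; suspects[p] is its Dp.
    Returns the list of decisions, one per process.
    """
    n = len(proposals)
    # Process p waits only for processes that are not in its Dp.
    heard_by = [[q for q in range(n) if q not in suspects[p]] for p in range(n)]
    V = [[BOTTOM] * n for _ in range(n)]
    for p in range(n):
        V[p][p] = proposals[p]
    delta = [row[:] for row in V]  # values that are new and still to be sent

    # Phase 1: n-1 rounds of sending new values and merging received ones.
    for _ in range(n - 1):
        sent = [row[:] for row in delta]  # messages sent in this round
        for p in range(n):
            new_delta = [BOTTOM] * n
            for k in range(n):
                if V[p][k] is not BOTTOM:
                    continue
                for q in heard_by[p]:
                    if sent[q][k] is not BOTTOM:
                        V[p][k] = sent[q][k]
                        new_delta[k] = sent[q][k]
                        break
            delta[p] = new_delta

    # Phase 2: exchange full vectors; drop any entry that a peer is missing.
    sent = [row[:] for row in V]
    for p in range(n):
        for k in range(n):
            if any(sent[q][k] is BOTTOM for q in heard_by[p]):
                V[p][k] = BOTTOM

    # Phase 3: decide the first entry that holds a value.
    return [next(v for v in V[p] if v is not BOTTOM) for p in range(n)]


decisions = run_consensus(
    proposals=[2, 0, 3, 2],
    suspects=[{2, 3}, {3}, {0, 3}, {0, 2}],
)
print(decisions)  # [2, 2, 2, 2]
```

With these inputs only Process 3 ever holds its own value, because every other process suspects it. Phase 2 removes that entry from Process 3's vector, so all four processes end with the same vector and decide 2.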

Experimental Setup
Detector module:
- Dp is populated randomly.
- The size of Dp is either chosen randomly or fixed at N/2.
Topology: completely connected graph.
Three quantities are measured:
- For both Dp sizes, the average number of rounds needed to receive all processes' values, for different numbers of processes in the system. Ran for N = 4, 8, 12, ..., 60.
- For each round, the percentage of processes that have received values from all other processes, for N = 8 and N = 60.
- The average waiting time of each process in Phase 1, for N = 8 and N = 60. Waiting time is the number of times a process is selected to execute but has to wait to receive the values of all processes that are not in its Dp.
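One way to populate Dp randomly for both size settings is sketched below. The slides do not say how the detector module draws its sets, so the details are assumptions: one process (`trusted`) is kept out of every Dp so that weak accuracy holds, a process never suspects itself, and the function and parameter names are hypothetical.

```python
import random


def make_suspect_sets(n, trusted, fixed_size=None, rng=None):
    """Draw a random Dp for each of n processes.

    trusted    -- the process that nobody may suspect (assumption, keeps weak accuracy)
    fixed_size -- size of every Dp (e.g. n // 2); if None, each size is random
    rng        -- optional random.Random instance, for repeatable experiments
    """
    rng = rng or random.Random()
    suspect_sets = []
    for p in range(n):
        # A process suspects neither itself nor the trusted process.
        candidates = [q for q in range(n) if q != p and q != trusted]
        if fixed_size is None:
            size = rng.randint(0, len(candidates))
        else:
            size = min(fixed_size, len(candidates))
        suspect_sets.append(set(rng.sample(candidates, size)))
    return suspect_sets


# N/2-sized Dp for N = 8, with process 1 never suspected.
for p, dp in enumerate(make_suspect_sets(8, trusted=1, fixed_size=4, rng=random.Random(0))):
    print(p, sorted(dp))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when averaging over many values of N.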

Experimental Results
1) Average number of rounds needed to receive all processes' values
a) Randomly sized Dp
b) N/2-sized Dp. The Dp size is 30 in this case.
If each process's Dp size is close to N, say N-1 or N-2, then every process has to go through all rounds to see whether it can receive any value via the only correct process.

Experimental Results
Average size of the set Dp of crashed processes
For large values of N, the Dp size is slightly larger than N/2, so more rounds may be needed.

Experimental Results
2) Percentage of processes per round that received all others' values
- Because the topology is a completely connected graph, more nodes in the system means more ways to get other nodes' values.

Experimental Results
3) Average waiting time for processes in a round at Phase 1
This measures how well the selection of processes for execution is distributed.
- Waiting time is the number of times a process is selected to execute but has to wait to receive the values of all processes that are not in its Dp.
- Processes are selected randomly to execute actions. The waiting time also depends on the size of Dp for the processes.

Conclusion and Future Work
- With a balanced Dp size (close to N/2) for all processes in the system, every process can obtain all other processes' values in a small number of rounds: almost always <= 3 for N >= 8.
- A process can simply exit Phase 1 as soon as it has received all other processes' values, so this is a simple improvement to the algorithm.
- As future work, one can implement this improved algorithm.
- I considered only a completely connected topology. One can consider arbitrary topologies to see how the current and the improved algorithms behave.
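The round at which a process could leave Phase 1 early is the same quantity as in the first experiment: the first round in which its vector has no empty entry. The sketch below only measures that round under the same simplifying assumptions as a lockstep run (fixed suspect sets, no actual crashes). It does not show that exiting early preserves agreement, and the function name is hypothetical.

```python
def rounds_to_full_vector(proposals, suspects):
    """Return, per process, the first Phase 1 round in which it holds every value.

    An entry is None if the process never collects all values within n-1 rounds.
    """
    n = len(proposals)
    # Process p waits only for processes that are not in its Dp.
    heard_by = [[q for q in range(n) if q not in suspects[p]] for p in range(n)]
    V = [[None] * n for _ in range(n)]
    for p in range(n):
        V[p][p] = proposals[p]
    delta = [row[:] for row in V]
    done_round = [None] * n

    for r in range(1, n):  # rounds 1 .. n-1
        sent = [row[:] for row in delta]  # messages sent in this round
        for p in range(n):
            new_delta = [None] * n
            for k in range(n):
                if V[p][k] is None:
                    for q in heard_by[p]:
                        if sent[q][k] is not None:
                            V[p][k] = new_delta[k] = sent[q][k]
                            break
            delta[p] = new_delta
            if done_round[p] is None and None not in V[p]:
                done_round[p] = r  # earliest round at which p could stop
    return done_round


print(rounds_to_full_vector([5, 6, 7], [{2}, set(), {0}]))  # [2, 1, 2]
```

Averaging the non-`None` entries over many random Dp assignments gives an estimate of the "rounds needed" figure. A `None` entry means some value never reached that process, so it would still run all n-1 rounds.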

Code Defense
- I implemented the algorithm in C++. The implementation was straightforward.
Points where I had some difficulty:
- Generating the crashed processes in Dp randomly, for a random size.
- Keeping a process waiting in Phase 1 and Phase 2. Solved by maintaining a boolean variable "waiting".
- Running the algorithm for different values of N in the same run of the program to collect experimental data. Solved by externally deleting all processes' input and output channels.

References
- Tushar Deepak Chandra and Sam Toueg. 1996. Unreliable failure detectors for reliable distributed systems. J. ACM 43, 2 (March 1996), 225-267. DOI=10.1145/226643.226647 http://doi.acm.org/10.1145/226643.226647
- Slides "Failure detectors T16" at http://vega.cs.kent.edu/~mikhail/classes/aos.f13/

Questions?