UPV - EHU An Evaluation of Communication-Optimal ◊P Algorithms. Mikel Larrea, Iratxe Soraluze, Roberto Cortiñas, Alberto Lafuente. Department of Computer Architecture and Technology, The University of the Basque Country.

Presentation transcript:

An Evaluation of Communication-Optimal ◊P Algorithms
Mikel Larrea, Iratxe Soraluze, Roberto Cortiñas, Alberto Lafuente
Department of Computer Architecture and Technology
The University of the Basque Country (UPV - EHU)

UPV - EHU, PDP 2008, Toulouse, France, February 13-15, 2008

Contents
Motivation
System Model
Communication Optimality
The Algorithms
Complexity Analysis
Performance Evaluation
Conclusion

Motivation
Unreliable failure detectors have been used to address Consensus and related problems in asynchronous crash-prone distributed systems
– Theory: impossibility/possibility results, minimality results
– Practice: efficient implementations and transformations
The ◊P class satisfies the following properties
– Strong Completeness: eventually every process that crashes is permanently suspected by every correct process
– Eventual Strong Accuracy: there is a time after which correct processes are not suspected by any correct process
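A common way to obtain these two properties under partial synchrony is a heartbeat monitor whose timeout grows after every erroneous suspicion. The Python sketch below is illustrative only and is not taken from the slides; the class name `AdaptiveMonitor`, its parameters, and the fixed timeout increment are assumptions made for the example.

```python
class AdaptiveMonitor:
    """Monitors one remote process using heartbeats and an adaptive timeout.

    Hypothetical sketch: names and the additive increment are assumptions.
    """

    def __init__(self, initial_timeout=1.0, increment=1.0):
        self.timeout = initial_timeout
        self.increment = increment
        self.last_heartbeat = 0.0
        self.suspected = False

    def on_heartbeat(self, now):
        """Record a heartbeat received at time `now`."""
        if self.suspected:
            # The suspicion was erroneous: refute it and wait longer next time.
            # Once the timeout exceeds the (unknown) delay bound, a correct
            # process is never suspected again (eventual strong accuracy).
            self.suspected = False
            self.timeout += self.increment
        self.last_heartbeat = now

    def check(self, now):
        """Return True if the monitored process is suspected at time `now`."""
        if now - self.last_heartbeat > self.timeout:
            # A crashed process sends no more heartbeats, so this suspicion
            # is never refuted (strong completeness).
            self.suspected = True
        return self.suspected
```

With explicit timestamps the behaviour is easy to trace: a late heartbeat clears the suspicion and lengthens the timeout, while silence keeps the process suspected.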

Motivation: Implementing ◊P Efficiently
Communication efficiency
– Number of links used forever
– Periodic communication cost
– Non communication-efficient algorithms
– Communication-efficient algorithms (ring-based)
– Communication-optimal algorithms (ring-based)
Sporadic communication overhead
– Number of messages to manage a suspicion
Quality of Service
– Query accuracy probability
– Crash detection latency

System Model
Finite set of n processes Π = {p1, p2, ..., pn} that communicate only by message-passing
Every pair of processes is connected by two unidirectional and reliable communication links, one in each direction
Processes can fail by crashing. Once a process crashes, it does not recover
– Up to n-1 processes may crash
– C is the (unknown) number of correct processes
Processes are arranged in a logical ring
Partially synchronous system

Communication Optimality
[Figure: six processes, p1 to p6, arranged in a logical ring]
A ring arrangement of processes

Communication Optimality
[Figure: six processes, p1 to p6, arranged in a logical ring]
Communication-efficient algorithms: n links are used forever

Communication Optimality
[Figure: six processes, p1 to p6, arranged in a logical ring]
Communication-optimal algorithms: C links are used forever
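The "C links" figure can be illustrated by computing which ring links remain in use once the ring has stabilized. The following Python sketch assumes that each correct process ends up communicating periodically only with its nearest correct successor; the function name `stable_ring_links`, the link direction, and the example crash set are assumptions for illustration, not details given in the slides.

```python
def stable_ring_links(n, crashed):
    """Return the links used forever in a stabilized ring of processes 1..n.

    Assumption: every correct process talks only to its nearest correct
    successor, so crashed processes are skipped entirely.
    """
    correct = [p for p in range(1, n + 1) if p not in crashed]
    links = set()
    for i, p in enumerate(correct):
        successor = correct[(i + 1) % len(correct)]
        if successor != p:
            # A lone correct process has nobody left to talk to.
            links.add((p, successor))
    return links


# Six processes with p3, p5 and p6 crashed: C = 3 correct processes remain.
print(sorted(stable_ring_links(6, {3, 5, 6})))
```

With no crashes the same function yields n links, so in this model C and n coincide only when every process is correct.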

The Algorithms
We have implemented several ring-based communication-optimal ◊P algorithms
Algorithms are based on reporting failure suspicions (and suspicion refutations)
Three communication patterns
– Algorithm 1: based on Reliable Broadcast (RBcast), a communication primitive guaranteeing that all correct processes deliver the same set of messages. This set includes at least all messages broadcast by correct processes
– Algorithm 2: based on one-to-one communication
– Algorithm 3: based on one-to-all communication

The Algorithms
Algorithm 1: RBcast-based
[Figure: six processes, p1 to p6, arranged in a logical ring]
O(n^2) messages required to communicate a suspicion
Low crash detection latency

The Algorithms
Algorithm 2: one-to-one based
[Figure: six processes, p1 to p6, arranged in a logical ring; Suspected_1 = {p3, p5, p6}]
O(n) messages required to communicate a suspicion
High crash detection latency

The Algorithms
Algorithm 3: one-to-all based
[Figure: six processes, p1 to p6, arranged in a logical ring; Suspected_1 = {p3, p5, p6}]
O(n) messages required to communicate a suspicion
Low crash detection latency
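The trade-off between the three patterns can be made concrete with a simple cost model. The Python sketch below is a rough illustration under stated assumptions, not the analysis from the presentation: it assumes RBcast is implemented by having every process relay a message to all others on first delivery, that the one-to-one pattern forwards the suspicion hop by hop around the ring, and that the one-to-all pattern sends it directly to every other process. The function name `suspicion_cost` is hypothetical.

```python
def suspicion_cost(n):
    """Return {pattern: (messages, hops)} for one suspicion among n processes.

    Assumed model (not from the slides):
      - rbcast: each of the n processes relays to the other n - 1
      - one_to_one: the suspicion travels around the ring, one hop at a time
      - one_to_all: the suspecting process sends directly to the other n - 1
    Hops approximate crash detection latency in a failure-free propagation.
    """
    return {
        "rbcast": (n * (n - 1), 1),
        "one_to_one": (n - 1, n - 1),
        "one_to_all": (n - 1, 1),
    }


for pattern, (messages, hops) in suspicion_cost(6).items():
    print(f"{pattern}: {messages} messages, {hops} hop(s)")
```

Under this model the one-to-one and one-to-all patterns both need a number of messages linear in n, but only the one-to-one pattern pays for it with a latency that also grows with n.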

Complexity Analysis

Performance Evaluation: Query Accuracy

Performance Evaluation: Crash Detection Latency

Conclusion
We have presented several communication-optimal algorithms implementing ◊P
Which to use: Algorithm 2 or Algorithm 3?
Best choice: hybrid approach
– Initially (erroneous suspicions), use Algorithm 2
– When the ring has probably stabilized, switch to Algorithm 3
– Issues: What about crashes during stabilization? How do we know (guess) that the system has stabilized?
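One way to picture the hybrid approach is as a policy object that picks the communication pattern for the next suspicion. The Python sketch below uses a purely hypothetical heuristic that the slides do not propose: it guesses that the ring has stabilized once no suspicion has been refuted for a configurable quiet period. The class name `HybridPolicy` and the `quiet_period` parameter are assumptions, and the sketch does not address crashes that occur during stabilization.

```python
class HybridPolicy:
    """Chooses between the one-to-one and one-to-all patterns.

    Hypothetical heuristic: treat the ring as stabilized when no erroneous
    suspicion has been refuted during the last `quiet_period` time units.
    """

    def __init__(self, quiet_period):
        self.quiet_period = quiet_period
        self.last_refutation = 0.0

    def on_refutation(self, now):
        """Record that an erroneous suspicion was refuted at time `now`."""
        self.last_refutation = now

    def pattern(self, now):
        """Return the pattern to use for a suspicion raised at time `now`."""
        if now - self.last_refutation >= self.quiet_period:
            return "one-to-all"   # Algorithm 3: ring has probably stabilized
        return "one-to-one"       # Algorithm 2: erroneous suspicions still likely
```

Any such rule is only a guess, which is exactly the open issue raised in the slide: a refutation arriving after the switch simply moves the policy back to the one-to-one pattern.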

Questions?

Future Work
Current scenario:
– Local area network settings
– Uniform communication delays
– 1-to-all communication supported easily (Ethernet, WiFi)
Future scenario:
– Wide area network settings
– Non-uniform communication delays
– 1-to-all communication not supported
– Local communication patterns required
For periodic messages (heartbeats): ring
For sporadic messages (suspicions and refutations)