Outline
– Why distributed computing?
– Atomic Broadcast
– The Atom system
– Relevance for e-textiles
– What’s next?
– Q&A

Why Distributed Computing?
– Spread and balance the computational load of applications
– Solve bigger problems
– Deal with problems locally instead of centralizing all the data

Example: Space filtering vs. raw consensus
– Acoustic Beam Forming: master collects information from slaves and decides according to the relevance of data
– Consensus: no master, all processes decide upon one common value

Atomic Broadcast: Definition (1)
– Atomic Broadcast = the same set of messages is delivered by all the processes in the same order
– Consensus = all processes decide upon one common value among those proposed

Atomic Broadcast: Definition (2)
– Validity: If a correct process broadcasts a message m, it will eventually deliver m
– Uniform agreement: If a process delivers a message m, then every correct process will eventually deliver m
– Uniform integrity: Every message m is delivered at most once, and only if it was reliably broadcast by sender(m)
– Total order: If two correct processes p and q both deliver two messages m and m’, then p delivers m before m’ if and only if q delivers m before m’
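
The integrity and total order properties above can be checked mechanically against recorded delivery logs. The following is a minimal sketch, not part of the Atom system: the function names `satisfies_total_order` and `satisfies_integrity` and the log format (a dict from process name to its ordered list of delivered messages) are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_total_order(logs):
    """logs maps a process name to the list of messages it delivered, in order.

    Returns True if every pair of processes delivered their common
    messages in the same relative order.
    """
    for p, q in combinations(logs, 2):
        pos_q = {m: i for i, m in enumerate(logs[q])}
        # Messages delivered by both p and q, listed in p's delivery order.
        common = [m for m in logs[p] if m in pos_q]
        for m1, m2 in combinations(common, 2):
            # m1 comes before m2 at p, so it must not come after m2 at q.
            if pos_q[m1] > pos_q[m2]:
                return False
    return True


def satisfies_integrity(logs, broadcast):
    """Each message is delivered at most once, and only if it was broadcast."""
    for delivered in logs.values():
        if len(delivered) != len(set(delivered)):
            return False  # some message was delivered twice
        if not set(delivered) <= set(broadcast):
            return False  # some delivered message was never broadcast
    return True


# Hypothetical delivery logs for two processes p and q.
good = {"p": ["m1", "m2", "m3"], "q": ["m1", "m2", "m3"]}
bad = {"p": ["m1", "m2"], "q": ["m2", "m1"]}
```

Here `good` passes both checks, while `bad` violates total order because p and q disagree on the relative order of m1 and m2. Validity and agreement are liveness properties ("eventually"), so they cannot be refuted from a finite log in the same way.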

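Atomic Broadcast: Bad News
– Impossible to achieve deterministically in a fully asynchronous system if even one process may crash [Fischer, Lynch, Paterson 85]
– The result is stated for Consensus; it carries over because Atomic Broadcast and Consensus are equivalent problems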

Atomic Broadcast: Good News
– Can be done using unreliable failure detectors
– Based on a Consensus algorithm described in [Chandra, Toueg 96]
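
The idea of building Atomic Broadcast on top of Consensus can be sketched as follows: each process repeatedly proposes the set of messages it has received but not yet atomically delivered, and every process delivers the decided batch in the same deterministic order. This is a single-machine illustration only, not Atom's code. The `ConsensusOracle` class is a hypothetical stand-in for a real failure-detector-based consensus protocol (here the first proposal for an instance simply wins), and `Process`, `r_deliver`, and `step` are assumed names.

```python
class ConsensusOracle:
    """Stand-in for a real consensus protocol (an assumption of this sketch):
    the first value proposed for instance k becomes the decision for k."""

    def __init__(self):
        self.decisions = {}

    def propose(self, k, value):
        # setdefault stores the first proposal and returns it to everyone.
        return self.decisions.setdefault(k, value)


class Process:
    def __init__(self, name, oracle):
        self.name = name
        self.oracle = oracle
        self.r_delivered = set()  # messages received via reliable broadcast
        self.a_delivered = []     # messages atomically delivered, in order
        self.k = 0                # index of the last consensus instance run

    def r_deliver(self, m):
        self.r_delivered.add(m)

    def step(self):
        """Run one consensus instance on the not-yet-delivered messages."""
        self.k += 1
        undelivered = frozenset(self.r_delivered - set(self.a_delivered))
        batch = self.oracle.propose(self.k, undelivered)
        # Every process sorts the decided batch the same way, so the
        # delivery order is identical everywhere.
        for m in sorted(batch):
            if m not in self.a_delivered:
                self.a_delivered.append(m)


oracle = ConsensusOracle()
p, q = Process("p", oracle), Process("q", oracle)

# Messages reach p and q at different times and in different orders.
p.r_deliver("b"); p.r_deliver("a")
q.r_deliver("a")
p.step(); q.step()      # instance 1: p's proposal {a, b} is decided

q.r_deliver("b"); q.r_deliver("c")
q.step(); p.step()      # instance 2: q's proposal {c} is decided
```

After these steps both `p.a_delivered` and `q.a_delivered` equal `["a", "b", "c"]`, even though p never received c through reliable broadcast before the decision. In a real system the oracle would be replaced by a distributed consensus protocol driven by a failure detector.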

Atom
– Open source Atomic Broadcast system

Atom
[Architecture diagram. Component labels: Producer, Consumer, A-broadcast, A-deliver, AB task 1, AB task 2, AB task 3, do_Consensus, do_decide, One_run, RB (R-broadcast), FD (trust, suspect), start, cancel]

Relevance to E-textiles
– Synchronization of data
– Coordination of decisions and actions
– Light-weight process
– Buffer sizes can be predicted

What’s Next?
– Scalability is a problem for classic fault-tolerant distributed algorithms
– Bimodal Multicast [Ken Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, Yaron Minsky – 1998]
  – Gossip protocol
  – Relaxes the “strong” reliability guarantees, replacing them with probabilistic guarantees
  – Converges to “strong” reliability in the absence of failures
  – Scalable with steady throughput
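
To give a feel for why gossip scales, here is a small simulation of plain push gossip: in every round, each node that already holds the message forwards it to one peer chosen uniformly at random. This is a generic illustration, not the Bimodal Multicast protocol itself, and the function name `gossip_rounds` and its parameters are assumptions of the sketch.

```python
import random


def gossip_rounds(n, seed=0, max_rounds=1000):
    """Return the number of rounds until all n nodes hold the message.

    Node 0 starts with the message. Each round, every informed node
    pushes it to one uniformly random node (possibly itself or a node
    that is already informed).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {0}
    for rounds in range(1, max_rounds + 1):
        newly_informed = {rng.randrange(n) for _ in range(len(informed))}
        informed |= newly_informed
        if len(informed) == n:
            return rounds
    raise RuntimeError("message did not reach every node within max_rounds")


rounds = gossip_rounds(64, seed=1)
```

Because the informed set can at most double per round, reaching 64 nodes takes at least 6 rounds; in practice the count grows roughly logarithmically with the number of nodes, which is the intuition behind the scalability claim above.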

Questions …