Distributed Programming for Dummies: A Shifting Transformation Technique. Carole Delporte-Gallet, Hugues Fauconnier, Rachid Guerraoui, Bastian Pochon.


Agenda: Motivation; Failure patterns; Interactive Consistency problem; Transformation algorithm; Performance; Conclusions

Motivation: Distributed programming is not easy.

Motivation: Provide programming abstractions. Hide low-level detail. Allow working on a strong model. Obtain protocols for weaker models automatically.

Models: Distributed programming semantics and failure patterns

Processes: We have n distributed processes. All processes are directly linked. The world is synchronized. In each round, each process: 1. Receives an external input value. 2. Sends a message to all processes. 3. Receives all messages sent to it. 4. Performs local computation and changes its state.
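The four steps of a round can be sketched in Python as follows. This is only an illustration under stated assumptions: `run_round`, `send` and `transition` are hypothetical names that do not appear in the slides, and the sketch covers a failure-free round only.

```python
def run_round(states, inputs, send, transition):
    """Execute one failure-free synchronous round for all n processes.

    states[i] and inputs[i] belong to process i. `send` and `transition`
    are hypothetical callbacks standing in for the protocol being run.
    """
    n = len(states)
    # Step 1 (receive an external input) is modelled by the `inputs` argument.
    # Step 2: every process sends one message to all processes.
    outgoing = [send(states[i], inputs[i]) for i in range(n)]
    # Step 3: with no failures, every process receives every message.
    inbox = list(outgoing)
    # Step 4: local computation and state change.
    return [transition(states[i], inputs[i], inbox) for i in range(n)]


# Usage: each process adds up every value broadcast in the round.
new_states = run_round(
    [0, 0, 0],
    [1, 2, 3],
    send=lambda state, value: value,
    transition=lambda state, value, msgs: state + sum(msgs),
)
```

With three processes and inputs 1, 2 and 3, every process ends the round in the same state, because all of them saw the same three messages.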

PSR: Perfectly Synchronized Round-based model. Processes can only have atomic failures: they are only allowed to crash/stop, and they can only crash when they are not in the middle of sending out a message.

Crash: Processes can only have crash failures. They are only allowed to crash/stop, but they can also crash in the middle of sending out a message, so upon a crash a message might reach only some of the other processes.

Omission: Processes can have crash failures and send-omission failures: they can send out a message to only a subset of processes in a given round.

General: Processes can have crash failures and general-omission failures: they can fail to send or receive a message to or from a subset of processes in a given round.

Failure models: PSR(n,t), Crash(n,t), Omission(n,t), General(n,t). We'd like to write protocols for PSR and run them in weaker failure models.
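One way to compare the four models is to ask which delivery patterns each one permits for a single broadcast. The sketch below is a simplification under stated assumptions: `pattern_allowed` is a hypothetical helper, it looks at the sender side only, and the receive omissions that distinguish General from Omission are not modelled.

```python
def pattern_allowed(model, n, recipients, sender_continues):
    """Return True if `model` permits this delivery pattern for one broadcast.

    n                -- number of intended recipients
    recipients       -- set of processes that actually got the message
    sender_continues -- True if the sender keeps running after this round
    """
    full = len(recipients) == n
    empty = len(recipients) == 0
    if model == "PSR":
        # Atomic failures: all or nothing, and "nothing" means a crash.
        return full or (empty and not sender_continues)
    if model == "Crash":
        # A partial broadcast is possible, but only while crashing.
        return full or not sender_continues
    if model in ("Omission", "General"):
        # Send omission: any subset may be reached and the sender may go on.
        return True
    raise ValueError(f"unknown model: {model}")


# A broadcast that reaches 2 of 4 processes, after which the sender stops:
partial_crash = {m: pattern_allowed(m, 4, {0, 1}, False)
                 for m in ("PSR", "Crash", "Omission", "General")}
```

Every pattern accepted by a stronger model is also accepted by the weaker ones, which is why a protocol written for PSR needs a transformation before it can run in the other three.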

Interactive Consistency: An agreement algorithm

Interactive Consistency: Synchronous world. We have n processors, each with a private value. We want all of the “good” processors to know the vector of all values of all the “good” processors. Let’s assume that faulty processors can only lie about their own value (or omit messages).

IC Algorithm: [Diagram: processes a, b, c, d; label A]

IC Algorithm, 1st step: Each client sends a “my value is p” message to all clients. [Diagram: processes a, b, c, d; values B, C, D]

IC Algorithm, 2nd step: Each client sends “x told me that y has the value of z; y told me that …”. [Diagram: processes a, b, c, d; values B, B(c), B(d); C, C(b), C(d); D, D(b), D(c)]

IC Algorithm, i-th step: Each client sends “x told me that y told me that z has the value of q; y told me that …”. [Diagram: processes a, b, c, d; values B, B(c), B(d), B(c(d)), …; C, C(b), C(d), …; D, D(b), D(c)]
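The labels that process a accumulates step by step can be enumerated mechanically. The sketch below assumes that each label is a chain of distinct processes other than a, written as on the slides (first process in upper case, the rest nested in parentheses); `rumor_labels` is a hypothetical helper, not part of the presentation.

```python
from itertools import permutations


def rumor_labels(others, depth):
    """List the labels known after `depth` steps.

    `others` holds the names of the processes other than the observer.
    Each label is a chain of distinct processes of length 1..depth.
    """
    labels = []
    for length in range(1, depth + 1):
        for chain in permutations(others, length):
            head, *rest = chain
            nested = ""
            # Build the nesting from the innermost process outwards.
            for name in reversed(rest):
                nested = f"({name}{nested})"
            labels.append(head.upper() + nested)
    return labels


after_two_steps = rumor_labels("bcd", 2)
```

For four processes this yields 3 labels after the first step, 9 after the second and 15 after the third, which illustrates how quickly the amount of relayed information grows.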

IC Algorithm, and faults? When a processor omits a message, we just assume NIL as its value. Example: NIL(b(d)) means “d said nothing about b’s value”.

IC Algorithm, deciding: Looking at all the “rumors” that a knows about the private value of b, we choose the rumor value if a single one exists, or NIL otherwise. If b is non-faulty, then we have B or NIL as its result. If b is faulty, then a and c will have the same value for it (a single value or NIL).
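The decision rule can be written in a few lines. The sketch below rests on one interpretation that the slide does not spell out: NIL placeholders are ignored, and a value is chosen only if exactly one distinct value remains. `decide` and `NIL` are illustrative names.

```python
NIL = None  # placeholder for "nothing was said" (assumed representation)


def decide(rumors):
    """Pick the value for one process from all rumors heard about it.

    Assumption: NIL entries are skipped, and the result is the rumor
    value only when a single distinct value was reported.
    """
    distinct = {value for value in rumors if value is not NIL}
    if len(distinct) == 1:
        return next(iter(distinct))
    return NIL


agreed = decide(["B", NIL, "B"])       # one distinct value
conflicting = decide(["B", "X", "B"])  # two distinct values
silent = decide([NIL, NIL])            # nothing but omissions
```

Under this reading, conflicting rumors and complete silence both lead to NIL, which matches the two outcomes listed on the slide.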

IC Algorithm: We need k+1 rounds for k faulty processes. We’re sending out a lot of messages.

PSR: Synchronous model: we are not going to do anything with this. Performance: automatically transforming a protocol from PSR to a weaker model is costly; we are going to deal only with time.

Why? IC costs t+1 rounds, so a PSR protocol of K rounds costs K(t+1) rounds. Optimizations of IC can do 2 rounds for failure-free runs. Now we get K rounds done in 2K+f rounds for f actual failures. We would like to get K+C rounds.

Transformation Algorithm

The algorithm: If a process realizes it is faulty in any way, it simulates a crash in PSR. We run IC algorithms in parallel, starting one in each round for each PSR round, so several IC instances can be running at the same time. Each process runs the algorithm of all processes to reconstruct the failure patterns.

The algorithm:
  for phase r do
    input := receiveInput()
    start IC instance r with input
    execute one round for all pending IC instances
    for each decided IC do
      update states, decision vector and failures list
      modify received message by faulty statuses
      simulate state transition for all processes
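The shifting idea in this loop, one new IC instance per phase with all pending instances advancing together, can be sketched as a scheduling simulation. The sketch assumes that every IC instance decides after a fixed number of rounds (`ic_rounds`), which is a simplification; `shifted_schedule` is a hypothetical name, and the actual IC messages and state updates are left out.

```python
def shifted_schedule(K, ic_rounds):
    """Return {instance: phase in which it decides} for K PSR rounds.

    Assumption: each IC instance needs exactly `ic_rounds` rounds.
    """
    if ic_rounds < 1:
        raise ValueError("ic_rounds must be at least 1")
    pending = {}      # instance number -> rounds executed so far
    decided_at = {}   # instance number -> phase of decision
    phase = 0
    while len(decided_at) < K:
        phase += 1
        if phase <= K:
            pending[phase] = 0            # start IC instance `phase`
        for instance in list(pending):    # one round for every pending IC
            pending[instance] += 1
            if pending[instance] == ic_rounds:
                decided_at[instance] = phase
                del pending[instance]
    return decided_at


schedule = shifted_schedule(K=3, ic_rounds=2)
```

Because the instances overlap, the last one decides in phase K + ic_rounds - 1 instead of K * ic_rounds, which is the saving the transformation aims for.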

Knowledge algorithm: Each process sends only its input value. The protocol executed on all other processes is known to it, so it can execute the protocols of other processes by knowing only their input values.
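Since every process knows the protocol, a decided vector of input values is enough to replay a whole PSR round locally. The following sketch makes several assumptions that are not in the slides: a NIL entry in the vector stands for a process treated as crashed, such a process sends nothing and keeps its old state, and `message` and `transition` are hypothetical callbacks describing the common protocol.

```python
NIL = None  # assumed marker for a process whose input was not decided


def replay_round(message, transition, states, input_vector):
    """Recompute every process's next state from the decided inputs only."""
    n = len(states)
    alive = [i for i in range(n) if input_vector[i] is not NIL]
    # Messages are derived locally; only the inputs travelled on the network.
    msgs = {i: message(states[i], input_vector[i]) for i in alive}
    return [
        transition(states[i], input_vector[i], msgs) if i in msgs else states[i]
        for i in range(n)
    ]


# Usage: each live process adds up the values sent by all live processes.
replayed = replay_round(
    message=lambda state, value: value,
    transition=lambda state, value, msgs: state + sum(msgs.values()),
    states=[0, 0, 0],
    input_vector=[1, NIL, 3],
)
```

Every correct process that runs this replay on the same decision vector obtains the same simulated states.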

Extension: No knowledge of other processes’ protocols. We now send out both the input and the message we would normally send out. This is done before we really know our own state, since we are running several rounds in parallel.

One small problem… Since we don’t know our state, how can we continue to the next round? We send out an extended set of states: all of the states we might come across in our next few rounds of computation. We compute the future in all of them and optimize as we get more messages.

State of the process: Until now, the input values did not depend on the state of the process. For a finite set of inputs, we can again use the same technique with an extended set of inputs.

Performance: Not real…

Number of rounds: We need K+f phases. The result of the first IC round takes f+2 phases. All of our rounds are at a 1-phase interval.
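To see what the K+f figure buys, it can be set against the two costs quoted earlier in the deck, K(t+1) for one full IC per PSR round and 2K+f for the optimized IC. The functions below simply encode those three expressions as stated on the slides; the function names and the sample numbers are illustrative.

```python
def naive_rounds(K, t):
    """One full IC of t+1 rounds for each of the K PSR rounds."""
    return K * (t + 1)


def early_deciding_rounds(K, f):
    """Optimized IC, as quoted on the slides: 2K+f rounds for f failures."""
    return 2 * K + f


def shifted_rounds(K, f):
    """Shifting transformation, as quoted on the slides: K+f phases."""
    return K + f


# Illustrative numbers: K = 10 PSR rounds, t = 3 tolerated, f = 1 actual failure.
costs = (naive_rounds(10, 3), early_deciding_rounds(10, 1), shifted_rounds(10, 1))
```

For these sample values the three costs are 40, 21 and 11 rounds.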

Size of messages: For the simple algorithm suggested: $n \log_2 |\mathit{Input}|$ per process, per round, per IC; this amount multiplied by the number of phases needed to decide an IC, per process, per phase.
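A small numeric check of these sizes may help. The sketch reads the per-phase figure as the per-round figure multiplied by the number of phases needed to decide an IC, and it assumes the unit is bits, since the slide uses base-2 logarithms; `bits_per_ic_round`, `bits_per_phase` and `ic_phases` are hypothetical names.

```python
import math


def bits_per_ic_round(n, input_count):
    """n * log2(|Input|): one input value per process in the vector."""
    return n * math.log2(input_count)


def bits_per_phase(n, input_count, ic_phases):
    """Per-phase size when `ic_phases` IC instances are pending at once."""
    return ic_phases * bits_per_ic_round(n, input_count)


# Illustrative numbers: 4 processes, 8 possible inputs, 3 phases per IC.
per_round = bits_per_ic_round(4, 8)
per_phase = bits_per_phase(4, 8, 3)
```

With these sample values a process sends 12 bits per round for a single IC and 36 bits per phase overall.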

Size of messages For the extended transformation: 2 n  possible states in a phase A coded state takes  =2log 2 |State|+(n+1)log 2 |Message| Message size is  n  2 n  Long…

Conclusions

Summary: We showed how to translate PSR into 3 different weaker models. We can try doing the same for the Byzantine model.