Timed Distributed System Models
A. Mok, CS 386C, 2014

System Attributes

1) Synchrony
   - Synchronous
   - Asynchronous
   - Mixed
2) Time source
   - Global clock / Local clocks
   - Real time / Logical time
3) Failure semantics
4) Communication framework
   - Message passing / Shared memory
   - Latency / Buffering
   - Addressing and routing
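
As a rough illustration of these axes, the sketch below encodes them as Python enums. The type and member names are my own labels rather than terminology fixed by the slides, and some axes (time source, failure semantics) are simplified.

```python
from enum import Enum, auto

class Synchrony(Enum):
    SYNCHRONOUS = auto()
    ASYNCHRONOUS = auto()
    MIXED = auto()

class TimeSource(Enum):
    GLOBAL_CLOCK = auto()    # one reference clock visible to all processes
    LOCAL_CLOCKS = auto()    # per-process real-time clocks, possibly drifting
    LOGICAL_TIME = auto()    # Lamport-style ordering, no real-time meaning

class FailureSemantics(Enum):
    CRASH = auto()           # process stops and stays stopped
    OMISSION = auto()        # messages may be silently dropped
    PERFORMANCE = auto()     # results/messages arrive, but possibly too late
    ARBITRARY = auto()       # any behaviour at all, including malicious

class Communication(Enum):
    MESSAGE_PASSING = auto()
    SHARED_MEMORY = auto()
```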

The Two Generals Problem

- Two honest generals, X and Y, want to coordinate an attack by passing messages.
- The message-passing system has performance failure semantics.
- At midnight, X sends a message to inform Y that he wants to attack at a certain time, say at dawn.
- Y replies with an "Agreed" message.
- Y's reply may be late (after dawn), lost, or even early (before midnight, assuming arbitrary failure semantics).
- What can the two generals do to ensure a simultaneous attack? (A sketch of the channel's failure behaviour follows.)
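
A minimal sketch of the kind of channel the generals face. The loss probability and delay bound are illustrative assumptions of mine, not numbers from the slides; the point is only that a message may be dropped (omission) or arrive after the deadline (performance failure).

```python
import random

class UnreliableChannel:
    """Toy channel with omission and performance failures (parameters are illustrative)."""

    def __init__(self, loss_prob=0.2, max_delay=10.0):
        self.loss_prob = loss_prob    # probability a message is silently dropped
        self.max_delay = max_delay    # worst-case delay, which may exceed any deadline

    def send(self, msg, send_time):
        """Return (delivered, delivery_time); delivery_time may be past the deadline."""
        if random.random() < self.loss_prob:
            return False, None                      # omission failure
        delay = random.uniform(0.0, self.max_delay)
        return True, send_time + delay              # performance failure if too late

# Example: X sends the proposal at midnight (t=0); dawn is at t=6.
channel = UnreliableChannel()
delivered, arrival = channel.send("Attack at Dawn?", send_time=0.0)
if delivered and arrival <= 6.0:
    print(f"Y receives the proposal at t={arrival:.1f}, before dawn")
else:
    print("Y never learns of the plan in time")
```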

Two Generals with Unsynchronized Clocks

- At midnight, X sends a message to inform Y that he wants to attack at dawn. Y replies with an "Agreed" message.
- Y's reply may be late (after dawn) or lost.
- If Y's reply is late, X may not attack at dawn.
- Even if X gets a reply before dawn, Y does not know whether X got his reply before dawn.
- Even worse, if X's and Y's clocks are unsynchronized, they cannot attack at the same time at dawn.
- So let us at least give them synchronized clocks.

Two Generals with Synchronized Clocks

- At midnight, X sends a message to inform Y that he wants to attack at dawn. Y replies with an "Agreed" message.
- Y's reply may be late (after dawn) or lost; can it even be early (before midnight)?
- If Y's reply is late, X may not attack at dawn.
- Even if X gets Y's reply before dawn, Y does not know whether X got his reply before dawn. So Y waits for a confirmation from X.
- Should Y attack after getting a confirmation from X before dawn? But then Y will not know whether X knows that Y has received the confirmation...
- We are back to square one.

Two Generals with Synchronized Clocks and Message Counter (1)

- To keep track of the messages, let us introduce a message counter [count], initialized to 1.
- At midnight, X sends the message [1] "Attack at Dawn?" to inform Y that he wants to attack at dawn. Y replies with a [2] "Agreed" message.
- Y's reply may be late (after dawn), lost, or even early (before midnight).
- If Y's reply is late, X may not attack at dawn. Even if X gets a reply before dawn, Y does not know whether X got his reply before dawn.
- But if X gets Y's reply, X knows that Y must have received [1] "Attack at Dawn?". X replies to Y's [2] "Agreed" message with [3] "Confirmed".
- What happens now? (A sketch of the numbered exchange follows.)
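
A sketch of the counter-tagged exchange; the helper names are hypothetical. It makes the key point visible: each received counter acknowledges every earlier message, but the exchange itself never gives either side a natural stopping point.

```python
# Illustrative encoding of the numbered message exchange from the slides.
MESSAGES = {1: "Attack at Dawn?", 2: "Agreed", 3: "Confirmed", 4: "Reconfirmed"}

def next_reply(received_counter):
    """Receiving message [k] acknowledges messages [1..k-1]; the reply carries [k+1]."""
    k = received_counter + 1
    return k, MESSAGES.get(k, f"Re-confirmation #{k}")

# X starts the exchange; each reply just bumps the counter -- it never terminates on its own.
counter, text = 1, MESSAGES[1]
for _ in range(4):
    print(f"[{counter}] {text!r}")
    counter, text = next_reply(counter)
```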

Two Generals with Synchronized Clocks and Message Counter (2)

- At midnight, X sends the message [1] "Attack at Dawn?" to inform Y that he wants to attack at dawn. Y replies with a [2] "Agreed" message.
- If X gets Y's reply, X knows that Y must have received [1] "Attack at Dawn?". X then replies to Y's [2] "Agreed" message with a [3] "Confirmed" message.
- If Y gets [3] "Confirmed" from X, Y knows X has received his [2] "Agreed" message. So Y knows that X has agreed to attack.
- But Y does not know that X knows that Y agrees to attack. Does this matter?
- It does matter if Y will not attack unless Y can confirm X's [3] "Confirmed" message back to X.
- This message exchange pattern could go on forever.

What Do the Generals Know?

- If X gets message [2], both X and Y know that both messages [1] and [2] have been sent and that [1] has been received. But only X knows that [2] has been received.
- If Y gets message [3], both X and Y know that messages [1], [2], and [3] have been sent and that [1] and [2] have been received. But only Y knows that [3] has been sent and received.

Messages exchanged so far:

  [1] "Attack at Dawn?"   Both know message [1] has been sent and received.
  [2] "Agreed"            Both know message [2] has been sent and received.
  [3] "Confirmed"         Both know message [3] has been sent, but only Y knows it has been received.

- What is the least knowledge that both X and Y must have before they can attack with full confidence of the other general's support?

Minimum Sufficient Mutual Knowledge

- A message is mutual knowledge if both X and Y know that both parties know the contents of the message.
- What is the minimum mutual knowledge that is sufficient for both X and Y to decide that they can attack with full confidence of the other general's support?
- Both X and Y must know that they are acting on the same set of messages M and that they will make the same decision based on the contents of M.
- Suppose X and Y have a pre-agreement that both will act on messages [1] and [2] once they are sure that messages [1] and [2] are mutual knowledge. Then X and Y can both attack at dawn with full confidence!
- But they do not have such a pre-agreement, so the message exchange will go on.
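
Under the hypothetical pre-agreement described above (which the generals do not actually have), the decision rule is trivial to state. The function below is an illustrative encoding with names of my choosing.

```python
def decide_attack(mutual_knowledge):
    """Pre-agreed rule: attack iff messages [1] and [2] are mutual knowledge.

    `mutual_knowledge` is the set of message counters that both generals
    know both parties have received.
    """
    return {1, 2} <= mutual_knowledge

print(decide_attack({1}))       # False: only the proposal is mutual knowledge
print(decide_attack({1, 2}))    # True: proposal and agreement are mutual knowledge
```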

Attaining Mutual Knowledge

- If enough messages have been received, both parties can determine from the message count that, for some n, both parties must know that at least the first n messages have been received by both parties, i.e., the first n messages are mutual knowledge.
- Mutual knowledge attained after message [4] has been sent and received:

  X→Y: [1] "Attack at Dawn?"
  Y→X: [2] "Agreed"
  X→Y: [3] "Confirmed"
  Y→X: [4] "Reconfirmed"

- After Y receives message [3], Y knows that X must have received message [2], which implies that Y knows that X has sent and therefore must know message [1]. Likewise, after X receives message [2], X knows Y has sent message [2], and so X knows that Y must know message [1]. Message [1] is now mutual knowledge. (A sketch of this counting argument follows.)
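
A sketch of the counting argument, under my (conservative) reading of the slides: a general that has received the message numbered k can conclude that messages [1 .. k-2] are mutual knowledge, because [k] proves the other side received [k-1], and [k-1] itself acknowledged everything up to [k-2].

```python
def mutual_knowledge_upto(highest_received):
    """Highest message counter n such that a general holding message
    [highest_received] can conclude messages [1..n] are mutual knowledge.
    (My reading of the slides' counting argument, not a formal proof.)
    """
    return max(0, highest_received - 2)

# Y has received up to [3]; X has received up to [4].
print(mutual_knowledge_upto(3))   # 1: Y can treat message [1] as mutual knowledge
print(mutual_knowledge_upto(4))   # 2: X can treat messages [1] and [2] as mutual knowledge
```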

Probabilistic Guarantee

- In the absence of any pre-agreement, how many messages must be sent to attain high confidence that both generals will attack simultaneously, if every message can be lost with a given probability p?
- Eventually, both generals might attain sufficient mutual confidence to attack, but they might not within any given finite time. This depends on the probabilistic assumption about message loss and on the number of duplicate messages that can be sent before the deadline for taking joint action (dawn, when the attack is supposed to occur).
- An optimization is to resend a confirmation only if an expected confirmation does not arrive. This way, the absence of a message becomes evidence of consensus. Use sufficiently long time-outs to ensure high confidence. (A back-of-the-envelope calculation follows.)
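
A back-of-the-envelope sketch, under the simplifying assumption (mine, not the slides') that duplicates of a single critical message are lost independently, each with probability p: the chance that at least one of k copies arrives is 1 - p^k, so roughly k >= ln(1 - c) / ln(p) copies give confidence c.

```python
import math

def copies_needed(p, confidence):
    """Duplicates needed so that at least one copy arrives with the given
    confidence, assuming each copy is lost independently with probability p."""
    return math.ceil(math.log(1.0 - confidence) / math.log(p))

# Example: 20% loss rate, 99.9% confidence that the message gets through.
p = 0.2
k = copies_needed(p, 0.999)
print(k, 1 - p**k)   # 5 copies -> delivery probability ~ 0.99968
```

This bounds only one direction of one message; the slides' larger point stands: without a pre-agreement, no finite number of messages makes the joint attack certain, only increasingly likely.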

Timed Asynchronous System Model Assumptions

1) Network topology
   - Every process is known to every other process.
   - Communication is by messages; automated routing is assumed.
2) Synchrony
   - Service times have known upper bounds.
   - Local clocks have bounded drift with known rates.
3) Failure semantics
   - Processes have crash or performance failures.
   - Message delivery has omission or performance failures.
4) Message buffering
   - Finite message buffers; buffer overflow does not block the sender.
   - FIFO message delivery is not assumed.
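
As a compact summary, the sketch below records these assumptions as a Python dataclass. The field names and example values are illustrative choices of mine, not part of the model's definition.

```python
from dataclasses import dataclass

@dataclass
class TimedAsyncModel:
    """Illustrative summary of the timed asynchronous system model assumptions."""
    # 1) Network topology: known membership, message passing, routing handled for us.
    known_processes: tuple = ("p1", "p2", "p3")
    # 2) Synchrony: known upper bound on service times, bounded clock drift at a known rate.
    service_time_bound_ms: float = 50.0
    clock_drift_rate: float = 1e-5            # e.g. at most 10 microseconds per second
    # 3) Failure semantics.
    process_failures: tuple = ("crash", "performance")
    message_failures: tuple = ("omission", "performance")
    # 4) Message buffering: finite buffers, overflow does not block the sender, no FIFO guarantee.
    buffer_capacity: int = 1024
    sender_blocks_on_overflow: bool = False
    fifo_delivery: bool = False

print(TimedAsyncModel())
```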