Duke Systems: Asynchronous Replicated State Machines (Causal Multicast and All That). Jeff Chase, Duke University.

A Classic Paper
Time, Clocks, and the Ordering of Events in a Distributed System. Leslie Lamport, CACM, July 1978. Introduced:
– logical clocks (“Lamport clocks”)
– the state-machine replication model
– causal multicast
– physical clocks that respect causality
– (almost but not quite) vector clocks

Concurrency and time. (Figure: message timelines for nodes A, B, C.) What do these words mean: after? last? subsequent? eventually?

Same world, different timelines. Which of these happened first? (Figure: nodes A and B with events e1a, e1b, e2, e3a, e3b, e4; A writes W(x)=v, B reads R(x); arrows mark message sends and receives; one message says “Event e1a wrote W(x)=v”.) e1a is concurrent with e1b; e3a is concurrent with e3b. This is a partial order of events.

Concurrency and time
Premise: nodes communicate only by messages. Nodes can observe a remote event only through some chain of messages.
– Message patterns define the observability of events.
Event e1 can affect or cause event e2 iff the node initiating e2 could have already observed e1.
– Message patterns define a potential causality relation.
– Also known as happened-before (→).
Event e1 precedes e2 iff e1 could have caused e2.
– We can view the potential causality relation as logical time.
Events are concurrent if neither precedes the other.

Axiom 1: happened-before (→). (Timeline figure with nodes A, B, C.) 1. If e1 and e2 are in the same process/node, and e1 comes before e2, then e1 → e2. Also called program order. [Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978]

Axiom 2: happened-before (→). 2. If e1 is a message send, and e2 is the corresponding receive, then e1 → e2.

Axiom 3: happened-before (→). 3. → is transitive: happened-before is the transitive closure of the relation defined by axioms 1 and 2.

Potential causality: example. (Figure: nodes A, B, C with events A1–A4, B1–B4, C1–C3, connected by messages.) A1 < B2 < C2; B3 < A3; C2 < A4.
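The happened-before relation defined by the axioms above is just reachability in the graph of program-order and message edges. A minimal illustrative sketch (the edges encode this slide's example; `happened_before` and `concurrent` are hypothetical helpers, not from the paper):

```python
from collections import defaultdict

def happened_before(edges, e1, e2):
    """True iff e2 is reachable from e1 via program-order/message edges."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    # Start from e1's successors: happened-before is irreflexive.
    stack, seen = list(graph[e1]), set()
    while stack:
        e = stack.pop()
        if e == e2:
            return True
        if e not in seen:
            seen.add(e)
            stack.extend(graph[e])
    return False

def concurrent(edges, e1, e2):
    """Concurrent iff neither event precedes the other."""
    return (not happened_before(edges, e1, e2) and
            not happened_before(edges, e2, e1))

# Program order on each node, plus the message edges from the example.
edges = [("A1", "A2"), ("A2", "A3"), ("A3", "A4"),
         ("B1", "B2"), ("B2", "B3"), ("B3", "B4"),
         ("C1", "C2"), ("C2", "C3"),
         ("A1", "B2"), ("B2", "C2"), ("B3", "A3"), ("C2", "A4")]
```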

Logical clocks
We need clocks that reflect logical time. Start with logical clocks:
1. Each node maintains a monotonically increasing logical clock value LC.
2. Events are timestamped with the current LC value at the generating node.
– Increment LC on each new event: LC = LC + 1.
3. Piggyback the current clock value on all messages.
– On receive, advance the receiver's LC past the sender's if it is behind:
– If LCs > LC then LC = LCs + 1.
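The update rules above can be sketched as a small class (names are illustrative; the receive rule follows this slide's variant, which leaves a "fast" receiver's clock untouched):

```python
class LamportClock:
    """Per-node logical clock following the rules on this slide."""

    def __init__(self):
        self.lc = 0

    def tick(self):
        # Rule 2: each new local event increments LC and is stamped with it.
        self.lc += 1
        return self.lc

    def send(self):
        # Sending is itself an event; piggyback the new value on the message.
        return self.tick()

    def receive(self, lc_sender):
        # Rule 3: if the sender is ahead, jump just past its clock;
        # otherwise the receiver's clock is unaffected.
        if lc_sender > self.lc:
            self.lc = lc_sender + 1
        return self.lc
```

For example, if node A sends its second event to node B (whose clock is still 0), B's clock jumps to 3, preserving the Clock Condition across the message edge.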

The Clock Condition: e1 → e2 implies that LC(e1) < LC(e2). LC ordering respects potential causality. Is the converse also true? I.e., does LC(e1) < LC(e2) imply that e1 → e2? What if e1 and e2 are concurrent? (It does not: concurrent events still get distinct, ordered timestamps.)

Logical clocks: example. (Figure: timestamped event timelines for nodes A, B, C.) C5: the LC update advances the receiver's clock if it is “running slow” relative to the sender. A6–A10: the receiver's clock is unaffected because it is “running fast” relative to the sender.

Potential causality. (Figure: A writes W(x)=v through server(s) S, then tells C “Try it now”; C's read R(x) of v succeeds: “v? OK”. One message says “Event e1a wrote W(x)=v”.) If nodes observe events in an order that violates causality, they may perceive an inconsistency.

Causal consistency. (Figure: same scenario, but C's read R(x) returns 0, missing the write: “0????”.) This ordering violates causal consistency.

Same world, unified timelines? (Figure: an external witness records the events e1a, e1b, e2, e3a, e3b, e4 of nodes A and B, including W(x)=v and R(x), in a single sequence.) This is a total order of events, also called a sequential schedule. It is consistent with the partial order induced by happened-before (causal order).

Same world, unified timelines? (Figure: the external witness records a different single sequence of the same events.) Here is another total order of the same events. Like the last one, it is consistent with the partial order: it does not change any existing orderings; it only assigns orderings to events that are concurrent in the partial order.

Replicated State Machines, revisited
Replicas are “consistent” if all replicas apply the same (deterministic) updates in the same total order. Suppose a client can read from any replica while writes are propagating. Does the order matter? (Figure: two replicas apply op A and op B in different orders.)
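A tiny sketch of why the order matters, with two hypothetical deterministic updates: if op A adds and op B doubles, replicas that apply them in different orders diverge.

```python
def apply_ops(initial, ops):
    """A replica's state machine: fold deterministic updates over the state."""
    x = initial
    for op in ops:
        x = op(x)
    return x

op_a = lambda x: x + 5   # hypothetical update A
op_b = lambda x: x * 2   # hypothetical update B

r1 = apply_ops(0, [op_a, op_b])   # replica 1: (0 + 5) * 2 = 10
r2 = apply_ops(0, [op_b, op_a])   # replica 2: (0 * 2) + 5 = 5
# r1 != r2: the replicas disagree unless they agree on a total order.
```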

Challenge: A Decentralized Mutex
Implement a distributed mutex, respecting these conditions:
1. The resource holder must release before the algorithm grants the resource to another process;
2. Different acquire requests must be granted “in the order in which they are made”;
3. If every holder eventually releases, then every request is eventually granted.
Note: ordering is arbitrary for concurrent requests.

Definition of a lock (mutex) Acquire + release ops on L are strictly paired. – After acquire completes, the caller holds (owns) the lock L until the matching release. Acquire + release pairs on each L are ordered. – Total order: each lock L has at most one holder. – That property is mutual exclusion; L is a mutex.

Lamport’s Mutex Problem
The trick here is to implement a mutex service in a decentralized way, without a central lock server. Nodes/processes exchange messages and execute requests locally in the same order.
– This is a replicated state machine.
Lamport’s condition 2 says the order must be causal.
– Why? If one request could have observed another (e.g., it was issued after seeing that request granted), then it “was made later” and must be granted later.

Lamport’s Proposed Solution Process group of N processes/nodes/peers with unique IDs. Each process is a state machine: – Two operations: request (acquire) and release – Internal state per-process: request queue Basic idea: – Requester sends request to all peers (multicast) – Peers have some means to order the requests: receive code may defer or reorder messages before delivery. – All peers agree on the same order of delivery. – Therefore, all peers agree on the sequence of acquisitions.

Causal Multicast (“cbcast”)
1. Assume FIFO delivery at transport: messages sent by P are received by others in their send order (program order).
2. The sender timestamps each message with its logical clock.
3. The receiver acknowledges a received request with a message back to the sender, timestamped with the logical clock of the receiver (the ack sender). This propagates knowledge of peer timestamp values.
4. The logical clocks induce a partial order on the messages. To make it a total order: if two messages have the same timestamp, use the unique process ID to break the tie. This total order is “arbitrary”, but it respects potential causality.
5. A node delivers a request R to the local application when R is stable: it has the earliest timestamp T of all undelivered requests, and there can be no preceding request still in the network (e.g., the node has received a later-timestamped message from every peer).
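Step 5's stability test can be sketched as follows, under simplifying assumptions: FIFO transport, every peer eventually sends or acks with a later timestamp, timestamp ties are ignored, and all names are illustrative rather than from the paper.

```python
import heapq

class TotalOrderDelivery:
    """Holds received requests until they are stable, then delivers in
    (timestamp, sender-id) order."""

    def __init__(self, my_id, peer_ids):
        self.peers = set(peer_ids) - {my_id}
        self.latest = {p: 0 for p in self.peers}  # newest timestamp seen per peer
        self.pending = []                         # min-heap of (ts, sender, msg)
        self.delivered = []

    def on_receive(self, ts, sender, msg=None):
        # Every message (request or ack) advances our knowledge of the sender.
        if sender in self.latest:
            self.latest[sender] = max(self.latest[sender], ts)
        if msg is not None:                       # acks carry no payload
            heapq.heappush(self.pending, (ts, sender, msg))
        self._try_deliver()

    def _try_deliver(self):
        # The earliest pending request is stable once every other peer has
        # sent something later: with FIFO transport, no earlier-stamped
        # request can still be in flight.
        while self.pending:
            ts, sender, msg = self.pending[0]
            if all(self.latest[p] > ts for p in self.peers if p != sender):
                heapq.heappop(self.pending)
                self.delivered.append(msg)
            else:
                return
```

For example, node 1 cannot deliver a request timestamped 1 from node 2 until it hears something later from node 3; an ack timestamped 2 from node 3 makes the request stable.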

The Lamport Mutex Algorithm
1. Use causal multicast.
2. Processes cache the acquire requests they have received on their request queue, in timestamp order.
– Including their own acquire requests.
3. If a release message is received, remove the corresponding request from the queue.
4. Take the lock when your request is at the front of the queue. This request is next, and it is stable.
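Steps 2–4 can be sketched as a per-process request queue layered over an already-stable, causally ordered stream of acquire/release messages (names are illustrative):

```python
import bisect

class LamportMutex:
    """Request-queue state machine; assumes messages arrive via a stable,
    totally ordered causal multicast as in the previous slide."""

    def __init__(self, my_id):
        self.my_id = my_id
        self.queue = []   # sorted list of (timestamp, process_id)

    def on_acquire(self, ts, pid):
        # Step 2: cache every acquire request (including our own) in
        # (timestamp, process-id) order; the id breaks timestamp ties.
        bisect.insort(self.queue, (ts, pid))

    def on_release(self, ts, pid):
        # Step 3: a release removes the matching request.
        self.queue.remove((ts, pid))

    def holds_lock(self):
        # Step 4: we hold the lock when our own request heads the queue.
        return bool(self.queue) and self.queue[0][1] == self.my_id
```

Because every process sees the same delivery order, every process computes the same queue, so at most one process ever believes it holds the lock.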

Enhanced causal multicast: Isis
Later refinements to causal multicast (“cbcast”) improved concurrency (e.g., in Birman’s Isis group system).
– Lamport’s approach delivers messages in a total order.
– Lamport’s use of logical clocks imposes an ordering in some instances where causality does not require it.
– Isis advanced cbcast (causal broadcast): deliver concurrent messages in any order; never delay your own messages to self; use various batching optimizations for message bursts.
– How to know if messages are concurrent?
Lamport’s state-machine service doesn’t handle failures: addressed in later work (e.g., Isis again).
– Requires failure detection, uniform atomicity, and views.
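The question of how to know whether messages are concurrent is answered by vector clocks, which the Lamport paper "almost but not quite" reaches: each message carries one counter per process, and m1 causally precedes m2 iff m1's vector is componentwise less than or equal to m2's. A minimal sketch of the comparison rule:

```python
def vc_leq(v1, v2):
    """True iff vector timestamp v1 <= v2 componentwise."""
    return all(a <= b for a, b in zip(v1, v2))

def vc_concurrent(v1, v2):
    """Messages are concurrent iff neither vector dominates the other,
    so neither message could have causally preceded the other."""
    return not vc_leq(v1, v2) and not vc_leq(v2, v1)
```

This is exactly the test a cbcast implementation needs in order to deliver concurrent messages in any order while still delaying causally later ones.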