CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Krzys Ostrowski: TA

Recap… Consistent cuts
On Monday we saw that simply gathering the state of a system isn't enough
Often the "state" includes tricky relationships
Consistent cuts are a way of collecting state that "could" have arisen concurrently in real-time

What time is it?
In a distributed system we need practical ways to deal with time
E.g. we may need to agree that update A occurred before update B
Or offer a "lease" on a resource that expires at time 10:00
Or guarantee that a time-critical event will reach all interested parties within 100ms

But what does time "mean"?
Time on a global clock? E.g. with a GPS receiver
… or on a machine's local clock
But was it set accurately? And could it drift, e.g. run fast or slow? What about faults, like stuck bits?
… or we could try to agree on time

Lamport's approach
Leslie Lamport suggested that we should reduce time to its basics
Time lets a system ask "Which came first: event A or event B?"
In effect, time is a means of labeling events so that…
If A happened before B, then TIME(A) < TIME(B)
If TIME(A) < TIME(B), then A happened before B

Drawing time-line pictures:
[Time-line figure: process p sends message m via snd_p(m); process q receives it via rcv_q(m) and later delivers it via deliv_q(m)]

Drawing time-line pictures:
A, B, C and D are "events". Could be anything meaningful to the application
So are snd(m), rcv(m) and deliv(m)
What ordering claims are meaningful?
[Time-line figure: events A, B on p's line around snd_p(m); events C, D on q's line around rcv_q(m) and deliv_q(m)]

Drawing time-line pictures:
A happens before B, and C before D
"Local ordering" at a single process
Write A →_P B and C →_Q D
[Same time-line figure]

Drawing time-line pictures:
snd_p(m) also happens before rcv_q(m)
"Distributed ordering" introduced by a message
Write snd_p(m) →_M rcv_q(m)
[Same time-line figure]

Drawing time-line pictures:
A happens before D
Transitivity: A happens before snd_p(m), which happens before rcv_q(m), which happens before D
[Same time-line figure]

Drawing time-line pictures:
B and D are concurrent
Looks like B happens first, but D has no way to know. No information flowed…
[Same time-line figure]

Happens before "relation"
We'll say that "A happens before B", written A → B, if
1. A →_P B according to the local ordering, or
2. A is a snd and B is a rcv and A →_M B, or
3. A and B are related under the transitive closure of rules (1) and (2)
So far, this is just a mathematical notation, not a "systems tool"

Logical clocks
A simple tool that can capture parts of the happens-before relation
First version: uses just a single integer
Designed for big (64-bit or more) counters
Each process p maintains LT_p, a local counter
A message m will carry LT_m

Rules for managing logical clocks
When an event happens at a process p, it increments LT_p
Any event that matters to p
Normally, also snd and rcv events (since we want a receive to occur "after" the matching send)
When p sends m, set LT_m = LT_p
When q receives m, set LT_q = max(LT_q, LT_m) + 1
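
To make these rules concrete, here is a minimal Lamport-clock sketch in Python (illustrative only; the lecture itself presents no code, and the class and method names are assumptions):

    class LamportClock:
        """One logical counter LT_p per process, per the rules above."""
        def __init__(self):
            self.time = 0  # LT_p

        def tick(self):
            # Any local event that matters to this process increments LT_p
            self.time += 1
            return self.time

        def send(self):
            # A send is itself an event; the message carries LT_m = LT_p
            return self.tick()

        def receive(self, lt_m):
            # On receive, set LT_q = max(LT_q, LT_m) + 1
            self.time = max(self.time, lt_m) + 1
            return self.time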

Time-line with LT annotations
LT(A) = 1, LT(snd_p(m)) = 2, LT(m) = 2
LT(rcv_q(m)) = max(1,2)+1 = 3, etc…
[Time-line figure annotated with the LT_p and LT_q values]
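
The slide's arithmetic as a tiny standalone sketch (the assumption that LT_q is 1 just before the receive follows from the slide's max(1,2)):

    LT_p, LT_q = 0, 1            # assume q has already logged one local event
    LT_p += 1                    # event A:        LT(A) = 1
    LT_p += 1                    # event snd_p(m): LT(snd_p(m)) = 2
    LT_m = LT_p                  # message m carries LT(m) = 2
    LT_q = max(LT_q, LT_m) + 1   # rcv_q(m)
    print(LT_q)                  # 3, i.e. max(1,2)+1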

Logical clocks
If A happens before B, A → B, then LT(A) < LT(B)
But the converse might not be true: if LT(A) < LT(B) we can't be sure that A → B
This is because processes that don't communicate still assign timestamps, and hence events will "seem" to have an order

Can we do better?
One option is to use vector clocks
Here we treat timestamps as a list
One counter for each process
The rules for managing vector times differ from what we did with logical clocks

Vector clocks
The clock is a vector: e.g. VT(A) = [1, 0]
We'll just assign p index 0 and q index 1
Vector clocks require either agreement on the numbering, or that the actual process ids be included with the vector
Rules for managing vector clocks:
When an event happens at p, increment VT_p[index_p]
Normally, also increment for snd and rcv events
When sending a message, set VT(m) = VT_p
When receiving, set VT_q = max(VT_q, VT(m))
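
A minimal vector-clock sketch in Python, assuming the fixed numbering from the slide (p = index 0, q = index 1); the names are illustrative:

    class VectorClock:
        def __init__(self, n_processes, my_index):
            self.vt = [0] * n_processes   # VT_p
            self.i = my_index             # index_p

        def tick(self):
            # Local event (including snd and rcv): increment our own entry
            self.vt[self.i] += 1

        def send(self):
            self.tick()
            return list(self.vt)          # message carries VT(m) = VT_p

        def receive(self, vt_m):
            # Entrywise max with VT(m), then count the rcv event itself
            self.vt = [max(a, b) for a, b in zip(self.vt, vt_m)]
            self.tick()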

Time-line with VT annotations
VT(m) = [2,0]
Could also be [1,0] if we decide not to increment the clock on a snd event. The decision depends on how the timestamps will be used.
[Time-line figure annotated with the VT_p and VT_q values]

Rules for comparison of VTs
We'll say that VT_A ≤ VT_B if ∀i: VT_A[i] ≤ VT_B[i]
And we'll say that VT_A < VT_B if VT_A ≤ VT_B but VT_A ≠ VT_B
That is, for some i, VT_A[i] < VT_B[i]
Examples?
[2,4] ≤ [2,4]
[1,3] < [7,3]
[1,3] is "incomparable" to [3,1]
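
The same comparison rules as standalone Python helpers over equal-length lists, reproducing the slide's examples (a sketch, not course code):

    def vt_leq(a, b):
        return all(x <= y for x, y in zip(a, b))

    def vt_lt(a, b):
        return vt_leq(a, b) and a != b

    def vt_concurrent(a, b):
        # "Incomparable": neither vector dominates the other
        return not vt_leq(a, b) and not vt_leq(b, a)

    print(vt_leq([2, 4], [2, 4]))          # True
    print(vt_lt([1, 3], [7, 3]))           # True
    print(vt_concurrent([1, 3], [3, 1]))   # True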

Time-line with VT annotations
VT(A) = [1,0]. VT(D) = [2,4]. So VT(A) < VT(D)
VT(B) = [3,0]. So VT(B) and VT(D) are incomparable
[Time-line figure, with VT(m) = [2,0]]

Vector time and happens before
If A → B, then VT(A) < VT(B)
Write a chain of events from A to B
Step by step the vector clocks get larger
If VT(A) < VT(B), then A → B
Two cases: if A and B both happen at the same process p, trivial
If A happens at p and B at q, we can trace back the path by which q "learned" VT_A[p]
Otherwise A and B happened concurrently

Consistent cuts
If we had time, we could revisit these using logical and vector clocks
In fact there are algorithms that find a consistent cut by
Implementing some form of clock
Asking everyone to record their state at time now+Δ (for some large Δ)
And this can be made to work well…
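
A rough sketch of the "record state at now+Δ" idea, assuming clocks synchronized to well within Δ (all names here are illustrative, not an algorithm from the course):

    import threading
    import time

    DELTA = 0.5  # must exceed clock skew plus message latency (assumption)

    class Process:
        def __init__(self, name):
            self.name = name
            self.state = 0          # whatever local state we care about
            self.snapshot = None

        def schedule_snapshot(self, t_snap):
            # Record local state when the local clock reaches t_snap
            def record():
                time.sleep(max(0.0, t_snap - time.time()))
                self.snapshot = self.state
            threading.Thread(target=record).start()

    # A coordinator picks a cut time safely in the future and tells everyone
    procs = [Process("p"), Process("q"), Process("r")]
    t_snap = time.time() + DELTA
    for proc in procs:
        proc.schedule_snapshot(t_snap)
    time.sleep(2 * DELTA)
    print([proc.snapshot for proc in procs])   # one state per process at the cut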

Replication
Another use of time arises when we talk about replicating data in distributed systems
The reason is that:
We replicate data by multicasting updates over a set of replicas
They need to apply these updates in the same order
And order is a temporal notion

… and replication is powerful!
Replicate data or a service for high availability
Replicate data so that group members can share loads and improve scalability
Replicate locking or synchronization state
Replicate membership information in a data center so that we can route requests
Replicate management information or parameters to tune performance

Let's look at time vis-à-vis updates
Maybe logical notions of time can help us understand when one update comes before another update
Then we can think about building replicated update algorithms that are optimized to run as fast as possible while preserving the needed ordering

Drill down: Life of a group
[Animated time-line figure: processes p, q, r, s, t; updates u0 and u1]
Initial group membership is {r, s, t}
q does an update. It needs to reach the three members
Now s goes offline for a while. Maybe it crashed
Here, two updates occur concurrently: r sees u0 followed by u1, but t sees u1 first, then u0. Process s is still offline; it sees neither
This illustrates two issues: update order and group membership
If p tries to "reliably" multicast to s, it won't get an ack and will wait indefinitely. But how can p be sure that s has failed? If p is wrong, s will be missing an update!
Now s is back online and presumably should receive the update q is sending. Here we see the update ordering issue again in a "pure" form. Which came first, p's update or the one from q?

Questions to ask about order
Who should receive an update?
What update ordering to use?
How expensive is the ordering property?

Questions to ask about order
Delivery order for concurrent updates
The issue is more subtle than it looks!
We can fix a system-wide order, but…
Sometimes nobody notices out-of-order delivery
System-wide ordering is expensive
If we care about speed we may need to look closely at the cost of ordering

Ordering example
The system replicates variables x, y and z
Process p sends "x = x/2"
Process q sends "x = 83"
Process r sends "y = 17"
Process s sends "z = x/y"
To what degree is ordering needed?

Ordering example
x = x/2 and x = 83 clearly "conflict"
If we execute x = x/2 first, then x = 83, x will have value 83. In the opposite order, x is left equal to 41.5

Ordering example
x = x/2 and y = 17 don't seem to conflict
After the fact, nobody can tell what order they were performed in

Ordering example
z = x/y conflicts with updates to x, updates to y, and with other updates to z

Commutativity
We say that operations "commute" if the final effect on the system is the same even if the order of those operations is swapped
In general, a system worried about ordering concurrent events need not worry if the events commute
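
A quick way to see which of the example's updates commute: apply each pair in both orders and compare the final states (a sketch; the helper names are assumptions, the updates are the slides' own):

    def run(updates, state):
        s = dict(state)          # work on a copy of the replica state
        for u in updates:
            u(s)
        return s

    half  = lambda s: s.update(x=s["x"] / 2)   # x = x/2
    set83 = lambda s: s.update(x=83)           # x = 83
    sety  = lambda s: s.update(y=17)           # y = 17

    s0 = {"x": 100, "y": 1}
    print(run([half, set83], s0))              # x ends up 83
    print(run([set83, half], s0))              # x ends up 41.5 -> they conflict
    print(run([half, sety], s0) == run([sety, half], s0))   # True -> they commute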

Single updater
In many systems, there is only one process that can update a given type of data
For example, the variable might be the "sensor values" for a temperature sensor
Only the process monitoring the sensor does updates, although perhaps many processes want to read the data, and we replicate it to exploit parallelism
Here the only "ordering" that matters is the FIFO ordering of the updates emitted by that process

Single updater
If p is the only update source, the need is a bit like the TCP "fifo" ordering
[Time-line figure: p multicasting a stream of updates to r, s and t]
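
A sketch of per-sender FIFO delivery: the sender numbers its updates, and each receiver buffers anything that arrives early (illustrative names; this is not course code):

    class FifoReceiver:
        def __init__(self):
            self.next_seq = {}   # sender -> next expected sequence number
            self.pending = {}    # sender -> {seq: message} buffered out of order
            self.delivered = []

        def receive(self, sender, seq, msg):
            self.pending.setdefault(sender, {})[seq] = msg
            nxt = self.next_seq.setdefault(sender, 0)
            # Deliver while the next expected message from this sender is here
            while nxt in self.pending[sender]:
                self.delivered.append(self.pending[sender].pop(nxt))
                nxt += 1
            self.next_seq[sender] = nxt

    r = FifoReceiver()
    r.receive("p", 1, "update 1")   # arrives early: buffered
    r.receive("p", 0, "update 0")   # unblocks both
    print(r.delivered)              # ['update 0', 'update 1']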

Mutual exclusion
Another important case we'll study closely
Arises in systems that use locks to control access to shared data
This is very common, for example in "transactional" systems (we'll discuss them next week)
Very often, without locks a system rapidly becomes corrupted

Mutual exclusion
Suppose that before performing conflicting operations, processes must lock the variables
This means that there will never be any true concurrency
And it simplifies our ordering requirement

Mutual exclusion
[Time-line figure: processes p, q, r, s, t; a process is shown in dark blue while holding the lock]
How is this case similar to "FIFO" with one sender? How does it differ?

Mutual exclusion
Are these updates in "FIFO" order?
No, the sender isn't always the same
But yes in the sense that there is a unique path through the system (corresponding to the lock), and the updates are ordered along that path
Here updates are ordered by Lamport's happened-before relation: →
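
One way to picture this: the lock token itself can carry a counter, so every update performed under the lock gets a unique position on the lock's "path", whoever the sender is (a sketch with illustrative names, using the updates from the earlier example):

    class LockToken:
        """Passed from holder to holder; numbers the updates it guards."""
        def __init__(self):
            self.seq = 0

        def stamp(self):
            self.seq += 1
            return self.seq

    token = LockToken()
    log = []
    # Holders alternate, but the token totally orders their updates
    for holder, update in [("p", "x = x/2"), ("q", "x = 83"), ("r", "y = 17")]:
        log.append((token.stamp(), holder, update))
    print(log)   # [(1, 'p', 'x = x/2'), (2, 'q', 'x = 83'), (3, 'r', 'y = 17')]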

Types of ordering we've seen
Deliver updates in an order matching the FIFO order in which they were sent (cheapest)
Deliver updates in an order matching the → order in which they were sent (still cheap)
For conflicting concurrent updates, pick an order and use that order at all replicas (more costly)
Deliver an update to all members of a group according to a "membership view" determined by ordering updates wrt view changes (most costly)

Types of ordering we've seen
Deliver updates in an order matching the FIFO order in which they were sent: fbcast
Deliver updates in an order matching the → order in which they were sent: cbcast
For conflicting concurrent updates, pick an order and use that order at all replicas: abcast
Deliver an update to all members of a group according to a "membership view" determined by ordering updates wrt view changes: gbcast

Recommended readings
In the textbook, we're at the beginning of Part III (Chapter 14)
We'll build up the "virtual synchrony" replication model in the next lecture and see how it can be built with 2PC, 3PC, consistent cuts and ordering