Replace all references to “process” by referring to “node”, or state explicitly at the beginning that the two terms are used as synonyms.



Reading material. Course book: Chapters 6.8 and 9.1 of Burns & Wellings (not the language-specific parts). E-books at LiU library: see web page links.

This lecture. Overview of some basic notions in timing and how distributed systems are affected by them: time, clock synchronisation, order of events, and logical clocks ...

Applications. Banking systems; on-line access and electronic services; peer-to-peer networks; distributed control (cars, airplanes); sensor and ad hoc networks (buildings, environment); grid computing.

What do all these have in common? A distributed model of computing: multiple processes, disjoint address spaces, inter-process communication, and a collective goal.

Synchrony vs. asynchrony. The model for distributed computations depends on the rate at which computations are done at each node (process) and on the expected delay for transmission of messages. Synchronous: there is a bound on message delays, and the rates of computation at different nodes can be related. Asynchronous: there are no bounds on message delays and no known relation among the processing speeds at different nodes.

The choice. Which model is harder to use? What does it mean to be hard or easy to use? How do implementations of real systems relate to the various models?

Implications. Synchronous: local clocks can be used to implement timeouts, and lack of response from another node can be interpreted as detection of failure. Asynchronous: in the absence of global (synchronised) time, the only system-wide abstraction of time is the order of events. Draw an interaction diagram with two different arrival orders for two messages, and say how the receiver can decide how to handle the messages.
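The synchronous case can be illustrated with a small timeout-based failure detector. This is only a sketch: the class name `TimeoutFailureDetector`, the heartbeat idea, and the injectable `clock` parameter are assumptions made for illustration and do not come from the slides.

```python
import time


class TimeoutFailureDetector:
    """Suspects a node when nothing was heard from it for `timeout` seconds.

    Hypothetical helper: only meaningful in a synchronous model, where a
    bound on message delay justifies reading silence as failure.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock        # local clock; injectable so it can be faked
        self.last_heard = {}      # node id -> local time of the last message

    def heartbeat(self, node):
        """Record that a message from `node` has just arrived."""
        self.last_heard[node] = self.clock()

    def suspected(self, node):
        """True if `node` was never heard from or has been silent too long."""
        last = self.last_heard.get(node)
        if last is None:
            return True
        return self.clock() - last > self.timeout
```

In an asynchronous model the same check would be unsound, since a slow message is indistinguishable from a failed node.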

Reasons for distribution. Locality: engine control, brake system, gearbox control, airbag, … Organisation: an extension of modularisation, and a means for fault containment. Load sharing: web services, search, parallelisation of heavy-duty computations.

Local control. Simplistic view: it is all about data; each local controller can perform its computations properly if the data it needs is accessed locally. Design modules with high cohesion and low interaction! But when data needs to be shared, how do we ensure that nodes have fresh data and act in concert with other nodes? Local control (global guarantees): justifying that the control applications do not make systems unsafe. Exemplify with the seven control surfaces of JAS.

Organisation and containment. Simplistic view: if module interactions are well-defined, modules do not affect each other even if things go wrong. But fault tolerance is a much harder problem in distributed systems, and timing has a big role in it. More on this in the dependability lecture.

Sharing the load. Simplistic view: guarantee that a node can deal with what it accepts, and spread the load so that tasks are (globally) serviced in a best-effort manner. But communication and cooperation overheads affect the global distributed service. Load sharing: justifying that a (global) service is available even in the presence of local load variations. Nodes need to exchange data on load and resources (how fresh is the data?). Refer back to the bidding and focussed addressing algorithms.

Common issues. Time: sharing data may require knowledge of the local time at the generating node, and comparison with the time at the consuming node. State: sometimes nodes need to agree on a common state/value in order to achieve globally correct behaviour. Faults in the system affect both.

Major requirements. In distributed systems: interoperability, transparency, scalability, dependability. This course focuses on dependability: fault tolerance and timing-related issues.

Brake-by-wire

Contributing to safety. Redundancy: having distributed sensors and actuators makes brake control more fault-tolerant. Central decision: what if one node gets the signal incorrectly or late? Distributed decision: what if one node is acting differently? Central decision or distributed decision?

Time in Distributed Systems. The role of time in distributed systems; logical time vs. physical time; clock synchronisation algorithms; vector clocks.

Time matters… Inaccurate local clocks can be a problem if the result of computations at different nodes depends on time. Calculation of trajectories: if a missile was at a given point at a given time before a computation, where will it be after the computation? If the brake signal is issued separately at different wheels, will the car stop, and when?

Banking and finance. Interest is applied at a given point in time to a balance that reflects the related transactions prior to that point. The gain/loss on sales of stocks depends on the dynamic values of the stocks at a given time (the time of sale/purchase). Anecdote: Microsoft patches do not work if the computer clock is out of sync with a “real” clock.

Local vs. global clock. Most physical (local) clocks are not always accurate. What is meant by accurate? Agreement with UTC. Coordinated Universal Time (UTC) is in turn derived from International Atomic Time (TAI) and adjusted for variations in the rotation of the earth. Local clocks need to be synchronised regularly. An atomic global clock accurately measures TAI. If local clocks are synchronised with an (accurate) global clock we may be able to use a synchronous model in the application. The second underlying TAI is defined as 9,192,631,770 periods of the radiation of a hyperfine transition of the caesium-133 atom. TAI is maintained as an average over roughly 300 atomic clocks around the world. Another source of accurate time is GPS. But since it is based on measured distances to satellites, and since the satellites move relative to the earth, the time calculated from GPS satellites needs a relativistic adjustment of about 38 microseconds per day. Thus a GPS-based signal needs to be regularly offset in order to agree with UTC.

Clock synchronisation. Two types of algorithms. Internal synchronisation tries to keep a set of clock values close to each other, with a maximum skew of δ. External synchronisation tries to keep the values of a set of clocks in agreement with an accurate clock, with a skew of at most δ.
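The two notions of skew can be made concrete with a short sketch. The function names `internal_skew` and `external_skew` are hypothetical, and clock readings are assumed to be plain numbers (for example milliseconds) taken at the same instant.

```python
def internal_skew(clocks):
    """Largest difference between any two clock readings in the set."""
    return max(clocks) - min(clocks)


def external_skew(clocks, reference):
    """Largest deviation of any clock reading from the reference clock."""
    return max(abs(c - reference) for c in clocks)


# Three local clocks (in ms) and an accurate reference clock.
readings = [1000, 1004, 999]
print(internal_skew(readings))          # spread within the set
print(external_skew(readings, 1001))    # worst deviation from the reference
```

If every clock is within δ of the reference, any two clocks are within 2δ of each other, so external synchronisation with bound δ implies internal synchronisation with bound 2δ.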

Lamport/Melliar-Smith Algorithm. Internal synchronisation of n clocks. Each clock reads the value of all other clocks at regular intervals. If the value of some clock differs from the own clock by more than δ, that clock value is replaced by the own clock value. The average of all clock values is computed, and the own clock value is updated to the average. From 1985. Draw picture on the board!
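One synchronisation round, as described above, might be sketched as follows. The function name `converge` and the list-based interface are assumptions for illustration; reading delays and drift during the round are ignored.

```python
def converge(own_index, readings, delta):
    """New value for the clock at `own_index` after one averaging round.

    `readings` holds the values this node read from all n clocks,
    including its own at position `own_index`.
    """
    own = readings[own_index]
    # A reading further than delta from the own clock is replaced by the own value.
    adjusted = [r if abs(r - own) <= delta else own for r in readings]
    # The own clock is then set to the average of the adjusted values.
    return sum(adjusted) / len(adjusted)


# Clocks 0..2 are close to each other; clock 3 is far off and gets discarded.
readings = [100, 102, 98, 500]
new_values = [converge(i, readings, delta=5) for i in range(3)]
print(new_values)
```

In this run the three well-behaved clocks start with a spread of 4 and end with a spread of 1, which is the "clocks get closer" effect the next slide refers to.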

Does it work? After each synchronisation interval the clocks get closer to each other. If the drifts are within δ and the clocks are initially synchronised, then they are kept within δ of each other. But what if some clocks give faulty values? By faulty I mean acting really strangely (other than drifting; drifting by even more than δ is “normal”).

Faulty clocks. If a clock drifts by more than δ, its value is eliminated and does not “harm” other clocks. What if it drifts by exactly δ? Check it as an exercise! What is the worst case?

A two-faced faulty clock. [Figure: clocks i, j and k, with values labelled c-2δ, c-δ and c+δ.] Clock k will be considered correct by both i and j…

Bound on the faulty clocks. To guarantee that the set of clocks stays within δ we need an assumption on the number of faulty clocks. For t faulty clocks the algorithm works if the number of clocks n > 3t. The maximum disagreement caused by t faulty clocks is t·3δ; averaged over n clocks this is 3tδ/n, which should be < δ, hence n > 3t.
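The bound can be reconstructed with a short calculation. This is a sketch under simplifying assumptions that are not stated on the slide: readings are instantaneous and exact, and the non-faulty clocks are within δ of each other when the round starts. The symbols c_p and c_q (own values of two non-faulty clocks p and q) and r_p(k), r_q(k) (the values p and q end up using for a faulty clock k) are introduced here for illustration.

```latex
\begin{align*}
% A reading is only used if it is within \delta of the own value
% (otherwise it is replaced by the own value, which trivially is).
|r_p(k) - c_p| \le \delta,\quad |r_q(k) - c_q| \le \delta,\quad |c_p - c_q| \le \delta
  \;&\Longrightarrow\; |r_p(k) - r_q(k)| \le 3\delta \\
% t faulty clocks, each contributing at most 3\delta, averaged over n values
|\mathrm{avg}_p - \mathrm{avg}_q| \;&\le\; \frac{t \cdot 3\delta}{n} = \frac{3t\delta}{n} \\
\frac{3t\delta}{n} < \delta \;&\iff\; n > 3t
\end{align*}
```

Non-faulty clocks contribute the same value to both averages, so only the t faulty clocks can pull the averages of p and q apart.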

Logical time. Sometimes order will do. In the absence of exact synchronisation we may use the order that is intrinsic in an application. [Figure: Client A and Client B exchanging the messages ReqA, RepA and ReqB with a Server.]

Logical clocks. Based on event counts at each node. May reflect causality: sending a message always precedes receiving it, and messages sent in a sequence by one node are (potentially) causally related to each other. Example: I do not pay for an item if I do not first check the item’s availability.

Happened before (→). Assume each process has a monotonically increasing physical clock. Rule 1: if a and b are events at the same process and the time for event a is before the time for event b, then a → b. Rule 2: if a denotes sending a message and b denotes receiving the same message, then a → b. Rule 3: → is transitive.

A partial order. Any two events that are not related by “happened before” are treated as concurrent. Logical clock: an event counter that respects the “happened before” ordering. Sometimes referred to as Lamport’s clocks (after the author of the first paper on this topic, 1978).

What do we know here? [Figure: timelines of processes P, Q and R with events a, b, c, d, e, f, g and h.] a is before b, g is before h, etc. a is before g; e is before b, and therefore before c, and therefore before h, etc. We know nothing about the order between (a,e), (b,f), (g,c), (g,b), … What about g and e?

Implementing a logical clock. LC1: each time a local event takes place, increment LC by 1. LC2: each time a message m is sent, the LC value at the sender is appended to the message (m_LC). LC3: each time a message m is received, set LC to max(LC, m_LC)+1.
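Rules LC1 to LC3 translate almost directly into code. The class below is a minimal sketch; the name `LamportClock` and its method names are illustrative, and sending is treated as a local event (so LC1 applies before LC2), which is an assumption about how the rules combine.

```python
class LamportClock:
    """Logical clock following rules LC1-LC3 (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # LC1: each local event increments the counter by 1.
        self.time += 1
        return self.time

    def send(self):
        # Sending counts as a local event (LC1); the returned value is the
        # timestamp m_LC that is appended to the message (LC2).
        self.time += 1
        return self.time

    def receive(self, m_lc):
        # LC3: jump past both the own counter and the message timestamp.
        self.time = max(self.time, m_lc) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.local_event()
stamp = p.send()           # timestamp carried by the message
print(q.receive(stamp))    # the receive event is ordered after the send
```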

Exercise Calculate LC for all events in the given example

What does LC tell us? a → b implies LC(a) < LC(b). Note that LC(d) < LC(h) does not imply d → h. Vector clocks give the property we are looking for: VC(a) < VC(b) iff a happened before b. Independently discovered by Fidge (1988) and Mattern (1989).

Is concurrency transitive? e is concurrent with g, and g is concurrent with f, but e is not concurrent with f! Vector clocks bring more... e is indeed concurrent with g, since e not-before g and g not-before e. g is also concurrent with f, since g not-before f and f not-before g. But e is NOT concurrent with f! Thus concurrency is not transitive.

Vector clocks (VC). Every node maintains a vector of counted events, with one entry for each node (including itself). The VC for event e, VC(e) = [c1,…,cn], shows the perceived count of events at nodes 1,…,n. VC(e)[k] denotes the entry for node k. “Perceived count” means the count as known when event e happened.

Example revisited. [Figure: timelines of P, Q and R with events a, b, c, d, e, f, g and h.] Vector clocks give the property we are looking for: VC(a) < VC(b) iff a happened before b. Independently discovered by Fidge (1988) and Mattern (1989).

Implementation of VC. Rule 1: for each local event, increment own entry. Rule 2: when sending message m, append VC(send(m)) to m as a timestamp T. Rule 3: when receiving a message at node i, increment own entry, VC[i] := VC[i]+1, and then for every entry j in the VC set the entry to max(T[j], VC[j]).
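The three rules can be sketched as a small class. The name `VectorClock`, the zero-based node indices, and the message pattern in the usage example are assumptions for illustration; the original figure is not available, so the scenario is only one pattern that happens to yield the timestamps listed on the example slide.

```python
class VectorClock:
    """Vector clock for node `node` in a system of `n` nodes (sketch)."""

    def __init__(self, node, n):
        self.node = node
        self.v = [0] * n

    def local_event(self):
        # Rule 1: a local event increments the own entry.
        self.v[self.node] += 1
        return list(self.v)

    def send(self):
        # Rule 2: sending is an event; a copy of the vector is the timestamp T.
        return self.local_event()

    def receive(self, t):
        # Rule 3: increment the own entry, then take the entry-wise maximum.
        self.v[self.node] += 1
        self.v = [max(own, other) for own, other in zip(self.v, t)]
        return list(self.v)


# Hypothetical scenario with P=0, Q=1, R=2.
p, q, r = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
m1 = q.send()            # Q sends to P
p.receive(m1)
m2 = p.send()            # P sends to R
r.local_event()
r.local_event()
r.receive(m2)
m3 = r.send()            # R sends to Q
print(q.receive(m3))
```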

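Example. [Figure: event timelines annotated with vector timestamps; all three nodes start at [0,0,0], and the events carry the timestamps [1,1,0], [2,1,0], [2,2,4], [0,1,0], [0,0,1], [0,0,2], [2,1,3] and [2,1,4].]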

Concurrent events in VC. The relation < on vector clocks is defined by: VC(a) < VC(b) iff for all i: VC(a)[i] ≤ VC(b)[i], and for some i: VC(a)[i] < VC(b)[i]. An event a precedes another event b if VC(a) < VC(b). If neither VC(a) < VC(b) nor VC(b) < VC(a), then a and b are concurrent. VC(a) < VC(b) is read as “VC(a) is dominated by VC(b)”. Go back to the previous slide and ask for two concurrent events! For example: (2,1,0) and (0,0,2).
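The comparison is short enough to write out. The helper names `vc_less` and `concurrent` are hypothetical; both expect two vector timestamps of equal length that belong to two distinct events.

```python
def vc_less(a, b):
    """True if a < b: every entry is <= and at least one entry is strictly <."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def concurrent(a, b):
    """True if neither timestamp dominates the other."""
    return not vc_less(a, b) and not vc_less(b, a)


print(vc_less((0, 1, 0), (2, 1, 0)))      # first event precedes the second
print(concurrent((2, 1, 0), (0, 0, 2)))   # the pair suggested on the slide
```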

Pros and cons. Vector clocks are a simple means of capturing “known” precedence: VC(a) < VC(b) implies a → b. For large systems we have resource issues (bandwidth wasted) and maintainability issues. Exercise: how about the other direction? Does a “happened before” b imply VC(a) < VC(b)? YES!

Vector clocks help to synchronise at event level (consistent snapshots). But reasoning about response times and fault tolerance needs quantitative bounds.

Distribution & Fault tolerance. Distribution introduces new complications: no global clock and richer failure models. Replication and group mechanisms: transparency in the treatment of faults. We will come back to faults in lecture 6, and see that synchronisation is needed for tolerating some faults.