
1 Clock Synchronization Ronilda Lacson, MD, SM

2 Introduction Accurate, reliable time is necessary for financial and legal transactions, transportation and distribution systems, and many other applications involving distributed resources. For distributed Internet applications, the clock device must be both accurate and reliable. A room-temperature quartz oscillator may drift by as much as a second per day.

3 Topics of Discussion Definitions. A lower bound on how closely clocks can be synchronized, even where clocks drift and with arbitrary faults, and an algorithm showing that this bound is tight. Two more algorithms: the interactive convergence and interactive consistency algorithms. A lower bound on the number of processes needed to tolerate f failures.

4 Definitions A hardware clock is a mechanism that provides time information to a processor. In a timed execution involving process $p_i$, a hardware clock can be modeled as an increasing function $HC_i$. At real time $t$, $HC_i(t)$ is available as part of $p_i$'s transition function, but $p_i$ cannot change $HC_i$. A perfectly accurate hardware clock satisfies $HC_i(t) = t$.

5 What is clock synchronization? Clock synchronization requires processes to bring their clocks close together by using communication between them

6 More Definitions The adjusted clock $AC_i(t)$ of a process $p_i$ is a function of the hardware clock $HC_i(t)$ and a variable $adj_i$: $AC_i(t) = HC_i(t) + adj_i$. During the synchronization process, $p_i$ can change the value of $adj_i$ and thus change the value of $AC_i(t)$. Clocks are $\epsilon$-synchronized if, once the algorithm has terminated at time $t_f$, $|AC_i(t) - AC_j(t)| \le \epsilon$ for all processes $p_i$ and $p_j$ and for all $t \ge t_f$.

7 Model [Figure: processes $p_1, p_2, \dots, p_n$, each with a hardware clock $HC_i$, an adjustment variable $adj_i$ and an adjusted clock $AC_i$, connected by send/receive channels.]

8 Lower Bound on $\epsilon$ For every algorithm that achieves $\epsilon$-synchronized clocks, $\epsilon$ is at least $\mu(1 - 1/n)$, where $\mu$ is the uncertainty in the message delay and $n$ is the number of processes.

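9 Algorithm Code for process $p_i$ (initially SUM = 0 and RESPONSES = 0):
    Beginstep(u)
      Send HC_i to all q ≠ p
      Do forever
        if u = message V from process q then
          DIFF := V + d - HC_i
          SUM := SUM + DIFF
          RESPONSES := RESPONSES + 1
        endif
        if RESPONSES = n-1 then exit endif
    Endstep
    Beginstep(u)
      Enddo
      adj_i := adj_i + SUM/n
    Endstep
Here u is the event that triggers the step, and d is the midpoint of the range of possible message delays, $[d - \mu/2, d + \mu/2]$.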
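The averaging step above can be sketched in Python as a single-shot simulation. This is a minimal sketch under stated assumptions, not the presenter's code: there is no drift and no faulty process, every message delay lies in [d - mu/2, d + mu/2] (d is the midpoint delay, mu the delay uncertainty), and the names `synchronize` and `skew` are illustrative.

```python
import random

def synchronize(hardware_clocks, d, mu, rng):
    """Run the averaging algorithm once and return the adjusted clocks.

    hardware_clocks[i] is HC_i read at one common real time t.
    Assumption: each delay is drawn uniformly from [d - mu/2, d + mu/2].
    """
    n = len(hardware_clocks)
    adjusted = []
    for i in range(n):
        total = 0.0  # SUM in the pseudocode; a process's own DIFF is 0
        for q in range(n):
            if q == i:
                continue
            delay = rng.uniform(d - mu / 2, d + mu / 2)
            v = hardware_clocks[q]                       # V = HC_q(t), sent at t
            hc_i_on_receipt = hardware_clocks[i] + delay  # no drift
            total += v + d - hc_i_on_receipt              # DIFF := V + d - HC_i
        adjusted.append(hardware_clocks[i] + total / n)   # adj_i := SUM / n
    return adjusted

def skew(clocks):
    """Largest difference between any two clock values."""
    return max(clocks) - min(clocks)

rng = random.Random(1)
hc = [rng.uniform(0.0, 100.0) for _ in range(5)]
ac = synchronize(hc, d=10.0, mu=2.0, rng=rng)
print("skew before:", skew(hc), "skew after:", skew(ac))
```

Under these assumptions the skew after one run should never exceed mu * (1 - 1/n), which is the bound stated on the Correctness slide.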

10 Assumptions No faulty processes. No drift in the clock rates, so the difference between the physical clocks of any two processes is a well-defined constant. HC gives an accurate local time.

11 Correctness Any admissible execution e of the algorithm synchronizes the clocks to within $\epsilon$, where $\epsilon = \mu(1 - 1/n)$. This can be rewritten as $\epsilon = (2(\mu/2) + (n-2)\mu)/n$.
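The two forms of the bound can be checked against each other with exact arithmetic. The sketch below is only a sanity check; `mu` stands for the message-delay uncertainty, `n` for the number of processes, and both function names are made up for illustration.

```python
from fractions import Fraction

def skew_bound(mu, n):
    """Closed form: mu * (1 - 1/n)."""
    return mu * (1 - Fraction(1, n))

def skew_bound_by_terms(mu, n):
    """Two terms of size mu/2 plus n - 2 terms of size mu, averaged over n."""
    return (2 * (mu / 2) + (n - 2) * mu) / n

for n in (2, 3, 10):
    # Both columns should agree for every n >= 2.
    print(n, skew_bound(Fraction(1), n), skew_bound_by_terms(Fraction(1), n))
```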

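12 Key step $D_{pq}$ = the difference between the physical clocks of p and q as estimated by q. $\Delta_{pq}$ = the actual difference between the physical clocks of p and q. Show that $|AC_p(t) - AC_q(t)| \le \mu(1 - 1/n)$: $|AC_p(t) - AC_q(t)| = |(HC_p(t) + adj_p) - (HC_q(t) + adj_q)| = \frac{1}{n}\left|\sum_r \left((\Delta_{rq} - \Delta_{rp}) - (D_{rq} - D_{rp})\right)\right| \le \frac{1}{n}\sum_r \left|(\Delta_{rq} - \Delta_{rp}) - (D_{rq} - D_{rp})\right| \le \frac{1}{n}\left(2(\mu/2) + (n-2)\mu\right) = \mu(1 - 1/n)$. The terms for r = p and r = q are each at most $\mu/2$, and each of the other n-2 terms is at most $\mu$.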

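13 $|D_{pq} - \Delta_{pq}| \le \mu/2$ Let t be the real time at which p sends its clock value and t' the real time at which q receives it. Then $|D_{pq} - \Delta_{pq}| = |C_p(t) + d - C_q(t') - \Delta_{pq}| = |C_q(t) + \Delta_{pq} + d - C_q(t') - \Delta_{pq}| = |d + C_q(t) - C_q(t')| = |d - (t' - t)| \le \mu/2$, since $d - \mu/2 \le t' - t \le d + \mu/2$.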

14 Validity Another key property worth noting is $\nu$-validity: for any process p, there exist processes q and r such that $HC_q(t) - \nu \le AC_p(t) \le HC_r(t) + \nu$. The algorithm is $\mu/2$-valid.

15 Fault-Tolerant Clock Synchronization The problem is still to keep real-time clocks synchronized in a distributed system, but now processes may fail. In addition, hardware clocks are subject to drift, so adjusted clocks may drift apart as time elapses and periodic resynchronization is necessary.

16 More definitions Bounded drift: There exists a positive constant $\rho$ (the drift) such that for all times $t_1$ and $t_2$ with $t_2 > t_1$, $(1+\rho)^{-1}(t_2 - t_1) \le HC_i(t_2) - HC_i(t_1) \le (1+\rho)(t_2 - t_1)$. That is, a hardware clock stays within a linear envelope of real time. Clock-agreement: There exists a constant $\epsilon$ such that in every admissible timed execution, for all times t and all non-faulty processes $p_i$ and $p_j$, $|AC_i(t) - AC_j(t)| \le \epsilon$. Clock-validity: There exists a positive constant $\gamma$ such that in every admissible timed execution, for all times t and all non-faulty processes $p_i$, $(1+\gamma)^{-1}(HC_i(t) - HC_i(0)) \le AC_i(t) - AC_i(0) \le (1+\gamma)(HC_i(t) - HC_i(0))$.
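The bounded-drift condition can be tested on sampled clock readings. The following is a rough sketch: the function name `within_drift_envelope`, the sample format and the example rates are all assumptions made for illustration, and `rho` is the drift bound from the definition.

```python
from itertools import combinations

def within_drift_envelope(samples, rho):
    """Check the bounded-drift inequality on every pair of samples.

    samples: (real_time, hardware_clock_reading) pairs.
    rho: assumed positive drift bound.
    """
    ordered = sorted(samples)
    for (t1, h1), (t2, h2) in combinations(ordered, 2):
        elapsed = t2 - t1
        if elapsed == 0:
            continue  # the definition only constrains t2 > t1
        ticks = h2 - h1
        if not (elapsed / (1 + rho) <= ticks <= (1 + rho) * elapsed):
            return False
    return True

# Hypothetical clocks: one runs 10 ppm fast, the other 1% fast.
slightly_fast = [(t, 1.00001 * t) for t in range(0, 100, 10)]
far_too_fast = [(t, 1.01 * t) for t in range(0, 100, 10)]
print(within_drift_envelope(slightly_fast, rho=1e-4))
print(within_drift_envelope(far_too_fast, rho=1e-4))
```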

17 Ratio of Faulty Processes There is no algorithm that satisfies clock-agreement and clock-validity if $n \le 3f$, where n is the number of processes and f is the number of faulty processes.

18 Byzantine Clock Synchronization Interactive convergence algorithm Interactive consistency algorithm

19 Algorithm CON Each process reads the value of every process's clock and sets its own clock to the average of these values, except that if it reads a clock value differing from its own by more than $\delta$, it replaces that value by its own clock's value when forming the average.

20 Assumptions $n > 3f$. Clocks are initially synchronized, and they are resynchronized often enough that no two non-faulty clocks differ by more than $\delta$. The error in reading other processes' clocks is not taken into account. The algorithm is asynchronous, but it assumes immediate access to other processes' clocks. The algorithm does not guarantee clock-validity.

21 More Assumptions Since a process cannot really read all other processes' clocks at exactly the same time, it records the difference between another clock's value and its own. When a process p reads process q's clock $c_q$, it calculates the difference $\Delta_{qp} = c_q - c_p$ between $c_q$ and the value $c_p$ of its own clock at the same time. When computing the average, it uses $\bar{\Delta}_{qp} = \Delta_{qp}$ if $|\Delta_{qp}| \le \delta$, and $\bar{\Delta}_{qp} = 0$ otherwise. Taking the average of the n values $\bar{\Delta}_{qp}$ and adding it to its own clock value gives the adjusted clock $AC_p$.
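One resynchronization step of CON, as described above, might look like the sketch below. It ignores the clock-reading error, and the function name `con_adjust` and the sample readings are hypothetical; `delta` is the threshold beyond which a reading is replaced by the process's own value.

```python
def con_adjust(own_clock, readings, delta):
    """Return the adjusted clock of one process after one CON step.

    readings: clock values of all n processes as seen by this process,
              including its own.
    delta: differences larger than this in absolute value count as 0.
    """
    differences = [c - own_clock for c in readings]
    clipped = [diff if abs(diff) <= delta else 0.0 for diff in differences]
    return own_clock + sum(clipped) / len(readings)

# The last reading is far outside delta, so it contributes nothing.
print(con_adjust(100.0, [100.0, 100.4, 99.8, 500.0], delta=1.0))
```

Clipping an out-of-range difference to 0 is the same as replacing that clock's value with the process's own value before averaging.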

22 Legend
$\epsilon$ = maximum error in reading the clock difference $\Delta_{qp}$
$\rho$ = maximum error in the rates at which the clocks run
R = length of time between resynchronizations
f = number of faulty processes
$\delta = (6f+2)\epsilon + (3f+1)\rho R$ = maximum difference between two non-faulty clocks = degree of synchronization maintained by this algorithm

23 How the clocks are synchronized ($\Delta_{qp} = c_q - c_p$) Let p and q be two non-faulty processes. If another process r is non-faulty, $c_{pr} = c_{qr}$, where $c_{pr}$ and $c_{qr}$ are the values used by processes p and q for r's clock when computing the average. If r is faulty, then $c_{pr}$ and $c_{qr}$ differ by at most $3\delta$: $c_{pr}$ lies within $\delta$ of p's value, $c_{qr}$ lies within $\delta$ of q's value, and p and q lie within $\delta$ of each other. Thus, the averages computed by p and q differ by at most $3f\delta/n$. Since $n > 3f$, this value is less than $\delta$. With repeated synchronizations, each one appears to bring the clocks closer together by a factor of $3f/n$.
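A small worked case illustrates the $3f\delta/n$ argument. The sketch assumes n = 4 and f = 1, with one faulty clock that shows different values to p and q; all numbers and function names are invented for illustration, and read errors are ignored.

```python
def values_used(own, readings, delta):
    """Value used for each clock: the reading, or own value if it is too far off."""
    return [c if abs(c - own) <= delta else own for c in readings]

def con_average(own, readings, delta):
    """Average of the values used, which becomes the new clock value."""
    used = values_used(own, readings, delta)
    return sum(used) / len(used)

n, f, delta = 4, 1, 1.0
# Non-faulty clocks: p = 100.0, q = 100.5, s = 100.2 (all within delta).
# The faulty clock shows 100.9 to p but 99.6 to q.
p_readings = [100.0, 100.5, 100.2, 100.9]
q_readings = [100.0, 100.5, 100.2, 99.6]

new_p = con_average(100.0, p_readings, delta)
new_q = con_average(100.5, q_readings, delta)
gap = abs(new_p - new_q)
print("gap after one step:", gap, "bound 3*f*delta/n:", 3 * f * delta / n)
```

In this made-up case the two non-faulty clocks start 0.5 apart and end closer together, within the $3f\delta/n$ bound, even though the faulty clock pulls them in opposite directions.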

24 Algorithm COM(m) Instead of taking an average, this algorithm takes the median of all processes' clock values. The medians computed by different non-faulty processes will be approximately the same if the two conditions below hold: 1. Any two non-faulty processes obtain approximately the same value for any process r's clock, even if r is faulty. 2. If r is non-faulty, then every non-faulty process obtains approximately the correct value of r's clock. If a majority of the processes are non-faulty, this median is approximately equal to the value of a good clock.

25 This reminds us of …

26 Algorithm OM(1) With four processes, process r sends its value to each of the other three, each of which in turn relays the value to the two remaining processes. Each process thus receives three copies of the value: one directly from r and two relayed. The value obtained by a process is the median of these three copies.
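The four-process exchange can be sketched as follows. This is an illustrative model, not the presenter's code: `om1_median`, the `relay` callback and the numeric values are assumptions, and a faulty process is modeled simply as one whose sent or relayed values are arbitrary.

```python
from statistics import median

def om1_median(values_from_sender, relay):
    """Return the value each receiver obtains in OM(1) with a median vote.

    values_from_sender[i]: what sender r gave to receiver i.
    relay(j, i, v): what receiver j tells receiver i it got from r
                    (an honest j returns v unchanged).
    """
    receivers = sorted(values_from_sender)
    obtained = {}
    for i in receivers:
        copies = [values_from_sender[i]]
        copies += [relay(j, i, values_from_sender[j])
                   for j in receivers if j != i]
        obtained[i] = median(copies)
    return obtained

# Case 1: r is non-faulty, receiver 3 relays garbage.
def lying_3(j, i, v):
    return 999 if j == 3 else v

# Case 2: r is faulty and sends three different values; relays are honest.
def honest(j, i, v):
    return v

print(om1_median({1: 10, 2: 10, 3: 10}, lying_3))
print(om1_median({1: 10, 2: 20, 3: 30}, honest))
```

In case 1 the non-faulty receivers 1 and 2 still obtain r's value; in case 2 all receivers see the same three copies and so agree on the same median.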

27 Analysis 2 cases: 1. r is non-faulty 2. r is faulty

28 Modifications for COM(1) Instead of sending arbitrary numbers, send clock values: process r sends the value of its clock. Each intermediate process then sends the difference between r's clock and its own to the two other processes.

29 Next Modification Instead of having one leader r, apply the algorithm OM(1) four times, once for each process. This gives a process an estimate of every other process's clock value, which is what we wanted. The median of these estimates becomes the process's adjusted clock value.

30 Algorithm OM(f), f>0
Algorithm OM(0)
1. The commander sends his value to every lieutenant.
2. Each lieutenant uses the value he receives from the commander, or RETREAT if he receives no value.
Algorithm OM(f)
1. The commander sends his value to every lieutenant.
2. For each i, let $v_i$ be the value lieutenant i receives from the commander, or RETREAT if he receives no value. Lieutenant i acts as commander in algorithm OM(f-1) to send the value $v_i$ to each of the n-2 other lieutenants.
3. For each i, and each $j \ne i$, let $v_j$ be the value lieutenant i received from lieutenant j in step 2, or RETREAT if he received no such value. Lieutenant i uses the value majority($v_1, \dots, v_{n-1}$).
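The recursion in OM(f) can be sketched in Python as below. Several details are assumptions made for illustration: the `send(src, dst, v)` callback models the channel (a loyal source returns v, a traitor may return anything), every message is delivered so the "no value" case never arises, and `majority` falls back to RETREAT when no value has a strict majority.

```python
from collections import Counter

RETREAT = "RETREAT"

def majority(values):
    """Strict-majority value, or RETREAT if there is none (assumed default)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else RETREAT

def om(m, commander, lieutenants, value, send):
    """Return {lieutenant: value it uses} after running OM(m)."""
    # Step 1: the commander sends his value to every lieutenant.
    received = {i: send(commander, i, value) for i in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays its value via OM(m-1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], send)
    # Step 3: each lieutenant takes the majority of what it collected.
    return {
        i: majority([received[i]] +
                    [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }

def traitor_is(t):
    """Channel model in which process t always sends RETREAT."""
    def send(src, dst, v):
        return RETREAT if src == t else v
    return send

# n = 4, f = 1: loyal commander 0, traitorous lieutenant 3.
print(om(1, 0, [1, 2, 3], "ATTACK", traitor_is(3)))
```

The entry reported for a traitorous lieutenant is meaningless; only the values used by loyal lieutenants matter.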

31 Final Modification Modify OM(f) into COM(f) in the same way that OM(1) was modified into COM(1). COM makes the same assumptions as Algorithm CON. However, Algorithm COM keeps the clocks synchronized to within approximately $(6f+4)\epsilon + \rho R$, whereas CON achieves $\delta = (6f+2)\epsilon + (3f+1)\rho R$. If the degree of synchronization $\delta$ is much larger than $6f\epsilon$, then CON must resynchronize about $3f+1$ times as often as COM.
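The factor of 3f+1 can be seen by solving both formulas for the resynchronization interval R at a fixed target $\delta$. The sketch below does that; the function names and the numeric values are purely illustrative, and the COM formula is the approximate one quoted above.

```python
def con_interval(delta, f, eps, rho):
    """Largest R with (6f+2)*eps + (3f+1)*rho*R <= delta."""
    return (delta - (6 * f + 2) * eps) / ((3 * f + 1) * rho)

def com_interval(delta, f, eps, rho):
    """Largest R with (6f+4)*eps + rho*R <= delta."""
    return (delta - (6 * f + 4) * eps) / rho

# Illustrative values only: delta is much larger than 6*f*eps.
delta, f, eps, rho = 1e-2, 1, 1e-6, 1e-6
ratio = com_interval(delta, f, eps, rho) / con_interval(delta, f, eps, rho)
print("COM interval / CON interval:", ratio, "vs 3f+1 =", 3 * f + 1)
```

When $\delta$ is not much larger than $6f\epsilon$, the reading-error terms matter and the ratio moves away from 3f+1.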

32 Message Complexity CON: $n^2$ messages. COM: $n^{f+1}$ messages. The number of rounds of message passing might matter more than the number of messages. Among all Byzantine Generals algorithms, algorithm OM (with O(f) rounds) might therefore be the best one to convert into a clock synchronization algorithm.

33 Other algorithms
Arbitrary networks and topologies (not necessarily completely connected graphs)
Uncertainties that are unknown or unbounded
NTP: Mills's network time protocol for Internet time synchronization
Use of authenticated broadcast and digital signatures
Algorithms based on approximate agreement instead of consensus
Amortizing adjustments over an interval of time, instead of introducing discontinuities in adjusted clocks
Allowing new processes to join a network with their clocks synchronized

34 References
1. Attiya and Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics, Chapter 6: Causality and Time. McGraw-Hill.
2. Attiya and Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics, Chapter 13: Fault-Tolerant Clock Synchronization. McGraw-Hill.
3. Fischer, Lynch and Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1): 26-39.
4. Halpern, Simons, Strong and Dolev. Fault-tolerant clock synchronization. Proceedings of the 3rd Annual ACM Symposium on Principles of Distributed Computing, Vancouver, B.C., Canada.
5. Lamport and Melliar-Smith. Byzantine clock synchronization. Proceedings of the 3rd Annual ACM Symposium on Principles of Distributed Computing, Vancouver, B.C., Canada, 68-74.
6. Lamport and Melliar-Smith. Synchronizing clocks in the presence of faults. Journal of the ACM, 32(1): 52-78.
7. Lamport, Shostak and Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3).
8. Lundelius and Lynch. An upper and lower bound for clock synchronization. Information and Control, 62.
9. Mills. Internet time synchronization: The network time protocol. IEEE Transactions on Communications, 39(10).
10. Srikanth and Toueg. Optimal clock synchronization. Journal of the ACM, 34(3), 1987.