Maya Haridasan April 15th

Slides:



Advertisements
Similar presentations
Lecture 1: Logical, Physical & Casual Time (Part 2) Anish Arora CSE 763.
Advertisements

Impossibility of Distributed Consensus with One Faulty Process
HIERARCHY REFERENCING TIME SYNCHRONIZATION PROTOCOL Prepared by : Sunny Kr. Lohani, Roll – 16 Sem – 7, Dept. of Comp. Sc. & Engg.
Clock Synchronization. Problem 5:33 5:57 5:20 4:53 6:01.
Time and Clock Primary standard = rotation of earth De facto primary standard = atomic clock (1 atomic second = 9,192,631,770 orbital transitions of Cesium.
Time in Embedded and Real Time Systems Lecture #6 David Andrews
Distributed Systems Fall 2010 Time and synchronization.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison-Wesley Copyright © George Coulouris, Jean Dollimore, Tim.
2/23/2009CS50901 Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial Fred B. Schneider Presenter: Aly Farahat.
Clock Synchronization Ken Birman. Why do clock synchronization?  Time-based computations on multiple machines Applications that measure elapsed time.
Josef WidderBooting Clock Synchronization1 The  - Model, and how to Boot Clock Synchronization in it Josef Widder Embedded Computing Systems Group
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
Josef Widder1 Why, Where and How to Use the  - Model Josef Widder Embedded Computing Systems Group INRIA Rocquencourt, March 10,
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
Composition Model and its code. bound:=bound+1.
Time Supriya Vadlamani. Asynchrony v/s Synchrony Last class: – Asynchrony Event based Lamport’s Logical clocks Today: – Synchrony Use real world clocks.
1 Fault Tolerance in Collaborative Sensor Networks for Target Detection IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 3, MARCH 2004.
Add Fault Tolerance – order & time Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport Optimal Clock Synchronization T.K. Srikanth.
1 Clock Synchronization Ronilda Lacson, MD, SM. 2 Introduction Accurate reliable time is necessary for financial and legal transactions, transportation.
1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Reliable Communication in the Presence of Failures Based on the paper by: Kenneth Birman and Thomas A. Joseph Cesar Talledo COEN 317 Fall 05.
Parallel and Distributed Simulation Synchronizing Wallclock Time.
1 Clock Synchronization for Wireless Sensor Networks: A Survey Bharath Sundararaman, Ugo Buy, and Ajay D. Kshemkalyani Department of Computer Science University.
BFTW 3 workshop (Sep 22, 2009)© 2009 Andreas Haeberlen 1 The Fault Detection Problem Andreas Haeberlen MPI-SWS Petr Kuznetsov TU Berlin / Deutsche Telekom.
Global Time in Distributed Real-Time Systems Dr. Konstantinos Tatas.
CS603 Clock Synchronization February 4, What is the best we can do? Lundelius and Lynch ‘84 Assumptions: –No failures –No drift –Fully connected.
Time This powerpoint presentation has been adapted from: 1) sApr20.ppt.
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
Physical clock synchronization Question 1. Why is physical clock synchronization important? Question 2. With the price of atomic clocks or GPS coming down,
SysRép / 2.5A. SchiperEté The consensus problem.
CS 3471 CS 347: Parallel and Distributed Data Management Notes13: Time and Clocks.
CS 347Notes 121 CS 347: Parallel and Distributed Data Management Notes12: Time and Clocks Hector Garcia-Molina.
PROCESS RESILIENCE By Ravalika Pola. outline: Process Resilience  Design Issues  Failure Masking and Replication  Agreement in Faulty Systems  Failure.
Fail-Stop Processors UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau One paper: Byzantine.
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
Distributed Web Systems Time and Global State Lecturer Department University.
Proof of liveness: an example
The consensus problem in distributed systems
When Is Agreement Possible
Distributed Computing
Distributed Systems CS
Time and Clock Primary standard = rotation of earth
Time and Global States Ali Fanian Isfahan University of Technology
Outline Distributed Mutual Exclusion Distributed Deadlock Detection
Time and Clock.
Logical time (Lamport)
Alternating Bit Protocol
Distributed Consensus
Agreement Protocols CS60002: Distributed Systems
Time and Clock.
Architectures of distributed systems Fundamental Models
Active replication for fault tolerance
EEC 688/788 Secure and Dependable Computing
Architectures of distributed systems Fundamental Models
Distributed Systems CS
EEC 688/788 Secure and Dependable Computing
Physical clock synchronization
CDK: Sections 11.1 – 11.4 TVS: Sections 6.1 – 6.2
Architectures of distributed systems
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Logical time (Lamport)
Architectures of distributed systems Fundamental Models
Logical time (Lamport)
Logical time (Lamport)
Presentation transcript:

Maya Haridasan April 15th Time Maya Haridasan April 15th

What’s wrong with the clocks?

Why is synchronization so complicated? Due to variations of transmission delays each process cannot have an instantaneous global view of every remote clock value Presence of drifting rates Hardest: support faulty elements Talk about why we need synchronized clocks: Process control applications, transaction processing applications, communication protocols require approximately the same view of time Present the scenario and talk about what are the difficulties in designing clock synchronization algorithms: Due to variations of transmission delays each process cannot have an instantaneous global view of every remote clock value. Presence of drifting rates Hardest: support faulty elements

External Clock Synchronization Synchronize clocks with respect to an external time reference: usually useful in loosely coupled networks or real-time systems (Ex, NTP) External x Internal Clock Synchronization: Synchronize clocks with respect to an external time reference: usually useful in loosely coupled networks or real-time systems (Ex, NTP) Synchronize clocks among themselves Enables a process to measure the duration of distributed activities that start on one processor and terminate on another one. Establishes an order between distributed events in a manner that closely approximates their real time precedence. Hardware x Software?

Internal Clock Synchronization Synchronize clocks among themselves Enables a process to measure the duration of distributed activities that start on one processor and terminate on another one. Establishes an order between distributed events in a manner that closely approximates their real time precedence. External x Internal Clock Synchronization: Synchronize clocks with respect to an external time reference: usually useful in loosely coupled networks or real-time systems (Ex, NTP) Synchronize clocks among themselves Enables a process to measure the duration of distributed activities that start on one processor and terminate on another one. Establishes an order between distributed events in a manner that closely approximates their real time precedence. Hardware x Software?

Software Clock Synchronization Deterministic  assumes an upper bound on transmission delays – guarantees some precision Statistical  expectation and standard deviation of the delay distributions are known Probabilistic  no assumptions about delay distributions Realistic? Reliable? Software Clock Synchronization: Deterministic -> assume an upper bound on transmission delays – guarantee some precision Probabilistic -> no assumptions about delay distributions Statistical -> expectation and standard deviation of the delay distributions are known Any guarantees?

Correct clock - Definition A correct clock Hp satisfies the bounded drift condition: (1 – ρ)(t – s) ≤ Hp(t) – Hp(s) ≤ (1 + ρ)(t – s) Hp(t) – Hp(s) < (1- ρ)(t – s) Hp(t) – Hp(s) > (1+ ρ)(t – s) Perfect hardware clock Hp(t) – Hp(s) = (t – s) Clock values Definition of correctness of clocks? Real time

Failure modes Crash Failure: Performance Failure: processor behaves correctly and then stops executing forever Performance Failure: processor reacts too slowly to a trigger event Arbitrary Failure (a.k.a Byzantine): processor executes uncontrolled computation

The clock synchronization problem Property 1 (Agreement): | Lpi(t) – Lpj(t) |  δ, (δ is the precision of the clock synchronization algorithm) Property 2 (Accuracy): (1 – ρv)(t – s) + a ≤ Lp(t) – Lp(s) ≤ (1 + ρv)(t – s) + b What is optimal accuracy? ρv ≠ ρ

Optimal accuracy Drift rate of the synchronized clocks is bounded by the maximum drift rate of correct harware clocks ρv = ρ (1 – ρ)(t – s) + a ≤ Lp(t) – Lp(s) ≤ (1 + ρ)(t – s) + b

Paper 1: Optimal Clock Synchronization Claims: Accuracy need not be sacrificed in order to achieve synchronization It’s the first synchronization algorithm where logical clocks have the same accuracy as the underlying physical clocks Unified solution to all models of failure Paper1: Accuracy need not be sacrificed in order to achieve synchronization. ??? First synchronization algorithm where logical clocks have the same accuracy as the underlying physical clocks. Unified solution to all models of failure

Authenticated Algorithm kth resynchronization - Waiting for time kP real time t logical time kP Ready to synchronize P – logical time between resynchronizations

Authenticated Algorithm logical time kP Ready to synchronize P – logical time between resynchronizations

Authenticated Algorithm logical time kP Ready to synchronize P – logical time between resynchronizations

Authenticated Algorithm Kp +  logical time kP Synchronize! P – logical time between resynchronizations

Achieving Optimal Accuracy Uncertainty of tdelay introduces a difference in the logical time between resynchronizations  Reason for non-optimal accuracy Solution: Slow down the logical clocks by a factor of P (P -  - ) where  = tdel / 2(1 + )

Authenticated Messages Correctness: If at least f + 1 correct processes broadcast messages by time t, then every correct process accepts the message by time t + tdel Unforgeability: If no correct process broadcasts a message by time t, then no correct process accepts the message by t or earlier Relay: If a correct process accepts the message at time t, then every correct process does so by time t + tdel

Nonauthenticated Algorithm Replace signed communication with a broadcast primitive Primitive relays messages automatically Cost of O(n2) messages per resynchronization New limit on number of faulty processes allowed: n > 3f

Broadcast Primitive 2 1 3 (echo, round k) Received f + 1 distinct (init, round k)! 1 Received 2f + 1 distinct (echo, round k)! Accept (round k) 3 (echo, round k)

Initialization and Integration Same algorithms can be used to achieve initial synchronization and integrate new processes into the network A process independently starts clock Co On accepting a message at real time t, it sets C0 = α “Passive” scheme for integration of new processes

Paper 2: Why try another approach? Traditional deterministic fault-tolerant clock synchronization algorithms: Assume bounded communication delays Require the transmission of at least N2 messages each time N clocks are synchronized Bursty exchange of messages within a narrow re-synchronization real-time interval

Probabilistic ICS Claims: Proposes family of fault-tolerant internal clock synchronization (ICS) protocols Probabilistic reading achieves higher precisions than deterministic reading Doesn’t assume unbounded communication delays Use of convergence function optimal accuracy

Their approach Only requires to send a number of unreliable broadcast messages Staggers the message traffic in time Uses a new transitive remote clock reading method Number of messages in the best case: N + 1 (N time server processes)

Probabilistic Clock Reading q Basic Idea: T1 m1 m2 T0 T2 (T2 – T0)(1 + ) = maximum bound (real time) p min ≤ t(m2) ≤ (T2 – T0)(1 + ) - min max(m2)(1 + ) + min(m2)(1 - ) 2 Cq = T1 +

Probabilistic Clock Reading q Basic Idea: T1 m1 m2 T0 T2 Maximum acceptable clock reading error p Is error ≤  ? Yes: Success No? Try reading again (Limit: D)

Staggering Messages p q r p slots per cycle k cycles per round slot

Transitive Remote Clock Reading Can reduce the number of messages per round to N + 1 tp tq real time T p T q Cq (T,p) Cr (T,q) = Cr (T,p) + T - Cq (T,p) r Cr (T,p) Cr (T,q) Cannot be used when arbitrary failures can occur!

Round Message Exchange Protocol Request Mode Clock times: p q r ? ? ? request messages t err Finish Mode Clock times: p q r 11 10 1 1 2 finish messages t err Reply Mode Clock times: p q r 11 10 ? ? ? reply messages t err

Outline of Algorithms Round clock Cpk of process p for round k: Cpk(t) = Hp(t) + Apk Void synchronizer() { ReadClocks(..) A = A + cfn(rank(), Clocks, Errors) T = T + P }

Convergence Functions Let I(t) = [L, R] be the interval spanned by at t by correct clocks. If all processes would set their virtual clocks at the same time t to the midpoint of I(t), then all correct clocks would be exactly synchronized at that point in time. Unfortunately, this is not a perfect world!

Convergence Functions Each correct process makes an approximation Ip which is guaranteed to be included in a bounded extension of the interval of correct clocks I: Ik(t) = [min{Csk (t) - }, max{Csk (t) + }] Deviation of clocks is bounded by , so length of Ik(t) is bounded by  + 2

Tolerated types of failures Failure classes Algorithm Tolerated Failures Required Processes Tolerated types of failures CSA Crash F F + 1 Crash CSA Read 2F + 1 Crash, Reading CSA Arbitrary 3F + 1 Arbitrary, Reading CSA Hybrid Fc, Fr, Fa 3Fa + 2Fr + Fc + 1 Crash, Read., Arb.

Conclusions – Which one is better? First Paper (deterministic algorithm) Simple algorithm Unified solution for different types of failures Achieves optimal accuracy Assumes bounded comunication O(n2) messages Bursty communication

Conclusions – Which one is better? Second Paper (probabilistic algorithm) Takes advantage of the current working conditions, by invoking successive round-trip exchanges, to reach a tight precision) Precision is not guaranteed Achieves optimal accuracy O(n) messages If both algorithms achieve optimal accuracy, Then why is there still work being done?