Time in Distributed Systems Distributed Systems
Why Is Time Important?
If you work in industry, you will seldom have to worry about this
You’ll rarely have to architect a time-coordination framework
Why Is Time Important?
There is no such thing as absolute time
  – Read: A Brief History of Time (Stephen Hawking)
  – Let’s “assume” there was a big bang, t=0
  – UNIX epoch (the computer’s “big bang”) = Jan 1, 1970, 00:00:00 UTC
Time matters mostly for causal ordering
Ugliness of Time
Humans measure it in astronomical terms
Computers measure it either as:
  – Quartz crystal oscillations
  – Atomic caesium transitions between hyperfine levels of the ground state
Astronomical time and atomic time get out of sync
  – We add leap seconds
Clocks and Clock Drift
Quartz clocks
  – Accuracy: 1 part in 10^6 (drifts 1 second in 11.6 days)
Atomic clocks
  – Accuracy: 1 part in 10^13 (drifts 1 second in about 300,000 years)
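The drift figures above follow from simple arithmetic: a clock accurate to 1 part in N gains or loses about 1 second every N seconds. A minimal sketch of that calculation (the function name and the 365.25-day year are illustrative choices, not from the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def time_to_drift(target_drift_seconds: float, accuracy_parts: float) -> float:
    """Elapsed seconds before a clock accurate to 1 part in
    `accuracy_parts` has drifted by `target_drift_seconds`."""
    return target_drift_seconds * accuracy_parts


# Quartz: 1 part in 10^6; atomic: 1 part in 10^13.
quartz_days = time_to_drift(1, 1e6) / SECONDS_PER_DAY
atomic_years = time_to_drift(1, 1e13) / SECONDS_PER_YEAR

print(f"quartz: 1 s of drift after about {quartz_days:.1f} days")
print(f"atomic: 1 s of drift after about {atomic_years:,.0f} years")
```

The atomic figure comes out a little above 300,000 years, which the slide rounds down.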
UTC
Coordinated Universal Time
Kept by atomic clocks
Propagated by radio and satellite signals
Delay?
  – Signals travel at the speed of light, so the delay grows with distance
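To get a feel for the size of the propagation delay, here is a rough sketch that divides distance by the speed of light. The two distances are assumptions for illustration: roughly 20,200 km for a GPS satellite directly overhead, and a hypothetical 1,000 km radio path. Real signals also pass through the atmosphere and receiver hardware, which this ignores.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum


def propagation_delay_seconds(distance_m: float) -> float:
    """One-way delay for a signal travelling `distance_m` at light speed."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


# Assumed distances, for illustration only.
gps_delay = propagation_delay_seconds(20_200_000)   # ~20,200 km
radio_delay = propagation_delay_seconds(1_000_000)  # hypothetical 1,000 km

print(f"satellite: about {gps_delay * 1000:.1f} ms")
print(f"radio:     about {radio_delay * 1000:.1f} ms")
```

Delays of this size are far larger than the drift of an atomic clock, so a receiver has to correct for them.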
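Cristian’s Algorithm
Client asks the server for the time
Client sets its time to
  – Server time + transmission delay (estimated as half the round-trip time)
Assumptions
  – Consistent delay
  – Symmetric delay
Problems
  – Server can crash (single point of failure)
  – Security (a spoofed server can hand out a wrong time)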
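The client side of this exchange can be sketched in a few lines. This is a minimal sketch, not a production client: `request_server_time` is a hypothetical callable standing in for the network request, and the one-way delay is estimated as half the measured round trip, which is only right under the symmetric-delay assumption listed above.

```python
import time
from typing import Callable, Tuple


def cristian_estimate(
    request_server_time: Callable[[], float],
    local_clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, float]:
    """Return (estimated current server time, measured round-trip time)."""
    t_send = local_clock()
    server_time = request_server_time()  # hypothetical network call
    t_recv = local_clock()
    rtt = t_recv - t_send
    # Assume the reply took half the round trip to reach us.
    return server_time + rtt / 2, rtt


# Deterministic demo: a fake local clock and a fake server reply.
ticks = iter([100.0, 100.5])  # local clock at send and at receive
estimate, rtt = cristian_estimate(lambda: 5000.0, local_clock=lambda: next(ticks))
print(estimate, rtt)
```

A common refinement is to repeat the request several times and keep the sample with the smallest round trip, since it bounds the error most tightly.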
Berkeley Algorithm
Server polls clients
Keeps an estimate of each client’s delay
Server takes a fault-tolerant average
  – Drop outliers
  – Take an average (including its own clock)
  – Propagate the delta (not the time)
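The averaging step can be sketched as follows. Several details here are assumptions made for illustration: the client readings are taken to be already compensated for network delay, outliers are defined as readings more than `max_skew` away from the median, and the key `"server"` is a made-up label for the server's own clock.

```python
from statistics import median
from typing import Dict


def berkeley_deltas(
    server_time: float, client_times: Dict[str, float], max_skew: float
) -> Dict[str, float]:
    """Return the adjustment each participant should apply to its clock."""
    readings = {"server": server_time, **client_times}
    # Assumption: treat readings far from the median as faulty.
    reference = median(readings.values())
    trusted = [t for t in readings.values() if abs(t - reference) <= max_skew]
    average = sum(trusted) / len(trusted)
    # Send each host a delta rather than an absolute time, so the
    # delay of the reply message does not distort the correction.
    return {name: average - t for name, t in readings.items()}


deltas = berkeley_deltas(
    server_time=1000.0,
    client_times={"a": 1010.0, "b": 990.0, "c": 5000.0},  # "c" is faulty
    max_skew=60.0,
)
print(deltas)
```

In this example the faulty clock `c` is left out of the average but still receives a delta that pulls it back toward the group.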
NTP
Hierarchical time dissemination
Time correction: can’t roll back time
Root talks to stratum 1, stratum 1 talks to stratum 2, and so on
Hosts ranked by stratum number
Use a dispersion filter to rank hosts
  – Dispersion filter: variation of delay
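The ranking idea can be illustrated with a deliberately simplified sketch: prefer a lower stratum, and break ties by how much the measured delay varies. The real NTP filtering and selection algorithms are considerably more involved; the `Candidate` class, the use of a population standard deviation as "dispersion", and the sample values are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List


@dataclass
class Candidate:
    name: str
    stratum: int
    delay_samples: List[float]  # measured round-trip delays, in seconds

    @property
    def dispersion(self) -> float:
        # Simplification: variation of delay as a standard deviation.
        return pstdev(self.delay_samples)


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order by stratum first, then by delay variation (lower is better)."""
    return sorted(candidates, key=lambda c: (c.stratum, c.dispersion))


ranked = rank_candidates([
    Candidate("a", stratum=2, delay_samples=[0.010, 0.011, 0.010]),
    Candidate("b", stratum=1, delay_samples=[0.020, 0.080, 0.030]),
    Candidate("c", stratum=1, delay_samples=[0.030, 0.031, 0.029]),
])
print([c.name for c in ranked])
```

Here `c` wins over `b` even though its delay is larger on average, because a steady delay is easier to correct for than a jittery one.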
What We Really Care About
Causal ordering
Lamport clocks
  – Monotonically increasing counter per process (host)
  – Comparing timestamps is only meaningful for a pair of causally related events
  – No global ordering
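A Lamport clock is small enough to sketch in full. This is a minimal sketch with illustrative class and method names: the counter is incremented on every local event and on every send, and a receiver jumps ahead of the timestamp carried by the message.

```python
class LamportClock:
    """Logical clock: one monotonically increasing counter per process."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Record a send event and return the timestamp to attach."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Record a receive event, moving past the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                    # local event on p
stamp = p.send()            # p sends a message
received = q.receive(stamp) # q receives it
print(stamp, received)
```

The receive timestamp is always larger than the send timestamp, which is the causal guarantee. The reverse does not hold: a smaller timestamp alone does not prove that one event caused another.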
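Vector Clocks
Extend Lamport clocks
  – One clock for every process (host)
  – Vector of size N, for an N-node system
Imposes a partial (causal) ordering; concurrent events stay unordered
Not very scalable (the vector grows with the number of nodes)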
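A vector clock can be sketched along the same lines as a Lamport clock, with one counter per process. The names below are illustrative, and the sketch assumes a fixed, known number of processes. Comparing two vectors element by element tells whether one event happened before the other or whether the two are concurrent.

```python
from typing import List


class VectorClock:
    """One counter per process; this process owns slot `index`."""

    def __init__(self, n: int, index: int) -> None:
        self.index = index
        self.counts = [0] * n

    def tick(self) -> List[int]:
        """Record a local event and return a snapshot of the vector."""
        self.counts[self.index] += 1
        return list(self.counts)

    def send(self) -> List[int]:
        return self.tick()

    def receive(self, other: List[int]) -> List[int]:
        """Merge the sender's vector (element-wise max), then tick."""
        self.counts = [max(a, b) for a, b in zip(self.counts, other)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1, p2 = (VectorClock(3, i) for i in range(3))
sent = p0.send()            # [1, 0, 0]
received = p1.receive(sent) # [1, 1, 0]
other = p2.tick()           # [0, 0, 1]
print(happened_before(sent, received), concurrent(received, other))
```

Every message carries the whole vector, so the cost per message grows linearly with the number of processes.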