The Θ-Model Ulrich Schmid Josef Widder Martin Hutle Daniel Albeseder

Ulrich Schmid, Josef Widder, Martin Hutle, Daniel Albeseder
Vienna University of Technology, Embedded Computing Systems Group
Gérard Le Lann, Jean-François Hermant
INRIA Rocquencourt, Project Novaltis
2/22/2019, The Theta-Model (Version 1.3)

Motivation

Timed Algorithms
Most fault-tolerant (FT) algorithms for distributed real-time systems (RTS) have explicit time values (unit "seconds") in their code / variables
Toy example: Local real-time clock C(t) for timing out a crashed process
    msg_pong = do_roundtrip(msg_ping, p)
        send msg_ping to p
        TIMEOUT := C(t) + 2τ+    /* max. end-to-end delay τ+ (sec) */
        while C(t) < TIMEOUT do nothing
        if msg_pong did not arrive then msg_pong := NIL
        return msg_pong
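A minimal Python sketch of the timed variant, under the assumption of a simulated network: the parameter reply_delay is hypothetical and stands for the actual round-trip delay in seconds (None models a crashed p). It is meant to illustrate why a correct but slow p becomes indistinguishable from a crashed one as soon as the assumed τ+ is violated.

```python
from typing import Optional

def do_roundtrip_timed(reply_delay: Optional[float], tau_plus: float) -> Optional[str]:
    """Timed round-trip: wait 2 * tau_plus seconds for msg_pong.

    reply_delay is the simulated round-trip delay in seconds
    (None models a crashed process p that never answers).
    Returns "msg_pong" if the reply arrives in time, else None (NIL).
    """
    timeout = 2 * tau_plus  # explicit real-time value baked into the algorithm
    if reply_delay is not None and reply_delay <= timeout:
        return "msg_pong"
    return None

# p answers within the assumed bound: the reply is accepted.
print(do_roundtrip_timed(0.0015, tau_plus=0.001))
# p is alive but overloaded: NIL is returned although p is correct.
print(do_roundtrip_timed(0.0030, tau_plus=0.001))
```

The second call is the problematic case: the caller gets NIL for a process that has not crashed, so any safety property that relies on a non-NIL result is at risk.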

Implications?
Safety properties like consistency of replicated data may depend upon non-NIL operation of do_roundtrip
Usual assumption: Real-time systems must always meet their timeliness properties
  Only possible if all end-to-end delays δ ≤ τ+
  Safety properties also guaranteed in this case
BUT: Bounds like τ+ that always hold are very difficult to determine for real systems
Fail-operational systems might be allowed to sometimes lose timeliness, but never lose consistency

Why is determining τ+ difficult?
Queuing phenomena:
  Simultaneous messages from different peers (CPU)
  Multiple processes (CPU)
  Multiple messages (Link)
End-to-end delays hence depend upon
  message & computational complexity of algorithms
  interaction ("blocking factors")
  load conditions
  scheduling disciplines

Importance of Scheduling?
τ+ can be huge in real systems since all messages [including application-level] must be taken into account
Maximum determines synchronous round duration → too conservative for most messages
Escape: Appropriate scheduling
Fast Failure Detectors by Hermant & Le Lann [HLL02]
  Use Head-of-the-Line Scheduling for FD-level processes and messages
  Only blocking factors due to non-preemptible resources can lead to priority inversion phenomena on FD-level
  τ+ relevant for failure detection latency reduced by orders of magnitude

But still …
Hermant & Le Lann [HLL02]: τ+ = γ(n) with [formula]
(Note that woutQ, woutq and winq are the problematic parts here)
Do you trust a real system to always obey this, during the whole mission time?
Do you really want your safety and liveness properties to depend on this?

Alternatives?
Are there ways to guarantee logical safety & liveness properties independently of the timing properties of the underlying system?
  YES: Asynchronous algorithms (time-free, message-driven)
Are there suitable time-free computational models and algorithms?
  YES: Θ-Model

Roadmap of our Presentation
Overview of Computational Models
The Θ-Model
First Experimental Results
Applications

Overview of Computational Models

The FLP Asynchronous Model (I)
Fischer, Lynch & Paterson [FLP85]
System of n processes communicating via reliable point-to-point network
  Every message sent is eventually delivered
No bounded-drift clocks available
Computational step times are non-negative, finite but unbounded (i.e., can exceed any a priori given bound)
Message transmission delays are non-negative, finite but unbounded

The FLP Asynchronous Model (II)
The FLP model has no timing assumption at all → cannot be violated at runtime
BUT: In the FLP-Model, it is impossible to distinguish a slow from a crashed process
Important distributed computing (DC) problems like consensus are impossible to solve in the FLP-Model in the presence of failures
For solvability, some property/properties must be added to the pure FLP model.

The FLP Asynchronous Model (III)
Resulting spectrum of models: FLP → partially synchronous → synchronous
Clearly: The stronger the added property, the lower the assumption coverage in real systems
Usually: Add explicit timeliness properties to the FLP-Model
Sometimes: Add implicit timeliness properties to the FLP-Model (time-free models)

(Close to) Synchronous Models
Synchronous model → allows simulation of lock-step rounds
  Transmission delay bound Δ
  Computing step time bound σ
  Bounded-drift local clocks available
Timed Asynchronous Model by Cristian & Fetzer [CF99]
BUT: Fail awareness allows bounds Δ and σ to be violated arbitrarily often → fail-safe behavior

Partially Synchronous Models (I)
Dwork, Lynch & Stockmeyer [DLS88], Ponzio & Strong [PS92], Attiya, Dwork, Lynch & Stockmeyer [ADLS94]
  Transmission delay bound Δ
  Bounded ratio of max. over min. computing step times Φ
  Bounds unknown / known but hold from unknown time GST on
Every process can locally time-out messages:
  [PS91, ADLS94]: Semi-synchronous model assumes availability of bounded-drift local clocks
  [DLS88]: Computing steps of fastest processor are used as real-time units [= unit of Δ !] → local clock with bounded rate in [1/Φ, 1] implementable via spin-loop

Partially Synchronous Models (II)
Archimedean model by Vitányi [Vit85]
  Bounded ratio s ≥ u/m on min. computing step time (m) and max. computing step time + max. transmission delay (u)
  s is dimensionless
  Every process can again locally time-out messages [via spinning for s steps]
Finite Average Round-Trip-Time Model by Fetzer & Schmid [FS04]
  Unknown lower bound for computing step time
  Stubborn links with unknown average round-trip time bound
  Every process can implement "weak clock" via spin-loop

FLP-Model with Failure Detectors
Replace explicit timeliness properties by unreliable failure detectors
FDs are local oracles based upon a list of suspected processes
  Completeness: Every crashed process is eventually suspected
  Accuracy: No correct process is suspected
FLP-Model + FDs allow most important distributed computing problems to be solved
BUT: Implementing FDs in a real system necessarily requires a system model stronger than FLP → back at initial problem

The Θ-Model

Time-Free Message-Timeout in ParSync?
Implementation of do_roundtrip(msg_ping, p) using a spin-loop in the parsync models of [DLS88] or [Vit85]:
    send msg_ping to p
    for i=1 to x do no-op    /* x=f(Δ, Φ) resp. x=f(s) is dimensionless! */
    if msg_pong did not arrive then msg_pong := NIL
    return msg_pong
The algorithm is
  time-free since neither code nor variables contain real-time values (unit "seconds")
  not message-driven!

But …
There is the ([DLS88]: hidden, [Vit85]: explicit) assumption that all timing values/bounds are multiples of the min. computing step time (m)
The algorithm would be time-free only if m could vary arbitrarily
Since there is no physically evident correlation between transmission delay and computing step time, however, m cannot vary arbitrarily without violating the physical (real-time) transmission delay bound [since Δ or s, respectively, is fixed]
Assuming fixed Δ or s hence makes sense for essentially constant m only
Not time-free in reality since m acts as a unit of real time!

Still: Can we make this idea work?
The problem with the previous algorithm is that computing step times and transmission delays are uncorrelated
Key idea:
  Replace unit time "fastest computing step" of [DLS88], [Vit85] by "fastest end-to-end delay"
  Just assume that, during any round-trip, there may not be more than Θ other successive round-trips (anywhere in the system)

Time-free implementation of do_roundtrip(.)
    send msg_ping to p
    for i=1 to Θ do    /* Θ is dimensionless ! */
    begin    /* do additional roundtrips for waiting */
        send delay_ping(i) to process q
        wait for delay_pong(i) from process q
    end
    if msg_pong did not arrive then msg_pong := NIL
    return msg_pong
The algorithm is
  time-free since Θ is dimensionless
  fully message-driven since all events are triggered by message receptions only
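The following Python sketch simulates this time-free round-trip. It is an illustration under stated assumptions, not the authors' implementation: delays are hypothetical integers in arbitrary time units supplied by the caller, q is assumed to be correct, and time values appear only in the simulated network, never in the algorithm's own decision logic.

```python
from typing import List, Optional, Tuple

def do_roundtrip_theta(theta: int,
                       pong_delay: Optional[int],
                       delay_roundtrips: List[int]) -> Tuple[Optional[str], int]:
    """Simulate the time-free round-trip.

    pong_delay: simulated round-trip delay of msg_ping/msg_pong
                (None models a crashed p).
    delay_roundtrips: simulated durations of the theta successive
                      delay_ping/delay_pong round-trips with q.
    Returns (result, waited), where waited is the emergent duration D.
    """
    waited = 0
    for i in range(theta):
        # send delay_ping(i) to q, then block until delay_pong(i) arrives
        waited += delay_roundtrips[i]
    if pong_delay is not None and pong_delay <= waited:
        return "msg_pong", waited
    return None, waited

def worst_case(theta: int, tau_minus: int) -> Tuple[Optional[str], int]:
    """Fastest possible delay round-trips against the slowest possible pong."""
    tau_plus = theta * tau_minus
    return do_roundtrip_theta(theta, 2 * tau_plus, [2 * tau_minus] * theta)

print(worst_case(5, 1000))  # slow system
print(worst_case(5, 1))     # same delay ratio, 1000 times faster
```

In both calls the reply from a correct p is accepted, and the waiting time scales with the speed of the simulated system without any change to the algorithm.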

Time-free implementation of do_roundtrip(.)
[Figure: Θ = 5 successive delay round-trips (1 to 5) with q, spanning duration D, alongside the msg_ping/msg_pong round-trip with p]
Timing behavior solely emerges from the underlying system [D adapts automatically to actual speed]
Consider execution in a synchronous system:
  End-to-end delays δ satisfy τ− ≤ δ ≤ τ+ with τ+ / τ− ≤ Θ = 5
  → Termination within 10 τ− ≤ D ≤ 10 τ+
  τ+ = 100 μs → D ≤ 1 ms
  τ+ = 1 s → D ≤ 10 s
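The bounds on D can be spelled out as follows. This is a sketch under the assumption that each of the Θ delay round-trips consists of exactly two messages, each with an end-to-end delay in [τ−, τ+]; the superscripts ping and pong are labels introduced here for the two messages of round-trip i.

```latex
\begin{align*}
D &= \sum_{i=1}^{\Theta} \bigl(\delta_i^{\mathrm{ping}} + \delta_i^{\mathrm{pong}}\bigr),
  \qquad \tau^- \le \delta_i^{\mathrm{ping}},\, \delta_i^{\mathrm{pong}} \le \tau^+ \\
  &\Longrightarrow\quad 2\Theta\,\tau^- \;\le\; D \;\le\; 2\Theta\,\tau^+ ,
  \qquad \text{i.e. } 10\,\tau^- \le D \le 10\,\tau^+ \text{ for } \Theta = 5 \\
\text{round-trip to } p &\;\le\; 2\,\tau^+ \;\le\; 2\Theta\,\tau^- \;\le\; D
\end{align*}
```

The last line is the reason the timeout is safe: as long as τ+/τ− ≤ Θ, the reply of a correct p arrives no later than the end of the waiting loop.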

Performance?
Is doing continuous successive round-trips for delay purposes prohibitively expensive? NO!
(a) Reasonably large delay * bandwidth product:
  τ+ = 1 ms with 1 Mbit/sec peer-to-peer bandwidth allows sending 1000 bit per message
  do_roundtrip(.) needs only a few bits of message data
  Only a few % overhead for continuous round-trips!
(b) Small delay * bandwidth product:
  Use timer to separate multiple instances of do_roundtrip(.)
  No bounded-drift timer required here → implementable without hardware clock by counting some local events
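The arithmetic behind case (a) can be checked with a few lines of Python. The message size of 32 bits is an assumption chosen here to stand for "a few bits"; the slides do not give a concrete figure.

```python
def bits_per_delay(tau_plus_us: int, bandwidth_bit_per_s: int) -> int:
    """Bits that fit on the link during one end-to-end delay (delay * bandwidth)."""
    return tau_plus_us * bandwidth_bit_per_s // 1_000_000

def overhead_fraction(message_bits: int, tau_plus_us: int,
                      bandwidth_bit_per_s: int) -> float:
    """Share of the link capacity used by one message per delay tau_plus."""
    return message_bits / bits_per_delay(tau_plus_us, bandwidth_bit_per_s)

capacity = bits_per_delay(1_000, 1_000_000)  # 1 ms at 1 Mbit/s
print(capacity)
print(overhead_fraction(32, 1_000, 1_000_000))  # 32 bits is an assumed message size
```

With these numbers the link carries 1000 bits per delay, so a 32-bit delay message uses about 3% of the capacity.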

The Θ-Model (Simple Version)
FLP-Model +
End-to-end delays δ of all messages in transit at t
  minimum τ−(t)
  maximum τ+(t)
τ+(t) and τ−(t) may vary arbitrarily with time, but the ratio Θ(t) = τ+(t)/τ−(t) must remain bounded by some [known or even unknown] Θ for every time t
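As an illustration of this definition, the sketch below computes Θ(t) from a trace of messages. The trace format (pairs of send and receive times) and the convention that a message is in transit at t when send ≤ t < receive are assumptions made here, not part of the slides.

```python
from typing import List, Optional, Tuple

Message = Tuple[float, float]  # (send time, receive time)

def theta_at(messages: List[Message], t: float) -> Optional[float]:
    """Return Θ(t) = τ+(t)/τ−(t) over all messages in transit at time t.

    A message counts as in transit at t if send <= t < receive, so every
    delay considered is strictly positive.
    Returns None if no message is in transit at t.
    """
    delays = [recv - send for send, recv in messages if send <= t < recv]
    if not delays:
        return None
    return max(delays) / min(delays)

def respects_theta(messages: List[Message], times: List[float], theta: float) -> bool:
    """Check Θ(t) <= theta at every sampled time t where messages are in transit."""
    ratios = [theta_at(messages, t) for t in times]
    return all(r is None or r <= theta for r in ratios)

trace = [(0.0, 10.0), (3.0, 10.0), (8.0, 9.0)]
print(theta_at(trace, 5.0))   # two messages in transit, delays 10 and 7
print(theta_at(trace, 8.5))   # a fast message (delay 1) joins them
print(respects_theta(trace, [5.0, 8.5], theta=5.0))
```

The third message shows how a single fast message in transit during a slow one drives Θ(t) up, here from 10/7 to 10.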

Key Question
Can we indeed expect a (positive) correlation between τ+(t) and τ−(t) in a real system?
Shared channel-type networks [Deterministic Ethernet]: Theoretical analysis by Hermant & Widder [HW04] has shown that Θ close to 1 can be achieved
Fully connected systems: First experimental evaluation of a simple Θ clock synchronization algorithm by Albeseder [Alb04] confirms correlation

Reason for such a correlation ?
Restriction to broadcast communication (shared channel or multiple point-to-point sends in a fully connected network)
(Part of) the messages populating the queues from p → q are also sure/likely to populate queues from p → r, and even from s → r
[Figure: senders p and s and receivers q and r, each with CPU and channel queues, connected by links p → q, p → r, s → r, q → x and r → y; a message sent by p at time t is processed at q with δpq = 10 and at r with δpr = 7]

Correlation → Coverage Expansion
Given some bounds τ+ and τ− assumed during system design (also used in synchronous systems), compute Θ = τ+ / τ−
Unanticipated overload: τ+(t) > τ+; if τ+(t) ≤ Θτ−(t), however, the Θ-system is still OK
[Figure: end-to-end delays δ over time t; synchronous system out of spec]
Note: τ+(t) = τ+ + α(t) and τ−(t) = τ− + α(t)/Θ are sufficient for Θ to hold
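The note on this slide can be verified directly. Assuming the design-time ratio Θ = τ+/τ− and an overload term α(t) ≥ 0 that raises the maximum delay by α(t) and the minimum delay by α(t)/Θ, the instantaneous ratio stays exactly at Θ:

```latex
\[
\frac{\tau^+(t)}{\tau^-(t)}
  = \frac{\tau^+ + \alpha(t)}{\tau^- + \alpha(t)/\Theta}
  = \frac{\Theta\,\tau^- + \alpha(t)}{\tau^- + \alpha(t)/\Theta}
  = \frac{\Theta\,\bigl(\tau^- + \alpha(t)/\Theta\bigr)}{\tau^- + \alpha(t)/\Theta}
  = \Theta
\]
```

So the minimum delay only has to grow by a Θ-th of the overload on the maximum delay for the Θ assumption to survive an overload that already violates the synchronous bound τ+.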

Still: Shortcomings of the Simple Θ-Model
The predicted correlation need not exist for every fast message but only for some
  Some very fast messages [even τ− = 0] may be in transit somewhere in the system even during a slow message
  Correlation and hence coverage expansion does not exist in such cases
Need a more relaxed definition of the relation between slow and fast messages
  All that is actually needed is to constrain the number of fast messages during a slow one
  No need for a correlation at every point in time t

The Θ-Model (Generalized Version)
Consider a chain of k ≥ 1 successive messages
Longest chain of "covered" causal messages ≤ kΘ
[Figure: k = 2 successive (slow) messages with delays τ+(t1) and τ+(t2), covering a chain of at most kΘ = 9 causally dependent (fast) messages → Θ = 4.5]
Advantage: Messages with τ−(t) = 0 allowed here!

Partial Order of Partially Synchronous Models
DLS … [DLS88] with a priori known Δ, Φ
Θ … Θ-Model with a priori known Θ
DLSu … [DLS88] with a priori unknown Δ, Φ
Θu … Θ-Model with a priori unknown Θ
FLP … FLP-Model
[Figure: partial order over FLP, Θu, DLSu, Θ and DLS]

Existing Θ-Algorithms
Perfect failure detectors [Schmid and Le Lann 2003]
Clock synchronization (+ system booting) [Widder 2003], [Widder and Schmid 2003]
Eventually perfect failure detectors / system booting [Widder, Le Lann and Schmid 2003]
Fast failure detectors atop Deterministic Ethernet [Widder and Hermant 2004]
Self-stabilizing failure detectors & impossibility results [Hutle and Widder 2004]
Synchronizer, SDD problem, atomic commitment, etc. [Widder's PhD 2004]

Thanks!

First Experimental Results

Remember Key Question:
Can we indeed expect a (positive) correlation between τ+(t) and τ−(t) in a real system?
Alternatively: Let Θ = τ+ / τ− with
  τ− = min_t τ−(t) being the total minimum for all t
  τ+ = max_t τ+(t) being the total maximum for all t
Is it the case that Θ(t) < Θ?
How often, and how much gain Θ/Θ(t)?

Evaluation Setup
Master thesis by Daniel Albeseder [Alb04]
Pentium 4 workstations (2.4 GHz, FSB533)
Fully switched Fast-Ethernet over two Cisco Catalyst 2950 switches (connected over fiber Gigabit-Ethernet backbone)
Red Hat Linux 7.2 with kernel, patched with High-Resolution-Timers and Kernel-Preemption

Evaluation Parameter Settings
n = 4 processors with at most f = 1 faulty ones
Head-of-line process scheduling (Linux RT priorities)
High message priority (low latency bit in TOS byte), but no head-of-the-line message scheduling
Simulated broadcast (= multiple point-to-point sends)
Fixed message length: 36 bytes
Inter-round delay: 1 ms
Duration of evaluation run: 10 … 100 s range

System Design
[Figure: the ctrlpsa workstation and the evalpsa clients, connected by a fully switched Fast-Ethernet]
The ctrlpsa workstation controls the network of evalpsa clients.
Each evalpsa runs the algorithm to be evaluated.
The fully connected network is simulated by a fully switched Fast-Ethernet.

Control Communication
[Figure: message sequence between ctrlpsa and evalpsa over time t. Phases and labels: boot, init done, booting, stop, start, running, change parameters, …, run algorithm, collecting, store, done]

Evalpsa Structure

Data Analysis
Consider only clock synchronization messages
τ−(t), τ+(t), Θ(t) etc. only evaluated at times t where some rule of the algorithm fires ("effective Θ")
Approximation of one-way delays via round-trip delays for simplicity (i.e., we assume that both messages of a round-trip have the same delay)
The clock of one designated processor is used as global timebase; all timestamps are a posteriori adjusted to this global timebase

Glossary of variables
τ−(t), τ+(t): Min. and max. delay of all messages in transit at some time t
Θ(t) = τ+(t)/τ−(t)
max_t Θ(t): largest value of Θ(t) during the evaluation run
τ−, τ+: Min. and max. delay of all messages in transit at all times during the evaluation run
Θ = τ+/τ−
Gain = Θ / max_t Θ(t)
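A short Python sketch of how these quantities relate, assuming the evaluation yields one (τ−(t), τ+(t)) pair per evaluated time t. The function name, the variable theta_max (standing for the maximum of Θ(t) over the run) and the sample values are illustrative assumptions, not measured data.

```python
from typing import List, Tuple

def coverage_gain(samples: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """samples holds one (τ−(t), τ+(t)) pair per evaluated time t, with τ−(t) > 0.

    Returns (theta, theta_max, gain) where
      theta     = max_t τ+(t) / min_t τ−(t)   (from the overall extremes)
      theta_max = max_t Θ(t)                  (largest instantaneous ratio)
      gain      = theta / theta_max
    """
    tau_minus = min(lo for lo, _ in samples)
    tau_plus = max(hi for _, hi in samples)
    theta = tau_plus / tau_minus
    theta_max = max(hi / lo for lo, hi in samples)
    return theta, theta_max, theta / theta_max

# Made-up samples: min and max delay rise together over time.
theta, theta_max, gain = coverage_gain([(1.0, 2.0), (2.0, 6.0), (4.0, 8.0)])
print(theta, theta_max, gain)
```

Because the overall minimum and maximum in this example occur at different times, the ratio of the extremes (8) is larger than any instantaneous ratio (at most 3), which is exactly the situation in which the gain exceeds 1.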

Θ
Every test run was repeated five times. The maximum of these five test runs is shown here.

Gain Θ / max_t Θ(t)

Continuously Increasing Network Load
The first and last seconds are cut off from the calculation routine to compensate for errors during these phases. The load was increased in 1% jumps every 2 seconds. Low Theta values are visible in two periods. We speculate that this is caused by special network-improvement functions inside the Linux kernel as well as inside the network interface card itself. Overall, Theta does not increase with network load.

Conclusions from First Experiments
There is definitely a positive correlation between τ+(t) and τ−(t) in the evaluation setting, even with significant gain always achieved
Although we cannot infer from this that there is always a correlation between τ+(t) and τ−(t) here, it is very likely that
  there are scenarios where some assumed Θ holds despite the fact that some assumed τ+ is violated
  the Θ-model has higher coverage than a synchronous solution
More thorough experimental and theoretical evaluation [of more suitable systems] will follow

Applications

„Exotic“ Application: VLSI Chips
Interconnect delays dominate over switching delays
Signals cannot traverse entire chip within a single clock cycle
Increasing susceptibility to transient failures (particles, cross-talk, …)
High power consumption
Shrinking feature size
Increasing complexity
Increasing clock speed

Clock Generation in Systems-on-a-Chip
Illusion of chip-wide synchrony increasingly difficult to maintain
Extend every functional unit with simple local clock synchronization (CS) algorithm
  CS algorithms communicate via dedicated clocking signals
  CS algs guarantee | Ci(t) − Cj(t) | ≤ π(Θ)
  Next tick happens every max delay
  Data sent by fui by tick k available at fuj by tick k+Ξ(Θ) at latest
  Division by Ξ provides global macro tick abstraction
[Figure: functional units fu1, fu2, fu3 on a data bus, each with a CS alg attached to the CS network (distributed clock), contrasted with a single clock distributed via a clock tree]

Benefits
CS algs simulate global clock
  Synchronous design abstraction maintained
  Self-clocking feature: Chip runs as fast as routing delays allow
Θ is estimated by place and route tools
  Explicit dependence upon routing only via Θ [required for determining macro-tick division factor Ξ(Θ) only]
Distributed clocks tolerate transient failures
  Need n > 6fl FUs for tolerating up to fl transient failures (affecting clocking signals) per FU in every tick
  Additional (data) fault-tolerance possible via replicated FUs employing synchronous Byzantine agreement algorithms etc.
[WS03]: CS algs work also for non-simultaneous reset

Thanks!
