1
Design of Distributed Real-Time Systems
Ramani Arunachalam
2
Case Study: MARS
● MARS (Maintainable Real-Time System)
  – Distributed, fault-tolerant, hard real-time
  – Objectives
    ● Guaranteed timeliness
    ● Testability
    ● Maintainability
    ● Fault tolerance
    ● Systematic software development
  – Time-triggered architecture
3
Objectives
● Guaranteed timeliness
  – Based on resource adequacy at peak load
  – Statistical assurances not enough
● Testability
  – Architecture should support testability of timeliness
● Maintainability
  – Needed to remedy hardware faults and design errors, and to respond to change requests
  – Localized consequences -> minimized effort
4
Objectives
● Fault tolerance
  – Redundancy
  – On-line maintenance
● Systematic software development
  – No 'trial and error' integration
  – OS guarantees predictable temporal behaviour
5
State View
● Time-triggered observation of states
  – Observe RT entities at predefined intervals
● Intelligent input/output
  – Observation grid
  – Intelligent sensor
    ● Preprocesses raw data from input device
    ● Observes at a finer granularity, called the perception granularity
6
State View
● Intelligent actuator
  – Post-processes data from computer system before sending to output device
● State messages
  – Produced at observation points
  – Minimal synchronization requirement
  – No need for buffer management
  – Unidirectional (from RT entity)
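The "no need for buffer management" point can be illustrated with a minimal sketch. It assumes the usual reading of a state message: a new observation overwrites the previous one, and reading does not consume it. The `StateMessage` class and its method names are hypothetical and not taken from MARS.

```python
class StateMessage:
    """Single-slot container: writes overwrite, reads do not consume (assumed semantics)."""

    def __init__(self):
        self._value = None
        self._observed_at = None

    def write(self, value, observed_at):
        # A new observation replaces the old one, so nothing is ever queued.
        self._value = value
        self._observed_at = observed_at

    def read(self):
        # Reading leaves the message in place; any number of readers may call this.
        return self._value, self._observed_at


temperature = StateMessage()
temperature.write(21.5, observed_at=100)
temperature.write(21.7, observed_at=200)  # overwrites the earlier observation
```

Because only the latest observation is kept, the sender never has to wait for the receiver, which is one way to read the "minimal synchronization requirement" bullet.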
7
Structure
● Clusters
  – Autonomous subsystems
  – Disjoint name spaces
  – State message exchanges
  – Composed of fault-tolerant units (FTUs)
  – Real-time communication channel (TDMA)
● FTU
  – Composed of replicated components
  – Active and shadow components
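As a rough illustration of a TDMA channel shared by the FTUs of a cluster, the sketch below assumes equal-length slots assigned in a fixed round-robin order. The slot length, the FTU names, and the function `slot_owner` are hypothetical; the slides do not specify the actual slot layout.

```python
def slot_owner(ftus, slot_ms, now_ms):
    """Return the FTU allowed to send at time now_ms (equal slots, fixed order assumed)."""
    round_ms = slot_ms * len(ftus)            # length of one full TDMA round
    return ftus[(now_ms % round_ms) // slot_ms]


FTUS = ["ftu_a", "ftu_b", "ftu_c"]  # hypothetical cluster members
SLOT_MS = 4                          # hypothetical slot length
```

With these values, `slot_owner(FTUS, SLOT_MS, 5)` falls in the second slot of the round, so `ftu_b` owns the channel. Every node can compute the owner from the global time alone, without exchanging any control messages.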
8
FTU
9
Structure
● Component
  – Smallest replaceable unit
  – Fail-silent (correct results or none)
  – Termination upon failure
● Task execution
  – Task: software inside component
  – Starts at predefined time
  – Proceeds without any communication or synchronization
  – Execution time is deterministic
10
Operation
● Results of periodic tasks sent as state messages
● Execution time of communication is also predefined
● A real-time transaction is a progression of processing and communication actions between a stimulus from and a response to the environment.
● Static scheduling (at compile time!)
● At run-time, no surprises
● Modes (operating, emergency)
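Static scheduling can be pictured as a dispatch table that is checked off-line and only looked up at run-time. The sketch below is an illustration under assumptions: the task names, offsets, execution times, and the helpers `check_schedule` and `task_at` are hypothetical, not the MARS tooling.

```python
# Hypothetical dispatch table: (start offset in ms, task name, worst-case execution time in ms)
SCHEDULE = [
    (0, "read_sensor", 2),
    (2, "control_law", 3),
    (5, "send_state_message", 1),
]
CYCLE_MS = 10  # hypothetical cycle length


def check_schedule(schedule, cycle_ms):
    """Off-line check: entries are in start order, do not overlap, and fit in one cycle."""
    end_of_previous = 0
    for start, _name, wcet in schedule:
        if start < end_of_previous:
            return False
        end_of_previous = start + wcet
    return end_of_previous <= cycle_ms


def task_at(schedule, cycle_ms, now_ms):
    """Run-time lookup: the task scheduled at time now_ms, or None if the slot is idle."""
    offset = now_ms % cycle_ms
    for start, name, wcet in schedule:
        if start <= offset < start + wcet:
            return name
    return None
```

All feasibility work happens in `check_schedule` before deployment; at run-time the dispatcher makes no decisions, which is the sense of "no surprises". A mode change would correspond to switching to a different pre-checked table.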
11
Fault-tolerance
● Two levels of redundancy
● Active redundancy at FTU level
  – If a component fails, standby becomes active
● Time redundancy at component level
  – Every task is executed twice and results compared
● TDMA monitor
  – Monitors temporal behaviour
  – Controls the output from component
● Distributed clock synchronization
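Time redundancy at the component level can be sketched as follows: run the task twice on the same inputs and emit a result only if both runs agree, which gives fail-silent behaviour. The names `run_fail_silent`, `double`, and `FlakyTask` are hypothetical, and using `None` to represent silence is an assumption of this sketch.

```python
def run_fail_silent(task, inputs):
    """Execute task twice on the same inputs; return the result only if both runs agree."""
    first = task(inputs)
    second = task(inputs)
    if first != second:
        return None  # stay silent rather than emit a possibly wrong result
    return first


def double(x):
    """A deterministic task: both executions produce the same result."""
    return 2 * x


class FlakyTask:
    """Simulates a transient fault that corrupts the second execution only."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x + (1 if self.calls == 2 else 0)
```

When a component goes silent in this way, the FTU-level redundancy takes over: the standby component's output is used instead.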
12
Fault-tolerance
● Replica determinism
  – All replicated components perform the same state changes at the same point in time
  – Prohibit reading of local time
  – All replicas should agree when to change mode
● Component reintegration
  – i-state, h-state
  – Reintegration point: when size of h-state is small
  – New component gets the h-state at this point
13
Summary
● Maintenance
  – Failed component doesn't affect FTU
  – On-line reintegration after repair
  – Change in software
    ● Does it fit in current schedule?
    ● Otherwise, new mode with new schedule
● Summary
  – Strict separation of functionality, timeliness and dependability.
  – Designed for temporal behaviour, testing simplified.
14
Delta-4 XPA
● Objectives
  – “A real-time system is not assured to meet deadlines outside operational envelope”
  – Bounded-demand school
    ● Operational envelope is predictable
    ● Impractical assumption for complex systems
  – Unbounded-demand school
    ● Complete definition of operational envelope is not possible
    ● Graceful degradation if it falls outside the envelope
  – XPA implements hard real-time but falls back to best-effort behaviour when required.
15
[Figure: XPA layered architecture, top to bottom]
● Deltase
● Group management layer
● Time and group communication
● Abstract network layer (physical + MAC + firmware)
16
Architecture
● Network infrastructure
  – FDDI supports urgent traffic, built-in fault tolerance
  – Token bus/ring has media redundancy for availability
● Time
  – Internal time maintained by distributed time server
  – Clocks synchronized to tens of microseconds
  – External time: one of the standard times
● Group communication
  – Services from atomic multicast to datagram
  – Very fast services of varying reliability
17
Architecture
● Group communication
  – Distributed replication management
    ● BestEffortN: guarantees delivery to N elements
    ● BestEffortTo: guarantees delivery to named elements
    ● AtLeastN, atLeastTo: guaranteed service even when sender fails
● Group management
  – Distributed Group manager object
  – Management and distribution of groups of objects
  – Incorporates knowledge of various modes of replication
18
Architecture
● Application support environment (Deltase)
  – Client-server and producer-consumer interactions
  – Apps written using Deltase or converted using preprocessors
● Timeliness
  – What to do under overload conditions?
    ● Static off-line scheduling: too many possibilities
    ● On-line scheduling: can find feasible schedules if not overloaded
19
Timeliness
● Scheduling policy uses “precedence”
  – Combination of priority and earliest-deadline
  – Few priority classes to avoid unfairness
  – Within priority class, earliest-deadline-first.
● Design-time and run-time timeliness
  – Targetline: instant chosen by designer for provision of service
  – Liveline and deadline: earliest and latest time at which service may be provided
  – Violations of these are detected at run-time; the actions to take are defined at design time.
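The precedence rule and the liveline/deadline window can be sketched together. This is an illustration under assumptions: a smaller class number is taken to mean a more urgent class, and the request names, the tuple layout, and the functions `schedule_order` and `classify` are hypothetical rather than XPA interfaces.

```python
def schedule_order(requests):
    """Order requests by priority class first, then earliest deadline within a class.

    Each request is (name, priority_class, deadline); a smaller class number is
    assumed to be more urgent.
    """
    ranked = sorted(requests, key=lambda r: (r[1], r[2]))
    return [name for name, _cls, _deadline in ranked]


def classify(completion, liveline, deadline):
    """Classify a service completion time against its liveline/deadline window."""
    if completion < liveline:
        return "early"     # service provided before the earliest allowed time
    if completion > deadline:
        return "late"      # service provided after the latest allowed time
    return "on time"


REQUESTS = [
    ("log", 2, 50),
    ("alarm", 0, 90),
    ("control", 1, 40),
    ("display", 1, 30),
]
```

Here `alarm` runs first despite its late deadline because it is in the most urgent class, while `display` precedes `control` because the two share a class and `display` has the earlier deadline.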
20
Preemption
● Leader-follower model for replication
  – Decisions made by a privileged replica, i.e. the leader
  – Preemption point
    ● Point at which an interrupt will be served
  – High-precedence msg arrives for a process not running currently
    ● Increase the process's precedence to that of msg
    ● Causes the process to be scheduled
    ● These actions are propagated to followers
    ● Followers perform identical operations
21
Desynchronization
● Followers must not be too far apart from leaders
● Followers too fast
  – Reach the preemption point before leader
  – Remain blocked until leader notifies
● Followers too slow
  – Leader timestamps notifications
  – If follower didn't execute the action by T+t(desync)
    ● Desynchronization event raised
    ● Another follower takes over
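The "too slow" check can be sketched as a comparison against the bound T + t(desync), where T is the leader's timestamp on the notification. The function `follower_status`, its parameters, and the three status strings are hypothetical, and all times are assumed to come from the same synchronized clock.

```python
def follower_status(notified_at, t_desync, executed_at, now):
    """Classify a follower relative to the bound T + t(desync).

    notified_at: leader timestamp T on the notification
    t_desync:    maximum tolerated lag behind the leader
    executed_at: time the follower executed the action, or None if not yet executed
    now:         current time on the same (assumed synchronized) clock
    """
    limit = notified_at + t_desync
    if executed_at is not None and executed_at <= limit:
        return "in sync"
    if executed_at is None and now <= limit:
        return "pending"           # still within the allowed window
    return "desynchronized"        # bound missed: raise the desynchronization event
```

A follower reported as `"desynchronized"` would be dropped from the replica set, and another follower would take over its role.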
22
Summary
● Communication support using groups
  – Oriented to distributed computing
● Tradeoffs between QoS and efficiency
  – Group mgr uses atomic multicast for orderly delivery
  – Leader-follower uses reliable, non-ordered delivery
● Group management service
  – Executes leader-follower, detects replica failure
  – Clones the replica at another node.