1. DRAFTS: Distributed Real-time Applications Fault Tolerant Scheduling
Claudio Pinello (pinello@eecs.berkeley.edu)
2. Motivation
Drive-by-Wire applications
3. Motivation
No mechanical rods: increased passive safety
Interior design freedom
BMW, Daimler, Citroën, Chrysler, Bertone, SKF, etc.
4. Problem Overview
Fault tolerance: redundancy is key
Safety: system failure must be as unlikely as in traditional systems
5. Faults
SW faults: bugs
  – can be reduced by disciplined coding
  – even better, by code generation
HW faults
  – harsh environment
  – many units (>50 microprocessors in a car; subsystems with 10-15 microprocessors)
6. Fault Model
Silent faults
  – faults result in omission errors
Detectable faults
  – faults result in detectably corrupted data (e.g. CRC-protected channels)
Non-silent faults
  – faults result in value errors
Byzantine faults
  – malicious attacks, non-silent faults, unbounded delays, etc.
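What makes a fault "detectable" is that the receiver can turn a corrupted value into an omission. The sketch below illustrates this with a CRC-32 trailer appended to each message. The framing (`frame`, `receive`, a 4-byte big-endian trailer) is a hypothetical choice made for illustration, not a message format taken from the slides.

```python
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical framing)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def receive(msg: Optional[bytes]) -> Optional[bytes]:
    """Return the payload if it arrived intact, else None.

    None covers both an omission (nothing arrived in time) and a detected
    corruption (checksum mismatch): either way the value is treated as
    missing instead of being used as a wrong value.
    """
    if msg is None or len(msg) < 4:
        return None
    payload, trailer = msg[:-4], msg[-4:]
    if (zlib.crc32(payload) & 0xFFFFFFFF).to_bytes(4, "big") != trailer:
        return None
    return payload
```

A single flipped bit in a framed message makes `receive` return None, the same result as a message that never arrived, so a detectable fault degrades into a silent (omission) fault.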
7. Software Redundancy
Space redundancy
  – execute replicas on different HW
  – send results on different/multiple channels
8. N-copies Solution
[Figure: several copies of the pendulum controller graph (Abstractinput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant)]
Pros: reduced cost
Cons: degradation, 1× speed; multiple designs
Pros: design once
Cons: N× cost, 1× speed
9. Redundancy Management
Managing a distributed system with multiple results requires careful programming:
  – keep the N copies synchronized
  – exchange and apply results
  – detect and isolate faults
  – recover
10. Possible Solutions
Off-the-shelf solutions
  – TTP-based architectures
  – FT-CORBA middleware
Synthesis
  – debugged and portable libraries
  – development tools
11. Automotive Domain
Production costs dominate NRE (non-recurring engineering) costs
  – multi-vendor supply chain
  – interest in full utilization of architectures
Validation and certification are critical
  – validate the process
  – validate the product
12. Shortcomings of OTS Solutions
TTP
  – proprietary communication network
  – network redundancy defaults to 2-way
  – active replication: potential underutilization of resources
FT-CORBA
  – fairly large middleware overhead
13. Synthesis-based Solution
Synthesize only the needed glue code
  – at the extreme: get rid of the OS
Customizable replication mechanisms
  – use passive replicas
Treat the architecture as a distributed execution machine
  – exploit parallelism to speed up execution
14. Schedule Synthesis
[Figure: the pendulum dataflow graph (Abstractinput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant) mapped onto CPUs; the per-CPU schedules contain Sens, Input, CoarseCTRL, FineCTRL, ArbiterBest, Output, Act]
15. Synthesis-based Solution
Enables fast architecture exploration
16. Contributions
  – Programming model
  – Metropolis platform
  – Schedule synthesis tool and optimization strategy
  – Verification tools
17. Programming Model
Definition of a programming model that:
  – is amenable to specifying feedback controllers
  – is convenient for analysis, simulation and synthesis
  – supports degraded functionality/accuracy
  – supports redundancy
  – is deterministic
18. Static Data-flow Model
Pros:
  – deterministic behavior: actors perform deterministic computation (no internal state) and require all inputs to fire
  – explicit parallelism
  – good for periodic algorithms
Shortcomings:
  – requires all inputs to fire an actor, but source actors may fail!
[Figure: example graph with actors A, B, C]
19. Pendulum Example
[Figure: pendulum controller graph (Abstractinput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant); controller labels: Bang-Bang, Linear]
20. Model Extensions
Node criticality
Node typing (sensor, input, arbiter, etc.)
  – some types (input and arbiter) can fire with missing inputs
Tokens have "Epoch" and "Valid" fields
Specialized single-place buffer links
  – manage redundant sources (and destinations)
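The difference between the plain SDF firing rule and the relaxed rule for input and arbiter nodes can be sketched as a single predicate. This is an illustration under assumptions: the `NodeType` names, the `min_inputs` threshold (default 1) and the `can_fire` helper are hypothetical and not part of the FTDF library.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NodeType(Enum):
    SENSOR = auto()
    INPUT = auto()
    FUNCTION = auto()
    ARBITER = auto()
    OUTPUT = auto()
    ACTUATOR = auto()


@dataclass
class Node:
    name: str
    kind: NodeType
    inputs: list[str]
    min_inputs: int = 1  # assumed threshold, used only by INPUT and ARBITER nodes


def can_fire(node: Node, available: set[str]) -> bool:
    """Decide whether `node` may fire given the set of sources that delivered."""
    present = sum(1 for src in node.inputs if src in available)
    if node.kind in (NodeType.INPUT, NodeType.ARBITER):
        # Relaxed rule: enough of the (possibly redundant) inputs arrived.
        return present >= node.min_inputs
    # Plain SDF rule: every input must be present.
    return present == len(node.inputs)
```

With an arbiter fed by FineCTRL and CoarseCTRL, `can_fire` still returns True when only CoarseCTRL delivered, while an ordinary function actor with a missing input does not fire. The edges used in such an example are hypothetical.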
21. Data Tokens: Epoch
Token fields: Epoch | Data | Valid
Epoch: iteration index of the periodic algorithm
  – actors ask for "current" inputs
  – comparing epochs with >= accounts for missing results (self-synchronization)
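One way to read the ">=" rule, sketched below under assumptions: a single-place link keeps only the freshest token, and a reader asking for epoch k accepts any token whose epoch is at least k, so a replica that missed an iteration resynchronizes on the next result it sees. `Token`, `SinglePlaceBuffer` and `read_current` are hypothetical names, not the FTDF API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Token:
    epoch: int        # iteration index of the periodic algorithm
    data: Any
    valid: bool = True


class SinglePlaceBuffer:
    """Hypothetical single-place link: a newer token overwrites an older one."""

    def __init__(self) -> None:
        self.slot: Optional[Token] = None

    def write(self, tok: Token) -> None:
        # Keep only the freshest token; a late token from an older epoch is dropped.
        if self.slot is None or tok.epoch >= self.slot.epoch:
            self.slot = tok

    def read_current(self, epoch: int) -> Optional[Token]:
        """Return the stored token if it is current for `epoch`, else None (missing)."""
        if self.slot is not None and self.slot.epoch >= epoch:
            return self.slot
        return None
```

Using `>=` rather than `==` means a result that is newer than expected still counts as current, while a stale token is never mistaken for this iteration's input.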
22. Data Tokens: Valid
Token fields: Epoch | Data | Valid
Valid models the effect of fault detection:
  – True: data was received/produced correctly
  – False: data was not received on time or was corrupted
Firing rules (and actors) may use it to change their behavior
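As an example of an actor changing its behavior based on Valid, here is a sketch of an arbiter that forwards the best input that is both valid and current. The ranking (fine controller preferred over coarse) and the helper name `arbiter_best` are assumptions suggested by the ArbiterBest actor in the pendulum example, not its documented semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    epoch: int
    data: float
    valid: bool


def arbiter_best(epoch: int, ranked: list[Optional[Token]]) -> Token:
    """Forward the highest-ranked input that is valid and current for `epoch`.

    `ranked` is ordered from best (e.g. FineCTRL) to worst (e.g. CoarseCTRL);
    None stands for a token that never arrived.
    """
    for tok in ranked:
        if tok is not None and tok.valid and tok.epoch >= epoch:
            return Token(epoch, tok.data, True)
    # Nothing usable: emit an invalid token so downstream actors can react.
    return Token(epoch, 0.0, False)
```

If the fine controller's token is marked invalid, the arbiter falls back to the coarse controller instead of stalling, which is the kind of degraded behavior the Valid field is meant to enable.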
23. FTDataFlow Modeling
Metropolis is used as the framework to develop the set of tools
FTDF is a platform library in Metropolis:
  – modeling, simulation, fault injection
  – supports semi-automatic replication
  – results visualization
24. Actor Classes
DF_SENactor: sensor actor
DF_INactor: input actor
DF_AINactor: abstract input actor
DF_FUNactor: data-flow actor
DF_ARBactor: arbiter actor
DF_AOUTactor: abstract output actor
DF_OUTactor: output actor
DF_ACTactor: actuator actor
DF_MEM: state memory
DF_Injector: fault injection
25. Pendulum Example
[Figure: pendulum controller graph (Abstractinput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant) with a fault injector (Inject)]
26. Simulation Output
[Figure: simulation output with an injected fault]
27. Summary on FTDF
Extended SDF to deal with:
  – missing/redundant inputs
  – different criticality
  – functionality types
Developed the Metropolis platform:
  – modeling, simulation, fault injection, visualization of results
  – support for adding redundancy
28. Architecture Model
Architecture:
  – connectivity: bipartite graph (CPUs and channels)
  – computation and communication times: actor/CPU and data/channel matrices of execution and transmission times
Same as the SynDEx model
[Figure: CPUs connected by channels]
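The architecture model can be held in a few plain dictionaries: a bipartite channel-to-CPU connectivity map plus the two timing matrices. In this sketch the CPU, channel and actor names and every timing value are placeholders invented for illustration (not measurements); `math.inf` is an assumed convention for "this actor cannot run on this CPU".

```python
import math

# Hypothetical two-CPU, two-bus architecture; all names and numbers are placeholders.
CPUS = ["CPU0", "CPU1"]
# Bipartite connectivity: each channel lists the CPUs it links.
LINKS = {"bus0": {"CPU0", "CPU1"}, "bus1": {"CPU0", "CPU1"}}

# Execution-time matrix (actor x CPU); math.inf marks "cannot run there".
EXEC_TIME = {
    "Sens":       {"CPU0": 0.1, "CPU1": math.inf},
    "CoarseCTRL": {"CPU0": 0.5, "CPU1": 0.5},
    "FineCTRL":   {"CPU0": 2.0, "CPU1": 2.0},
}
# Transmission-time matrix (data edge x channel).
TX_TIME = {("Sens", "CoarseCTRL"): {"bus0": 0.2, "bus1": 0.3}}


def channels_between(src: str, dst: str) -> list[str]:
    """Channels over which CPU `src` can send directly to CPU `dst`."""
    return [ch for ch, ends in LINKS.items() if {src, dst} <= ends]


def allowed_cpus(actor: str) -> list[str]:
    """CPUs on which `actor` has a finite execution time."""
    return [cpu for cpu, t in EXEC_TIME[actor].items() if math.isfinite(t)]
```

Encoding "sensor is physically attached to CPU0" as an infinite execution time elsewhere keeps placement constraints inside the same matrix a scheduler already reads.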
29. Fault Behavior
Failure patterns
  – subsets of the architecture graph that may fail simultaneously
For each failure pattern, specify a criticality level
  – i.e. which functionalities must be guaranteed
  – typically, for the empty failure pattern all functionality must be guaranteed
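Failure patterns and their criticality levels can be sketched as sets of components mapped to sets of required functionalities. The criticality policy below (everything with no faults, everything except FineCTRL under any fault) is a hypothetical simplification loosely inspired by the pendulum example, and `covered` is a simple placement check added for illustration.

```python
from itertools import combinations
from typing import Iterator

ALL_FUNCTIONS = {"Sens", "CoarseCTRL", "FineCTRL", "Act"}


def failure_patterns(components: list[str], max_faults: int) -> Iterator[frozenset]:
    """Every set of at most `max_faults` components failing together."""
    for k in range(max_faults + 1):
        for subset in combinations(components, k):
            yield frozenset(subset)


def required_functions(pattern: frozenset) -> set[str]:
    """Hypothetical criticality policy."""
    if not pattern:
        return set(ALL_FUNCTIONS)        # no faults: everything is guaranteed
    return ALL_FUNCTIONS - {"FineCTRL"}  # any fault: FineCTRL may be dropped


def covered(placement: dict[str, set[str]], pattern: frozenset) -> bool:
    """True if each required function keeps a replica on a non-failed CPU."""
    return all(placement.get(f, set()) - pattern for f in required_functions(pattern))
```

With `max_faults=1`, the patterns are the empty set plus each single component, matching the "tolerate any single fault" requirement.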
30. Synthesis Problem
Given:
  – application
  – architecture
  – fault behavior
Derive:
  – redundancy
  – schedule
[Figure: the pendulum dataflow graph mapped onto CPUs, with per-CPU schedules (Sens, Input, CoarseCTRL, FineCTRL, ArbiterBest, Output, Act)]
31. Pendulum Example
Actuator/sensor location
Tolerate any single fault:
  – {empty}: all functionality
  – {one CPU}: may drop FineController, and the sensor/actuator on that CPU
  – {one channel}: may drop FineController
[Figure: CPUs, each with a sensor (Sens) and an actuator (Act)]
32. Refined I/O
[Figure: refined pendulum graph with Sens, Input, FineCTRL, CoarseCTRL, ArbiterBest, Iterator, Output, Act, Plant]
33. Full Replication
34. Simulation Output
35. Schedule Synthesis Strategy
Leverage existing dataflow scheduling tools (e.g. SynDEx) to obtain a distributed static schedule that is also fault tolerant
At design time (off-line):
  – devise a redundant schedule
At run time:
  – trivial reconfiguration: skip actors that cannot fire
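The run-time side of this strategy can be sketched in a few lines: each CPU walks its static schedule once per iteration and skips any actor whose firing rule is not satisfied. The function name `run_iteration`, the `relaxed` set (actors that fire with at least one input) and the data layout are assumptions for illustration.

```python
def run_iteration(schedule: list[str],
                  inputs_of: dict[str, list[str]],
                  relaxed: set[str],
                  available: set[str]) -> list[str]:
    """Run one iteration of a static per-CPU schedule; return the actors that fired.

    Actors in `relaxed` (e.g. input and arbiter nodes) fire if at least one
    input is present; all other actors need every input. An actor that
    cannot fire is simply skipped: there is no re-planning at run time.
    """
    tokens = set(available)            # outputs present in this iteration
    fired = []
    for actor in schedule:
        ins = inputs_of.get(actor, [])
        present = [i for i in ins if i in tokens]
        if actor in relaxed and ins:
            ok = len(present) >= 1
        else:
            ok = len(present) == len(ins)
        if ok:
            tokens.add(actor)          # the actor's output token is now available
            fired.append(actor)
    return fired
```

For example, if one sensor is lost and a remote input to the fine controller never arrives, the input and arbiter nodes still fire, the fine controller is skipped, and the actuator is driven from the coarse path.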
36. Generating Schedules
1. Full architecture
   1. Schedule all functionalities
2. For each failure pattern
   1. Mark the faulty architecture components (critical functionalities cannot run there)
   2. Schedule all functionalities
3. Merge the schedules
Annotations: maximum performance (step 1); add redundancy (step 2)
37. Generating Schedules
1. Full architecture
   1. Schedule all functionalities
2. For each failure pattern
   1. Mark the faulty architecture components
   2. Schedule the critical functionalities
3. Merge the schedules
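The three steps above can be sketched as a loop around any single-schedule tool. Here a toy greedy list scheduler (`greedy_schedule`, earliest finish time, communication ignored) stands in for a real tool such as SynDEx, and "merge" is simply the union of actor-to-CPU placements. Both are simplifying assumptions: the real merge must also reconcile multiple routings.

```python
from graphlib import TopologicalSorter


def greedy_schedule(preds, exec_time, cpus, actors):
    """Toy list scheduler, a stand-in for a real tool such as SynDEx.

    preds: actor -> set of predecessor actors. Actors in `actors` are visited
    in dependency order and placed on the CPU where they finish earliest.
    Communication costs are ignored to keep the sketch short.
    """
    ready_at = {c: 0.0 for c in cpus}
    finish, placement = {}, {}
    for a in TopologicalSorter(preds).static_order():
        if a not in actors:
            continue
        earliest = max((finish[p] for p in preds.get(a, ()) if p in finish), default=0.0)
        best = min(cpus, key=lambda c: max(ready_at[c], earliest) + exec_time[a][c])
        finish[a] = max(ready_at[best], earliest) + exec_time[a][best]
        ready_at[best] = finish[a]
        placement[a] = best
    return placement


def merge_schedules(preds, exec_time, cpus, patterns, critical):
    """Nominal placement plus one placement of the critical actors per failure pattern."""
    # Step 1: schedule all functionalities on the full architecture.
    merged = {a: {c} for a, c in greedy_schedule(preds, exec_time, cpus, set(preds)).items()}
    for pattern in patterns:
        alive = [c for c in cpus if c not in pattern]   # step 2.1: mark faulty CPUs
        # Step 2.2: schedule only the critical functionalities on the survivors.
        for a, c in greedy_schedule(preds, exec_time, alive, critical(pattern)).items():
            merged.setdefault(a, set()).add(c)          # step 3: merge
    return merged
```

Every actor must appear as a key of `preds` (with an empty set if it has no predecessors). Critical actors end up replicated on every CPU that may survive alone, while non-critical ones keep only their nominal placement.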
38. Merge into FTS
Care must be taken to deal with multiple routings; the result is clearly not optimal.
[Figure: merged fault-tolerant schedule (FTS). ECU0: Sensor1, Actuator1, Input receiver (requires 1), Function1 (required), Function2 (optional), Arbiter, Output driver (requires 1). ECU1: Sensor2, Actuator2, Input receiver (requires 1), Function1 (required), Arbiter, Output driver (requires 1)]
39. Heuristic 1: Limit CPU Load
1. Full architecture
   1. Schedule all functionalities
2. For each failure pattern
   1. Mark the faulty architecture components (critical functionalities cannot run there)
   2. Re-schedule only the critical functionalities (constrain the non-critical ones as in the full architecture)
3. Merge the schedules
Result: redundancy for critical functionalities only
40. Heuristic 2: Limit Bus Load
Prune redundant communication
[Figure: FTS on ECU0 (Sensor1, Actuator1, Input receiver, Function1, Function2, Arbiter, Output driver) and ECU1 (Sensor2, Actuator2, Input receiver, Function1, Arbiter, Output driver) with redundant communication pruned]
Heuristic 3: passive replicas (limit CPU load)
41. Total Orders
For each processor and each channel, find a total order that is compatible with the partial order of the FTS
Prototype: "any compatible total order"
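"Any compatible total order" can be obtained by topologically sorting the whole FTS once and projecting that single linear extension onto each resource. Projecting a global order keeps every precedence, including those that pass through tasks on other CPUs or channels. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` if the input is not a partial order; the task naming scheme (`task@resource`) is a hypothetical convention.

```python
from graphlib import TopologicalSorter


def per_resource_orders(preds: dict[str, set[str]],
                        resource_of: dict[str, str]) -> dict[str, list[str]]:
    """One linear extension of the FTS partial order, projected per resource.

    preds: task -> tasks that must precede it.
    resource_of: task -> the CPU or channel that executes it.
    """
    orders: dict[str, list[str]] = {}
    for task in TopologicalSorter(preds).static_order():
        orders.setdefault(resource_of[task], []).append(task)
    return orders
```

Because every per-resource sequence is a subsequence of one global topological order, no two resources can wait on each other in a cycle.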
42. Schedule Optimization
Exploit architectural redundancy as a performance boost (in the absence of faults):
  – replica overloading and deallocation
  – passive replicas
  – graceful degradation: reduced functionality (and resource demands) under faults
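The CPU saving from passive replicas can be sketched as a backup slot that is scheduled after the primary's expected completion and computes only if the primary's result for the current epoch is missing. The function `backup_slot` and its arguments are hypothetical; the slides do not specify how passive replicas detect the primary's result.

```python
from typing import Callable, Optional


def backup_slot(epoch: int,
                last_primary_epoch: Optional[int],
                compute: Callable[[], float]) -> Optional[float]:
    """Body of a passive replica's slot, run after the primary's expected finish.

    If the primary's result for this epoch has already arrived, the backup
    does nothing and its CPU time is free for other work; otherwise it
    computes the result itself.
    """
    if last_primary_epoch is not None and last_primary_epoch >= epoch:
        return None
    return compute()
```

In the fault-free case the backup costs only a check, which is what lets redundancy coexist with the "performance boost" goal; the price is a later result when the primary does fail.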
43. Active Replicas
[Figure: behavior graph with actors A, B, C, D; architecture with processors P1 and P2; active replication: A, B, C, D replicated on both P1 and P2]
44. Deallocation & Degradation
[Figure: behavior graph (A, B, C, D), architecture (P1, P2), and a deallocation example with edges B->D and C->D; further labels: C1, C2, K, P]
45. Aggressive Heuristics
Some heuristics can be certified not to break fault tolerance/fault behavior
Others may need verification of the results
  – e.g. human inspection and modification
46. (Off-line) Verification
Functional verification
  – for each failure pattern, the corresponding functionality is correctly executed
Timing verification/analysis
  – worst-case iteration time under each fault
47. Functional Verification
Apply equivalence-checking methods to the FT schedule under all fault scenarios (failure patterns)
Based on the application DAGs and the architecture graph
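A much simpler check than full equivalence checking already catches missing replicas: for each failure pattern, simulate which tasks of the FT schedule can still execute and compare the functions they implement against the required set. The sketch below does exactly that; the `task@resource` naming, the `min_inputs` map and the `required` callback are hypothetical conventions, and channels are not modeled as failing resources.

```python
from graphlib import TopologicalSorter


def executes(ft_preds, min_inputs, failed, resource_of):
    """Tasks of the FT schedule that still run when the resources in `failed` fail.

    ft_preds: task -> list of predecessor tasks in the FT schedule.
    min_inputs: task -> number of predecessors it needs (default: all of them).
    """
    alive = set()
    for task in TopologicalSorter(ft_preds).static_order():
        if resource_of[task] in failed:
            continue
        preds = ft_preds.get(task, ())
        need = min_inputs.get(task, len(preds))
        if sum(p in alive for p in preds) >= need:
            alive.add(task)
    return alive


def verify(ft_preds, min_inputs, resource_of, function_of, patterns, required):
    """Return the failure patterns under which some required function never runs."""
    bad = []
    for pattern in patterns:
        ran = {function_of[t] for t in executes(ft_preds, min_inputs, pattern, resource_of)}
        if not required(pattern) <= ran:
            bad.append(pattern)
    return bad
```

An empty result means every failure pattern keeps its required functionality; a schedule that places Function1 only on ECU0 would be reported as failing for the pattern {ECU0}.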
48. Functional Verification (example)
Task graph nodes: Sensor1, Sensor2, Input receiver (requires 1), Function1 (required), Function2 (optional), Arbiter, Output driver (requires 1), Actuator1, Actuator2
FT schedule (no failures): ECU0 and ECU1 are connected by bus channel0 and bus channel1. ECU0 hosts Sensor1, Actuator1, Function1 and Function2; ECU1 hosts Sensor2, Actuator2 and Function1; each ECU runs its own Input receiver, Arbiter and Output driver. The I/O drivers must agree on the value to pass from the sensors and the value to pass to the actuators.
FT schedule (ECU1 fails): ECU0 runs Sensor1, Input receiver, Function1, Function2, Arbiter, Output driver and Actuator1. The I/O drivers time out and pass values directly. Function2 could be deallocated if necessary, as the arbiter only needs Function1.
FT schedule (ECU0 fails): ECU1 runs Sensor2, Input receiver, Function1, Arbiter, Output driver and Actuator2. The I/O drivers time out and pass values directly. Function2 is not present, but the arbiter passes data anyway (beyond the dataflow model).
Source: Sam Williams
49. Functional Verification (example, continued)
Task graph for Actuator1: Sensor1, Sensor2, Input receiver (requires 1), Function1 (required), Function2 (optional), Arbiter, Output driver (requires 1), Actuator1
Task graph for Actuator2: the same nodes, with Actuator2 in place of Actuator1
FT schedule: ECU0 runs Sensor1, Actuator1, Input receiver, Function1, Function2, Arbiter, Output driver; ECU1 runs Sensor2, Actuator2, Input receiver, Function1, Arbiter, Output driver
For the full-functionality case, the arbiter must include both functions. The output function only requires that one of the actuators be visible. In the other graphs (which include failures), the arbiter only needs the single required input (Function1).
Source: Sam Williams
50. Functional Verification: Comments
Takes milliseconds to run on small cases and a few minutes on large schedules
The tool was written in Perl (performance was sufficient)
Schedule verification is performed off-line (not time-critical)
Credits: Sam Williams
51. Conclusions
Contributions:
  – programming model FTDF
  – Metropolis platform
  – schedule synthesis tool (in collaboration with INRIA)
  – schedule optimization strategy
  – functional verification (in collaboration with Sam Williams)
  – replica determinism analysis (not shown here)
52. Future Work
Experiments on the DBW example
Timing verification (real-time calculus)
Interface/migrate the synthesis and verification tools with/to Metropolis
Integrate optimization into synthesis
Code generation (in collaboration with Mark McKelvin)
53. DBW Example
54. Now…
Interested in helping out?