1 DRAFTS Distributed Real-time Applications Fault Tolerant Scheduling
Claudio Pinello (pinello@eecs.berkeley.edu)

2 DRAFTS Motivation
Drive-by-Wire applications

3 DRAFTS Motivation
No rods → increased passive safety
Interior design freedom
BMW, Daimler, Citroën, Chrysler, Bertone, SKF, etc.

4 DRAFTS Problem Overview
Fault tolerance: redundancy is key
Safety: system failure must be as unlikely as in traditional systems

5 DRAFTS Faults
SW faults: bugs
–can be reduced by disciplined coding
–even better, by code generation
HW faults
–harsh environment
–many units (>50 microprocessors in a car; subsystems with 10-15 µPs)

6 DRAFTS Fault Model
Silent Faults
–faults result in omission errors
Detectable Faults
–faults result in detectably corrupted data (e.g. CRC-protected channels)
Non-silent Faults
–faults result in value errors
Byzantine Faults
–malicious attacks, non-silent faults, unbounded delays, etc.

7 DRAFTS Software Redundancy
Space redundancy
–execute replicas on different HW
–send results on different/multiple channels

8 DRAFTS N-copies Solution
Pros:
–reduced cost
Cons:
–degradation, 1x speed
–multiple designs
[Figure: N replicated copies of the controller dataflow (AbstractInput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant)]
Pros:
–design once
Cons:
–N-x costs, 1x speed

9 DRAFTS Redundancy Management
Managing a distributed system with multiple results requires careful programming
–keep N copies synchronized
–exchange and apply results
–detect and isolate faults
–recover

10 DRAFTS Possible Solutions
Off-the-shelf solutions
–TTP-based architectures
–FT-CORBA middleware
Synthesis
–debugged and portable libraries
–development tools

11 DRAFTS Automotive Domain
Production costs dominate NRE costs
–multi-vendor supply chain
–interest in full utilization of architectures
Validation and certification are critical
–validate process
–validate product

12 DRAFTS Shortcomings of OTS Solutions
TTP
–proprietary communication network
–network redundancy defaults to 2-way
–active replication → potential underutilization of resources
FT-CORBA
–fairly large middleware overhead

13 DRAFTS Synthesis-based Solution
Synthesize only the needed glue code
–at the extreme: get rid of the OS
Customizable replication mechanisms
–use passive replicas
Treat the architecture as a distributed execution machine
–exploit parallelism to speed up execution

14 DRAFTS Schedule Synthesis
[Figure: the controller dataflow (AbstractInput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant) mapped onto two CPUs; each CPU runs sensor/actuator drivers plus a share of the actors (Input, CoarseCTRL, FineCTRL, ArbiterBest, Output)]

15 DRAFTS Synthesis-based Solution
Enables fast architecture exploration

16 DRAFTS Contributions
Programming model
Metropolis platform
Schedule synthesis tool and optimization strategy
Verification tools

17 DRAFTS Programming Model
Definition of a programming model that
–is amenable to specifying feedback controllers
–is convenient for analysis, simulation and synthesis
–supports degraded functionality/accuracy
–supports redundancy
–is deterministic

18 DRAFTS Static Data-flow Model
Pros:
–deterministic behavior: actors perform deterministic computation (no internal state), and all inputs are required to fire an actor
–explicit parallelism
–good for periodic algorithms
Shortcomings:
–requires all inputs to fire an actor, but source actors may fail!
[Figure: example dataflow graph with actors A, B, C]

19 DRAFTS Pendulum Example
[Figure: pendulum controller dataflow (AbstractInput, FineCTRL, CoarseCTRL, ArbiterBest, AbstractOut, Iterator, Plant); the two controllers are labeled Bang-Bang and Linear]

20 DRAFTS Model Extensions
Node criticality
Node typing (sensor, input, arbiter, etc.)
–some types (input and arbiter) can fire with missing inputs
Tokens have “Epoch” and “Valid” fields
Specialized single-place buffer links
–manage redundant sources (and destinations); a sketch follows
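A minimal sketch of the token format and of a single-place buffer link fed by redundant writers, assuming the Epoch/Valid fields above. The class and method names are illustrative, not the actual FTDF API.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Token:
    epoch: int    # iteration index of the periodic algorithm
    data: Any     # payload
    valid: bool   # False if the data was late or corrupted

class SinglePlaceBuffer:
    """One shared slot fed by redundant writers: a stale or
    invalid write from a replica never clobbers a fresher result."""
    def __init__(self) -> None:
        self.slot: Optional[Token] = None

    def write(self, token: Token) -> None:
        if not token.valid:
            return                         # drop detectably corrupted data
        if self.slot is None or token.epoch >= self.slot.epoch:
            self.slot = token              # newest iteration wins

    def read(self) -> Optional[Token]:
        return self.slot
```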

21 DRAFTS Data Tokens: Epoch
Epoch: the iteration index of the periodic algorithm
Actors ask for “current” inputs
Using >= we can account for missing results (self-synchronization)
[Token layout: Epoch | Data | Valid]

22 DRAFTS Data Tokens: Valid
Valid models the effect of fault detection:
–True: data was received/produced correctly
–False: data was not received on time or was corrupted
Firing rules (and actors) may use it to change their behavior
[Token layout: Epoch | Data | Valid]
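How a firing rule might use the two fields, reusing the Token sketch above: >= on the epoch absorbs results that never arrive, and tolerant actor types (input, arbiter) fire as soon as at least one fresh, valid token is present. The `tolerant` flag is an assumption standing in for the node typing of the previous slides.

```python
from typing import Optional

def can_fire(inputs: list[Optional[Token]], current_epoch: int,
             tolerant: bool) -> bool:
    fresh = [t for t in inputs
             if t is not None and t.valid and t.epoch >= current_epoch]
    if tolerant:
        return len(fresh) >= 1           # input/arbiter actors fire with gaps
    return len(fresh) == len(inputs)     # regular actors need every input
```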

23 DRAFTS FTDataFlow Modeling
Metropolis is used as the framework to develop the set of tools
FTDF is a platform library in Metropolis
–modeling, simulation, fault injection
–supports semi-automatic replication
–results visualization

24 DRAFTS Actor Classes
DF_SENactor: sensor actor
DF_INactor: input actor
DF_AINactor: abstract input actor
DF_FUNactor: data-flow actor
DF_ARBactor: arbiter actor
DF_AOUTactor: abstract output actor
DF_OUTactor: output actor
DF_ACTactor: actuator actor
DF_MEM: state memory
DF_Injector: fault injection

25 DRAFTS Pendulum Example
[Figure: the pendulum controller dataflow with a fault-injection (Inject) actor attached]

26 DRAFTS Simulation Output
[Plot: simulation trace; the injected fault is marked]

27 DRAFTS Summary on FTDF
Extended SDF to deal with
–missing/redundant inputs
–different criticality
–functionality types
Developed a Metropolis platform
–modeling, simulation, fault injection, visualization of results
–support for adding redundancy

28 DRAFTS Architecture Model
Architecture
–connectivity: bipartite graph of CPUs and channels
–computation and communication times: actor/CPU and data/channel matrices of execution and transmission times
Same as the SynDEx model
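A possible encoding of this model; all names and times below are made up for illustration.

```python
# CPUs and channels form the two sides of the bipartite connectivity graph.
cpus     = ["ECU0", "ECU1"]
channels = {"bus0": {"ECU0", "ECU1"}}   # each channel links a subset of CPUs

# exec_time[actor][cpu]: worst-case execution time of each actor on
# each CPU (absent entry: the actor cannot be mapped there).
exec_time = {
    "CoarseCTRL": {"ECU0": 2, "ECU1": 2},
    "FineCTRL":   {"ECU0": 5, "ECU1": 6},
}

# tx_time[data][channel]: transmission time of each data item.
tx_time = {"ctrl_out": {"bus0": 1}}
```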

29 DRAFTS Fault Behavior
Failure patterns
–subsets of the architecture graph that may fail simultaneously
For each failure pattern, specify a criticality level
–i.e. which functionalities must be guaranteed
–typically, for the empty failure pattern all functionality must be guaranteed
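The fault behavior can then be a table from failure patterns to what must survive; the entries below mirror the pendulum example of slide 31 and are only a sketch.

```python
# Each failure pattern is a set of architecture elements that may
# fail together, mapped to the functionalities to guarantee.
fault_behavior = {
    frozenset():         {"CoarseCTRL", "FineCTRL"},  # no fault: everything
    frozenset({"ECU1"}): {"CoarseCTRL"},              # may drop FineCTRL
    frozenset({"bus0"}): {"CoarseCTRL"},
}
```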

30 DRAFTS Synthesis Problem
Given
–application
–architecture
–fault behavior
Derive
–redundancy
–schedule
[Figure: the controller dataflow mapped onto two CPUs, as in the Schedule Synthesis slide]

31 DRAFTS Pendulum Example
Actuator/sensor location
Tolerate any single fault
–{empty}: all functionality
–{one CPU}: may drop FineController, and the sensor/actuator on that CPU
–{one channel}: may drop FineController
[Figure: two CPUs, each with its own sensor and actuator]

32 DRAFTS Refined I/O
[Figure: the pendulum dataflow with I/O refined into explicit Sens/Act drivers and Input/Output actors around FineCTRL, CoarseCTRL, ArbiterBest and the Iterator]

33 DRAFTS Full Replication

34 DRAFTS Simulation Output

35 DRAFTS Schedule Synthesis Strategy
Leverage existing dataflow scheduling tools (e.g. SynDEx) to derive a distributed static schedule that is also fault-tolerant
At design time (off-line)
–devise a redundant schedule
At run time
–trivial reconfiguration: skip actors that cannot fire (see the sketch below)
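The run-time side really is this simple: something like the loop below, assuming the can_fire sketch from slide 22 and hypothetical actor objects.

```python
def run_iteration(ordered_actors, epoch):
    """One iteration of the static schedule on one processor."""
    for actor in ordered_actors:       # total order fixed at design time
        if actor.can_fire(epoch):
            actor.fire(epoch)
        # else: skip it; tolerant actors downstream absorb the gap
```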

36 DRAFTS Generating Schedules
1. Full architecture
   1. Schedule all functionalities (maximum performance)
2. For each failure pattern
   1. Mark the faulty architecture components (critical functionalities cannot run there)
   2. Schedule all functionalities (adds redundancy)
3. Merge the schedules

37 DRAFTS Generating Schedules
1. Full architecture
   1. Schedule all functionalities
2. For each failure pattern
   1. Mark the faulty architecture components
   2. Schedule the critical functionalities
3. Merge the schedules (a sketch of this loop follows)
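In code, the generation loop might look like this. Here `schedule` stands in for an external dataflow scheduler such as SynDEx; its interface, and the shape of `arch` and `fault_behavior` (as in the slide 29 sketch), are assumptions.

```python
def generate_ft_schedule(actors, critical, arch, fault_behavior, schedule):
    """Sketch: one schedule for the full architecture, one per
    failure pattern, then a merge. `schedule(actors, units)` is
    assumed to return a mapping actor -> placement."""
    partials = [schedule(actors, arch)]              # 1. full architecture
    for pattern in fault_behavior:
        if not pattern:
            continue                                 # empty pattern done above
        reduced = {u for u in arch if u not in pattern}   # 2.1 mark faulty
        partials.append(schedule(critical, reduced))      # 2.2 critical only
    merged = {}                                      # 3. merge the schedules
    for part in partials:
        for actor, place in part.items():
            merged.setdefault(actor, set()).add(place)
    return merged
```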

38 DRAFTS Merge into FTS
Care must be taken to deal with multiple routings; the merged result is clearly non-optimal
[Schedule diagram: ECU0 and ECU1 each run Sensor1/Sensor2, input receivers (requires 1), Function1 (required), Function2 (optional), an arbiter, output drivers (requires 1) and Actuator1/Actuator2]

39 DRAFTS Heuristic 1: Limit CPU Load
1. Full architecture
   1. Schedule all functionalities
2. For each failure pattern
   1. Mark the faulty architecture components (critical functionalities cannot run there)
   2. Re-schedule only the critical functionalities (constrain the non-critical ones as in the full architecture)
3. Merge the schedules
Redundancy is added for critical functionalities only

40 DRAFTS Heuristic 2: Limit Bus Load
Prune redundant communication
[Schedule diagram: as on the previous slide, with the duplicate ECU-to-ECU transfers removed]
Heuristic 3: passive replicas (limit CPU load)

41 DRAFTS Total Orders
For each processor and for each channel, find a total order that is compatible with the partial order of the FTS
Prototype: “any compatible total order”
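“Any compatible total order” is exactly a topological sort of the per-resource partial order; a sketch using the standard library, with illustrative actor names:

```python
from graphlib import TopologicalSorter

def any_total_order(partial_order):
    """partial_order maps each actor to the actors it must follow."""
    return list(TopologicalSorter(partial_order).static_order())

# E.g. on one CPU: input receiver before Function1 before the arbiter.
print(any_total_order({"Function1": {"Input"}, "Arbiter": {"Function1"}}))
# -> ['Input', 'Function1', 'Arbiter']
```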

42 DRAFTS Schedule Optimization
Exploit architectural redundancy as a performance boost (in the absence of faults)
–replica overloading and deallocation
–passive replicas
–graceful degradation: reduced functionality (and resource demands) under faults

43 DRAFTS Active Replicas
[Figure: behavior (actors A, B, C, D), architecture (CPUs P1 and P2 on a shared channel), and active replication (A, B, C, D duplicated on both CPUs)]

44 DRAFTS Deallocation & Degradation
[Figure: the same behavior and two-CPU architecture; with deallocation, once the results B→D and C→D arrive, the duplicate work on the other CPU is dropped; criticality annotations (C1, C2) mark what degraded modes must keep]
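A sketch of the deallocation idea, reusing the Token and SinglePlaceBuffer sketches from slides 20-22: a passive replica fires only when the primary's result has not shown up for the current iteration, so its CPU time is freed in the fault-free case. The `replica` object is hypothetical.

```python
def fire_passive_replica(link: SinglePlaceBuffer, replica, epoch: int):
    tok = link.read()
    if tok is not None and tok.valid and tok.epoch >= epoch:
        return tok                  # primary delivered: deallocate the work
    return replica.fire(epoch)      # otherwise the replica steps in
```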

45 DRAFTS Aggressive Heuristics
Some heuristics can be certified not to break the fault tolerance/fault behavior
Others may need verification of the results
–e.g. after human inspection and modification

46 DRAFTS (Off-line) Verification
Functional verification
–for each failure pattern, the corresponding functionality is correctly executed
Timing verification/analysis
–worst-case iteration time under each fault (a sketch follows)
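Under a static schedule, the worst-case iteration time for a given failure pattern reduces to a longest-path computation over the surviving scheduled DAG; a minimal sketch, assuming worst-case times per node are known:

```python
from graphlib import TopologicalSorter

def worst_case_iteration(preds, duration):
    """preds maps each node to its predecessors; duration holds the
    worst-case execution/transmission time of every node."""
    finish = {}
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(node, ())), default=0)
        finish[node] = start + duration[node]
    return max(finish.values())
```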

47 DRAFTS Functional Verification
Apply equivalence-checking methods to the FT schedule, under all fault scenarios (failure patterns)
Based on the application DAGs and the architecture graph
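One way to phrase the core of the check, much simplified from the actual equivalence checking: for every failure pattern, discard replicas placed on failed units and confirm that each functionality required under that pattern still survives somewhere. The data shapes follow the slide 29 sketch; names are illustrative.

```python
def check_coverage(placements, fault_behavior):
    """placements: actor -> set of units hosting a replica.
    fault_behavior: failure pattern -> actors that must survive."""
    for pattern, required in fault_behavior.items():
        surviving = {a for a, units in placements.items() if units - pattern}
        missing = required - surviving
        if missing:
            return pattern, missing        # counterexample for inspection
    return None                            # all patterns covered
```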

48 DRAFTS Functional Verification (example)
[Task graph: Sensor1/Sensor2 feed input receivers (requires 1); Function1 (required) and Function2 (optional) feed an arbiter; output drivers (requires 1) feed Actuator1/Actuator2. Architecture: two processors (ECU0, ECU1) connected by two bus channels]
FT schedule (no fails): the I/O drivers must agree on the value to pass from the sensors, and on the value to pass to the actuators.
FT schedule (ECU1 fails): the I/O drivers time out and pass values directly. Function2 could be deallocated if necessary, as the arbiter only needs Function1.
FT schedule (ECU0 fails): the I/O drivers time out and pass values directly. Function2 is not present, but the arbiter will pass data anyway (beyond the dataflow model).
Source: Sam Williams

49 DRAFTS Functional Verification (example, continued)
[Task graphs per actuator: the slice of the application DAG feeding Actuator1 and the slice feeding Actuator2, each checked for equivalence against the merged FT schedule on ECU0/ECU1]
For the full-functionality case, the arbiter must include both functions; the output function only requires that one of the actuators be visible. In the other graphs (which include failures), the arbiter only needs the single required input (Function1).
Source: Sam Williams

50 DRAFTS Functional Verification: Comments
Takes milliseconds to run small cases; a few minutes for large schedules
The tool was written in Perl (performance was sufficient)
Schedule verification is performed offline (not time-critical)
Credits: Sam Williams

51 DRAFTS Conclusions
Contributions
–programming model FTDF
–Metropolis platform
–schedule synthesis tool (in collaboration with INRIA)
–schedule optimization strategy
–functional verification (in collaboration with Sam Williams)
–replica determinism analysis (not shown here)

52 DRAFTS Future Work
Experiments on the DBW example
Timing verification (real-time calculus)
Interface/migrate the synthesis and verification tools with/to Metropolis
Integrate optimization into synthesis
Code generation (in collaboration with Mark McKelvin)

53 DRAFTS DBW Example

54 DRAFTS Now… Interested in helping out?

