1
Fault-Tolerant Embedded Systems: Scheduling and Optimization
Viacheslav Izosimov, Petru Eles, Zebo Peng
Embedded Systems Lab (ESLAB), Linköping University, Sweden
Paul Pop
Computer Science & Engineering Group, Technical University of Denmark, Denmark
2
Motivation
Hard real-time applications: time-constrained, cost-constrained, fault-tolerant, etc.
Focus on transient and intermittent faults
3
Outline
Motivation
Background and limitations of previous work
Contributions:
  Scheduling with fault tolerance requirements
  Trading-off transparency for performance
  Fault tolerance policy assignment
  Mapping optimization with transparency
  Checkpoint optimization
Current work
4
Fault Tolerance Techniques
[Figure: timelines (axis 0 to 60) for process P1, illustrating three techniques:
  Re-execution: executions P1/1 and P1/2 on node N1
  Rollback recovery with checkpointing: P1 split into execution segments 1 and 2 on N1
  Active replication: replicas P1(1) and P1(2) on nodes N1 and N2
Overheads labelled: error-detection overhead, checkpointing overhead, recovery overhead]
5
Fault-Tolerant Time-Triggered Systems
Transient faults: at most k transient faults within each application run (system period)
Processes: re-execution, active replication, rollback recovery with checkpointing
Messages: fault-tolerant predictable protocol
Static scheduling
[Figure: application graph with processes P1 to P5 and messages m1, m2]
6
Limitations of Previous Work
Design optimization with fault tolerance is limited
Process mapping is not considered together with fault tolerance issues
Multiple faults are not addressed in the framework of static cyclic scheduling
Transparency, if addressed at all, is restricted to a whole computation node
7
Scheduling with Fault Tolerance Requirements
8
Conditional Scheduling
[Figure, built up step by step over slides 8 to 13: a conditional schedule for processes P1 and P2 and message m1 with k = 2, on a time axis from 0 to 200. Under the condition "true", P1 starts at 0. Successive slides add the start times of P2 and of the re-executions P1/1 to P1/3 and P2/1 to P2/3 for the individual fault scenarios.]
14
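Conditional Schedule Table
[Table: conditional schedule table for P1, P2 and m1 on nodes N1 and N2, k = 2. Under the condition "true", P1 starts at 0. The remaining entries are 40, 45, 50, 85, 90, 95, 105, 130, 140, 150 and 160; their assignment to rows and condition columns is not recoverable from the transcript.]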
15
Conditional Scheduling
Conditional scheduling:
  + Generates short schedules
  - Scheduling algorithm is very slow
  - Requires a lot of memory to store schedule tables
Alternative: shifting-based scheduling
  Each message sent over the bus is scheduled at a single time
  Faults on one computation node must not affect other nodes
  + Requires less memory
  + Schedule generation is very fast
  - Schedules are longer
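The memory cost of conditional schedule tables can be made concrete by counting fault scenarios. The sketch below is only a rough illustration: it assumes a scenario is any distribution of at most k faults over n processes, which is a counting model chosen here and not one stated on the slides. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def count_fault_scenarios(n_processes: int, k: int) -> int:
    """Closed form for the number of multisets of size 0..k over n processes."""
    return comb(n_processes + k, k)


def enumerate_fault_scenarios(n_processes: int, k: int):
    """Yield each scenario as a tuple of the process indices hit by a fault."""
    for faults in range(k + 1):
        yield from combinations_with_replacement(range(n_processes), faults)


if __name__ == "__main__":
    # Scenario counts for a few application sizes with k = 2 (assumed model).
    for n in (2, 20, 40, 80):
        print(n, count_fault_scenarios(n, 2))
```

Under this model the count grows polynomially in n with degree k, which is one way to see why a table with an entry per scenario becomes large and why a single shifted root schedule is attractive.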
16
Root Schedules
[Figure: root schedule for processes P1 to P4 on nodes N1 and N2, with messages m1 to m3 on the bus. Annotations: worst-case scenario for P1; recovery slack for P1 and P2.]
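The idea of a recovery slack shared by P1 and P2 can be illustrated with a small calculation. This is a simplified sketch under assumptions not stated on the slide: the shared slack is sized for re-executing the longest process k times, and the WCETs and recovery overhead used in the example are hypothetical values.

```python
def shared_recovery_slack(wcets_ms, recovery_overhead_ms, k):
    """Slack sized for k re-executions of the longest process (assumed sharing rule)."""
    return k * (max(wcets_ms) + recovery_overhead_ms)


def separate_recovery_slack(wcets_ms, recovery_overhead_ms, k):
    """Slack if every process reserved its own k re-executions (for comparison)."""
    return sum(k * (c + recovery_overhead_ms) for c in wcets_ms)


if __name__ == "__main__":
    wcets = [30, 20]   # hypothetical WCETs of two processes on one node, in ms
    overhead = 5       # hypothetical recovery overhead, in ms
    k = 2              # maximum number of faults
    print("shared slack:", shared_recovery_slack(wcets, overhead, k))
    print("separate slack:", separate_recovery_slack(wcets, overhead, k))
```

Sharing works in this model because at most k faults occur in total, so the slack never has to cover k re-executions of every process at once.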
17
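Extracting Execution Scenarios
[Figure: an execution scenario extracted from the root schedule, with processes P1 to P4 on nodes N1 and N2 and messages m1 to m3 on the bus; P4 appears as executions P4/1, P4/2 and P4/3.]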
18
Trading-off Transparency for Performance
19
FT Implementations with Transparency
Transparency is achieved with frozen processes and messages
Good for debugging and testing
[Figure: application graph with processes P1 to P5 and messages m1, m2. The legend distinguishes regular processes/messages from frozen processes/messages; P3 is labelled as frozen.]
20
No Transparency
[Figure: processes P1 to P4 and messages m1 to m3 on nodes N1 and N2 and the bus, with k = 2 and a parameter of 5 ms whose symbol was lost in extraction. A WCET table for N1 and N2 has entries 30, 20, 30 and X, but their cell assignment is not recoverable. Two schedules are shown against the deadline: the no-fault scenario and the worst-case fault scenario.]
Processes start at different times
Messages are sent at different times
21
Full Transparency / Customized Transparency
[Figure: schedules of processes P1 to P4 and messages m1 to m3 compared against the deadline for three settings: no transparency, full transparency and customized transparency, shown together with the no-fault scenario.]
22
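Trading-Off Transparency for Performance
How much longer is the schedule with fault tolerance?
Setup: four computation nodes, recovery time 5 ms
Columns: increasing transparency from 0% to 100%, each for k = 1, 2, 3. Row labels 20 to 80 are as shown on the slide (their axis name is not recoverable).

        0%            25%           50%           75%           100%
    k=1 k=2 k=3   k=1 k=2 k=3   k=1 k=2 k=3   k=1 k=2 k=3   k=1 k=2 k=3
20   24  44  63    32  60  92    39  74 115    48  83 133    48  86 139
40   17  29  43    20  40  58    28  49  72    34  60  90    39  66  97
60   12  24  34    13  30  43    19  39  58    28  54  79    32  58  86
80    8  16  22    10  18  29    14  27  39    24  41  66    27  43  73

Additional values shown per transparency level: 29, 40, 49, 60, 66 (label not recoverable)
Trading transparency for performance is essential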
23
Fault Tolerance Policy Assignment
24
Fault Tolerance Policy Assignment
[Figure: three policies for process P1:
  Re-execution: P1/1, P1/2, P1/3 on node N1
  Replication: replicas P1(1), P1(2), P1(3) on nodes N1, N2, N3
  Re-executed replicas: P1(1)/1 and P1(1)/2 on N1, P1(2) on N2]
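The three policies on this slide each provide three executions of P1 in total. A minimal sketch of that pattern follows, assuming the general rule is that the extra replicas plus the re-executions must together cover k faults; the rule is inferred from the three panels rather than stated on the slide, and the function name is hypothetical.

```python
def policy_options(k: int):
    """List (replicas, re_executions) pairs whose spare executions cover k faults.

    With r replicas there are r - 1 spare replicas, so k - (r - 1)
    re-executions are still needed (assumed rule, see lead-in).
    """
    return [(r, k - (r - 1)) for r in range(1, k + 2)]


def describe(replicas: int, re_executions: int) -> str:
    """Name the policy using the slide's terminology."""
    if replicas == 1:
        return "re-execution"
    if re_executions == 0:
        return "replication"
    return "re-executed replicas"


if __name__ == "__main__":
    for replicas, re_executions in policy_options(2):
        print(replicas, re_executions, describe(replicas, re_executions))
```

For k = 2 this yields one option per panel of the figure: pure re-execution on one node, pure replication on three nodes, and the mixed policy on two nodes.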
25
Re-execution vs. Replication
[Figure: two applications, A1 and A2, each with processes P1 to P3 on nodes N1 and N2 and a bus (messages m1, m2). A WCET table has entries 40, 50, 40, 60, 50, 70, but their cell assignment is not recoverable. For one application, re-execution misses the deadline while replication meets it: replication is better. For the other, re-execution meets the deadline while replication misses it: re-execution is better.]
26
Fault Tolerance Policy Assignment
[Figure: application with processes P1 to P4 and messages m1 to m3 on nodes N1 and N2 and a bus. A WCET table has entries 40, 50, 60, 80, 40, 50, but their cell assignment is not recoverable. Three schedules are compared against the deadline: all processes re-executed (missed), all processes replicated (missed), and P1 replicated with P2 to P4 re-executed (met).]
Optimization of fault tolerance policy assignment
27
Optimization Strategy
Design optimization:
  Fault tolerance policy assignment
  Mapping of processes and messages
  Root schedules
Three tabu-search optimization algorithms:
  1. Mapping and fault tolerance policy assignment (MRX): re-execution, replication or both
  2. Mapping and only re-execution (MX)
  3. Mapping and only replication (MR)
[Diagram labels: tabu search, shifting-based scheduling]
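The slide names tabu search but does not describe its moves or cost function, so the following is a generic tabu-search sketch on a toy problem, not the authors' algorithm. It assumes a move flips the policy of one process, and it uses a hypothetical cost function (per-process costs plus a penalty for more than one replicated process) in place of the schedule length that shifting-based scheduling would provide.

```python
POLICIES = ("re-execution", "replication")


def tabu_search(n, cost, iterations=50, tenure=3):
    """Generic tabu search over policy assignments; a move changes one process."""
    current = tuple(POLICIES[0] for _ in range(n))
    best, best_cost = current, cost(current)
    tabu_until = {}  # process index -> last iteration in which moving it is tabu
    for it in range(iterations):
        candidates = []
        for i in range(n):
            for policy in POLICIES:
                if policy == current[i]:
                    continue
                neighbour = current[:i] + (policy,) + current[i + 1:]
                c = cost(neighbour)
                # Skip tabu moves unless they beat the best solution (aspiration).
                if tabu_until.get(i, -1) >= it and c >= best_cost:
                    continue
                candidates.append((c, i, neighbour))
        if not candidates:
            continue
        c, i, current = min(candidates)  # best admissible move, even if worse
        tabu_until[i] = it + tenure
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


# Hypothetical per-process costs: (cost if re-executed, cost if replicated).
TOY_COSTS = [(40, 10), (30, 25), (50, 20), (20, 30)]


def toy_cost(assignment):
    """Toy objective: summed costs plus 25 per replicated process beyond the first."""
    total = sum(TOY_COSTS[i][POLICIES.index(p)] for i, p in enumerate(assignment))
    replicated = sum(p == "replication" for p in assignment)
    return total + 25 * max(0, replicated - 1)


if __name__ == "__main__":
    print(tabu_search(len(TOY_COSTS), toy_cost))
```

Accepting the best admissible move even when it worsens the cost, while forbidding recently changed processes for a few iterations, is what lets the search leave local minima; in a real design flow the cost call would evaluate a candidate mapping and policy assignment by scheduling it.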
28
Experimental Results
[Plot: average % deviation from MRX (y-axis, 0 to 100) versus number of processes (x-axis, 20 to 100), with curves for mapping and replication (MR), mapping and re-execution (MX) and mapping and policy assignment (MRX). Annotation: schedulability improvement under resource constraints.]
29
Checkpoint Optimization
30
Locally Optimal Number of Checkpoints
[Figure: process P1 with C1 = 50 ms and k = 2, and three overhead parameters of 15 ms, 5 ms and 10 ms whose symbols were lost in extraction. P1 is shown divided into 1 to 5 execution segments (number of checkpoints 1 to 5); the value 3 is marked.]
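The trade-off behind a locally optimal checkpoint count can be reproduced with a short calculation. The sketch below rests on two assumptions, because the parameter symbols are lost in this transcript: the 5 ms and 10 ms values are taken as checkpointing and error-detection overheads paid once per segment, and the 15 ms value as the recovery overhead paid once per fault. The cost model itself is a simplified one chosen for illustration.

```python
def worst_case_length(c_ms, n, k, per_segment_overhead_ms, recovery_overhead_ms):
    """Worst-case length of one process with n checkpoints and k faults (assumed model)."""
    fault_free = c_ms + n * per_segment_overhead_ms
    # Each fault re-executes one segment of length c_ms / n after a recovery.
    recovery = k * (c_ms / n + recovery_overhead_ms)
    return fault_free + recovery


def best_checkpoint_count(c_ms, k, per_segment_overhead_ms, recovery_overhead_ms, max_n=10):
    """Return the n in 1..max_n that minimises the worst-case length."""
    return min(
        range(1, max_n + 1),
        key=lambda n: worst_case_length(
            c_ms, n, k, per_segment_overhead_ms, recovery_overhead_ms
        ),
    )


if __name__ == "__main__":
    # C1 = 50 ms and k = 2 are from the slide; the overhead mapping is assumed.
    for n in range(1, 6):
        print(n, round(worst_case_length(50, n, 2, 5 + 10, 15), 1))
    print("best:", best_checkpoint_count(50, 2, 5 + 10, 15))
```

Under these assumptions the minimum falls at three checkpoints, which agrees with the value 3 that appears on the slide. More checkpoints shorten each re-executed segment but add per-segment overhead, so the length first falls and then rises.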
31
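Globally Optimal Number of Checkpoints
[Figure: processes P1 (C1 = 50 ms) and P2 (C2 = 60 ms) connected by message m1, with k = 2; overhead values 10, 5 and 5 appear without their symbols. Two schedules are compared: three checkpoints each for P1 and P2, with length 265, and two checkpoints each, with length 255.]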
32
Global Optimization vs. Local Optimization
Does the optimization reduce the fault tolerance overheads on the schedule length?
[Plot: % deviation from MC0, that is, how much smaller the fault tolerance overhead is (y-axis, 0% to 40%), versus application size in number of tasks (x-axis, 40 to 100), for 4 nodes and 3 faults. MC: global optimization of checkpoint distribution. MC0: local optimization of checkpoint distribution.]
33
Conclusions
Scheduling with fault tolerance requirements
  Two novel scheduling techniques
  Handling customized transparency requirements
Design optimization with fault tolerance
  Policy assignment optimization strategies
  Checkpoint optimization
34
To be continued…
Design Optimization and Scheduling of Mixed Soft-Hard Real-Time Systems
[Figure labels: soft processes, soft deadlines; hard processes, hard deadlines]
35
Design Optimization of Embedded Systems with Fault Tolerance is Essential