Download presentation
Presentation is loading. Please wait.
1
1 of 14 1/14 Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping University, Sweden
2
2 of 14 2/14 Motivation Hard real-time applications Timing constraints Cost constraints Online preemptive Flexible Off-line non-preemptive Predictable vs. Faults Predictable Transient Intermittent Hardware solutions MARS, TTA, X-by-Wire Permanent faults Costly for transient faults Software solutions Re-execution/rollback recovery Checkpointing/rollback recovery Replication, primary-backup… vs. Software solutions Re-execution/rollback recovery Checkpointing/rollback recovery Replication, primary-backup…
3
3 of 14 3/14 Outline Motivation System architecture and fault-model Fault-tolerance techniques Problem formulation Motivational examples Tabu-search optimization strategy Experimental results Contributions and Message
4
4 of 14 4/14 Processes: Static cyclic scheduling Fault-Tolerant Time-Triggered Systems... Transient faults Processes: Re-execution and replication S1S1 S3S3 S2S2 S4S4 S1S1 S3S3 S2S2 S4S4 TDMA Round Cycle of two rounds Slot Time Triggered Protocol (TTP) Bus access scheme: time-division multiple-access (TDMA) Schedule table located in each TTP controller: message descriptor list (MEDL) Messages: Static schedule table Messages: Fault-tolerant protocol
5
5 of 14 5/14 Fault-Tolerant Techniques P1P1 P1P1 P1P1 Re-execution N1N1 P1P1 P1P1 P1P1 Replication N1N1 N2N2 N3N3 P1P1 P1P1 N1N1 N2N2 P1P1 Re-executed replicas 2
6
6 of 14 6/14 Problem Formulation Given Fault model Number of transient faults in the system period System architecture Application WCETs, message sizes, periods, deadlines Application: set of process graphsArchitecture: time-triggered system Fault-model: transient faults Determine Schedulable and fault-tolerant design implementation Fault-tolerance policy assignment Mapping of processes and messages Schedule tables for processes and messages
7
7 of 14 7/14 Static Scheduling [Kandasamy et al. 03] P1P1 N 1 : S 1 N 2 : S 11 N 3 : S 14 P2P2 P4P4 P5P5 P3P3 m1m1 m2m2 2 P2P2 P4P4 P3P3 P5P5 P1P1 m1m1 m2m2 N1N1 N2N2 N3N3 S 11 S 12 S 13 P1P1 P1P1 N2N2 S1S1 S2S2 S3S3 P2P2 P2P2 S4S4 P3P3 S5S5 S6S6 S7S7 S8S8 S 10 S9S9 P4P4 P3P3 P4P4 P3P3 P4P4 P4P4 N1N1 S 14 S 15 S 18 P5P5 P5P5 N3N3 Root schedules P1P1 N 1 : S 2 N 2 : S 12 N 3 : S 14 P2P2 P4P4 P5P5 P3P3 m1m1 m2m2 P1P1 Contingency schedules S1S1 S 11 S2S2 S 12 P2P2 Contingency schedules Transparent re-execution Recovery slack
8
8 of 14 8/14 Re-execution vs. Replication N1N1 N2N2 P1P1 P3P3 P2P2 m1m1 1 P1P1 P2P2 P3P3 N1N1 N2N2 4050 40 60 50 70 N1N1 N2N2 TTP P1P1 P2P2 S1S1 S2S2 P3P3 Met A1A1 N1N1 N2N2 TTP P1P1 P2P2 P3P3 S1S1 S2S2 Missed P1P1 N1N1 N2N2 TTP P1P1 P2P2 P2P2 P3P3 P3P3 S1S1 S2S2 m1m1 m1m1 m2m2 m2m2 Deadline Met P1P1 P3P3 P2P2 m1m1 m2m2 A2A2 Replication is better P1P1 S1S1 N1N1 N2N2 TTP P1P1 S2S2 P2P2 P2P2 P3P3 P3P3 Deadline Missed m1m1 m1m1 Re-execution is better
9
9 of 14 9/14 P1P1 N1N1 N2N2 TTP P2P2 P3P3 S1S1 S2S2 P4P4 m2m2 Missed Fault-Tolerant Policy Assignment P1P1 P2P2 P3P3 N1N1 N2N2 4050 60 80 P4P4 4050 1 N1N1 N2N2 P1P1 P4P4 P2P2 P3P3 m1m1 m2m2 m3m3 P1P1 N1N1 N2N2 TTP P2P2 P3P3 S1S1 S2S2 m2m2 P4P4 N1N1 N2N2 P1P1 P3P3 S1S1 S2S2 P4P4 P2P2 P1P1 m1m1 m1m1 m2m2 m2m2 P2P2 m3m3 m3m3 P3P3 P4P4 Missed P1P1 N1N1 N2N2 TTP P2P2 P3P3 S1S1 S2S2 m2m2 P4P4 No fault-tolerance: application crashes N1N1 N2N2 P1P1 P3P3 S1S1 S2S2 P4P4 P2P2 P1P1 m2m2 m1m1 TTP Met Optimization of fault-tolerance policy assignment Deadline
10
10 of 14 10/14 Mapping and Fault-Tolerance P1P1 P4P4 P2P2 P3P3 m1m1 m2m2 m3m3 m4m4 P1P1 P2P2 P3P3 P4P4 N1N1 N2N2 40X 60 4040 70 X 1 N1N1 N2N2 P1P1 N1N1 N2N2 TTP P2P2 P3P3 S1S1 S2S2 m2m2 P4P4 m4m4 Best mapping without considering fault-tolerance Deadline Missed P1P1 N1N1 N2N2 TTP P2P2 P3P3 S1S1 S2S2 P4P4 m4m4 m2m2 P1P1 N1N1 N2N2 P2P2 P3P3 S1S1 S2S2 m4m4 m2m2 P4P4 Deadline Met Simultaneous mapping and fault-tolerance
11
11 of 14 11/14 Optimization Strategy Design optimization: Fault-tolerance policy assignment Mapping of processes and messages Root schedules Three tabu-search optimization algorithms: 1.Mapping and Fault-Tolerance Policy assignment (MRX) Re-execution, replication or both 2.Mapping and only Re-Execution (MX) 3.Mapping and only Replication (MR) Tabu-search List scheduling
12
12 of 14 12/14 MRX Tabu-Search Example P1P1 P2P2 P3P3 N1N1 N2N2 4050 60 75 P4P4 4050 1 N1N1 N2N2 P1P1 P4P4 P2P2 P3P3 m1m1 m2m2 m3m3 Design transformations
13
13 of 14 13/14 80 20 Experimental Results 0 10 30 40 50 60 70 90 100 20406080100 80 Mapping and replication (MR) 20 Mapping and re-execution (MX) Mapping and policy assignment (MRX) Number of processes Avgerage % deviation from MRX Schedulability improvement under resource constraints Case study Vehicle cruise controller MRX: schedulable fault-tolerant application with 65% overhead
14
14 of 14 14/14 Contributions and Message Contributions Combined re-execution and replication Optimization algorithms for fault-tolerance policy assignment Efficient contingency schedule generation Optimization of fault-tolerance policy assignment needed for cost-effective fault tolerance
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.