Software Fault Tolerance (SWFT) How to Design, Develop and Evaluate Robust SW and OS’s Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de.


1 Software Fault Tolerance (SWFT): How to Design, Develop and Evaluate Robust SW and OS’s
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
Prof. Neeraj Suri, Abdelmajid Khelil (Majid), Constantin Sârbu (Dinu), Brahim Ayari
Dept. of Computer Science, TU Darmstadt, Germany

2 Outline of today’s lecture
- Course info
- Course goals
- Research related to course
© DEEDS Group SWFT WS ‘07

3 Related Courses
- Lectures
  - SW/OS Fault-Tolerance
  - Kanonik: Introduction to Trusted Systems
- Seminars
  - Embedded Mobile Computing
  - Secure and Reliable OS
- Labs
  - Selected Topics in Dependable SW & Mobile Computing

4 Course Info
- Lecture (in English): Wed. (11:40am - 1:20pm), C120
- Exercises (in English & German): Thu., 3. DS (10:45-13:20), C110
  Starts 25th 2007
- Course webpage: http://www.deeds.informatik.tu-darmstadt.de

5 Grading Related
- Credit points: 7.5; SWS: 5 (2+3)
- Exam
  - Mid-term exam: 25% (E or G), December 17th, 2007
  - Presentations: 24% (E or G)
  - Final exam: 51% (E or G)
- Exercises (E or G)
  - Practice + presentations
- Lab stuff: optional
  - Do some live programming
  - Gain some practical experience
  - Please take this opportunity! It may improve your grade (bonus points).
  - If you have a suggestion for a lab, discuss it with us!

6 Learn more...
- We have a selection of sub-projects related to this lecture
  - Will be targeted to the interests of students
  - See example
  - See slides Research@DEEDS
- We offer
  - Bachelor's/master's theses
  - HiWi
  - Fun

7 Course Goals
- Learn software fault tolerance concepts
- Learn how to develop robust programs
  - How to deal with software bugs
  - Software fault tolerance: continuation of service in the face of failures
- Learn concepts and mechanisms to build software fault tolerance tools
- Learn how to evaluate and test robust SW/OS
- Learn some SW issues related to (a) mobile SW and (b) security

8 Course Outline
1. Introduction/Concepts of SWFT
2. SW-FT Mechanisms: Design Aspects
   - Process pairs, selective retries, graceful degradation, …
   - Checkpointing, N-copy programming (NCP), N-version programming (NVP), micro-reboots, …
   - Robust programming, …
3. Evaluation of fault-tolerant SW & OS’s
   - SW reliability
   - SW/OS stress testing
   - Hardening of OS’s, patching
   - OS driver profiling and testing
4. Transactional/Mobile SW
   - Mobile transactions (FT, recovery, …), wireless sensor networks (energy-efficient FT, spatial/temporal redundancy, …)
5. SW and Security: buffer overflows etc.
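As a rough illustration of one mechanism named in the outline, N-version programming (NVP), the sketch below runs several independently written versions of the same function and accepts a result only if a strict majority agrees. It is a minimal sketch, not course material: the function names (`isqrt_library`, `isqrt_bisect`, `isqrt_buggy`, `nvp_vote`) and the choice of integer square root as the task are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def isqrt_library(n: int) -> int:
    """Version A: integer square root from the standard library."""
    return math.isqrt(n)


def isqrt_bisect(n: int) -> int:
    """Version B: independently written bisection search."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_buggy(n: int) -> int:
    """Version C: deliberately faulty, standing in for a design fault."""
    return n // 2


def nvp_vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version and return the value a strict majority agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashing version simply contributes no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    # Majority is measured against all versions, including crashed ones.
    if votes * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


if __name__ == "__main__":
    print(nvp_vote([isqrt_library, isqrt_bisect, isqrt_buggy], 16))
```

With three versions, the voter masks one faulty version; if two versions disagree with each other and with the third, it raises instead of guessing.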

9 Literature
- Most lectures will be based on research papers
  - URLs of papers available via class page
  - References on slides (available on web)
- Coverage for exams is primarily (a) the lecture content and (b) issues covered in the exercises, so attending is important

10 Research@DEEDS Related to Course
DEEDS: Dependable Embedded Systems & Software Group, www.deeds.informatik.tu-darmstadt.de

11 What isn’t an Embedded System?
[Figure labels: response, motes, sensors, UAV’s, XBW, FBW]

12 The Spread: Dependability, Safety, Security
- X-by-Wire: Safety-Service Critical Systems (Aerospace/Automotive)
  - System Architecture Design
  - Protocols (Synchronous, Membership, Diagnosis, Recovery, Scheduling)
  - Dependability Evaluation
  - Verification & Validation (V&V)
    - Experimental fault injection (PROPANE)
    - Formal Methods
- Distributed Systems: Byzantine Consensus, Failure Detectors, Verification
- Mobile/WSN Networks: Fault-tolerant protocols, routing, reliability analysis
- CPU Architectures: Energy-efficient FT, Transient Resilience, …
- Operating Systems: OS Robustness Evaluation, Driver Testing & Evaluation, Vulnerability Profiling, Embedded/Desktop OS, …

13 Complexity (Devices, Systems) Within a Domain
[Figure labels: in-vehicle networks; aircraft flight control]

14 Automotive/Aerospace (Federated Systems)
[Diagram layers: Applications, Middleware, Resources]
Applications: Comm., Diagnostics, Steering, Env. Ctrl., Braking, Engine/Flight Control, Navigation, User I/O, Multimedia, Body Elec.
Dist. Resources (Nodes + Comm.): multiple nodes, varied criticality; buses, clusters, bridges (HW, SW), …

15 Managing Complexity: Federated to Integrated
[Diagram labels: app 1, app 2, app n (applications will change); High-level Services; Core Services; MW + Arch; Platforms: Shared, Distributed/Networked (technologies will change); Re-usable Core (technology + domain invariant)]
- Multi-Domain Solutions: Automotive, Aerospace, Control
- Compositional Framework (+Tools)
  - Integrates diverse criticality apps
  - Delineation over integration for functionality and safety
  - Flexible building blocks & interfaces
- Benefits
  - Design flexibility, short time-to-market
  - Reduced number of nodes
  - Reduced complexity and cost

16 P1: X-by-Wire Protocols
- On-Line Diagnosis
  - Enhance sustained autonomic system operations
  - Self-healing: on-line recovery (transient faults)
  - Self-diagnosing: maintenance actions (permanent faults)
- Challenges
  - Avoid overreaction to transient faults: the cure can be worse than the disease!
  - Support mixed-criticality applications: from X-by-Wire to comfort applications
  - Portability for time-triggered (TT) platforms: add-on, middleware approach
Contact: Marco Serafini (marco@informatik.tu-darmstadt.de)
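The distinction between transient and permanent faults on this slide can be illustrated with a generic penalty/reward counter: each faulty round raises a score, each healthy round lowers it, and a node is isolated only when the score crosses a threshold. This is a hedged sketch of the general idea, not the DEEDS diagnosis protocol; the class name `FaultFilter` and the values for `penalty`, `reward`, and `threshold` are assumptions chosen for the example.

```python
class FaultFilter:
    """Penalty/reward counter that separates transient from persistent faults.

    All parameter values are illustrative assumptions, not taken from the slides.
    """

    def __init__(self, penalty: int = 2, reward: int = 1, threshold: int = 6):
        self.penalty = penalty
        self.reward = reward
        self.threshold = threshold
        self.score = 0

    def observe(self, faulty: bool) -> str:
        """Record one diagnosis round and return the suggested action."""
        if faulty:
            self.score += self.penalty
        else:
            # Healthy rounds slowly forgive earlier faults.
            self.score = max(0, self.score - self.reward)
        if self.score >= self.threshold:
            return "isolate"  # treated as permanent: maintenance action
        return "recover" if faulty else "ok"  # treated as transient


if __name__ == "__main__":
    node = FaultFilter()
    for faulty in [True, False, False, True, True, True]:
        print(faulty, node.observe(faulty))
```

An isolated glitch only triggers on-line recovery, while a run of consecutive faults accumulates enough penalty to trigger isolation, which is one simple way to avoid overreacting to transients.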

17 P2: Mobile Database Systems
- Mobile transactions
- Commit protocols
- Challenges: frequent perturbations
  - Heterogeneity
    - Wireless links (WLAN, UMTS, …)
    - Mobile nodes (laptops, PDAs, …)
  - Failures
    - Unpredictable disconnections
    - Node/communication failures
- Infrastructure-based vs. ad-hoc
  - Mobile Ad-hoc NETworks (MANETs)
  - Wireless Sensor Networks (WSNs)
[Figure labels: Wired Network, WLAN, UMTS, GPRS, WAVE]
Contacts: Brahim Ayari (brahim@informatik.tu-darmstadt.de), Abdelmajid Khelil (khelil@informatik.tu-darmstadt.de)

18 P3: Dependable Ad-hoc Sensor Networks
- Applications
  - Car2Car communication
    - Cooperative driving
    - Announcements
  - Tracking & monitoring
  - Measurement
  - Disaster rescue
- Research challenges
  - Energy (efficiency, maintenance, …)
  - Frequent failures (detection, diagnosis, …)
  - Safety-critical applications
  - Reliable communication
[Figure labels: WAVE, WLAN, ZigBee]
Contacts: Faisal Karim (fkarim@informatik.tu-darmstadt.de), Abdelmajid Khelil (khelil@informatik.tu-darmstadt.de)

19 P4: Energy Efficient Dependable Systems
- Trends
  - Heterogeneous systems
  - Increased dependence upon technology
  - Mobility
    - low voltage
    - smaller noise margins
    - more transient errors
  - Increased complexity
    - Integration/communication between systems
- Energy efficient fault tolerance
  - Evaluate
  - Characterize
  - Optimize/trade-off
- Dimensions
  - Design-time vs. run-time
  - Time vs. space
  - System level vs. component level
  - Service degradations and reconfiguration
Contact: Neeraj Suri (suri@informatik.tu-darmstadt.de)

20 P5: Robustness Evaluation of Embedded OS/SW
Problem: SW systems are vulnerable to errors in Commercial-Off-The-Shelf (COTS) components. Characterizing the impact of 3rd-party SW is hard.
Approach:
- Focus on device drivers
- Error propagation analysis using fault injection
- Robustness-enhancing wrappers
Applications:
- Verification
- COTS integration (acceptance)
- Robustness enhancement
Contact: Andréas Johansson (aja@informatik.tu-darmstadt.de)
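The approach on this slide (fault injection to study error propagation, plus robustness-enhancing wrappers) can be sketched in a few lines. The example below is a toy under stated assumptions: `read_block` is a hypothetical stand-in for an unchecked COTS component, the fault model is a single bit flip in one argument, and none of this reflects the PROPANE tool or any real driver interface.

```python
from typing import Callable, Dict


def read_block(buffer: bytes, offset: int, length: int) -> bytes:
    """Hypothetical COTS component: performs no argument checking."""
    return buffer[offset:offset + length]


def robust_read_block(buffer: bytes, offset: int, length: int) -> bytes:
    """Wrapper that rejects out-of-range arguments before calling the component."""
    if offset < 0 or length < 0 or offset + length > len(buffer):
        raise ValueError("argument out of range")
    return read_block(buffer, offset, length)


def flip_bit(value: int, bit: int) -> int:
    """Fault model (an assumption): flip one bit of an integer argument."""
    return value ^ (1 << bit)


def run_campaign(component: Callable[[bytes, int, int], bytes],
                 buffer: bytes, offset: int, length: int) -> Dict[str, int]:
    """Inject one bit flip per run into `offset` and classify each outcome."""
    golden = read_block(buffer, offset, length)  # fault-free reference result
    outcomes = {"masked": 0, "detected": 0, "propagated": 0}
    for bit in range(8):
        try:
            result = component(buffer, flip_bit(offset, bit), length)
        except ValueError:
            outcomes["detected"] += 1
            continue
        if result == golden:
            outcomes["masked"] += 1
        else:
            outcomes["propagated"] += 1  # wrong data reached the caller
    return outcomes


if __name__ == "__main__":
    data = bytes(range(16))
    print("unwrapped:", run_campaign(read_block, data, 4, 4))
    print("wrapped:  ", run_campaign(robust_read_block, data, 4, 4))
```

For this 16-byte buffer, all eight injected errors propagate through the unwrapped component as silently wrong data, while the wrapper detects the four that push the offset out of range. The other four still propagate because the corrupted offset is a valid one, which shows why wrappers alone give only partial coverage.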

21 P6: Improved Testing of Device Drivers
Problem: Faulty COTS drivers used in modern OSs have a significant impact on system reliability. They are hard to test because they execute in kernel space and are delivered without source code.
Applications:
- Black-box testing for COTS drivers
- Profiling, debugging
- System activity monitoring
Research aims:
- Profile driver behavior at runtime
- Expedite driver testing by focusing on runtime activity
- Test methods tuned to OS/driver operational profiles
[Figure labels: Application 1 … Application p, System Services, OS kernel, Driver Monitor, Hardware Layer]
Contact: Constantin Sarbu (cs@cs.tu-darmstadt.de)

22 Questions?

