Fardin Abdi, Renato Mancuso, Stanley Bak, Or Dantsker, Marco Caccamo

Presentation transcript:

Reset-Based Recovery for Real-Time Cyber-Physical Systems with Temporal Safety Constraints. Fardin Abdi, Renato Mancuso, Stanley Bak, Or Dantsker, Marco Caccamo. 21st Conference on Emerging Technologies and Factory Automation.

Safety Critical CPS

CPS Safety Constraints: physical limits; regulations (e.g. maximum altitude set by the FAA).

Safety is Only Meaningful with Liveness

Software Faults: Main Obstacle for Safety and Liveness

Software Faults: A Major Challenge. Verification: cost, correctness, upgrades (time/cost), 3rd-party software, specialized knowledge; not always doable. Testing: no guarantees.

Our approach: Tolerate Faults and Recover using Restarts

Recovery Using Restarts (cyber-physical systems and traditional computers). First, most bugs in production-quality software are Heisenbugs \cite{candea2001recursive}, which are hard to reproduce or depend on the timing of external events, for example race conditions. Restarting is very effective in recovering from this type of bug. Second, restarting can reclaim all stale resources, clean up all corrupt state (e.g. memory leaks, dangling pointers, damaged heap), and take the system back into a known, well-tested state within a predictable amount of time.

Two Types of Safety Constraints

System Constraints I: Linear Constraints. Example, with pressure $p$ and temperature $t$: $\left\{ \begin{array}{l} p < 2 \\ p/4 + t < 2.5 \end{array}\right.$
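A minimal sketch of how the linear constraints in this example could be checked for a given state. The coefficient table and the function name are hypothetical and are not taken from the slides; only the two inequalities come from the example.

```python
# Hypothetical encoding of the slide's example: each row (a_p, a_t, bound)
# stands for the strict inequality a_p * p + a_t * t < bound.
LINEAR_CONSTRAINTS = [
    (1.0, 0.0, 2.0),   # p < 2
    (0.25, 1.0, 2.5),  # p/4 + t < 2.5
]


def satisfies_linear_constraints(p, t, constraints=LINEAR_CONSTRAINTS):
    """Return True if pressure p and temperature t satisfy every inequality."""
    return all(a_p * p + a_t * t < bound for a_p, a_t, bound in constraints)


if __name__ == "__main__":
    print(satisfies_linear_constraints(1.0, 2.0))  # both inequalities hold
    print(satisfies_linear_constraints(1.0, 2.3))  # second inequality violated
```

Keeping the constraints as rows of coefficients mirrors the intersection-of-inequalities form used later for Gamma.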

System Constraints II: Overrun Constraints. Example: $\text{Stress}(p) = \left\{ \begin{array}{cc} 1 & p > 10\\ 0 & p \leq 10 \end{array}\right.$ with $\int_{t}^{t+16} \text{Stress}(p(\tau))\, d\tau \leq 15$. (Figure: power versus time, with the threshold P = 10.)
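The overrun constraint in this example can be checked on a recorded trace by sliding a window over the samples. The sketch below assumes the signal is sampled at a fixed step `dt` and approximates the integral with a Riemann sum; the function names and the sampling scheme are assumptions for illustration, not part of the slides.

```python
def stress(p, threshold=10.0):
    """Stress(p) from the example: 1 above the threshold, 0 otherwise."""
    return 1.0 if p > threshold else 0.0


def max_window_stress(samples, dt, window):
    """Largest Riemann-sum estimate of the stress integral over any window."""
    n = int(round(window / dt))           # samples per window
    values = [stress(p) * dt for p in samples]
    if len(values) <= n:                  # trace shorter than one window
        return sum(values)
    current = sum(values[:n])
    best = current
    for i in range(n, len(values)):       # slide the window one sample at a time
        current += values[i] - values[i - n]
        best = max(best, current)
    return best


def satisfies_overrun(samples, dt=1.0, window=16.0, budget=15.0):
    """True if no window of the given length exceeds the stress budget."""
    return max_window_stress(samples, dt, window) <= budget


if __name__ == "__main__":
    trace = [11.0] * 15 + [5.0] + [11.0] * 15  # one dip below the threshold
    print(satisfies_overrun(trace))
```

With the example's numbers (window 16, budget 15), the system may exceed the threshold for at most 15 out of any 16 time units.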

Architecture
Diagram components: sensors, complex controller (CC), safety controller (SC), RTR module, FS switch (with FS enable), MUX, WD timer, reset pin, control command, physical plant, rescue unit, main unit.
WD timers: restart the board if components fail.
SC: can always keep the system safe.
RTR: predicts whether the future states are safe.
CC: not verified; can create unsafe commands.
FS switch: switches to SC during the restart.
Rescue unit: bare metal, verified.
Main unit: OS/firmware; can fail.

Fault Model. Rescue unit: verified, no faults. RTR unit: fail-stop failure model. Complex controller: any type of fault.

Safety Controller Design. Goals: keep the system within the linear constraints; satisfy the overrun constraints; stay within the limits of the actuators. Strategy: find a region where all of the above are always satisfied, then design a state feedback controller that keeps the system within that region.

Finding a Safe Region for Overrun Constraints. Example: $O = \{x \mid \text{Stress}(x) \leq \frac{C}{T^{win}}\}$ for the overrun constraint $\forall t:\ \int_{t}^{t+T^{win}} \text{Stress}(x(\tau))\, d\tau \leq C$.
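The reason a pointwise bound on stress is enough can be spelled out in one line. The step below assumes the state stays in $O$ over the whole window; it is a sketch of the argument, not a statement taken from the slides.

```latex
x(\tau) \in O \ \ \forall \tau \in [t,\, t + T^{win}]
\;\Longrightarrow\;
\int_{t}^{t+T^{win}} \text{Stress}(x(\tau))\, d\tau
\;\leq\; \int_{t}^{t+T^{win}} \frac{C}{T^{win}}\, d\tau
\;=\; C
```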

Safety Controller Design
Gamma: the intersection of all the linear inequalities.
Linear constraints: $a^T_m\cdot x \leq 1,\ m = 1, \dots, q$
Overrun constraints: $c_{i,k}^T\cdot x \leq 1,\ k = 1, \dots, p_i,\ i = 1, \dots, p$
Actuator limits: $b^T_j\cdot u \leq 1,\ j = 1, \dots, r$
Use an LMI solver to find a linear state feedback controller and its Q matrix.

Stability Region: under the control of SC, any point inside R will remain inside R. (Figure labels: Gamma; stability region R.)
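The invariance property can be illustrated on a toy system. The sketch below is an assumption-laden example, not the controller from the slides: it uses a hypothetical discrete-time closed loop `x[k+1] = A x[k]` and takes R to be the unit disc, which is invariant for this particular `A` because `A^T A - I` is negative definite (so every step shrinks the norm).

```python
# Hypothetical closed-loop dynamics under the safety controller (toy values).
A = [[0.9, 0.1],
     [0.0, 0.8]]


def step(x):
    """One closed-loop step x[k+1] = A x[k]."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]


def in_region(x):
    """Membership in the toy stability region R = {x | x^T x <= 1}."""
    return x[0] ** 2 + x[1] ** 2 <= 1.0


def stays_inside(x0, steps=50):
    """Simulate the closed loop and report whether every state stays in R."""
    x = list(x0)
    for _ in range(steps):
        if not in_region(x):
            return False
        x = step(x)
    return in_region(x)


if __name__ == "__main__":
    for start in ([0.99, 0.0], [0.0, 0.99], [-0.7, 0.7]):
        print(start, stays_inside(start))
```

Simulation only spot-checks invariance from chosen starting points; the guarantee itself comes from the Lyapunov-style argument, which in the slides is obtained through the LMI solver.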

Switching Condition for Hard Constraints
$\text{Reach}_{\leq T_{c}}(x, CC) \subseteq \mathcal{S}$
$\text{Reach}_{\leq T_s}(\text{Reach}_{\leq T_{c}}(x, CC), SC) \subseteq \mathcal{S}$
$\text{Reach}_{= T_s}(\text{Reach}_{\leq T_{c}}(x, CC), SC) \subseteq \mathcal{R}$
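The three conditions can be made concrete on a one-dimensional toy plant where reachable sets are intervals. Everything below is an illustrative assumption: the untrusted CC is modeled as moving the state at most `v_max` per unit time, the SC closed loop is `x' = -k x`, and the intervals chosen for S and R are arbitrary. The slides do not specify a reachability tool, so this is only a sketch of the check's structure.

```python
import math


def interval_subset(inner, outer):
    """True if interval inner = (lo, hi) lies inside interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reach_cc(x, t_c, v_max):
    """Over-approximate Reach_{<=T_c}(x, CC): the state moves at most v_max * t_c."""
    return (x - v_max * t_c, x + v_max * t_c)


def reach_sc_at(box, t, k):
    """Reach_{=t}(box, SC) for the assumed closed loop x' = -k x."""
    decay = math.exp(-k * t)
    return (box[0] * decay, box[1] * decay)


def reach_sc_upto(box, t, k):
    """Reach_{<=t}(box, SC): trajectories move monotonically toward 0."""
    end = reach_sc_at(box, t, k)
    return (min(box[0], end[0]), max(box[1], end[1]))


def can_keep_running_cc(x, t_c, t_s, v_max, k, safe, stable):
    """Evaluate the three switching conditions in order."""
    after_cc = reach_cc(x, t_c, v_max)
    return (interval_subset(after_cc, safe)
            and interval_subset(reach_sc_upto(after_cc, t_s, k), safe)
            and interval_subset(reach_sc_at(after_cc, t_s, k), stable))


if __name__ == "__main__":
    params = dict(t_c=0.5, t_s=1.0, v_max=2.0, k=1.0,
                  safe=(-10.0, 10.0), stable=(-3.0, 3.0))
    for x in (0.0, 7.0, 8.0, 9.5):
        print(x, can_keep_running_cc(x, **params))
```

In this toy setup a state can fail the check either because CC alone might leave S within the first horizon, or because SC could not bring every reachable state into R by the end of the second horizon.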

Switching Condition for Hard Constraints (figure: safe region S and stability region R)

Switching Conditions for Overrun Constraints. Due to the design of the stability region: $\int_{0}^{T^{win}} \text{Stress}(x(\tau))\cdot d\tau \leq \alpha C$

Switching Conditions for Overrun Constraints. We keep track of the past stress in an array. We predict future stress using reachability analysis. (Figure: timeline of time intervals 1 to 16 around the current time, annotated with $T^{win} = 14$ and $T_c$; past intervals are stored in the array, future intervals are predictions, and the sum of stress is taken over each interval of time.)
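A minimal sketch of this bookkeeping, assuming time is split into equal intervals, past stress is stored per interval, and the reachability analysis supplies a worst-case stress value per future interval (here simply passed in as a list). The function names and the discretization are hypothetical.

```python
def worst_window_sum(past, future, window_len):
    """Largest stress sum over any window_len consecutive intervals
    that can overlap the predicted future."""
    # Only the most recent window_len - 1 past intervals can share a
    # window with a future interval.
    recent = list(past)[-(window_len - 1):] if window_len > 1 else []
    timeline = recent + list(future)
    if len(timeline) <= window_len:
        return sum(timeline)
    return max(sum(timeline[i:i + window_len])
               for i in range(len(timeline) - window_len + 1))


def overrun_safe(past, future, window_len, budget):
    """True if no window touching the future exceeds the stress budget."""
    return worst_window_sum(past, future, window_len) <= budget


if __name__ == "__main__":
    past_stress = [1, 1, 0, 1]   # measured, stored in the array
    predicted = [1, 1]           # worst case from the prediction step
    print(overrun_safe(past_stress, predicted, window_len=3, budget=2))
```

Windows that lie entirely in the past are skipped because they were already checked at earlier time steps.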

Evaluations

Restarting in Action

Flight Trace

Progress Analysis

Stability Region Size – Experiment 1. No overrun constraints. Legend: LMI-Simplex; RTR, our approach.

Stability Region Size – Experiment 2. No overrun constraints: LMI-Simplex, RTR. With overrun constraints: our approach.

Thank You!

Support Slides

Introducing $\alpha$: $\Gamma = \{x \mid a^T_m\cdot x \leq 1,\ m = 1, \dots, q;\ \ c_{i,k}^T\cdot x \leq 1,\ k = 1, \dots, p_i,\ i = 1, \dots, p;\ \ b^T_j\cdot u \leq 1,\ j=1,\dots,r \}$

If O Is Not Linear: find a convex region inside O.

How to Predict Stress Using Reachability. MaxSumStress($[t_1, t_2]$): returns the maximum of the integral of the stress function over a given window $[t_1, t_2]$. (Figure: power versus time.)