Rigorous Development of a Safety-Critical System Based on Coordinated Atomic Actions, by Subash M S


The Idea
To design and build a fault-tolerant control program for a realistically detailed model of an industrial production cell using coordinated atomic (CA) actions, then debug, improve, and formally verify the design.

Organization and Sections
Description of CA actions and the fault-tolerant production cell model
Analysis of possible failures of the devices and sensors of the production cell
A control program that uses CA actions both for structuring and for exception handling
Formal treatment of CA action-based designs and formalization of the properties of Production Cell II
Discussion of an implementation of the control program
Conclusion

CA Actions and Production Cell II
Definition of CA actions: a CA action is a mechanism for coordinating multi-threaded interactions and ensuring consistent access to objects in the presence of concurrency and potential faults.

Overview and Example of a CA Action
A CA action consists of roles that execute in parallel and perform operations on objects.
When an error occurs, all roles must cooperatively perform appropriate forward or backward recovery to reach a mutually consistent conclusion.
The CA action must provide a recovery line so that the recovery of the roles can be coordinated without a domino effect.

Overview and Example of a CA Action Contd…
An acceptance test is provided to determine whether the outcome of the CA action is successful.
External objects that are shared between CA actions must provide their own error coordination mechanisms.
These external objects must also behave atomically with respect to each CA action in order to prevent information smuggling.

Overview and Example of a CA Action Contd…

Overview and Example of a CA Action Contd…
The desired effect of the CA action becomes visible to the rest of the system only if the acceptance test is passed.
The acceptance test allows a normal outcome and one or more exceptional outcomes to become visible.
Each exceptional outcome signals a specified exception to the environment.
If the acceptance test is not passed, the erroneous outcome is undone and an Abort exception is signaled to the environment.
If the error cannot be undone and is visible, a Failure exception is signaled to the environment so that it can deal with the situation.
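The four outcomes described above (normal, exceptional, Abort, Failure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `run_ca_action`, `restore_snapshot`, the dictionary-based state, and the convention that the acceptance test returns `None` when it is not passed are all hypothetical.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]


def restore_snapshot(state: State, snapshot: State) -> bool:
    """Default undo (assumption): restore the saved state; always succeeds."""
    state.clear()
    state.update(snapshot)
    return True


def run_ca_action(
    state: State,
    body: Callable[[State], None],
    acceptance_test: Callable[[State], Optional[str]],
    undo: Callable[[State, State], bool] = restore_snapshot,
) -> str:
    """Run the action body, then classify the outcome.

    acceptance_test returns "Normal" or the name of a specified exceptional
    outcome when it is passed, and None when it is not passed.
    """
    snapshot = dict(state)          # recovery point taken on entry
    body(state)
    verdict = acceptance_test(state)
    if verdict is not None:
        return verdict              # normal or specified exceptional outcome
    if undo(state, snapshot):
        return "Abort"              # erroneous effects were undone
    return "Failure"                # effects could not be undone
```

A caller would react to the returned string the way the enclosing environment reacts to a signaled exception: continue on a normal outcome, handle a named exceptional outcome, and escalate on Abort or Failure.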

The Fault-Tolerant Production Cell

Basic system requirements of the control program
Safety
Liveness
Failure detection and continuous service

Failure Definitions and Analysis
Assumptions:
The system clock, traffic lights, and alarm signal are fault-free.
Values of sensors, actuators, and clocks are always transmitted correctly.
No failure can cause a device to exceed its limiting positions.
All sensor failures are indicated by sensor values.
All actuator failures cause the affected devices to stop.

Failure Definitions and Analysis Contd…
Possible types of failure:
Sensor failures
Actuator failures
Lost blank
All of these failures can be detected by monitoring the sensor values of the various devices (robot, press, etc.) and checking them with assertion statements. Detection is the easy part, however: distinguishing which type of failure or exception has occurred is much harder. This is why, in most cases, the cell has to be stopped so that the failure can be identified and corrected.
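The gap between detecting a failure and classifying it can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function `check_press1`, the two exception classes, the timing deadline, and the premise that the low-position and mid-position sensors of a press can never both be active.

```python
class SensorFailure(Exception):
    """Readings that are mutually inconsistent, so a sensor must be wrong."""


class UndeterminedFailure(Exception):
    """A failure was detected, but sensor or actuator could be the cause."""


def check_press1(
    low_position_sensor: bool,
    mid_position_sensor: bool,
    moving_to_middle: bool,
    elapsed_s: float,
    deadline_s: float,
) -> None:
    """Assertion-style checks on Press1 readings (hypothetical rules)."""
    # The press cannot be at two positions at once: this case is classifiable.
    if low_position_sensor and mid_position_sensor:
        raise SensorFailure("low and mid position reported together")
    # The press was told to move but never reported arrival. Either the
    # actuator stopped or the mid-position sensor is stuck: the readings
    # alone cannot tell these apart.
    if moving_to_middle and elapsed_s > deadline_s and not mid_position_sensor:
        raise UndeterminedFailure("middle position not reached in time")
```

The second check shows why the cell often has to be stopped: the assertion fires reliably, but the same readings are consistent with more than one failure type.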

Control Program based on CA Actions for Production Cell II
There are six concurrent execution threads: Feed belt, Table, Robot, Press1, Press2, and Deposit belt.
Additional threads: Blank supplier and Blank consumer.

Control Program based on CA Actions for Production Cell II Contd…

Control Program based on CA Actions for Production Cell II Contd…
An intersection between CA actions indicates that they cannot be executed in parallel.
Each hardware device is associated with a device-controller (a thread), which is responsible for dynamically determining the sequence of actions in which the device participates.
An action begins only if its pre-conditions hold, and its post-conditions hold if no exception is raised during its execution.

CA Action example: Load Press I

Description of the CA Action using the COALA notation
CAA LoadPress1;
  Interface
    Use
      MetalBlank;
    Roles
      Robot: blankType, robotActuator;
      Press1: blankType, press1Actuator;
      RobotSensor: arm1ExtensionSensor, robotAngleSensor;
      Press1Sensor: blankSensor, lowPositionSensor, midPositionSensor;
    Exceptions
      Press1Failure, Arm1Failure1, ...;    ;;exceptions to signal
  Body
    Use CAA                                ;;specify nested actions
      RotateRobot, MovePress1toMiddle, ExtendArm1, RetractArm1;
    Object
      robotPress1Channel: Channel;         ;;shared local objects

Description of the CA Action using the COALA notation Contd…
    Exceptions
      press1_failure, blank_sensor_failure, ...;    ;;internal exceptions
    Handlers
      press1_handler, blank_sensor_handler, ...;
    Resolution
      press1_failure -> press1_handler, ...;        ;;exception resolution graph
    Role Robot(...);
    Role Press1(...);
    ...
End LoadPress1;
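The Resolution clause above maps internal exceptions to handlers. When several roles raise different exceptions at the same time, a resolution graph is commonly used to pick one exception that covers all of them, and every role then runs the handler for that exception. The Python sketch below illustrates the idea only; the combined exception `press1_and_blank_sensor_failure`, the fallback `universal_exception`, and the set-based encoding of the graph are assumptions, not part of the COALA description.

```python
from typing import Dict, FrozenSet, Set

# Each resolving exception lists the raised exceptions it covers.
# The combined entry is hypothetical, added only for illustration.
RESOLUTION_GRAPH: Dict[str, FrozenSet[str]] = {
    "press1_failure": frozenset({"press1_failure"}),
    "blank_sensor_failure": frozenset({"blank_sensor_failure"}),
    "press1_and_blank_sensor_failure": frozenset(
        {"press1_failure", "blank_sensor_failure"}
    ),
}


def resolve(raised: Set[str]) -> str:
    """Return the most specific exception that covers every raised one."""
    if not raised:
        raise ValueError("at least one exception must have been raised")
    candidates = [
        name for name, covers in RESOLUTION_GRAPH.items() if raised <= covers
    ]
    if not candidates:
        return "universal_exception"   # hypothetical catch-all
    # Fewest covered exceptions means most specific.
    return min(candidates, key=lambda name: len(RESOLUTION_GRAPH[name]))
```

With this encoding, a single raised exception resolves to itself, while two concurrent exceptions resolve to the combined node, so all roles agree on one handler to run.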

Dealing with Component Failures
The failure of a component involved in a CA action is detected using assertion statements.
An exception is raised by one or more roles and propagated to the other roles.
Control is transferred to exception handlers, which perform the appropriate error recovery.
In most cases it is not possible to recover completely to the normal post-conditions.

Handler examples and exceptional post-conditions
Handler for Press1: the LoadPress1 action performs forward error recovery and tries to move the blank to Press2, which is still operational.
Handler for rotary sensor or motor failure: backward error recovery is used; the robot is moved back to its initial position and another attempt is made to rotate it.
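The two handler styles can be contrasted in a short Python sketch. The names `press1_handler`, `rotate_with_backward_recovery`, `RotationFailure`, the dictionary keys, and the attempt limit are hypothetical and chosen only to mirror the two examples above.

```python
from typing import Callable, Dict


class RotationFailure(Exception):
    """Raised when the robot does not reach the requested angle."""


def press1_handler(cell: Dict[str, object]) -> None:
    """Forward recovery: move on to a new, degraded but consistent state."""
    cell["press1_operational"] = False
    cell["blank_destination"] = "Press2"   # reroute the blank


def rotate_with_backward_recovery(
    rotate: Callable[[], None],
    move_to_initial_position: Callable[[], None],
    max_attempts: int = 2,
) -> bool:
    """Backward recovery: return to the prior state, then try again."""
    for _ in range(max_attempts):
        try:
            rotate()
            return True
        except RotationFailure:
            move_to_initial_position()     # undo the partial rotation
    return False                           # caller signals an exceptional outcome
```

Forward recovery changes the goal (the blank ends up in Press2), whereas backward recovery keeps the goal and repeats the attempt from a known-good position.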

Exceptional outcomes: if the handlers cannot eliminate the failure, the action ends in one of the following exceptional outcomes, each with one or more corresponding post-conditions.

Dealing with Concurrent Failures

Design of Device-Controllers
Device/sensor-controllers determine dynamically the order in which the CA actions are executed.
For the Production Cell II model, the controllers are: Feedbelt, Table, Robot, Press1, Press2, DepositBelt, Supplier, Consumer.
Two queue objects are defined: RobotQueue and depositbeltQueue.

Device-Controller Example: Press1
PressController:
  loop forever {
    robotQueue.put(PRESS1_FREE)
    LoadPress1.Press(Plate)
    ForgeBlank.Press(Plate)
    robotQueue.put(FORGED_PLATE_IN_PRESS1)
    UnloadPress1.Press(Plate)
  }
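The controller loop above can be mirrored in Python with a standard `queue.Queue`. This is a sketch under assumptions: the CA actions are replaced by plain callables that do not synchronize roles, the loop runs a fixed number of cycles instead of forever so that it terminates, and the mapping from queue messages to the robot's next action (`ROBOT_REACTION`, `robot_next_action`) is a guess at how the robot controller might consume the queue.

```python
import queue
from typing import Callable, Dict


def press1_controller(
    robot_queue: "queue.Queue[str]",
    actions: Dict[str, Callable[[], None]],
    cycles: int,
) -> None:
    """Press1 controller: announce state to the robot, then join each action."""
    for _ in range(cycles):
        robot_queue.put("PRESS1_FREE")
        actions["LoadPress1"]()
        actions["ForgeBlank"]()
        robot_queue.put("FORGED_PLATE_IN_PRESS1")
        actions["UnloadPress1"]()


# Hypothetical mapping used by the robot controller to choose its next action.
ROBOT_REACTION = {
    "PRESS1_FREE": "LoadPress1",
    "FORGED_PLATE_IN_PRESS1": "UnloadPress1",
}


def robot_next_action(robot_queue: "queue.Queue[str]") -> str:
    """Take the oldest message and return the action the robot should join."""
    return ROBOT_REACTION[robot_queue.get_nowait()]
```

In a real CA action framework, the press and robot roles would block until both have entered the same action; here the queue only shows how the order of actions is decided at run time rather than fixed in advance.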

Interactions between controllers and CA Actions

An Implementation
The Fault-Tolerant Production Cell Simulator

Features of the Production Cell Simulator
Outlines of 12 top-level CA actions are displayed on the simulator.
The outlines are gradually colored in during system execution to show CA action execution dynamically.
The dynamic process of exception handling is shown by color changes within the outline.
Up to two failures can be injected into the Production Cell simulator dynamically.
During testing, all injected device and sensor failures were caught and handled immediately by the control program.
Even a previously unknown software bug in the original simulator was detected by the acceptance test of a CA action and recovered from by the retry operation of the CA action.

An Implementation Contd…
Failure Injection Panel

Conclusion
This work represents the first complete formal analysis of the complex and realistic Production Cell II.
The possible component failures, and ways of identifying them, have been analysed.
The results of the analysis have been used to guide the design of the system, which employs a sophisticated exception handling scheme capable of dealing appropriately even with concurrent occurrences of any of the wide variety of failures defined in the FZI specification of Production Cell II.
A different design approach, focusing mainly on cooperation between devices during both normal execution and exception handling.
The design style was reached through very specific consideration of the problems raised by the Production Cell examples.