Fault Injection: A Method for Validating Fault-Tolerant Systems

Salahuddin Mohammad Masum
Department of Electrical & Computer Engineering
The University of Memphis
Email: smasum@memphis.edu

Outline
- Motivation
- What is Fault Injection?
- Fault, Error, and Failure …
- Objectives & Expected Results of the Project
- What has been done so far!
- Current Results …
- Future Direction
© Mohammed Yeasin, 2007

Error: Where It Begins
The Carnegie Mellon Software Engineering Institute [1] reports that at least 42 to 50 percent of software defects originate in the requirements phase. The Defense Acquisition University Program Manager Magazine [2] reports a Department of Defense study finding that over 50 percent of all software errors originate in the requirements phase.
[1] Carnegie Mellon Software Engineering Institute, “The Business Case for Requirements Engineering,” RE’ 2003, 12 September 2003.
[2] Defense Acquisition University, Program Manager Magazine, “Curing the Software Requirements and Cost Estimating Blues,” Nov.–Dec. 1999.

Error Detection/Correction
The cost of correcting software errors multiplies as a project moves through the SDLC: “The cost of correcting code in production increases up to 100 times as compared to in development...” Early error detection and correction are therefore vital. 75% of attacks today happen at the application layer (Gartner). “The cost and reputation savings of avoiding a security breach are ‘priceless’.”
[1] MSDN (November 2005), “Leveraging the Role of Testing and Quality Across the Lifecycle to Cut Costs and Drive IT/Business Responsiveness.”
[2] James B. Dabney and Gary Barber, “Direct Return on Investment of Software Independent Verification and Validation: Methodology and Initial Case Studies,” Assurance Technology Symposium, 5 June 2003.

What is Fault Injection?
Fault injection is the process of corrupting a data state during program execution. Fault-injection-based testing is the process of determining the effect of that corruption. Such testing may simply measure whether the corrupted state affected a particular output, or it may determine whether system attributes such as safety, security, or survivability have been affected. Fault injection is an effective technique for validating highly reliable systems.
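
A minimal Java sketch of this definition follows. The class and method names are hypothetical and do not come from the project. The sketch corrupts the data state of a running computation by flipping one bit of an intermediate variable. It then performs the simplest kind of fault-injection-based testing, comparing the output with a fault-free ("golden") run. A running maximum is used only because it can mask some faults, which gives the comparison a meaningful result either way.

```java
/**
 * Hypothetical sketch: inject a single bit flip into the running state of a
 * computation and check whether it changes the observable output.
 */
final class BitFlipInjection {

    /**
     * Returns the maximum of {@code data}. If {@code injectStep >= 0}, bit
     * {@code bit} of the running maximum is flipped right after that step.
     */
    static int max(int[] data, int injectStep, int bit) {
        int best = Integer.MIN_VALUE;
        for (int i = 0; i < data.length; i++) {
            if (data[i] > best) {
                best = data[i];
            }
            if (i == injectStep) {
                best ^= 1 << bit; // the injected fault: corrupt the data state
            }
        }
        return best;
    }

    /** Compares a faulty run against a fault-free ("golden") run. */
    static boolean faultAffectsOutput(int[] data, int injectStep, int bit) {
        int golden = max(data, -1, 0);
        int faulty = max(data, injectStep, bit);
        return golden != faulty;
    }
}
```

Take the input {3, 9, 4, 7}. Here `faultAffectsOutput(data, 0, 0)` is false: the corrupted 3 becomes 2, and the later 9 overwrites it. By contrast, `faultAffectsOutput(data, 1, 0)` is true: the stored 9 becomes 8 and survives to the output.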

Fault, Error, and Failure
When a fault causes an invalid change in machine state, an error occurs. The time between the fault's occurrence and the first manifestation of an error is called the fault latency. Although a fault remains localized in the affected code, multiple errors can originate from one fault site and propagate throughout the system. A propagating error may persist for some time before it is detected; this interval is called the error latency. When the fault-tolerance mechanisms detect an error, they may initiate several actions to handle the fault and contain its errors. If these actions succeed, the system recovers; otherwise, it eventually malfunctions and a failure occurs.
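
The timeline above can be captured in a small Java sketch. All names are hypothetical. The slide words error latency loosely, so the sketch assumes it means the interval from the first error to its detection. The slide only describes what happens after an error is detected, so an error that is never detected is reported as a separate outcome rather than being counted as a failure.

```java
/**
 * Hypothetical sketch of the fault -> error -> detection timeline.
 * Times are in arbitrary units (e.g., executed instructions);
 * a detection time of -1 means the error was never detected.
 */
final class FaultTimeline {

    enum Outcome { RECOVERED, FAILURE, UNDETECTED }

    private final long faultTime;          // fault occurs
    private final long firstErrorTime;     // first invalid machine state appears
    private final long detectionTime;      // error detected by fault tolerance, or -1
    private final boolean recoverySucceeded;

    FaultTimeline(long faultTime, long firstErrorTime, long detectionTime,
                  boolean recoverySucceeded) {
        if (firstErrorTime < faultTime) {
            throw new IllegalArgumentException("error cannot precede its fault");
        }
        if (detectionTime >= 0 && detectionTime < firstErrorTime) {
            throw new IllegalArgumentException("error cannot be detected before it appears");
        }
        this.faultTime = faultTime;
        this.firstErrorTime = firstErrorTime;
        this.detectionTime = detectionTime;
        this.recoverySucceeded = recoverySucceeded;
    }

    /** Fault latency: fault occurrence to first manifestation of an error. */
    long faultLatency() {
        return firstErrorTime - faultTime;
    }

    /** Error latency, assumed here to be first error to detection; -1 if undetected. */
    long errorLatency() {
        return detectionTime < 0 ? -1 : detectionTime - firstErrorTime;
    }

    /** Detected errors end in recovery or failure; undetected ones are reported as such. */
    Outcome outcome() {
        if (detectionTime < 0) {
            return Outcome.UNDETECTED; // may stay latent or later cause a failure
        }
        return recoverySucceeded ? Outcome.RECOVERED : Outcome.FAILURE;
    }
}
```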

Accomplishments So Far
The project is still in its infancy. The accomplishment so far is a software system testing method based on a simulated fault injection model, which periodically monitors software on an operational system to sample machine state or record memory references. The acquired trace is then used to simulate system behavior, with errors that mimic faults in the instrumented components inserted into the trace. Techniques are being developed to associate a measure of system load (at the time the trace was obtained) with the results, in order to distinguish extremes in fault behavior from the norm.
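
The trace-based idea can be sketched roughly in Java, under several assumptions. The slide does not specify the trace format or how system behavior is simulated. The sketch therefore assumes a trace of sampled integer values, treated as array indices, each tagged with a system-load reading. It also stands in a simple bounds check for "simulating system behavior". All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of trace-based (simulated) fault injection: a recorded
 * trace is corrupted offline and replayed against a simple validity check,
 * instead of injecting faults into the live system.
 */
final class TraceFaultInjection {

    /** One sampled record: when it was taken, the sampled value, and the system load then. */
    static final class Sample {
        final long step;
        final int value;
        final double load;

        Sample(long step, int value, double load) {
            this.step = step;
            this.value = value;
            this.load = load;
        }
    }

    /** Returns a copy of the trace with bit {@code bit} of sample {@code pos} flipped. */
    static List<Sample> injectBitFlip(List<Sample> trace, int pos, int bit) {
        List<Sample> corrupted = new ArrayList<>(trace);
        Sample s = trace.get(pos);
        corrupted.set(pos, new Sample(s.step, s.value ^ (1 << bit), s.load));
        return corrupted;
    }

    /**
     * Replays the trace, treating each value as an array index that must lie
     * in [0, bound). Returns the position of the first violation, or -1.
     */
    static int firstViolation(List<Sample> trace, int bound) {
        for (int i = 0; i < trace.size(); i++) {
            int v = trace.get(i).value;
            if (v < 0 || v >= bound) {
                return i;
            }
        }
        return -1;
    }
}
```

Each sample keeps its load reading, so any violation found on replay can be paired with the load at the moment the trace was taken. This is the kind of association the slide proposes for separating extreme fault behavior from the norm.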

Simulation
Two different versions of Quicksort were compared: Basic, which has no error or exception handling (not reliable), and Advanced, which has an exception-handling component (reliable). The input to the simulation is a very large, randomly generated unsorted array. Faults are injected into runtime variables, which are monitored at the same time.

Number of simulations: 250
Number of faults injected: 250 × 3

Description                          Basic   Advanced
% Total failure                      32%     19%
% Partial failure                    27%     25%
% Nothing happened (Wrong output)    34%     16%
% Nothing happened (Correct output)

Theoretically, the second implementation should be more reliable than the first because of its error- and exception-handling mechanisms.

Java was chosen because it has built-in support for multithreading and exception handling. The simulation is split into three threads: (1) the Quicksort implementation thread, (2) a monitoring thread, and (3) a fault injection thread. The Quicksort thread is deliberately not synchronized, so that the monitoring and fault injection threads can, respectively, read and write its variables while it executes. The fault injection thread runs in parallel with the implementation thread and injects faults into the program's key variables: pivot, low, and high, which hold index positions in the array being sorted. Because Quicksort is recursive, any change to these index values is carried into later stages and can become a hazard to the program's execution.

Total failure is the percentage of runs in which the program terminated abnormally at an early stage of execution, meaning the injected faults were fatal. Partial failure means the faults introduced minor errors that did not halt the program immediately. In the remaining runs the program executed successfully, but its output may be either correct or wrong.

The results show that the second (Advanced) implementation of Quicksort performs better than the first: it has lower rates of total failure, partial failure, and wrong output. The second version is therefore more dependable from a fault-tolerance standpoint.
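
A rough Java sketch of the Basic/Advanced comparison follows. It is not the project's code, and it rests on several assumptions. The slides do not say what the Advanced version's exception handling does, so here it is assumed to clamp out-of-range indices back into the array. Only the `high` index is corrupted. For reproducibility, the fault is injected at a chosen recursive call rather than by a concurrent, unsynchronized injector thread as in the original. Outcomes are reduced to crash, wrong output, and correct output, and partial failure is not modeled.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch contrasting a "Basic" and an "Advanced" Quicksort under
 * one injected fault, applied deterministically at a chosen recursive call.
 */
final class QuicksortFaultDemo {

    enum Outcome { CRASH, WRONG_OUTPUT, CORRECT_OUTPUT }

    private final boolean advanced;  // true: clamp indices into range before use
    private final int injectAtCall;  // 0-based recursive call to corrupt; -1 = no fault
    private final int bitMask;       // bits flipped in the `high` index
    private int calls = 0;

    private QuicksortFaultDemo(boolean advanced, int injectAtCall, int bitMask) {
        this.advanced = advanced;
        this.injectAtCall = injectAtCall;
        this.bitMask = bitMask;
    }

    private void sort(int[] a, int low, int high) {
        if (calls++ == injectAtCall) {
            high ^= bitMask;  // injected fault in a key index variable
        }
        if (advanced) {
            // Assumed error handling: force indices back into the array bounds.
            low = Math.max(low, 0);
            high = Math.min(high, a.length - 1);
        }
        if (low < high) {
            int p = partition(a, low, high);  // p is the pivot's final index
            sort(a, low, p - 1);
            sort(a, p + 1, high);
        }
    }

    /** Lomuto partition around a[high]. */
    private static int partition(int[] a, int low, int high) {
        int pivot = a[high];
        int i = low - 1;
        for (int j = low; j < high; j++) {
            if (a[j] <= pivot) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, high);
        return i + 1;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    /** Sorts a copy of {@code input} with the given fault and classifies the result. */
    static Outcome run(int[] input, boolean advanced, int injectAtCall, int bitMask) {
        int[] a = input.clone();
        int[] expected = input.clone();
        Arrays.sort(expected);
        try {
            new QuicksortFaultDemo(advanced, injectAtCall, bitMask).sort(a, 0, a.length - 1);
        } catch (RuntimeException e) {  // e.g. ArrayIndexOutOfBoundsException
            return Outcome.CRASH;
        }
        return Arrays.equals(a, expected) ? Outcome.CORRECT_OUTPUT : Outcome.WRONG_OUTPUT;
    }
}
```

Take the input {5, 2, 9, 1, 7, 3, 8, 6}. Flipping bit 3 of `high` on the first call (mask 8, so 7 becomes 15) crashes the Basic version, but the Advanced version absorbs the fault. Flipping bit 2 (mask 4, so 7 becomes 3) yields a wrong output in both versions, because an in-range corruption passes the bounds check. In miniature, this matches the pattern in the table: the Advanced version fails less often, but it still fails.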

Simulation (continued)
Faults were injected throughout the entire execution of Quicksort. An injected fault initially caused a minor error. If the minor error later propagated and was detected by the model, it became a detected error. A fatal error occurred when a detected error disrupted control flow. The program would then either complete with correct or incorrect results, or terminate through a time-out or a fatal error.
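
The four end states listed here are correct result, incorrect result, time-out, and fatal error. They can be told apart mechanically by running each trial in a worker thread with a deadline. The sketch below uses only standard `java.util.concurrent` classes. The class name, the `int[]` result type, and the time limit are assumptions for illustration. The intermediate "minor" and "detected" error stages are not modeled, since they would need instrumentation inside the program being tested.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: run one (possibly fault-injected) trial under a time
 * limit and map it to one of four end states.
 */
final class RunClassifier {

    enum Result { CORRECT, INCORRECT, TIMEOUT, FATAL }

    static Result classify(Callable<int[]> run, int[] expected, long timeoutMillis) {
        // Daemon worker thread, so a hung run cannot keep the JVM alive.
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "faulty-run");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<int[]> future = exec.submit(run);
            int[] out = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Arrays.equals(out, expected) ? Result.CORRECT : Result.INCORRECT;
        } catch (TimeoutException e) {
            return Result.TIMEOUT;
        } catch (ExecutionException e) {
            // The run threw, e.g. ArrayIndexOutOfBoundsException or StackOverflowError.
            return Result.FATAL;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the run", e);
        } finally {
            exec.shutdownNow();  // interrupts the worker if it is still running
        }
    }
}
```

To use it, wrap each trial as a `Callable` that sorts a fault-injected copy of the input and returns it, and pass the fault-free sorted array as `expected`. `shutdownNow()` can only stop a run that responds to interruption. A corrupted run stuck in a busy loop keeps its daemon thread alive until the JVM exits.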

Future Directions
Future investigations will focus on:
- reducing the large fault space associated with integrated systems;
- devising a novel model that homogeneously injects the effects of low-level faults at a higher rate than high-level faults;
- a method for analyzing system behavior after fault injection;
- how to perform software fault injection on “large-scale” integrated software systems.

Thank You!