Presented by: CAI Xia Ph.D Term2 Presentation April 28, 2004


An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering
Presented by: CAI Xia, Ph.D Term 2 Presentation, April 28, 2004

Outline
Background and motivations
Project description and experimental procedure
Preliminary experimental results on testing and fault tolerance
Evaluation of current reliability models
Conclusion and future work

Background
Fault removal and fault tolerance are two major approaches in software reliability engineering.
Software testing is the main fault removal technique: data flow coverage testing and mutation testing.
The main fault tolerance technique is software design diversity: recovery blocks, N-version programming, and N self-checking programming.
Data flow coverage testing measures test sets and test completeness by executing the test cases and recording how thoroughly the program code is exercised. Some studies show that high data flow coverage is associated with high software reliability; an observed correlation between good data flow testing and a low field fault rate supports the usefulness of data flow coverage testing. Problem: as most experimental investigations are "once-only" efforts, conclusive evidence about the effectiveness of coverage is still lacking.
Mutation testing, on the other hand, begins by creating many versions of a program, each "mutated" to introduce a single fault. These "mutant" programs are then run against test cases with the goal of causing each faulty version to fail; each time a test case causes a faulty version to fail, that mutant is considered "killed." Problem: hypothetical mutants are often either too trivial (too easily killed) or too unrealistic (too hard to activate).
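The "kill" loop described above can be sketched as follows. This is a toy illustration, not the project's actual test harness; the mutants, test inputs, and oracle here are hypothetical.

```python
def run_mutation_testing(mutants, test_cases, oracle):
    """Return the set of mutants killed by at least one test case.

    A mutant is "killed" when some test case makes it produce an
    output that differs from the expected (oracle) output.
    """
    killed = set()
    for name, mutant_fn in mutants.items():
        for t in test_cases:
            if mutant_fn(t) != oracle(t):
                killed.add(name)
                break  # one failing test case is enough to kill it
    return killed

# Toy example: the "correct" program doubles its input.
oracle = lambda x: 2 * x
mutants = {
    "m1_off_by_one": lambda x: 2 * x + 1,              # always wrong: easily killed
    "m2_boundary": lambda x: 2 * x if x < 100 else 0,  # wrong only for x >= 100
    "m3_equivalent": lambda x: x + x,                  # behaviorally identical: never killed
}
tests = [0, 5, 100]
print(run_mutation_testing(mutants, tests, oracle))
```

The mutation score is then |killed| / |mutants|; an equivalent mutant such as m3 can never be killed, which is one source of the "too unrealistic" problem noted above.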

Design Diversity
N-version Programming (NVP): employ different development teams to build different program versions from one single specification.
The goal is to achieve quality and reliability of software systems by detecting and tolerating software faults during operation.
The final output of NVP is determined by voting among the multiple versions.
Problem: the possibility of correlated faults in multiple versions.
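The voting step can be sketched as a simple majority vote. This is a minimal illustration (the version outputs are hypothetical), and it also shows why correlated faults are a problem: a wrong answer shared by a majority of versions wins the vote.

```python
from collections import Counter

def nvp_vote(outputs):
    """Majority-vote the outputs of N independently developed versions.

    Returns the agreed value, or None when no majority exists
    (all versions disagree, so the system cannot decide).
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Three hypothetical versions computing the same function.
assert nvp_vote([42, 42, 41]) == 42    # one faulty version is out-voted
assert nvp_vote([42, 41, 40]) is None  # no majority: no decision
assert nvp_vote([41, 41, 42]) == 41    # correlated fault wins the vote
```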

Background Conclusive evidence about the relationship between test coverage and software reliability is still lacking Mutants with hypothetical faults are either too easily killed, or too hard to be activated The effectiveness of design diversity heavily depends on the failure correlation among the multiple program versions, which remains a debatable research issue.

Motivation
The lack of real-world project data for investigation of software testing and fault tolerance techniques
The lack of comprehensive analysis and evaluation of software testing and fault tolerance together
The lack of evaluation and validation of current software reliability models for design diversity

Motivation
Conduct a real-world project to engage multiple teams for development of program versions
Perform detailed experimentation to study the nature, source, type, detectability and effect of faults uncovered in the versions
Apply mutation testing with real faults and investigate different hypotheses on software testing and fault tolerance schemes
Evaluate the current reliability models

Project descriptions
In the spring of 2002, 34 teams were formed to develop a critical industrial application in a 12-week project in a software engineering course.
Each team was composed of four senior-level undergraduate computer science majors from the Chinese University of Hong Kong.

Project descriptions
The RSDIMU project: Redundant Strapped-Down Inertial Measurement Unit, part of the navigation system in an aircraft or spacecraft.
In this application, developers are required to estimate the vehicle acceleration using the eight accelerometers mounted on the four triangular faces of a semi-octahedron in the vehicle.
As the system itself is fault tolerant, it can still calculate the acceleration when some of the accelerometers fail.
RSDIMU System Data Flow Diagram

Software development procedure
Initial design document (3 weeks)
Final design document (3 weeks)
Initial code (1.5 weeks)
Code passing unit test (2 weeks)
Code passing integration test (1 week)
Code passing acceptance test (1.5 weeks)

Program metrics

Id       Lines   Modules  Functions  Blocks  Decisions  C-Use   P-Use   Mutants
01       1628    9        70         1327    606        1012    1384    25
02       2361    11       37         1592    809        2022    1714    21
03       2331    8        51         1081    548        899     1070    17
04       1749    7        39         1183    647        646     1339    24
...
Average  2234.2  9.0      48.8       1700.1  766.8      1651.5  1753.4
Total mutants: 426

Mutant creation
Revision control was applied in the project and code changes were analyzed.
Faults found during each stage were also identified and injected into the final program of each version to create mutants.
Each mutant contains one design or programming fault.
426 mutants were created for 21 program versions.

Setup of evaluation test
The ATAC tool was employed to analyze and compare test coverage.
1200 test cases were exercised on the 426 mutants.
All resulting failures from each mutant were analyzed, their coverage measured, and cross-mutant failure results compared.
60 Sun machines running Solaris were involved in the test; one cycle took 30 hours, and a total of 1.6 million files (around 20 GB) were generated.

Static analysis: fault classification and distribution
Mutant defect type distribution
Mutant qualifier distribution
Mutant severity distribution
Fault distribution over development stage
Mutant effect code lines

Static Analysis result (1)

Defect Type Distribution
Defect type            Number  Percent
Assign/Init            136     31%
Function/Class/Object  144     33%
Algorithm/Method       81      19%
Checking               60      14%
Interface/OO Messages  5       1%

Qualifier Distribution
Qualifier   Number  Percent
Incorrect   267     63%
Missing     141     33%
Extraneous  18      4%

Static Analysis result (2)

Severity Distribution
Severity level      Highest severity  First failure severity
A Level (Critical)  12 (2.8%)         3 (0.7%)
B Level (High)      276 (64.8%)       317 (74.4%)
C Level (Low)       95 (22.3%)        99 (23.2%)
D Level (Zero)      43 (10.1%)        7 (1.6%)

Static Analysis result (3)

Fault Effect Code Lines
Lines        Number  Percent
1 line       116     27.23%
2-5 lines    130     30.52%
6-10 lines   61      14.32%
11-20 lines  43      10.09%
21-50 lines  53      12.44%
>51 lines    23      5.40%
Average: 11.39 lines

Development Stage Distribution
Stage             Number  Percentage
Init Code         237     55.6%
Unit Test         120     28.2%
Integration Test  31      7.3%
Acceptance Test   38      8.9%

Dynamic analysis of mutants
Software testing related: effectiveness of code coverage; test case contribution (test coverage vs. mutant coverage); finding a non-redundant set of test cases.
Software fault tolerance related: relationship between mutants; relationship between the programs with mutants.

Fault Detection Related to Changes of Test Coverage
To answer whether test coverage is an effective means for fault detection, we executed the 426 mutants over all test cases and observed whether additional coverage of the code was achieved when a mutant was killed by a new test case; mutants that failed on the first test case were eliminated.
Fraction of killed mutants whose detection coincided with additional coverage of any kind, per version: 1: 7/11 (63.6%); 2: 10/14 (71.4%); 3: 4/8 (50.0%); 4: 8/13 (61.5%); 5: 7/12 (58.3%); 7: 5/11 (45.5%); 8: 2/9 (22.2%); 12: 18/19 (94.7%); 15: 6/18 (33.3%); 18: 5/6 (83.3%); 20: 10/11 (90.9%); 22: 12/14 (85.7%); 26: 4/11 (36.4%); 27: 5/9 (55.6%); 29: 12/15 (80.0%); 31: 8/15 (53.3%); 32: 5/16 (31.3%).
Overall: Blocks 131/252; Decisions 145/252 (57.5%); C-Use 137/252; P-Use 152/252 (60.3%); Any 155/252 (61.5%).

Test Case Contribution on Program Coverage
The contribution of each test case to block coverage of the 426 mutants, measured across all executed mutants, is recorded and depicted in Figure 3. The vertical axis indicates the average percentage of block coverage achieved by each test case.

Percentage of Test Case Coverage

         Blocks  Decisions  C-Use   P-Use
Average  45.86%  29.63%     35.86%  25.61%
Maximum  52.25%  35.15%     41.65%  30.45%
Minimum  32.42%  18.90%     23.43%  16.77%

Test Case Contributions on Mutants
Average: 248 (58.22%)
Maximum: 334 (78.40%)
Minimum: 163 (38.26%)

Non-redundant Set of Test Cases
Gray: redundant test cases (502/1200)
Black: non-redundant test cases (698/1200)
Reduced set: 698/1200 (58.2% of the original suite)
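One standard way to find such a non-redundant set is a greedy reduction over the test-vs-mutant kill matrix. A minimal sketch, with a hypothetical kill_map; the slides do not specify which reduction algorithm the experiment used:

```python
def reduce_suite(kill_map):
    """Greedy reduction: repeatedly pick the test case that kills the
    most not-yet-killed mutants; the test cases never picked are
    redundant with respect to fault detection.

    kill_map: {test_id: set of mutant ids killed by that test}
    """
    uncovered = set().union(*kill_map.values())
    selected = []
    while uncovered:
        best = max(kill_map, key=lambda t: len(kill_map[t] & uncovered))
        if not kill_map[best] & uncovered:
            break
        selected.append(best)
        uncovered -= kill_map[best]
    return selected

# Hypothetical kill data for 4 test cases and 5 mutants.
kill_map = {
    "t1": {"m1", "m2", "m3"},
    "t2": {"m2", "m3"},  # subsumed by t1 -> redundant
    "t3": {"m4"},        # subsumed by t4 -> redundant
    "t4": {"m4", "m5"},
}
print(reduce_suite(kill_map))  # -> ['t1', 't4']
```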

Mutants Relationship

Relationship     Number of pairs  Percentage
Related mutants  1067             1.18%
Similar mutants  38               0.042%
Exact mutants    13               0.014%

From the fault tolerance perspective, we want to identify the relationship between mutants. Three categories are of interest:
Related mutants: two mutants have the same success/failure result on the 1200-bit binary string.
Similar mutants: two mutants have the same binary string and the same erroneous output variables.
Exact mutants: two mutants have the same binary string, the same erroneous output variables, and exactly the same erroneous output values.
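The three categories above form a hierarchy that can be checked mechanically. A sketch with hypothetical failure data; in the experiment the comparison was over the 1200-test binary strings and the RSDIMU output variables:

```python
def classify_pair(m_a, m_b):
    """Classify a pair of mutants by their behaviour on the test suite.

    Each mutant is described by (failure_vector, error_vars, error_vals):
      failure_vector: per-test pass/fail bits (the "1200-bit string")
      error_vars:     which output variables were wrong on failures
      error_vals:     the erroneous values themselves
    """
    if m_a[0] != m_b[0]:
        return "unrelated"
    if m_a[1] != m_b[1]:
        return "related"   # same pass/fail pattern only
    if m_a[2] != m_b[2]:
        return "similar"   # same pattern and same wrong variables
    return "exact"         # identical erroneous values too

# Hypothetical mutants over a 3-test suite with a "roll" output variable.
a = ([1, 0, 1], {"roll"}, {"roll": 7.0})
b = ([1, 0, 1], {"roll"}, {"roll": 7.0})
c = ([1, 0, 1], {"roll"}, {"roll": 9.9})
d = ([1, 0, 1], {"pitch"}, {"pitch": 1.0})
assert classify_pair(a, b) == "exact"
assert classify_pair(a, c) == "similar"
assert classify_pair(a, d) == "related"
```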

Observation
Coverage measures and mutation scores cannot be evaluated in isolation; an effective mechanism to distinguish related faults is critical.
A good test case is characterized not only by its ability to detect more faults, but also by its ability to detect faults that are not detected by other test cases in the same test set.

Observation
The individual fault detection capability of each test case does not determine the overall capability of the test set to cover more faults; the diversity of the test cases matters more.
Design diversity involving multiple program versions can be an effective solution for software reliability engineering, since the portion of program versions with exactly the same faults is very small.
Software fault removal and fault tolerance are complementary rather than competitive, yet the quantitative tradeoff between the two remains a research issue.

Evaluations on Current Reliability Models
Popov and Strigini's reliability bounds estimation model (PS model)
Dugan and Lyu's dependability model (DL model)
This fourth part is not included in my term paper, because we obtained these results only in the past two weeks.

PS Model
The PS model gives the upper and "likely" lower bounds on the probability of failure on demand for a 1-out-of-2 diverse system.
As it is hard to obtain complete knowledge of the whole demand space, the demand space is partitioned into independent subsets called subdomains. Given knowledge of the subdomains, the failure probability of the whole system can be estimated as a function of the subdomain to which a demand belongs.

PS Model

PS Model
The upper bound on the probability of system failure is a weighted sum of upper bounds within subdomains.
The "likely" lower bound is derived from the assumption of conditional independence within each subdomain.
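The bounds referred to above are commonly written as follows. The notation is assumed here rather than taken from the slides: \(p_j\) is the probability that a demand falls in subdomain \(S_j\), and \(q_{A,j}\), \(q_{B,j}\) are the probabilities of failure on demand of versions A and B within \(S_j\).

```latex
% Upper bound: the joint failure probability in each subdomain cannot
% exceed the pfd of the better version in that subdomain.
P_{sys} \;\le\; \sum_{j} p_j \,\min\!\left(q_{A,j},\, q_{B,j}\right)

% "Likely" lower bound: assume conditional independence of the two
% versions' failures within each subdomain.
P_{sys} \;\approx\; \sum_{j} p_j \, q_{A,j} \, q_{B,j}
```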

PS Model: mutants that passed the acceptance test

PS Model Demand profile

PS Model: joint pfds
Pfd: probability of failure on demand.
The behavior of three pairs of mutants shows three different features of fault coincidence in design diversity.
1) For pair (117, 215), the two versions failed differently on the seven subdomains.
2) For the second pair of mutants (215, 382), the situation is normal and reasonable, as reported elsewhere in the literature: the covariance between the mutants is a positive number, 8.67×10^-7, indicating that the two mutants have related faults and may fail in the same subdomains.
3) The third pair (117, 382), however, shows the possibility of negative covariance under Demand Profiles 2 and 3.

PS Model
A pair marked with a star under "Positive correlation" has a covariance that varies from positive to negative depending on the demand profile: the correlation is most likely positive under DP1 but negative under the other three demand profiles, so these three pairs also appear under "Negative correlation."
More than half of the pairs (8 out of 15) are only weakly correlated; the reason is that each mutant contains only one programming fault, so the probability of correlation may be lower than for real program versions.

DL Model
Dugan and Lyu's dependability model: a Markov model details the system structure, and two fault trees represent the causes of unacceptable results in the initial configuration and in the reconfigured degraded state.
Three parameters can be estimated: the probability of an unrelated fault in a version, the probability of a related fault between two versions, and the probability of a related fault in all versions.

DL Model
The three configurations considered are recovery blocks, N-version programming, and N self-checking programming.

DL Model
Our experiment shows that the failure probability predicted by the fault tree model is very close to the observed values in most situations.
The gap between the two in the 4-version model, however, should be further investigated to validate the effectiveness and accuracy of the fault tree model.

Conclusion
Our goal is to investigate software testing, fault correlation, and reliability modeling for design diversity.
We performed an empirical investigation evaluating fault removal and fault tolerance as software reliability engineering techniques.
Mutation testing was applied with real faults.

Conclusion
Static and dynamic analyses were performed to evaluate the relationship between fault removal and fault tolerance techniques.
Different reliability models were applied to our project data to evaluate their validity and prediction accuracy.

Future Work
For further evaluation and investigation, more test cases should be generated and executed on the mutants and versions.
Comparison with existing project data will be made to observe the "variants" as well as the "invariants" of design diversity.

Q & A Thank you!