Coverage-Based Testing Strategies and Reliability Modeling for Fault-Tolerant Software Systems Presented by: CAI Xia Supervisor: Prof. Michael R. Lyu.


Coverage-Based Testing Strategies and Reliability Modeling for Fault-Tolerant Software Systems Presented by: CAI Xia Supervisor: Prof. Michael R. Lyu August 24, 2006 Ph.D Thesis Defense

2 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

3 Background  Four technical methods to achieve reliable software systems
 Fault avoidance: structural programming, formal methods, software reuse
 Fault removal: software testing, formal inspection
 Fault tolerance: checkpointing and recovery, exception handling, data diversity, design diversity
 Fault forecast: software reliability modeling

4 Fault-tolerant software
 Single-version technique
  Checkpointing and recovery
  Exception handling
 Multi-version technique (design diversity)
  Recovery block (RB)
  N-version programming (NVP)
  N self-checking programming (NSCP)
[Figure: NVP model]
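
The NVP idea on this slide can be sketched as a majority voter over independently developed versions of the same function. A minimal illustration; the three version functions and the seeded fault are purely hypothetical:

```python
from collections import Counter

def nvp_vote(outputs):
    """Majority voter for N-version programming: return the value produced
    by a strict majority of versions, or None if no majority exists."""
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes > len(outputs) / 2 else None

# Three hypothetical independently developed versions of one computation.
def version_a(x): return x * x
def version_b(x): return x ** 2
def version_c(x): return x * x + (1 if x == 3 else 0)  # seeded fault at x == 3

def fault_tolerant_square(x):
    # The faulty version is outvoted by the two correct ones.
    return nvp_vote([version_a(x), version_b(x), version_c(x)])
```

With three versions, a single faulty version is masked by the vote; when all three disagree, the voter reports no majority rather than guessing.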

5 Design diversity
 Requirements
  Same specification
  Multiple versions developed independently by separate teams
  No communication allowed between teams
 Expectation
  Programs built differently should fail differently
 Challenges
  High development cost
  Correlated faults?

6 Experiments and evaluations  Empirical and theoretical investigations have been conducted based on experiments, modeling, and evaluations
  Knight and Leveson (1986), Kelly et al (1988), Eckhardt et al (1991), Lyu and He (1993)
  Eckhardt and Lee (1985), Littlewood and Miller (1989), Popov et al (2003)
  Belli and Jedrzejowicz (1990), Littlewood et al (2001), Teng and Pham (2002)
 No conclusive estimation can be made because of the size, population, complexity, and comparability of these experiments

7 Software testing strategies
 Key issue: test case selection and evaluation
 Classifications
  Functional testing (black-box testing): specification-based testing
  Structural testing (white-box testing): branch testing, data-flow coverage testing
  Mutation testing
  Random testing
 Comparison of different testing strategies:
  Simulations
  Formal analysis
Subdomain-based testing
Code coverage: a measurement of testing completeness?

8 Code coverage
 Definition: the fraction of program code that is executed at least once during testing
 Classification
  Block coverage: the portion of basic blocks executed
  Decision coverage: the portion of decisions executed
  C-Use coverage: the portion of computational uses of a variable covered
  P-Use coverage: the portion of predicate uses of a variable covered
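
As a toy illustration of how such a fraction is measured (statement level only; a tool such as ATAC tracks basic blocks, decisions, c-uses, and p-uses, not just lines):

```python
import sys

def measure_line_coverage(func, test_inputs):
    """Record which line numbers of `func` execute across a test set.
    A rough stand-in for real instrumentation-based coverage tools."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            executed.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        for x in test_inputs:
            func(x)
    finally:
        sys.settrace(None)
    return executed

def classify(x):          # toy program under test, with two branches
    if x > 0:
        return "positive"
    return "non-positive"

covered_one = measure_line_coverage(classify, [5])        # exercises one branch
covered_both = measure_line_coverage(classify, [5, -5])   # exercises both branches
```

Adding the test input that takes the second branch strictly enlarges the set of executed lines, which is exactly the "coverage increase" the later slides correlate with fault detection.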

9 Code coverage: an indicator of testing effectiveness?
 Positive evidence
  High code coverage brings high software reliability and a low fault rate
  Both code coverage and faults detected in programs grow over time as testing progresses
 Negative evidence
  Can the observed correlation be attributed to a causal dependency between code coverage and defect coverage?
 Controversial, not conclusive

10 Software reliability growth modeling (SRGM)  To model past failure data to predict future behavior

11 SRGM: some examples
 Nonhomogeneous Poisson Process (NHPP) model
 S-shaped reliability growth model
 Musa-Okumoto logarithmic Poisson model
μ(t) is the mean value function: the expected cumulative number of failures by time t
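
The three mean value functions named here have standard textbook forms, sketched below (a is the expected total failure count, b a detection-rate parameter, l0 the initial failure intensity, and theta the intensity decay parameter):

```python
import math

def mu_nhpp(t, a, b):
    """Goel-Okumoto NHPP: expected cumulative failures by time t."""
    return a * (1 - math.exp(-b * t))

def mu_s_shaped(t, a, b):
    """Delayed S-shaped model: slow early growth, then acceleration."""
    return a * (1 - (1 + b * t) * math.exp(-b * t))

def mu_musa_okumoto(t, l0, theta):
    """Musa-Okumoto logarithmic Poisson: failures grow without a finite bound."""
    return (1 / theta) * math.log(l0 * theta * t + 1)
```

All three start at zero failures, grow monotonically, and the NHPP and S-shaped curves saturate at the total fault content a, while the logarithmic model keeps growing slowly forever.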

12 Reliability models for design diversity
 Eckhardt and Lee (1985)
  Variation of difficulty on demand space
  Positive correlations between version failures
 Littlewood and Miller (1989)
  Forced design diversity
  Possibility of negative correlations
 Dugan and Lyu (1995)
  Markov reward model
 Tomek and Trivedi (1995)
  Stochastic reward net
 Popov, Strigini et al (2003)
  Subdomains on demand space
  Upper bounds and “likely” lower bounds for reliability

13 Our contributions  For Fault Tolerance:  Assess the effectiveness of design diversity  For Fault Removal:  Establish the relationship between fault coverage and code coverage under various testing strategies  For Fault Forecast:  Propose a new reliability model which incorporates code coverage and testing time together

14 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

15 Motivation
 Fault-tolerant software
  A necessity
  Yet controversial
 Lack of
  Conclusive assessment
  Credible reliability model
  Effective testing strategy
  Real-world project data on testing and fault tolerance techniques together

16 Research procedure and methodology  A comprehensive and systematic approach  Modeling  Experimentation  Evaluation  Economics  Modeling  Formulate the relationship between testing and reliability achievement  Propose our own reliability models with the key attributes

17 Research procedure and methodology  Experimentation  Obtain new real-world fault-tolerant empirical data with coverage testing and mutation testing  Evaluation  Collect statistical data for the effectiveness of design diversity  Evaluate existing reliability models for design diversity;  Investigate the effect of code coverage;  Economics  Perform a tradeoff study on testing and fault tolerance

18 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

19 Project features  Complicated and real-world application  Large population of program versions  Controlled development process  Mutation testing with real faults injection  Well-defined acceptance test set

20 Experimental setup
 Time: spring of 2002
 Population: 34 teams of four members each
 Application: a critical avionics application
 Duration: a 12-week project
 Developers: senior-level undergraduate computer science majors
 Place: CUHK

21 Experimental project description
Redundant Strapped-Down Inertial Measurement Unit (RSDIMU)
[Figures: RSDIMU geometry; data flow diagram]

22 Software development procedure 1. Initial design document (3 weeks) 2. Final design document (3 weeks) 3. Initial code (1.5 weeks) 4. Code passing unit test (2 weeks) 5. Code passing integration test (1 week) 6. Code passing acceptance test (1.5 weeks)

23 Mutant creation  Revision control applied and code changes analyzed  Mutants created by injecting real faults identified during each development stage  Each mutant containing one design or programming fault  426 mutants created for 21 program versions

24 Program metrics
[Table: per-version metrics — Id, Lines, Modules, Functions, Blocks, Decisions, C-Use, P-Use, Mutants — with an average row; numeric values lost in extraction]
Total mutants: 426

25 Setup of evaluation test
 ATAC tool employed to analyze and compare testing coverage
 1200 test cases exercised as the acceptance test
 All failures analyzed, code coverage measured, and cross-mutant failure results compared
 60 Sun machines running Solaris; one test cycle took 30 hours and generated a total of 1.6 million files (around 20 GB)
 1M test cases in the operational test

26 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

27 Static analysis result (1)
Fault type distribution:
 Assign/Init: 136 (31%)
 Function/Class/Object: 144 (33%)
 Algorithm/Method: 81 (19%)
 Checking: 60 (14%)
 Interface/OO Messages: 5 (1%)
Qualifier distribution:
 Incorrect: 267 (63%)
 Missing: 141 (33%)
 Extraneous: 18 (4%)

28 Static analysis result (2)
Development stage distribution:
 Init Code: [count/percentage not recoverable]
 Unit Test: [count/percentage not recoverable]
 Integration Test: 31 (7.3%)
 Acceptance Test: 38 (8.9%)
Fault effect code lines:
 1 line: [not recoverable]
 2-5 lines: [not recoverable]
 6-10 lines: [not recoverable]
 [two further ranges: not recoverable]
 >51 lines: 23 (5.40%)
 Average: 11.39

29 Mutant relationships
 Related mutants: identical 1200-bit success/failure binary strings over the test set
 Similar mutants: the same binary string and the same erroneous output variables
 Exact mutants: the same binary string and the same values of the erroneous output variables
Total pairs compared: 90525
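
These three nested relations can be checked mechanically. A sketch using short bit strings in place of the 1200-bit signatures; the output-variable and output-value arguments are hypothetical stand-ins for the erroneous-output comparison:

```python
def classify_pair(sig_a, sig_b, vars_a=None, vars_b=None,
                  vals_a=None, vals_b=None):
    """Return the finest matching category for a mutant pair:
    'unrelated' -> different success/failure signatures
    'related'   -> same signature only
    'similar'   -> same signature and same erroneous output variables
    'exact'     -> same signature, variables, and erroneous values"""
    if sig_a != sig_b:
        return "unrelated"
    if vars_a != vars_b:
        return "related"
    if vals_a != vals_b:
        return "similar"
    return "exact"
```

Because each category strictly refines the previous one, every "exact" pair is also "similar" and "related"; the classifier simply reports the deepest level at which the two mutants still agree.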

30 Cross project comparison

31 Cross project comparison  NASA 4-university project: 7 out of 20 versions passed the operational testing  Coincident failures were found among 2 to 8 versions  5 of the 7 related faults were not observed in our project

32 Major contributions or findings on fault tolerance
 Real-world mutation data for design diversity
  A major empirical study in this field with substantial coverage and fault data
 Supportive evidence for design diversity
  Remarkable reliability improvement (10^2 to 10^4)
  Low probability of fault correlation

33 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

34 Research questions  Is code coverage a positive indicator for fault detection capability?  Does such effect vary under different testing strategies and profiles?  Does any such effect vary with different code coverage metrics?

35 Fault detection related to changes of test coverage
[Table: for each of the 21 program versions, the fraction of its faults whose detection coincided with an increase in Blocks, Decisions, C-Use, P-Use, or Any coverage; per-version rows garbled in extraction. 217 faults in total.]
Overall: Blocks 131/217 (60.4%), Decisions 145/217 (66.8%), C-Use 137/217 (63.1%), P-Use 152/217 (70%), Any 155/217 (71.4%)
Coverage increase => more faults detected!

36 Cumulated defect/block coverage

37 Cumulated defect coverage versus block coverage (R^2 = 0.945)

38 Test cases description
[Figure: the test cases partitioned into six regions, I through VI]

39 Block coverage vs. fault coverage
 Test case contribution on block coverage
 Test case contribution on fault coverage
[Figures: per-test-case contributions across regions I through VI]

40 Correlation between block coverage and fault coverage  Linear modeling fitness in various test case regions  Linear regression relationship between block coverage and defect coverage in the whole test set
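
The linear-regression fit reported on these slides can be reproduced in miniature with ordinary least squares computed from scratch; the (block coverage, cumulative defects) pairs below are hypothetical, not the project's data:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x, plus the R^2 goodness of fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope
    a = my - b * mx                     # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical measurements: block coverage achieved vs. defects found so far.
coverage = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
defects  = [11,  24,  29,  41,  47,  52]
a, b, r2 = linear_fit(coverage, defects)
```

An R^2 close to 1, as with the 0.945 reported for the real data, means almost all variation in cumulative defect count is explained by the straight-line relationship with coverage.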

41 The correlation at various test regions  Linear regression relationship between block coverage and defect coverage in Region VI  Linear regression relationship between block coverage and defect coverage in Region IV

42 Under various testing strategies
 Functional test
 Random test
 Normal test: the system operates according to the specification
 Exceptional test: the system is under severe stress conditions

43 With different coverage metrics  The correlations under decision, C-use and P-use coverage are similar to that of block coverage

44 Answers to the research questions  Is code coverage a positive indicator for fault detection capability?  Yes.  Does such effect vary under different testing strategies and profiles?  Yes. The effect is highest with exceptional test cases, while lowest with normal test cases.  Does any such effect vary with different code coverage metrics?  Not obvious with our experimental data.

45 Major contributions or findings on software testing
 High correlation between fault coverage and code coverage in exceptional test cases
  Gives guidelines for the design of exceptional test cases
 This is the first time that such a correlation has been investigated under various testing strategies

46 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

47 Work on reliability modeling
 Evaluate current probabilistic reliability models for design diversity with our experimental data
 Propose a new reliability model which incorporates test coverage measurement into traditional software reliability growth models

48 Results of PS Model with our project data Popov, Strigini et al (2003)

49 Results of PS Model with our project data

50 Results of DL model with our project data  Dugan and Lyu (1995)  Predicted reliability by different configurations  The result is consistent with previous study

51 Introducing coverage into software reliability modeling  Most traditional software reliability models are based on time domain  However, time may not be the only factor that affects the failure behavior of software  Test completeness may be another indicator for software reliability

52 A new reliability model  Assumptions: 1. The number of failures revealed in testing is related not only to the execution time, but also to the code coverage achieved; 2. The failure rate with respect to time and test coverage together is a parameterized summation of the failure rates with respect to time or coverage alone; 3. The probabilities of failure with respect to time and coverage are not independent; they affect each other at an exponential rate.

53 Model form
 λ(t,c): joint failure intensity function
 λ1(t): failure intensity function with respect to time
 λ2(c): failure intensity function with respect to coverage
 α1, γ1, α2, γ2: dependency-factor parameters with the constraint α1 + α2 = 1
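
The equation image for λ(t,c) did not survive extraction, so the exact published form is not recoverable from these slides. Purely as an illustration, one form consistent with the stated assumptions (a parameterized summation with α1 + α2 = 1, mutual dependency at an exponential rate) would be λ(t,c) = α1·e^(γ1·c)·λ1(t) + α2·e^(γ2·t)·λ2(c); the sketch below implements that assumed form, not necessarily the thesis's:

```python
import math

def joint_intensity(t, c, lam_t, lam_c, a1, g1, g2):
    """Assumed joint failure intensity: a convex combination of the
    time-only intensity lam_t(t) and the coverage-only intensity lam_c(c),
    each modulated by an exponential dependency factor."""
    a2 = 1 - a1                          # constraint: a1 + a2 = 1
    return a1 * math.exp(g1 * c) * lam_t(t) + a2 * math.exp(g2 * t) * lam_c(c)
```

With the dependency factors γ1 = γ2 = 0, the model degenerates to a plain weighted sum of the two marginal intensities, which matches assumption 2 alone.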

54 Estimation methods
 Method A:
  Select a model for λ1(t) and λ2(c)
  Estimate the parameters in λ1(t) and λ2(c) independently
  Optimize the other four parameters afterwards
 Method B:
  Select a model for λ1(t) and λ2(c)
  Optimize all parameters together
 Least-squares estimation (LSE) employed
Candidate models for λ1(t) and λ2(c): NHPP, S-shaped, logarithmic, Weibull, …
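
Least-squares estimation in the spirit of Method B (all parameters optimized together) can be illustrated with a deliberately simple example: fitting the NHPP mean value function to cumulative failure data by coarse grid search. The grid-search optimizer and the synthetic data are stand-ins for the real estimation machinery:

```python
import math

def nhpp_mu(t, a, b):
    """NHPP (Goel-Okumoto) mean value function."""
    return a * (1 - math.exp(-b * t))

def fit_nhpp_lse(times, failures, a_grid, b_grid):
    """Pick (a, b) minimizing the sum of squared errors against the
    observed cumulative failure counts."""
    best, best_sse = None, float("inf")
    for a in a_grid:
        for b in b_grid:
            sse = sum((f - nhpp_mu(t, a, b)) ** 2
                      for t, f in zip(times, failures))
            if sse < best_sse:
                best, best_sse = (a, b), sse
    return best, best_sse

# Noise-free synthetic data generated from a = 100, b = 0.05.
times = [5, 10, 20, 40, 80]
failures = [nhpp_mu(t, 100, 0.05) for t in times]
(a_hat, b_hat), sse = fit_nhpp_lse(
    times, failures,
    a_grid=[80, 90, 100, 110],
    b_grid=[0.01, 0.03, 0.05, 0.07])
```

On noise-free data the search recovers the generating parameters exactly; a real study would use a proper nonlinear least-squares optimizer rather than a grid.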

55 λ(c): modeling defect coverage and code coverage
 A hyper-exponential model
  Fc: cumulated number of failures when coverage c is achieved
  K: number of classes of testing strategies
  Ni: the expected number of faults detected eventually in each class
 A Beta model
  N1: the expected number of faults detected eventually
  N2: the ultimate test coverage
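
A minimal sketch of the hyper-exponential family described above. The slides give only Fc, K, and Ni; the per-class rate parameter beta_i used here is an assumed name, and the specific form Fc = sum over i of Ni·(1 − e^(−beta_i·c)) is the standard hyper-exponential shape, not necessarily the thesis's exact equation:

```python
import math

def hyperexp_defects(c, classes):
    """Expected cumulative failures at coverage c, summed over K classes
    of testing strategies; `classes` is a list of (N_i, beta_i) pairs,
    where beta_i (an assumed parameter name) is the class's detection rate."""
    return sum(n_i * (1 - math.exp(-beta_i * c)) for n_i, beta_i in classes)
```

The curve starts at zero defects at zero coverage and saturates at the sum of the Ni, the total fault content across all classes.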

56 λ(c): Experimental evaluation

57 λ(c): Parameter estimation results
 Hyper-exponential model
 Beta model
SSE = 38365

58 Parameter estimation (1)
 λ1(t), λ2(c): exponential (NHPP)
 NHPP model: the original SRGM

59 Prediction accuracy (1)

60 Parameter estimation (2)
 λ1(t): NHPP
 λ2(c): Beta model

61 Estimation accuracy (2)

62 Major contributions or findings on software reliability modeling  The first reliability model which combines the effect of testing time and code coverage together  The new reliability model outperforms traditional NHPP model in terms of estimation accuracy

63 Outline  Background and related work  Research methodology  Experimental setup  Evaluations on design diversity  Coverage-based testing strategies  Reliability modeling  Conclusion and future work

64 Conclusion  Propose a new software reliability model  Incorporate code coverage into traditional software reliability growth models  Achieve better accuracy than the traditional NHPP model The first reliability model combining the effect of testing time and code coverage together

65 Conclusion  Assess multi-version fault-tolerant software with supportive evidence by a large-scale experiment  High reliability improvement  Low fault correlation  Stable performance A major empirical study in this field with substantial fault and coverage data

66 Conclusion  Evaluate the effectiveness of coverage-based testing strategies:  Code coverage is a reasonably positive indicator for fault detection capability  The effect is remarkable under the exceptional testing profile The first evaluation looking into different categories of testing strategies

67 Future work  Further evaluate the current reliability model through comparisons with existing reliability models other than NHPP  Consider other formulations of the relationship between fault coverage and test coverage  Further study the economic tradeoff between software testing and fault tolerance

68 Publication list
 Journal papers and book chapters
 Xia Cai, Michael R. Lyu and Kam-Fai Wong, A Generic Environment for COTS Testing and Quality Prediction, Testing Commercial-off-the-shelf Components and Systems, Sami Beydeda and Volker Gruhn (eds.), Springer-Verlag, Berlin, 2005, pp.
 Michael R. Lyu and Xia Cai, Fault-tolerant Software, to appear in Encyclopedia on Computer Science and Engineering, Benjamin Wah (ed.), Wiley.
 Xia Cai, Michael R. Lyu, An Experimental Evaluation of the Effect of Code Coverage on Fault Detection, submitted to IEEE Transactions on Software Engineering, June.
 Xia Cai, Michael R. Lyu, Mladen A. Vouk, Reliability Features for Design Diversity: Cross-Project Evaluations and Comparisons, in preparation.
 Xia Cai, Michael R. Lyu, Predicting Software Reliability with Test Coverage, in preparation.

69 Publication list
 Conference papers
 Michael R. Lyu, Zubin Huang, Sam K. S. Sze and Xia Cai, “An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering,” Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE'2003), Denver, Colorado, Nov. 2003, pp. This paper received the ISSRE'2003 Best Paper Award.
 Xia Cai and Michael R. Lyu, “An Empirical Study on Reliability and Fault Correlation Models for Diverse Software Systems,” ISSRE’2004, Saint-Malo, France, Nov. 2004, pp.
 Xia Cai and Michael R. Lyu, “The Effect of Code Coverage on Fault Detection under Different Testing Profiles,” ICSE 2005 Workshop on Advances in Model-Based Software Testing (A-MOST), St. Louis, Missouri, May.
 Xia Cai, Michael R. Lyu and Mladen A. Vouk, “An Experimental Evaluation on Reliability Features of N-Version Programming,” ISSRE’2005, Chicago, Illinois, Nov. 8-11, 2005, pp.
 Xia Cai and Michael R. Lyu, “Predicting Software Reliability with Testing Coverage Information,” in preparation for the International Conference on Software Engineering (ICSE’2007).

Q & A Thanks!

71 Previous work on modeling reliability with coverage information
 Vouk (1992)
  Rayleigh model
 Malaiya et al. (2002)
  Logarithmic-exponential model
 Chen et al. (2001)
  Using code coverage as a factor to reduce the execution time in reliability models

72 Comparisons with previous estimations

73  The number of mutants failing in different testing

74 Non-redundant set of test cases

75 Test set reduction with normal testing

76 Test set reduction with exceptional testing