1 The Effect of Code Coverage on Fault Detection Capability: An Experimental Evaluation and Possible Directions Teresa Xia Cai Group Meeting Feb. 21, 2006.

Slides:



Advertisements
Similar presentations
Defect testing Objectives
Advertisements

Lecture 8: Testing, Verification and Validation
Design of Experiments Lecture I
Testing and Quality Assurance
SOFTWARE TESTING. INTRODUCTION  Software Testing is the process of executing a program or system with the intent of finding errors.  It involves any.
Ossi Taipale, Lappeenranta University of Technology
An Empirical Study on Reliability Modeling for Diverse Software Systems Xia Cai and Michael R. Lyu Dept. of Computer Science & Engineering The Chinese.
Prachi Saraph, Mark Last, and Abraham Kandel. Introduction Black-Box Testing Apply an Input Observe the corresponding output Compare Observed output with.
Software testing.
Coverage-Based Testing Strategies and Reliability Modeling for Fault- Tolerant Software Systems Presented by: CAI Xia Supervisor: Prof. Michael R. Lyu.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 23 Slide 1 Software testing.
Software Reliability Engineering: A Roadmap
Software Testing and Quality Assurance
1 Testing Effectiveness and Reliability Modeling for Diverse Software Systems CAI Xia Ph.D Term 4 April 28, 2005.
An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering Michael R. Lyu, Zubin Huang, Sam Sze, Xia Cai The Chinese University.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 23 Slide 1 Software testing.
An Experimental Evaluation on Reliability Features of N-Version Programming Xia Cai, Michael R. Lyu and Mladen A. Vouk ISSRE’2005.
CS 425/625 Software Engineering Software Testing
1 Software Testing and Quality Assurance Lecture 30 – Testing Systems.
Testing an individual module
Software Testing. “Software and Cathedrals are much the same: First we build them, then we pray!!!” -Sam Redwine, Jr.
03/09/2007 Earthquake of the Week
1 Software Testing and Quality Assurance Lecture 5 - Software Testing Techniques.
Chapter 7 Correlational Research Gay, Mills, and Airasian
1 Software Testing Techniques CIS 375 Bruce R. Maxim UM-Dearborn.
State coverage: an empirical analysis based on a user study Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens.
1 Prediction of Software Reliability Using Neural Network and Fuzzy Logic Professor David Rine Seminar Notes.
Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.
University of Toronto Department of Computer Science © 2001, Steve Easterbrook CSC444 Lec22 1 Lecture 22: Software Measurement Basics of software measurement.
Quantify prediction uncertainty (Book, p ) Prediction standard deviations (Book, p. 180): A measure of prediction uncertainty Calculated by translating.
Software Testing Verification and validation planning Software inspections Software Inspection vs. Testing Automated static analysis Cleanroom software.
Testing. Definition From the dictionary- the means by which the presence, quality, or genuineness of anything is determined; a means of trial. For software.
TESTING.
Objectives Understand the basic concepts and definitions relating to testing, like error, fault, failure, test case, test suite, test harness. Explore.
CPIS 357 Software Quality & Testing
Validation Metrics. Metrics are Needed to Answer the Following Questions How much time is required to find bugs, fix them, and verify that they are fixed?
CMSC 345 Fall 2000 Unit Testing. The testing process.
CS4311 Spring 2011 Unit Testing Dr. Guoqiang Hu Department of Computer Science UTEP.
VTT-STUK assessment method for safety evaluation of safety-critical computer based systems - application in BE-SECBS project.
Chapter 6 : Software Metrics
Overview of Software Testing 07/12/2013 WISTPC 2013 Peter Clarke.
Software Testing Course Shmuel Ur
1 Software testing. 2 Testing Objectives Testing is a process of executing a program with the intent of finding an error. A good test case is in that.
Software Testing Testing types Testing strategy Testing principles.
Dr. Tom WayCSC Testing and Test-Driven Development CSC 4700 Software Engineering Based on Sommerville slides.
Software Development Cycle What is Software? Instructions (computer programs) that when executed provide desired function and performance Data structures.
Software Testing Yonsei University 2 nd Semester, 2014 Woo-Cheol Kim.
Software Testing Reference: Software Engineering, Ian Sommerville, 6 th edition, Chapter 20.
1 Program Testing (Lecture 14) Prof. R. Mall Dept. of CSE, IIT, Kharagpur.
CSC 480 Software Engineering Lecture 15 Oct 21, 2002.
Chapter 10 Verification and Validation of Simulation Models
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 20 Slide 1 Defect testing l Testing programs to establish the presence of system defects.
Chair of Software Engineering Exercise Session 6: V & V Software Engineering Prof. Dr. Bertrand Meyer March–June 2007.
A Biased Fault Attack on the Time Redundancy Countermeasure for AES Sikhar Patranabis, Abhishek Chakraborty, Phuong Ha Nguyen and Debdeep Mukhopadhyay.
Week 14 Introduction to Computer Science and Object-Oriented Programming COMP 111 George Basham.
Software Engineering 2004 Jyrki Nummenmaa 1 BACKGROUND There is no way to generally test programs exhaustively (that is, going through all execution.
Comparing model-based and dynamic event-extraction based GUI testing techniques : An empirical study Gigon Bae, Gregg Rothermel, Doo-Hwan Bae The Journal.
Experimentation in Computer Science (Part 2). Experimentation in Software Engineering --- Outline  Empirical Strategies  Measurement  Experiment Process.
Software Engineering1  Verification: The software should conform to its specification  Validation: The software should do what the user really requires.
Winter 2011SEG Chapter 11 Chapter 1 (Part 1) Review from previous courses Subject 1: The Software Development Process.
CS451 Lecture 10: Software Testing Yugi Lee STB #555 (816)
Computer Science 1 Systematic Structural Testing of Firewall Policies JeeHyun Hwang 1, Tao Xie 1, Fei Chen 2, and Alex Liu 2 North Carolina State University.
Structuring Redundancy for Fault Tolerance Chapter 2 Designed by: Hadi Salimi Instructor: Dr. Mohsen Sharifi.
Verification vs. Validation Verification: "Are we building the product right?" The software should conform to its specification.The software should conform.
Software Testing. SE, Testing, Hans van Vliet, © Nasty question  Suppose you are being asked to lead the team to test the software that controls.
Defect testing Testing programs to establish the presence of system defects.
SOFTWARE TESTING AND QUALITY ASSURANCE. Software Testing.
Testing Tutorial 7.
Predict Failures with Developer Networks and Social Network Analysis
Presented by: CAI Xia Ph.D Term2 Presentation April 28, 2004
Presentation transcript:

1 The Effect of Code Coverage on Fault Detection Capability: An Experimental Evaluation and Possible Directions Teresa Xia Cai Group Meeting Feb. 21, 2006

2 Outline Testing coverage and testing strategies Research questions Experimental setup Results and analysis Discussions and conclusions

3 Introduction Test case selection and evaluation is a key issue in software testing Testing strategies aim to select an effective test set to detect as more faults as possible Black-box testing (functional testing) White-box testing (structural testing)

4 White-box testing schemes: Control/data flow coverage Code coverage - measured as the fraction of program codes that are executed at least once during the test. Block coverage - the portion of basic blocks executed. Decision coverage - the portion of decisions executed C-Use - computational uses of a variable. P-Use - predicate uses of a variable

5 Code coverage: an indicator for test effectiveness? Supportive empirical studies high code coverage brings high software reliability and low fault rate both code coverage and fault detected in programs grow over time, as testing progresses.  Weyuker et al (1985, 1988, 1990)  Horgan, London & Lyu (1994)  Wong, Horgan, London & Mathur (1994)  Frate, Garg, Mathur & Pasquini (1995) Oppositive empirical studies Can this be attributed to causal dependency between code coverage and defect coverage?  Briand & Pfahl (2000)

6 Black-box testing schemes: testing profiles Functional testing – based on specified functional requirements Random testing - the structure of input domain based on a predefined distribution function Normal operational testing – based on normal operational system status Exceptional testing - based on exceptional system status { {

7 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Discussions and conclusions Outline

8 Research questions 1. Is code coverage a positive indicator for testing effectiveness? 2. Does such effect vary under various testing profiles? 3. Does such effect vary with different coverage measurements? 4. Is code coverage a good filter to reduce the size of effective test set?

9 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Discussions and conclusions Outline

10 Experimental setup In spring of 2002, 34 teams are formed to develop a critical industry application for a 12-week long project in a software engineering course Each team composed of 4 senior-level undergraduate students with computer science major from the Chinese University of Hong Kong

11 Experimental project description Geometry Data flow diagram Redundant Strapped-Down Inertial Measurement Unit (RSDIMU)

12 Software development procedure 1. Initial design document ( 3 weeks) 2. Final design document (3 weeks) 3. Initial code (1.5 weeks) 4. Code passing unit test (2 weeks) 5. Code passing integration test (1 weeks) 6. Code passing acceptance test (1.5 weeks)

13 Mutant creation Revision control was applied in the project and code changes were analyzed Faults found during each stage were also identified and injected into the final program of each version to create mutants Each mutant contains one design or programming fault 426 mutants were created for 21 program versions

14 Program metrics IdLinesModulesFunctionsBlocksDecisionsC-UseP-UseMutants Average Total: 426

15 Fault effect code lines LinesNumberPercentage 1 line: % 2-5 lines: % 6-10 lines: % lines: % lines: % >51 lines:235.40% Average11.39

16 Setup of evaluation test A test coverage tool was employed to analyze the compare testing coverage 1200 test cases were exercised on 426 mutants All the resulting failures from each mutant were analyzed, their coverage measured, and cross-mutant failure results compared 60 Sun machines running Solaris were involved in the test, where one cycle took 30 hours and a total of 1.6 million files around 20GB were generated

17 Test case description Case IDDescription of the test cases. 1A fundamental test case to test basic functions. 2-7Test cases checking vote control in different order. 8General test case based on test case 1 with different display mode. 9-19Test varying valid and boundary display mode Test cases for lower order bits Test cases for display and sensor failure Test random display mode and noise in calibration Test correct use of variable and sensitivity of the calibration procedure. 86, Test on input, noise and edge vector failures Test various and large angle value Test cases checking for the minimal sensor noise levels for failure declaration Test cases with various combinations of sensors failed on input and up to one additional sensor failed in the edge vector test Random test cases. Initial random seed for 1st 100 cases is: 777, for 2nd 100 cases is: Random test cases. Initial random seed is: for 200 cases.

18 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Effective of code coverage Under various testing profiles With different coverage measurements Effective test set Discussions and conclusions Outline

19 Fault detection related to changes of test coverage Version IDBlocksDecisionsC-UseP-UseAny 16/11 7/117/11(63.6%) 29/14 10/1410/14(71.4%) 34/8 3/84/84/8(50.0%) 4 7/138/13 8/13(61.5%) 57/12 5/127/127/12(58.3%) 75/11 5/11(45.5%) 81/92/9 2/9(22.2%) 97/12 7/12(58.3%) 1210/1917/1911/1917/1918/19(94.7%) 156/18 6/18(33.3%) 175/11 5/11(45.5%) 185/6 5/6(83.3%) 209/1110/118/1110/1110/11(90.9%) 2212/14 12/14(85.7%) 245/6 5/6(83.3%) 262/114/11 4/11(36.4%) 274/95/94/95/95/9(55.6%) 2910/15 11/1510/1512/15(80.0%) 317/15 8/15(53.3%) 323/164/165/16 5/16(31.3%) 337/11 9/1110/1110/11(90.9%) Overall131/252 (60.0%)145/252 (57.5%)137/252 (53.4%)152/252 (60.3%)155/252 (61.5%)

20 Relations between numbers of mutants against effective percentage of coverage

21 Cumulated defect coverage on sequence

22 Cumulated block coverage on sequence

23 Cumulated defect coverage versus block coverage R 2 =0.945

24 Percentage of test case coverage Percentage of Coverage BlocksDecisionC-UseP-Use Average45.86%29.63%35.86%25.61% Maximum52.25%35.15%41.65%30.45% Minimum32.42%18.90%23.43%16.77%

25 The correlation: various test regions Test case coverage contribution on block coverage Test case coverage contribution on mutant coverage I II III IV V VI

26 Test cases description I II III IV V VI

27 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Effective of code coverage Under various testing profiles With different coverage measurements Effective test set Discussions and conclusions Outline

28 In various test regions Linear modeling fitness in various test case regions Linear regression relationship between block coverage and defect coverage in the whole test set

29 In various test regions (cont’) Linear regression relationship between block coverage and defect coverage in region VI Linear regression relationship between block coverage and defect coverage in region IV

30 In various test regions (cont’) Observations: Code coverage: a moderate indicator Reasons behind the big variance between region IV and VI Region IVRegion VI Design principleFunctional testing Random testing Coverage range 32% ~ 50% 48% ~ 52% Number of exceptional test cases 277 (Total: 373) 0

31 With functional/random testing Code coverage: – a moderate indicator Random testing – a necessary complement to functional testing Similar code coverage Both have high fault detection capability Testing profile (size)R-square Whole test set (1200)0.781 Functional test cases (800)0.837 Random test cases (400)0.558

32 With functional/random testing (cont’) Failure number of mutants detected only by functional testing or random testing Test case typeMutants detected exclusively (total mutants killed) Average number of test cases that detect these mutants Std. deviation Functional testing 20 (382) Random testing 9 (371)

33 Under normal operational / exceptional testing The definition of operational status and exceptional status Defined by specification Application-dependent For RSDIMU application Operational status: at most two sensors failed as the input and at most one more sensor failed during the test Exceptional status: all other situations The 1200 test cases are classified to operational and exceptional test cases according to their inputs and outputs

34 Under normal operational / exceptional testing (cont’) Normal operational testing very weak correlation Exceptional testing strong correlation Testing profile (size)R-square Whole test case (1200)0.781 Normal testing (827)0.045 Exceptional testing (373)0.944

35 Under normal operational / exceptional testing (cont’) Normal testing: small coverage range (48%-52%) Exceptional testing: two main clusters

36 Under normal operational / exceptional testing (cont’) Failure number of mutants detected only by normal operational testing or exceptional testing Test case type Mutants detected exclusively (total mutants detected) Average number of test cases that detect these mutants Std. deviation Normal testing 36/ Exceptional testing 20/

37 Under testing profile combinations Combinations of testing profiles Observations: Combinations containing exceptional testing indicate strong correlations Combinations containing normal testing inherit weak correlations

38 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Effective of code coverage Under various testing profiles With different coverage measurements Effective test set Discussions and conclusions Outline

39 With different coverage measurements Similar patterns as block coverage Insignificant difference under normal testing Decision and P-use have a bit larger correlation, as they relate to change of control flow

40 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Effective of code coverage Under various testing profiles With different coverage measurements Effective test set Discussions and conclusions Outline

41 The reduction of the test set size using coverage increase information

42 Testing coverage and testing strategies Research questions Experimental setup Results and analysis Discussions and conclusions Outline

43 Answers to RQs 1. Is code coverage a positive indicator for testing effectiveness?  Our answer is supportive  At most situations (61.5%), there is an coverage increase when a test case detect additional faults.  Under some functional and exceptional testing region, the correlation between code coverage and fault coverage is pretty high  When more cumulated code coverage have been achieved, more faults are detected.

44 Answers to RQs (cont’) 2. Does such effect vary under various testing profiles?  A significant correlation exists in exceptional test cases, while no correlation in normal operational test cases.  Higher correlation is revealed in functional testing than in random testing, but the difference is insignificant

45 Answers to RQs (cont’) 3. Does such effect vary with different coverage measurements?  Not obvious with four coverage measurements 4. Is code coverage a good filter to reduce the size of effective test set?  Yes, 203 test cases (17% of the original test set) which achieve any coverage increase can detect 98% of the faults.

46 Conclusion Code coverage is a reasonably good indictor for fault detection capability. The strong correlation revealed in exceptional testing implies that coverage works predictably better in certain testing profiles than others. Testing guidelines and strategy can be established for coverage-based testing: For normal operational testing: specification-based, regardless of code coverage For exceptional testing: code coverage is an important metrics for testing capability A quantifiable testing strategy may emerge by combining black-box and white-box testing strategies appropriately.