Prioritizing Test Cases for Regression Testing
Sebastian Elbaum, University of Nebraska, Lincoln
Alexey Malishevsky, Oregon State University
Gregg Rothermel, Oregon State University
ISSTA 2000

Defining Prioritization
- Test scheduling, performed during the regression testing stage
- Goal: maximize a criterion (or criteria):
  - Increase rate of fault detection
  - Increase rate of coverage
  - Increase rate of fault likelihood exposure

Prioritization Requirements
- Definition of goal: increase rate of fault detection
- Measurement criterion: % of faults detected over the life of the test suite
- Prioritization technique: random, total statement coverage, or probability of exposing faults

Previous Work
- Goal: increase rate of fault detection
- Measurement: APFD, a weighted average of the percentage of faults detected over the life of the test suite
- Scale: 0 to 100 (higher means faster detection)

Previous Work (2): Measuring Rate of Fault Detection
[Figure: a fault/test matrix for tests A-E, with X marks showing which faults each test exposes, compared under three orderings: A-B-C-D-E, C-E-B-A-D, and E-D-C-B-A.]
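To make the measure concrete, here is a minimal sketch of the APFD computation using the formula from the authors' prior work, APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where n is the number of tests, m the number of faults, and TFi the position of the first test that exposes fault i. The fault/test data below is illustrative only, not the matrix from the slide.

```python
def apfd(order, detects):
    """APFD for one test ordering.

    order   -- list of test names, in prioritized order
    detects -- dict mapping each fault to the set of tests that expose it
    """
    n, m = len(order), len(detects)
    position = {t: i + 1 for i, t in enumerate(order)}  # 1-based positions
    # TF_i: position of the first test in the order that exposes fault i
    tf = [min(position[t] for t in tests) for tests in detects.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)

# Illustrative fault/test matrix (hypothetical values).
faults = {"F1": {"A", "C"}, "F2": {"B", "E"}, "F3": {"D"}}
print(apfd(["A", "B", "C", "D", "E"], faults))  # 0.633...
print(apfd(["C", "E", "B", "A", "D"], faults))  # 0.566...
```

An ordering that exposes faults earlier yields a higher APFD, which is why the measure rewards the rate, not just the total, of fault detection.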

Previous Work (3): Prioritization Techniques

#  Label         Prioritize on
1  random        randomized ordering
2  optimal       optimized rate of fault detection
3  st-total      coverage of statements
4  st-addtl      coverage of statements not yet covered
5  st-fep        probability of exposing faults
6  st-fep-addtl  probability of exposing faults, adjusted to consider previous test cases
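As a sketch of how the two coverage-based techniques order tests, the code below implements total and additional statement coverage greedily. The reset once full coverage is reached follows the authors' description of the "additional" techniques; tie-breaking is arbitrary here, an assumption of this sketch.

```python
def st_total(tests, coverage):
    """st-total: order tests by the total number of statements each covers.
    coverage[t] is the set of statements exercised by test t."""
    return sorted(tests, key=lambda t: len(coverage[t]), reverse=True)

def st_addtl(tests, coverage):
    """st-addtl: greedily pick the test covering the most statements not
    yet covered; once no remaining test adds coverage, reset and reorder
    the rest the same way."""
    remaining, order, covered = set(tests), [], set()
    while remaining:
        best = max(remaining, key=lambda t: len(coverage[t] - covered))
        if covered and not (coverage[best] - covered):
            covered = set()  # nothing new to cover: start a fresh pass
            continue
        order.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return order
```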

Summary of Previous Work
- Performed an empirical evaluation of general prioritization techniques
  - Even simple techniques generated gains
- Used statement-level techniques
- Still room to improve

Research Questions
1. Can version-specific test case prioritization (TCP) improve the rate of fault detection?
2. How do fine-granularity (statement-level) techniques compare with coarse-granularity (function-level) techniques?
3. Can the use of fault proneness improve the rate of fault detection?

Addressing the Research Questions
- A new family of prioritization techniques
- A new series of experiments:
  1. Version-specific prioritization (statement and function level)
  2. Granularity
  3. Contribution of fault proneness
- Practical implications

Additional Techniques

#   Label            Prioritize on
7   fn-total         coverage of functions
8   fn-addtl         coverage of functions not yet covered
9   fn-fep-total     probability of exposing faults
10  fn-fep-addtl     probability of exposing faults, adjusted to consider previous tests
11  fn-fi-total      probability of fault likelihood
12  fn-fi-addtl      probability of fault likelihood, adjusted to consider previous tests
13  fn-fi-fep-total  combined probabilities of fault existence and fault exposure
14  fn-fi-fep-addtl  combined probabilities of fault existence/exposure, adjusted on previous coverage
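The slides do not show how the fault index (fi) and fault-exposing potential (fep) are combined into a single award value, so the sketch below assumes a simple product per covered function; treat the weighting and the helper names as hypothetical, not the paper's exact method.

```python
def fn_fi_fep_total(tests, covers, fi, fep):
    """Hypothetical fn-fi-fep-total scoring: weight each function a test
    covers by fault index times fault-exposing potential, sort by the sum.
    The product is this sketch's assumption, not the paper's stated formula.

    covers[t]   -- set of functions exercised by test t
    fi[f]       -- fault index (fault likelihood) of function f
    fep[(t, f)] -- fault-exposing potential of test t on function f
    """
    award = {t: sum(fi[f] * fep.get((t, f), 0.0) for f in covers[t])
             for t in tests}
    return sorted(tests, key=award.get, reverse=True)
```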

Family of Experiments
- 8 programs
- 29 versions
- 50 test suites per program (branch-coverage adequate)
- 14 techniques:
  - 2 control "techniques": optimal and random
  - 4 statement level
  - 8 function level

"Generic" Factorial Design
[Diagram: techniques crossed with programs, 50 test suites, and 29 versions, giving independence of code, of suite composition, and of changes.]

Experiment 1a: Version Specific (Statement Level)
RQ1: Prioritization works version-specifically at the statement level.
- ANOVA: average APFD differs among statement-level techniques
- Bonferroni: St-fep-addtl is significantly better

Group  Technique     APFD
A      St-fep-addtl  78.88
B      St-fep-total  76.99
B      St-total      76.30
C      St-addtl      74.44
-      Random        59.73

Experiment 1b: Version Specific (Function Level)
RQ1: Prioritization works version-specifically at the function level.
- ANOVA: average APFD differs among function-level techniques
- Bonferroni: Fn-fep is not significantly different from Fn-total

Group  Technique     APFD
A      Fn-fep-addtl  75.59
A      Fn-fep-total  75.48
A      Fn-total      75.09
B      Fn-addtl      71.66

Experiment 2: Granularity
RQ2: Fine granularity has greater prioritization potential.
- Statement-level techniques are significantly better than function-level techniques
- However, the best function-level techniques outperform the worst statement-level techniques

Experiment 3: Fault Proneness
RQ3: Incorporating fault likelihood did not significantly increase APFD.
- ANOVA: significant differences in average APFD among all function-level techniques
- Bonferroni: surprisingly, techniques using fault likelihood did not rank significantly better

Group  Technique        APFD
A      Fn-fi-fep-addtl  76.34
A B    Fn-fi-fep-total  75.92
A B    Fn-fi-total      75.63
A B    Fn-fep-addtl     75.59
A B    Fn-fep-total     75.48
B      Fn-total         75.09
C      Fn-fi-addtl      72.62
C      Fn-addtl         71.66

Reasons:
- For small changes, fault likelihood does not seem to be worth it
- We believe it will be worthwhile for larger changes; further exploration is required

Practical Implications
APFD:
- Optimal = 99%
- Fn-fi-fep-addtl = 98%
- Fn-total = 93%
- Random = 84%
Time:
- Optimal = 1.3
- Fn-fi-fep-addtl = 2.0 (+0.7)
- Fn-total = 11.9 (+10.6)
- Random = 16.5 (+15.2)

Conclusions
- Version-specific techniques can significantly improve the rate of fault detection during regression testing
- Technique granularity matters:
  - In general, the statement level is more powerful, but
  - advanced function-level techniques beat simple statement-level techniques
- Fault likelihood may not be helpful

Working on …
- Controlling the threats: more subjects, extending the model
- Discovery of additional factors
- Development of guidelines for choosing the "best" technique

Backup Slides

Threats
- Representativeness: programs, changes, tests and process
- APFD as a test efficiency measure
- Correctness of tools

Experiment Subjects

FEP Computation
- Probability that a fault causes a failure
- Computed via mutation analysis:
  - Insert mutants
  - Determine how many mutants are exposed by each test case

FEP(t, s) = (# of mutants of s exposed by t) / (# of mutants of s)
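A minimal sketch of the slide's formula. The mutant sets are assumed to come from an external mutation-analysis tool; the data-structure shapes are this sketch's assumption.

```python
def fep(t, s, kills, mutants):
    """FEP(t, s) = |mutants of s exposed by t| / |mutants of s|.

    kills[s][t] -- set of mutants of statement s exposed by test t
    mutants[s]  -- set of all mutants generated for statement s
    """
    exposed = kills.get(s, {}).get(t, set())
    return len(exposed) / len(mutants[s]) if mutants[s] else 0.0
```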

FI Computation
- Fault likelihood
- Associated with measurable software attributes
- Complexity metrics: size, control flow, and coupling
- Fault index generated via principal component analysis
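One way to realize this is sketched below: standardize the complexity metrics and take the first principal component's score per function as its fault index. This uses scikit-learn for illustration; the metric set, the single retained component, and the toy data are all assumptions of this sketch, and a principal component's sign is arbitrary, so the paper's exact construction may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per function; columns are complexity metrics such as
# size, control-flow, and coupling measures (illustrative values).
metrics = np.array([
    [120.0, 14.0, 3.0],
    [ 40.0,  4.0, 1.0],
    [300.0, 31.0, 9.0],
    [ 85.0,  9.0, 2.0],
])

# Standardize, project onto the first principal component, and take
# each function's score on it as the fault index.
scores = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(metrics))
fault_index = scores[:, 0]
print(fault_index)
```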

Overall

Group  Technique        APFD
A      Optimal          94.24
B      St-fep-addtl     78.88
C      St-fep-total     76.99
D C    Fn-fi-fep-addtl  76.34
D C    St-total         76.30
D E    Fn-fi-fep-total  75.92
D E    Fn-fi-total      75.63
D E    Fn-fep-addtl     75.59
D E    Fn-fep-total     75.48
F E    Fn-total         75.09
F      St-addtl         74.44
G      Fn-fi-addtl      72.62
G      Fn-addtl         71.66
H      Random           59.73