Balancing Trade-Offs in Test-Suite Reduction

Balancing Trade-Offs in Test-Suite Reduction. August Shi, Alex Gyori, Milos Gligoric, Andrey Zaytsev, and Darko Marinov. FSE 22, Hong Kong, 11/18/2014. NSF Grant Nos. CNS-0958199, CCF-1012759, CCF-1439957.

Testing Can Be Slow: the code under test is exercised by many tests (test0, test1, test2, test3, …, testN-1). [Speaker note: emphasize that lots of tests mean slow development.]

Speed Up by Removing Tests: speed up testing by removing some of the tests (…, testN-1, testN) run on the code under test.

Test-Suite Reduction: keep only some of the tests (test1, test3, …, testN) for the code under test. In other words, produce a reduced test suite that is smaller than the full test suite yet that the test engineer finds acceptable, i.e., just as good as the full test suite. The reduced test suite has fewer tests than the full test suite but is representative of it on this code version.

Test-Suite Reduction and Changes: the code changes from Version i to Version i+1, and the same reduced test suite (test1, test3, …, testN) is run on future versions.

Test-Suite Reduction and Changes: after the code changes from Version i to Version i+1, is the reduced test suite (test1, test3, …, testN) still representative of the full test suite on future versions? [Speaker note: emphasize this question.]

How does evolution affect reduced test suites? [Speaker note: emphasize that we still cannot judge evolution's effect.]

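Test-Suite Reduction: a coverage matrix marks with X which statements (S1 to S5) each test (T1, T2, …) covers. [Speaker note: do not say prior work "mainly" worked on a single version; emphasize that prior work used only a single version, whereas we use multiple versions (discussed later).]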

Test-Suite Reduction: in the coverage matrix of tests T1 to T5 over statements S1 to S5, tests T3 and T5 together cover every statement the full test suite covers, giving the reduced test suite R1 = {T3, T5}.

Test-Suite Reduction: the reduced test suite R1 = {T3, T5} preserves all statements covered by the full test suite; this is Statement Adequate Reduction (SAR).

Test-Suite Reduction (T = tests, S = statements): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100$

Test-Suite Reduction (T = tests, S = statements): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$

Test-Suite Reduction (T = tests, S = statements): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100$

Test-Suite Reduction (T = tests, S = statements): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100 = 0\%$
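Both single-version metrics are ratios over set sizes. The following is a minimal Python sketch, assuming a test suite is a set of test names and coverage is a set of statement IDs; the function names are hypothetical and not taken from the authors' tooling.

```python
from __future__ import annotations


def size_reduction(full_suite: set[str], reduced_suite: set[str]) -> float:
    """SizRed: percentage of tests removed from the full suite T."""
    # Multiply before dividing so exact cases such as 300 / 5 stay exact.
    return 100 * (len(full_suite) - len(reduced_suite)) / len(full_suite)


def stmt_loss(stmts_full: set[str], stmts_reduced: set[str]) -> float:
    """StmtLoss: percentage of statements covered by T but not by R."""
    return 100 * (len(stmts_full) - len(stmts_reduced)) / len(stmts_full)


# Slide example: five tests, R1 = {T3, T5}, and R1 still covers all five statements.
T = {"T1", "T2", "T3", "T4", "T5"}
R1 = {"T3", "T5"}
stmt_T = {"S1", "S2", "S3", "S4", "S5"}
stmt_R1 = set(stmt_T)

print(size_reduction(T, R1))       # 60.0
print(stmt_loss(stmt_T, stmt_R1))  # 0.0
```

MutLoss has the same shape, with sets of killed mutants in place of sets of covered statements.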

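Test-Suite Reduction (T = tests, S = statements, M = mutants M1 to M6, with X marking which mutants each test kills): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100 = 0\%$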

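Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100 = 0\%$, $\mathit{MutLoss} = \frac{|\mathit{mut}(T)| - |\mathit{mut}(R_1)|}{|\mathit{mut}(T)|} \times 100$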

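Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5-2}{5} \times 100 = 60\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100 = 0\%$, $\mathit{MutLoss} = \frac{|\mathit{mut}(T)| - |\mathit{mut}(R_1)|}{|\mathit{mut}(T)|} \times 100 = \frac{6-4}{6} \times 100 \approx 33\%$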

Test-Suite Reduction (T = tests, S = statements S1 to S5, M = mutants M1 to M6): another reduced test suite is R2 = {T1, T4, T5}.

Test-Suite Reduction: the reduced test suite R2 = {T1, T4, T5} preserves all mutants killed by the full test suite; this is Mutant Adequate Reduction (MAR).
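SAR and MAR differ only in which requirement map the reduced suite must preserve. A minimal Python sketch of the adequacy check, assuming hypothetical per-test coverage and kill maps (the slide's actual matrix is not recoverable from the transcript; these maps are chosen only to agree with its loss numbers):

```python
from __future__ import annotations


def satisfied(suite: set[str], req_map: dict[str, set[str]]) -> set[str]:
    """Union of the requirements (statements or mutants) satisfied by a suite."""
    return set().union(*(req_map[t] for t in suite))


def is_adequate(reduced: set[str], full: set[str], req_map: dict[str, set[str]]) -> bool:
    """True if the reduced suite satisfies every requirement the full suite does."""
    return satisfied(reduced, req_map) == satisfied(full, req_map)


# Hypothetical maps: R1 keeps all statements but kills only 4 of the 6 mutants.
covers = {"T1": {"S1", "S2", "S3"}, "T2": {"S2"}, "T3": {"S1", "S2", "S3"},
          "T4": {"S4"}, "T5": {"S4", "S5"}}
kills = {"T1": {"M1", "M2", "M3"}, "T2": set(), "T3": {"M1", "M3"},
         "T4": {"M4"}, "T5": {"M5", "M6"}}
T = set(covers)
R1 = {"T3", "T5"}
R2 = {"T1", "T4", "T5"}

print(is_adequate(R1, T, covers), is_adequate(R1, T, kills))  # True False
print(is_adequate(R2, T, covers), is_adequate(R2, T, kills))  # True True
```

Under these assumed maps, R1 is statement adequate but not mutant adequate, while R2 is both.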

Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_2|}{|T|} \times 100 = \frac{5-3}{5} \times 100 = 40\%$

Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_2|}{|T|} \times 100 = \frac{5-3}{5} \times 100 = 40\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_2)|}{|\mathit{stmt}(T)|} \times 100$

Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_2|}{|T|} \times 100 = \frac{5-3}{5} \times 100 = 40\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_2)|}{|\mathit{stmt}(T)|} \times 100 = \frac{5-5}{5} \times 100 = 0\%$

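Test-Suite Reduction (T = tests, S = statements, M = mutants): $\mathit{SizRed} = \frac{|T| - |R_2|}{|T|} \times 100 = \frac{5-3}{5} \times 100 = 40\%$, $\mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_2)|}{|\mathit{stmt}(T)|} \times 100 = \frac{5-5}{5} \times 100 = 0\%$, $\mathit{MutLoss} = \frac{|\mathit{mut}(T)| - |\mathit{mut}(R_2)|}{|\mathit{mut}(T)|} \times 100 = 0\%$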

Choosing a Test Suite
Suite  SizRed  StmtLoss  MutLoss
R1     60%     0%        33%
R2     40%     0%        0%
T      0%      0%        0%

Choosing a Test Suite (Single Version Evaluation): SizRed, StmtLoss, and MutLoss for R1, R2, and the full test suite T are all measured on a single version.
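The single-version comparison of R1, R2, and T can be recomputed from the counts in the worked example (|T| = 5 tests, 5 statements covered, 6 mutants killed; R1 keeps 2 tests and kills 4 mutants, R2 keeps 3 tests and kills all 6). A minimal Python sketch; the helper names are hypothetical.

```python
from __future__ import annotations


def pct_loss(full: int, reduced: int) -> float:
    """(full - reduced) / full * 100; the shared shape of SizRed, StmtLoss, MutLoss."""
    return 100 * (full - reduced) / full


# Counts from the worked example: |T| = 5, |stmt(T)| = 5, |mut(T)| = 6.
FULL = {"tests": 5, "stmts": 5, "muts": 6}
SUITES = {
    "R1": {"tests": 2, "stmts": 5, "muts": 4},
    "R2": {"tests": 3, "stmts": 5, "muts": 6},
    "T": dict(FULL),
}


def metrics(counts: dict[str, int]) -> tuple[float, float, float]:
    """Return (SizRed, StmtLoss, MutLoss) for one suite, in percent."""
    return (
        pct_loss(FULL["tests"], counts["tests"]),
        pct_loss(FULL["stmts"], counts["stmts"]),
        pct_loss(FULL["muts"], counts["muts"]),
    )


for name, counts in SUITES.items():
    siz, stmt, mut = metrics(counts)
    print(f"{name:<3} SizRed={siz:.0f}% StmtLoss={stmt:.0f}% MutLoss={mut:.0f}%")
```

The trade-off is visible in the output: R1 removes more tests, while R2 loses no killed mutants.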

How does evolution affect reduced test suites? (Example) [Speaker note: emphasize that we still cannot judge evolution's effect.]

How does evolution affect reduced test suites? For instance, consider the coverage of tests T1 to T5 on Version i…

How does evolution affect reduced test suites? The code underneath changes from Version i to Version i+1, which has statements S1' to S6', and the statement coverage of tests T1 to T5 is shuffled…

How does evolution affect reduced test suites? On Version i+1, the full test suite covers all 6 statements.

How does evolution affect reduced test suites? On Version i+1, the reduced test suite does not cover all of them.

How does evolution affect reduced test suites? On Version i+1, the reduced test suite misses 2 statements that the full test suite covers. [Speaker note: emphasize that there is some loss we were originally unaware of.]

How does evolution affect reduced test suites? The current metrics cannot show loss due to evolution, so how do we measure the change in loss? [Speaker note: emphasize this point.]

Evolution-Aware Metrics (Version i): these metrics concern how a reduced test suite from an earlier version performs on a later version.

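Evolution-Aware Metrics (Version i): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$, where $T_i$ and $R_i$ are the full and reduced test suites from Version i and $\mathit{stmt}_k(\cdot)$ is the set of statements a suite covers on Version k. These metrics concern how a reduced test suite from an earlier version performs on a later version.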

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$. These metrics concern how a reduced test suite from an earlier version performs on a later version.

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100$. These metrics concern how a reduced test suite from an earlier version performs on a later version.

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100$. The full test suite covers 6 statements on the new version.

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100$. The reduced test suite covers 4 statements.

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100 = \frac{6-4}{6} \times 100 \approx 33\%$

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100 = \frac{6-4}{6} \times 100 \approx 33\%$, so the change is $\mathit{REC}_i^{i+1} = \mathit{StmtLoss}_i^{i+1} - \mathit{StmtLoss}_i^i$

Evolution-Aware Metrics (Version i to Version i+1): $\mathit{StmtLoss}_i^i = 0\%$ and $\mathit{StmtLoss}_i^{i+1} = \frac{|\mathit{stmt}_{i+1}(T_i)| - |\mathit{stmt}_{i+1}(R_i)|}{|\mathit{stmt}_{i+1}(T_i)|} \times 100 = \frac{6-4}{6} \times 100 \approx 33\%$, so $\mathit{REC}_i^{i+1} = \mathit{StmtLoss}_i^{i+1} - \mathit{StmtLoss}_i^i = 33\% - 0\% = 33\,\text{pp}$ (pp = percentage points).

Evolution-Aware Metrics (mutants, Version i to Version i+1): on Version i+1, tests T1 to T5 kill mutants M1' to M6'.

Evolution-Aware Metrics (mutants, Version i to Version i+1): $\mathit{MutLoss}_i^i = \frac{|\mathit{mut}_i(T_i)| - |\mathit{mut}_i(R_i)|}{|\mathit{mut}_i(T_i)|} \times 100 \approx 33\%$

Evolution-Aware Metrics (mutants, Version i to Version i+1): $\mathit{MutLoss}_i^i = \frac{|\mathit{mut}_i(T_i)| - |\mathit{mut}_i(R_i)|}{|\mathit{mut}_i(T_i)|} \times 100 \approx 33\%$ and $\mathit{MutLoss}_i^{i+1} = \frac{|\mathit{mut}_{i+1}(T_i)| - |\mathit{mut}_{i+1}(R_i)|}{|\mathit{mut}_{i+1}(T_i)|} \times 100 = \frac{6-3}{6} \times 100 = 50\%$

Evolution-Aware Metrics (mutants, Version i to Version i+1): $\mathit{MutLoss}_i^i \approx 33\%$ and $\mathit{MutLoss}_i^{i+1} = \frac{|\mathit{mut}_{i+1}(T_i)| - |\mathit{mut}_{i+1}(R_i)|}{|\mathit{mut}_{i+1}(T_i)|} \times 100 = \frac{6-3}{6} \times 100 = 50\%$, so $\mathit{REC}_i^{i+1} = \mathit{MutLoss}_i^{i+1} - \mathit{MutLoss}_i^i = 50\% - 33\% = 17\,\text{pp}$
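REC is a difference of two losses, so it is reported in percentage points rather than percent. A minimal Python sketch of the mutant example, using only the counts stated in the slides (on Version i, T_i kills 6 mutants and R_i kills 4; on Version i+1, T_i kills 6 and R_i kills 3); the function names are hypothetical.

```python
def loss(full_count: int, reduced_count: int) -> float:
    """Loss (%) of requirements the reduced suite misses relative to the full suite."""
    return 100 * (full_count - reduced_count) / full_count


def rec(loss_same_version: float, loss_later_version: float) -> float:
    """REC_i^j in percentage points: loss on Version j minus loss on Version i."""
    return loss_later_version - loss_same_version


mut_loss_i_i = loss(6, 4)   # Version i: about 33.3%
mut_loss_i_i1 = loss(6, 3)  # Version i+1: 50.0%
print(f"REC = {rec(mut_loss_i_i, mut_loss_i_i1):.0f}pp")  # REC = 17pp
```

A REC near 0 means the reduced suite keeps roughly the same quality relative to the full suite as the code evolves; a positive REC means it has fallen further behind.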

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): requirements can be collected for multiple versions, e.g., statements S1 to S5 on Version i, S1' to S6' on Version i+1, and S1'' to S4'' on Version i+2, each covered by tests T1 to T5.

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): requirements can be collected for multiple versions, and REC values are grouped by the distance between versions: $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$
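The set $\mathit{REC}_d$ collects every pairwise REC value whose versions are d apart; the evaluation slides then summarize each distance by its median. A minimal Python sketch, assuming REC values are keyed by (reduction version i, evaluation version j) with j > i; the REC numbers below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


def rec_by_distance(recs: dict[tuple[int, int], float]) -> dict[int, list[float]]:
    """Group REC_i^j values by distance d = j - i (the REC_d sets)."""
    grouped: dict[int, list[float]] = defaultdict(list)
    for (i, j), value in sorted(recs.items()):
        grouped[j - i].append(value)
    return dict(grouped)


# Hypothetical REC_i^j values (pp) for reductions started at versions 0 and 1.
recs = {(0, 1): 10.0, (0, 2): 7.5, (1, 2): 5.0}
groups = rec_by_distance(recs)
print(groups)                                           # {1: [10.0, 5.0], 2: [7.5]}
print({d: median(vals) for d, vals in groups.items()})  # {1: 7.5, 2: 7.5}
```

Starting reductions at several versions is what gives shorter distances more data points than longer ones.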

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): compute REC pairwise, starting with distance 1: $\mathit{REC}_1 = 33\,\text{pp}$

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): going even further away, compute distance 2: $\mathit{REC}_1 = 33\,\text{pp}$, $\mathit{REC}_2 = 25\,\text{pp}$

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): the reduction can also start at a later version, such as Version i+1. So far, $\mathit{REC}_1 = 33\,\text{pp}$ and $\mathit{REC}_2 = 25\,\text{pp}$.

Evolution-Aware Metrics (Version i, Version i+1, Version i+2): starting at a later version yields more data for some distances. Computing $\mathit{REC}_1$ again gives $\mathit{REC}_1 = \{33\,\text{pp}, 25\,\text{pp}\}$, while $\mathit{REC}_2 = \{25\,\text{pp}\}$.

How does evolution affect reduced test suites? (Evaluation) We are now equipped to answer this question, and we want to see how the reduced test suites of real projects behave.

Evaluation: Implementation. We use PIT to collect the statement coverage and the mutants killed by tests on each version, use the Greedy heuristic to perform test-suite reduction, and apply both Statement Adequate Reduction and Mutant Adequate Reduction.
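The slides name a Greedy heuristic without giving its details. A common formulation repeatedly keeps the test that satisfies the most still-unsatisfied requirements; the following Python sketch implements that formulation under a hypothetical coverage matrix, and is not necessarily the authors' exact implementation (for example, their tie-breaking may differ).

```python
from __future__ import annotations


def greedy_reduce(req_map: dict[str, set[str]]) -> list[str]:
    """Greedy adequate reduction: keep adding the test that satisfies the most
    still-unsatisfied requirements until every requirement the full suite
    satisfies is satisfied by the reduced suite."""
    unsatisfied = set().union(*req_map.values())
    remaining = set(req_map)
    reduced: list[str] = []
    while unsatisfied:
        # sorted() makes ties deterministic: the alphabetically first test wins.
        best = max(sorted(remaining), key=lambda t: len(req_map[t] & unsatisfied))
        reduced.append(best)
        unsatisfied -= req_map[best]
        remaining.remove(best)
    return reduced


# Hypothetical statement coverage for five tests.
coverage = {
    "T1": {"S1", "S2"},
    "T2": {"S2"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S4"},
    "T5": {"S4", "S5"},
}
print(greedy_reduce(coverage))  # ['T3', 'T5']
```

Passing a map from tests to killed mutants instead of covered statements turns the same function into a mutant adequate reduction. The loop always terminates because every unsatisfied requirement is still satisfied by some remaining test.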

Evaluation: Projects. To see how evolution affects reduced test suites, we use 18 open-source projects from GitHub.

Evaluation: Projects. The projects span a variety of applications.

Evaluation: Projects. Consecutive versions are 30 Git commits apart. [Speaker note: not every project has all 10 versions; some projects are less mature.]

Evaluation: Projects

Evaluation: Projects. Median number of tests across versions. [Speaker note: this might not be all of the tests, due to limitations of the tool.]

Statement Adequate Reduction (SAR): $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$, aggregated over all projects. [Speaker note: explain the x and y axes and the colors.]

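Statement Adequate Reduction (SAR): $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$. The median REC for SAR is around 0, so quality does not drop much. [Speaker note: emphasize that quality does not drop much.]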

Mutant Adequate Reduction (MAR): $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$

Mutant Adequate Reduction (MAR): $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$. The median REC for MAR is around 0, so quality does not drop much.

How does evolution affect reduced test suites? (Answer) [Speaker note: emphasize that we can now answer the question.]

Answer (based on our evaluation over 18 projects): the quality of the reduced test suite relative to the quality of the full test suite remains about the same during evolution.

We further evaluated different test-suite reduction algorithms (four other algorithms in addition to Greedy) and inadequate test-suite reduction (reduction that does not preserve all statements covered or mutants killed). More details are in our paper.

Threats to Validity: reduced test suites are evaluated with the same metrics (statement coverage and killed mutants) used for reduction; tests are tracked across versions by name; and versions are 30 commits apart.

Related Work. Improving test-suite reduction: Hao et al. [ICSE 2012], Lin and Huang [IST 2009], Yoo and Harman [ISSTA 2007], Jeffrey and Gupta [TSE 2007], Black et al. [ICSE 2004], Offutt et al. [ICTCS 1995]. Effect of software evolution on coverage: Elbaum et al. [ICSM 2001].

Conclusions Q: How does software evolution affect test-suite reduction?

Conclusions. Q: How does software evolution affect test-suite reduction? We introduced new evolution-aware metrics (REC) and performed the largest evaluation of test-suite reduction, covering different metrics and algorithms as well as inadequate test-suite reduction.

Conclusions. Q: How does software evolution affect test-suite reduction? We introduced new evolution-aware metrics (REC) and performed the largest evaluation of test-suite reduction, covering different metrics and algorithms as well as inadequate test-suite reduction. A: Reduced test suites do not decline in quality relative to the full test suite over time: if a reduced test suite is acceptable for this version, it is likely acceptable in future versions. [Speaker note: emphasize the answer.]

Conclusions. Q: How does software evolution affect test-suite reduction? We introduced new evolution-aware metrics (REC) and performed the largest evaluation of test-suite reduction, covering different metrics and algorithms as well as inadequate test-suite reduction. A: Reduced test suites do not decline in quality relative to the full test suite over time: if a reduced test suite is acceptable for this version, it is likely acceptable in future versions. August Shi: awshi2@illinois.edu, http://mir.cs.illinois.edu/evolred/

BACKUP

Project Filters: a project must use the Maven build system, have more than 100 commits in its history, have a HEAD (at the time of the experiments) that runs successfully through PIT, and have more than 4 versions in its history that work with PIT.

Evaluation: Basic Test-Suite Reduction

Different Algorithms? On Version i (tests T1 to T5 over statements S1 to S5), different algorithms can behave differently under evolution.

Different Algorithms? From Version i to Version j, different algorithms can behave differently under evolution.

Different Algorithms? We compare Greedy, HGS, GRE, ILP, and GE from Version i to Version j (which has statements S1 to S6); different algorithms can behave differently under evolution.

Evaluation: Other Algorithms. The other 4 algorithms produce results very similar to Greedy: the difference in size reduction is at most 5.26pp across all algorithms; the difference in MutLoss for SAR algorithms is at most 7.15pp; the difference in StmtLoss for MAR algorithms is at most 4.15pp; and the difference in $\mathit{REC}_d$ for any distance d is at most 0.33pp for MutLoss and 0.67pp for StmtLoss. [Speaker note: consider cutting the absolute MutLoss and StmtLoss numbers.]

Inadequate Reduction? On Version i (tests T1 to T5 over statements S1 to S5), reduction may be inadequate if the budget is very tight. How does an inadequately reduced test suite compare over time?

Inadequate Reduction? From Version i to Version j, reduction may be inadequate if the budget is very tight. How does an inadequately reduced test suite compare over time?

Evaluation: Inadequate Reduction. We evaluate Statement Inadequate Reduction (SIR) and Mutant Inadequate Reduction (MIR).
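The slides do not say how an inadequate reduction decides which requirements to give up. One hypothetical way to model a "very tight budget" is to cap greedy reduction at a fixed number of tests and measure the loss that results; this is an illustrative assumption, not the paper's SIR/MIR procedure, and the coverage matrix is hypothetical.

```python
from __future__ import annotations


def greedy_reduce_budget(req_map: dict[str, set[str]], max_tests: int) -> list[str]:
    """Greedy reduction capped at max_tests tests (hypothetical budget model).
    Stops when the budget runs out, so the result may be inadequate."""
    unsatisfied = set().union(*req_map.values())
    remaining = set(req_map)
    reduced: list[str] = []
    while unsatisfied and len(reduced) < max_tests:
        best = max(sorted(remaining), key=lambda t: len(req_map[t] & unsatisfied))
        reduced.append(best)
        unsatisfied -= req_map[best]
        remaining.remove(best)
    return reduced


def loss(req_map: dict[str, set[str]], reduced: list[str]) -> float:
    """Percentage of requirements satisfied by the full suite but not by reduced."""
    full = set().union(*req_map.values())
    kept = set().union(*(req_map[t] for t in reduced))
    return 100 * (len(full) - len(kept)) / len(full)


coverage = {  # hypothetical statement coverage
    "T1": {"S1", "S2"},
    "T2": {"S2"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S4"},
    "T5": {"S4", "S5"},
}
reduced = greedy_reduce_budget(coverage, max_tests=1)
print(reduced, loss(coverage, reduced))  # ['T3'] 40.0
```

Under this model, the same REC computation applies unchanged: the loss is simply nonzero on the version where the reduction was performed.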

Evaluation: SIR Evolution. $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$; results for Greedy. The trends are stable, and the same holds for all requirements used in reduction; these results are for Greedy, and the other algorithms show similar trends. [Speaker note: explain the x and y axes and the colors.]

Evaluation: MIR Evolution. $\mathit{REC}_d = \{\mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i\}$; results for Greedy.

Code Change