August Shi, Tifany Yung, Alex Gyori, and Darko Marinov


Comparing and Combining Test-Suite Reduction and Regression Test Selection
August Shi, Tifany Yung, Alex Gyori, and Darko Marinov
FSE 2015, Bergamo, Italy, 09/02/2015
NSF Grant Nos. CCF-1012759, CCF-1421503, CCF-1434590, CCF-1439957

Testing is Important but Slow. Testing is an important part of software development, but running tests is slow. For a given code under test, developers have to run a test suite with a large number of tests (test0, test1, ..., testN), and running all these tests takes a long time.

Regression Testing is Slow(er). Unfortunately, the situation gets even worse in the context of regression testing: after every change (V0, V1, V2, ...), a developer has to run this large test suite to ensure the changes did not break any existing functionality.

Speeding up Regression Testing: Test-Suite Reduction, Regression Test Selection, Test-Suite Parallelization, Refactoring Tests, and many more. In this work, we study the first two approaches, test-suite reduction and regression test selection, with the goal of seeing which one is better.

Test-Suite Reduction (TSR). Reduce the test suite once at an early revision (V0) and run the reduced suite on all later revisions (V1, V2, ...). The analysis is performed only on the first revision; there is no need to redo it later.

Regression Test Selection (RTS). Another approach is RTS: given the change Δ between two revisions, select the tests that depend on the change and do not select the tests that do not depend on the change. Because the analysis is based entirely on changes, it runs at every revision.

TSR versus RTS (Known Qualitative Comparison). We can compare the two approaches qualitatively:
How are tests chosen to run? TSR: redundancy (one revision). RTS: changes (two revisions).
How often is analysis performed? TSR: infrequently. RTS: every revision.
Can it miss failing tests from the original test suite? TSR: yes. RTS: no (if safe).

How do TSR and RTS compare quantitatively? How can TSR and RTS be combined? We said how they compare qualitatively; we also care about how they compare quantitatively (measurements, numbers). Furthermore, the qualitative comparison suggests they are orthogonal, so how can they be combined?

TSR Background. Consider a coverage matrix of tests T = {T1, ..., T5} against statements S = {S1, ..., S5}, where an X marks that a test covers a statement. TSR computes a reduced test suite R = {T3, T5} that covers the same statements as the original suite O. Quality metrics are always computed on ONE revision, and traditionally different requirements (statements, mutants) are used for evaluation. Researchers want to see how good a TSR technique is, and to compare different TSR techniques, using two metrics:
Size: Size = |R| / |O| = 40%
Fault-Detection Capability: ReqLoss = |req(O) \ req(R)| / |req(O)| = 25%, where req ∈ {stmt, mutant}
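The classical greedy reduction behind this example can be sketched as follows. This is a minimal illustration, not the paper's tooling; the function name and the toy coverage matrix are ours, chosen so that the greedy heuristic yields the slide's R = {T3, T5}.

```python
def greedy_reduce(coverage):
    """Greedy test-suite reduction: coverage maps each test to the
    set of requirements (here, statements) it covers."""
    uncovered = set().union(*coverage.values())
    reduced = set()
    while uncovered:
        # Pick the test covering the most still-uncovered requirements.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining requirements are uncoverable
        reduced.add(best)
        uncovered -= coverage[best]
    return reduced

# Toy matrix: T3 and T5 together cover every covered statement.
coverage = {
    "T1": {"S1"},
    "T2": {"S1", "S2"},
    "T3": {"S2", "S3", "S4"},
    "T4": {"S4"},
    "T5": {"S1", "S5"},
}
R = greedy_reduce(coverage)            # {"T3", "T5"}
size = 100 * len(R) / len(coverage)    # Size = |R| / |O| = 40.0
```

The same loop works for any requirement type (statements or mutants), which is how the ReqLoss metric can use either.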

RTS Background. Between Vi-1 and Vi, the change Δ modified S2 and added S6. RTS selects the tests that depend on the change: Si,Δ = {T1, T2, T3}; the other tests (T4, T5) are not selected. The metrics:
Size: Size = |Si,Δ| / |O| = 60%
Fault-Detection Capability: safe RTS does not select tests whose behavior does not change, so it does not fail to detect change-related faults.
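Coverage-based selection for this example can be sketched as below. This is an illustration of the idea only; real tools such as Ekstazi track dependencies differently (at the file level), and the names and matrix here are ours.

```python
def select_tests(coverage_old, changed):
    """Select every test whose coverage on the old revision
    intersects the set of changed code elements."""
    return {t for t, covered in coverage_old.items() if covered & changed}

coverage_old = {
    "T1": {"S1", "S2"},
    "T2": {"S2", "S3"},
    "T3": {"S2", "S4"},
    "T4": {"S4"},
    "T5": {"S5"},
}
# S2 changed; S6 is newly added, so no old test covered it.
selected = select_tests(coverage_old, {"S2"})    # {"T1", "T2", "T3"}
size = 100 * len(selected) / len(coverage_old)   # |Si,Δ| / |O| = 60.0
```

Newly added code (S6) is typically exercised by new tests, which a safe tool would also run.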

How can TSR and RTS be combined? From the qualitative comparison they are orthogonal, so how do we combine them?

Applying RTS after TSR: Selection of Reduction (SeRe). After diving into the details of TSR and RTS, we can see a way to combine the two approaches. First reduce the suite at Vi-1 to R = {T3, T5}. Then, when the change Δ modifies S2 and adds S6, apply RTS to the reduced suite, selecting only the reduced tests affected by the change: SRi,Δ = {T3}. The metrics:
Size: Size = |SRi,Δ| / |O| = 20%
Fault-Detection Capability: if RTS is safe, SeRe is as good as the reduced test suite.
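Composing the two previous sketches gives SeRe: run selection, but only over the reduced suite. Again a toy illustration with our own names and data, arranged to match the slide's SRi,Δ = {T3}.

```python
def sere(reduced, coverage_old, changed):
    """Selection of Reduction: RTS applied to a reduced test suite."""
    return {t for t in reduced if coverage_old[t] & changed}

coverage_old = {
    "T1": {"S1", "S2"},
    "T2": {"S2", "S3"},
    "T3": {"S2", "S4"},
    "T4": {"S4"},
    "T5": {"S5"},
}
reduced = {"T3", "T5"}                       # from a prior reduction at Vi-1
sr = sere(reduced, coverage_old, {"S2"})     # only T3 touches the change
size = 100 * len(sr) / len(coverage_old)     # |SRi,Δ| / |O| = 20.0
```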

Metrics to compare between approaches. Each approach already has some way to compare techniques within the same approach, and the size decrease is straightforward:
TSR: |R| / |O|; RTS: |Si,Δ| / |O|; SeRe: |SRi,Δ| / |O|
Fault-Detection Capability Decrease: currently there is NO metric for comparing fault-detection capability between approaches. We need a metric that takes CHANGE into account (more on that in a bit).

Map Tests to Faults. This is an idealized evaluation: we somehow have a mapping from failing tests T = {T1, ..., T5} to the faults F they detect, at both Vi-1 and Vi. Some faults existed before the change, and the developer consciously ignored them. Change-related faults are important because they are faults that can be detected only after the developer's changes. We need criteria that include these change-related faults.

Which is Better? One could potentially demand that the chosen tests detect all faults, but not all faults are change-related; should we care about faults that could already be detected before the change? Consider one suite that detects 5 faults but only 1 change-related fault, versus another that detects 4 faults but 2 change-related faults. If the criterion is to detect all faults, we can get misleading comparisons with respect to change-related faults.

Finding Change-Related Faults. Safe RTS will not fail to select tests whose behavior differs after the change, so faults detected by non-selected tests cannot be change-related:
ChangeRelatedFaultsi,Δ = Faults(Si,Δ) \ Faults(Oi \ Si,Δ)
Example: with Si,Δ = {T1, T2, T3}, Faults(Si,Δ) = {F1, F2, F3, F4, F6} and Faults(Oi \ Si,Δ) = Faults({T4, T5}) = {F1, F3, F5}, so ChangeRelatedFaultsi,Δ = {F1, F2, F3, F4, F6} \ {F1, F3, F5} = {F2, F4, F6}.
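The definition above is a direct set computation. The fault matrix below is illustrative data matching the slide's example numbers.

```python
def change_related_faults(fault_matrix, selected):
    """Faults detected by non-selected tests cannot be change-related:
    ChangeRelatedFaults = Faults(selected) \\ Faults(non-selected)."""
    def faults_of(tests):
        return set().union(*(fault_matrix[t] for t in tests)) if tests else set()
    non_selected = set(fault_matrix) - selected
    return faults_of(selected) - faults_of(non_selected)

fault_matrix = {
    "T1": {"F1", "F2"},
    "T2": {"F3", "F4"},
    "T3": {"F6"},
    "T4": {"F1", "F3"},
    "T5": {"F5"},
}
crf = change_related_faults(fault_matrix, {"T1", "T2", "T3"})
# Faults({T1,T2,T3}) = {F1,F2,F3,F4,F6}; Faults({T4,T5}) = {F1,F3,F5}
# so crf = {"F2", "F4", "F6"}
```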

Change-Related Requirements (CRR). Since we do not have the idealized mapping from tests to faults, we use testing requirements (statements covered or mutants killed), as used for TSR evaluation, to approximate the change-related fault-detection capability of a test suite T chosen from Oi:
CRR_Si,Δ(T) = req(Si,Δ ∩ T) \ req(T \ Si,Δ)
We then evaluate the loss in change-related fault-detection capability of the reduced test suite:
CRRLossi,Δ = 100 × |CRR_Si,Δ(Oi) \ CRR_Si,Δ(Ri)| / |CRR_Si,Δ(Oi)|
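The two formulas translate directly into set operations. The requirement data below (mutants killed per test) is a made-up example, not from the paper's experiments.

```python
def crr(req, selected, suite):
    """CRR_S(T) = req(S ∩ T) \\ req(T \\ S), with req mapping each
    test to the requirements (e.g., mutants) it satisfies."""
    def reqs_of(tests):
        return set().union(*(req[t] for t in tests)) if tests else set()
    return reqs_of(selected & suite) - reqs_of(suite - selected)

def crr_loss(req, selected, original, reduced):
    """CRRLoss = 100 * |CRR(O) \\ CRR(R)| / |CRR(O)|."""
    base = crr(req, selected, original)
    return 100 * len(base - crr(req, selected, reduced)) / len(base)

req = {  # mutants killed per test (illustrative)
    "T1": {"M1"},
    "T2": {"M2", "M3"},
    "T3": {"M4"},
    "T4": {"M2"},
    "T5": {"M5"},
}
loss = crr_loss(req, selected={"T1", "T2", "T3"},
                original={"T1", "T2", "T3", "T4", "T5"},
                reduced={"T3", "T5"})
```

Here CRR(O) = {M1, M3, M4} (M2 and M5 are also killed by non-selected tests) and CRR(R) = {M4}, so the loss is 2/3.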

Evaluation Setup

Projects: LOC ranges from 5,652 to 110,937; the number of tests ranges from 62 to 5,281.

Experimental Setup
Use the Greedy heuristic to perform TSR: remove redundant tests with respect to statement coverage. Statement coverage and mutants killed are collected using PIT (http://pitest.org).
Use Ekstazi to perform (safe) RTS: select tests based on file-level dependencies, at the test-class level (http://www.ekstazi.org).
Simulate evolving the reduced test suite and selection of reduction.

Evolving Reduced Test Suite. A reduced test suite does not necessarily stay static. Ideally we would ask developers how they evolve their reduced test suites, but since we cannot, we simulate the evolution. Starting from E0,0 = R0 at V0, when a new test (test4) is added at V1:
E0,1 = E0,0 ∪ (O1 \ O0)
and when tests are removed at V2:
E0,2 = E0,1 ∩ O2
In general, after reducing the test suite at revision r, the evolved reduced test suite at a subsequent revision i is:
Er,i = (Rr ∩ Oi) ∪ (Oi \ Or)
We can use this evolved reduced test suite everywhere we used the reduced test suite.
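The closed-form Er,i = (Rr ∩ Oi) ∪ (Oi \ Or) can be sketched directly; the test names and the choice of which tests are added or reduced below are illustrative.

```python
def evolve(reduced_r, original_r, original_i):
    """E_{r,i} = (R_r ∩ O_i) ∪ (O_i \\ O_r): keep reduced tests that
    still exist, and add every test introduced since revision r."""
    return (reduced_r & original_i) | (original_i - original_r)

O0 = {"test0", "test1", "test2", "test3"}
R0 = {"test1", "test3"}              # reduced suite computed at V0
O1 = O0 | {"test4"}                  # test4 added at V1
O2 = O1 - {"test0"}                  # test0 removed at V2

E01 = evolve(R0, O0, O1)             # {"test1", "test3", "test4"}
E02 = evolve(R0, O0, O2)             # removal of test0 does not affect it
```

The incremental rules on the slide (union with new tests, intersection with the current suite) compute the same sets step by step.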

Selection of Reduction. We can obtain the result of selection of reduction by looking at the tests chosen by TSR and RTS: given Er,i and Si,Δ, intersect the two to see which tests from Er,i are selected due to the changes:
SEi,Δ = Er,i ∩ Si,Δ

Size Comparison

Evaluation: Size Comparisons (Apache Commons-Lang). The plots show, per revision, |Er,i| / |Oi| × 100, |Si,Δ| / |Oi| × 100, and |SEi,Δ| / |Oi| × 100.

Evaluation: Size Comparisons (LA4J). The plots show, per revision, |Er,i| / |Oi| × 100, |Si,Δ| / |Oi| × 100, and |SEi,Δ| / |Oi| × 100.

Evaluation: Size Comparison (Aggregated). Aggregating |Er,i| / |Oi| × 100, |Si,Δ| / |Oi| × 100, and |SEi,Δ| / |Oi| × 100 across all projects (P7 = SQL-Parser, P15 = LA4J): RTS runs fewer tests than TSR (difference in medians of 40.15pp), and SeRe runs even fewer tests (difference in medians of 5.34pp).

Change-Related Fault-Detection Capability Comparison

Evaluation: Fault-Detection Capability Comparison. Using mutants as requirements:
CRR_Si,Δ(T) = mut(Si,Δ ∩ T) \ mut(T \ Si,Δ)
CRRLossi,Δ = 100 × |CRR_Si,Δ(Oi) \ CRR_Si,Δ(Er,i)| / |CRR_Si,Δ(Oi)|
TSR has a small loss in change-related fault-detection capability (highest median loss 5.93%, for JOpt-Simple). RTS has no loss. SeRe has the same loss as TSR.

Discussion. CRR is not an optimal way of measuring change-related fault-detection capability, but it is better than only looking at the changed portions of code. Finding better criteria is future work.

Conclusions. Regression testing is slow, but there are approaches to speed it up. Test-suite reduction (TSR) and regression test selection (RTS) are two such approaches, and we compare them quantitatively. RTS performs better than TSR: it runs fewer tests (by 40.15pp in the median) with no loss in change-related fault-detection capability. Selection of Reduction (SeRe) runs even fewer tests (by a further 5.34pp) with only a small loss in change-related fault-detection capability (at most 5.93% median loss). Contact: awshi2@illinois.edu

BACKUP

Threats to Validity. Results for the projects used in the evaluation may not generalize to all projects. RTS tracks dependencies at the file level and selects at the test-class level, while TSR tracks dependencies at the statement level and reduces at the test-method level; although RTS selects at a coarser granularity, our findings show that it still selects fewer tests on average than TSR. CRR relies on RTS being safe and precise; although the RTS tool is safe, it is imprecise, meaning possibly more requirements are considered change-related than actually should be.

Evaluation: SeRe Selection Ratio. Comparing |Si,Δ| / |Oi| × 100 against |SRi,Δ| / |Er,i| × 100, the ratios are very similar (median difference only 0.72pp). This is surprising: the second ratio is neither much smaller nor much larger. Much smaller would mean there is much redundancy among the selected tests that reduction helps remove; much larger would mean reduction tends to keep large tests, which are more likely to be affected by a change and thus always selected. The similarity suggests the reduced test suite is representative of the original test suite.

LA4J Re-Reduction

Evaluation: Size Comparisons (Apache Commons-Lang and Joda-Time). The plots show |Er,i| / |Oi| × 100 and |Si,Δ| / |Oi| × 100.

Evaluation: Size Comparisons (LA4J Reduced Early versus LA4J Reduced Late). The plots show |Er,i| / |Oi| × 100 and |Si,Δ| / |Oi| × 100.