Balancing Trade-Offs in Test-Suite Reduction

August Shi, Alex Gyori, Milos Gligoric, Andrey Zaytsev, and Darko Marinov 11/18/2014 FSE 22 Hong Kong NSF Grant Nos. CNS , CCF , CCF

Testing Can Be Slow
[Figure: Code Under Test exercised by tests test0, test1, test2, test3, …, testN-1]
Running lots of tests slows down development.

Speed Up by Removing Tests
[Figure: Code Under Test with tests …, testN-1, testN]
Testing can be sped up by removing tests.

Test-Suite Reduction
[Figure: Code Under Test with the remaining tests test1, test3, …, testN]
The reduced test suite has fewer tests than the full test suite but is representative of the full test suite on this code version. In other words, the goal is a reduced test suite that the test engineer finds acceptable: smaller than the full test suite, yet just as good.

Test-Suite Reduction and Changes
[Figure: changes take Version i to Version i+1; the same reduced test suite test1, test3, …, testN is run on both]
The same reduced test suite is run on future versions. Is the reduced test suite still representative of the full test suite on those future versions?

How does evolution affect reduced test suites?
We still cannot judge evolution’s effect!

Test-Suite Reduction
T = Tests, S = Statements
[Figure: coverage matrix marking with X which of the statements S1 to S5 each of the tests T1 to T5 covers]
Note: previous work evaluated test-suite reduction on a single version only; this work evaluates it across multiple versions (discussed later).
Reduced test suite R1 = {T3, T5}, obtained by Statement Adequate Reduction (SAR).

Test-Suite Reduction
T = Tests, S = Statements, M = Mutants
[Figure: matrix marking with X which of the statements S1 to S5 each test covers and which of the mutants M1 to M6 each test kills, for tests T1 to T5]

\[ \mathit{SizRed} = \frac{|T| - |R_1|}{|T|} \times 100 = \frac{5 - 2}{5} \times 100 = 60\% \]

\[ \mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_1)|}{|\mathit{stmt}(T)|} \times 100 = 0\% \]

\[ \mathit{MutLoss} = \frac{|\mathit{mut}(T)| - |\mathit{mut}(R_1)|}{|\mathit{mut}(T)|} \times 100 = \frac{6 - 4}{6} \times 100 = 33\% \]
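The three metrics for R1 can be computed mechanically from a coverage matrix and a kill matrix. The following is a minimal sketch in Python, not the authors' tooling. The matrix entries are hypothetical: the X marks on the slide are not recoverable from this transcript, so the entries were chosen only to agree with the slide's counts (5 tests, 5 statements, 6 mutants, R1 killing 4 mutants). The function names are illustrative as well.

```python
# Hypothetical matrices: statements covered and mutants killed per test.
# Entries are assumptions picked to reproduce the numbers on the slides.
STMTS = {
    "T1": {"S1", "S2"},
    "T2": {"S1"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S3", "S4"},
    "T5": {"S4", "S5"},
}
MUTANTS = {
    "T1": {"M1", "M2"},
    "T2": {"M1"},
    "T3": {"M1", "M3"},
    "T4": {"M3", "M4", "M5"},
    "T5": {"M5", "M6"},
}


def satisfied(suite, matrix):
    """Union of the requirements satisfied by the tests in `suite`."""
    return set().union(*(matrix[t] for t in suite))


def size_reduction(full, reduced):
    """SizRed: percentage of tests removed from the full suite."""
    return 100.0 * (len(full) - len(reduced)) / len(full)


def requirement_loss(full, reduced, matrix):
    """StmtLoss or MutLoss, depending on which matrix is passed in."""
    total = satisfied(full, matrix)
    kept = satisfied(reduced, matrix)
    return 100.0 * (len(total) - len(kept)) / len(total)


T = set(STMTS)
R1 = {"T3", "T5"}
print(size_reduction(T, R1))                    # 60.0
print(requirement_loss(T, R1, STMTS))           # 0.0
print(round(requirement_loss(T, R1, MUTANTS)))  # 33
```

Passing a different reduced suite, such as {"T1", "T4", "T5"}, to the same functions gives the corresponding numbers for that suite.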

Test-Suite Reduction
T = Tests, S = Statements, M = Mutants
[Figure: the same matrix of statements S1 to S5 and mutants M1 to M6 against tests T1 to T5]
Reduced test suite R2 = {T1, T4, T5}, obtained by Mutant Adequate Reduction (MAR).

[Figure: the same matrix of statements S1 to S5 and mutants M1 to M6 against tests T1 to T5]

\[ \mathit{SizRed} = \frac{|T| - |R_2|}{|T|} \times 100 = \frac{5 - 3}{5} \times 100 = 40\% \]

\[ \mathit{StmtLoss} = \frac{|\mathit{stmt}(T)| - |\mathit{stmt}(R_2)|}{|\mathit{stmt}(T)|} \times 100 = \frac{5 - 5}{5} \times 100 = 0\% \]

\[ \mathit{MutLoss} = \frac{|\mathit{mut}(T)| - |\mathit{mut}(R_2)|}{|\mathit{mut}(T)|} \times 100 = 0\% \]

Choosing a Test Suite
Single Version Evaluation

        SizRed   StmtLoss   MutLoss
R1      60%      0%         33%
R2      40%      0%         0%
T       0%       0%         0%

(Example) We still cannot judge evolution’s effect!

[Figure: coverage matrices for Version i and Version i+1; in Version i+1 the statements are S1’ to S6’, against tests T1 to T5]
For instance, changes to the code underneath shuffle the statement coverage. On Version i+1, the full test suite covers all 6 statements, but the reduced test suite does not cover all of them: it misses 2 compared to the full test suite. This is loss that we were originally unaware of, and the current metrics cannot show that there is loss due to evolution. How do we measure the change in loss?

Evolution-Aware Metrics
These metrics are concerned with the performance of a reduced test suite from an earlier version on a later version.
[Figure: matrices for Version i and Version i+1, first with statements S1’ to S6’ and then with mutants M1’ to M6’, against tests T1 to T5]

\[ \mathit{StmtLoss}_i^i = \frac{|\mathit{stmt}_i(T_i)| - |\mathit{stmt}_i(R_i)|}{|\mathit{stmt}_i(T_i)|} \times 100 = 0\% \]

\[ \mathit{MutLoss}_i^i = 33\% \]
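Evaluating a reduced test suite from an earlier version on a later version can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact definition: it assumes the loss on a later version is computed with the same formula, using the later version's requirements and only those reduced tests that still exist there by name. The Version i+1 matrix is hypothetical, chosen so that the full suite covers all 6 statements while R1 misses 2 of them, as in the example slide.

```python
# Hypothetical statement coverage per version (entries are assumptions).
VERSION_I = {
    "T1": {"S1", "S2"},
    "T2": {"S1"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S3", "S4"},
    "T5": {"S4", "S5"},
}
VERSION_I_PLUS_1 = {
    "T1": {"S1'", "S2'"},
    "T2": {"S6'"},
    "T3": {"S1'", "S3'"},
    "T4": {"S2'", "S4'"},
    "T5": {"S4'", "S5'"},
}


def satisfied(suite, matrix):
    """Union of the requirements satisfied by the tests in `suite`."""
    return set().union(*(matrix[t] for t in suite))


def loss_on_version(reduced_names, version_matrix):
    """Loss of a reduced suite (a set of test names) on a given version.

    Tests are tracked by name: reduced tests that no longer exist in the
    later version are simply dropped (an assumption of this sketch).
    """
    surviving = [t for t in reduced_names if t in version_matrix]
    total = satisfied(version_matrix, version_matrix)
    kept = satisfied(surviving, version_matrix)
    return 100.0 * (len(total) - len(kept)) / len(total)


R1 = {"T3", "T5"}  # reduced on Version i
loss_same = loss_on_version(R1, VERSION_I)          # 0.0
loss_later = loss_on_version(R1, VERSION_I_PLUS_1)  # about 33.3
print(loss_same, round(loss_later, 1))
```

The difference between the two values, in percentage points, is the kind of change in loss that a single-version metric cannot reveal.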

[Figure: coverage matrices for Version i (S1 to S5), Version i+1 (S1’ to S6’), and Version i+2 (S1’’ to S4’’), each against tests T1 to T5]
Requirements can be collected for multiple versions, and REC is computed pairwise:

\[ \mathit{REC}_d = \{ \mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i \} \]

Computing REC_1, from Version i to Version i+1: REC_1 = 40pp.
Going even further away, computing REC_2, from Version i to Version i+2: REC_2 = 25pp.
The reduction can also start at a later point, which gives more data for certain distances. Computing REC_1 again from Version i+1 to Version i+2: REC_1 = 40pp, 25pp and REC_2 = 25pp.
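Grouping pairwise REC values by distance d = j − i can be sketched as below. The sketch assumes that REC for a pair (i, j) is the loss of the suite reduced on version i when run on version j, minus its loss on version i itself, in percentage points; this sign convention is an assumption, not taken from the slides. The loss values are hypothetical inputs picked so that the grouped result matches the slide's REC_1 = 40pp, 25pp and REC_2 = 25pp.

```python
from collections import defaultdict
from statistics import median

# loss[(i, j)]: loss (in %) of the suite reduced on version i, evaluated
# on version j. Hypothetical values for three versions 0, 1, 2.
loss = {
    (0, 0): 0.0,
    (0, 1): 40.0,
    (0, 2): 25.0,
    (1, 1): 0.0,
    (1, 2): 25.0,
}


def group_rec_by_distance(loss):
    """Collect REC values for every pair i < j, keyed by d = j - i."""
    rec_by_d = defaultdict(list)
    for (i, j), value in sorted(loss.items()):
        if j > i:
            # Assumed convention: change relative to the reduction version.
            rec_by_d[j - i].append(value - loss[(i, i)])
    return dict(rec_by_d)


rec = group_rec_by_distance(loss)
print(rec)             # {1: [40.0, 25.0], 2: [25.0]}
print(median(rec[1]))  # 32.5
```

Starting the reduction at every available version, rather than only the first one, is what yields more than one data point for the shorter distances.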

(Evaluation) We are equipped to answer… We want to see how real projects’ reduced test suites behave

Evaluation: Implementation
Use PIT to collect statement coverage and mutants killed by tests on each version
Use Greedy heuristic to perform test-suite reduction
Use Statement Adequate Reduction and Mutant Adequate Reduction
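A greedy reduction heuristic can be sketched as follows: repeatedly pick the test that satisfies the most not-yet-satisfied requirements until the selected tests satisfy everything the full suite does. This is the classic greedy set-cover strategy, shown as a minimal illustration rather than the authors' implementation. The matrices are hypothetical stand-ins for data that would be collected by a tool such as PIT, and the alphabetical tie-breaking is an assumption of this sketch.

```python
# Hypothetical requirement matrices (entries are assumptions).
STMTS = {
    "T1": {"S1", "S2"},
    "T2": {"S1"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S3", "S4"},
    "T5": {"S4", "S5"},
}
MUTANTS = {
    "T1": {"M1", "M2"},
    "T2": {"M1"},
    "T3": {"M1", "M3"},
    "T4": {"M3", "M4", "M5"},
    "T5": {"M5", "M6"},
}


def greedy_reduce(matrix):
    """Return tests, in selection order, that satisfy every requirement
    satisfied by the full suite described by `matrix`."""
    uncovered = set().union(*matrix.values())
    selected = []
    while uncovered:
        # Test with the largest gain; ties go to the alphabetically first.
        best = max(sorted(matrix), key=lambda t: len(matrix[t] & uncovered))
        selected.append(best)
        uncovered -= matrix[best]
    return selected


sar_suite = greedy_reduce(STMTS)    # statement adequate: ['T3', 'T5']
mar_suite = greedy_reduce(MUTANTS)  # mutant adequate: ['T4', 'T1', 'T5']
print(sar_suite, mar_suite)
```

Running the same function on the statement matrix or on the mutant matrix corresponds to Statement Adequate Reduction and Mutant Adequate Reduction, respectively.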

Evaluation: Projects
Let’s see how evolution affects reduced test suites on some projects.
18 open-source projects from GitHub
Variety of applications
Distance between versions = 30 Git commits
Note: not every project has all 10 versions, because some projects are not as mature.
Median number of tests across versions
Note: this might not be all of a project’s tests, because of limitations of the tool.

Statement Adequate Reduction (SAR)

\[ \mathit{REC}_d = \{ \mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i \} \]

[Figure: plot of REC_d for SAR, aggregating all projects]
Median REC for SAR is around 0: quality does not drop much.

Mutant Adequate Reduction (MAR)

\[ \mathit{REC}_d = \{ \mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i \} \]

[Figure: plot of REC_d for MAR, aggregating all projects]
Median REC for MAR is around 0: it does not drop much.

(Answer) We can now answer the question!

Answer (based on our evaluation over 18 projects)
The quality of the reduced test suite relative to the quality of the full test suite remains about the same during evolution.

We further evaluated…
Different test-suite reduction algorithms
  Four other algorithms in addition to Greedy
Inadequate test-suite reduction
  Reduction that does not preserve all statements covered or mutants killed
…you can find more details in our paper

Threats to Validity
Reduced test suites evaluated with same metrics (statement coverage and killed mutants) used for reduction
Tests tracked by name
30 commits between versions

Related Work
Improving test-suite reduction
  Hao et al. [ICSE 2012]
  Lin and Huang [IST 2009]
  Yoo and Harman [ISSTA 2007]
  Jeffrey and Gupta [TSE 2007]
  Black et al. [ICSE 2004]
  Offutt et al. [ICTCS 1995]
Effect of software evolution on coverage
  Elbaum et al. [ICSM 2001]

Conclusions
Q: How does software evolution affect test-suite reduction?
Introduced new evolution-aware metrics (REC)
Performed the largest evaluation of test-suite reduction
  Different metrics/algorithms, inadequate test-suite reduction
A: Reduced test suites do not reduce in quality relative to the full test suite over time
  If reduced test suite is acceptable for this version, then it is likely acceptable in future versions
August Shi:

BACKUP

Project Filters
Uses Maven build system
Has > 100 commits in history
HEAD at time of experiments could be successfully run through PIT
Has > 4 versions working through PIT in history

Evaluation: Basic Test-Suite Reduction

Different Algorithms?
[Figure: coverage matrices for Version i (statements S1 to S5) and Version j (statements S1 to S6), each against tests T1 to T5]
Different algorithms can behave differently under evolution
Greedy, HGS, GRE, ILP, GE

Evaluation: Other Algorithms
Other 4 algorithms produce very similar results compared to Greedy
Difference in size reduction at most 5.26pp across all algorithms
Difference in MutLoss for SAR algorithms at most 7.15pp
Difference in StmtLoss for MAR algorithms at most 4.15pp
Difference in REC_d for any distance d at most 0.33pp for MutLoss, 0.67pp for StmtLoss

Inadequate Reduction?
[Figure: coverage matrices for Version i and Version j, statements S1 to S5 against tests T1 to T5]
Inadequate reduction if budget is very tight
How does inadequately reduced test suite compare over time?

Evaluation: Inadequate Reduction
Statement Inadequate Reduction (SIR)
Mutant Inadequate Reduction (MIR)
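One plausible way to obtain an inadequate reduction under a tight budget is to stop a greedy selection early, once a maximum number of tests has been chosen, and accept the resulting loss. The sketch below illustrates that idea only; the slides do not specify how the inadequate suites were built, so the budget parameter `max_tests`, the alphabetical tie-breaking, and the matrix entries are all assumptions.

```python
# Hypothetical statement coverage per test (entries are assumptions).
STMTS = {
    "T1": {"S1", "S2"},
    "T2": {"S1"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S3", "S4"},
    "T5": {"S4", "S5"},
}


def greedy_reduce_with_budget(matrix, max_tests):
    """Greedy selection that stops after `max_tests` tests, even if some
    requirements of the full suite remain unsatisfied."""
    uncovered = set().union(*matrix.values())
    selected = []
    while uncovered and len(selected) < max_tests:
        # Test with the largest gain; ties go to the alphabetically first.
        best = max(sorted(matrix), key=lambda t: len(matrix[t] & uncovered))
        selected.append(best)
        uncovered -= matrix[best]
    return selected, uncovered


suite, missed = greedy_reduce_with_budget(STMTS, max_tests=1)
total = set().union(*STMTS.values())
stmt_loss = 100.0 * len(missed) / len(total)
print(suite, sorted(missed), stmt_loss)  # ['T3'] ['S4', 'S5'] 40.0
```

With a budget large enough to cover everything, the same function returns an adequate suite and an empty set of missed requirements.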

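Evaluation: SIR Evolution

\[ \mathit{REC}_d = \{ \mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i \} \]

[Figure: plot of REC_d for SIR, results for Greedy]
Trends are stable, and the same for all requirements used in reduction. These results are for Greedy; other algorithms show similar trends.

Evaluation: MIR Evolution

\[ \mathit{REC}_d = \{ \mathit{REC}_i^j \mid \text{for all } i \text{ and } j \text{ such that } d = j - i \} \]

[Figure: plot of REC_d for MIR, results for Greedy]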

Code Change
