Analyzing the Validity of Selective Mutation with Dominator Mutants

Slides:

Advertisements

Similar presentations

Data Flow Coverage. Reading assignment L. A. Clarke, A. Podgurski, D. J. Richardson and Steven J. Zeil, "A Formal Evaluation of Data Flow Path Selection.

Advertisements

Informed Search Methods How can we improve searching strategy by using intelligence? Map example: Heuristic: Expand those nodes closest in “as the crow.

MUTATION TESTING. Mutation testing is a technique that focuses on measuring the adequacy (quality) of test data (or test cases). Modify a program by introducing.

An Analysis and Survey of the Development of Mutation Testing by Yue Jia and Mark Harmon A Quick Summary For SWE6673.

Of 17 Assessing the Influence of Multiple Test Case Selection on Mutation Experiments Marcio E. Delamaro and Jeff Offutt George Mason University & Universidade.

Mutant Subsumption Graphs Mutation 2014 March 31, 2014 Bob Kurtz, Paul Ammann, Marcio Delamaro, Jeff Offutt, Lin Deng.

Trees. 2 Definition of a tree A tree is like a binary tree, except that a node may have any number of children Depending on the needs of the program,

Dealing with NP-Complete Problems

(c) 2007 Mauro Pezzè & Michal Young Ch 16, slide 1 Fault-Based Testing.

Trees. Definition of a tree A tree is like a binary tree, except that a node may have any number of children –Depending on the needs of the program, the.

Parameterizing Random Test Data According to Equivalence Classes Chris Murphy, Gail Kaiser, Marta Arias Columbia University.

Trees. 2 Definition of a tree A tree is a node with a value and zero or more children Depending on the needs of the program, the children may or may not.

Introduction to Software Testing Chapter 5.2 Program-based Grammars Paul Ammann & Jeff Offutt

Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.

Software Testing and Validation SWE 434

Class Specification Implementation Graph By: Njume Njinimbam Chi-Chang Sun.

CMSC 345 Fall 2000 Unit Testing. The testing process.

© SERG Dependable Software Systems (Mutation) Dependable Software Systems Topics in Mutation Testing and Program Perturbation Material drawn from [Offutt.

Timothy Reeves: Presenter Marisa Orr, Sherrill Biggers Evaluation of the Holistic Method to Size a 3-D Wheel/Soil Model.

Recursion and Dynamic Programming. Recursive thinking… Recursion is a method where the solution to a problem depends on solutions to smaller instances.

Testing Testing Techniques to Design Tests. Testing:Example Problem: Find a mode and its frequency given an ordered list (array) of with one or more integer.

Mutation Testing G. Rothermel. Fault-Based Testing White-box and black-box testing techniques use coverage of code or requirements as a “proxy” for designing.

Bug Localization with Association Rule Mining Wujie Zheng

Introduction to Software Testing Chapter 9.2 Program-based Grammars Paul Ammann & Jeff Offutt

Establishing Theoretical Minimal Sets of Mutants ICST 2014 Paul Ammann Joint work with Marcio Eduardo Delamaro Jeff Offutt April 1, 2014.

Week 5-6 MondayTuesdayWednesdayThursdayFriday Testing III No reading Group meetings Testing IVSection ZFR due ZFR demos Progress report due Readings out.

Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.

Some facts about the cal() program February 5, 2015 Paul Ammann, SWE 437 with thanks to Bob Kurtz.

1 Using a Fault Hierarchy to Improve the Efficiency of DNF Logic Mutation Testing Gary Kaminski and Paul Ammann ICST 2009.

Mutation Testing Breaking the application to test it.

Mutation Testing Laraib Zahid & Mariam Arshad. What is Mutation Testing?  Fault-based Testing: directed towards “typical” faults that could occur in.

Introduction to Software Testing (2nd edition) Chapter 5 Criteria-Based Test Design Paul Ammann & Jeff Offutt

Introduction to Machine Learning, its potential usage in network area,

Department of Mathematics

A Review of Software Testing - P. David Coward

Review: Tree search Initialize the frontier using the starting state

Paul Ammann & Jeff Offutt

User-Written Functions

Linguistic Graph Similarity for News Sentence Searching

BackTracking CS255.

Sofus A. Macskassy Fetch Technologies

Introduction to Software Testing Chapter 9.2 Program-based Grammars

Rule Induction for Classification Using

Presented by: Dr Beatriz de la Iglesia

White-Box Testing.

Data Mining (and machine learning)

Introduction to Software Testing Chapter 9.2 Program-based Grammars

Heuristic-based Recommendation for Metamodel–OCL Coevolution

August Shi, Tifany Yung, Alex Gyori, and Darko Marinov

Fun facts about the cal() program

Fabiano Ferrari Software Engineering Federal University of São Carlos

White-Box Testing.

Introduction to Software Testing Chapter 5.2 Program-based Grammars

Test Case Purification for Improving Fault Localization

A Few Review Questions.

Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.

Control Structure Testing

Examining Variables on Flow Paths

More on HW 2 (due Jan 26) Again, it must be in Python 2.7.

George Mason University

CS 416 Artificial Intelligence

Paul Ammann & Jeff Offutt

Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.

Area Coverage Problem Optimization by (local) Search

Mutation Testing Faults are introduced into the program by creating many versions of the program called mutants. Each mutant contains a single fault. Test.

Presentation transcript:

Analyzing the Validity of Selective Mutation with Dominator Mutants FSE 2016, Seattle, Washington, USA Bob Kurtz, Paul Ammann, Jeff Offutt, Marcio E. Delamaro, Mariet Kurtz, and Nida Gökçe George Mason University, USA Universidade de São Paulo, Brazil The MITRE Corporation Muğla Sıtkı Koçman University, Turkey

Mutation testing in 2016 MASON GMU FSE 2016 ACME Co.

GMU Mutation testing in 2016 int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i <= j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return j; } else { int max(int i, int j) { if (i > j) { return i; } else { return j++; Mutation Operators ROR LVR AOI MASON GMU FSE 2016 ACME Co.

≠ GMU   Mutation testing in 2016 Original Mutants assertEquals(2, int max(int i, int j) { if (i > j) { return i; } else { return j; ≠   int max(int i, int j) { if (i <= j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return j; } else { int max(int i, int j) { if (i > j) { return i; } else { return j++; int max(int i, int j) { if (i > j) { return i; } else { int max(int i, int j) { if (i <= j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return j; } else { int max(int i, int j) { if (i > j) { return i; } else { return j++; int max(int i, int j) { if (i > j) { return i; } else { int max(int i, int j) { if (i >= j) { return i; } else { return j; int max(int i, int j) { if (i > i) { return i; } else { return j; int max(int i, int j) { if (i) { return i; } else { return j; int max(int i, int j) { if (i > j) { return --i; } else { return j; assertEquals(2, max(2,1)); int max(int i, int j) { if (i >= j) { return i; } else { return j; int max(int i, int j) { if (i > i) { return i; } else { return j; int max(int i, int j) { if (i) { return i; } else { return j; int max(int i, int j) { if (i > j) { return --i; } else { return j; MASON GMU FSE 2016 ACME Co.

Mutation testing in 2016 20,000 Mutants 2,000 SLOC ACME Co. FSE 2016 int max(int i, int j) { if (i > j) { return i; } else { return j; 2,000 SLOC FSE 2016 ACME Co.

Mutation testing in 2016 FSE 2016 ACME Co.

=  Mutation testing in 2016 int max(int i, int j) { if (i > j) { return i; } else { return j; if (i >= j) { =  FSE 2016 ACME Co.

Mutation testing in 2016 FSE 2016 ACME Co.

Path too long, no end in sight! Mutation testing in 2016 ??? Equivalent mutants! You can’t kill ‘em! How many? 10-20% Path too long, no end in sight! MASON GMU FSE 2016 ACME Co.

Mutation testing in 2016 FSE 2016 ACME Co.

=    What went wrong? Equivalent mutants Syntactically different but semantically identical to the original program Cannot be killed by tests, must be manually evaluated one-by-one Requires unrealistic amounts of work! int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i >= j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return i; } else { return j++; =    FSE 2016

What went wrong? Redundant mutants A mutant is redundant if it is always killed when some other mutant is killed ≈98% of non-equivalent mutants How far along is testing? FSE 2016

Mutant reduction strategies int max(int i, int j) { if (i > j) { return i; } else { return j; Selective mutation Use the “best” operators to produce fewer mutants CCDL ORAN VLSR … OAAN SRSR FSE 2016

Mutant reduction strategies int max(int i, int j) { if (i > j) { return i; } else { return j; Random mutant selection Typically select ≈5% of all mutants CCDL OAAN ORAN SRSR VLSR … FSE 2016

Research questions Selective mutation has generally been thought of as the best approach since 1993 Random mutant selection has been used since 1980 How effective are these techniques? Can we improve on them? FSE 2016

mi → mj Mutant subsumption Given a set of mutants M, mutant mi subsumes mutant mj (mi → mj) iff: Some test kills mi All tests that kill mi also kill mj All Tests + Tests that kill mj Tests that kill mk Tests that kill mi mi → mj FSE 2016 [Ammann, et al., ICST 2014]

Subsumption graph m1 m1 m2 m3 m2 m3 m4 m4 m5 m5 m6 m7 m6 m7 m8 m8 Shows the subsumption relationship between mutants Leaf nodes not subsumed by any other mutant are “dominator mutants” Tests that kill all these dominators will also kill all other mutants All the other mutants are redundant! m1 m1 m2 m3 m2 m3 m4 m4 m5 m5 m6 m7 m6 m7 m8 m8 FSE 2016 [Kurtz, et al., Mutation 2014]

Using selective mutation, random mutants, or some other technique Testing model Begin Using selective mutation, random mutants, or some other technique Mutants All Mutants Select some mutant set M Mutants All Tests Select minimal test set T that kills M Find all mutants killed by T Mutation Scores Dominator Scores End FSE 2016

Nondeterministic process Testing model Begin Mutants All Mutants Select some mutant set M Nondeterministic process Mutants All Tests Select minimal test set T that kills M Find all mutants killed by T Mutation Scores Dominator Scores End FSE 2016

Testing model Begin Mutants All Mutants Select some mutant set M #𝑚𝑢𝑡𝑎𝑛𝑡 𝑠 𝑘𝑖𝑙𝑙𝑒𝑑 #𝑚𝑢𝑡𝑎𝑛𝑡 𝑠 𝑡𝑜𝑡𝑎𝑙 −#𝑚𝑢𝑡𝑎𝑛𝑡 𝑠 𝑒𝑞𝑢𝑖𝑣𝑎𝑙𝑒𝑛𝑡 Mutants All Tests Select minimal test set T that kills M #𝑑𝑜𝑚𝑖𝑛𝑎𝑡𝑜 𝑟𝑠 𝑘𝑖𝑙𝑙𝑒𝑑 #𝑑𝑜𝑚𝑖𝑛𝑎𝑡𝑜 𝑟𝑠 𝑡𝑜𝑡𝑎𝑙 Find all mutants killed by T Mutation Scores Dominator Scores End FSE 2016

Siemens suite scores Mutation and dominator score using the 5 E-Selective mutation operators from Mothra [Offutt, et al., 1993] FSE 2016 [Ammann, et al., ICST 2014]

The Siemens suite? Really?? REAL SOFTWARE SIEMENS SUITE FSE 2016

The Siemens suite? Really! The Siemens suite is useful here because: It has a thorough test suite that elicits rich subsumption relationships between mutants The Proteum mutation tool for C has a very large set of mutation operators If we can show that selective mutation is not a good solution for any particular program, then it is not a good solution for programs in general FSE 2016

Mutation vs. Dominator Score Mutation score for the Siemens replace program (1,000 iterations) FSE 2016 [Kurtz, et al., Mutation 2016]

Mutation vs. Dominator Score Mutation and dominator score for the Siemens replace program FSE 2016 [Kurtz, et al., Mutation 2016]

1-4 Operators for print_tokens Our Goal We define work as the number of mutants the engineer must evaluate (write a killing test or to show it’s equivalent) FSE 2016

1-4 Operators for print_tokens 231,525 operator combinations We define work as the number of mutants the engineer must evaluate (to write a killing test or to show it’s equivalent) FSE 2016

1-4 Operators for print_tokens Optimal selective operator combinations for this program 231,525 operator combinations FSE 2016

1-4 Operators for Siemens suite Operator combination are far less optimal for all programs There is no set of up to 4 operators that is good for every program 489,504 operator combinations FSE 2016

Selective and random Traditional mutant reduction approaches are not optimal, consistent with prior research that selective and random are similar FSE 2016

More than 4 operators m1 m1 m2 m3 m2 m3 m4 m4 m5 m6 m5 m5 m6 m7 m6 m7 We’d like to expand to sets of mutation operators larger than 4 Brute force approach is infeasible 1-4 operators = 489,504 sets ≈ 24 hours 1-59 operators ≈ 6x1017 sets ≈ 3x109 years Approximate test results based on subsumption Spearman’s rank ρ=0.966 Recalculate best operator combinations in the usual way m1 m1 m2 m3 m2 m3 m4 m4 m5 m6 m5 m5 m6 m7 m6 m7 m8 m8 FSE 2016

Optimal selective solutions Best 1-program solutions Best 1-59 operator combinations Even the optimal operator combinations are not very good FSE 2016

Is mutation testing dead? I don’t think so, but we need to do something better! FSE 2016

Mutation testing in 2020? Use machine learning to generate optimized mutants based on program features int max(int i, int j) { if (i > j) { return j; } else { int max(int i, int j) { if (i > j) { return i; } else { return j++; int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return i; } else { int max(int i, int j) { if (i > i) { return i; } else { return j; int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i) { return i; } else { return j; int max(int i, int j) { if (i > j) { return --i; } else { return j; FSE 2016 [Future work]

Mutation testing in 2020? Use machine learning to generate optimized mutants based on program features Use static analysis to determine partial subsumption & tests int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return j; } else { int max(int i, int j) { if (i > j) { return i; } else { return j++; int max(int i, int j) { if (i > j) { return i; } else { int max(int i, int j) { if (i > i) { return i; } else { return j; int max(int i, int j) { if (i) { return i; } else { return j; int max(int i, int j) { if (i > j) { return --i; } else { return j; FSE 2016 [Kurtz, et al., Mutation 2015]

Mutation testing in 2020? Use machine learning to generate optimized mutants based on program features Use static analysis to determine partial subsumption & tests Execute tests to refine subsumption and kill mutants int max(int i, int j) { if (i > j) { return i; } else { return j; FSE 2016 [Kurtz, et al., Mutation 2015]

Mutation testing in 2020? Use machine learning to generate optimized mutants based on program features Use static analysis to determine partial subsumption & tests Execute tests to refine subsumption and kill mutants Remove subsumed mutants and redundant tests  int max(int i, int j) { if (i > j) { return i; } else { return j; FSE 2016 [Kurtz, et al., Mutation 2014]

Mutation testing in 2020? Use machine learning to generate optimized mutants based on program features Use static analysis to determine partial subsumption & tests Execute tests to refine subsumption and kill more mutants Remove subsumed mutants and redundant tests Output a set of tests and a FEW probable-high-value mutants for the engineer to kill int max(int i, int j) { if (i > j) { return i; } else { return j; int max(int i, int j) { if (i > j) { return i; } else { return --i; return j; FSE 2016

Conclusions Mutation score is an imprecise metric inflated by redundant mutants Researchers should use dominator score instead Current mutant selection techniques are not optimal There are no mutation operator combinations that are effective for a range of programs To optimize dominator score per unit of work, we need to customize mutants to the program under test! FSE 2016

Backup Slides FSE 2016

Future work Develop a mutant selection approach that is customized for a program Can’t tell if a mutation is good or bad without program context Use machine learning Correlate program features and mutation operators with mutant “goodness” Select more dominator and near-dominator mutants Select fewer highly- subsumed and equivalent mutants  FSE 2016

Mutant Subsumption Graphs Given the following score function: m1 m3,m6 m5 m2 t1 t2 t3 t4 m1  m2 m3 m4 m5 m6 m7 m8 m9 m10 m7,m9 m4 m8,m10 When we construct the mutant subsumption graph from the score function, we see three root nodes that are not subsumed by any other mutants. One mutant from each of these nodes forms a dominator set: { m1, m3, m5 }, { m1, m6, m5 } All other mutants are redundant! FSE 2016

Dominator mutation score Assume we execute test t1 m1 m3,m6 m5 m2 t1 t2 t3 t4 m1  m2 m3 m4 m5 m6 m7 m8 m9 m10 m7,m9 m4 m8,m10 Killed mutants are shown in gray Mutation score: 7 of 9 killable mutants = 0.78 Dominator score: 1 of 3 mutants in a dominator set = 0.33 FSE 2016

Redundancy and equivalency We want to investigate how the accuracy changes as the number of redundant and equivalent mutants change, we need a way to measure redundancy and equivalency, preferably in a decoupled manner FSE 2016

Work and Normalized Work How much effort is required? We use a simple definition: the number of mutants that a tester must examine To effectively compare work between different programs with different numbers of mutants, we define equivalent work: FSE 2016

Redundancy and Work FSE 2016

Equivalency and Work FSE 2016

Average / worst-case correlation Spearman’s rank correlation ρ=0.966 FSE 2016

What’s an optimal solution? We score non-optimal points using the Hausdorff distance (dH), the distance from the nearest optimal point Hausdorff Distance dh FSE 2016

Today’s common techniques Mutant Selection Technique Hausdorff Distance (smaller is better) SSDL 0.104 E-Selective 0.134 5% Random 0.150 10% Random 0.274 15% Random 0.147 20% Random 0.229 25% Random 0.111 FSE 2016

Incrementally better Operator set “A” Operator set “B” 9 mutation operators Same dominator score as E- Selective Only 42% of the work Operator set “B” 8 mutation operators Same work as E-Selective 29% higher dominator score Operator set “C” 14 mutation operators A knee in the curve None are as good as the best solutions for a single program! FSE 2016

Determining local context FSE 2016

Threats to validity The Siemens suite is a small number of small programs Are they really representative of real-world programs? Existence of unkilled (but killable) mutants injects errors into determining dominator scores Specific operator sets identified as optimal may not be optimal, but broader points are not affected FSE 2016