Establishing Theoretical Minimal Sets of Mutants ICST 2014 Paul Ammann Joint work with Marcio Eduardo Delamaro Jeff Offutt April 1, 2014.


2 Outline
The situation
– Researchers use mutation analysis to evaluate test selection strategies
The problem
– What do mutation scores mean?
The model
– Motivating idea: Minimal mutant sets don’t have redundant mutants
  Need to define notion of redundancy
– Main result: Dynamic subsumption = Minimal mutant sets
Reduced mutation: Is it close to minimal?
– Apply model to Siemens suite
  Result: Huge gap
– Good news: That’s an opportunity!

3 Researchers Use Mutation Analysis to Evaluate Test Selection Strategies
[Diagram labels: Carefully Chosen Artifacts; Deep Analysis; “Good” Tests; Select Test Sets with Test Selection Strategies (A, B, C); Test Sets A, B, C; Measure]
Exactly what does a score of 91% mean?

4 The Problem With Mutation Scores
[Figure: kill matrix of mutants m1 to m4 against tests t1 to t5]
Evaluate 3 Test Sets with 4 Mutants: A: {t1, t2}, B: {t2, t5}, C: {t3}
Mutation Scores for 3 Test Sets: B scores 75%. Is that good?

5 Let’s Add Some More Mutants
[Figure: kill matrix of mutants m1 to m10 against tests t1 to t5]
Evaluate 3 Test Sets with 10 Mutants: A: {t1, t2}, B: {t2, t5}, C: {t3}
Mutation Scores for 3 Test Sets: Now B scores 90%! Did B just get better?
– Every test kills m8. What’s the point?
– The same tests kill m3 and m6. We say that T does not distinguish m3 from m6.
– Ditto for m5 and m9.

6 Let’s Throw Away Some Mutants
[Figure: kill matrix of mutants m1 and m2 against tests t1 to t5]
Evaluate 3 Test Sets with 2 Mutants: A: {t1, t2}, B: {t2, t5}, C: {t3}
Mutation Scores for 3 Test Sets: Now B scores 100%. Did B get even better?
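The way B's score moves from 75% to 100% as mutants are dropped can be sketched in a few lines of Python. The kill matrix figure is not part of this transcript, so the `KILLS` table below is a hypothetical one, chosen only so that it agrees with the numbers quoted on the slides; `mutation_score` is likewise an illustrative helper, not something from the talk.

```python
# Hypothetical kill matrix: mutant -> set of tests that kill it.
# Assumption: the original figure is unavailable; these sets were picked to
# match the scores quoted for test set B on the slides.
KILLS = {
    "m1": {"t1", "t4", "t5"},
    "m2": {"t1", "t2", "t3", "t4"},
    "m3": {"t2", "t3", "t4"},
    "m4": {"t1", "t4"},
}


def mutation_score(tests, mutants, kills=KILLS):
    """Fraction of `mutants` killed by at least one test in `tests`."""
    tests = set(tests)
    killed = [m for m in mutants if kills[m] & tests]
    return len(killed) / len(mutants)


B = {"t2", "t5"}
print(mutation_score(B, ["m1", "m2", "m3", "m4"]))  # 0.75
print(mutation_score(B, ["m1", "m2"]))              # 1.0
```

The test set is identical in both calls; only the denominator changed, which is the point the slides make about interpreting a raw score.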

7 All Together Now
[Figure: kill matrix of mutants m1 to m10 against tests t1 to t5]
Evaluate 3 Test Sets with Various Mutants: A: {t1, t2}, B: {t2, t5}, C: {t3}
Cumulative Scores: Is B lousy or good? What about C?

8 What Makes a Mutant Redundant?
[Figure: kill matrix of mutants m1 to m4 against tests t1 to t5]
Choose M = {m1, m2, m3, m4}
Choose T = {t1, t2, t3, t4, t5}
Minimal test sets wrt M and T: {t1, t2}, {t1, t3}, {t4}
Basic Idea: Throwing away a redundant mutant has no effect on the minimal test sets.
– Try removing m1: M1 = M - {m1}
  Minimal test sets wrt M1 and T: {t1, t2}, {t1, t3}, {t4}
  No change, so m1 is redundant
  Ditto for M2 = M - {m2}
– Try removing m3: M3 = M - {m3}
  Minimal test sets wrt M3 and T: {t1}, {t4}
  A change, so m3 is not redundant
– Try removing m4: M4 = M - {m4}
  Minimal test sets wrt M4 and T: {t1, t2}, {t1, t3}, {t2, t5}, {t3, t5}, {t4}
  A change, so m4 is not redundant
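The redundancy check on this slide can be reproduced by brute force. The sketch below enumerates all minimal test sets and compares them before and after removing a mutant. The `KILLS` table is an assumption: the slide's figure is missing, so the sets were chosen to be consistent with the minimal test sets listed above. The function names are illustrative. The enumeration is exponential in the number of tests, so this is only practical for toy examples.

```python
from itertools import combinations

# Hypothetical kill matrix (assumption: reconstructed to match the minimal
# test sets quoted on the slide, not copied from the original figure).
KILLS = {
    "m1": {"t1", "t4", "t5"},
    "m2": {"t1", "t2", "t3", "t4"},
    "m3": {"t2", "t3", "t4"},
    "m4": {"t1", "t4"},
}
TESTS = ["t1", "t2", "t3", "t4", "t5"]


def kills_all(test_set, mutants):
    """True if every mutant is killed by at least one test in test_set."""
    return all(KILLS[m] & test_set for m in mutants)


def minimal_test_sets(mutants, tests=TESTS):
    """All test sets that kill every mutant and have no adequate proper subset."""
    found = []
    # Candidates are visited by increasing size, so any adequate proper subset
    # of a candidate has already been recorded in `found`.
    for size in range(1, len(tests) + 1):
        for combo in combinations(tests, size):
            candidate = frozenset(combo)
            if kills_all(candidate, mutants) and not any(s < candidate for s in found):
                found.append(candidate)
    return set(found)


def is_redundant(mutant, mutants):
    """A mutant is redundant if removing it leaves the minimal test sets unchanged."""
    rest = [m for m in mutants if m != mutant]
    return minimal_test_sets(rest) == minimal_test_sets(mutants)


M = ["m1", "m2", "m3", "m4"]
for m in M:
    print(m, "redundant" if is_redundant(m, M) else "not redundant")
```

With this table, m1 and m2 come out redundant while m3 and m4 do not, matching the slide.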

9 Minimal Sets of Mutants
Definition
– M is minimal if it does not contain redundant mutants
Minimal mutant sets from the definition
– Requires computing all minimal test sets, which is NP-complete
We need an efficient algorithm for finding minimal mutant sets
– Turn to dynamic subsumption
  Subsumption with respect to a test set

10 Dynamic Subsumption
[Figure: test set T with regions “Tests that kill mi”, “Tests that kill mj”, and “Tests that kill mk”; annotations mi → mj and mi → mk, with ✔ and ✖ marks]

11 Efficiently Computing Minimal Sets of Mutants
Formally: mx dynamically subsumes my wrt T iff
– Some test in T kills mx
– Every test in T that kills mx also kills my
Main result: Mutant set M minimal wrt T = no dynamic subsumption in M
Properties
– Only need to consider mutants in pairs
  Groups of mutants do not make another mutant redundant
– Fast
– Every minimal mutant set has the same cardinality
  Contrast with minimal test sets
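A minimal sketch of the pairwise computation, under stated assumptions: the `KILLS` table is hypothetical, mutants that no test kills are set aside, and when several mutants are killed by exactly the same tests only the first one encountered is kept as the representative. The function names are illustrative, not taken from the talk or from any tool.

```python
# Hypothetical kill matrix: mutant -> set of tests in T that kill it.
KILLS = {
    "m1": {"t1", "t4", "t5"},
    "m2": {"t1", "t2", "t3", "t4"},
    "m3": {"t2", "t3", "t4"},
    "m4": {"t1", "t4"},
}


def dynamically_subsumes(mx, my, kills):
    """True if some test kills mx and every test that kills mx also kills my."""
    return bool(kills[mx]) and kills[mx] <= kills[my]


def minimal_mutants(kills):
    """Return one minimal mutant set: no kept mutant is subsumed by another."""
    kept = []
    for m in kills:
        if not kills[m]:
            continue  # assumption: unkilled mutants are left out of the analysis
        if any(kills[k] == kills[m] for k in kept):
            continue  # indistinguished from a mutant already kept
        kept.append(m)
    # Kill sets in `kept` are pairwise distinct, so any subsumption left is strict.
    return [m for m in kept
            if not any(o != m and dynamically_subsumes(o, m, kills) for o in kept)]


print(minimal_mutants(KILLS))  # ['m3', 'm4']
```

Only pairs of mutants are compared, so the cost is quadratic in the number of distinguished mutants, in contrast with enumerating every minimal test set.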

12 What Does This Mean in Practice?
Apply the definitions to the Siemens test bed
– See what happens!
7 programs
– print_tokens
– print_tokens2
– replace
– schedule
– schedule2
– tcas
– totinfo
Extensive hand-crafted test set

13 Test Characteristics
Notes:
– 512 is an artifact of the Proteum tool
  Our approach applies with any test set
– Most tests used were also distinguished
– Minimal test set size modest compared to number of tests used
[Table: Program | Tests Available | Tests Used | Distinguished Tests | Minimal Tests; rows print_tokens, print_tokens2, replace, schedule, schedule2, tcas, totinfo; numeric values not present in this transcript]

14 Mutant Characteristics
“Killed Mutants” means those killed by the test set of size 512
Vast majority of remainder are equivalent
[Table: Program | Total Mutants | Killed Mutants | Distinguished Mutants | Minimal Mutants; rows print_tokens, print_tokens2, replace, schedule, schedule2, tcas, totinfo; numeric values not present in this transcript]
Most mutants are redundant!
– Tiny fraction of mutants are actually minimal wrt 512 tests!
– print_tokens: Killing the right 28 mutants guarantees killing all 3711

15 How Good Are Reduced Mutation Strategies?
We considered five approaches to reduced mutation
– STMT: Statement deletion (Proteum SSDL)
– ROR: Relational operators (Proteum ORRN)
– CON: Replace scalars with constants (Proteum CCSR)
– 5RND: 5% Random selection of all mutants
– SELECT: “Selective” mutation (Proteum OOAN, OLLN, ORRN, OLNG)
Method:
– Choose test sets adequate for each reduced mutation approach wrt test sets analyzed earlier
– Compute mutation score
  Against all mutants
  Against minimal mutant set
  Equivalent mutants hand-identified and removed
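The two scores in the method, against all mutants and against a minimal mutant set, can be compared on a toy example. Everything in this sketch is an assumption for illustration: the `KILLS` data is hypothetical rather than taken from the Siemens study, and `minimal_mutants` and `score` are illustrative helpers.

```python
# Hypothetical kill data (assumption: not from the Siemens experiments).
KILLS = {
    "m1": {"t1", "t4", "t5"},
    "m2": {"t1", "t2", "t3", "t4"},
    "m3": {"t2", "t3", "t4"},
    "m4": {"t1", "t4"},
}


def minimal_mutants(kills):
    """Killed mutants whose kill set has no strictly smaller kill set in the pool."""
    distinct = {}
    for mutant, tests in kills.items():
        if tests:  # assumption: unkilled mutants are ignored
            distinct.setdefault(frozenset(tests), mutant)
    return [m for ks, m in distinct.items() if not any(o < ks for o in distinct)]


def score(tests, mutants, kills):
    """Fraction of `mutants` killed by at least one test in `tests`."""
    return sum(1 for m in mutants if kills[m] & tests) / len(mutants)


tests = {"t2", "t5"}
raw = score(tests, list(KILLS), KILLS)
minimal = score(tests, minimal_mutants(KILLS), KILLS)
print(f"raw {raw:.0%} : minimal {minimal:.0%}")  # raw 75% : minimal 50%
```

The same test set looks noticeably weaker once redundant mutants are removed from the denominator, which is the kind of effect the raw versus minimal comparison measures.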

16 Reduced Mutation Scores: Raw vs. Minimal
Notes:
– Table entries: Raw Mutation Score : Minimal Mutation Score
– Raw: Reduced mutation scores make test strategies look good
– Minimal: Reduced mutation scores do not

Program         STMT      ROR       CON       5RND      SELECT
print_tokens    99 : 78   98 : 77   99 : 78   99 : 82   99 : 81
print_tokens2   99 : 47   99 : 56   99 : 49   99 : 48   99 : 57
replace         97 : 31   97 : 38   99 : 57   99 : 56   98 : 48
schedule        97 : 68   94 : 53   98 : 65   98 : 67   97 : 65
schedule2       97 : 72   92 : 56   98 : 77   98 : 72   97 : 72
tcas            88 : 27   90 : 38   88 : 33   94 : 45   93 : 43
totinfo         97 : 38   99 : 59   99 : 39   99 : 54   99 : 60

17 Closer Look: Raw vs. Minimal for STMT
Raw mutation scores show little variation
Minimal mutation scores show a lot

18 Reduced Mutation: Mutants vs. Tests
Notes:
– Table entries: Number of Mutants : Size of Minimal Test Set
– Reduced approaches
  Generate many more mutants than minimal
  But not nearly enough tests

Columns: Program, STMT, ROR, CON, 5RND, SELECT, Minimal
(cell boundaries are fused and some values are missing in this transcript; rows are kept as extracted)
print_tokens    196 : 1198 : 9358 : : : 1028 : 12.4
print_tokens2   203 : 5192 : 8445 : 8198 : 9244 : 930 : 12.1
replace         219 : : : : : 3558 : 44.4
schedule        127 : 1049 : 778 : 1395 : 1284 : 1042 : 14.5
schedule2       117 : 975 : 6119 : : : 1246 : 17.1
tcas            42 : 1245 : 1466 : 1499 : : 1861 : 41.4
totinfo         110 : 6167 : : : : 1519 : 13.3

19 Closer Look: Mutants and Tests for STMT
STMT usually generates too many mutants
Unfortunately, they aren’t the right ones
– Hence, not nearly enough tests

20 Discussion
Huge gap: Reduced mutation vs. minimal mutant sets
– Research opportunity!
The problem with reduced mutation
– Reduced approaches don’t consider specific program under test
– Maybe it’s time to change that
  Can we analyze specific mutants in a specific program?
Problem with minimal mutant sets for practical testing
– Need mutation adequate tests to compute minimal mutant sets!
  Aren’t we done at that point?
There is a lot we don’t know about minimal mutant sets
– Let’s look at an example from yesterday’s Mutation workshop

21 Subsumption Graph Example: cal()
[Figure: subsumption graph]
31 nodes of indistinguished mutants
7 nodes of minimal mutants
– muJava generated 145 non-equivalent mutants
– we only need 7 for given test set
Static analysis can refine this graph

22 Questions? Contact: – {pammann, –