Published by Avice Cameron. Modified over 8 years ago.
1
Is Mutation Analysis Ready for Prime Time?
Jeff Offutt
George Mason University
http://www.cs.gmu.edu/~offutt/softwaretest/
Based on the book Introduction to Software Testing, edition 2, Ammann & Offutt, Cambridge University Press, 2016 (forthcoming)
2
OUTLINE
1. Our backgrounds
2. Coverage criteria overview
3. Mutation analysis overview
4. Mutation for source code
5. Mutation for input space grammars
6. Open research problems
© Jeff Offutt
3
My Background
Professor of Software Engineering (George Mason)
  – > 175 refereed publications, H-index = 57
  – Editor-in-Chief: Journal of Software Testing, Verification, and Reliability
  – Co-Founder: IEEE Intl Conf. on Software Testing
  – Author: Introduction to Software Testing
  – 2013 GMU Teaching Excellence Award, Teaching With Technology
  – Mason Outstanding Faculty Member, 2008, 2009
  – Advised 15 PhD students, 4 in progress
Research Highlights
  – First model-based testing paper (UML 1999)
  – Distributed research tools: muJava, Mothra, Godzilla, Coverage web apps
  – Seminal papers: Mutation testing, automatic test data generation, OO testing, web app testing, combinatorial testing, logic-based testing, model-based testing
4
Your Background? © Jeff Offutt 4 How long have you been in graduate school? Do you have a research advisor? Have you published yet? Do you have a research topic?
5
Background Questions
1. Did you take an undergraduate course in software testing?
2. Did you take a graduate course in software testing?
3. Have you ever been paid to test software (that is, in industry)?
4. Have you ever seen an introduction to mutation analysis?
5. Have you ever read a mutation research paper?
Instructions:
1. Write down answers (yes or no)
2. Submit during the break
3. Names are optional
6
OUTLINE © Jeff Offutt 6 1.Our backgrounds 2.Coverage criteria overview 3.Mutation analysis overview 4.Mutation for source code 5.Mutation for input space grammars 6.Open research problems
7
Software Faults, Errors & Failures
Software Fault: A static defect in the software
Software Failure: External, incorrect behavior with respect to the requirements or other description of the expected behavior
Software Error: An incorrect internal state that is the manifestation of some fault
Software does not degrade; faults are more like design mistakes in hardware
Testing can find faults, but can NEVER prove the absence of faults
8
Fault & Failure Model (RIPR)
Four conditions necessary for a failure to be observed
1. Reachability: The location or locations in the program that contain the fault must be reached
2. Infection: The state of the program must be incorrect
3. Propagation: The infected state must cause some output or final state of the program to be incorrect
4. Reveal: The tester must observe part of the incorrect portion of the program state
9
RIPR Model
[Diagram: Reachability, Infection, Propagation, Revealability. Labels: Test, Fault, Incorrect Program State, Incorrect Final State, Final Program State, Observed Final Program State, Test Oracle. Arrows: Reaches, Infects, Propagates, Reveals.]
10
Self-Check Problem

    /**
     * Find last index of element
     * @param x array to search
     * @param y value to look for
     * @return last index of y in x; -1 if absent
     * @throws NullPointerException if x is null
     */
    public static int findLast (int[] x, int y)
    {
       for (int i=x.length-1; i > 0; i--)
       {
          if (x[i] == y)
          {
             return i;
          }
       }
       return -1;
    }

findLast() has a fault
a) Describe the fault, including a change to fix it
b) Give a test that does not reach the fault
c) Give a test that reaches the fault, but does not infect the program state
d) Give a test that infects the program state, but does not propagate to a failure
e) Give a test that reaches the fault, infects the state, and propagates to a failure
11
Self-Check Problem Answer
A Solution
a) Describe the fault, including a change to fix it
   The for-loop does not include the 0 index: for (int i=x.length-1; i >= 0; i--)
b) Give a test that does not reach the fault
   A null value for x: x = null; y = 3;
c) Give a test that reaches the fault, but does not infect the program state
   y is in the array, but not in the first (zeroth) position: x = [2, 3, 5]; y = 3;
d) Give a test that infects the program state, but does not propagate to a failure
   y is not in the array. The final iteration is not taken, so the state is wrong, but there is no failure: x = [2, 3, 5]; y = 7;
e) Give a test that reaches the fault, infects the state, and propagates to a failure
   y is in the first position: x = [2, 3, 5]; y = 2;
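The answers above can be checked by running the faulty method next to a repaired one. This is a minimal sketch: the class name FindLastDemo and the helper findLastFixed are hypothetical names introduced here, not part of the slides.

```java
final class FindLastDemo {
    // Faulty version from the slide: the loop condition i > 0 skips index 0.
    static int findLast(int[] x, int y) {
        for (int i = x.length - 1; i > 0; i--) {
            if (x[i] == y) {
                return i;
            }
        }
        return -1;
    }

    // Repaired version (answer a): the loop condition is i >= 0.
    // Hypothetical helper, used only as the expected-result oracle.
    static int findLastFixed(int[] x, int y) {
        for (int i = x.length - 1; i >= 0; i--) {
            if (x[i] == y) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] x = {2, 3, 5};
        // (b) x.length throws before the loop condition is ever evaluated.
        try {
            findLast(null, 3);
        } catch (NullPointerException e) {
            System.out.println("b) NullPointerException");
        }
        // (c), (d), (e): actual result of the faulty code vs. the repaired code.
        int[] ys = {3, 7, 2};
        for (int y : ys) {
            System.out.println("y=" + y + ": faulty=" + findLast(x, y)
                    + ", fixed=" + findLastFixed(x, y));
        }
    }
}
```

Only the last test (y = 2) produces different results from the two versions, which is what makes it a failure-revealing test.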
12
Coverage Criteria
Even small programs have too many inputs to fully test them all
  – private static double computeAverage (int A, int B, int C)
  – 32-bit machine; each variable has more than 4 billion possible values
  – About 80 octillion possible tests (2^96 ≈ 7.9 × 10^28)!!
  – Input space might as well be infinite
Testers search a huge input space
  – Trying to find the fewest inputs that will find the most problems
Coverage criteria give structured, practical ways to search the input space
  – Search the input space thoroughly
  – Not much overlap in the tests
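The size of the computeAverage input space is simple arithmetic: three independent 32-bit values give 2^32 × 2^32 × 2^32 = 2^96 combinations. A small sketch of that calculation follows; the class and method names are hypothetical.

```java
import java.math.BigInteger;

final class InputSpaceSize {
    // Number of distinct inputs for `parameters` values of `bitsPerValue` bits each.
    static BigInteger combinations(int bitsPerValue, int parameters) {
        return BigInteger.valueOf(2).pow(bitsPerValue * parameters);
    }

    public static void main(String[] args) {
        System.out.println(combinations(32, 1)); // values of one int: 4294967296
        System.out.println(combinations(32, 3)); // 79228162514264337593543950336
    }
}
```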
13
Test Requirements and Criteria
Test Criterion: A collection of rules and a process that define test requirements
  – Cover every statement
  – Cover every functional requirement
Test Requirements: Specific things that must be satisfied or covered during testing
  – Each statement might be a test requirement
  – Each functional requirement might be a test requirement
Testing researchers have defined hundreds of criteria, but they are all really just a few criteria on four types of structures …
14
Criteria Based on Structures
Structures: Four ways to model software
1. Input Domains (sets)
     A: {0, 1, >1}
     B: {600, 700, 800}
     C: {cs, math, swe, ece}
2. Graphs
3. Logical Expressions
     (not X or not Y) and A and B
4. Syntactic Structures (grammars, mutation)
     if (x > y)
        z = x - y;
     else
        z = 2 * x;
15
Criteria and the RIPR Model
1. Input Domain Testing: Does not ensure reachability
2. Graph Testing: Ensures reachability
3. Logical Expression Testing: Ensures reachability & infection
4. Syntactic Structures (Mutation testing): Ensures reachability, infection, & propagation
16
Two Ways to Use Test Criteria
1. Directly generate test values to satisfy the criterion
  – Often assumed by the research community
  – Most obvious way to use criteria
  – Very hard without automated tools
2. Generate test values externally and measure against the criterion
  – Usually favored by industry
  – Sometimes misleading
  – If tests do not reach 100% coverage, what does that mean?
Test criteria are sometimes called metrics
17
Advantages of Criteria-Based Test Design Criteria maximize the “bang for the buck” –Fewer tests that are more effective at finding faults Comprehensive test set with minimal overlap Traceability from software artifacts to tests –The “why” for each test is answered –Built-in support for regression testing A “stopping rule” for testing—advance knowledge of how many tests are needed Natural to automate © Jeff Offutt 17
18
Criteria Summary
Many companies still use “monkey testing”
  – A human sits at the keyboard, wiggles the mouse and bangs the keyboard
  – No automation
  – Minimal training required
Some companies automate human-designed tests
But companies that use both automation and criteria-based testing
  – Save money
  – Find more faults
  – Build better software
19
OUTLINE © Jeff Offutt 19 1.Our backgrounds 2.Coverage criteria overview 3.Mutation analysis overview 4.Mutation for source code 5.Mutation for input space grammars 6.Open research problems
20
What is a Mutant? © Jeff Offutt 20 Mutant A small syntactic change to a programming artifact (program, statechart, XML, SQL, specification, …) Mutation operators are defined on the underlying grammar (change an operator to another compatible operator, change an edge from one target state to another, …)
21
What is Mutation?
General View: We are performing mutation analysis whenever we
  – use well defined rules (mutation operators)
  – defined on syntactic descriptions (grammars)
  – to make systematic changes (applied universally or according to empirically verified distributions)
  – to the syntax (grammar) or to objects developed from the syntax (ground strings: tests or programs)
22
Killing Mutants © Jeff Offutt 22 Causing the mutated artifact to behave differently from the original artifact If a test kills a mutant, that means the test is valuable for finding problems in the software
23
Mutation Analysis © Jeff Offutt 23 Mutation analysis refers to the process of applying mutation operators to modify syntactic artifacts, creating mutants Mutation analysis is mostly used for testing, hence the term mutation testing This is also called “syntax-based testing”
24
OUTLINE © Jeff Offutt 24 1.Our backgrounds 2.Coverage criteria overview 3.Mutation analysis overview 4.Mutation for source code 5.Mutation for input space grammars 6.Open research problems
25
Program-based Mutation The original and most widely known application of mutation is to modify programs Operators modify a ground string (program under test) to create mutant programs Mutant programs must compile correctly (must be valid strings in the grammar) Once mutants are defined, tests must be found to cause mutants to fail This is called “killing mutants” © Jeff Offutt 25
26
Categorizing Mutants
Dead mutant: A test case has killed it
Stillborn mutant: Syntactically illegal
Trivial mutant: Almost every test can kill it
Equivalent mutant: No test can kill it (same behavior as original)
Testers can keep adding tests until all mutants have been killed
27
Program Mutation Example

Original Method

    int Min (int A, int B)
    {
       int minVal;
       minVal = A;
       if (B < A)
       {
          minVal = B;
       }
       return (minVal);
    } // end Min

With Embedded Mutants

    int Min (int A, int B)
    {
       int minVal;
       minVal = A;
∆ 1    minVal = B;                 // Replace one variable with another
       if (B < A)
∆ 2    if (B >= A)                 // Replaces operator
∆ 3    if (B < minVal)
       {
          minVal = B;
∆ 4       Bomb ();                 // Immediate runtime failure … if reached
∆ 5       minVal = A;
∆ 6       minVal = failOnZero (B); // Immediate runtime failure if B==0, else does nothing
       }
       return (minVal);
    } // end Min

6 mutants. Each represents a separate program.
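One way to see which tests kill which of the six mutants is to fold them into a single method selected by a mutant number, in the spirit of the schema-based execution mentioned later in the deck. This is an illustrative sketch only: the class MinMutants, the mutant parameter, and the exception types standing in for Bomb() and failOnZero() are assumptions, not how muJava implements them.

```java
final class MinMutants {
    // Stand-in for failOnZero(): fails when the value is 0, otherwise returns it unchanged.
    static int failOnZero(int v) {
        if (v == 0) {
            throw new ArithmeticException("failOnZero");
        }
        return v;
    }

    // mutant == 0 runs the original Min; 1..6 select mutants 1..6 from the slide.
    static int min(int a, int b, int mutant) {
        int minVal = (mutant == 1) ? b : a;              // mutant 1: minVal = B
        boolean cond;
        switch (mutant) {
            case 2: cond = b >= a; break;                // mutant 2: B >= A
            case 3: cond = b < minVal; break;            // mutant 3: B < minVal
            default: cond = b < a;
        }
        if (cond) {
            switch (mutant) {
                case 4: throw new IllegalStateException("Bomb"); // mutant 4: Bomb()
                case 5: minVal = a; break;               // mutant 5: minVal = A
                case 6: minVal = failOnZero(b); break;   // mutant 6: failOnZero(B)
                default: minVal = b;
            }
        }
        return minVal;
    }

    // Strong kill: the mutant's output differs from the original's, or the mutant fails.
    static boolean kills(int a, int b, int mutant) {
        int expected = min(a, b, 0);
        try {
            return min(a, b, mutant) != expected;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[][] tests = {{1, 2}, {2, 1}, {1, 0}};
        for (int[] t : tests) {
            StringBuilder killed = new StringBuilder();
            for (int m = 1; m <= 6; m++) {
                if (kills(t[0], t[1], m)) {
                    killed.append(" ").append(m);
                }
            }
            System.out.println("A=" + t[0] + ", B=" + t[1] + " kills mutants:" + killed);
        }
    }
}
```

With these three tests every mutant except mutant 3 is killed at least once; mutant 6 needs a test with B == 0 that also takes the branch.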
28
Equivalent Mutation Example
Mutant 3 in the Min() example is equivalent:

       minVal = A;
       if (B < A)
∆ 3    if (B < minVal)

The mutant can only be killed if (B < A) != (B < minVal)
However, the previous statement was “minVal = A”
  – Substituting, we get: “(B < A) != (B < A)”
  – This is a logical contradiction!
Thus no input can kill this mutant
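The argument above is a proof; a brute-force comparison over a small range can only corroborate it, but it is a cheap sanity check. A sketch with hypothetical names (EquivalentMutantCheck, countDifferences):

```java
final class EquivalentMutantCheck {
    static int minOriginal(int a, int b) {
        int minVal = a;
        if (b < a) {
            minVal = b;
        }
        return minVal;
    }

    // Mutant 3: the condition compares against minVal, which always equals a here.
    static int minMutant3(int a, int b) {
        int minVal = a;
        if (b < minVal) {
            minVal = b;
        }
        return minVal;
    }

    // Counts input pairs in [lo, hi] x [lo, hi] on which the two versions disagree.
    static int countDifferences(int lo, int hi) {
        int differences = 0;
        for (int a = lo; a <= hi; a++) {
            for (int b = lo; b <= hi; b++) {
                if (minOriginal(a, b) != minMutant3(a, b)) {
                    differences++;
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        System.out.println("differences: " + countDifferences(-50, 50));
    }
}
```

Finding zero differences on a sample does not establish equivalence in general, which is why equivalence detection is listed later as an open problem.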
29
Mutation and RIPR
The RIPR model:
  Reachability: The test causes the mutated statement to be reached
  Infection: The test causes the mutated statement to result in an incorrect state
  Propagation: The incorrect state propagates to incorrect output
  Revealability: The tester must observe part of the incorrect output
The RIPR model leads to two variants of mutation coverage …
1. Strong mutation: propagation is required
2. Weak mutation: only infection, but not propagation
30
Weak Mutation Example
Mutant 1 in the Min() example is:

       minVal = A;
∆ 1    minVal = B;
       if (B < A)
          minVal = B;

The complete test specification to kill mutant 1:
  Reachability: true                 // Always get to that statement
  Infection: A ≠ B
  Propagation: (B < A) = false       // Skip the next assignment
  Full Test Specification: true ∧ (A ≠ B) ∧ ((B < A) = false)
                           ≡ (A ≠ B) ∧ (B ≥ A)
                           ≡ (B > A)
Weakly kill mutant 1, but not strongly? A = 5, B = 3
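The difference between the two kinds of kill can be made concrete by recording the state right after the mutated statement as well as the final output. The sketch below assumes a hypothetical trace helper; it is not taken from the slides.

```java
final class WeakVsStrong {
    // Returns {minVal immediately after the first assignment, returned value}.
    static int[] trace(int a, int b, boolean mutant1) {
        int minVal = mutant1 ? b : a;   // mutant 1 assigns B instead of A
        int stateAfterMutation = minVal;
        if (b < a) {
            minVal = b;
        }
        return new int[] {stateAfterMutation, minVal};
    }

    // Weak kill: the state differs right after the mutated statement (infection).
    static boolean weaklyKills(int a, int b) {
        return trace(a, b, false)[0] != trace(a, b, true)[0];
    }

    // Strong kill: the final output differs (infection plus propagation).
    static boolean stronglyKills(int a, int b) {
        return trace(a, b, false)[1] != trace(a, b, true)[1];
    }

    public static void main(String[] args) {
        System.out.println("A=5, B=3: weak=" + weaklyKills(5, 3) + ", strong=" + stronglyKills(5, 3));
        System.out.println("A=3, B=5: weak=" + weaklyKills(3, 5) + ", strong=" + stronglyKills(3, 5));
    }
}
```

A = 5, B = 3 infects the state but the following assignment overwrites it, while A = 3, B = 5 satisfies the full specification B > A.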
31
Mutation Testing Process
[Flowchart, with the automated steps marked. Boxes: Input test method Prog; Create mutants; Run equivalence detector; Generate test cases; Run T on P; Run mutants (schema-based, weak, selective); Eliminate ineffective TCs; Define threshold; Fix P. Decisions: Threshold reached? (yes / no); P (T) correct? (yes / no).]
32
Self-Check Questions 1. What does strong mutation require that weak mutation does not? 2. What do we call infeasible test requirements in mutation testing? 3. Do mutation operators (a) mimic programmer mistakes, (b) encourage common test heuristics, (c) both, or (d) something else? 4. Does mutation testing (a) evaluate tests, (b) help testers design tests, (c) both, or (d) neither? © Jeff Offutt 32
33
Self-Check Answers 1. What does strong mutation require that weak mutation does not? 2. What do we call infeasible test requirements in mutation testing? 3. Do mutation operators (a) mimic programmer mistakes, (b) encourage common test heuristics, (c) both, or (d) something else? 4. Does mutation testing (a) evaluate tests, (b) help testers design tests, (c) both, or (d) neither? © Jeff Offutt 33 Propagation Equivalent Both
34
Why Mutation Works
Fundamental Premise of Mutation Testing: If the software contains a fault, there will usually be a set of mutants that can only be killed by a test case that also detects that fault
This is not an absolute!
The mutants guide the tester to an effective set of tests
A very challenging problem:
  – Find a fault and a set of mutation-adequate tests that do not find the fault
Of course, this depends on the mutation operators …
35
Designing Mutation Operators
At the method level, mutation operators for different programming languages are similar
Mutation operators do one of two things:
  – Mimic typical programmer mistakes (incorrect variable name)
  – Encourage common test heuristics (cause expressions to be 0)
Researchers design lots of operators, then experimentally select the most useful
Effective mutation operators yield mutants that are hard to kill, but not impossible
  – Tests that kill mutants from effective operators will also kill most other mutants
36
Mutation Operators for muJava
1. ABS: Absolute Value Insertion
2. AOR: Arithmetic Operator Replacement
3. ROR: Relational Operator Replacement
4. COR: Conditional Operator Replacement
5. SOR: Shift Operator Replacement
6. LOR: Logical Operator Replacement
7. ASR: Assignment Operator Replacement
8. UOI: Unary Operator Insertion
9. UOD: Unary Operator Deletion
10. SVR: Scalar Variable Replacement
11. BSR: Bomb Statement Replacement
37
Code Defenders Code Defenders is a mutation “game” designed and built by Gordon Fraser and José Miguel Rojas at the University of Sheffield Two players start with a Java class under test –Attacker creates a mutant of the class –Defender creates tests to try to kill the mutant © Jeff Offutt 37 http://code-defenders.dcs.shef.ac.uk/
38
Code Defenders
Find a partner (2-player game)
Each needs to create an account
Attacker creates a game
  – cal.java
  – Level easy (hard is default)
  – Remember your game number
Defender joins your game
  – Class under test is on the left, tests are on the right
Attacker creates mutants; defender designs tests to kill
http://code-defenders.dcs.shef.ac.uk/
The authors asked us to complete a survey: http://code-defenders.dcs.shef.ac.uk/survey.html
39
OUTLINE © Jeff Offutt 39 1.Our backgrounds 2.Coverage criteria overview 3.Mutation analysis overview 4.Mutation for source code 5.Mutation for input space grammars 6.Open research problems
40
Input Space Grammars
Input Space: The set of allowable inputs to software
The input space can be described in many ways
  – User manuals
  – Unix man pages
  – Method signature or method preconditions
  – A language
Most input spaces can be described as grammars
Grammars are usually not provided, but creating them is a valuable service by the tester
  – Errors will often be found simply by creating the grammar
41
Validating Inputs
Input Validation: Deciding if input values can be processed by the software
Software should reject or handle invalid data
Programs often do this incorrectly
Some programs assume all input data is correct
Even if it works today … software may be changed or reused
Input validation finds out what the software does when it receives invalid data
42
Representing Input Domains
[Diagram of three domains]
  goal: Desired inputs (goal domain)
  specified: Described inputs (specified domain)
  implemented: Accepted inputs (implemented domain)
43
Example Input Domains
Goal domains are often irregular
Goal domain for credit cards †
  – First digit is the Major Industry Identifier
  – First 6 digits and length specify the issuer
  – Final digit is a “check digit”
  – Other digits identify a specific account
Common specified domain
  – First digit is in { 3, 4, 5, 6 } (travel and banking)
  – Length is between 13 and 16
Common implemented domain
  – All digits are numeric
† More details are on: http://www.merriampark.com/anatomycc.htm
44
Representing Input Domains © Jeff Offutt 44 goal goal domain specified specified domain implemented implemented domain This region is a rich source of software errors …
45
Designing Tests From Grammars
This form of testing allows us to focus on interactions among the components
  – Originally applied to Web services, which depend on XML
A formal model of the grammar was used
  – BNF
  – XML / XML Schemas
Valid and invalid tests can be created
The grammar is mutated
The mutated grammar is used to generate inputs
46
BNF Grammar for Bank
Consider a program that processes a sequence of deposits and debits to a bank

Grammar
    bank    ::= action*
    action  ::= dep | deb
    dep     ::= “deposit” account amount
    deb     ::= “debit” account amount
    account ::= digit^4
    amount  ::= “$” digit+ “.” digit^2
    digit   ::= “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”

Inputs
    deposit 5306 $4.30
    debit 0343 $4.14
    deposit 5306 $7.29
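Because this grammar is regular, a recognizer for it fits in one regular expression. The sketch below is an assumption-laden illustration: the grammar does not say how tokens are separated, so it assumes a single space between tokens and one action per line, and the class BankGrammar is a hypothetical name.

```java
import java.util.regex.Pattern;

final class BankGrammar {
    // action ::= ("deposit" | "debit") account amount
    // account ::= four digits; amount ::= "$" digits "." two digits
    private static final Pattern ACTION =
            Pattern.compile("(deposit|debit) [0-9]{4} \\$[0-9]+\\.[0-9]{2}");

    static boolean isAction(String line) {
        return ACTION.matcher(line).matches();
    }

    // bank ::= action*  (assumed here: actions separated by newlines, empty input allowed)
    static boolean isBank(String input) {
        if (input.isEmpty()) {
            return true;
        }
        for (String line : input.split("\n", -1)) {
            if (!isAction(line)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String inputs = "deposit 5306 $4.30\ndebit 0343 $4.14\ndeposit 5306 $7.29";
        System.out.println(isBank(inputs));
        System.out.println(isAction("deposit 4400 5"));
    }
}
```

A recognizer like this plays the role of the specified domain; tests generated from a mutated grammar probe whether the implemented domain agrees with it.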
47
Mutating BNF Grammars

Nonterminal Replacement
    Original: dep ::= “deposit” account amount
    Mutant:   dep ::= “deposit” amount amount      Test: deposit $1500.00 $3789.88
    Mutant:   dep ::= “deposit” account digit      Test: deposit 4400 5

Terminal Replacement
    Original: amount ::= “$” digit+ “.” digit^2
    Mutant:   amount ::= “.” digit+ “.” digit^2    Test: deposit 4400 .1500.00
    Mutant:   amount ::= “$” digit+ “$” digit^2    Test: deposit 4400 $1500$00
    Mutant:   amount ::= “$” digit+ “1” digit^2    Test: deposit 4400 $1500100

Terminal and Nonterminal Deletion
    Original: dep ::= “deposit” account amount
    Mutant:   dep ::= account amount               Test: 4400 $1500.00
    Mutant:   dep ::= “deposit” amount             Test: deposit $1500.00
    Mutant:   dep ::= “deposit” account            Test: deposit 4400
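The replacement and deletion operators can be sketched as list edits on the right-hand side of a production, followed by a derivation from the mutated rule. Everything here is illustrative: the class GrammarMutation, its method names, and the fixed sample expansions (4400, $1500.00, 5) are assumptions chosen to match the slide's examples, where a real generator would pick values randomly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GrammarMutation {
    // Fixed sample expansion for each nonterminal; any other symbol is treated as a terminal.
    static String expand(String symbol) {
        switch (symbol) {
            case "account": return "4400";
            case "amount":  return "$1500.00";
            case "digit":   return "5";
            default:        return symbol;
        }
    }

    // Derives one string from a right-hand side, joining symbols with spaces.
    static String derive(List<String> rhs) {
        List<String> parts = new ArrayList<>();
        for (String symbol : rhs) {
            parts.add(expand(symbol));
        }
        return String.join(" ", parts);
    }

    // Replacement operator: swap the symbol at `index` for another symbol.
    static List<String> replace(List<String> rhs, int index, String symbol) {
        List<String> mutant = new ArrayList<>(rhs);
        mutant.set(index, symbol);
        return mutant;
    }

    // Deletion operator: drop the symbol at `index`.
    static List<String> delete(List<String> rhs, int index) {
        List<String> mutant = new ArrayList<>(rhs);
        mutant.remove(index);
        return mutant;
    }

    public static void main(String[] args) {
        List<String> dep = Arrays.asList("deposit", "account", "amount");
        System.out.println(derive(dep));                      // valid input
        System.out.println(derive(replace(dep, 2, "digit"))); // nonterminal replacement
        for (int i = 0; i < dep.size(); i++) {
            System.out.println(derive(delete(dep, i)));       // deletion mutants
        }
    }
}
```

Each derived string is an invalid (or at least unexpected) input that the program under test should reject or handle.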
48
© Ammann & Offutt 48 XML Book Message XML messages are defined by grammars –Schemas and DTDs Schemas can define many kinds of types Schemas include “facets,” which refine the grammar 0471043281 The Art of Software Testing Glen Myers Wiley 50.00 1979 Introduction to Software Testing, edition 2 (Ch 9) schemas define input spaces for software components
49
© Ammann & Offutt 49 Book Grammar – Schema Introduction to Software Testing, edition 2 (Ch 9) Built-in types
50
Mutating XML
XML schemas can be mutated
If a schema does not exist, testers should derive one
  – As usual, this will help find problems immediately
Many programs validate messages against a grammar
  – Software may still behave correctly, but testers must verify
Programs are less likely to check all schema facets
  – Mutating facets can lead to very effective tests
Introduction to Software Testing, edition 2 (Ch 9)
51
© Ammann & Offutt 51 Mutating XML Schemas Original Schema (Partial) Mutants : value = “3” value = “1” Mutants : value = “100” value = “2000” XML from Original Schema 0-201-74095-8 37.95 2002 Mutant XML 1 0-201-74095-8 37.95 2002 505 Mutant XML 2 0-201-74095-8 37.95 2002 5 Mutant XML 3 0-201-74095-8 37.95 2002 99.00 Mutant XML 4 0-201-74095-8 37.95 2002 1500.00 Introduction to Software Testing, edition 2 (Ch 9)
52
Input Testing Exercise
Use mutation to test the input space of the following web app: https://cs.gmu.edu:8443/offutt/servlet/calculate
1. Analyze the inputs and write a grammar to describe the allowable inputs
   You can write the grammar in BNF, XML schema, or whatever you feel most comfortable with
2. Generate tests by mutating the grammar
3. Run the tests
Please work with 1 or 2 partners
53
Input Testing Exercise: BNF
Many possible answers; here is one:

    Input  ::= action
    action ::= actL | actR | “Reset”
    actL   ::= LHS RHS BTN | LHS RHS Result BTN
    actR   ::= NM “Compute Length” | NM Length “Compute Length”
    LHS    ::= digit*
    RHS    ::= digit*
    BTN    ::= “Add” | “Subtract” | “Multiply” | “Divide”
    NM     ::= char+
    digit  ::= “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
    char   ::= digit | “a” | “b” | “c” | ….
54
OUTLINE © Jeff Offutt 54 1.Our backgrounds 2.Coverage criteria overview 3.Mutation analysis overview 4.Mutation for source code 5.Mutation for input space grammars 6.Open research problems
55
Papers 2014-2016, by Venue

    Venue                   Count
    Conferences
      Mutation workshops       34
      ICST                      5
      ICSE                      3
      ISSTA                     3
    Journals
      STVR                      8
      TSE                       3
      EmSE                      1
56
Papers 2014-2016, by Topic

    #   Topic                            Count
    1   Languages & applications            24
    2   Minimal mutation                     8
    3   Equivalent mutant detection          4
    4   Automatic program repair             3
    5   Automatic test data generation       3
    6   Process and applicability            3
    7   Tools                                3
    8   Experimental process                 1
    9   Higher-order mutation                1
    10  Other                                7
57
Mutation Application Papers Languages SQL (3) Simulink (2) AOP (2) WS-BPEL Javascript Python Haskell ATL © Jeff Offutt 57 Problems Model-based testing (6) Security Web apps Mobile apps Memory faults GUIs Memory faults
58
Mutation Research Problems
Key problem: Why no industry adoption?
Mutation is expensive!
  – Many computational solutions are available, so computation is not a major problem
  – Human issues abound
Too many (redundant) mutants
Equivalent mutants
Test data generation
Test oracle generation
Practitioners are not convinced the RoI is positive
Lack of professional quality tools
59
Topic Intro: Minimal Mutation
Typical numbers: 50 LOC, 1000 mutants, 100 equivalent, 15 tests
Each test kills dozens of mutants
Hundreds of redundant mutants
Approaches:
  – Selective
  – Random sampling
  – Mutant subsumption
  – Minimal mutation
60
Topic Intro: Minimal Mutation
Mutant A subsumes mutant B if every test that kills A also kills B
B is therefore redundant
  – Static analysis
  – Dynamic analysis (symbolic execution)
  – ????
Early research shows that 90% to 99% of mutants are redundant!
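Relative to a fixed test set, the subsumption definition above can be evaluated directly on a kill matrix. The sketch below is hypothetical (class Subsumption, a kills[test][mutant] matrix) and adds one assumption not stated on the slide: A must be killed by at least one test, so that an unkilled mutant does not vacuously subsume everything. A result computed this way only approximates true subsumption, since it depends on the tests at hand.

```java
final class Subsumption {
    // kills[t][m] is true when test t kills mutant m.
    // Returns true when every test that kills mutant a also kills mutant b,
    // and at least one test kills a (added assumption, see the lead-in).
    static boolean subsumes(boolean[][] kills, int a, int b) {
        boolean aKilled = false;
        for (boolean[] row : kills) {
            if (row[a]) {
                aKilled = true;
                if (!row[b]) {
                    return false;
                }
            }
        }
        return aKilled;
    }

    public static void main(String[] args) {
        boolean[][] kills = {
            {true,  true,  false},  // test 0 kills mutants 0 and 1
            {false, true,  false},  // test 1 kills mutant 1
            {false, false, true},   // test 2 kills mutant 2
        };
        System.out.println("0 subsumes 1: " + subsumes(kills, 0, 1));
        System.out.println("1 subsumes 0: " + subsumes(kills, 1, 0));
    }
}
```

In this matrix mutant 1 is redundant given mutant 0: any test set that kills mutant 0 kills mutant 1 as well.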
61
Topic Intro: Equivalence Detection
Generally undecidable
Published approximation algorithms have detected over 50%
  – Compiler optimizations
  – Constraint contradiction
  – Program slicing
  – ????
62
Topic Intro: Automatic Program Repair
Given a program P and failing tests T:
1. Locate the fault
2. Correct the fault
Success means that P is changed to P’ and all tests in T now pass
Corrections are called “patches”
Patches are often based on mutants
What are the most useful mutation operators?
63
Topic Intro: Higher Order Mutants
Mutants are usually one change: One operator applied to one location
Coupling Effect: “complex” mutants are coupled to “simple” mutants such that tests that kill simple mutants will usually kill complex mutants
Researchers are exploring the application of multiple operators that may have interesting properties
  – Equivalent mutant detection?
  – Testing for special properties?
  – ????
64
Open Research Discussion © Jeff Offutt 64 What research topics sound interesting to you ? What do you think we need for practical adoption of mutation ? What else do you want to know about mutation analysis ? Theory ? Application ? Process ?
65
Summary
Mutation analysis is currently a very active research area
A lot of theory to learn and literature to read
Many open problems
Potential for taking mutation to practice
The best software engineering researchers solve real problems so real engineers can make real software better
https://cs.gmu.edu/~offutt/