Lecture 7 Advanced Topics in Testing. Mutation Testing Mutation testing concerns evaluating test suites for their inherent quality, i.e. ability to reveal.

Slides:



Advertisements
Similar presentations
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Advertisements

Sufficient Mutant Operators Offutt, Rothermel, Lee, Untch, and Zapf TOSEM, April 1996 Slide text copyright Robert W. Hasker, 2006 Material (and figures)
Mutation Testing. Mutation testing is a fault-based testing technique. It has been empirically and theoretically validated that a program will be well.
An Introduction to the Model Verifier verds Wenhui Zhang September 15 th, 2010.
An Analysis and Survey of the Development of Mutation Testing by Yue Jia and Mark Harmon A Quick Summary For SWE6673.
David Evans CS655: Programming Languages University of Virginia Computer Science Lecture 19: Minding Ps & Qs: Axiomatic.
ISBN Chapter 3 Describing Syntax and Semantics.
Model-based Testing and Automated Test Case Generation
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
1 Chapter 4 Language Fundamentals. 2 Identifiers Program parts such as packages, classes, and class members have names, which are formally known as identifiers.
(c) 2007 Mauro Pezzè & Michal Young Ch 16, slide 1 Fault-Based Testing.
Describing Syntax and Semantics
. PGM 2002/3 – Tirgul6 Approximate Inference: Sampling.
Cormac Flanagan University of California, Santa Cruz Hybrid Type Checking.
System/Software Testing
Introduction to Software Testing Chapter 5.2 Program-based Grammars Paul Ammann & Jeff Offutt
Kyle Mundt February 3,  Richard Lipton, 1971  A way of testing your tests  Alter your code in various ways  Check to see if tests fail on altered.
CSC3315 (Spring 2009)1 CSC 3315 Programming Languages Hamid Harroud School of Science and Engineering, Akhawayn University
1 Abstraction  Identify important aspects and ignore the details  Permeates software development programming languages are abstractions built on hardware.
© SERG Dependable Software Systems (Mutation) Dependable Software Systems Topics in Mutation Testing and Program Perturbation Material drawn from [Offutt.
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
Features of Object Oriented Programming Lec.4. ABSTRACTION AND ENCAPSULATION Computer programs can be very complex, perhaps the most complicated artifact.
What is Mutation Testing ? The premise in mutation testing is that small changes are made in a module and then the original and mutant modules are compared.
Software testing techniques Software testing techniques Mutation testing Presentation on the seminar Kaunas University of Technology.
More About Classes Ranga Rodrigo. Information hiding. Copying objects.
ISBN Chapter 3 Describing Semantics -Attribute Grammars -Dynamic Semantics.
Testing Testing Techniques to Design Tests. Testing:Example Problem: Find a mode and its frequency given an ordered list (array) of with one or more integer.
Introduction to Software Testing Chapter 9.3 Integration and Object- Oriented Testing Paul Ammann & Jeff Offutt
Syntax-Based Testing Mingzhe Du Hongying Du Dharani Chintapatla.
Introduction to Software Testing Chapter 5.3 Integration and Object-Oriented Testing Paul Ammann & Jeff Offutt
Reading and Writing Mathematical Proofs Spring 2015 Lecture 4: Beyond Basic Induction.
MT311 Java Application Development and Programming Languages Li Tak Sing( 李德成 )
Mutation Testing G. Rothermel. Fault-Based Testing White-box and black-box testing techniques use coverage of code or requirements as a “proxy” for designing.
Design - programming Cmpe 450 Fall Dynamic Analysis Software quality Design carefully from the start Simple and clean Fewer errors Finding errors.
Introduction to Software Testing Chapter 5.3 Integration and Object-Oriented Testing Paul Ammann & Jeff Offutt
Introduction to Software Testing Chapter 5.3 Integration and Object-Oriented Testing Paul Ammann & Jeff Offutt
CSE Winter 2008 Introduction to Program Verification January 15 tautology checking.
Chapter 10: Introduction to Inheritance. Objectives Learn about the concept of inheritance Extend classes Override superclass methods Call constructors.
Testing OO software. State Based Testing State machine: implementation-independent specification (model) of the dynamic behaviour of the system State:
COP3502 Programming Fundamentals for CIS Majors 1 Instructor: Parisa Rashidi.
Software Testing Part II March, Fault-based Testing Methodology (white-box) 2 Mutation Testing.
Software Testing Syntax-based Testing. Syntax Coverage Four Structures for Modeling Software Graphs Logic Input Space Syntax Use cases Specs Design Source.
Introduction to Software Testing Chapter 9.2 Program-based Grammars Paul Ammann & Jeff Offutt
 In the java programming language, a keyword is one of 50 reserved words which have a predefined meaning in the language; because of this,
Week 5-6 MondayTuesdayWednesdayThursdayFriday Testing III No reading Group meetings Testing IVSection ZFR due ZFR demos Progress report due Readings out.
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
CSSE501 Object-Oriented Development. Chapter 10: Subclasses and Subtypes  In this chapter we will explore the relationships between the two concepts.
CS104:Discrete Structures Chapter 2: Proof Techniques.
CS 330 Programming Languages 10 / 23 / 2007 Instructor: Michael Eckmann.
1 C# - Inheritance and Polymorphism. 2 1.Inheritance 2.Implementing Inheritance in C# 3.Constructor calls in Inheritance 4.Protected Access Modifier 5.The.
Terms and Rules II Professor Evan Korth New York University (All rights reserved)
CSC3315 (Spring 2009)1 CSC 3315 Languages & Compilers Hamid Harroud School of Science and Engineering, Akhawayn University
Mutation Testing Breaking the application to test it.
Software Testing Sudipto Ghosh CS 406 Fall 99 November 23, 1999.
Foundations of Software Testing Chapter 7: Test Adequacy Measurement and Enhancement Using Mutation Last update: September 3, 2007 These slides are copyrighted.
COP INTERMEDIATE JAVA Inheritance, Polymorphism, Interfaces.
CSE 501N Fall ‘09 03: Class Members 03 September 2009 Nick Leidenfrost.
MUTACINIS TESTAVIMAS Benediktas Knispelis, IFM-2/2 Mutation testing.
Foundations of Software Testing Chapter 7: Test Adequacy Measurement and Enhancement Using Mutation Last update: September 3, 2007 These slides are copyrighted.
Mutation Testing Laraib Zahid & Mariam Arshad. What is Mutation Testing?  Fault-based Testing: directed towards “typical” faults that could occur in.
Software Testing and Quality Assurance Syntax-Based Testing (2) 1.
Introduction to Software Testing Chapter 9.2 Program-based Grammars
Mutation Testing Moonzoo Kim School of Computing KAIST
Mutation testing Julius Purvinis IFM-0/2.
Introduction to Software Testing Chapter 9.2 Program-based Grammars
Introduction to Software Testing Chapter 5.2 Program-based Grammars
Software Testing Syntax-based Testing.
Mutation Testing Moonzoo Kim School of Computing KAIST
An Analysis of OO Mutation Operators Jingyu Hu, Nan Li, and Jeff Offutt Presented by Nan Li 03/24/2011.
Mutation Testing Faults are introduced into the program by creating many versions of the program called mutants. Each mutant contains a single fault. Test.
Presentation transcript:

Lecture 7 Advanced Topics in Testing

Mutation Testing Mutation testing concerns evaluating test suites for their inherent quality, i.e. ability to reveal errors. Need an objective method to determine quality Differs from structural coverage since it tries to define “what is an error” Basic idea is to inject defined errors into the SUT and evaluate whether a given test suite finds them. “Killing a mutation”

Basic Idea We can statistically estimate (say) fish in a lake by releasing a number of marked fish, and then counting the number of marked fish in a subsequent small catch. Example: release 20 fish Catch 40 fish, 4 marked. Then 1 in 10 is marked, so estimate 200 fish Pursue the same idea with marked SW bugs?

Mutations and Mutants The “marked fish” are injected errors, termed mutations The mutated code is termed a mutant Example: replace in a one Boolean expression if ( x 0 ) then … If test suite finds mutation we say this particular mutant is killed Make large set of mutants – typically using a checklist of known mutations - mutation tool

Mutation score Idea is that if we can kill a mutant we can identify a real bug too Mutants which are semantically equivalent to the original code are called equivalents Write Q  P if Q and P are equivalents Clearly cannot kill equivalents Mutation score % = Number of killed mutants total number of non-equivalent mutants *100

Why should it work? Two assumptions are used in this field Competent programmer hypothesis i.e. “The program is mostly correct” and the Coupling Effect

Semantic Neighbourhoods Let Φ be the set of all programs semantically close to P (defined in various ways) Φ is neighbourhood of P Let T be a test suite, f:D  D be a functional spec of P Traditionally assume  t  T P.x = f(x)   x  D P.x = f(x) i.e. T is a reliable test suite Requires exhaustive testing

Competent Programmer Hypothesis P is pathological iff P  Φ Assume programmers have some competence Mutation testing assumption Either P is pathological or else  t  T P.x = f(x)   x  D P.x = f(x) Can now focus on building a test suite T that would distinguish P from all other programs in Φ

Coupling Effect The competent programmer hypothesis limits the problem from infinite to finite. But remaining problem is still too large Coupling effect says that there is a small subset μ  Φ such that: We only need to distinguish P from all programs in μ by tests in T

Problems Can we be sure the coupling effect holds? Do simple syntactic changes define a set? Can we detect and count equivalents? If we can’t kill a mutant Q is Q  P or is Q just hard to kill? How large is μ? May still be too large to be practical?

Equivalent Mutants Offut and Pan [1997] estimated 9% of all mutants equivalent Bybro [2003] concurs with 8% Automatic detection algorithms (basically static analysers) detect about 50% of these Use theorem proving (verification) techniques

Coupling Effect For Q an incorrect version of P Semantic error size = Pr  Q.x  P.x  If for every semantically large fault there is an overlap with at least one small syntactic fault then the coupling effect holds. Selective mutation based on a small set of semantically small errors - “Hard to kill”

Early Research: 22 Standard (Fortran) mutation operators AAR Array reference for array reference replacement ABS Absolute value insertion ACR Array reference for constant replacement AOR Arithmetic operator replacement ASR Array reference for scalar replacement CAR Constant for array reference replacement CNR Comparable array name replacement CRP Constants replacement CSR Constant for Scalar variable replacement DER Do statement End replacement DSA Data statement alterations GLR Goto label replacement LCR Logical connector replacement ROR Relational operator replacement RSR Return statement replacement SAN Statement analysis SAR Scalar for array replacement SCR Scalar for constant replacement SDL Statement deletion SRC Source constant replacement SVR Scalar variable replacement UOI Unary operator insertion

Recent Research: Java Mutation Operators First letter is category A = access control E = common programming mistakes I = inheritance J = Java-specific features O = method overloading P = polymorphism

AMC Access modifier change EAM Accessor method change EMM Modifier method change EOA Reference assignment and content assignment replacement EOC Reference comparison and content comparison replacement IHD Hiding variable deletion IHI Hiding variable insertion IOD Overriding method deletion IOP Overriding method calling position change IOR Overridden method rename IPC Explicit call of parent’s constructor deletion ISK super keyword deletion JDC Java supported default constructor create

JID Member variable initialisation deletion JSC static modifier change JTD this keyword deletion OAO Argument order change OAN Argument number change OMD Overloaded method deletion OMR Overloaded method contents change PMD Instance variable deletion with parent class type PNC new method call with child class type PPD Parameter variable declaration with child class type PRV Reference asssignment with other compatible type

Practical Example Triangle program Myers “complete” test suite (13 test cases) Bybro [2003] Java mutation tool and code 88% mutation score, 96% statement coverage

Status of Mutation Testing Various strategies: weak mutation, interface mutation, specification-based mutation Our version is called strong mutation Many mutation tools available on the internet Cost of generating mutants and detecting equivalents has come down Not yet widely used in industry Still considered “academic”, not understood?

Learning-based Testing 1.Specification-based Black-box Testing 2.Learning-based Testing paradigm (LBT) -connections between learning and testing -testing as a search problem -testing as an identification problem -testing as a parameter inference problem 3.Example frameworks: 1.Procedural systems 2.Boolean reactive systems

Specification-based Black-box Testing 1.System requirement (Sys-Req) 2.System under Test (SUT ) 3.Test verdict pass/fail (Oracle step) SUT TCG Oracle Sys-Reqpass/fail Test case Output Constraint solverConstraint checker Language runtime

Procedural System Example: Newton’s Square root algorithm SUT TCG Oracle precondition x ≥ 0.0 Postcondition Ι y*y – x Ι ≤ ε InputOutput x=4.0 satisfies x ≥ 0.0Verdict x=4.0, y=2.0 satisfies Ι y*y – x Ι ≤ ε Newton Code x=4.0y=2.0 Constraint solver Constraint checker

Reactive System Example: Coffee Machine SUT TCG Oracle InputOutput Coffee machine In 0 := $1out 11 := coffee Sys-Req: always( in=$1 implies after(10, out=coffee) ) Constraint solver pass/fail Constraint checker in 0 := $1, out 11 := coffee Satisfies always( in=1$ implies after(10, out=coffee))

Key Problem: Feedback Problem: How to modify this architecture to.. 1.Improve next test case using previous test outcomes 2.Execute a large number of good quality tests? 3.Obtain good coverage? 4.Find bugs quickly?

SUT TCG Oracle InputOutput Verdict Learning-Based Testing “Model based testing without a model” Sys-Req pass/fail Learner Sys-Model

Basic Idea … LBT is a search heuristic that: 1.Incrementally learns an SUT model 2.Uses generalisation to predict bugs 3.Uses best prediction as next test case 4.Refines model according to test outcome

Abstract LBT Algorithm 1.Use (i 1, o 1 ), …, (i k, o k ) to learn model M k 2.Model check M k against Sys-Req 3.Choose “best counterexample” i k+1 from step 2 4.Execute i k+1 on SUT to produce o k+1 5.Check if (i k+1, o k+1 ) satisfies Sys-Req a)Yes: terminate with i k+1 as a bug b)No: goto step 1 Difficulties lie in the technical details …

General Problems Difficulty is to find combinations of models, requirements languages and checking algorithms (M, L, A) so that … 1.models M are: -expressive, -compact, -partial and/or local (an abstraction method) -easy to manipulate and learn 2.M and L are feasible to model check with A

Incremental Learning Real systems are too large to be completely learned Complete learning is not necessary to test many requirements (e.g. use cases) We use incremental (on-the-fly) learning – Generate a sequence of refined models M 0  M 1  …  M i  – Convergence in the limit

Example: Boolean reactive systems 1.SUT: reactive systems 2.Model: deterministic Kripke structure 3.Requirements: propositional linear temporal logic (PLTL) 4.Learning: IKL incremental learning algorithm 5.Model Checker: NuSMV

LBT Architecture

A Case Study: Elevator Model

Elevator Results Reqt first (sec) t total (sec) MCQ first MCQ tot PQ first PQ tot RQ first RQ tot Req Req Req Req Req Req

Conclusions A promising approach … Flexible general heuristic, many models and requirement languages seem possible Many SUT types might be testable procedural, reactive, real-time etc. Open Questions Benchmarking? Scalability? (abstraction, infinite state?) Efficiency? (model checking and learning?)