Benchmarking Effectiveness for Object-Oriented Unit Testing
Anthony J H Simons and Christopher D Thomson

Overview
– Measuring testing?
– The Behavioural Response
– Measuring six test cases
– Evaluation of JUnit tests
– Evaluation of JWalk tests

Analogy: Metrics and Testing
Things easy to measure (but why?)
– metrics: MIT O-O metrics (Chidamber & Kemerer)
– testing: decision-, path-, whatever-coverage
– testing: count exceptions, reduce test-set size
Properties you really want (but how?)
– metrics: Goal, Question, Metric (Basili et al.)
– testing: e.g. mutant killing index
– testing: effectiveness and efficiency?

Measuring Testing?
Most approaches measure testing effort, rather than test effectiveness!

Degrees of Correctness
Suppose an ideal test set
– BR : behavioural response (set)
– T : tests to be evaluated (bag – duplicates?)
– T_E = BR ∩ T : effective tests (set)
– T_R = T – T_E : redundant tests (bag)
Define test metrics
– Ef(T) = (|T_E| – |T_R|) / |BR| : effectiveness
– Ad(T) = |T_E| / |BR| : adequacy
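A minimal Java sketch of how these two measures could be computed (illustrative only; the class and method names are not from the paper), treating BR as the set of required responses and T as the bag of tests actually written:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: computes Ad(T) and Ef(T) for a test bag T
// against an ideal behavioural response set BR.
public class TestMetrics {

    // T_E = BR ∩ T, as a set of distinct effective tests
    private static Set<String> effective(Set<String> br, List<String> t) {
        Set<String> te = new HashSet<>(t);
        te.retainAll(br);
        return te;
    }

    // Ad(T) = |T_E| / |BR|
    public static double adequacy(Set<String> br, List<String> t) {
        return (double) effective(br, t).size() / br.size();
    }

    // Ef(T) = (|T_E| - |T_R|) / |BR|, where T_R = T - T_E; since T is a bag,
    // repeated executions of the same effective test also count as redundant
    public static double effectiveness(Set<String> br, List<String> t) {
        int te = effective(br, t).size();
        int tr = t.size() - te;
        return (double) (te - tr) / br.size();
    }
}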

Ideal Test Set?
The ideal test set must verify each distinct response of an object!

What is a Response?
Input response
– Account.withdraw(int amount) : 3 partitions
  amount < 0 → fail precondition, exception
  amount > balance → refuse, no change
  amount <= balance → succeed, debit
State response
– Stack.pop() : 2 states
  isEmpty() → fail precondition, exception
  ! isEmpty() → succeed
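For illustration, a JUnit 4 sketch covering the three withdraw partitions; the Account constructor, accessor and exception type used here are assumed, not taken from the slides:

import static org.junit.Assert.*;
import org.junit.Test;

// Illustrative tests, one per input partition of Account.withdraw(int).
public class AccountResponseTest {

    @Test(expected = IllegalArgumentException.class)  // exception type assumed
    public void withdrawNegativeAmountFailsPrecondition() {
        new Account(100).withdraw(-1);            // amount < 0
    }

    @Test
    public void withdrawMoreThanBalanceIsRefused() {
        Account a = new Account(100);
        a.withdraw(200);                          // amount > balance
        assertEquals(100, a.getBalance());        // refused, no change
    }

    @Test
    public void withdrawWithinBalanceSucceeds() {
        Account a = new Account(100);
        a.withdraw(60);                           // amount <= balance
        assertEquals(40, a.getBalance());         // succeeded, debited
    }
}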

Behavioural Response – 1
Input response
– cf. exemplars of equivalence partitions
– max responses per method, over all states
State response
– cf. state cover, to reach all states
– max state-contingent responses, over all methods
Behavioural Response
– product of input and state response
– checks all argument partitions in all states
– cf. transition cover augmented by exemplars
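As a concrete illustration of the product (the two Account states assumed here are not from the slides): if Account has two reachable states, zero and positive balance, and withdraw has the three input partitions above, the behavioural response demands one test per state/partition pair:

// Assumed illustration: 2 reachable Account states x 3 withdraw() input
// partitions = 6 required behavioural responses.
public class ResponseProduct {
    public static void main(String[] args) {
        String[] states = { "balance == 0", "balance > 0" };
        String[] partitions = { "amount < 0", "amount > balance", "amount <= balance" };
        for (String state : states) {
            for (String partition : partitions) {
                System.out.println("withdraw where " + partition + ", starting in state " + state);
            }
        }
    }
}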

Behavioural Response – 2
Parametric form: BR(x, y)
– stronger ideal sets, for higher x, y
  x = length of sequences from each state
  y = number of exemplars for each partition
Redundant states
– higher x rules out faults hiding in duplicated states
Boundary values
– higher y verifies equivalence partition boundaries
Useful measure
– precise quantification of what has been tested
– repeatable guarantees of quality after testing

Compare Testing Methods
– JWalk – "Lazy systematic unit testing method"
– JUnit – "Expert manual unit testing method"

JUnit – Beck, Gamma
"Automates testing"
– manual test authoring (as good as human expertise)
– may focus on positive, miss negative test cases
– saved tests automatically re-executed on demand
– regression style may mask hard interleaved cases
Test harness
– bias: test method "testX" for each method "X"
– each "testX" contains n assertions = n test cases
– same assertions appear redundantly in "testY", "testZ"
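A hand-written example of the harness bias described above (the Stack API is assumed for illustration): each "testX" bundles several assertions, and some of the same checks reappear in other test methods:

import static org.junit.Assert.*;
import org.junit.Test;

// Typical "one testX per method X" style: several assertions (= several
// test cases) per test method, with overlapping checks across methods.
public class StackTest {

    @Test
    public void testPush() {
        Stack s = new Stack();
        s.push(1);
        assertFalse(s.isEmpty());      // test case 1
        assertEquals(1, s.top());      // test case 2
        s.push(2);
        assertEquals(2, s.top());      // test case 3
    }

    @Test
    public void testPop() {
        Stack s = new Stack();
        s.push(1);
        s.push(2);
        assertEquals(2, s.pop());      // test case 4
        assertEquals(1, s.top());      // same check already made in testPush
    }
}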

JWalk – Simons
Lazy specification
– static analysis of compiled code
– dynamic analysis of state model
– adapts to change, revises the state model
Systematic testing
– bounded exhaustive state-based exploration
– may not generate exemplars for all input partitions
– semi-automatic oracle construction (confirm key values)
– learns test equivalence classes (predictive testing)
– adapts existing oracles, superclass oracles
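As a rough illustration of bounded exhaustive exploration (a conceptual sketch only, not JWalk's actual API or algorithm), one could enumerate every short sequence of methods and replay each one on a fresh instance of the class under test:

import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: replays every sequence of public no-argument
// methods up to a given depth on a freshly constructed receiver.
public class BoundedExplorer {

    public static void explore(Class<?> type, int depth) throws Exception {
        explore(type, new ArrayList<>(), depth);
    }

    private static void explore(Class<?> type, List<Method> prefix, int remaining)
            throws Exception {
        if (remaining == 0) return;
        for (Method m : type.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers()) || m.getParameterCount() != 0) {
                continue;                            // keep the sketch simple
            }
            List<Method> sequence = new ArrayList<>(prefix);
            sequence.add(m);
            Object receiver = type.getDeclaredConstructor().newInstance();
            try {
                for (Method step : sequence) {
                    step.invoke(receiver);           // replay the whole sequence
                }
                // a real tool would record the observed result or state here
            } catch (Exception outcome) {
                // exceptions are also observed responses, not necessarily failures
            }
            explore(type, sequence, remaining - 1);
        }
    }
}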

Six Test Cases
Stack1 – simple linked stack
Stack2 – bounded array stack
– change of implementation
Book1 – simple loanable book
Book2 – also with reservations
– extension by inheritance
Account1 – with deposit/withdraw
Account2 – with preconditions
– refinement of specification
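For a feel of the scale of these case studies, a sketch of what the Account2 variant might look like; the exact code used in the experiments is not reproduced in the slides, so this shape is assumed:

// Assumed shape of the Account2 case study (refinement with preconditions).
public class Account2 {
    private int balance;

    public void deposit(int amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount <= 0");
        balance += amount;
    }

    public void withdraw(int amount) {
        if (amount < 0) throw new IllegalArgumentException("amount < 0");
        if (amount > balance) return;   // refuse: insufficient funds, no change
        balance -= amount;
    }

    public int getBalance() {
        return balance;
    }
}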

Instructions to Testers
Test each response for each class, similar to the transition cover, but with all equivalence partitions for method inputs.

Behavioural Response
[Table: for each test class (Stack1, Stack2, Book1, Book2, Account1, Account2) – API size, input response, state response, and BR(1,1), the ideal test target. The numeric values were not preserved in this transcript.]

JUnit – Expert Testing
[Table: JUnit results for each test class (Stack1, Stack2, Book1, Book2, Account1, Account2) – columns T, T_E, T_R, Ad(T), Ef(T), time; annotated "massive generation" and "still not effective". The numeric values were not preserved in this transcript.]

JWalk – Test Generation
[Table: JWalk results for each test class (Stack1, Stack2, Book1, Book2, Account1, Account2) – columns T, T_E, T_R, Ad(T), Ef(T), time; annotated "no wasted tests" and "missed 5 inputs". The numeric values were not preserved in this transcript.]

Comparisons
JUnit: expert manual testing
– massive over-generation of tests (w.r.t. goal)
– sometimes adequate, but not effective
– stronger (t2, t3); duplicated; and missed tests
– hopelessly inefficient – also debugging test suites!
JWalk: lazy systematic testing
– near-ideal coverage, adequate and effective
– a few input partitions missed (simple generation strategy)
– very efficient use of the tester's time – sec. not min.
– or: two orders (x 1000) more tests, for same effort

Conclusion
Behavioural Response
– seems like a useful benchmark (scalable, flexible)
– use with formal, semi-formal, informal design methods
– measures effectiveness, rather than effort
Moral for testing
– don't hype up automatic test (re-)execution
– need systematic test generation tools
– automate the parts that humans get wrong!

Any Questions? Put me to the test!