1
Lazy Systematic Unit Testing for Java
Anthony J H Simons, Christopher D Thomson
2
Overview Lazy Systematic Unit Testing testing concepts and methodology The JWalk Tester tool flagship of the JWalk 1.0 toolset Dynamic analysis and pruning smart interactive generation and evaluation Oracle building and test prediction building a test oracle with minimal user interaction Head-to-head evaluation a testing contest: JWalk versus JUnit http://www.dcs.shef.ac.uk/~ajhs/jwalk/
3
Motivation State of the art in agile testing Test-driven development is good, but… …no specification to inform the selection of tests …manual test-sets are fallible (missing, redundant cases) Can we do better in test-case selection? Regression testing: a touchstone? No specifications in XP, so use saved tests instead, which become guarantors of correct behaviour Article of faith – passing saved tests guarantees no faults introduced in the modified unit Actually no, state partitions cause geometric decrease in effective state coverage (Simons, 2005)
4
Regression Testing Model Base object proven correct by basic test set Derived object refines Base object in some way Basic test set used to test regression in Derived object Passing regression tests proves that Derived conforms to Base But this is an unreliable assumption!
[Diagram: Btest proves Base correct; Derived refines Base; the test assumption is that re-running Btest "proves" that Derived conforms to Base.]
5
Coverage of Base
[State diagram of LibraryBook: new() enters the Discharged state (¬isOnLoan()); issue(a) moves to the Issued state (isOnLoan()); discharge() returns to Discharged; borrower() gives an error in Discharged and OK in Issued. All transition pairs validated.]
T2 = C (L0 ∪ L1 ∪ L2) P
Reach every state and validate every transition pair
6
Coverage of Derived
[State diagram of ReservableBook, refining the diagram for LibraryBook: Discharged splits into OnShelf (¬reserved()) and PutAside (reserved()); Issued splits into OnLoan (¬reserved()) and Recalled (reserved()); reserve(b) and cancel() switch between the unreserved and reserved states, with new(), issue(a) and discharge() as before. Reusing the same T2 test-set reaches only some of the transition pairs.]
7
Test Regeneration Model
[Diagram: Btest proves Base correct; Derived refines Base; Dtest proves Derived conforms to the derived specification; Derived thereby conforms transitively to Base.]
Only base object proven correct by basic test set Derived object requires all-new tests, regenerated from derived specification Derived object conforms to derived spec. by testing Derived spec. conforms to base spec. by verification Derived object conforms transitively to base spec. New idea: conformity proven by both verification and testing
8
The Conundrum Regression testing is too weak saved tests don’t exercise the refined model manual extra tests don’t cover all path combinations regression guarantee is progressively weakened Test regeneration is more reliable all-new tests generated from a refined specification automatically generated tests cover all path combinations there is a guarantee of repeatable test quality (for Tk) How to replicate for agile methods? No up-front specification from which to generate tests The only artefact is the evolving code, which changes Can we make any use of this?
9
Lazy Systematic Unit Testing Lazy Specification late inference of a specification from evolving code semi-automatic, by static and dynamic analysis of code with limited user interaction specification evolves in step with modified code Systematic Testing bounded exhaustive testing, up to the specification emphasis on completeness, conformance, correctness properties after testing, repeatable test quality http://en.wikipedia.org/wiki/Lazy_systematic_unit_testing
10
JWalk Tester Lazy systematic unit testing for Java static analysis - extracts the public API of a compiled Java class protocol walk (all paths) – explores, validates all interleaved methods to a given path depth algebra walk (memory states) – explores, validates all observations on all mutator-method sequences state walk (high-level states) – explores, validates n-switch transition cover for all high-level states http://www.dcs.shef.ac.uk/~ajhs/jwalk/ Try me
11
Example: Stack
[Screenshots of the JWalk Tester applied to a Stack class: analysis of the API (protocol, algebra); test reports for each test cycle; test statistics and summary report.]
12
Load the Test Class Choose a location the working directory the root of a package is its parent directory Choose a test class browse for the test class within a directory browse for a package- qualified class within a package Shortcut type the (qualified) test class name directly
13
Pick Settings and Go Strategy protocol: all methods algebra: all constructions states: all states and transitions Modality inspect: the interface explore: exercise paths validate: against oracle Test depth maximum path length Start testing click on the JWalker to run a test series
14
Protocol Inspection Protocol analysis static analysis of the public API of the test class includes all inherited public methods may or may not include standard Object methods specify this through the custom settings
15
Algebraic Inspection Algebraic analysis dynamic analysis of algebraic categories primitive, transformer and observer operations Technique compares concrete object states identifies unchanged, or re-entrant states controlled by probe-depth and state-depth (custom)
16
State Inspection State Analysis dynamic analysis of high-level states automatically names discovered states computes state cover Technique based on public state predicate methods seeks the boolean state product (fails gracefully) controlled by probe-depth
17
Baseline Approaches Breadth-first generation all constructors and all interleaved methods (eg JCrasher, DSD-Crasher, Jov) generate-and-filter by state equivalence class (eg Rostra, Java Pathfinder) Computational cost exponential growth, memory issues, wasteful over-generation, even if filtering is later applied #paths = Σ c·m^k, for k = 0..n Key: c = #constructors, m = #methods, k = depth
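The cost formula above can be checked in a few lines. The sketch below is illustrative (it is not part of the JWalk toolset); with c = 1 constructor and m = 6 methods it reproduces the baseline column of Table 1 for the Stack example.

```java
// Cumulative number of interleaved test paths up to depth n:
// #paths = sum over k = 0..n of c * m^k, where the depth-0 "path"
// is the bare constructor call, counted once per constructor.
public class PathCount {
    public static long totalPaths(int c, int m, int n) {
        long total = 0;
        long mk = 1;                    // m^k, starting at m^0 = 1
        for (int k = 0; k <= n; k++) {
            total += (long) c * mk;
            mk *= m;
        }
        return total;
    }

    public static void main(String[] args) {
        // One constructor, six public methods, depth 5 -> 9331 paths,
        // matching the baseline column of Table 1 for Stack.
        System.out.println(totalPaths(1, 6, 5));   // prints 9331
    }
}
```

The exponential growth is visible directly: each extra level of depth multiplies the frontier by m, which is why filtering after generation does not help with memory.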
18
Dynamic Pruning Interleaved analysis generate-and-evaluate, pruning active paths on the fly (eg JWalk, Randoop) remove redundant prefix paths after each test cycle, don’t bother to expand in next cycle Increasing sophistication prune prefix paths ending in exceptions (fail again) JWalk, Randoop (2007) and prefixes ending in algebraic observers (unchanged) JWalk 0.8 (2007) and prefixes ending in algebraic transformers (reentrant) JWalk 1.0 (2009)
19
Protocol Exploration Protocol strategy explores all interleaved methods by brute force explores all paths up to length n (test depth) repeats invocations of the same method Pruning paths raising exceptions in test cycle i are not extended in test cycle i+1
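The pruning rule can be sketched on java.util.Stack with a reduced method alphabet of push, pop and peek (the slides' Stack example has six methods; this cut-down version only illustrates the mechanism, not JWalk's implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;

// Generate-and-evaluate with pruning: paths that raise an exception in
// test cycle i are recorded but not extended in cycle i+1.
public class PruneDemo {
    static final String[] METHODS = {"push", "pop", "peek"};

    // Re-execute a path from a fresh receiver; false if it throws.
    static boolean execute(List<String> path) {
        Stack<Integer> s = new Stack<>();
        try {
            for (String m : path) {
                switch (m) {
                    case "push": s.push(1); break;
                    case "pop":  s.pop();   break;
                    case "peek": s.peek();  break;
                }
            }
            return true;
        } catch (RuntimeException e) {
            return false;               // e.g. EmptyStackException
        }
    }

    // Surviving (non-exception) paths at each depth 1..maxDepth.
    static int[] survivorsPerDepth(int maxDepth) {
        int[] counts = new int[maxDepth];
        List<List<String>> frontier = new ArrayList<>();
        frontier.add(new ArrayList<>());        // empty path = new Stack()
        for (int depth = 1; depth <= maxDepth; depth++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : frontier) {
                for (String m : METHODS) {
                    List<String> path = new ArrayList<>(prefix);
                    path.add(m);
                    if (execute(path)) next.add(path);  // prune failures
                }
            }
            counts[depth - 1] = next.size();
            frontier = next;                    // only survivors extended
        }
        return counts;
    }

    public static void main(String[] args) {
        // On an initially empty stack only push survives at depth 1,
        // so the survivor counts grow far slower than brute force
        // (3, 9, 27 would be the unpruned counts).
        System.out.println(Arrays.toString(survivorsPerDepth(3)));  // [1, 3, 7]
    }
}
```

Note that pruning an exception prefix removes a whole exponential subtree, which is where the large savings in Tables 1 and 2 come from.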
20
Baseline
[Exploration tree for Stack: from new(), every interleaving of push, top and pop is expanded breadth-first to the test depth, including branches that repeatedly raise exceptions (top and pop on an empty stack). Brute-force, breadth-first exploration. Key: novel state, exception.]
21
Prune Exceptions…
[The same exploration tree, with every path extending an exception-raising prefix (top or pop on an empty stack) pruned. Prune error-prefixes (JWalk 0.8, Randoop). Key: novel state, exception.]
22
Algebraic Exploration Algebraic strategy explores all algebraic constructions grows paths using only primitive operations observes paths ending in any kind of operation Pruning prunes paths ending in exceptions (next cycle) also with re-entrant or unchanged states
23
Prune Observers
[The tree after additionally pruning paths that extend an observer prefix such as top, which leaves the state unchanged. Prune error- and observer-prefixes (JWalk 0.8). Key: novel state, exception, unchanged state.]
24
…Transformers
[The tree after further pruning paths that extend a transformer prefix returning to an earlier state, such as push followed by pop. Prune error-, observer- and transformer-prefixes (JWalk 1.0). Key: novel state, exception, unchanged state, reentrant state.]
25
State Exploration State strategy reaches every high-level state explores all transition paths up to length n, from each state has n-switch coverage Pruning grows only primitive paths to reach all states prunes paths ending in exceptions (next cycle)
26
Exploration Summary Test settings test class, strategy, modality, depth Exploration summary # executed in total # discarded (pruned) # exercised (normal) # terminated (exception) Technique calculates discarded from theoretical max paths
27
The Same State? Some earlier approaches distinguish observers, mutators by signature (Rostra) intrusive state equality predicate methods (ASTOOT) external (partial) state equality predicates (Rostra) subsumption of execution traces in JVM (Pathfinder) Some algebraic approaches shallow, deep equality under all observers (TACCLE) but assumes observations are also comparable very costly to compute from first principles serialise object states and hash (Henkel & Diwan) but not all objects are serialisable no control over depth of comparison
28
State Comparison Reflection-and-hash extract state vector from objects compute hash code for each field order-sensitive combination hash code Proper depth control shallow or deep equality settings, to chosen depth hash on pointer, or recursively invoke algorithm Fast state comparison each test evaluation stores posterior state code fast comparison with preceding, or all prior states possible to detect unchanged, or reentrant states
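A minimal sketch of the reflection-and-hash idea follows. It is illustrative, not JWalk's actual code: it shows only the shallow setting (hashing on each field value directly), where the deep setting would recurse into object-typed fields to the chosen depth. The Book class is a stand-in for a test class.

```java
import java.lang.reflect.Field;

// Extract the state vector of an object via reflection, hash each field
// value, and fold the hashes in an order-sensitive way, giving a fast
// posterior state code for comparison with prior states.
public class StateHash {

    // Toy class with hidden state, standing in for a test class.
    static class Book {
        String borrower = null;
        boolean onLoan = false;
    }

    public static int stateCode(Object obj) {
        int code = 17;
        try {
            for (Field f : obj.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                Object value = f.get(obj);
                // Order-sensitive combination of per-field hash codes.
                code = 31 * code + (value == null ? 0 : value.hashCode());
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return code;
    }

    public static void main(String[] args) {
        Book a = new Book();
        Book b = new Book();
        System.out.println(stateCode(a) == stateCode(b));  // true: same state
        b.borrower = "alice";
        System.out.println(stateCode(a) == stateCode(b));  // false: states differ
    }
}
```

Comparing an object's code before and after a method call detects unchanged states; comparing against all prior codes detects reentrant states, as the slide describes.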
29
Pruning: Stack

Table 1: Cumulative paths explored after each test cycle

depth   baseline   except.   observ.   transf.
  0            1         1         1         1
  1            7         7         7         7
  2           43        31        13        13
  3          259       139        25        19
  4         1555       667        43        25
  5         9331      3391        79        31

Pruned: 9,300 redundant paths
Retained: 31 significant paths (best 0.33%)
30
Pruning: Reservable Book

Table 2: Cumulative paths explored after each test cycle

depth   baseline   except.   observ.   transf.
  0            1         1         1         1
  1            9         9         9         9
  2           73         –        25        25
  3          585       561        49        33
  4         4681      4185        97        41
  5        37449  mem. ex.       169        41

Pruned: 37,408 redundant paths
Retained: 41 significant paths (best 0.12%)
31
Validation Modality Lazy specification interacts with tester to confirm key results uses predictive rules to infer further results stores key results in reusable test oracle Technique key results found at the leaves of the algebra tree apply predictions to other test strategies Tester accepts or rejects outcome
32
Test Result Prediction Semi-automatic validation the user confirms or rejects key results these constitute a test oracle, used in prediction eventually > 90% test outcomes predicted JWalk test result prediction rules eg: predict repeat failure new().pop().push(e) == new().pop() eg: predict same state target.size().push(e) == target.push(e) eg: predict same result target.push(e).pop().size() == target.size() Try me
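The first rule above, "predict repeat failure", can be illustrated with a toy oracle (this is a didactic sketch, not JWalk's code): once the user has confirmed that a prefix such as new().pop() fails, any extension of it, e.g. new().pop().push(e), is predicted to fail the same way and needs no further confirmation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy oracle holding confirmed failing method sequences; a sequence is
// predicted to fail if any prefix of it is already known to fail.
public class FailurePrediction {
    private final Set<List<String>> knownFailures = new HashSet<>();

    // The user confirms a failing sequence, e.g. ["pop"] for new().pop().
    public void confirmFailure(List<String> sequence) {
        knownFailures.add(new ArrayList<>(sequence));
    }

    public boolean predictedToFail(List<String> sequence) {
        for (int k = 1; k <= sequence.size(); k++) {
            if (knownFailures.contains(sequence.subList(0, k))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FailurePrediction oracle = new FailurePrediction();
        oracle.confirmFailure(List.of("pop"));                          // new().pop() failed
        System.out.println(oracle.predictedToFail(List.of("pop", "push")));  // true
        System.out.println(oracle.predictedToFail(List.of("push", "pop")));  // false
    }
}
```

The state- and result-prediction rules work analogously, collapsing a sequence to a shorter, already-confirmed equivalent before looking it up in the oracle.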
33
Kinds of Prediction Strong prediction From known results, guarantee further outcomes in the same equivalence class eg: observer prefixes empirically checked before making any inference, unchanged state is guaranteed target.push(e).size().top() == target.push(e).top() Weak prediction From known facts, guess further outcomes; an incorrect guess will be revealed in the next cycle eg: methods with void type usually return no result, but may raise an exception target.pop() predicted to have no result target.pop().size() == -1 reveals an error
34
Algebraic Validation Algebraic testing grows all primitive paths ending in all operations solicits results for leaves of the algebra tree best mode in which to create an oracle Prediction predicts void-results predicts results saved in previous test cycles Oracle predicts a correct outcome Tester confirms an outcome
35
Protocol Validation Protocol Testing create oracle first using the algebra-strategy then apply same oracle in the protocol-strategy most results predicted! Prediction (chains of) observers don’t affect states re-entrant methods return to earlier states Oracle predicts many outcomes
36
State Validation State testing extends oracle created for the algebra-strategy can validate 1000’s of transition paths for a mere few 10’s of user confirmations Prediction all results for “nearby” states predicted needs confirmations for more “remote” states Oracle predicts many outcomes
37
Validation Summary Test summary other statistics as before Validation summary # passed (in total) # failed (in total) # confirmed (by user) # rejected (by user) # correct (by oracle) # incorrect (by oracle) 10x automated vs manual checks
38
Amortized Interaction Costs
number of new confirmations, amortized over 6 test cycles
con = manual confirmations, > 25 test cases/minute
pre = JWalk’s predictions, eventually > 90% of test cases

Test class    a1   a2   a3   s1   s2    s3
LibBk con      3    5    7    0    0     5
LibBk pre      2    8   18    –   38   133
ResBk con      3   14   56    0    1   183
ResBk pre      6   27   89   36  241  1649

(a1–a3 and s1–s3 are the algebra- and state-strategy test depths 1 to 3)
eg: algebra-test to depth 2, 14 new confirmations
eg: state-test to depth 2, 241 predicted results
39
Feedback-based Methodology Coding The programmer prototypes a Java class in an editor Exploration JWalk systematically explores method paths, providing useful instant feedback to the programmer Specification JWalk infers a specification, building a test oracle based on key test results confirmed by the programmer Validation JWalk tests the class to bounded exhaustive depths, based on confirmed and predicted test outcomes JWalk uses state-based test generation algorithms
40
Example – Library Book
Exploration
surprise: target.issue(“a”).issue(“b”).getBorrower() == “b”
violates business rules: fix code to raise an exception
Validation
all observations on chains of issue(), discharge()
n-switch cover on states {Default, OnLoan}

public class LibraryBook {
    private String borrower;
    public LibraryBook();
    public void issue(String);
    public void discharge();
    public String getBorrower();
    public Boolean isOnLoan();
}
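The fix mentioned under Exploration can be sketched as follows. This is a hypothetical implementation (the slide gives only the public interface), and the choice of IllegalStateException for the business-rule violation is an assumption; the slide says only "raise an exception".

```java
// Sketch of LibraryBook after the fix: issuing an already issued book
// now raises an exception instead of silently overwriting the borrower.
public class LibraryBook {
    private String borrower;

    public void issue(String name) {
        if (isOnLoan()) {
            // Assumed exception type; the slides do not name one.
            throw new IllegalStateException("already on loan to " + borrower);
        }
        borrower = name;
    }

    public void discharge() { borrower = null; }   // nullop when not on loan

    public String getBorrower() { return borrower; }

    public boolean isOnLoan() { return borrower != null; }
}
```

With this version, JWalk's exploration of target.issue("a").issue("b") ends in an exception rather than the surprising result, and the failing path is pruned in later cycles.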
41
Extension – Reservable Book
Exploration
only revisits novel interleaved permutations of methods
surprise: target.reserve(“a”).issue(“b”).getBorrower() == “b”
Validation
all obs. on chains of issue(), discharge(), reserve(), cancel()
n-switch cover on states {Default, OnLoan, Reserved, Reserved&OnLoan}

public class ReservableBook extends LibraryBook {
    private String requester;
    public ReservableBook();
    public void reserve(String);
    public void cancel();
    public String getRequester();
    public Boolean isReserved();
}
42
Evaluation User Acceptance programmers find JWalk habitable they can concentrate on creative aspects (coding) while JWalk handles systematic aspects (validation, testing) Main Cost is Confirmations not so burdensome, since amortized over many test cycles metric: measure amortized confirmations per test cycle Comparison with JUnit common testing objective for manual and lazy systematic testing; evaluate coverage and testing effort Eclipse+JUnit vs. JWalkEditor: given the task of testing the “transition cover + all equivalence partitions of inputs”
43
Comparison with JUnit manual testing method Manual test creation takes skill, time and effort (eg: ~20 min to develop manual cases for ReservableBook) The programmer missed certain corner-cases eg: target.discharge().discharge() - a nullop? The programmer redundantly tested some properties eg: assertTrue(target != null) - multiple times The state coverage for LibraryBook was incomplete, due to the programmer missing hard-to-see cases The saved tests were not reusable for ReservableBook, for which all-new tests were written to test new interleavings
44
Advantages of JWalk JWalk lazy systematic testing JWalk automates test case selection - relieves the programmer of the burden of thinking up the right test cases! Each test case is guaranteed to test a unique property Interactive test result confirmation is very fast (eg: ~80 sec in total for 36 unique test cases in ReservableBook) All states and transitions covered, including nullops, to the chosen depth The test oracle created for LibraryBook formed the basis for the new oracle for ReservableBook, but… JWalk presented only those sequences involving new methods, and all interleavings with inherited methods
45
Measuring the Testing?
Suppose an ideal test set
BR : behavioural response (set)
T : tests to be evaluated (bag – duplicates?)
T_E = BR ∩ T : effective tests (set)
T_R = T – T_E : redundant tests (bag)
Define test metrics
Ef(T) = (|T_E| – |T_R|) / |BR| : effectiveness
Ad(T) = |T_E| / |BR| : adequacy
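The two metrics are straightforward to compute from the set sizes. In the worked figures below, |BR| = 40 for ReservableBook is an inference from the slides (it is consistent with both the 53% manual adequacy for 21 effective tests and the 90% JWalk adequacy for 36 tests), not a number the slides state directly.

```java
// Effectiveness penalises redundant tests and can go negative;
// adequacy measures coverage of the behavioural response BR alone.
public class TestMetrics {
    public static double effectiveness(int effective, int redundant, int br) {
        return (double) (effective - redundant) / br;
    }

    public static double adequacy(int effective, int br) {
        return (double) effective / br;
    }

    public static void main(String[] args) {
        // ReservableBook, manual testing: |T_E| = 21, |T_R| = 83,
        // assumed |BR| = 40.
        System.out.println(adequacy(21, 40));           // 0.525, i.e. ~53%
        System.out.println(effectiveness(21, 83, 40));  // -1.55: wasteful
    }
}
```

A negative effectiveness score is the formal counterpart of "sometimes adequate, but not effective": the redundant tests outnumber the useful ones.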
46
Speed and Adequacy of Testing
Test goal: transition cover + equiv. partitions of inputs
manual testing expensive, redundant and incomplete
JWalk testing very efficient, close to complete
eg: wrote 104 tests, 21 were effective and 83 not!
eg: JWalk achieved 100% test coverage

Test class      T   T_E   T_R   Adeq.   time (min.sec)
LibBk manual   31     9    22    90%         11.00
ResBk manual  104    21    83    53%         20.00
LibBk jwalk    10    10     0   100%          0.30
ResBk jwalk    36    36     0    90%          0.46
47
Some Conclusions JUnit: expert manual testing massive over-generation of tests (w.r.t. goal) sometimes adequate, but not effective stronger (t2, t3); duplicated; and missed tests hopelessly inefficient – also debugging test suites! JWalk: lazy systematic testing near-ideal coverage, adequate and effective a few input partitions missed (simple generation strategy) very efficient use of the tester’s time – sec. not min. or: three orders of magnitude (×1000) more tests, for the same effort
48
More Conclusions Feedback-based development unexpected gain: automatic validation of prototype code cf. Alloy’s model checking from a partial specification Moral for testing automatically executing saved tests is not so great need systematic test generation tools to get coverage automate the parts that humans get wrong! let humans focus on right/wrong responses.
49
JWalk 1.0 Toolset JWalk Tester JWalk Utility JWalk Editor JWalk Marker JWalk Grapher JWalk SOAR
50
Example: JWalk Editor © Neil Griffiths, 2008
51
Any Questions? http://www.dcs.shef.ac.uk/~ajhs/jwalk/ Put me to the test! © Anthony Simons, 2009, with help from Chris Thomson, Neil Griffiths, Mihai Gabriel Glont, Arne-Michael Toersel
52
Custom Configuration Oracle directory default is the test class directory; pick a new location Convention standard: exclude all of Object’s methods custom: include some complete: include all Probe depth max path length for dynamic analysis State depth tree depth for object state comparison shallow state (inc. array values) by default
53
Generators The heart of JWalk synthesise test input values on demand try to assure even spread of inputs for a given type by default, supply monotonic sequences of values MasterGenerator built-in ObjectGenerator is fairly comprehensive synthesises basic values, arrays, standard objects, etc. CustomGenerator take control of how particular types are synthesised provide custom generators; add to a master as delegates eg: StringGenerator, EnumGenerator, InterfaceGenerator
54
Custom Generators Choose a location default is the test class directory Choose a generator enter generator directly browse within package Click add/remove add a custom generator to the list remove a generator from the list
55
CustomGenerator Interface
Provide a generator class with:
public boolean canCreate(Class type);
public Object nextValue(Class type);
public void setOwner(MasterGenerator master);
Key points:
advertises which types it can synthesise
generates a sequence of objects on demand
may keep a handle to its owning master
eg: InterfaceGenerator maps interface types onto concrete classes; invokes nextValue recursively (on its master).
56
Example: IndexGenerator

public class IndexGenerator implements CustomGenerator {
    private int seed = 1;
    private boolean flag = false;

    // Specific for the int index type
    public boolean canCreate(Class type) {
        return type == int.class;
    }

    // Creates repeating pairs of indices: 1, 1, 2, 2, 3, 3, ...
    public Object nextValue(Class type) {
        if (flag) { flag = false; return seed++; }
        else { flag = true; return seed; }
    }

    // Nullop: ignores the master generator
    public void setOwner(MasterGenerator master) {}
}
57
When are they Useful? IndexGenerator generates repeating pairs of indices exercises put/get pairs in vector, array types StdIOGenerator redirect System.in, System.out to conventional files test programs with IO using prepared data in files FileGenerator take control of filenames and streams (security) test programs using prepared data in files Arbitrary test set-up take control of how the environment is established