1
Lazy Systematic Unit Testing for Java
Anthony J H Simons, Christopher D Thomson
2
Overview Lazy Systematic Unit Testing testing concepts and methodology The JWalk Tester tool flagship of the JWalk 1.0 toolset Dynamic analysis and pruning smart interactive generation and evaluation Oracle building and test prediction building a test oracle with minimal user interaction Head-to-head evaluation a testing contest: JWalk versus JUnit http://www.dcs.shef.ac.uk/~ajhs/jwalk/
3
Motivation State of the art in agile testing Test-driven development is good, but… …no specification to inform the selection of tests …manual test-sets are fallible (missing, redundant cases) Can we do better in test-case selection? Regression testing: a touchstone? No specifications in XP, so use saved tests instead, which become guarantors of correct behaviour Article of faith – passing saved tests guarantees no faults introduced in the modified unit Actually no, state partitions cause geometric decrease in effective state coverage (Simons, 2005)
4
Regression Testing Model Base object proven correct by basic test set Derived object refines Base object in some way Basic test set used to test regression in Derived object Passing regression tests proves that Derived conforms to Base But this is an unreliable assumption!
[Diagram: Btest proves Base correct; Derived refines Base; the test assumption is that re-running Btest "proves" that Derived conforms to Base.]
5
Coverage of Base
[State diagram of LibraryBook: new() enters the Discharged state (¬isOnLoan()); issue(a) moves to the Issued state (isOnLoan()); discharge() returns to Discharged; borrower() gives an error in Discharged and OK in Issued. All transition pairs validated.]
T2 = C (L0 ∪ L1 ∪ L2) P
Reach every state and validate every transition pair
6
Coverage of Derived
[State diagram of ReservableBook, refining the diagram for LibraryBook: Discharged splits into OnShelf (¬reserved()) and PutAside (reserved()); Issued splits into OnLoan (¬reserved()) and Recalled (reserved()); reserve(b) and cancel() switch between the unreserved and reserved states, with new(), issue(a) and discharge() as before. Reusing the same T2 test-set reaches only some of the transition pairs.]
7
Test Regeneration Model
[Diagram: Btest proves Base correct; Derived refines Base; Dtest proves Derived conforms to the derived specification; Derived thereby conforms transitively to Base.]
Only base object proven correct by basic test set Derived object requires all-new tests, regenerated from derived specification Derived object conforms to derived spec. by testing Derived spec. conforms to base spec. by verification Derived object conforms transitively to base spec. New idea: conformity proven by both verification and testing
8
The Conundrum Regression testing is too weak saved tests don’t exercise the refined model manual extra tests don’t cover all path combinations regression guarantee is progressively weakened Test regeneration is more reliable all-new tests generated from a refined specification automatically generated tests cover all path combinations there is a guarantee of repeatable test quality (for Tk) How to replicate for agile methods? No up-front specification from which to generate tests The only artefact is the evolving code, which changes Can we make any use of this?
9
Lazy Systematic Unit Testing Lazy Specification late inference of a specification from evolving code semi-automatic, by static and dynamic analysis of code with limited user interaction specification evolves in step with modified code Systematic Testing bounded exhaustive testing, up to the specification emphasis on completeness, conformance, correctness properties after testing, repeatable test quality http://en.wikipedia.org/wiki/Lazy_systematic_unit_testing
10
JWalk Tester Lazy systematic unit testing for Java static analysis - extracts the public API of a compiled Java class protocol walk (all paths) – explores, validates all interleaved methods to a given path depth algebra walk (memory states) – explores, validates all observations on all mutator-method sequences state walk (high-level states) – explores, validates n-switch transition cover for all high-level states http://www.dcs.shef.ac.uk/~ajhs/jwalk/ Try me
11
Example: Stack
[Screenshots of the JWalk Tester applied to a Stack class: analysis of the API (protocol, algebra); test reports for each test cycle; test statistics and summary report.]
12
Load the Test Class Choose a location the working directory the root of a package is its parent directory Choose a test class browse for the test class within a directory browse for a package- qualified class within a package Shortcut type the (qualified) test class name directly
13
Pick Settings and Go Strategy protocol: all methods algebra: all constructions states: all states and transitions Modality inspect: the interface explore: exercise paths validate: against oracle Test depth maximum path length Start testing click on the JWalker to run a test series
14
Protocol Inspection Protocol analysis static analysis of the public API of the test class includes all inherited public methods may or may not include standard Object methods specify this through the custom settings
15
Algebraic Inspection Algebraic analysis dynamic analysis of algebraic categories primitive, transformer and observer operations Technique compares concrete object states identifies unchanged, or re-entrant states controlled by probe-depth and state-depth (custom)
16
State Inspection State Analysis dynamic analysis of high-level states automatically names discovered states computes state cover Technique based on public state predicate methods seeks the boolean state product (fails gracefully) controlled by probe-depth
17
Baseline Approaches Breadth-first generation all constructors and all interleaved methods (eg JCrasher, DSD-Crasher, Jov) generate-and-filter by state equivalence class (eg Rostra, Java Pathfinder) Computational cost exponential growth, memory issues, wasteful over-generation, even if filtering is later applied #paths = Σ c·m^k, for k = 0..n Key: c = #constructors, m = #methods, k = depth
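The cost formula above can be checked in a few lines. The sketch below is illustrative (it is not part of the JWalk toolset); with c = 1 constructor and m = 6 methods it reproduces the baseline column of Table 1 for the Stack example.

```java
// Cumulative number of interleaved test paths up to depth n:
// #paths = sum over k = 0..n of c * m^k, where the depth-0 "path"
// is the bare constructor call, counted once per constructor.
public class PathCount {
    public static long totalPaths(int c, int m, int n) {
        long total = 0;
        long mk = 1;                    // m^k, starting at m^0 = 1
        for (int k = 0; k <= n; k++) {
            total += (long) c * mk;
            mk *= m;
        }
        return total;
    }

    public static void main(String[] args) {
        // One constructor, six public methods, depth 5 -> 9331 paths,
        // matching the baseline column of Table 1 for Stack.
        System.out.println(totalPaths(1, 6, 5));   // prints 9331
    }
}
```

The exponential growth is visible directly: each extra level of depth multiplies the frontier by m, which is why filtering after generation does not help with memory.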
18
Dynamic Pruning Interleaved analysis generate-and-evaluate, pruning active paths on the fly (eg JWalk, Randoop) remove redundant prefix paths after each test cycle, don’t bother to expand in next cycle Increasing sophistication prune prefix paths ending in exceptions (fail again) JWalk, Randoop (2007) and prefixes ending in algebraic observers (unchanged) JWalk 0.8 (2007) and prefixes ending in algebraic transformers (reentrant) JWalk 1.0 (2009)
19
Protocol Exploration Protocol strategy explores all interleaved methods by brute force explores all paths up to length n (test depth) repeats invocations of the same method Pruning paths raising exceptions in test cycle i are not extended in test cycle i+1
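The pruning rule can be sketched on java.util.Stack with a reduced method alphabet of push, pop and peek (the slides' Stack example has six methods; this cut-down version only illustrates the mechanism, not JWalk's implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;

// Generate-and-evaluate with pruning: paths that raise an exception in
// test cycle i are recorded but not extended in cycle i+1.
public class PruneDemo {
    static final String[] METHODS = {"push", "pop", "peek"};

    // Re-execute a path from a fresh receiver; false if it throws.
    static boolean execute(List<String> path) {
        Stack<Integer> s = new Stack<>();
        try {
            for (String m : path) {
                switch (m) {
                    case "push": s.push(1); break;
                    case "pop":  s.pop();   break;
                    case "peek": s.peek();  break;
                }
            }
            return true;
        } catch (RuntimeException e) {
            return false;               // e.g. EmptyStackException
        }
    }

    // Surviving (non-exception) paths at each depth 1..maxDepth.
    static int[] survivorsPerDepth(int maxDepth) {
        int[] counts = new int[maxDepth];
        List<List<String>> frontier = new ArrayList<>();
        frontier.add(new ArrayList<>());        // empty path = new Stack()
        for (int depth = 1; depth <= maxDepth; depth++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : frontier) {
                for (String m : METHODS) {
                    List<String> path = new ArrayList<>(prefix);
                    path.add(m);
                    if (execute(path)) next.add(path);  // prune failures
                }
            }
            counts[depth - 1] = next.size();
            frontier = next;                    // only survivors extended
        }
        return counts;
    }

    public static void main(String[] args) {
        // On an initially empty stack only push survives at depth 1,
        // so the survivor counts grow far slower than brute force
        // (3, 9, 27 would be the unpruned counts).
        System.out.println(Arrays.toString(survivorsPerDepth(3)));  // [1, 3, 7]
    }
}
```

Note that pruning an exception prefix removes a whole exponential subtree, which is where the large savings in Tables 1 and 2 come from.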
20
Baseline
[Exploration tree for Stack: from new(), every interleaving of push, top and pop is expanded breadth-first to the test depth, including branches that repeatedly raise exceptions (top and pop on an empty stack). Brute-force, breadth-first exploration. Key: novel state, exception.]
21
Prune Exceptions…
[The same exploration tree, with every path extending an exception-raising prefix (top or pop on an empty stack) pruned. Prune error-prefixes (JWalk 0.8, Randoop). Key: novel state, exception.]
22
Algebraic Exploration Algebraic strategy explores all algebraic constructions grows paths using only primitive operations observes paths ending in any kind of operation Pruning prunes paths ending in exceptions (next cycle) also with re-entrant or unchanged states
23
Prune Observers
[The tree after additionally pruning paths that extend an observer prefix such as top, which leaves the state unchanged. Prune error- and observer-prefixes (JWalk 0.8). Key: novel state, exception, unchanged state.]
24
…Transformers
[The tree after further pruning paths that extend a transformer prefix returning to an earlier state, such as push followed by pop. Prune error-, observer- and transformer-prefixes (JWalk 1.0). Key: novel state, exception, unchanged state, reentrant state.]
25
State Exploration State strategy reaches every high-level state explores all transition paths up to length n, from each state has n-switch coverage Pruning grows only primitive paths to reach all states prunes paths ending in exceptions (next cycle)
26
Exploration Summary Test settings test class, strategy, modality, depth Exploration summary # executed in total # discarded (pruned) # exercised (normal) # terminated (exception) Technique calculates discarded from theoretical max paths
27
The Same State? Some earlier approaches distinguish observers, mutators by signature (Rostra) intrusive state equality predicate methods (ASTOOT) external (partial) state equality predicates (Rostra) subsumption of execution traces in JVM (Pathfinder) Some algebraic approaches shallow, deep equality under all observers (TACCLE) but assumes observations are also comparable very costly to compute from first principles serialise object states and hash (Henkel & Diwan) but not all objects are serialisable no control over depth of comparison
28
State Comparison Reflection-and-hash extract state vector from objects compute hash code for each field order-sensitive combination hash code Proper depth control shallow or deep equality settings, to chosen depth hash on pointer, or recursively invoke algorithm Fast state comparison each test evaluation stores posterior state code fast comparison with preceding, or all prior states possible to detect unchanged, or reentrant states
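A minimal sketch of the reflection-and-hash idea follows. It is illustrative, not JWalk's actual code: it shows only the shallow setting (hashing on each field value directly), where the deep setting would recurse into object-typed fields to the chosen depth. The Book class is a stand-in for a test class.

```java
import java.lang.reflect.Field;

// Extract the state vector of an object via reflection, hash each field
// value, and fold the hashes in an order-sensitive way, giving a fast
// posterior state code for comparison with prior states.
public class StateHash {

    // Toy class with hidden state, standing in for a test class.
    static class Book {
        String borrower = null;
        boolean onLoan = false;
    }

    public static int stateCode(Object obj) {
        int code = 17;
        try {
            for (Field f : obj.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                Object value = f.get(obj);
                // Order-sensitive combination of per-field hash codes.
                code = 31 * code + (value == null ? 0 : value.hashCode());
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return code;
    }

    public static void main(String[] args) {
        Book a = new Book();
        Book b = new Book();
        System.out.println(stateCode(a) == stateCode(b));  // true: same state
        b.borrower = "alice";
        System.out.println(stateCode(a) == stateCode(b));  // false: states differ
    }
}
```

Comparing an object's code before and after a method call detects unchanged states; comparing against all prior codes detects reentrant states, as the slide describes.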
29
Pruning: Stack

Table 1: Cumulative paths explored after each test cycle

depth   baseline   except.   observ.   transf.
  0            1         1         1         1
  1            7         7         7         7
  2           43        31        13        13
  3          259       139        25        19
  4         1555       667        43        25
  5         9331      3391        79        31

Pruned: 9,300 redundant paths
Retained: 31 significant paths (best 0.33%)
30
Pruning: Reservable Book

Table 2: Cumulative paths explored after each test cycle

depth   baseline   except.   observ.   transf.
  0            1         1         1         1
  1            9         9         9         9
  2           73         –        25        25
  3          585       561        49        33
  4         4681      4185        97        41
  5        37449  mem. ex.       169        41

Pruned: 37,408 redundant paths
Retained: 41 significant paths (best 0.12%)
31
Validation Modality Lazy specification interacts with tester to confirm key results uses predictive rules to infer further results stores key results in reusable test oracle Technique key results found at the leaves of the algebra tree apply predictions to other test strategies Tester accepts or rejects outcome
32
Test Result Prediction Semi-automatic validation the user confirms or rejects key results these constitute a test oracle, used in prediction eventually > 90% test outcomes predicted JWalk test result prediction rules eg: predict repeat failure new().pop().push(e) == new().pop() eg: predict same state target.size().push(e) == target.push(e) eg: predict same result target.push(e).pop().size() == target.size() Try me
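The first rule above, "predict repeat failure", can be illustrated with a toy oracle (this is a didactic sketch, not JWalk's code): once the user has confirmed that a prefix such as new().pop() fails, any extension of it, e.g. new().pop().push(e), is predicted to fail the same way and needs no further confirmation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy oracle holding confirmed failing method sequences; a sequence is
// predicted to fail if any prefix of it is already known to fail.
public class FailurePrediction {
    private final Set<List<String>> knownFailures = new HashSet<>();

    // The user confirms a failing sequence, e.g. ["pop"] for new().pop().
    public void confirmFailure(List<String> sequence) {
        knownFailures.add(new ArrayList<>(sequence));
    }

    public boolean predictedToFail(List<String> sequence) {
        for (int k = 1; k <= sequence.size(); k++) {
            if (knownFailures.contains(sequence.subList(0, k))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FailurePrediction oracle = new FailurePrediction();
        oracle.confirmFailure(List.of("pop"));                          // new().pop() failed
        System.out.println(oracle.predictedToFail(List.of("pop", "push")));  // true
        System.out.println(oracle.predictedToFail(List.of("push", "pop")));  // false
    }
}
```

The state- and result-prediction rules work analogously, collapsing a sequence to a shorter, already-confirmed equivalent before looking it up in the oracle.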
33
Kinds of Prediction Strong prediction From known results, guarantee further outcomes in the same equivalence class eg: observer prefixes empirically checked before making any inference, unchanged state is guaranteed target.push(e).size().top() == target.push(e).top() Weak prediction From known facts, guess further outcomes; an incorrect guess will be revealed in the next cycle eg: methods with void type usually return no result, but may raise an exception target.pop() predicted to have no result target.pop().size() == -1 reveals an error
34
Algebraic Validation Algebraic testing grows all primitive paths ending in all operations solicits results for leaves of the algebra tree best mode in which to create an oracle Prediction predicts void-results predicts results saved in previous test cycles Oracle predicts a correct outcome Tester confirms an outcome
35
Protocol Validation Protocol Testing create oracle first using the algebra-strategy then apply same oracle in the protocol-strategy most results predicted! Prediction (chains of) observers don’t affect states re-entrant methods return to earlier states Oracle predicts many outcomes
36
State Validation State testing extends oracle created for the algebra-strategy can validate 1000’s of transition paths for a mere few 10’s of user confirmations Prediction all results for “nearby” states predicted needs confirmations for more “remote” states Oracle predicts many outcomes
37
Validation Summary Test summary other statistics as before Validation summary # passed (in total) # failed (in total) # confirmed (by user) # rejected (by user) # correct (by oracle) # incorrect (by oracle) 10x automated vs manual checks
38
Amortized Interaction Costs
number of new confirmations, amortized over 6 test cycles
con = manual confirmations, > 25 test cases/minute
pre = JWalk’s predictions, eventually > 90% of test cases

Test class    a1   a2   a3   s1   s2    s3
LibBk con      3    5    7    0    0     5
LibBk pre      2    8   18    –   38   133
ResBk con      3   14   56    0    1   183
ResBk pre      6   27   89   36  241  1649

(a1–a3 and s1–s3 are the algebra- and state-strategy test depths 1 to 3)
eg: algebra-test to depth 2, 14 new confirmations
eg: state-test to depth 2, 241 predicted results
39
Feedback-based Methodology Coding The programmer prototypes a Java class in an editor Exploration JWalk systematically explores method paths, providing useful instant feedback to the programmer Specification JWalk infers a specification, building a test oracle based on key test results confirmed by the programmer Validation JWalk tests the class to bounded exhaustive depths, based on confirmed and predicted test outcomes JWalk uses state-based test generation algorithms
40
Example – Library Book
Exploration
surprise: target.issue(“a”).issue(“b”).getBorrower() == “b”
violates business rules: fix code to raise an exception
Validation
all observations on chains of issue(), discharge()
n-switch cover on states {Default, OnLoan}

public class LibraryBook {
    private String borrower;
    public LibraryBook();
    public void issue(String);
    public void discharge();
    public String getBorrower();
    public Boolean isOnLoan();
}
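The fix mentioned under Exploration can be sketched as follows. This is a hypothetical implementation (the slide gives only the public interface), and the choice of IllegalStateException for the business-rule violation is an assumption; the slide says only "raise an exception".

```java
// Sketch of LibraryBook after the fix: issuing an already issued book
// now raises an exception instead of silently overwriting the borrower.
public class LibraryBook {
    private String borrower;

    public void issue(String name) {
        if (isOnLoan()) {
            // Assumed exception type; the slides do not name one.
            throw new IllegalStateException("already on loan to " + borrower);
        }
        borrower = name;
    }

    public void discharge() { borrower = null; }   // nullop when not on loan

    public String getBorrower() { return borrower; }

    public boolean isOnLoan() { return borrower != null; }
}
```

With this version, JWalk's exploration of target.issue("a").issue("b") ends in an exception rather than the surprising result, and the failing path is pruned in later cycles.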
41
Extension – Reservable Book
Exploration
only revisits novel interleaved permutations of methods
surprise: target.reserve(“a”).issue(“b”).getBorrower() == “b”
Validation
all obs. on chains of issue(), discharge(), reserve(), cancel()
n-switch cover on states {Default, OnLoan, Reserved, Reserved&OnLoan}

public class ReservableBook extends LibraryBook {
    private String requester;
    public ReservableBook();
    public void reserve(String);
    public void cancel();
    public String getRequester();
    public Boolean isReserved();
}
42
Evaluation User Acceptance programmers find JWalk habitable they can concentrate on creative aspects (coding) while JWalk handles systematic aspects (validation, testing) Main Cost is Confirmations not so burdensome, since amortized over many test cycles metric: measure amortized confirmations per test cycle Comparison with JUnit common testing objective for manual and lazy systematic testing; evaluate coverage and testing effort Eclipse+JUnit vs. JWalkEditor: given the task of testing the “transition cover + all equivalence partitions of inputs”
43
Comparison with JUnit manual testing method Manual test creation takes skill, time and effort (eg: ~20 min to develop manual cases for ReservableBook) The programmer missed certain corner-cases eg: target.discharge().discharge() - a nullop? The programmer redundantly tested some properties eg: assertTrue(target != null) - multiple times The state coverage for LibraryBook was incomplete, due to the programmer missing hard-to-see cases The saved tests were not reusable for ReservableBook, for which all-new tests were written to test new interleavings
44
Advantages of JWalk JWalk lazy systematic testing JWalk automates test case selection - relieves the programmer of the burden of thinking up the right test cases! Each test case is guaranteed to test a unique property Interactive test result confirmation is very fast (eg: ~80 sec in total for 36 unique test cases in ReservableBook) All states and transitions covered, including nullops, to the chosen depth The test oracle created for LibraryBook formed the basis for the new oracle for ReservableBook, but… JWalk presented only those sequences involving new methods, and all interleavings with inherited methods
45
Measuring the Testing?
Suppose an ideal test set
BR : behavioural response (set)
T : tests to be evaluated (bag – duplicates?)
T_E = BR ∩ T : effective tests (set)
T_R = T – T_E : redundant tests (bag)
Define test metrics
Ef(T) = (|T_E| – |T_R|) / |BR| : effectiveness
Ad(T) = |T_E| / |BR| : adequacy
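The two metrics are straightforward to compute from the set sizes. In the worked figures below, |BR| = 40 for ReservableBook is an inference from the slides (it is consistent with both the 53% manual adequacy for 21 effective tests and the 90% JWalk adequacy for 36 tests), not a number the slides state directly.

```java
// Effectiveness penalises redundant tests and can go negative;
// adequacy measures coverage of the behavioural response BR alone.
public class TestMetrics {
    public static double effectiveness(int effective, int redundant, int br) {
        return (double) (effective - redundant) / br;
    }

    public static double adequacy(int effective, int br) {
        return (double) effective / br;
    }

    public static void main(String[] args) {
        // ReservableBook, manual testing: |T_E| = 21, |T_R| = 83,
        // assumed |BR| = 40.
        System.out.println(adequacy(21, 40));           // 0.525, i.e. ~53%
        System.out.println(effectiveness(21, 83, 40));  // -1.55: wasteful
    }
}
```

A negative effectiveness score is the formal counterpart of "sometimes adequate, but not effective": the redundant tests outnumber the useful ones.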
46
Speed and Adequacy of Testing
Test goal: transition cover + equiv. partitions of inputs
manual testing expensive, redundant and incomplete
JWalk testing very efficient, close to complete
eg: wrote 104 tests, 21 were effective and 83 not!
eg: JWalk achieved 100% test coverage

Test class      T   T_E   T_R   Adeq.   time (min.sec)
LibBk manual   31     9    22    90%         11.00
ResBk manual  104    21    83    53%         20.00
LibBk jwalk    10    10     0   100%          0.30
ResBk jwalk    36    36     0    90%          0.46
47
Some Conclusions JUnit: expert manual testing massive over-generation of tests (w.r.t. goal) sometimes adequate, but not effective stronger (t2, t3); duplicated; and missed tests hopelessly inefficient – also debugging test suites! JWalk: lazy systematic testing near-ideal coverage, adequate and effective a few input partitions missed (simple generation strategy) very efficient use of the tester’s time – sec. not min. or: three orders of magnitude (×1000) more tests, for the same effort
48
More Conclusions Feedback-based development unexpected gain: automatic validation of prototype code cf. Alloy’s model checking from a partial specification Moral for testing automatically executing saved tests is not so great need systematic test generation tools to get coverage automate the parts that humans get wrong! let humans focus on right/wrong responses.
49
JWalk 1.0 Toolset JWalk Tester JWalk Utility JWalk Editor JWalk Marker JWalk Grapher JWalk SOAR
50
Example: JWalk Editor © Neil Griffiths, 2008
51
Any Questions? http://www.dcs.shef.ac.uk/~ajhs/jwalk/ Put me to the test! © Anthony Simons, 2009, with help from Chris Thomson, Neil Griffiths, Mihai Gabriel Glont, Arne-Michael Toersel
52
Custom Configuration Oracle directory default is the test class directory; pick a new location Convention standard: exclude all of Object’s methods custom: include some complete: include all Probe depth max path length for dynamic analysis State depth tree depth for object state comparison shallow state (inc. array values) by default
53
Generators The heart of JWalk synthesise test input values on demand try to assure even spread of inputs for a given type by default, supply monotonic sequences of values MasterGenerator built-in ObjectGenerator is fairly comprehensive synthesises basic values, arrays, standard objects, etc. CustomGenerator take control of how particular types are synthesised provide custom generators; add to a master as delegates eg: StringGenerator, EnumGenerator, InterfaceGenerator
54
Custom Generators Choose a location default is the test class directory Choose a generator enter generator directly browse within package Click add/remove add a custom generator to the list remove a generator from the list
55
CustomGenerator Interface
Provide a generator class with:
public boolean canCreate(Class type);
public Object nextValue(Class type);
public void setOwner(MasterGenerator master);
Key points:
advertises which types it can synthesise
generates a sequence of objects on demand
may keep a handle to its owning master
eg: InterfaceGenerator maps interface types onto concrete classes; invokes nextValue recursively (on its master).
56
Example: IndexGenerator

public class IndexGenerator implements CustomGenerator {
    private int seed = 1;
    private boolean flag = false;

    // Specific for the int index type
    public boolean canCreate(Class type) {
        return type == int.class;
    }

    // Creates repeating pairs of indices: 1, 1, 2, 2, 3, 3, ...
    public Object nextValue(Class type) {
        if (flag) { flag = false; return seed++; }
        else { flag = true; return seed; }
    }

    // Nullop: ignores the master generator
    public void setOwner(MasterGenerator master) {}
}
57
When are they Useful? IndexGenerator generates repeating pairs of indices exercises put/get pairs in vector, array types StdIOGenerator redirect System.in, System.out to conventional files test programs with IO using prepared data in files FileGenerator take control of filenames and streams (security) test programs using prepared data in files Arbitrary test set-up take control of how the environment is established