1 Marcelo d’Amorim (UIUC)
An Empirical Comparison of Automated Generation and Classification Techniques for Object-Oriented Unit Testing
Marcelo d’Amorim (UIUC), Carlos Pacheco (MIT), Tao Xie (NCSU), Darko Marinov (UIUC), Michael D. Ernst (MIT)
Automated Software Engineering 2006

2 Motivation Unit testing validates individual program units
Hard to build correct systems from broken units
Unit testing is used in practice
- 79% of Microsoft developers use unit testing [Venolia et al., MSR TR 2005]
- Code for testing is often larger than the project code
  - Microsoft [Tillmann and Schulte, FSE 2005]
  - Eclipse [Danny Dig, Eclipse project contributor]

Unit testing is the activity of testing the individual components of a system. It is an important task in software development: it is hard to build larger components from broken ones.

3 Focus: Object-oriented unit testing
Unit is one class or a set of classes
Example [Stotts et al. 2002, Csallner and Smaragdakis 2004, …]

// class under test
public class UBStack {
  public UBStack() {...}
  public void push(int k) {...}
  public void pop() {...}
  public int top() {...}
  public boolean equals(UBStack s) {...}
}

// example unit test case
void test_push_equals() {
  UBStack s1 = new UBStack();
  s1.push(1);
  UBStack s2 = new UBStack();
  s2.push(1);
  assert(s1.equals(s2));
}

4 Unit test case = Test input + Oracle
Test input: a sequence of method calls on the unit
  Example: a sequence of push, pop
Oracle: a procedure to compare actual and expected results
  Example: assert

void test_push_equals() {
  // test input
  UBStack s1 = new UBStack();
  s1.push(1);
  UBStack s2 = new UBStack();
  s2.push(1);
  // oracle
  assert(s1.equals(s2));
}

5 Creating test cases Automation requires addressing both:
- Test input generation
- Test classification
  - Oracle from user: rarely provided in practice
  - No oracle from user: users manually inspect generated test inputs
    - Tool uses an approximate oracle to reduce manual inspection
Manual creation is tedious and error-prone, and delivers incomplete test suites

Test creation can be done manually. Unfortunately, manually written tests may miss important sequences or assertion checks that would otherwise reveal a fault. Automation is a way to reduce part of this problem. Test input generation can be automated using, for example, a random generator. Classifying whether the generated tests reveal faults is a more difficult problem: if formal specifications are available they can be used, but they rarely are. Without them, the user has to inspect a considerable number of tests to decide which ones reveal faults. One possibility is a tool that generates an approximation of an oracle, reducing the user's effort to inspect the generated tests.

6 Problem statement Compare automated unit testing techniques by effectiveness in finding faults

7 Outline Motivation, background and problem
Framework and existing techniques New technique Evaluation Conclusions

8 A general framework for automation
[Diagram] A unit testing tool takes a program (e.g., class UBStack { push(int k){…} pop(){…} equals(UBStack s){…} }) and, optionally, a model of correct operation (e.g., class UBStack with invariant size >= 0). The model can come from a formal specification or from a model generator such as Daikon [Ernst et al., 2001] applied to an existing test suite (e.g., test_push_equals()); but formal specifications are rarely available. Inside the tool, a test-input generator produces candidate inputs (e.g., test0() { pop(); push(0); } and test1() { push(1); pop(); }), and a classifier labels them as likely fault-revealing. Each reported input is actually either a true fault or a false alarm.
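To make the data flow concrete, here is a minimal Java sketch of the framework's pieces as interfaces; the names (TestInputGenerator, Classifier, Verdict, TestInput) are illustrative assumptions, not the actual APIs of Eclat, Symclat, or Daikon.

import java.util.List;

// Hypothetical interfaces illustrating the framework's data flow.
interface TestInputGenerator {
    // Produces candidate test inputs (method sequences) for the class under test.
    List<TestInput> generate(Class<?> classUnderTest);
}

interface Classifier {
    // Labels a candidate input as likely fault-revealing or not, using some
    // model of correct operation (formal spec, inferred invariants, or simply
    // "no uncaught exception").
    Verdict classify(TestInput input);
}

enum Verdict { LIKELY_FAULT, LIKELY_NORMAL }

// A test input is just an executable sequence of calls on the unit.
interface TestInput {
    void run() throws Throwable;
}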

9 Reduction to improve quality of output
[Diagram] Candidate inputs go to the classifier, which uses the model of correct operation to produce fault-revealing test inputs (true faults or false alarms); a reducer then keeps only a subset of the fault-revealing test inputs.

It is often the case that the techniques output several tests revealing the same fault (for example, tests that violate the same part of the spec from the same part of the program). A component called a reducer can be used to remove such redundancies so as to ease the task of inspection.
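A hedged sketch of a reducer, assuming the equivalence criterion described above (same violation at the same program point); the FailingTest interface and its methods are hypothetical stand-ins, not the tools' real types.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reducer: keeps one test per "failure signature", where the
// signature approximates "same violation at the same program point".
class Reducer {
    static List<FailingTest> reduce(List<FailingTest> candidates) {
        Set<String> seen = new HashSet<>();
        List<FailingTest> kept = new ArrayList<>();
        for (FailingTest t : candidates) {
            // e.g. "java.lang.NullPointerException@UBStack.top"
            String signature = t.violationKind() + "@" + t.failureLocation();
            if (seen.add(signature)) {
                kept.add(t);   // first test with this signature is kept
            }                  // later duplicates are dropped
        }
        return kept;
    }
}

// Minimal stand-in for a classified, failing test (illustrative only).
interface FailingTest {
    String violationKind();    // exception class or violated invariant
    String failureLocation();  // method / program point where the violation occurred
}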

10 Combining generation and classification
                  classification
generation        Uncaught exceptions (UncEx)               Operational models (OpMod)
Random (RanGen)   [Csallner and Smaragdakis, SPE 2004], …   [Pacheco and Ernst, ECOOP 2005]
Symbolic (SymGen) [Xie et al., TACAS 2005]                  ?

Symbolic execution provides symbolic variables (with no fixed concrete value) as inputs and collects constraints while making decisions in the program. It is necessary to backtrack execution when non-deterministic choices arise. At the end of one execution, the constraints are solved to generate input values for the test (the sequence of visited methods).

11 Random Generation Chooses sequence of methods at random
Chooses arguments for methods at random
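As a rough illustration (not Eclat's actual algorithm), a random generator for the UBStack class from slide 3 might look like this; the class name RandomStackTestGenerator and the argument range are assumptions.

import java.lang.reflect.Method;
import java.util.Random;

// Simplified illustration of random generation: pick methods of UBStack and
// int arguments at random to build a call sequence.
class RandomStackTestGenerator {
    private static final Random rnd = new Random();

    static void randomSequence(UBStack s, int length) throws Exception {
        Method[] methods = {
            UBStack.class.getMethod("push", int.class),
            UBStack.class.getMethod("pop"),
            UBStack.class.getMethod("top")
        };
        for (int i = 0; i < length; i++) {
            Method m = methods[rnd.nextInt(methods.length)];   // random method
            if (m.getParameterCount() == 1) {
                m.invoke(s, rnd.nextInt(100));                 // random int argument
            } else {
                m.invoke(s);
            }
        }
    }
}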

12 Instantiation 1: RanGen + UncEx
[Diagram] Random generation produces candidate inputs; an uncaught-exceptions classifier labels them as fault-revealing test inputs, each actually a true fault or a false alarm.

This uses a "simple" classifier that just observes runtime exceptions in order to label a test input as fault-revealing. The reducer is also based only on runtime exceptions: two tests are considered equivalent if they raise the same runtime exception at the same program point.
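A minimal sketch of the uncaught-exception idea, assuming a candidate input can be wrapped as a small executable object; the TestInput interface here is a hypothetical stand-in, not the classifier's real interface.

// Hypothetical uncaught-exception classifier: a candidate input is labeled
// likely fault-revealing if running it throws any exception.
class UncaughtExceptionClassifier {

    // Minimal stand-in for an executable candidate test input.
    interface TestInput {
        void run() throws Throwable;
    }

    static boolean likelyFaultRevealing(TestInput input) {
        try {
            input.run();
            return false;      // ran normally: not flagged
        } catch (Throwable t) {
            return true;       // uncaught exception: flagged as likely fault-revealing
        }
    }
}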

13 Instantiation 2: RanGen + OpMod
[Diagram] A model generator infers a model of correct operation from an existing test suite; random generation produces candidate inputs, and an operational-models classifier labels them as fault-revealing test inputs, each actually a true fault or a false alarm.

Here the classifier labels a test input as likely fault-revealing when its execution violates the operational model inferred from the existing test suite.
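A minimal sketch of the operational-model idea, using the invariant size >= 0 from slide 8 as a stand-in for the properties a tool like Daikon might infer; the class and method names are hypothetical.

// Hypothetical operational-model check. The single invariant "size >= 0"
// stands in for an inferred model; real operational models contain many such
// properties over fields, parameters, and return values.
class OperationalModelClassifier {

    // Returns true if the observed state violates the inferred model,
    // i.e. the test input is classified as likely fault-revealing.
    static boolean violatesModel(int observedSize) {
        boolean sizeNonNegative = observedSize >= 0;   // inferred invariant
        return !sizeNonNegative;
    }
}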

14 Symbolic Generation Symbolic execution
- Executes methods with symbolic arguments
- Collects constraints on these arguments
- Solves constraints to produce concrete test inputs
Previous work for OO unit testing [Xie et al., TACAS 2005]
- Basics of symbolic execution for OO programs
- Exploration of method sequences
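A toy illustration of the idea (not the TACAS 2005 engine): arguments become symbolic variables, branch decisions append constraints to a path condition, and solving the final path condition yields concrete inputs. All class names here are invented for the example.

import java.util.ArrayList;
import java.util.List;

// A symbolic integer has a name instead of a concrete value.
class SymbolicInt {
    final String name;                       // e.g. "k"
    SymbolicInt(String name) { this.name = name; }
}

class PathCondition {
    final List<String> constraints = new ArrayList<>();

    // Explore the "true" side of a branch such as (k > 0); a real engine
    // would also backtrack to explore the "false" side.
    void assumeGreaterThan(SymbolicInt v, int bound) {
        constraints.add(v.name + " > " + bound);
    }

    // A real engine hands the constraints to a solver; here we just print
    // them, and any satisfying assignment would become a concrete test input.
    void report() {
        System.out.println("path condition: " + String.join(" && ", constraints));
    }
}

class SymbolicDemo {
    public static void main(String[] args) {
        SymbolicInt k = new SymbolicInt("k");
        PathCondition pc = new PathCondition();
        pc.assumeGreaterThan(k, 0);          // branch taken: k > 0
        pc.report();                         // a solver could then pick, e.g., k = 1
    }
}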

15 Instantiation 3: SymGen + UncEx
[Diagram] Symbolic generation produces candidate inputs; an uncaught-exceptions classifier labels them as fault-revealing test inputs, each actually a true fault or a false alarm.

As in instantiation 1, the classifier and reducer are based only on runtime exceptions: two tests are considered equivalent if they raise the same runtime exception at the same program point. The difference is that the candidate inputs come from symbolic generation.

16 Outline Motivation, background and problem
Framework and existing techniques New technique Evaluation Conclusions

17 Proposed new technique
Model-based Symbolic Testing (SymGen+OpMod)
- Symbolic generation
- Operational model classification
Brief comparison with existing techniques
- May explore failing method sequences that RanGen+OpMod misses
- May find semantic faults that SymGen+UncEx misses

18 Contributions Extended symbolic execution Implementation (Symclat)
- Operational models
- Non-primitive arguments
Implementation (Symclat): a modified explicit-state model checker, Java Pathfinder [Visser et al., ASE 2000]

Previous research: path exploration builds symbolic constraints during execution and backtracks to explore unvisited states; the state representation is a rooted heap plus a path condition; infeasibility is checked as satisfiability of the path condition; and subsumption combines heap symmetry (isomorphism) with implication of path conditions.

New contributions: we handle non-primitive arguments with wrapper classes (non-primitive arguments are saved in fields of these wrapper classes during execution, and "previously seen" objects are reused when an operation needs a non-primitive argument), and we instrument the code to use a library that makes symbolic execution aware of the invariants.
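A hedged sketch of the "previously seen objects" idea mentioned in the notes; ObjectPool is a hypothetical helper, not Symclat's actual wrapper-class implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Objects created during exploration are remembered, and when a method needs
// a non-primitive argument, one of the previously seen objects is reused.
class ObjectPool<T> {
    private final List<T> seen = new ArrayList<>();
    private final Random rnd = new Random();

    void remember(T obj) {           // record an object produced earlier in the sequence
        seen.add(obj);
    }

    T pickPreviouslySeen() {         // reuse one when a non-primitive argument is needed
        if (seen.isEmpty()) {
            throw new IllegalStateException("no candidate objects yet");
        }
        return seen.get(rnd.nextInt(seen.size()));
    }
}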

19 Instantiation 4: SymGen + OpMod
[Diagram] A model generator infers a model of correct operation from an existing test suite; symbolic generation produces candidate inputs, and an operational-models classifier labels them as fault-revealing test inputs, each actually a true fault or a false alarm.

Combines symbolic input generation with operational-model classification.

20 Outline Motivation, background and problem
Framework and existing techniques New technique Evaluation Conclusions

21 Evaluation Comparison of four techniques [Pacheco and Ernst, 2005]
                  classification
generation        Uncaught exceptions   Operational models   Implementation tool
Random            RanGen+UncEx          RanGen+OpMod         Eclat [Pacheco and Ernst, 2005]
Symbolic          SymGen+UncEx          SymGen+OpMod         Symclat

22 Subjects Source Subject NCNB LOC #methods
Source                                     Subject               NCNB LOC  #methods
UBStack [Csallner and Smaragdakis 2004,
  Xie and Notkin 2003, Stotts et al. 2002] UBStack 8             88        11
                                           UBStack 12
Daikon [Ernst et al. 2001]                 UtilMDE               1832      69
DataStructures [Weiss 99]                  BinarySearchTree      186       9
                                           StackAr               90        8
                                           StackLi
JML samples [Cheon et al. 2002]            IntegerSetAsHashSet   28        4
                                           Meter                 21        3
                                           DLList                286       12
                                           E_OneWayList          171       10
                                           E_SLList              175
                                           OneWayList
                                           OneWayNode            65
                                           SLList                92
                                           TwoWayList
MIT problem set [Pacheco and Ernst, 2005]  RatPoly (46 versions) 582.51    17.20

We evaluate about 10% of the subjects that Eclat evaluated because our implementation of symbolic execution has limitations: it supports only integer primitives, arrays are not symbolic (we do not allow indexing with symbolic expressions), and primitives are encoded with unbounded precision, which may generate undecidable constraints (e.g., division and mod).

23 Experimental setup Eclat (RanGen) and Symclat (SymGen) tools
- With UncEx and OpMod classifications
- With and without reduction
- Each tool run for about the same time (2 min. on an Intel Xeon 2.8GHz, 2GB RAM)
- For RanGen, Eclat runs each experiment with 10 different seeds

24 Comparison metrics Compare effectiveness of various techniques in finding faults
Each run gives the user a set of test inputs
Metrics:
- Tests: number of test inputs given to the user
- Faults: number of actually fault-revealing test inputs
- DistinctF: number of distinct faults found
- Prec = Faults/Tests: precision, the ratio of generated test inputs that reveal actual faults
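For example (hypothetical numbers): if a run gives the user 50 test inputs, of which 10 turn out to reveal actual faults and those 10 cover 4 distinct faults, then Tests = 50, Faults = 10, DistinctF = 4, and Prec = 10/50 = 0.20.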

25 Evaluation procedure
[Diagram] The unit testing tool produces Tests; a JML formal spec is used to decide whether each test reveals a true fault or is a false alarm, which gives Faults, DistinctF, and Prec = Faults/Tests.

26 Summary of results
- All techniques miss faults and report false positives
- Techniques are complementary
- RanGen is sensitive to seeds
- Reduction can increase precision but decreases the number of distinct faults

27 False positives and negatives
Generation techniques can miss faults
- RanGen can miss important sequences or input values
- SymGen can miss important sequences or be unable to solve constraints
Classification techniques can miss faults and report false alarms due to imprecise models
- Misclassify test inputs (normal as fault-revealing, or fault-revealing as normal)

28 Results without reduction
            RanGen+UncEx  RanGen+OpMod  SymGen+UncEx  SymGen+OpMod
Tests       4,367.5       1,666.6       6,676         4,828         (# of test inputs given to the user)
Faults      256.0         181.2         515           164           (# of actual fault-revealing tests generated)
DistinctF   17.7          13.1          14            9             (# of distinct actual faults)
Prec        0.20          0.42          0.15          0.14          (precision = Faults / Tests)

29 Results with reduction
            RanGen+UncEx  RanGen+OpMod  SymGen+UncEx  SymGen+OpMod
Tests       124.4         56.2          106           46
Faults      22.8          13.4          11            7
DistinctF   15.3          11.6
Prec        0.31          0.51          0.17          0.20

DistinctF ↓ and Prec ↑
Reduction misses faults: it may remove a true fault and keep a false alarm
Redundancy of tests decreases precision

30 Sensitivity to random seeds
For one RatPoly implementation, RanGen+OpMod (with reduction) generates 200 tests over 10 seeds, 8 of them revealing faults
For only 5 seeds is there (at least) one test that reveals a fault

            RanGen+UncEx  RanGen+OpMod
Tests       17.1          20
Faults      0.2           0.8
DistinctF                 0.5
Prec        0.01          0.04

31 Outline Motivation, background and problem
Framework and existing techniques New technique Evaluation Conclusions

32 Key: Complementary techniques
Each technique finds some fault that other techniques miss
Suggestions:
- Try several techniques on the same subject
- Evaluate how merging independently generated sets of test inputs affects Faults, DistinctF, and Prec
- Evaluate other techniques (e.g., RanGen+SymGen [Godefroid et al. 2005, Cadar and Engler 2005, Sen et al. 2005])
- Improve RanGen
  - Bias selection (What methods and values to favor?)
  - Run with multiple seeds (Merging of test inputs?)

33 Conclusions Proposed a new technique: Model-based Symbolic Testing
Compared four techniques that combine
- Random vs. symbolic generation
- Uncaught exceptions vs. operational models classification
Techniques are complementary
Proposed improvements for the techniques

