An Empirical Comparison of Automated Generation and Classification Techniques for Object-Oriented Unit Testing
Marcelo d’Amorim (UIUC), Carlos Pacheco (MIT), Tao Xie (NCSU), Darko Marinov (UIUC), Michael D. Ernst (MIT)
Automated Software Engineering 2006

Motivation
Unit testing validates individual program units; it is hard to build correct systems from broken units.
Unit testing is used in practice:
- 79% of Microsoft developers use unit testing [Venolia et al., MSR TR 2005]
- Code for testing is often larger than the project code, e.g., at Microsoft [Tillmann and Schulte, FSE 2005] and in Eclipse [Danny Dig, Eclipse project contributor]
Unit testing is the activity of testing the individual components of a system. It is an important task in software development: it is hard to build larger components from broken ones.

Focus: Object-oriented unit testing
A unit is one class or a set of classes. Example [Stotts et al. 2002, Csallner and Smaragdakis 2004, …]:

    // class under test
    public class UBStack {
      public UBStack() {...}
      public void push(int k) {...}
      public void pop() {...}
      public int top() {...}
      public boolean equals(UBStack s) {...}
    }

    // example unit test case
    void test_push_equals() {
      UBStack s1 = new UBStack();
      s1.push(1);
      UBStack s2 = new UBStack();
      s2.push(1);
      assert(s1.equals(s2));
    }

Unit test case = Test input + Oracle
Test input: a sequence of method calls on the unit (e.g., a sequence of push and pop calls).
Oracle: a procedure that compares actual and expected results (e.g., an assert).

    void test_push_equals() {
      UBStack s1 = new UBStack();   // test input:
      s1.push(1);                   //   sequence of
      UBStack s2 = new UBStack();   //   method calls
      s2.push(1);
      assert(s1.equals(s2));        // oracle
    }

Creating test cases
Manual creation is tedious and error-prone, and delivers incomplete test suites.
Automation requires addressing both:
- Test input generation
- Test classification:
  - Oracle from user: rarely provided in practice
  - No oracle from user: users manually inspect generated test inputs; a tool can use an approximate oracle to reduce manual inspection
Test creation can be done manually, but when writing tests by hand one might miss important sequences or assertion checks that would otherwise reveal a fault. Automation reduces part of this problem. Test input generation can be automated using, for example, a random generator. Classifying whether the generated tests reveal faults is harder: if formal specifications are available they can be used, but they rarely are, and without them the user would have to inspect a considerable number of tests to decide which ones reveal faults. One possibility is a tool that generates an approximation of an oracle to reduce the user's inspection effort.

Problem statement Compare automated unit testing techniques by effectiveness in finding faults

Outline Motivation, background and problem Framework and existing techniques New technique Evaluation Conclusions

A general framework for automation
[Diagram] A unit testing tool takes the program (the actual class, e.g., UBStack { … push(int k){…} pop(){…} equals(UBStack s){…} }) and, optionally, an existing test suite (e.g., test_push_equals() { … }). A model generator such as Daikon [Ernst et al., 2001] infers a model of correct operation (a formal specification, e.g., class UBStack //@ invariant size >= 0), since hand-written formal specifications are rarely available. A test-input generator produces candidate inputs (e.g., test0() { pop(); push(0); } and test1() { push(1); pop(); }), and a classifier labels each one as a likely fault-revealing test input, i.e., a true fault, or as a false alarm.

Reduction to improve quality of output
[Diagram] A reducer sits after the classifier and passes on only a subset of the fault-revealing test inputs (true faults and false alarms) produced from the candidate inputs.
The techniques often output several tests revealing the same fault (for example, tests that violate the same part of the specification at the same program point). A reducer can be used to remove such redundancies and ease the task of inspection.

Combining generation and classification
Classification: uncaught exceptions (UncEx) vs. operational models (OpMod). Generation: random (RanGen) vs. symbolic (SymGen).
- RanGen + UncEx: [Csallner and Smaragdakis, SPE 2004], …
- RanGen + OpMod: [Pacheco and Ernst, ECOOP 2005]
- SymGen + UncEx: [Xie et al., TACAS 2005]
- SymGen + OpMod: ?
Symbolic execution provides symbolic variables (with no fixed concrete value) as inputs and collects constraints while making decisions in the program. Execution must backtrack when non-deterministic choices arise. At the end of one execution, the constraints are solved to generate input values for the test (the sequence of visited methods).

Random Generation
- Chooses the sequence of methods at random
- Chooses arguments for the methods at random
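As a rough illustration (a minimal sketch, not the Eclat implementation; it assumes the UBStack class from the earlier slide), random generation can be pictured as repeatedly picking a method of the class under test via reflection and calling it with randomly chosen arguments:

    import java.lang.reflect.Method;
    import java.util.Random;

    // Hypothetical sketch of random generation for UBStack: build a short
    // call sequence by picking methods and int arguments at random.
    public class RandomGenSketch {
        public static void main(String[] args) throws Exception {
            Random rnd = new Random();
            UBStack s = new UBStack();                      // unit under test
            Method[] methods = UBStack.class.getDeclaredMethods();
            for (int i = 0; i < 5; i++) {                   // sequence of length 5
                Method m = methods[rnd.nextInt(methods.length)];
                Class<?>[] params = m.getParameterTypes();
                try {
                    if (params.length == 0) {
                        m.invoke(s);                        // e.g. pop(), top()
                    } else if (params.length == 1 && params[0] == int.class) {
                        m.invoke(s, rnd.nextInt(10));       // e.g. push(random int)
                    }                                       // other signatures skipped for brevity
                } catch (Exception e) {
                    break;                                  // an exception ends this candidate sequence
                }
            }
        }
    }

In the framework above, each generated sequence becomes a candidate input that is then passed to a classifier.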

Instantiation 1: RanGen + UncEx
[Diagram] Random generation produces candidate inputs from the program; an uncaught-exceptions classifier labels them as fault-revealing test inputs (true faults or false alarms).
The classifier is "simple": it just observes runtime exceptions to label a test input as fault-revealing. The reducer is also based only on runtime exceptions: two tests are considered equivalent if they raise the same runtime exception at the same program point.
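A minimal sketch of such a classifier (hypothetical code, not the actual tool; real tools also filter out exceptions that are part of the unit's specified behavior), again assuming the UBStack class from the earlier slide:

    // Hypothetical sketch: label a candidate test as likely fault-revealing
    // if running it ends in an uncaught runtime exception.
    public class UncExClassifierSketch {
        static boolean likelyFaultRevealing(Runnable candidateTest) {
            try {
                candidateTest.run();
                return false;                 // ran normally: not flagged
            } catch (RuntimeException e) {
                return true;                  // uncaught exception: flag for inspection
            }
        }

        public static void main(String[] args) {
            // Candidate input: pop() on an empty stack.
            boolean flagged = likelyFaultRevealing(() -> {
                UBStack s = new UBStack();
                s.pop();
            });
            System.out.println("likely fault-revealing? " + flagged);
        }
    }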

Instantiation 2: RanGen + OpMod
[Diagram] A model generator infers a model of correct operation (operational model) from the program and a test suite; random generation produces candidate inputs, and the operational-model classifier labels them as fault-revealing test inputs (true faults or false alarms).
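A self-contained, hedged sketch of the operational-model idea (the TinyStack class and the single invariant below are hypothetical stand-ins; the actual tools use Daikon-inferred models over the real unit):

    import java.util.function.Predicate;

    // Hypothetical sketch of operational-model classification: a likely
    // invariant inferred from passing runs is re-checked after a candidate
    // call; a violation flags the test as likely fault-revealing.
    public class OpModClassifierSketch {

        // Tiny stand-in for the class under test (not the UBStack of the slides).
        static class TinyStack {
            private int size = 0;
            void push(int k) { size++; }
            void pop() { size--; }            // no emptiness check: size can go negative
            int size() { return size; }
        }

        // Stand-in for an inferred operational model: "size >= 0" always holds.
        static final Predicate<TinyStack> LIKELY_INVARIANT = s -> s.size() >= 0;

        static boolean violatesModel(TinyStack s, Runnable call) {
            call.run();
            return !LIKELY_INVARIANT.test(s);  // model violated after this call?
        }

        public static void main(String[] args) {
            TinyStack s = new TinyStack();
            boolean flagged = violatesModel(s, s::pop);        // pop on empty stack
            System.out.println("likely fault-revealing? " + flagged);  // prints true
        }
    }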

Symbolic Generation
Symbolic execution:
- Executes methods with symbolic arguments
- Collects constraints on these arguments
- Solves the constraints to produce concrete test inputs (see the sketch below)
Previous work for OO unit testing [Xie et al., TACAS 2005]: basics of symbolic execution for OO programs and exploration of method sequences.
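The sketch below uses a hypothetical one-branch method (not from the paper) to show the constraints collected on each path and the concrete inputs a constraint solver would produce:

    // Hypothetical method to illustrate symbolic execution. With a symbolic
    // argument k, two paths are explored:
    //   path 1: path condition  k < 0    -> a solver may return k = -1
    //   path 2: path condition  k >= 0   -> a solver may return k = 0
    // Each solution becomes a concrete argument in a generated test input.
    public class SymbolicSketch {
        static int clampNonNegative(int k) {
            if (k < 0) {       // the branch adds "k < 0" or its negation to the path condition
                return 0;
            }
            return k;
        }

        public static void main(String[] args) {
            System.out.println(clampNonNegative(-1));  // covers the path with k < 0
            System.out.println(clampNonNegative(0));   // covers the path with k >= 0
        }
    }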

Instantiation 3: SymGen + UncEx
[Diagram] Symbolic generation produces candidate inputs from the program; the uncaught-exceptions classifier labels them as fault-revealing test inputs (true faults or false alarms). As before, the classifier and reducer are based only on runtime exceptions: two tests are considered equivalent if they raise the same runtime exception at the same program point.

Outline Motivation, background and problem Framework and existing techniques New technique Evaluation Conclusions

Proposed new technique: Model-based Symbolic Testing (SymGen + OpMod)
- Symbolic generation + operational-model classification
- Brief comparison with existing techniques: may explore failing method sequences that RanGen+OpMod misses, and may find semantic faults that SymGen+UncEx misses

Contributions
- Extended symbolic execution: operational models and non-primitive arguments
- Implementation (Symclat): a modified explicit-state model checker, Java PathFinder [Visser et al., ASE 2000]
Previous research: path exploration builds symbolic constraints during execution and backtracks to explore unvisited states; the state representation is a rooted heap plus a path condition; infeasibility is checked via satisfiability of the path condition; subsumption combines heap symmetry (isomorphism) with implication of path conditions.
New contributions: non-primitive arguments are handled with wrapper classes (non-primitive arguments are saved in fields of these wrapper classes during execution, and "previously seen" objects are reused when an operation takes a non-primitive argument; see the sketch below). The code is instrumented to use a library that makes symbolic execution aware of invariants.
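A hedged sketch of the wrapper-class idea from the notes (hypothetical code, not the Symclat implementation; it assumes the UBStack class from the earlier slide): objects created during exploration are saved, and a "previously seen" object is reused when a method needs a non-primitive argument.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of reusing previously seen objects as arguments
    // for operations that take non-primitive parameters (e.g., equals(UBStack)).
    public class NonPrimitiveArgsSketch {
        static class UBStackPool {
            private final List<UBStack> seen = new ArrayList<>();

            UBStack freshStack() {                 // record every created object
                UBStack s = new UBStack();
                seen.add(s);
                return s;
            }

            // Hypothetical choice point: during exploration each saved object
            // would be tried (e.g., nondeterministically by the model checker);
            // here we simply take one by index.
            UBStack previouslySeen(int i) {
                return seen.get(i % seen.size());
            }
        }

        public static void main(String[] args) {
            UBStackPool pool = new UBStackPool();
            UBStack s1 = pool.freshStack();
            s1.push(1);
            UBStack s2 = pool.freshStack();
            s2.push(1);
            // equals takes a non-primitive argument: supply a previously seen object.
            System.out.println(s1.equals(pool.previouslySeen(1)));
        }
    }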

Instantiation 4: SymGen + OpMod
[Diagram] A model generator infers a model of correct operation (operational model) from the program and a test suite; symbolic generation produces candidate inputs, and the operational-model classifier labels them as fault-revealing test inputs (true faults or false alarms). This instantiation combines symbolic input generation with operational models.

Outline Motivation, background and problem Framework and existing techniques New technique Evaluation Conclusions

Evaluation
Comparison of four techniques (generation x classification) and the tools that implement them:
- Random generation: RanGen+UncEx and RanGen+OpMod, implemented in Eclat [Pacheco and Ernst, 2005]
- Symbolic generation: SymGen+UncEx and SymGen+OpMod, implemented in Symclat

Subjects
Columns: source; subject; NCNB LOC; #methods.
- UBStack [Csallner and Smaragdakis 2004, Xie and Notkin 2003, Stotts et al. 2002]: UBStack 8 88 11; UBStack 12
- Daikon [Ernst et al. 2001]: UtilMDE 1832 69
- DataStructures [Weiss 99]: BinarySearchTree 186 9; StackAr 90 8; StackLi
- JML samples [Cheon et al. 2002]: IntegerSetAsHashSet 28 4; Meter 21 3; DLList 286 12; E_OneWayList 171 10; E_SLList 175; OneWayList; OneWayNode 65; SLList 92; TwoWayList
- MIT 6.170 problem set [Pacheco and Ernst, 2005]: RatPoly (46 versions) 582.51 17.20
We evaluate about 10% of the subjects that Eclat evaluated because our implementation of symbolic execution has limitations: it supports only integer primitives; arrays are not symbolic (indexing with symbolic expressions is not allowed); and primitives are encoded with unbounded precision, which may generate undecidable constraints (e.g., division and mod).

Experimental setup
- Eclat (RanGen) and Symclat (SymGen) tools, each with UncEx and OpMod classification, with and without reduction
- Each tool runs for about the same time (2 min. on an Intel Xeon 2.8 GHz, 2 GB RAM)
- For RanGen, Eclat runs each experiment with 10 different seeds

Comparison metrics
We compare the effectiveness of the techniques in finding faults. Each run gives the user a set of test inputs. Metrics:
- Tests: number of test inputs given to the user
- Faults: number of actually fault-revealing test inputs
- DistinctF: number of distinct faults found
- Prec = Faults/Tests: precision, the ratio of generated test inputs that reveal actual faults
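For example (hypothetical numbers, not from the study): if a run gives the user 50 test inputs and 10 of them reveal actual faults, then Prec = 10/50 = 0.20.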

Evaluation procedure
[Diagram] Each unit testing tool produces a set of test inputs (Tests); a JML formal specification is used to decide whether each one reveals a true fault or is a false alarm, yielding Faults, DistinctF, and Prec = Faults/Tests.

Summary of results
- All techniques miss faults and report false positives
- The techniques are complementary
- RanGen is sensitive to seeds
- Reduction can increase precision but decreases the number of distinct faults

False positives and negatives
Generation techniques can miss faults:
- RanGen can miss important sequences or input values
- SymGen can miss important sequences or be unable to solve constraints
Classification techniques can miss faults and report false alarms: imprecise models misclassify test inputs (normal as fault-revealing, or fault-revealing as normal).

Results without reduction
(Tests: # of test inputs given to the user; Faults: # of actual fault-revealing tests generated; DistinctF: # of distinct actual faults; Prec = Faults/Tests.)

               RanGen+UncEx  RanGen+OpMod  SymGen+UncEx  SymGen+OpMod
    Tests      4,367.5       1,666.6       6,676         4,828
    Faults     256.0         181.2         515           164
    DistinctF  17.7          13.1          14            9
    Prec       0.20          0.42          0.15          0.14

Results with reduction

               RanGen+UncEx  RanGen+OpMod  SymGen+UncEx  SymGen+OpMod
    Tests      124.4         56.2          106           46
    Faults     22.8          13.4          11            7
    DistinctF  15.3          11.6
    Prec       0.31          0.51          0.17          0.20

With reduction, DistinctF goes down and Prec goes up. Reduction misses faults: it may remove a true fault and keep a false alarm. Redundancy among tests decreases precision.

Sensitivity to random seeds
For one RatPoly implementation, RanGen+OpMod (with reduction) generates 200 tests over 10 seeds, 8 of them revealing faults; for only 5 of the seeds is there at least one fault-revealing test.
Per-seed averages: Tests 17.1 (RanGen+UncEx) vs. 20 (RanGen+OpMod); Faults 0.2 vs. 0.8; DistinctF 0.5; Prec 0.01 vs. 0.04.

Outline Motivation, background and problem Framework and existing techniques New technique Evaluation Conclusions

Key: Complementary techniques
Each technique finds some fault that the other techniques miss.
Suggestions:
- Try several techniques on the same subject
- Evaluate how merging independently generated sets of test inputs affects Faults, DistinctF, and Prec
- Evaluate other techniques (e.g., RanGen+SymGen [Godefroid et al. 2005, Cadar and Engler 2005, Sen et al. 2005])
- Improve RanGen: bias selection (which methods and values to favor?), run with multiple seeds (merging of test inputs?)

Conclusions
- Proposed a new technique: Model-based Symbolic Testing
- Compared four techniques that combine random vs. symbolic generation with uncaught-exception vs. operational-model classification
- The techniques are complementary
- Proposed improvements for the techniques