Feedback-directed Random Test Generation (to appear in ICSE 2007)
Carlos Pacheco (MIT), Shuvendu Lahiri (Microsoft Research), Michael Ernst (MIT), Thomas Ball (Microsoft Research)
January 19, 2007

Random testing
- Select inputs at random from a program's input space; check that the program behaves correctly on each input
- An attractive error-detection technique: easy to implement and use, yields lots of test inputs, finds errors
  - Miller et al. 1990: Unix utilities
  - Kropp et al. 1998: OS services
  - Forrester et al. 2000: GUI applications
  - Claessen et al. 2000: functional programs
  - Csallner et al. 2005, Pacheco et al. 2005: object-oriented programs
  - Groce et al. 2007: flash memory, file systems

Evaluations of random testing
- Theoretical work suggests that random testing is as effective as more systematic input generation techniques (Duran 1984, Hamlet 1990)
- Some empirical studies suggest systematic generation is more effective than random:
  - Ferguson et al. 1996: compared with chaining
  - Marinov et al. 2003: compared with bounded exhaustive testing
  - Visser et al. 2006: compared with model checking and symbolic execution
- But these studies were performed on small benchmarks, did not measure error-revealing effectiveness, and used completely undirected random test generation

Contributions
- We propose feedback-directed random test generation:
  - Randomized creation of new test inputs is guided by feedback about the execution of previous inputs
  - The goal is to avoid redundant and illegal inputs
- Empirical evaluation:
  - Evaluate coverage and error-detection ability on a large number of widely used, well-tested libraries (780 KLOC)
  - Compare against systematic input generation
  - Compare against undirected random input generation

Outline
- Feedback-directed random test generation
- Evaluation:
  - Randoop: a tool for Java and .NET
  - Coverage
  - Error detection
- Current and future directions for Randoop

Random testing: pitfalls
1. Useful test:
     Set s = new HashSet();
     s.add("hi");
     assertTrue(s.equals(s));
2. Redundant test (do not output):
     Set s = new HashSet();
     s.add("hi");
     s.isEmpty();
     assertTrue(s.equals(s));
3. Useful test:
     Date d = new Date(2006, 2, 14);
     assertTrue(d.equals(d));
4. Illegal test (do not output):
     Date d = new Date(2006, 2, 14);
     d.setMonth(-1); // pre: argument >= 0
     assertTrue(d.equals(d));
5. Illegal test (do not even create):
     Date d = new Date(2006, 2, 14);
     d.setMonth(-1);
     d.setDate(5);
     assertTrue(d.equals(d));

Feedback-directed random test generation
- Build test inputs incrementally; new test inputs extend previous ones
- In our context, a test input is a method sequence
- As soon as a test input is created, execute it
- Use execution results to guide generation away from redundant or illegal method sequences and towards sequences that create new object states

Technique input/output
Input:
- classes under test
- time limit
- set of contracts:
  - method contracts (e.g., o.hashCode() throws no exception)
  - object invariants (e.g., o.equals(o) == true)
Output: contract-violating test cases. Example:

  HashMap h = new HashMap();
  Collection c = h.values();
  Object[] a = c.toArray();
  LinkedList l = new LinkedList();
  l.addFirst(a);
  TreeSet t = new TreeSet(l);
  Set u = Collections.unmodifiableSet(t);
  assertTrue(u.equals(u)); // fails when executed; no contracts violated up to the last method call
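A minimal Java sketch of checking the two default contracts named above on an object o; checkContracts is a hypothetical helper for illustration, not Randoop's actual API:

  // Returns a description of the violated contract, or null if none is violated.
  static String checkContracts(Object o) {
      try {
          if (!o.equals(o)) return "violates o.equals(o) == true";
      } catch (Throwable t) {
          return "o.equals(o) threw " + t;   // the contract also forbids equals throwing
      }
      try {
          o.hashCode();                      // o.hashCode() must throw no exception
      } catch (Throwable t) {
          return "o.hashCode() threw " + t;
      }
      return null;
  }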

Technique
1. Seed components: components = { int i = 0; boolean b = false; ... }
2. Do until time limit expires:
   a. Create a new sequence:
      i.   Randomly pick a method call m(T1...Tk)/Tret
      ii.  For each input parameter of type Ti, randomly pick a sequence Si from the components that constructs an object vi of type Ti
      iii. Create the new sequence Snew = S1; ...; Sk; Tret vnew = m(v1...vk);
      iv.  If Snew was previously created (lexically), go to i
   b. Classify the new sequence Snew: it may be discarded, output as a test case, or added to the components
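As a rough illustration of steps 1-2, here is a Java sketch of the generation loop; Sequence, ExecutionResult, and the helpers (seedComponents, randomMethod, randomComponentOfType, outputTestCase) are hypothetical stand-ins for Randoop's internals, not its actual code:

  import java.lang.reflect.Method;
  import java.util.*;

  List<Sequence> components = seedComponents();    // step 1: int i = 0; boolean b = false; ...
  Set<Sequence> created = new HashSet<>();         // lexical comparison via Sequence.equals
  long deadline = System.currentTimeMillis() + timeLimitMillis;
  while (System.currentTimeMillis() < deadline) {  // step 2
      Method m = randomMethod();                   // 2.a.i
      List<Sequence> args = new ArrayList<>();
      for (Class<?> paramType : m.getParameterTypes()) {
          args.add(randomComponentOfType(components, paramType)); // 2.a.ii
      }
      Sequence sNew = Sequence.concatenate(args).appendCall(m);   // 2.a.iii
      if (!created.add(sNew)) continue;            // 2.a.iv: previously created, retry
      ExecutionResult r = sNew.execute();          // feedback: run the sequence now
      if (r.contractViolated()) {
          outputTestCase(sNew.minimize());         // 2.b: error-revealing test case
      } else if (!r.redundant()) {
          components.add(sNew);                    // reuse to build longer inputs
      }                                            // otherwise discard
  }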

Classifying a sequence
Execute the sequence and check contracts:
- Contract violated? Minimize the sequence and output it as a contract-violating test case.
- No violation, but the sequence is redundant? Discard it.
- Otherwise, add the sequence to the components.

Redundant sequences
- During generation, maintain a set of all objects created
- A sequence is redundant if all the objects created during its execution are members of this set (using equals to compare), as in the sketch below
- Could also use more sophisticated state-equivalence methods, e.g., heap canonicalization
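A minimal sketch of this check, assuming execution hands back the list of objects the sequence created; the set relies on each class's equals/hashCode, as the slide describes:

  Set<Object> allObjects = new HashSet<>();   // maintained across the whole run

  // A sequence is redundant if every object it created was seen before.
  boolean isRedundant(List<Object> createdByExecution) {
      boolean allSeen = true;
      for (Object o : createdByExecution) {
          if (allObjects.add(o)) {            // add() is false iff an equal object was present
              allSeen = false;                // this execution reached a new object state
          }
      }
      return allSeen;
  }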

Outline
- Feedback-directed random test generation
- Evaluation:
  - Randoop: a tool for Java and .NET
  - Coverage
  - Error detection
- Current and future directions for Randoop

Randoop
- Implements feedback-directed random test generation
- Input:
  - an assembly (for .NET) or a list of classes (for Java)
  - generation time limit
  - optional: a set of contracts to augment the default contracts
- Output: a test suite (JUnit or NUnit) containing contract-violating test cases and normal-behavior test cases

Randoop outputs oracles
Oracle for a contract-violating test case:

  Object o = new Object();
  LinkedList l = new LinkedList();
  l.addFirst(o);
  TreeSet t = new TreeSet(l);
  Set u = Collections.unmodifiableSet(t);
  assertTrue(u.equals(u)); // expected to fail

Oracle for a normal-behavior test case:

  Object o = new Object();
  LinkedList l = new LinkedList();
  l.addFirst(o);
  l.add(o);
  assertEquals(2, l.size()); // expected to pass
  assertEquals(false, l.isEmpty()); // expected to pass

Randoop uses observer methods to capture object state.
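The normal-behavior oracle above can be produced by calling side-effect-free observer methods at generation time and freezing the observed values into assertions; an illustrative sketch of the idea, not Randoop's actual oracle machinery:

  Object o = new Object();
  LinkedList l = new LinkedList();
  l.addFirst(o);
  l.add(o);
  // At generation time, call observers and record their return values:
  int size = l.size();           // observed: 2
  boolean empty = l.isEmpty();   // observed: false
  // The recorded values become the expected values in the emitted test:
  //   assertEquals(2, l.size());
  //   assertEquals(false, l.isEmpty());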

Some Randoop options
- Avoid use of null, both statically:
    Object o = new Object();
    LinkedList l = new LinkedList();
    l.add(null);
  and dynamically:
    Object o = returnNull();
    LinkedList l = new LinkedList();
    l.add(o);
- Bias random selection:
  - favor smaller sequences (see the sketch below)
  - favor methods that have been covered less
- Use constants mined from source code
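One way to realize the "favor smaller sequences" bias is weighted random selection, weighting each component inversely to its length. A hypothetical sketch (Sequence.length() is an assumed helper; this is not Randoop's actual heuristic code):

  // Pick a component sequence, preferring shorter ones.
  Sequence pickBiased(List<Sequence> components, Random rand) {
      double total = 0;
      for (Sequence s : components) total += 1.0 / s.length();
      double r = rand.nextDouble() * total;
      for (Sequence s : components) {
          r -= 1.0 / s.length();
          if (r <= 0) return s;
      }
      return components.get(components.size() - 1); // guard against rounding error
  }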

Outline
- Feedback-directed random test generation
- Evaluation:
  - Randoop: a tool for Java and .NET
  - Coverage
  - Error detection
- Current and future directions for Randoop

Coverage
- Seven data structures (stack, bounded stack, list, binary search tree, heap, red-black tree, binomial heap)
- Used in previous research:
  - bounded exhaustive testing [Marinov 2003]
  - symbolic execution [Xie 2005]
  - exhaustive method sequence generation [Xie 2004]
- All of the above techniques achieve high coverage in seconds
- Tools not publicly available

Coverage achieved by Randoop

  data structure             time (s)   branch cov.
  Bounded stack (30 LOC)     1          100%
  Unbounded stack (59 LOC)   1          100%
  BS tree (91 LOC)           1          96%
  Binomial heap (309 LOC)    1          84%
  Linked list (253 LOC)      1          100%
  Tree map (370 LOC)         1          81%
  Heap array (71 LOC)        1          100%

Comparable with exhaustive/symbolic techniques.

Visser containers
- Visser et al. (2006) compare several input generation techniques:
  - model checking with state matching
  - model checking with abstract state matching
  - symbolic execution
  - symbolic execution with abstract state matching
  - undirected random testing
- Comparison in terms of branch and predicate coverage
- Four nontrivial container data structures
- Experimental framework and tool available

Predicate coverage
[Charts: predicate coverage for feedback-directed generation, the best systematic technique, and undirected random testing on each container; feedback-directed matches or exceeds the best systematic technique and exceeds undirected random testing.]

Outline
- Feedback-directed random test generation
- Evaluation:
  - Randoop: a tool for Java and .NET
  - Coverage
  - Error detection
- Current and future directions for Randoop

Subjects

  subject                                 LOC    classes
  JDK (2 libraries:
    java.util, javax.xml)                 53K    272
  Apache commons (5 libraries:
    logging, primitives, chain,
    jelly, math, collections)             114K   974
  .NET framework (5 libraries)            582K   3330

Methodology
- Ran Randoop on each library, with the default time limit (2 minutes)
- Contracts:
  - o.equals(o) == true
  - o.equals(o) throws no exception
  - o.hashCode() throws no exception
  - o.toString() throws no exception
- No null inputs, and:
  - Java: no NPEs
  - .NET: no NPEs, out-of-bounds, or illegal-state exceptions

Results
[Table: for each subject (JDK, Apache commons, .NET framework) and in total, the number of test cases output, the number of error-revealing test cases, and the number of distinct errors.]

Errors found: examples
- JDK Collections classes have 4 methods that create objects violating the o.equals(o) contract
- javax.xml creates objects that cause hashCode and toString to crash, even though the objects are well-formed XML constructs
- Apache libraries have constructors that leave fields unset, leading to NPEs on calls of equals, hashCode, and toString (this counts as only one bug)
- Many Apache classes require a call of an init() method before the object is legal; this led to many false positives
- .NET framework has at least 175 methods that throw an exception forbidden by the library specification (NPE, out-of-bounds, or illegal-state exception)
- .NET framework has 8 methods that violate o.equals(o)
- .NET framework loops forever on a legal but unexpected input

JPF
- Used JPF to generate test inputs for the Java libraries (JDK and Apache):
  - breadth-first search (the suggested strategy)
  - maximum sequence length of 10
- JPF ran out of memory without finding any errors (out of memory after 32 seconds on average)
- It spent most of its time systematically exploring a very localized portion of the space
- For large libraries, random, sparse sampling seems to be more effective

Undirected random testing
- JCrasher implements undirected random test generation:
  - creates random method-call sequences
  - does not use feedback from execution
  - reports sequences that throw exceptions
- Found 1 error on the Java libraries; reported 595 false positives

Regression testing
- Randoop can create regression oracles
- Generated test cases using JDK 1.5: Randoop generated 41K regression test cases
- Ran the resulting test cases on JDK 1.6 Beta: 25 test cases failed
- Ran them on Sun's implementation of the JDK: 73 test cases failed
- Failing test cases pointed to 12 distinct errors
- These errors were not found by the extensive compliance test suite that Sun provides to JDK developers

  Object o = new Object();
  LinkedList l = new LinkedList();
  l.addFirst(o);
  l.add(o);
  assertEquals(2, l.size()); // expected to pass
  assertEquals(false, l.isEmpty()); // expected to pass

Evaluation: summary
Feedback-directed random test generation:
- Is effective at finding errors: discovered several errors in real code (e.g., JDK and .NET framework core libraries)
- Can outperform systematic input generation, both on previous benchmarks and metrics (coverage), and on a new, larger corpus of subjects, measuring error detection
- Can outperform undirected random test generation

Outline
- Feedback-directed random test generation
- Evaluation:
  - Randoop: a tool for Java and .NET
  - Coverage
  - Error detection
- Current and future directions for Randoop

Tech transfer
- Randoop is currently being maintained by a product group at Microsoft; I spent an internship doing the tech transfer
- How would a test team actually use the tool? Push-button at first, with a desire for more control later
- Would the tool be cost-effective? Yes:
  - immediately found a few errors; with more control, found more errors
  - pointed to blind spots in existing test suites and existing automated testing tools
- Which heuristics would be most useful? The simplest ones (e.g., uniform selection); more sophisticated guidance was best left to the users of the tool

Future directions
- Combining random and systematic generation:
  - DART (Godefroid 2005) combines random and systematic generation of test data
  - How to combine random and systematic generation of sequences?
- Using Randoop for reliability estimation:
  - random sampling is amenable to statistical analysis
  - are programs in which Randoop finds more problems actually more error-prone?
- Better oracles:
  - to date, we have used a very basic set of contracts; will better contracts lead to more errors?
  - incorporate techniques that create oracles automatically

Conclusion
- Feedback-directed random test generation:
  - finds errors in widely used, well-tested libraries
  - can outperform systematic test generation
  - can outperform undirected test generation
- Randoop:
  - easy to use: just point it at a set of classes
  - has real clients: used by product groups at Microsoft
- A mid-point in the systematic-random space of input generation techniques