Eclat: Automatic Generation and Classification of Test Inputs


Carlos Pacheco and Michael Ernst, Program Analysis Group, MIT

The Problem
- Suppose you have a program that works: it passes an existing test suite, and its observable behavior appears correct.
- You want improved confidence in the program's reliability: ensure that the program works on other inputs.
- If the program operates incorrectly on some input, fix the program and add a new test case to the test suite.

Input Generation
- Test inputs can be generated automatically:
  - Random generation [Claessen & Hughes 2002, ...]
  - Bounded exhaustive testing [Boyapati et al. 2002]
  - Constraint solving [Korel 1996, Gupta 1998, ...]
  - Many more...
- Without automatic tool support, the user must inspect each resulting input (unless an executable specification or oracle exists):
  - Is the input fault-revealing?
  - Is the input useful?

Research Goal
Help the user select, from a large number of inputs, a small "promising" subset:
- Inputs exhibiting new program behavior
- Inputs likely to reveal faults

The Technique
[Pipeline diagram] A model generator observes a correct execution (e.g., a run of the test suite) and produces a model of correct operation. An input generator produces candidate inputs; a classifier runs the program on each candidate and, by checking it against the model, labels it normal, fault-revealing, or illegal. A reducer condenses the fault-revealing inputs into a small set of potentially fault-revealing inputs, which the user inspects to separate true faults from false alarms.

Model Generator
- Our technique uses a model generator to produce a model of correct program operation, derived from observing a correct execution [Ernst et al. 2001, Ammons et al. 2002, Henkel and Diwan 2003, ...].
- The technique requires only that:
  - the model is a set of properties that hold at component boundaries, and
  - the properties can be evaluated at runtime.

Example: bounded stack [Stotts et al. 2002, Xie and Notkin 2003, Csallner and Smaragdakis 2004]

  public class Stack {
    private int[] elems;
    private int topOfStack;
    private int capacity;
    public Stack() { ... }
    public void push(int k) { ... }
    public void pop() { topOfStack--; }
    public boolean isMember(int i) { ... }
  }

  public class StackTest { ... }

Example: bounded stack (inferred model)

  object properties (all methods):
    capacity == elems.length
    elems != null
    capacity == 2
    topOfStack >= 0
    elems ∈ { [3,0], [3,2] }
  isMember, entry properties:
    k ∈ elems
  isMember, exit properties:
    elems == orig(elems)
    orig(k) ∈ elems
  pop, entry properties:
    topOfStack >= 0
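Such properties can be evaluated at component boundaries with ordinary runtime checks. A minimal sketch of checking the object properties above, with the field values passed in directly for illustration (a real checker would read them via instrumentation or reflection; this is not Eclat's actual machinery):

  final class StackPropertyChecker {
      /**
       * Evaluate the inferred object properties at a method boundary.
       * Returns true if the observed field values satisfy the model.
       */
      static boolean objectPropertiesHold(int[] elems, int topOfStack, int capacity) {
          return elems != null
              && capacity == elems.length
              && capacity == 2          // observed in the sample execution
              && topOfStack >= 0;
      }
  }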

Classifier
[Pipeline diagram, classifier stage highlighted: candidate inputs and the model of correct operation flow into the classifier, which labels each input normal, fault, or illegal.]

Classifier
- Run the program on the candidate input.
- Detect the set of violated model properties.
- Classify by where the violations occur:

  entry violations? | exit violations? | classification
  no                | no               | normal
  no                | yes              | fault
  yes               | no               | normal (new)
  yes               | yes              | illegal
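This decision table can be expressed compactly in code. A minimal sketch (the type and method names are illustrative, not Eclat's API):

  import java.util.Set;

  /** The four verdicts from the classification table above. */
  enum Label { NORMAL, FAULT, NORMAL_NEW, ILLEGAL }

  final class Classifier {
      /**
       * Classify an input from the model properties it violated:
       * entryViolations and exitViolations hold the names of properties
       * violated at method entry and exit, respectively.
       */
      static Label classify(Set<String> entryViolations, Set<String> exitViolations) {
          boolean entry = !entryViolations.isEmpty();
          boolean exit = !exitViolations.isEmpty();
          if (!entry && !exit) return Label.NORMAL;     // behaves as modeled
          if (!entry)          return Label.FAULT;      // legal input, wrong result
          if (!exit)           return Label.NORMAL_NEW; // new use, correct behavior
          return Label.ILLEGAL;                         // precondition violated
      }
  }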

Classifier: normal input

  Stack var1 = new Stack();
  var1.push(3);
  var1.pop();

No entry or exit properties are violated, so the input is classified normal.

Classifier: fault-revealing input

  Stack var1 = new Stack();
  var1.push(3);
  var1.pop();
  var1.pop();

No entry properties are violated, but topOfStack >= 0 is violated on exit of the second pop, so the input is classified fault.

Classifier: illegal input

  Stack var1 = new Stack();
  var1.push(0);
  var1.isMember(-5);

The isMember entry property k ∈ elems is violated (-5 is not in the stack), so the input is classified illegal.

Reducer
[Pipeline diagram, reducer stage highlighted: fault-labeled inputs flow into the reducer, which emits the small set of potentially fault-revealing inputs shown to the user.]

Reducer
- Partitions inputs based on the set of properties they violate.
- Reports one input from each partition.
- Inputs in the same partition are likely to manifest the same faulty behavior.

Reducer: example
Two equivalent inputs, both with violation pattern topOfStack >= 0 (on exit):

  Stack var1 = new Stack();
  var1.push(3);
  var1.pop();

  Stack var1 = new Stack();
  var1.push(0);
  var1.pop();
  var1.push(3);
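A minimal sketch of the partitioning step, assuming each fault-labeled input carries its violation pattern (the TestInput type is an illustrative stand-in):

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  final class Reducer {
      /** A generated input paired with the names of the properties it violated. */
      record TestInput(String code, Set<String> violationPattern) {}

      /**
       * Keep one representative input per violation pattern; inputs that
       * violate the same properties likely expose the same fault.
       */
      static List<TestInput> reduce(List<TestInput> faultInputs) {
          Map<Set<String>, TestInput> representative = new LinkedHashMap<>();
          for (TestInput input : faultInputs) {
              representative.putIfAbsent(input.violationPattern(), input);
          }
          return new ArrayList<>(representative.values());
      }
  }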

Input Generator
[Pipeline diagram, input generator stage highlighted: the input generator produces the candidate inputs fed to the classifier.]

Bottom-up Random Generator
1. pool := a set of primitives (null, 0, 1, etc.)
2. do N times:
   2.1. create new inputs by calling methods/constructors, using pool inputs as arguments
   2.2. add the resulting inputs to the pool

Example pool after a few iterations:
  null, 0, 1, 2, 3
  Stack var1 = new Stack();
  Stack var2 = new Stack(3);
  var2.push(3);  var1.isMember(2);  var1.pop();
  var1.equals(var2);  var2.pop();  var2.push(0);
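A minimal sketch of this loop, specialized to the Stack example for readability (Eclat itself works over arbitrary classes via reflection; the class and field names here are illustrative):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Random;

  final class BottomUpGenerator {
      // Primitive pool plus a pool of constructed receivers.
      private final List<Integer> intPool = new ArrayList<>(List.of(0, 1, 2, 3));
      private final List<Stack> stackPool = new ArrayList<>();
      private final Random random = new Random();

      void generate(int n) {
          for (int i = 0; i < n; i++) {
              if (stackPool.isEmpty() || random.nextInt(4) == 0) {
                  stackPool.add(new Stack());  // construct a new receiver
                  continue;
              }
              // Pick a pooled receiver and argument, then call a random method;
              // the mutated receiver stays in the pool as a building block.
              Stack s = stackPool.get(random.nextInt(stackPool.size()));
              int arg = intPool.get(random.nextInt(intPool.size()));
              switch (random.nextInt(3)) {
                  case 0 -> s.push(arg);
                  case 1 -> s.pop();
                  default -> s.isMember(arg);
              }
          }
      }
  }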

Avoiding illegal inputs
It's important that the inputs in the pool are legal, because pool inputs are the building blocks for other inputs:

  Input 1 (tests pop): illegal
    Stack s = new Stack();
    s.pop();

  Input 2 (tests equals): fault-revealing
    Stack s = new Stack();
    s.pop();
    Stack s2 = new Stack(3);
    s.equals(s2);

  Input 3 (tests isMember): illegal
    Stack s = new Stack();
    s.pop();
    s.isMember(1);

Using the classifier for generation
[Pipeline diagram: the classifier's verdicts feed back into the input generator, so that only inputs classified normal are reused as building blocks.]

Enhanced generator
1. pool := a set of primitives (null, 0, 1, etc.)
2. do N times:
   2.1. create new inputs by calling methods/constructors, using pool inputs as arguments
   2.2. classify the new inputs
   2.3. throw away illegal inputs
   2.4. save fault-revealing inputs for the user
   2.5. add normal inputs to the pool
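Combining the generator and classifier sketches above, the enhanced loop might look like this. TestInput, Violations, createFromPool, and runAndCheck are illustrative stand-ins, not Eclat's actual API; Classifier.classify is the sketch shown earlier:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Set;

  final class EnhancedGenerator {
      record TestInput(String code) {}
      record Violations(Set<String> entry, Set<String> exit) {}

      private final List<TestInput> pool = new ArrayList<>();
      private final List<TestInput> faultInputs = new ArrayList<>();

      void generate(int n) {
          for (int i = 0; i < n; i++) {
              TestInput candidate = createFromPool();             // step 2.1
              Violations v = runAndCheck(candidate);              // evaluate model properties
              switch (Classifier.classify(v.entry(), v.exit())) { // step 2.2
                  case ILLEGAL -> { /* discard: step 2.3 */ }
                  case FAULT -> faultInputs.add(candidate);       // step 2.4
                  default -> pool.add(candidate);                 // step 2.5: normal inputs
              }
          }
      }

      // Stubs standing in for the machinery sketched earlier.
      private TestInput createFromPool() { return new TestInput(""); }
      private Violations runAndCheck(TestInput t) { return new Violations(Set.of(), Set.of()); }
  }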

Eclat
- Eclat generates test inputs for Java unit testing.
  [Diagram: a Java program and an execution (e.g., a test suite) go into Eclat, which emits text or XML output and JUnit tests.]
- Eclat uses the Daikon invariant detector to create a model of correct execution.
- Each test input is wrapped as a JUnit test.
- Eclat proposes assertion checks based on violated properties.
- http://pag.csail.mit.edu/eclat

Eclat’s output: example public void test_1_integrate() { RatPoly rp1 = new RatPoly(4, 3); RatPoly rp2 = new RatPoly(1, 1); RatPoly rp3 = rp1.add(rp2); checkPreProperties(rp3); rp3.integrate(0); checkPostProperties(rp3); }

Eclat’s output: example public void test_1_integrate() { RatPoly rp1 = new RatPoly(4, 3); RatPoly rp2 = new RatPoly(1, 1); RatPoly rp3 = rp1.add(rp2); checkPreProperties(rp3); rp3.integrate(0); checkPostProperties(rp3); } Assertion violation!

Eclat’s output: example public void test_1_integrate() { RatPoly rp1 = new RatPoly(4, 3); RatPoly rp2 = new RatPoly(1, 1); RatPoly rp3 = rp1.add(rp2); checkPreProperties(rp3); rp3.integrate(0); checkPostProperties(rp3); } Assertion violation! public void checkPostProperties(RatPoly rp) { ... // on exit: all terms in rp always have non-zero coefficient Assert.assertTrue(!allZeroes(rp.terms)); }

Eclat’s output: example public void test_1_integrate() { RatPoly rp1 = new RatPoly(4, 3); RatPoly rp2 = new RatPoly(1, 1); RatPoly rp3 = rp1.add(rp2); checkPreProperties(rp3); rp3.integrate(0); checkPostProperties(rp3); } Assertion violation! public void checkPostProperties(RatPoly rp) { ... // on exit: all terms in rp always have non-zero coefficient Assert.assertTrue(!allZeroes(rp.terms)); } 94 inputs violate this property. Of these, 3 are shown to the user.

Evaluation
We used Eclat to generate inputs for 6 families of libraries:
- 64 distinct interfaces
- 631 implementations
- 75,000 non-comment, non-blank lines of code

Evaluation results
[Pipeline diagram annotated with averages per run:]
- Input generator: 1338 candidate inputs
- Classifier: 986 normal (90% precision), 321 illegal (78% precision), 31 fault (12% precision)
- Reducer: 5 potentially fault-revealing inputs reported, of which 1.5 are true faults and 3.5 are false alarms (30% precision)
- The reducer raises fault precision from 12% to 30%.

Future Directions
- Incorporate other classification techniques [Podgurski et al. 2003, Brun and Ernst 2004, Bowring et al. 2004, ...]
- Incorporate other generation strategies [Korel 1996, Gupta 1998, Claessen & Hughes 2002, Boyapati et al. 2002, ...]

Conclusion
- A technique to generate new program inputs:
  - inputs likely to reveal faults
  - inputs not covered by an existing test suite
- The technique is effective in uncovering errors.
- Eclat automatically generates unit tests for Java classes: http://pag.csail.mit.edu/eclat

Related Work
- Harder et al. Improving test suites via operational abstraction. ICSE 2003.
- Xie and Notkin. Tool-assisted unit test selection based on operational violations. ASE 2003.
- Csallner and Smaragdakis. JCrasher: an automatic robustness tester for Java. Software: Practice and Experience, 2004.