50.530: Software Engineering


50.530: Software Engineering Sun Jun SUTD

Week 2: Automatic Testing

A Big View: Testing [Venn diagram: starting from the initial state, the behaviors we wanted vs. the behaviors we have (regions labelled A, B, C)]

A Big View: Testing [Venn diagram: the initial state, the behaviors we wanted, the behaviors we have, and a test which shows a bug]

Testing
Methods: white-box testing, black-box testing, grey-box testing
Levels: unit testing, integration testing, system testing, etc.
Types: installation testing, compatibility testing, smoke and sanity testing, regression testing, acceptance testing, alpha testing, beta testing, functional/non-functional testing, combinatorial testing, performance testing, security testing, etc.

Research Question Isn't JUnit good enough? How do we automatically generate test cases so as to reveal bugs?

A Big View: Systematic Testing [Venn diagram: the initial state, the behaviors we wanted, the behaviors we have]

A Big View: Random Testing [Venn diagram: the initial state, the behaviors we wanted, the behaviors we have, and a test which shows a bug]

Korat: Automated Testing Based on Java Predicates Boyapati et al., ISSTA 2002, ACM SIGSOFT Distinguished Paper Award

Motivation It is important to be able to generate test cases automatically. It is important to generate test cases which are representative. Korat is just one sample approach to systematic test-case generation; however, it is similar in spirit to many other systematic testing techniques (e.g., combinatorial testing, parameterized testing).

Example
public class BinaryTree {
    public static class Node {
        Node left;
        Node right;
    }
    private Node root;
    private int size;
    public void remove(Node n) {
        // some code ...
    }
}
How do we test remove(Node n)?

Example How do we test remove(Node n)? We need a valid BinaryTree object bt. We need a valid Node object nd. We need to know what is expected after executing bt.remove(nd). (BinaryTree class as on the previous slide.) A hand-written version of such a test might look like the sketch below.
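A minimal JUnit 4 sketch of such a hand-written test. The helpers add(), size() and contains() are hypothetical (they are not part of the BinaryTree class shown on the slide), and the expected post-state is only an assumption about what remove should do.

import org.junit.Test;
import static org.junit.Assert.*;

public class BinaryTreeRemoveTest {
    @Test
    public void removeDetachesNodeAndDecrementsSize() {
        BinaryTree bt = new BinaryTree();            // 1. a valid BinaryTree object bt
        BinaryTree.Node nd = new BinaryTree.Node();  // 2. a valid Node object nd
        bt.add(nd);                                  // hypothetical helper to insert nd
        int oldSize = bt.size();                     // hypothetical observer

        bt.remove(nd);                               // the call under test

        assertEquals(oldSize - 1, bt.size());        // 3. what we expect afterwards
        assertFalse(bt.contains(nd));
    }
}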

Vocabulary Class invariant: an invariant that defines which objects of the class are valid, e.g., size == 0 if root == null, and size equals the number of nodes in the tree. (BinaryTree class as before.)

Vocabulary Pre-condition (of a method): a condition which must be true prior to the execution of the method, e.g., n must not be null. The class invariant is always part of the pre-condition. (BinaryTree class as before.)

Vocabulary Post-condition (of a method): a condition which must be true after the execution of the method, e.g., after remove, size is decremented by 1. The class invariant is always part of the post-condition. (BinaryTree class as before.)

Korat: Assumption A class invariant is encoded as a method repOK(), which returns true if and only if the object is in a state which satisfies the class invariant.
public boolean repOK() {
    if (root == null) return size == 0;
    Set<Node> visited = new HashSet<Node>();
    visited.add(root);
    LinkedList<Node> workList = new LinkedList<Node>();
    workList.add(root);
    while (!workList.isEmpty()) {
        Node current = workList.removeFirst();
        if (current.left != null) {
            if (!visited.add(current.left)) return false;   // left child already reached: not a tree
            workList.add(current.left);
        }
        if (current.right != null) {
            if (!visited.add(current.right)) return false;  // right child already reached: not a tree
            workList.add(current.right);
        }
    }
    return (visited.size() == size);                        // size must match the number of reachable nodes
}

Korat: Assumption Pre-condition and post-condition are encoded in the Java Modeling Language (JML).
//@ public invariant repOK();    // class invariant for BinaryTree

/*@ public normal_behavior       // specification for remove
  @ requires has(n);             // precondition
  @ ensures !has(n);             // postcondition
  @*/
public void remove(Node n) { // ... method body
}
Is this perhaps too harsh a pre-condition?

Korat: Approach [flowchart] Generate a BinaryTree bt and a Node n. If repOK() and the pre-condition are true, execute bt.remove(n) and check whether the post-condition is true; otherwise, discard the candidate and generate the next one. A sketch of this driver loop follows.
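A minimal sketch of that driver loop, under several assumptions: Candidate, allCandidates and reportBug are hypothetical helpers (not Korat's API), and has(n) is the pre-/post-condition from the JML slide.

static void checkRemoveExhaustively() {
    for (Candidate c : allCandidates(finBinaryTree(3))) {   // hypothetical enumeration of the finitized space
        BinaryTree bt = c.tree();
        BinaryTree.Node n = c.node();
        if (!bt.repOK() || !bt.has(n)) continue;            // filter: class invariant + pre-condition
        bt.remove(n);                                       // execute the method under test
        if (!(bt.repOK() && !bt.has(n))) reportBug(c);      // invariant + post-condition must hold afterwards
    }
}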

Finitization There are infinitely many candidates for bt and n. For each variable in the class, define its domain. [diagram: interesting bt as a subset of all possible bt]

Finitization
public static Finitization finBinaryTree(int NUM_Node) {
    Finitization f = new Finitization(BinaryTree.class);
    ObjSet nodes = f.createObjSet("Node", NUM_Node);
    nodes.add(null);
    f.set("root", nodes);
    f.set("Node.left", nodes);
    f.set("Node.right", nodes);
    return f;
}
(BinaryTree class as before.)

Finitization With NUM_Node = 3, the finBinaryTree code above translates to: nodes = {null, N0, N1, N2}; BinaryTree.root is a member of nodes; Node.left is a member of nodes; Node.right is a member of nodes.

Example Trees With finBinaryTree(3), there are 4 objects: one BinaryTree object and three Node objects, which could be set up as follows. [figure: example tree shapes]

Finitization: the Space How many bt are there with finBinaryTree(3), assuming that bt.size is always set to the right value? 4^7 = 16,384, since there are four choices (null, N0, N1, N2) for each of the 7 reference fields (root plus the left and right fields of the three nodes). How many bt are there with finBinaryTree(n)? (n+1)^(2n+1). [diagram: interesting bt as a subset of all possible bt] A small helper that checks this count is sketched below.
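A quick sanity check of that count as a small helper (illustrative only; numCandidates is not part of Korat):

// With finBinaryTree(n) there are n+1 choices (null or one of the n nodes)
// for the root and for each of the 2n left/right fields, assuming size is fixed.
static long numCandidates(int n) {
    long choices = n + 1;
    long result = 1;
    for (long i = 0; i < 2L * n + 1; i++) {
        result *= choices;
    }
    return result;   // numCandidates(3) == 16384 == 4^7
}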

Filtering 1 For each candidate bt and n, check the pre-condition of remove. If the pre-condition is not satisfied, ignore that tree. [diagram: invalid bt filtered out of all possible bt, leaving the interesting bt]

Is the following bt valid? [figure: a candidate bt] (Check it against the repOK() method shown earlier.)

Korat: Search Algorithm Order all the elements in every class domain and every field domain. Node class ordering: <null, N0, N1, N2>; assume the domain of size is <3>. Generate a candidate as a vector of field-domain indices, e.g., [1,0,2,2,0,0,0,0] (one index per field; decoded in the sketch below).
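A sketch of how such a vector can be decoded into an object. The field order [root, size, N0.left, N0.right, N1.left, N1.right, N2.left, N2.right] is an assumption made for illustration, and the private fields of BinaryTree/Node are written directly here for brevity.

BinaryTree.Node n0 = new BinaryTree.Node();
BinaryTree.Node n1 = new BinaryTree.Node();
BinaryTree.Node n2 = new BinaryTree.Node();
BinaryTree.Node[] nodeDomain = { null, n0, n1, n2 };   // ordering <null, N0, N1, N2>
int[] sizeDomain = { 3 };                              // domain of size: <3>
int[] cand = { 1, 0, 2, 2, 0, 0, 0, 0 };

BinaryTree bt = new BinaryTree();
bt.root  = nodeDomain[cand[0]];   // root     = N0
bt.size  = sizeDomain[cand[1]];   // size     = 3
n0.left  = nodeDomain[cand[2]];   // N0.left  = N1
n0.right = nodeDomain[cand[3]];   // N0.right = N1, so N1 is reached twice and repOK() is false
n1.left  = nodeDomain[cand[4]];   // the remaining fields are null
n1.right = nodeDomain[cand[5]];
n2.left  = nodeDomain[cand[6]];
n2.right = nodeDomain[cand[7]];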

Korat: Search Algorithm Invoke repOK() to check if the candidate is valid, e.g., [1,0,2,2,0,0,0,0] is invalid. Backtrack to generate the next candidate in line, e.g., [1,0,2,2,0,0,0,1].

Optimization 1 During the execution of repOK(), Korat monitors the fields that repOK() accesses, e.g., the field indices [0, 2, 3] in the example. If repOK() returns false, backtrack so that one of the accessed fields is different, e.g., try [1,0,2,3,0,0,0,0] directly after [1,0,2,2,0,0,0,0]. Is this justified? (A sketch of the idea follows.)
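A minimal sketch of that pruning idea (not Korat's implementation; instrumentedRepOK, incrementField and candidate are hypothetical, and java.util collections are assumed):

// Run repOK() against an instrumented candidate that records which field
// indices it reads, in the order it reads them.
List<Integer> accessed = new ArrayList<Integer>();
boolean ok = instrumentedRepOK(candidate, accessed);      // hypothetical: fills 'accessed'
if (!ok) {
    // Any candidate that differs only in fields repOK() never read would fail
    // in exactly the same way, so jump straight to changing the last accessed field.
    int lastAccessed = accessed.get(accessed.size() - 1);
    candidate = incrementField(candidate, lastAccessed);   // hypothetical backtracking step
}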

Theory For non-deterministic repOK() methods: all candidates for which repOK() always returns true are generated; candidates for which repOK() always returns false are never generated; candidates for which repOK() sometimes returns true and sometimes false may or may not be generated.

Optimization 2 If we generated the tree shown above, we may not want to also generate [1, 0, 3, 2, 0, 0, 0, 0]. Is this justified? [figure: the tree with nodes N1 and N2]

Vocabulary Object graph: the objects reachable from a root object together with the references between them. Isomorphic: two object graphs C and C' are isomorphic iff there is a permutation per such that per(C) = C' and per(C') = C, e.g., per = {N1->N2, N2->N1}. [figure: two isomorphic object graphs over N1 and N2]

Optimization 2 [diagram: the interesting bt partitioned into regions; all candidates in the same region are isomorphic; one representative is kept per region]

Representative Given the two graphs [1, 0, 2, 3, 0, 0, 0, 0] and [1, 0, 3, 2, 0, 0, 0, 0], Korat takes the latter as a representative, as it is "bigger". [figure: the two graphs, with N1 and N2 swapped]

Implementation: Op 2 When backtracking from [a, b, c, …, k, …], Korat tries [a, b, c, …, k+1, …] if k+1 is smaller than or equal to some index already in the vector for a field of the same associated type; otherwise, Korat backtracks further and tries [a, b, c, …, j+1, …] for an earlier field j.

Example When backtracking from [1, 0, 2, 2, 0, 0, 0, 0], Korat skips [1, 0, 2, 3, 0, 0, 0, 0] (since there is a "bigger" representative [1,0,3,2,0,0,0,0]) and continues with [1,0,3,0,0,0,0,0].

Result Only 5 bt are generated (assuming size is always set to 3): exactly the 5 non-isomorphic binary trees with 3 nodes.

Evaluation Is this biased?

Experiment I

Experiment II

Experiment III

Conclusion Korat generates test cases from a specified domain and a correctness specification. Korat reduces the number of test cases based on: the pre-condition; a simple form of learning (monitoring the fields repOK() accesses); symmetry (isomorphism) reduction.

Exercise 1 Apply Korat to java.util.Stack by answering the following questions. What is repOK()? What are the pre-conditions and post-conditions of the methods push and pop? How would you track which fields are accessed in repOK()? When are two stack objects isomorphic?

Discussion Any thought on Korat?

Feedback-directed Random Test Generation Pacheco et al., ICSE 2007, cited 440+

Random Testing Easy to implement. Yields lots of test cases. Finds errors: 1990: Unix utilities; 1998: OS services; 2000: GUI applications; 2000: functional programs; 2005: object-oriented programs; 2007: flash memory, file systems. Perhaps it simply got lucky?

Research Question Which one is better: systematic testing or random testing?

Random vs. Systematic Theoretical work suggests that random testing is as effective as more systematic input-generation techniques (Duran et al. 1984; Hamlet et al. 1990). Some empirical studies suggest systematic generation is more effective than random (Ferguson et al. 1996: vs. chaining; Marinov et al. 2003: vs. bounded-exhaustive; Visser et al. 2006: vs. model checking and symbolic execution), but these studies used small benchmarks and did not measure error-revealing effectiveness.

Contributions Propose feedback-directed random test generation: the randomized creation of new test inputs is guided by feedback about the execution of previous inputs; the goal is to avoid redundant and illegal inputs. Empirical evaluation: evaluate coverage and error-detection ability on a large number of widely-used, well-tested libraries (780 KLOC); compare against systematic input generation; compare against undirected random input generation.

Sample Test Case
public static void test1() {
    LinkedList l1 = new LinkedList();
    Object o1 = new Object();
    l1.addFirst(o1);
    TreeSet t1 = new TreeSet(l1);
    Set s1 = Collections.unmodifiableSet(t1);
    Assert.assertTrue(s1.equals(s1));
}

Randoop Input: a class with multiple public methods. Output: a set of test cases (sequences of method calls). Main idea: build test inputs incrementally; new test inputs extend previous ones; as soon as a test input is created, execute it; use the execution results to guide generation. A sketch of this loop follows.
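A minimal sketch of that generation loop, simplified from the paper's description. The types and helpers used here (Sequence, ExecutionResult, randomPublicMethod, extend, pickArgumentsFrom, execute, violatesContract, isRedundant, minimize, timeLimitReached) are illustrative names, not Randoop's actual API.

List<Sequence> components = seedSequences();                 // e.g., sequences producing primitive constants
List<Sequence> errorTests = new ArrayList<Sequence>();
while (!timeLimitReached()) {
    Method m = randomPublicMethod(classesUnderTest);         // pick a public method at random
    Sequence s = extend(m, pickArgumentsFrom(components));   // a new input extends previously kept ones
    ExecutionResult r = execute(s);                          // feedback: run the sequence immediately
    if (r.violatesContract()) {
        errorTests.add(minimize(s));                         // output a contract-violating test case
    } else if (!isRedundant(r, components)) {
        components.add(s);                                   // keep it for extending future inputs
    }                                                        // redundant or illegal sequences are discarded
}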

The Oracle Problem If we are to do automatic testing, we must know what the correct results are. But how?

Specification How do we get a better specification in general?

Algorithm

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.” (C. A. R. Hoare, 1980)

Randoop Example Date s = new Date(2006, 2, 14); Assert specification: assertTrue(s.equals(s)); How do we randomly generate constructor parameter values like 2006, 2, 14? (One common answer is sketched below.)
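A sketch of one common answer, assuming (as a later slide notes) that primitive arguments are drawn from a small pool of seed constants, optionally extended with constants mined from the code under test. The pool contents and the SeedPool class are illustrative, not Randoop's actual configuration; java.util.Date's deprecated (year, month, day) constructor is used to mirror the slide.

import java.util.Date;
import java.util.Random;

class SeedPool {
    static final Random RANDOM = new Random();
    static final int[] INT_POOL = { -1, 0, 1, 10, 100, 2006 };   // seed constants; mined constants could be added
    static final String[] STRING_POOL = { "", "hi!" };

    static int randomInt() {
        return INT_POOL[RANDOM.nextInt(INT_POOL.length)];
    }

    static Date randomDate() {
        return new Date(randomInt(), randomInt(), randomInt());  // e.g., new Date(2006, 2, 14)
    }
}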

Randoop Example HashSet s = new HashSet(); Randomly pick a public method: s.add(""); Assert specification: assertTrue(s.equals(s)); The default value for String, "", is used since there is no other String in the system.

Randoop Example HashSet s = new HashSet(); Randomly pick a public method: s.add(""); Randomly pick a public method: s.isEmpty(); Assert specification: assertTrue(s.equals(s)); A method is probably an observer method if it has no parameters, is public and non-static, returns a primitive value, and its name is size, count, length, toString, or begins with get or is (see the reflection sketch below).
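A sketch of that observer heuristic using Java reflection. The String return type is allowed here so that toString qualifies; treat the details as an approximation of the rule on the slide, not as Randoop's exact implementation.

import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class ObserverHeuristic {
    // True if the method looks like a side-effect-free observer, per the slide's rule.
    static boolean looksLikeObserver(Method m) {
        String name = m.getName();
        boolean noArgsPublicInstance = m.getParameterCount() == 0
                && Modifier.isPublic(m.getModifiers())
                && !Modifier.isStatic(m.getModifiers());
        boolean smallReturnType = m.getReturnType().isPrimitive()
                || m.getReturnType() == String.class;
        boolean observerName = name.equals("size") || name.equals("count")
                || name.equals("length") || name.equals("toString")
                || name.startsWith("get") || name.startsWith("is");
        return noArgsPublicInstance && smallReturnType && observerName;
    }
}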

Randoop Example Date d = new Date(2006, 2, 14); Randomly pick a public method: d.setMonth(-1); // pre: argument >= 0 A sequence of method calls that results in an exception is added to errSeqs.

Randoop Example Date d = new Date(2006, 2, 14); Randomly pick a public method: d.setMonth(-1); // pre: argument >= 0 d.setDay(5); Assert specification: assertTrue(d.equals(d));

Classifying a sequence [flowchart] Start: execute the sequence and check contracts. Contract violated? Yes: minimize the sequence and output it as a contract-violating test case. No: is the sequence redundant? Yes: discard the sequence. No: add it to the components.

Redundancy Checking Randoop maintains a set of objects for each type. A sequence (of method calls) is redundant if the objects created during its execution are already members of that set, compared using equals() (or a user-defined, more sophisticated check). A sketch of such a check follows.
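A minimal sketch of such a redundancy check, assuming the per-type store is a Map from classes to sets of previously seen objects; this illustrates the idea rather than Randoop's code.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RedundancyCheck {
    // A sequence is redundant if every object it created equals an object already seen.
    static boolean isRedundant(List<Object> createdObjects, Map<Class<?>, Set<Object>> seenByType) {
        for (Object o : createdObjects) {
            Set<Object> seen = seenByType.getOrDefault(o.getClass(), Collections.emptySet());
            if (!seen.contains(o)) {
                return false;                // contains() relies on equals()/hashCode()
            }
        }
        return true;
    }
}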

Some Randoop options Avoid use of null. Biased random selection: favor smaller sequences; favor methods that have been less covered. Use constants mined from source code statically… …and dynamically. Avoiding null rules out sequences like these, whether null appears explicitly or arrives indirectly:
Object o = new Object();
LinkedList l = new LinkedList();
l.add(null);

Object o = returnNull();
LinkedList l = new LinkedList();
l.add(o);

Research Question How effective would Randoop be? How do we judge whether one set of random test cases is better than another set?

Coverage Code block coverage: a set of random test cases is better if it covers more code blocks; for instance, consider each branch as a block. Predicate coverage: given a set of predicates, a set of random test cases is better if it covers more valuations of the predicates; for instance, consider the predicates to be the propositions in the program.

Coverage Achieved by Randoop (data structure: time in seconds, branch coverage): Bounded stack (30 LOC): 1 s, 100%; Unbounded stack (59 LOC); BS Tree (91 LOC): 96%; Binomial heap (309 LOC): 84%; Linked list (253 LOC); Tree map (370 LOC): 81%; Heap array (71 LOC). Is this representative?

Predicate Coverage [charts comparing feedback-directed, best systematic, and undirected random generation] On binary tree and fibonacci heap, Randoop achieves higher coverage than both systematic and undirected generation. On binomial heap and tree map, Randoop achieves the same coverage as systematic generation, but does so faster.

Bug Detection (library: LOC, classes): JDK (2 libraries: java.util, javax.xml): 53K LOC, 272 classes; Apache commons (5 libraries: logging, primitives, chain jelly, math, collections): 114K LOC, 974 classes; .NET framework (5 libraries): 582K LOC, 3330 classes. How would Korat perform on these examples?

Methodology Ran Randoop on each library with the default time limit (2 minutes). Contracts: o.equals(o) == true; o.equals(o) throws no exception; o.hashCode() throws no exception; o.toString() throws no exception. No null inputs, and: Java: no NullPointerExceptions; .NET: no NPEs, out-of-bounds, or illegal-state exceptions. (A sketch of the object contracts check follows.)
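A minimal sketch of the object contracts listed above, as a single check on one object (illustrative only; Randoop's actual contract checking is richer):

class DefaultContracts {
    // True if o violates one of the default contracts listed on the slide.
    static boolean violatesDefaultContracts(Object o) {
        try {
            if (!o.equals(o)) {
                return true;     // equals must be reflexive
            }
            o.hashCode();        // must not throw
            o.toString();        // must not throw
            return false;
        } catch (Exception e) {
            return true;         // equals/hashCode/toString threw an exception
        }
    }
}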

Results (test cases output, error-revealing test cases, distinct errors): JDK: 32 test cases output, 29 error-revealing, 8 distinct errors. Apache commons: 187 test cases output, 6 distinct errors. .NET framework: 192 test cases output. Total: 411 test cases output, 250 error-revealing, 206 distinct errors.

Errors found: examples
JDK Collections classes have 4 methods that create objects violating the o.equals(o) contract.
javax.xml creates objects that cause hashCode and toString to crash, even though the objects are well-formed XML constructs.
Apache libraries have constructors that leave fields unset, leading to NPEs on calls of equals, hashCode and toString (this only counts as one bug).
Many Apache classes require a call of an init() method before an object is legal, which led to many false positives.
.NET framework has at least 175 methods that throw an exception forbidden by the library specification (NPE, out-of-bounds, or illegal-state exception).
.NET framework has 8 methods that violate o.equals(o).
.NET framework loops forever on a legal but unexpected input.

Regression testing Randoop can create regression oracles. Generated test cases using JDK 1.5: Randoop generated 41K regression test cases. Ran the resulting test cases on JDK 1.6 Beta: 25 test cases failed on Sun's implementation of the JDK, and 73 test cases failed on another implementation. The failing test cases pointed to 12 distinct errors. These errors were not found by the extensive compliance test suite that Sun provides to JDK developers.
Object o = new Object();
LinkedList l = new LinkedList();
l.addFirst(o);
l.add(o);
assertEquals(2, l.size());        // expected to pass
assertEquals(false, l.isEmpty()); // expected to pass

Evaluation: summary Feedback-directed random test generation is effective at finding errors: it discovered several errors in real code (e.g., JDK and .NET framework core libraries). It can outperform systematic input generation, both on previous benchmarks and metrics (coverage) and on a new, larger corpus of subjects measuring error detection. It can outperform undirected random test generation.

Conclusion Feedback-directed random test generation (Randoop): finds errors in widely-used, well-tested libraries; can outperform systematic test generation; can outperform undirected test generation. Randoop is easy to use (just point it at a set of classes) and has real clients: it is used by product groups at Microsoft. It is a mid-point in the systematic-random space of input generation techniques.

Exercise 2 Apply Randoop, manually, to OrderSet.java to create two valid tests, one redundant test, and one illegal sequence. Create a test case to expose the bug.

Research Question How do we improve Korat or Randoop?