50.530: Software Engineering Sun Jun SUTD
Week 2: Automatic Testing
A Big View: Testing [Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions labeled A, B, and C]
A Big View: Testing [Figure: a test which shows a bug, starting from the initial state; the behaviors we wanted and the behaviors we have, with regions labeled A and C]
Testing
Methods: white-box testing, black-box testing, grey-box testing.
Levels: unit testing, integration testing, system testing, etc.
Types: installation testing, compatibility testing, smoke and sanity testing, regression testing, acceptance testing, alpha testing, beta testing, functional/non-functional testing, combinatorial testing, performance testing, security testing, etc.
Research Question: Isn't JUnit good enough? How do we automatically generate test cases that reveal bugs?
A Big View: Systematic Testing [Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions labeled A, B, and C]
A Big View: Random Testing [Figure: a test which shows a bug, starting from the initial state; the behaviors we wanted and the behaviors we have, with regions labeled A and C]
Korat: Automated Testing Based on Java Predicates (Boyapati et al., ISSTA 2002; ACM SIGSOFT Distinguished Paper Award)
Motivation: It is important to be able to generate test cases automatically, and to generate test cases that are representative. Korat is just one example of a systematic test case generation approach; however, it is similar in spirit to many systematic testing techniques (e.g., combinatorial testing, parameterized testing).
Example
    public class BinaryTree {
        public static class Node {
            Node left;
            Node right;
        }
        private Node root;
        private int size;
        public void remove(Node n) {
            // some code …
        }
    }
How do we test remove(Node n)?
Example: How do we test remove(Node n)? We need a valid BinaryTree object bt and a valid Node object nd, and we need to know what is expected after executing bt.remove(nd).
Vocabulary (class invariant): a predicate that defines which objects of the class are valid, e.g., size == 0 if root == null, and otherwise size equals the number of nodes in the tree.
Vocabulary (pre-condition of a method): a condition that must be true before the method executes, e.g., n must not be null. The class invariant is always part of the pre-condition.
Vocabulary (post-condition of a method): a condition that must be true after the method executes, e.g., after remove, size is decremented by 1. The class invariant is always part of the post-condition.
Korat: Assumption. A class invariant is encoded as a method repOk(), which returns true if and only if the object is in a state that satisfies the class invariant.
    public boolean repOk() {
        if (root == null) return size == 0;
        Set<Node> visited = new HashSet<Node>();
        visited.add(root);
        LinkedList<Node> workList = new LinkedList<Node>();
        workList.add(root);
        while (!workList.isEmpty()) {
            Node current = workList.removeFirst();
            if (current.left != null) {
                if (!visited.add(current.left)) return false;  // node reached twice: not a tree
                workList.add(current.left);
            }
            if (current.right != null) {
                if (!visited.add(current.right)) return false; // node reached twice: not a tree
                workList.add(current.right);
            }
        }
        return visited.size() == size;
    }
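A self-contained version of this repOk() can be exercised directly. This is a minimal sketch, assuming the fields are package-private (the slides declare them private) so that a test can wire nodes together by hand; sample() is a hypothetical helper, not part of Korat. It builds one valid three-node tree and one tree whose two child fields point to the same Node, which repOk() should reject.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class BinaryTree {
    public static class Node {
        Node left;
        Node right;
    }

    // Package-private here (private in the slides) so sample() and tests can build trees.
    Node root;
    int size;

    // Class invariant: the nodes reachable from root form a tree
    // (no node is reached twice) and size equals the number of those nodes.
    public boolean repOk() {
        if (root == null) return size == 0;
        Set<Node> visited = new HashSet<>();
        visited.add(root);
        Deque<Node> workList = new ArrayDeque<>();
        workList.add(root);
        while (!workList.isEmpty()) {
            Node current = workList.removeFirst();
            if (current.left != null) {
                if (!visited.add(current.left)) return false; // shared node or cycle
                workList.add(current.left);
            }
            if (current.right != null) {
                if (!visited.add(current.right)) return false; // shared node or cycle
                workList.add(current.right);
            }
        }
        return visited.size() == size;
    }

    // Hypothetical helper: a root with two children that are either distinct
    // (a valid tree) or the same Node (invalid). size is 3 in both cases.
    static BinaryTree sample(boolean shareChild) {
        BinaryTree t = new BinaryTree();
        t.root = new Node();
        Node child = new Node();
        t.root.left = child;
        t.root.right = shareChild ? child : new Node();
        t.size = 3;
        return t;
    }
}
```

Node does not override equals(), so the HashSet compares nodes by identity, which is what "reached twice" needs.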
Korat: Assumption. Pre-conditions and post-conditions are encoded in the Java Modeling Language (JML).
    //@ public invariant repOk();    // class invariant for BinaryTree

    /*@ public normal_behavior       // specification for remove
      @   requires has(n);           // pre-condition
      @   ensures !has(n);           // post-condition
      @*/
    public void remove(Node n) {
        // ... method body
    }
Is this pre-condition perhaps too strong?
Korat: Approach
1. Generate a BinaryTree bt and a Node n.
2. If repOk() and the pre-condition are true, execute bt.remove(n); otherwise, discard the candidate and generate the next one.
3. If the post-condition is true after the execution, the test passes; otherwise, a bug has been found.
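The approach above can be sketched as a small generic loop. This is an illustrative sketch, not Korat's implementation: findViolations, Candidate, and demo are hypothetical names, and to keep it short the method under test is ArrayList.remove(Object) over a tiny hand-enumerated domain instead of BinaryTree.remove. Its pre-condition mirrors has(n) and its post-condition mirrors !has(n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class KoratLoopSketch {
    // One test input: the receiver list and the argument passed to remove.
    record Candidate(List<Integer> list, Integer n) {}

    // Skip candidates failing repOk/pre-condition, execute the method,
    // and collect the candidates whose post-condition fails.
    static <T> List<T> findViolations(List<T> candidates, Predicate<T> repOkAndPre,
                                      Consumer<T> execute, Predicate<T> post) {
        List<T> violations = new ArrayList<>();
        for (T c : candidates) {
            if (!repOkAndPre.test(c)) continue;   // not a valid input: move on
            execute.accept(c);
            if (!post.test(c)) violations.add(c); // bug in the code or in the spec
        }
        return violations;
    }

    // Hypothetical bounded domain: lists of length <= 2 over {0, 1}, n in {0, 1}.
    static List<Candidate> candidates() {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(List.of());
        for (int a = 0; a <= 1; a++) {
            lists.add(List.of(a));
            for (int b = 0; b <= 1; b++) lists.add(List.of(a, b));
        }
        List<Candidate> result = new ArrayList<>();
        for (List<Integer> l : lists)
            for (int n = 0; n <= 1; n++)
                result.add(new Candidate(new ArrayList<>(l), n)); // fresh mutable copy
        return result;
    }

    static List<Candidate> demo() {
        return findViolations(candidates(),
            c -> c.list().contains(c.n()),    // pre: has(n)
            c -> c.list().remove(c.n()),      // method under test: remove(Object)
            c -> !c.list().contains(c.n()));  // post: !has(n)
    }
}
```

With duplicates allowed, !has(n) fails for [0, 0] with n = 0 and for [1, 1] with n = 1, so demo() reports two violations. Whether that is a bug in remove or in its specification is exactly the question a violated post-condition raises.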
Finitization: There are infinitely many candidates for bt and n. For each variable in the class, define its domain. [Figure: the space of all possible bt, with a smaller region of interesting bt]
Finitization
    public static Finitization finBinaryTree(int NUM_Node) {
        Finitization f = new Finitization(BinaryTree.class);
        ObjSet nodes = f.createObjSet("Node", NUM_Node);  // NUM_Node Node objects
        nodes.add(null);                                   // null is also allowed
        f.set("root", nodes);
        f.set("Node.left", nodes);
        f.set("Node.right", nodes);
        return f;
    }
Finitization (translation of finBinaryTree(3)): nodes = {null, N0, N1, N2}; BinaryTree.root is a member of nodes; Node.left is a member of nodes; Node.right is a member of nodes.
Example Trees: With finBinaryTree(3), there are 4 objects (one BinaryTree object and three Node objects), which could be set up as follows. [Figure: example configurations of the four objects]
Finitization: the Space. How many bt are there with finBinaryTree(3), assuming bt.size is always set to the right value? 4^7 = 16384. How many bt are there with finBinaryTree(n)? (n+1)^(2n+1). [Figure: all possible bt, with a smaller region of interesting bt]
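Counting field by field makes the formula explicit: root and the left and right fields of each of the n nodes are 2n+1 reference fields, each ranging over n+1 values (null plus the n nodes), while size contributes a factor of 1 because it is fixed to the right value.

```latex
\[
\underbrace{(n+1)}_{\texttt{root}}
\cdot \prod_{i=0}^{n-1} \underbrace{(n+1)}_{N_i.\texttt{left}}\,\underbrace{(n+1)}_{N_i.\texttt{right}}
\cdot \underbrace{1}_{\texttt{size}}
= (n+1)^{2n+1},
\qquad n = 3:\ 4^{7} = 16384
\]
```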
Filtering 1: For each candidate bt and n, check the pre-condition of remove. If the pre-condition is not satisfied, ignore that tree. [Figure: all possible bt, the interesting bt, and the invalid bt that are filtered out]
Is the following bt valid? [Figure: a candidate bt to be checked with repOk()]
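Korat: Search Algorithm. Order all the elements in every class domain and every field domain, e.g., the Node class ordering <null, N0, N1, N2>, and assume the domain of size is <3>. Generate a candidate as a vector of field domain indices over the fields [root, size, N0.left, N0.right, N1.left, N1.right, N2.left, N2.right], e.g., [1,0,2,2,0,0,0,0] (root = N0, size = 3, N0.left = N0.right = N1, all other fields null).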
Korat: Search Algorithm. Invoke repOk() to check whether the candidate is valid, e.g., [1,0,2,2,0,0,0,0] is invalid because N1 is reached twice. Then backtrack to generate the next candidate in line, e.g., [1,0,2,2,0,0,0,1].
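The vector encoding and plain backtracking can be sketched as follows. This sketch assumes the slot order [root, size, N0.left, N0.right, N1.left, …], which is consistent with the example vectors; node value 0 stands for null and value v for node N(v-1), so node v's left and right fields sit at slots 2v and 2v+1. CandidateVector, next(), and bruteForce() are hypothetical names. next() is a naive odometer that ignores which fields repOk() read, so bruteForce() visits the whole space.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class CandidateVector {
    // Slot 1 (size) has a one-value domain {n}; every other slot ranges over 0..n.
    static int domainSize(int field, int n) {
        return field == 1 ? 1 : n + 1;
    }

    // repOk over the vector; slot 1 always decodes to size = n, so it is not read here.
    static boolean repOk(int[] c, int n) {
        int size = n;
        if (c[0] == 0) return size == 0;
        Set<Integer> visited = new HashSet<>();
        visited.add(c[0]);
        Deque<Integer> work = new ArrayDeque<>();
        work.add(c[0]);
        while (!work.isEmpty()) {
            int cur = work.removeFirst();
            for (int child : new int[] {c[2 * cur], c[2 * cur + 1]}) {
                if (child == 0) continue;                // null field
                if (!visited.add(child)) return false;   // node reached twice
                work.add(child);
            }
        }
        return visited.size() == size;
    }

    // Plain backtracking: bump the last slot, carrying into earlier slots on
    // overflow. Returns null after the last candidate.
    static int[] next(int[] c, int n) {
        int[] d = c.clone();
        for (int i = d.length - 1; i >= 0; i--) {
            if (d[i] + 1 < domainSize(i, n)) {
                d[i]++;
                return d;
            }
            d[i] = 0;
        }
        return null;
    }

    // Enumerates the whole finitized space; returns {valid, total}.
    static int[] bruteForce(int n) {
        int valid = 0, total = 0;
        for (int[] c = new int[2 * n + 2]; c != null; c = next(c, n)) {
            total++;
            if (repOk(c, n)) valid++;
        }
        return new int[] {valid, total};
    }
}
```

For finBinaryTree(3), bruteForce(3) checks all 4^7 = 16384 candidates and finds 30 valid ones: the 5 tree shapes with 3 nodes, each labeled with N0, N1, N2 in 3! = 6 ways.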
Optimization 1: During the execution of repOk(), Korat monitors the fields that repOk() accesses, e.g., fields [0, 2, 3] (root, N0.left, N0.right) for [1,0,2,2,0,0,0,0]. If repOk() returns false, Korat backtracks only over the accessed fields, e.g., it tries [1,0,2,3,0,0,0,0] right after [1,0,2,2,0,0,0,0]. Is this justified?
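Optimization 1 can be sketched by logging the fields repOk() reads and backtracking only over that log. This is a minimal sketch under the same assumed vector layout ([root, size, N0.left, N0.right, …], with slot 1 decoding to the fixed size); Korat obtains the log by instrumenting field accesses, whereas here repOk() reads through a hypothetical get() helper. From [1,0,2,2,0,0,0,0], the log is [0, 2, 3] and the next candidate is [1,0,2,3,0,0,0,0], as on the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KoratOpt1 {
    // Reads field f of candidate c and records it in log on first access.
    static int get(int[] c, int f, int n, List<Integer> log) {
        if (!log.contains(f)) log.add(f);
        return f == 1 ? n : c[f]; // slot 1 (size) always decodes to n
    }

    static boolean repOk(int[] c, int n, List<Integer> log) {
        int root = get(c, 0, n, log);
        if (root == 0) return get(c, 1, n, log) == 0;
        Set<Integer> visited = new HashSet<>();
        visited.add(root);
        Deque<Integer> work = new ArrayDeque<>();
        work.add(root);
        while (!work.isEmpty()) {
            int cur = work.removeFirst();
            int left = get(c, 2 * cur, n, log);
            if (left != 0) {
                if (!visited.add(left)) return false;
                work.add(left);
            }
            int right = get(c, 2 * cur + 1, n, log);
            if (right != 0) {
                if (!visited.add(right)) return false;
                work.add(right);
            }
        }
        return visited.size() == get(c, 1, n, log);
    }

    // Advance only the LAST accessed field, carrying into earlier accessed fields
    // on overflow. Fields repOk never read keep their values, so every candidate
    // that differs from c only in those fields is skipped.
    static int[] next(int[] c, List<Integer> log, int n) {
        int[] d = c.clone();
        for (int i = log.size() - 1; i >= 0; i--) {
            int f = log.get(i);
            int domain = (f == 1) ? 1 : n + 1;
            if (d[f] + 1 < domain) {
                d[f]++;
                return d;
            }
            d[f] = 0;
        }
        return null; // every accessed field overflowed: search finished
    }

    // Returns {valid candidates found, candidates checked}.
    static int[] search(int n) {
        int valid = 0, checked = 0;
        int[] c = new int[2 * n + 2];
        while (c != null) {
            List<Integer> log = new ArrayList<>();
            checked++;
            if (repOk(c, n, log)) valid++;
            c = next(c, log, n);
        }
        return new int[] {valid, checked};
    }
}
```

search(3) still finds the 30 valid candidates while checking far fewer than 16384. For a deterministic repOk() the pruning loses no valid candidate, because each skipped candidate agrees with a rejected one on every field repOk() looked at.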
Theory: For non-deterministic repOk() methods, all candidates for which repOk() always returns true are generated; candidates for which repOk() always returns false are never generated; candidates for which repOk() sometimes returns true and sometimes false may or may not be generated.
Optimization 2: If we have generated [1,0,2,3,0,0,0,0], we may not want to generate [1,0,3,2,0,0,0,0], which differs only by swapping N1 and N2. Is this justified? [Figure: the two trees, with N1 and N2 swapped]
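Vocabulary (object graph, isomorphism): An object graph has the objects as nodes and the reference fields between them as labeled edges. Two object graphs C and C' are isomorphic iff there is a permutation per of the objects within each class such that per(C) = C', e.g., per = {N1->N2, N2->N1}.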
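The isomorphism definition can be checked directly by trying every permutation of the nodes. This is a brute-force sketch (fine for three nodes, exponential in general), assuming the vector layout [root, size, N0.left, N0.right, …] with node value v stored at slots 2v and 2v+1. Null and the primitive size field are never permuted. Isomorphism, apply, and isomorphic are hypothetical names.

```java
import java.util.Arrays;

public class Isomorphism {
    // Renames node v to perm[v] everywhere; perm[0] = 0 keeps null fixed.
    static int[] apply(int[] c, int[] perm, int n) {
        int[] r = new int[c.length];
        r[0] = perm[c[0]];
        r[1] = c[1]; // size is a primitive: not permuted
        for (int v = 1; v <= n; v++) {
            r[2 * perm[v]] = perm[c[2 * v]];
            r[2 * perm[v] + 1] = perm[c[2 * v + 1]];
        }
        return r;
    }

    // C and D are isomorphic iff some permutation of the nodes maps C onto D.
    static boolean isomorphic(int[] c, int[] d, int n) {
        int[] perm = new int[n + 1];
        for (int v = 0; v <= n; v++) perm[v] = v;
        return tryPerms(c, d, perm, 1, n);
    }

    // Enumerates all permutations of perm[k..n] by swapping, testing each one.
    private static boolean tryPerms(int[] c, int[] d, int[] perm, int k, int n) {
        if (k > n) return Arrays.equals(apply(c, perm, n), d);
        for (int i = k; i <= n; i++) {
            swap(perm, k, i);
            boolean found = tryPerms(c, d, perm, k + 1, n);
            swap(perm, k, i); // undo before trying the next choice
            if (found) return true;
        }
        return false;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

With per = {N1->N2, N2->N1}, written as the array {0, 1, 3, 2}, apply maps [1,0,2,3,0,0,0,0] onto [1,0,3,2,0,0,0,0]. A balanced tree is never isomorphic to a chain, whatever the labels.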
Optimization 2 [Figure: the interesting bt are partitioned into regions; all candidates in the same region are isomorphic, and one representative is kept per region]
Representative: Given the two graphs [1,0,2,3,0,0,0,0] and [1,0,3,2,0,0,0,0], Korat takes the latter as the representative, as it is “bigger”. [Figure: the two trees, with N1 and N2 swapped]
Implementation: Op 2. When backtracking from [a, b, c, …, j, k, …], where k is the index in the last accessed field, Korat tries [a, b, c, …, j, k+1, …] only if k+1 is smaller than or equal to some index already used by an earlier field of the same type. Otherwise, it resets that field to 0 and tries [a, b, c, …, j+1, …].
Example: When backtracking from [1,0,2,2,0,0,0,0], Korat skips [1,0,2,3,0,0,0,0] (since there is a “bigger” representative, [1,0,3,2,0,0,0,0]) and continues with [1,0,3,0,0,0,0,0].
Result: Only 5 bt are generated, assuming size is always set to 3.
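The count of 5 can be cross-checked by brute force, without Korat's incremental pruning: enumerate the whole finitized space, keep the valid candidates, map each one to a canonical member of its isomorphism class, and count the distinct canonical forms. The canonical choice here (the lexicographically largest isomorphic vector) is an assumption of this sketch and not necessarily the representative Korat generates; any fixed choice per class gives the same count. The vector layout [root, size, N0.left, N0.right, …] and all names are likewise assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Representatives {
    // Slot 1 (size) always decodes to n; node v's fields sit at slots 2v, 2v+1.
    static boolean repOk(int[] c, int n) {
        if (c[0] == 0) return n == 0;
        Set<Integer> visited = new HashSet<>(List.of(c[0]));
        Deque<Integer> work = new ArrayDeque<>(List.of(c[0]));
        while (!work.isEmpty()) {
            int cur = work.removeFirst();
            for (int child : new int[] {c[2 * cur], c[2 * cur + 1]}) {
                if (child == 0) continue;
                if (!visited.add(child)) return false;
                work.add(child);
            }
        }
        return visited.size() == n;
    }

    // Renames node v to perm[v] everywhere (perm[0] = 0 keeps null fixed).
    static int[] apply(int[] c, int[] perm, int n) {
        int[] r = new int[c.length];
        r[0] = perm[c[0]];
        for (int v = 1; v <= n; v++) {
            r[2 * perm[v]] = perm[c[2 * v]];
            r[2 * perm[v] + 1] = perm[c[2 * v + 1]];
        }
        return r;
    }

    // Collects every permutation of p[k..] (p[0] stays 0) by swapping.
    static void permute(int[] p, int k, List<int[]> out) {
        if (k == p.length) {
            out.add(p.clone());
            return;
        }
        for (int i = k; i < p.length; i++) {
            int t = p[k]; p[k] = p[i]; p[i] = t;
            permute(p, k + 1, out);
            t = p[k]; p[k] = p[i]; p[i] = t; // undo the swap
        }
    }

    static int countRepresentatives(int n) {
        int[] id = new int[n + 1];
        for (int v = 0; v <= n; v++) id[v] = v;
        List<int[]> perms = new ArrayList<>();
        permute(id, 1, perms);
        int total = 1;
        for (int f = 0; f < 2 * n + 1; f++) total *= n + 1; // (n+1)^(2n+1)
        Set<String> reps = new HashSet<>();
        for (int code = 0; code < total; code++) {
            int[] c = new int[2 * n + 2]; // slot 1 stays 0, the single size value
            int x = code;
            for (int f = 2 * n + 1; f >= 0; f--) {
                if (f == 1) continue;
                c[f] = x % (n + 1);
                x /= n + 1;
            }
            if (!repOk(c, n)) continue;
            int[] best = c;
            for (int[] p : perms) {
                int[] q = apply(c, p, n);
                if (Arrays.compare(q, best) > 0) best = q;
            }
            reps.add(Arrays.toString(best));
        }
        return reps.size();
    }
}
```

countRepresentatives(3) returns 5, one per tree shape with three nodes; for two nodes it returns 2.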
Evaluation Is this biased?
Experiment I
Experiment II
Experiment III
Conclusion: Korat generates test cases from a specified domain and a correctness specification. Korat reduces the number of test cases based on the pre-condition, a simple learning technique (monitoring which fields repOk() accesses), and symmetry reduction.
Exercise 1: Apply Korat to java.util.Stack by answering the following questions. What is repOk()? What are the pre-conditions and post-conditions of the methods push and pop? How would you track which fields are accessed in repOk()? When are two Stack objects isomorphic?
Discussion: Any thoughts on Korat?
Feedback-directed Random Test Generation (Pacheco et al., ICSE 2007, cited 440+)
Random Testing: Easy to implement. Yields lots of test cases. Finds errors: 1990: Unix utilities; 1998: OS services; 2000: GUI applications; 2000: functional programs; 2005: object-oriented programs; 2007: flash memory, file systems. Perhaps it simply got lucky?
Research Question: Which one is better, systematic testing or random testing?
Random vs. Systematic: Theoretical work suggests that random testing is as effective as more systematic input generation techniques (Duran et al. 1984; Hamlet et al. 1990). Some empirical studies suggest that systematic testing is more effective than random testing: Ferguson et al. 1996 (vs. chaining); Marinov et al. 2003 (vs. bounded exhaustive); Visser et al. 2006 (vs. model checking and symbolic execution). However, these studies used small benchmarks and did not measure error-revealing effectiveness.
Contributions: Propose feedback-directed random test generation, in which the randomized creation of new test inputs is guided by feedback about the execution of previous inputs, with the goal of avoiding redundant and illegal inputs. Empirical evaluation: evaluate coverage and error-detection ability on a large number of widely used, well-tested libraries (780 KLOC); compare against systematic input generation; compare against undirected random input generation.
Sample Test Case
    public static void test1() {
        LinkedList l1 = new LinkedList();
        Object o1 = new Object();
        l1.addFirst(o1);
        TreeSet t1 = new TreeSet(l1);
        Set s1 = Collections.unmodifiableSet(t1);
        Assert.assertTrue(s1.equals(s1));
    }
Randoop: Input: a class with multiple public methods. Output: a set of test cases (sequences of method calls). Main idea: build test inputs incrementally. New test inputs extend previous ones; as soon as a test input is created, execute it; use the execution results to guide generation.
The Oracle Problem: If we are to do automatic testing, we must know what the correct results are, but how?
Specification: How do we get a better specification in general?
Algorithm
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.” (C.A.R. Hoare, 1980)
Randoop Example
    Date s = new Date(2006, 2, 14);
    assertTrue(s.equals(s));   // assert specification
How do we randomly generate constructor parameter values like 2006, 2, 14?
Randoop Example
    HashSet s = new HashSet();
    s.add("");                 // randomly pick a public method
    assertTrue(s.equals(s));   // assert specification
The default String value "" is used since there is no other String in the system.
Randoop Example
    HashSet s = new HashSet();
    s.add("");                 // randomly pick a public method
    s.isEmpty();               // randomly pick a public method
    assertTrue(s.equals(s));   // assert specification
A method is probably an observer method if it has no parameters; it is public and non-static; it returns primitive values; and its name is size, count, length, or toString, or begins with get or is.
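The observer heuristic translates into a reflection check. This is a minimal sketch with hypothetical names. Because toString, one of the listed names, returns String, this version assumes String counts alongside primitive return types; void is excluded explicitly, since Java reports void.class as primitive.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ObserverHeuristic {
    // No parameters, public, non-static, primitive (or String) result,
    // and a name that looks like an observer.
    static boolean isProbablyObserver(Method m) {
        int mod = m.getModifiers();
        Class<?> rt = m.getReturnType();
        boolean valueResult = (rt.isPrimitive() && rt != void.class) || rt == String.class;
        String name = m.getName();
        boolean observerName = name.equals("size") || name.equals("count")
            || name.equals("length") || name.equals("toString")
            || name.startsWith("get") || name.startsWith("is");
        return m.getParameterCount() == 0 && Modifier.isPublic(mod)
            && !Modifier.isStatic(mod) && valueResult && observerName;
    }

    // Convenience overload: look up a public method by name and parameter types.
    static boolean isProbablyObserver(Class<?> cls, String name, Class<?>... params) {
        try {
            return isProbablyObserver(cls.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

On HashSet, size, isEmpty, and toString pass, while clear (void), hashCode (name not in the list), and add(Object) (has a parameter) do not.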
Randoop Example
    Date d = new Date(2006, 2, 14);
    d.setMonth(-1);   // randomly pick a public method; pre: argument >= 0
A sequence of method calls that results in an exception is added to errSeqs.
Randoop Example
    Date d = new Date(2006, 2, 14);
    d.setMonth(-1);            // randomly pick a public method; pre: argument >= 0
    d.setDay(5);
    assertTrue(d.equals(d));   // assert specification
Classifying a sequence
1. Start: execute the sequence and check the contracts.
2. Contract violated? If yes, minimize the sequence and output it as a contract-violating test case.
3. If no: is the sequence redundant? If yes, discard the sequence; if no, add it to the components.
Redundancy Checking: Randoop maintains a set of objects for each type. A sequence (of method calls) is redundant if the objects created during its execution are already members of that set. Objects are compared with equals(), or with a more sophisticated user-defined check.
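The generate, execute, and classify loop, with equals()-based redundancy checking, can be sketched on a small receiver. This is an illustrative toy, not Randoop's implementation: the receiver is java.util.ArrayList<Integer>, the four operations in OPS and the names MiniRandoop and Result are hypothetical, and the only contract checked is o.equals(o).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class MiniRandoop {
    // Hypothetical operations available on the receiver.
    static final String[] OPS = {"add0", "add1", "remove0", "clear"};

    static void apply(List<Integer> l, String op) {
        switch (op) {
            case "add0" -> l.add(0);
            case "add1" -> l.add(1);
            case "remove0" -> l.remove(0); // remove(int index): throws on an empty list
            case "clear" -> l.clear();
            default -> throw new IllegalArgumentException(op);
        }
    }

    // Executes a sequence on a fresh receiver; returns null if it throws.
    static List<Integer> execute(List<String> seq) {
        List<Integer> l = new ArrayList<>();
        try {
            for (String op : seq) apply(l, op);
            return l;
        } catch (RuntimeException e) {
            return null;
        }
    }

    record Result(List<List<String>> components, List<List<String>> errSeqs) {}

    static Result generate(long seed, int steps) {
        Random rnd = new Random(seed);
        List<List<String>> components = new ArrayList<>();
        components.add(List.of());                 // just the constructor call
        Set<List<Integer>> seen = new HashSet<>(); // objects built so far (equals-based)
        seen.add(new ArrayList<>());
        List<List<String>> errSeqs = new ArrayList<>();
        for (int i = 0; i < steps; i++) {
            // Extend a randomly chosen component with one random call.
            List<String> seq = new ArrayList<>(components.get(rnd.nextInt(components.size())));
            seq.add(OPS[rnd.nextInt(OPS.length)]);
            List<Integer> result = execute(seq);   // execute immediately: the feedback
            if (result == null || !result.equals(result)) {
                errSeqs.add(seq);                  // exception or contract violation: never extended
            } else if (seen.add(result)) {
                components.add(seq);               // new object: reusable building block
            }                                      // otherwise redundant: discarded
        }
        return new Result(components, errSeqs);
    }
}
```

Exact counts depend on the seed. What holds for any seed is that every component executes without an exception, every sequence in errSeqs throws, and no two components build equal objects.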
Some Randoop options: avoid use of null; biased random selection (favor smaller sequences; favor methods that have been less covered); use constants mined from source code statically… …and dynamically.
    // null passed as a literal
    Object o = new Object();
    LinkedList l = new LinkedList();
    l.add(null);

    // null obtained from a method call
    Object o = returnNull();
    LinkedList l = new LinkedList();
    l.add(o);
Research Question: How effective is Randoop? How do we judge whether one set of random test cases is better than another?
Coverage: Code block coverage: a set of random test cases is better if it covers more code blocks; for instance, consider each branch as a block. Predicate coverage: given a set of predicates, a set of random test cases is better if it covers more valuations of the predicates; for instance, consider the predicates to be the propositions in the program.
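Predicate coverage can be measured by recording which combination of truth values each input produces. This is a minimal sketch with a hypothetical class name and predicates over an int input; with k predicates there are at most 2^k valuations to cover.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

public class PredicateCoverage {
    // Counts the distinct valuations (truth-value vectors) of the predicates
    // reached by the given inputs.
    static int valuationsCovered(List<IntPredicate> preds, int[] inputs) {
        Set<String> seen = new HashSet<>();
        for (int x : inputs) {
            StringBuilder valuation = new StringBuilder();
            for (IntPredicate p : preds) valuation.append(p.test(x) ? 'T' : 'F');
            seen.add(valuation.toString());
        }
        return seen.size();
    }
}
```

With the predicates x > 0 and x % 2 == 0, the inputs {1, 2, 3} reach 2 of the 4 valuations (TF and TT); adding -2 reaches a third (FT).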
Coverage Achieved by Randoop
Data structure | Time (s) | Branch coverage
Bounded stack (30 LOC) | 1 | 100%
Unbounded stack (59 LOC) | |
BS Tree (91 LOC) | | 96%
Binomial heap (309 LOC) | | 84%
Linked list (253 LOC) | |
Tree map (370 LOC) | | 81%
Heap array (71 LOC) | |
Is this representative?
Predicate Coverage [Figure: predicate coverage achieved by feedback-directed, best systematic, and undirected random generation on each data structure] On binary tree and Fibonacci heap, Randoop achieves higher coverage than both systematic and undirected generation. On binomial heap and tree map, Randoop achieves the same coverage as systematic generation, but does so faster.
Bug Detection: subject libraries
Library | LOC | Classes
JDK (2 libraries: java.util, javax.xml) | 53K | 272
Apache commons (5 libraries: logging, primitives, chain jelly, math, collections) | 114K | 974
.Net framework (5 libraries) | 582K | 3330
How would Korat perform on these examples?
Methodology: Ran Randoop on each library, using the default time limit (2 minutes). Contracts: o.equals(o) == true; o.equals(o) throws no exception; o.hashCode() throws no exception; o.toString() throws no exception; and, when no null inputs are given, Java: no NullPointerExceptions; .NET: no NPEs, out-of-bounds, or illegal state exceptions.
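The default contracts can be checked on any object with a few guarded calls. This sketch is not Randoop's code: ContractChecker is a hypothetical name, only runtime exceptions are caught (errors and infinite loops are not handled), and Broken is a hypothetical class that mimics two of the reported error kinds, a non-reflexive equals and a toString that dereferences an unset field.

```java
import java.util.ArrayList;
import java.util.List;

public class ContractChecker {
    // Returns one message per violated contract; an empty list means all hold.
    static List<String> check(Object o) {
        List<String> violations = new ArrayList<>();
        try {
            if (!o.equals(o)) violations.add("o.equals(o) returned false");
        } catch (RuntimeException e) {
            violations.add("o.equals(o) threw " + e.getClass().getSimpleName());
        }
        try {
            o.hashCode();
        } catch (RuntimeException e) {
            violations.add("o.hashCode() threw " + e.getClass().getSimpleName());
        }
        try {
            o.toString();
        } catch (RuntimeException e) {
            violations.add("o.toString() threw " + e.getClass().getSimpleName());
        }
        return violations;
    }

    // Hypothetical buggy class: equals is not reflexive, and toString
    // dereferences a field that no constructor sets.
    public static class Broken {
        String name;

        @Override public boolean equals(Object other) { return false; }
        @Override public int hashCode() { return 0; }
        @Override public String toString() { return name.toUpperCase(); }
    }
}
```

A String passes all four checks, while a Broken instance violates two: equals returns false, and toString throws a NullPointerException.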
Results
Library | Test cases output | Error-revealing test cases | Distinct errors
JDK | 32 | 29 | 8
Apache commons | 187 | 29 | 6
.Net framework | 192 | 192 | 192
Total | 411 | 250 | 206
Errors found: examples. JDK Collections classes have 4 methods that create objects violating the o.equals(o) contract. javax.xml creates objects that cause hashCode and toString to crash, even though the objects are well-formed XML constructs. Apache libraries have constructors that leave fields unset, leading to NPEs on calls of equals, hashCode, and toString (this only counts as one bug). Many Apache classes require a call of an init() method before an object is legal, which led to many false positives. The .Net framework has at least 175 methods that throw an exception forbidden by the library specification (NPE, out-of-bounds, or illegal state exception). The .Net framework has 8 methods that violate o.equals(o). The .Net framework loops forever on a legal but unexpected input.
Regression testing: Randoop can create regression oracles. Test cases were generated using JDK 1.5: Randoop generated 41K regression test cases. The resulting test cases were run on JDK 1.6 Beta (25 test cases failed) and on Sun's implementation of the JDK (73 test cases failed). The failing test cases pointed to 12 distinct errors, which were not found by the extensive compliance test suite that Sun provides to JDK developers.
    Object o = new Object();
    LinkedList l = new LinkedList();
    l.addFirst(o);
    l.add(o);
    assertEquals(2, l.size());          // expected to pass
    assertEquals(false, l.isEmpty());   // expected to pass
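A regression oracle can be built by running observer methods on the object a sequence produced and freezing their current results as assertions. This is a minimal reflection-based sketch, assuming the observers take no arguments and their results print as valid Java literals (true for the int and boolean results here); capture and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class RegressionOracle {
    // Calls each named zero-argument observer on o and turns its current
    // result into an assertion string; these recorded values become the oracle.
    static List<String> capture(String var, Object o, String... observers) {
        List<String> asserts = new ArrayList<>();
        for (String name : observers) {
            try {
                Object value = o.getClass().getMethod(name).invoke(o);
                asserts.add("assertEquals(" + value + ", " + var + "." + name + "());");
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot call " + name, e);
            }
        }
        return asserts;
    }

    // The sequence from the slide, run on the JDK currently in use.
    static List<String> demo() {
        Object o = new Object();
        LinkedList<Object> l = new LinkedList<>();
        l.addFirst(o);
        l.add(o);
        return capture("l", l, "size", "isEmpty");
    }
}
```

demo() reproduces the two assertions shown above. Rerunning assertions captured this way against another JDK version is how the failures described above surface.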
Evaluation: summary. Feedback-directed random test generation is effective at finding errors: it discovered several errors in real code (e.g., the JDK and .NET framework core libraries). It can outperform systematic input generation, both on previous benchmarks and metrics (coverage) and on a new, larger corpus of subjects, measuring error detection. It can outperform undirected random test generation.
Conclusion: Feedback-directed random test generation with Randoop finds errors in widely used, well-tested libraries, can outperform systematic test generation, and can outperform undirected test generation. Randoop is easy to use (just point it at a set of classes) and has real clients: it is used by product groups at Microsoft. It is a mid-point in the systematic-random space of input generation techniques.
Exercise 2: Apply Randoop, manually, to OrderSet.java to create 2 valid tests, one redundant test, and one illegal sequence. Then create a test case to expose the bug.
Research Question: How do we improve Korat or Randoop?