Tao Xie North Carolina State University

2 Why benchmarks in testing research?
Before 2000, a testing tool/technique paper without a serious evaluation could still appear in ICSE/FSE/ISSTA/ASE, but not now: a healthy trend.
We need benchmarks to justify the benefits of a proposed technique, often by comparing it against existing techniques.

3 Outline
- An (admittedly incomplete) survey of benchmarks used in the OO testing-tool literature since 2000
- Discussion of the history
- Discussion of the future

4 TestEra [Marinov and Khurshid ASE 2001]
Data structures:
- SinglyLinkedList.mergeSort
- java.util.TreeMap.remove
- INS (3 methods)
- Alloy-alpha (1 method)
Requires Alloy models for class invariants
Success criterion: # of n-bounded exhaustive tests

5 Korat [Boyapati et al. ISSTA 2002]
Data structures:
- korat.examples.BinaryTree.remove
- korat.examples.HeapArray.extractMax
- java.util.LinkedList.reverse
- java.util.TreeMap.put
- java.util.HashSet.add
- ins.namespace.AVTree.lookup
Requires a repOk predicate for class invariants, plus a finitization
Success criterion: # of n-bounded exhaustive tests
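A repOk predicate is an ordinary boolean method on the class under test: Korat enumerates candidate structures within a finitization bound and keeps only those for which repOk returns true. The sketch below shows the general shape of such a predicate; the BinaryTree class and its fields are illustrative, not Korat's actual example code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class BinaryTree {
    static class Node { Node left, right; }
    Node root;
    int size;

    // repOk-style class invariant: true iff this object is a well-formed
    // binary tree (acyclic, no shared nodes, size field consistent)
    boolean repOk() {
        if (root == null) return size == 0;
        Set<Node> visited = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!visited.add(n)) return false; // cycle or shared node
            if (n.left != null) work.push(n.left);
            if (n.right != null) work.push(n.right);
        }
        return visited.size() == size;
    }

    public static void main(String[] args) {
        BinaryTree t = new BinaryTree();
        t.root = new Node();
        t.size = 1;
        System.out.println(t.repOk()); // consistent structure
        t.size = 2;                    // size no longer matches the nodes
        System.out.println(t.repOk());
    }
}
```

The finitization (not shown) would bound the number of Node objects and the value domains, making the set of candidate structures finite and enumerable.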

6 JCrasher [Csallner and Smaragdakis SP&E 2004]
- Canvas (6 methods)
- P1 (16 methods)
- P1 (15 methods)
- P1 (16 methods)
- P1 (18 methods)
- P1 (15 methods)
- BSTree (24 methods)
- UB-Stack (11 methods)
Success criterion: # of real bugs among uncaught exceptions

7 Check 'n' Crash [Csallner and Smaragdakis ICSE 2005]
- Canvas (6 methods)
- P1 (16 methods)
- P1 (15 methods)
- P1 (16 methods)
- P1 (18 methods)
- P1 (15 methods)
- BSTree (24 methods)
- UB-Stack (11 methods)
- jaba (17.9 kLOC)
- jboss.jms (5.1 kLOC)
Success criterion: # of real bugs among uncaught exceptions

8 DSD-Crasher [Csallner and Smaragdakis ISSTA 2006]
- jboss.jms (5 kLOC)
- groovy (34 classes, 2 kLOC)
Success criterion: # of real bugs among uncaught exceptions

9 eToc [Tonella ISSTA 2004]
Data structures:
- StringTokenizer (6 methods)
- BitSet (26 methods)
- HashMap (13 methods)
- LinkedList (23 methods)
- Stack (5 methods)
- TreeSet (15 methods)
Requires a configuration file (so far manually prepared)
Success criterion: branch coverage and % of seeded faults found (5 seeded faults per class)

10 JPF [Visser et al. ISSTA 2004]
Data structure:
- java.util.TreeMap (3 methods: deleteEntry, fixAfterDeletion, fixAfterInsertion)
Requires a driver to close the environment
Success criterion: branch coverage
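"Closing the environment" means replacing arbitrary callers with a driver that enumerates every call sequence up to a bound, so the model checker faces a closed, finite program. Real JPF drivers typically use JPF's Verify API for nondeterministic choice; the sketch below is a plain-Java analogue that enumerates sequences explicitly, with an invented depth bound and key domain.

```java
import java.util.TreeMap;

public class TreeMapDriver {
    // Exhaustively exercise put/remove sequences of length <= depth over the
    // given key domain; returns the number of leaf sequences explored.
    static int explore(TreeMap<Integer, Integer> map, int depth, int[] domain) {
        if (depth == 0) return 1;
        int leaves = 0;
        for (int k : domain) {
            // branch 1: put(k, k) on a copy of the current state
            TreeMap<Integer, Integer> afterPut = new TreeMap<>(map);
            afterPut.put(k, k);
            leaves += explore(afterPut, depth - 1, domain);
            // branch 2: remove(k) on a copy of the current state
            TreeMap<Integer, Integer> afterRemove = new TreeMap<>(map);
            afterRemove.remove(k);
            leaves += explore(afterRemove, depth - 1, domain);
        }
        return leaves;
    }

    public static void main(String[] args) {
        // 2 ops x 2 keys = 4 choices per step; depth 2 gives 16 sequences
        System.out.println(explore(new TreeMap<>(), 2, new int[]{0, 1}));
    }
}
```

A model checker adds state matching on top of this enumeration, pruning sequences that reach an already-visited heap state.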

11 JPF [Visser et al. ASE 2005]
Data structure:
- java.util.TreeMap
Requires a driver to close the environment
Success criterion: basic-block coverage

12 JPF [Visser et al. ISSTA 2006]
Data structures:
- BinTree (154 LOC)
- BinomialHeap (355 LOC)
- FibHeap (286 LOC)
- Partial java.util.TreeMap (580 LOC)
Requires a driver to close the environment
Success criterion: basic-block coverage & predicate coverage

13 Rostra [Xie et al. ASE 2004]
Data structures:
- IntStack (4 methods)
- UBStack (10 methods)
- BSet (9 methods)
- BBag (8 methods)
- ShoppingCart (7 methods)
- BankAccount (6 methods)
- BinarySearchTree (10 methods)
- LinkedList (10 methods)
Requires existing tests to provide method arguments
Success criterion: branch coverage

14 Symstra [Xie et al. TACAS 2005]
Data structures:
- IntStack: push, pop
- UBStack: push, pop
- BinSearchTree: insert, remove
- BinomialHeap: insert, extractMin, delete
- LinkedList: add, remove, removeLast
- TreeMap: put, remove
- HeapArray: insert, extractMax
Requires a driver to close the environment
Success criterion: branch coverage

15 Symclat [d'Amorim et al. ASE 2006]
Data structures:
- UBStack8 (11 methods), UBStack12 (11 methods), UtilMDE (69 methods), BinarySearchTree (9 methods), StackAr (8 methods), StackLi (9 methods), IntegerSetAsHashSet (4 methods), Meter (3 methods), DLList (12 methods), OneWayList (10 methods), SLList (11 methods), OneWayList (12 methods), OneWayNode (10 methods), SLList (12 methods), TwoWayList (9 methods), RatPoly (46 versions, 17 methods)
Requires initial tests
Success criterion: # of real bugs

16 Evacon [Inkumsah and Xie ASE 2008]
Data structures:
- BankAccount (6 methods), BinarySearchTree (16 methods), BinomialHeap (10 methods), BitSet (25 methods), DisjSet (6 methods), FibonacciHeap, HashMap (10 methods), LinkedList (29 methods), ShoppingCart (6 methods), Stack (5 methods), StringTokenizer (5 methods), TreeMap (47 methods), TreeSet (13 methods)
Requires a configuration file (so far manually prepared)
Success criterion: branch coverage

17 Nighthawk [Andrews et al. ASE 2007]
Data structures:
- java.util.BitSet (16 methods)
- java.util.HashMap (8 methods)
- java.util.TreeMap (9 methods)
- BinTree, BHeap, FibHeap, TreeMap
- Java Collection and Map classes: ArrayList, EnumMap, HashMap, HashSet, Hashtable, LinkedList, Pqueue, Properties, Stack, TreeMap, TreeSet, Vector
Success criterion: block, line, and condition coverage

18 Random Test Run Length and Effectiveness [Andrews et al. ASE 2008]
Other system-test subjects:
- real buffer-overflow bugs
Data structures:
- JUnit MoneyBag
- TreeMap
Success criterion: a real bug in MoneyBag, a seeded bug in TreeMap

19 Delta Execution [d'Amorim et al. ISSTA 2007]
- 9 data structures with manually written drivers: binheap, bst, deque, fibheap, heaparray, queue, stack, treemap, ubstack
- 4 classes in a file system (the Daisy code)
Requires a driver to close the environment
Success criterion: # of n-bounded exhaustive tests

20 Incremental State-Space Exploration [Lauterburg et al. ICSE 2008]
- 9 data structures with manually written drivers: binheap, bst, deque, fibheap, heaparray, queue, stack, treemap, ubstack
- 4 classes in a file system (the Daisy code)
- AODV: a routing protocol for wireless ad hoc networks
Requires a driver to close the environment
Success criterion: % time reduction across versions

21 ARTOO [Ciupa et al. ISSTA 2007]
Eiffel data structures:
- STRING (175 methods)
- PRIMES (75 methods)
- BOUNDED STACK (66 methods)
- HASH TABLE (135 methods)
- FRACTION1 (44 methods)
- FRACTION2 (45 methods)
- UTILS (32 methods)
- BANK ACCOUNT (35 methods)
Success criterion: # of real bugs

22 ARTOO [Ciupa et al. ICSE 2008]
Eiffel data structures:
- ACTION SEQUENCE (156 methods)
- ARRAY (86 methods)
- ARRAYED LIST (39 methods)
- BOUNDED STACK (62 methods)
- FIXED TREE (125 methods)
- HASH TABLE (122 methods)
- LINKED LIST (106 methods)
- STRING (171 methods)
Success criterion: # of real bugs

23 Randoop [Pacheco et al. ICSE 2007]
Data structures:
- BinTree, BHeap, FibHeap, TreeMap
Java JDK 1.5:
- java.util (39 kLOC, 204 classes, 1019 methods)
- javax.xml (14 kLOC, 68 classes, 437 methods)
Jakarta Commons:
- chain (8 kLOC, 59 classes, 226 methods)
- collections (61 kLOC, 402 classes, 2412 methods)
- (see next slide)
Success criterion: # of real bugs

24 Randoop [Pacheco et al. ICSE 2007] – cont.
Jakarta Commons (cont.):
- jelly (14 kLOC, 99 classes, 724 methods)
- logging (4 kLOC, 9 classes, 140 methods)
- math (21 kLOC, 111 classes, 910 methods)
- primitives (6 kLOC, 294 classes, 1908 methods)
.NET libraries:
- ZedGraph (33 kLOC, 125 classes, 3096 methods)
.NET Framework:
- Mscorlib (185 kLOC, 1439 classes, 17763 methods)
- System.Data (196 kLOC, 648 classes, 11529 methods)
- System.Security (9 kLOC, 128 classes, 1175 methods)
- System.Xml (150 kLOC, 686 classes, 9914 methods)
- Web.Services (42 kLOC, 304 classes, 2527 methods)
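Feedback-directed generation differs from plain random generation in that each candidate sequence is executed as it is built, and only sequences that run without failing are kept and extended further. A toy sketch of that loop follows; the target class (java.util.Stack), the operation set, and the bookkeeping are invented for illustration, whereas real Randoop handles arbitrary APIs and checks richer contracts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Stack;

public class FeedbackDirectedSketch {
    // Replay the op sequence on a fresh Stack; false if it throws
    static boolean executes(List<String> ops) {
        Stack<Integer> s = new Stack<>();
        try {
            for (String op : ops) {
                if (op.equals("push")) s.push(1);
                else s.pop(); // may throw EmptyStackException
            }
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Feedback-directed loop: extend a previously successful sequence,
    // execute it, and keep it only if it ran without an exception.
    static int generate(int trials, long seed) {
        Random rnd = new Random(seed);
        List<List<String>> pool = new ArrayList<>();
        pool.add(new ArrayList<>()); // empty seed sequence
        int kept = 0;
        for (int i = 0; i < trials; i++) {
            List<String> seq = new ArrayList<>(pool.get(rnd.nextInt(pool.size())));
            seq.add(rnd.nextBoolean() ? "push" : "pop");
            if (executes(seq)) { pool.add(seq); kept++; }
        }
        return kept;
    }

    public static void main(String[] args) {
        int kept = generate(100, 42L);
        System.out.println(kept >= 0 && kept <= 100);
    }
}
```

The "feedback" is the execution result: discarded sequences never pollute the pool, which is what lets the approach scale to the large libraries listed above.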

25 Randoop [Pacheco et al. ISSTA 2008]
- A core component of the .NET Framework
- > 100 kLOC
Success criterion: # of real bugs

26 Pex [Tillmann and de Halleux TAP 2008]
- A core component of the .NET Framework
- > 10,000 public methods
- Selected results presented
- 9 classes (>100 blocks to >500 blocks)
Success criterion: block coverage, arc coverage, # of real bugs

27 MSeqGen [Thummalapenta et al. ESEC/FSE 2009]
- QuickGraph: 165 classes and interfaces with 5 kLOC
- Facebook: 285 classes and interfaces with 40 kLOC
Success criterion: branch coverage

28 Dynamic Symbolic Execution Tools
- What about DART, CUTE/jCUTE, CREST, EXE, EGT, KLEE, SAGE, SMART, Splat, Pex, ...?
- Non-OO vs. OO

29 Summary of Benchmarks
- (Mostly) data-structure (DS) classes only:
  TestEra [ASE 01], Korat [ISSTA 02], JCrasher [SP&E 04], eToc [ISSTA 04], JPF [ISSTA 04, ASE 05, ISSTA 06], Rostra [ASE 04], Symstra [TACAS 05], Symclat [ASE 06], Evacon [ASE 08], Nighthawk [ASE 07, ASE 08], UI JPF extensions [ISSTA 07, ICSE 08], ARTOO [ISSTA 07, ICSE 08]
- Non-DS classes:
  Check 'n' Crash [ICSE 05], DSD-Crasher [ISSTA 06], Randoop [ICSE 07, ISSTA 08], Pex [TAP 08], MSeqGen [ESEC/FSE 09]

30 Open Questions – history
- Why/how do authors select the benchmarks used in their evaluations? Why not other benchmarks? (Your answers here!)
  - Reason 1:
  - Reason 2:
  - ...

31 Open Questions – history (cont.)
- Are data structures mandatory?
  - Like the Siemens programs in fault localization, as a sanity check?
- Are data structures sufficient?
  - How much would the results generalize to broader types of real-world applications?
- How about libraries (in contrast to applications)?
  - High payoff in terms of testing effort? More logic, more challenging?

32 Open Questions – future
- Shall we have categorized benchmarks? What categories (general, DS, GUI, DB, Web, network, embedded, string-intensive, pointer-intensive, state-dependent-intensive, ML, ...)?
- What criteria shall we use to include/exclude benchmarks?
- Where to contribute and share? (UNL SIR, a new repository?)
- How to provide cross-language benchmarks?
- How about test oracles (if we care about more than coverage)?
- How about evaluation criteria (structural coverage, seeded faults, uncaught exceptions, ...)?

33 Open Questions – cont.
- Caveats in benchmarking
  - Tools can be tuned to work best on well-accepted benchmarks yet fail to work or generalize on other applications
  - ...