Finding Errors in .NET with Feedback-Directed Random Testing. Carlos Pacheco (MIT), Shuvendu Lahiri (Microsoft), Thomas Ball (Microsoft). July 22, 2008.

Feedback-directed random testing (FDRT)
Classes under test + properties to check → feedback-directed random test generator → failing test cases
Classes under test: java.util.Collections, java.util.ArrayList, java.util.TreeSet, java.util.LinkedList, ...
Property to check, reflexivity of equality: ∀ o != null : o.equals(o) == true
Example failing test case:

public void test() {
  Object o = new Object();
  ArrayList a = new ArrayList();
  a.add(o);
  TreeSet ts = new TreeSet(a);
  Set us = Collections.unmodifiableSet(ts);
  // Fails at runtime.
  assertTrue(us.equals(us));
}

Technique overview
− Creates method sequences incrementally
− Uses runtime information to guide the generation
− Avoids illegal inputs
(Feedback-Directed Random Test Generation. Pacheco, Lahiri, Ernst, and Ball. ICSE 2007.)
Each executed sequence is classified by its runtime behavior: normal sequences are used to create larger sequences, error-revealing sequences are output as tests, and exception-throwing sequences are discarded. A minimal sketch of this loop follows.
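The following is a rough Java sketch of that loop, under stated assumptions: the Sequence type and the extend/execute helpers are illustrative placeholders, not Randoop's actual API.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of feedback-directed random test generation.
public class FdrtSketch {

    enum Outcome { NORMAL, ERROR_REVEALING, EXCEPTION_THROWING }

    /** Stand-in for a generated method sequence. */
    static class Sequence {
        final List<String> calls = new ArrayList<>();
    }

    static final Random RAND = new Random();

    /** Runs the generation loop for a fixed budget; returns error-revealing sequences. */
    static List<Sequence> generate(int budget) {
        List<Sequence> components = new ArrayList<>(); // reusable building blocks
        List<Sequence> errorTests = new ArrayList<>(); // sequences output as tests
        components.add(new Sequence());                // seed with the empty sequence
        for (int i = 0; i < budget; i++) {
            // Incremental step: extend a randomly chosen existing sequence by one call.
            Sequence base = components.get(RAND.nextInt(components.size()));
            Sequence candidate = extend(base);
            // Feedback step: execute, classify, and route the candidate accordingly.
            switch (execute(candidate)) {
                case NORMAL:             components.add(candidate); break; // reused later
                case ERROR_REVEALING:    errorTests.add(candidate); break; // output as test
                case EXCEPTION_THROWING: break;                            // discarded
            }
        }
        return errorTests;
    }

    static Sequence extend(Sequence base) {
        Sequence s = new Sequence();
        s.calls.addAll(base.calls);
        s.calls.add("randomlyChosenCall()"); // placeholder for a randomly picked method call
        return s;
    }

    static Outcome execute(Sequence s) {
        // A real tool would reflectively run the calls and check the configured
        // properties (e.g., reflexivity of equality) on the objects created.
        return Outcome.NORMAL; // placeholder classification
    }
}

The essential design choice is the feedback step: only sequences that executed normally become building blocks for longer sequences, so effort is not wasted extending inputs that are already illegal.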

Prior experimental evaluation (ICSE 2007)
Compared with other techniques
− Model checking, symbolic execution, traditional random testing
On collection classes (lists, sets, maps, etc.)
− FDRT achieved equal or higher code coverage in less time
On a large benchmark of programs (750 KLOC)
− FDRT revealed more errors

Goal of the case study
Evaluate FDRT's effectiveness in an industrial setting:
− Error-revealing effectiveness
− Cost effectiveness
− Usability
These are important questions to ask about any test generation technique.

Case study structure
Asked engineers from a test team at Microsoft to use FDRT on their code base over a period of 2 months.
We provided:
− A tool implementing FDRT
− Technical support for the tool (bug fixes, feature requests)
We met on a regular basis (approx. every 2 weeks):
− Asked the team for experiences and results

Randoop (implements FDRT)
.NET assembly → Randoop → failing C# test cases
Properties checked:
− A sequence does not lead to a runtime assertion violation
− A sequence does not lead to a runtime access violation
− The executing process should not crash
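As a concrete, hedged illustration of the feedback classification, here is a Java analogue reusing the Outcome enum from the sketch above. Randoop for .NET actually watches for assertion violations, access violations, and process crashes rather than Java exceptions, so this is an assumption-laden sketch, not the tool's implementation.

// Classifies one executed sequence against the checked properties.
static Outcome classify(Runnable sequence) {
    try {
        sequence.run();
        return Outcome.NORMAL;             // behaved normally: reuse as a building block
    } catch (AssertionError e) {
        return Outcome.ERROR_REVEALING;    // a checked property failed: output as a test
    } catch (RuntimeException e) {
        return Outcome.EXCEPTION_THROWING; // likely an illegal input: discard
    }
}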

Subject program
Test team responsible for a critical .NET component
100 KLOC, large API, used by all .NET applications
Highly stable, heavily tested:
− High reliability is particularly important for this component
− 200 man-years of testing effort (40 testers over 5 years)
− A test engineer finds 20 new errors per year on average
− High bar for any new test generation technique
Many automatic techniques already applied

Discussion outline
Results overview
Error-revealing effectiveness
− Kinds of errors, examples
− Comparison with other techniques
Cost effectiveness
− Earlier/later stages

Case study results: overview
Human time spent interacting with Randoop: 15 hours
CPU time running Randoop: 150 hours
Total distinct method sequences: 4 million
New errors revealed: 30

Error-revealing effectiveness
Randoop revealed 30 new errors in 15 hours of human effort, i.e., 1 new error per 30 minutes. This time included:
− Interacting with Randoop
− Inspecting the resulting tests
− Discarding redundant failures
For comparison, a test engineer discovers on average 1 new error per 100 hours of effort, so Randoop was roughly 200 times more productive during the study.

Example error 1: memory management
The component includes memory-managed and native code. If a native call manipulates references, it must inform the garbage collector of the changes. A previously untested path in native code reported a new reference to an invalid address. Notably, this error was in code for which existing tests achieved 100% branch coverage.

Example error 2: missing resource string
When an exception is raised, the component looks up its message in a resource file. A rarely-used exception was missing its message in the file, and attempting the lookup led to an assertion violation. This revealed two errors:
− Missing message in the resource file
− Error in the tool that verified the state of the resource file

Errors revealed by expanding Randoop's scope
The test team also used Randoop's tests as inputs to drive other tools. This expanded the scope of the exploration and the types of errors revealed beyond those that Randoop could find on its own. For example, the team discovered concurrency errors this way.

Discussion outline
Results overview
Error-revealing effectiveness
− Kinds of errors, examples
− Comparison with other techniques
Cost effectiveness
− Earlier/later stages

Traditional random testing
Randoop found errors not caught by fuzz testing. Fuzz testing's domain is files, streams, and protocols; Randoop's domain is method sequences. Think of Randoop as a smart fuzzer for APIs.

Symbolic execution
Concurrently with Randoop, the test team used a method sequence generator based on symbolic execution:
− Conceptually more powerful than FDRT
The symbolic tool found no errors over the same period of time, on the same subject program. The symbolic approach achieved higher coverage on classes that:
− Can be tested in isolation
− Do not go beyond the managed-code realm

Discussion outline
Results overview
Error-revealing effectiveness
− Kinds of errors, examples
− Comparison with other techniques
Cost effectiveness
− Earlier/later stages

The plateau effect
Randoop was cost-effective during the span of the study, but after this initial period of effectiveness it ceased to reveal errors. After the study, the test team made a parallel run of Randoop:
− Dozens of machines, hundreds of machine hours
− Each machine with a different random seed
− It found fewer errors than the first 2 hours of use on a single machine

Overcoming the plateau
Reasons for the plateau:
− Randoop spends the majority of its time on a subset of the classes
− It cannot cover some branches
Work remains to be done on new random strategies. Hybrid techniques show promise:
− Random/symbolic
− Random/enumerative

Conclusion
Feedback-directed random testing:
− Effective in an industrial setting
Randoop is used internally at Microsoft:
− Added to the list of recommended tools for other product groups
− Has revealed dozens more errors in other products
Random testing techniques are effective in industry:
− They find deep and critical errors
− Scalability yields impact

Randoop for Java
Google "randoop"
Has been used in research projects and courses
Version 1.2 just released
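A typical command-line invocation looks roughly like the following (the exact flag names vary across Randoop versions and are given here as an assumption; consult the Randoop manual for your release):

java -classpath myclasses:randoop.jar randoop.main.Main gentests --testclass=java.util.TreeSet --timelimit=60

This asks Randoop to spend about 60 seconds generating unit tests for java.util.TreeSet, writing the resulting JUnit tests, including any failing ones, to disk.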