Finding Errors in .NET with Feedback-Directed Random Testing
By Carlos Pacheco, Shuvendu K. Lahiri, and Thomas Ball
Presented by Bob Mazzi, 10/7/08

Introduction
Testing software is expensive
– Testing is about 50% of total software cost
– Microsoft has about one tester for each developer
– Testing is a time-consuming part of the development cycle
– Testing is limited by testers' ability to determine what to test
Random testing
– Its overall effectiveness is still unproven
– Individual techniques seem promising

This Paper's Contributions
Goals of this research
– Studies are needed in real-world situations
– Industrial environment vs. research environment
– Are the errors found significant enough to justify testing and correction?
Case study
– This study confirms that feedback-directed random testing can find errors in an industrial environment that other forms of testing had not found.
– New information was developed to compare different types of testing.
Specific results
– 30 new errors found
– Errors were also found in other testing tools
– The error-detection rate diminished to nil after 150 hours of testing

Overview of the .NET Component Under Test
– A critical component
– 100K lines of code
– 5 years of development
– Currently about 40 developers working on this component
– Currently 40 test engineers responsible for testing this component
– The component is used as part of many Microsoft applications

Current Testing Approach
Developers
– Minimal testing of units as work is performed
– Some design for testing
Test engineers
– Manual testing
– Internally developed testing tools
Beta testing
– Thousands of users within Microsoft use this component as part of the projects they develop.
– A large number of end users who install beta versions of software that relies upon this component also perform beta and production testing.

Implications of Current Testing Approach
– The error-finding rate was high earlier in the cycle.
– The error-finding rate of the existing testing team has diminished to 20 errors per man-year.
– The component is mature and stable.
– One of the stated goals of this study was to determine whether this was "good enough," or whether feedback-directed random testing could improve this component further.

This Paper: Feedback-Directed Random Testing
– Addresses automatic generation of unit tests
– Randoop = RANDom tester for Object-Oriented Programs

How Randoop Works
Inputs
– The module to develop test cases for
– How long to run Randoop
– Optional configuration files (areas not to test because they contain known errors not yet fixed)
Process
– Randomly choose a method call to test
– Apply test inputs to the method
– Review the outputs, looking for error-revealing sequences
– Save the result as a method sequence
Outputs
– Test cases that should not fail
– Test cases that should fail
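A minimal sketch of this feedback-directed loop follows. It is written in Java (the original Randoop targets Java; the study uses a .NET version of the same idea), and every class, field, and method name in it is illustrative rather than Randoop's actual API.

import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative sketch of feedback-directed random generation; not Randoop's real API. */
public class FeedbackDirectedGenerator {

    private final Random random = new Random();
    // Sequences that executed normally; reused as building blocks for longer sequences.
    private final List<List<Method>> nonFailingSequences = new ArrayList<>();
    // Sequences that violated a contract; reported as potential bugs.
    private final List<List<Method>> errorRevealingSequences = new ArrayList<>();

    private enum Outcome { NORMAL, EXPECTED_EXCEPTION, CONTRACT_VIOLATION }

    public void run(List<Method> methodsUnderTest, long timeBudgetMillis) {
        long deadline = System.currentTimeMillis() + timeBudgetMillis;
        while (System.currentTimeMillis() < deadline) {
            // 1. Extend a previously successful sequence with a randomly chosen method call.
            //    (Arguments for the call would be drawn from values produced earlier in the
            //    sequence; that plumbing is elided here.)
            List<Method> sequence = new ArrayList<>(pickExistingSequence());
            sequence.add(methodsUnderTest.get(random.nextInt(methodsUnderTest.size())));

            // 2. Execute the sequence and use the outcome as feedback for future generation.
            Outcome outcome = execute(sequence);
            if (outcome == Outcome.NORMAL) {
                nonFailingSequences.add(sequence);       // reusable building block
            } else if (outcome == Outcome.CONTRACT_VIOLATION) {
                errorRevealingSequences.add(sequence);   // potential bug to report
            }
            // EXPECTED_EXCEPTION sequences are discarded: neither errors nor useful inputs.
        }
    }

    private List<Method> pickExistingSequence() {
        if (nonFailingSequences.isEmpty()) {
            return new ArrayList<>();
        }
        return nonFailingSequences.get(random.nextInt(nonFailingSequences.size()));
    }

    // A real tool would run the calls reflectively and check general-purpose contracts
    // (no unexpected exception, o.equals(o), hashCode consistency, ...). Elided here.
    private Outcome execute(List<Method> sequence) {
        return Outcome.NORMAL; // placeholder
    }
}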

Insulating Randoop from the Operating System
The problem
– Because Randoop is testing a component that accesses the OS directly at a low level, some test runs will cause a crash by generating a method sequence that interacts inappropriately with the OS.
The solution
– Insulate Randoop from the OS by placing it in a "wrapper."
– The wrapper acts like a virtual machine, so that a crashing case can be terminated.
– Once this happens, a new instance can be started and testing can continue.
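A rough sketch of the wrapper idea, assuming the generator can be launched as a separate executable; the command name, flags, and restart budget below are invented for illustration, not the actual tool's interface. The point is simply that a crash induced by a generated sequence kills only the child process, and the wrapper restarts a fresh instance.

import java.io.IOException;

/** Illustrative wrapper: run the generator in a child process so a crash cannot kill the harness. */
public class GeneratorWrapper {
    public static void main(String[] args) throws IOException, InterruptedException {
        int restartsRemaining = 100; // arbitrary bound on how many crashes we tolerate
        while (restartsRemaining-- > 0) {
            // Hypothetical command line: a real harness would pass the module under test,
            // the time limit, a fresh random seed, and the current configuration files.
            Process generator = new ProcessBuilder("random-tester.exe", "--time-limit", "300")
                    .inheritIO()   // share stdout/stderr so progress is visible
                    .start();
            int exitCode = generator.waitFor();
            if (exitCode == 0) {
                break; // the generator used up its time budget normally; we are done
            }
            // A nonzero exit (or an OS-level crash) terminates only the child; the wrapper
            // notes it and spawns a new instance so testing can continue.
            System.err.println("Generator exited with code " + exitCode + "; restarting");
        }
    }
}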

Here's How Randoop Works
What Randoop does
– The Randoop wrapper is designed to spawn a new instance of Randoop.
– As Randoop creates a sequence to explore, it records the sequence before attempting to execute it.
– When a crash occurs, the process starts again.
– Methods that crash appropriately can be excluded from being explored again.
– Methods that should NOT have crashed can be explored further, as they may be part of an error-revealing sequence.
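One way to realize the record-before-execute idea described above (again a sketch with invented names, not the tool's actual mechanism): the generator appends each candidate sequence to a log before running it, so after a crash the wrapper can read the last log entry, know which sequence was executing, and decide whether to exclude the offending method from further exploration.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Illustrative record-before-execute log: the last line identifies the sequence that crashed. */
public class SequenceLog {

    private final Path logFile;

    public SequenceLog(Path logFile) {
        this.logFile = logFile;
    }

    /** Called by the generator just before executing a candidate sequence. */
    public void recordBeforeExecuting(String sequenceDescription) throws IOException {
        // Append the description *before* execution; if the process dies while running
        // the sequence, this entry survives as the last line of the log.
        Files.writeString(logFile, sequenceDescription + System.lineSeparator(),
                StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Called by the wrapper after a crash to find out what was being executed. */
    public String lastRecordedSequence() throws IOException {
        List<String> lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
        return lines.isEmpty() ? null : lines.get(lines.size() - 1);
    }
}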

Using Randoop
– Randoop was provided to the test team along with instructions.
– Initial use was with default settings and no configuration files.
– As errors were identified, they were added to configuration files to prevent generation of unproductive test cases.
– Regular meetings were held to discuss the use of Randoop.
– Some of these meetings resulted in suggestions for Randoop changes, which were implemented during the study.
– As the case study progressed, the test team started to use Randoop in more sophisticated ways, targeting specific areas and using longer test-generation runs.

Overall Results
– 30 serious errors were detected; these were previously unknown errors.
– Time spent over the test period included reviews of the error-revealing tests.
– Each error found used about 5 hours of CPU time and ½ hour of tester time.
– Prior testing averaged approximately 100 hours of tester time per error found.
– By these figures, the 30 errors cost roughly 15 tester hours in total, versus about 3,000 tester hours (30 × 100) that the prior process would have needed to find as many errors.

Error Characteristics
New errors were found in previously well-tested code.
– A previously untested path was explored that caused an error.
– Analysis revealed that prior testing was not looking for illegal reference addresses.
– Additional testing was implemented to look for this type of error in other places.
Output of Randoop (test cases) was used as input to other testing tools.
– During the study, a request was made to modify Randoop to optionally output all test cases.
– These test cases were used as input to other tools, and other errors were found using those tools.
– Test cases from Randoop also revealed errors when run in stress and concurrency tests.
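For concreteness, here is a hypothetical example (in Java/JUnit; the Interval class and its bug are invented, not taken from the study) of the kind of short, replayable, error-revealing test case such a generator emits. Output in this form is what makes it straightforward to feed generated test cases into other harnesses such as stress and concurrency tests.

import org.junit.Test;
import static org.junit.Assert.assertTrue;

/** Hypothetical error-revealing test in the style a random test generator emits. */
public class GeneratedTest17 {

    /** A deliberately buggy class, invented purely for illustration. */
    static class Interval {
        final int low, high;
        Interval(int low, int high) { this.low = low; this.high = high; }
        @Override public boolean equals(Object o) {
            // Bug: an "empty" interval (low > high) is never equal to anything, not even itself.
            return o instanceof Interval && low <= high
                    && ((Interval) o).low == low && ((Interval) o).high == high;
        }
        @Override public int hashCode() { return 31 * low + high; }
    }

    @Test
    public void sequence17() {
        Interval i = new Interval(5, 2);   // an input a human tester might not think to try
        assertTrue(i.equals(i));           // general-purpose contract: equals must be reflexive
    }
}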

Error Characteristics (continued)
Errors were found that pointed to errors in other testing tools.
– Testing revealed that an output message describing a specific error was missing in the production build.
– The test tool in use did not correctly check for some missing messages.
– By correcting this issue, the other test tool was fixed.
Corner cases and other testing
– Unexpected tests were generated that found areas not covered by existing tests.
– New ways to apply other tests were found.
– Test cases were generated that identified errors revealing unsuspected gaps in the program design.
– Additional manual tests were developed.
– Manual testing policies were updated.

General Comparisons to Other Test Generation Tools
– Randoop was able to find previously unknown errors not found by other test tools.
– Randoop was not able to find some known errors.
– Randoop did not have a clear stopping point.
– Randoop has good performance, allowing it to be applied to a larger component and to develop test cases more quickly than another existing, similar tool.
– Randoop, as a random test-case generation tool, takes a basically unbiased approach.

The Plateau Effect?
– During testing it was noticed that the rate at which errors were found appeared to diminish in steps as testing proceeded.
– Once the final plateau was reached, the error-finding rate appeared to drop to zero.
This "plateau" was first noticed on a single PC:
– The first 2 hours found 10 errors, a rate of 5 errors per hour.
– The next 10 hours found 20 additional errors, a rate of 2 errors per hour.
– Further testing did not reveal additional errors.
Later this effect was noticed on a large cluster of PCs as well:
– Additional new errors were detected.
– The rate of errors found again decreased.

Related Work
Random testing has been used in many other testing areas
– Unix utilities: Miller et al. 90
– Windows GUI applications: Forrester and Miller 00
– Object-oriented code: Csallner et al. 04, Pacheco and Ernst 05, Jartege 05
Some approaches combine random testing with some form of direction
– Ferguson and Korel 96, Godefroid et al. (DART) 05, Sen et al. (CUTE) 06
There is no consensus on the validity of random testing
– Meyers et al. 04, Hamlet and Taylor 90, Ntafos 98, Groce et al. 07, Pacheco et al. 07
There is relatively little research on true random testing as compared to forms of directed testing
– Ferguson and Korel 96, Marinov et al. 03, Visser et al. 06

Conclusions
This case study was designed to
– Test in a production environment
– Test on a mature, well-tested product
Implementing Randoop finds errors that prior testing did not.
Randoop did NOT find some errors that other test tools did.
Some errors found point to other issues
– Faults in other tests
– Omissions in the prior testing process

Comments – Future Work
Comparisons to other test generation tools
– Performance was stated as superior to other tools.
– Randoop is cited as being unbiased in comparison to other tools.
Coverage – lack of a stopping point
– Because of the particulars of this component, the number of possible tests becomes extremely large.
– This number itself may not be readily calculable.
– Is there a valid stopping point?
Plateau effect
– The rate of errors found is not consistent.
– The rate drops at two points: 5 per hr. > 2 per hr. > no additional?
– Is this valid, and why?

Discussion

Comments
Error in the number of errors?
– There also seems to be an inconsistency in the paper regarding the number of errors found. In the initial comments on the plateau effect, the authors state that they found 30 errors (10 errors in the first 2 hours plus 20 errors in the next 10 hours). Later, when they installed Randoop on a cluster, they found an unspecified number of additional errors.
Claim that Randoop is unbiased
– Because Randoop was not able to find other known errors, I find this claim less than convincing. Is it possible that Randoop is biased internally in some fashion such that it missed the other known issues? I would be much more comfortable stating that it is "relatively unbiased," "differently biased," or even "significantly less biased" than stating it is unbiased outright when I do not know why it could not find the other errors. Regardless of the terminology, it is clearly either less biased or differently biased, since it found errors that were previously unknown.

Comments
Plateau effect
– In this type of production environment, the best approach may be to dedicate a CPU to run Randoop continuously, stopping execution only to modify configuration files (to exclude found-but-unfixed errors) or when a fresh build of the component is available. Then it can be said that Randoop has found all of the errors it is capable of finding at any specific point in time. The cost of running the tool continuously should be compared to the cost of a test engineer. If the burdened cost of an engineer is $120k per year, the prior rate of roughly 100 tester hours per error (about 20 errors in a 2,000-hour year) works out to about $6k per error found. If Randoop costs the same per year as a person and finds a new error every two weeks, it will still find more errors in a year than the person. I find it unlikely that in 150 hours of CPU time (less than a week of real time), during which it found 30 errors, Randoop suddenly hit a plateau where it had found all of the errors it was capable of detecting.
– If the plateau were truly "flat," the cluster testing should not have revealed any additional errors. In practice, it did reveal some, which indicates that not all findable errors had been found. The authors state that the cluster runs used different seeds and configuration files, which leads me to question just how random Randoop is. It would be interesting to try multiple runs with a single seed and different configuration files, as well as runs with a single configuration file and different seeds. Another question concerns the Randoop wrapper: early in the paper it is stated that the wrapper respawns Randoop with a different seed, yet the cluster testing section says that a different seed was entered. Does the wrapper generate a truly random seed to start Randoop?