Effectively Prioritizing Tests in Development Environment

Presentation transcript:

Effectively Prioritizing Tests in Development Environment. Hi, I am Jay Thiagarajan. I work in a group called the Programmer Productivity Research Center at Microsoft Research. Today I will be presenting work done by Amitabh Srivastava and myself while developing a test case prioritization system for use in a large-scale development environment. 15 seconds. Amitabh Srivastava, Jay Thiagarajan, PPRC, Microsoft Research

Outline: Context and Motivation, Test Prioritization Systems, Measurements and Results, Summary. Let me give an outline of the talk first. I will describe the scenarios we tried to address with this system. Then I will describe existing test prioritization systems and why we diverged from them to use a simple heuristic algorithm. I will also present some results on Microsoft product binaries and conclude my talk with a question/answer session. 15 seconds. The talk is divided into four parts: why we built this system, what we did, how well it worked, and a summary.

Motivation: Scenarios. Pre-checkin tests: run by developers before check-in. Build verification tests (BVT): run by the build system after each build. Regression tests: run by the testing team after each build. Hot fixes for critical bugs. For the sake of simplicity, let us assume a simple model of the development process. Developers write code according to a specification and check it into the source control system. Before checking in the code, they run a few tests to make sure that their changes do not keep the program from being built, and also to catch a moderate number of defects. The build system, which builds the program (daily or after every check-in), runs a few verification tests after the program is built. If the tests pass, the build is handed to the testing team, which takes responsibility for verifying the validity of the build through regression testing. Even during full testing it is advantageous to detect defects as early as possible, e.g. on day 1 rather than day 21. SWAT teams that respond to customer problems with hot fixes also do testing. To address these scenarios effectively, developers and testers must be able to run the right tests at the right time. New defects recently introduced into the system are most likely to come from recent code changes, so it is effective to focus testing efforts on the parts of the program affected by those changes. (90 seconds) The most important motto: catch the bugs as early as possible, on day 1 rather than day 21.

Using program changes. Source code differencing: S. Elbaum, A. Malishevsky & G. Rothermel, "Test case prioritization: A family of empirical studies", Feb. 2002; S. Elbaum, A. Malishevsky & G. Rothermel, "Prioritizing test cases for regression testing", Aug. 2000; F. Vokolos & P. Frankl, "Pythia: a regression test selection tool based on text differencing", May 1997. Program changes have been used for test selection and for test prioritization. The techniques used were source code differencing, data and control flow analysis, and coarse-grained code entities to identify which parts of the program might be affected by the changes. 10 seconds

Using program changes. Data and control flow analysis: T. Ball, "On the limit of control flow analysis for regression test selection", Mar. 1998; G. Rothermel and M.J. Harrold, "A Safe, Efficient Regression Test Selection Technique", Apr. 1997. Code entities: Y. F. Chen, D.S. Rosenblum and K.P. Vo, "TestTube: A System for Selective Regression Testing", May 1994. 10 seconds

Analysis of various techniques. Source code differencing: simple and fast, and can be built using commonly available tools like "diff". However, a simple renaming of a variable will trip it up, and it will fail when a macro definition changes; avoiding these pitfalls requires static analysis. Data and control flow analysis: flow analysis is difficult in languages like C/C++ with pointers, casts, and aliasing, and interprocedural data flow techniques are extremely expensive and difficult to implement in a complex environment. Problems: speed, and the need for different parsers for different languages. Solution: work on the binary. 30 seconds

Our Solution. Focus on the change from the previous version. Determine change at a very fine granularity: basic block/instruction. Operate on binary code, which is easier to integrate into a production environment. Scale well enough to compute results in minutes. Use a simple heuristic algorithm to predict which parts of the code are impacted by the change. 30 seconds. This is a real-world application where time and effectiveness are critical: 1.8 million source lines are processed in 210 seconds. New bugs are most likely due to new changes.

Test Effectiveness Infrastructure. [diagram: test coverage tools feed the Magellan coverage repository; an old build and a new build are compared via binary diff (BMAT/Vulcan); coverage impact analysis and test prioritization in Echelon produce the prioritized tests.] This slide explains the infrastructure we leverage to build this tool. We have a system called Magellan, which is a coverage collection and storage repository. As shown here, testers run their tests on various systems and the coverage data is collected in the repository. Echelon uses the existing coverage data in the repository together with the change information between two builds to produce a prioritized set of tests. 30 seconds

Echelon: Test Prioritization. [diagram: the old build and new build feed block change analysis, which uses the Magellan repository (linked with a symbol server for symbols) and produces binary differences; coverage impact analysis then computes coverage for the new build; test prioritization leverages what has already been tested and emits a prioritized list of test cases.] 15 seconds

Block Change Analysis: Binary Matching. [diagram: the blocks of the new build are matched against the old build and classified as new blocks, old unchanged blocks, or old changed blocks.] The matching is heuristic-based; accuracy is important. BMAT: Binary Matching [Wang, Pierce and McFarling, JILP 2000]. 15 seconds
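
To make the classification concrete, here is a minimal sketch, assuming the binary matcher (e.g. BMAT) has already assigned each block a stable match identity and a normalized instruction hash; the `Block` type and `classify_blocks` function are hypothetical illustrations, not Echelon's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    addr: int        # block address within its binary
    match_id: str    # stable identity assigned by the binary matcher (e.g. BMAT)
    code_hash: str   # hash of the block's (normalized) instructions

def classify_blocks(old_blocks, new_blocks):
    """Split the new build's blocks into new, old-unchanged, and old-changed."""
    old_by_id = {b.match_id: b for b in old_blocks}
    new, unchanged, changed = [], [], []
    for b in new_blocks:
        old = old_by_id.get(b.match_id)
        if old is None:
            new.append(b)              # no counterpart in the old build
        elif old.code_hash == b.code_hash:
            unchanged.append(b)        # matched and identical
        else:
            changed.append(b)          # matched, but the instructions differ
    return new, unchanged, changed
```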

Coverage Impact Analysis: Terminology. Trace: a collection of one or more test cases. Impacted blocks: old modified blocks and new blocks. The goal is to compute the coverage of traces for the new build. Coverage for old (unchanged and modified) blocks is the same as the coverage for the old build; coverage for new blocks requires more analysis. 15 seconds
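
As a rough sketch of that bookkeeping, continuing the hypothetical `classify_blocks` output above (the names and the coverage map layout are illustrative assumptions, not Echelon's):

```python
def impacted_blocks(new, changed):
    """Impacted blocks are the old modified blocks plus the new blocks."""
    return set(new) | set(changed)

def carry_over_coverage(old_coverage, unchanged, changed):
    """Coverage of old (unchanged and modified) blocks in the new build is taken
    directly from the old build; new blocks need the predecessor/successor
    analysis described on the next slide.

    `old_coverage` maps a block's match_id to the set of traces covering it."""
    return {b.match_id: set(old_coverage.get(b.match_id, ()))
            for b in list(unchanged) + list(changed)}
```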

Coverage Impact Analysis. [diagram: a new block N with predecessor blocks P and successor blocks S, including an interprocedural edge.] A trace may cover a new block N if it covers at least one predecessor block and at least one successor block. If a predecessor or successor is itself a new block, then its own predecessors or successors are used instead (an iterative process). 30 seconds
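
A minimal sketch of this heuristic, assuming we can look up a block's control-flow predecessors and successors and whether it is new; the function name `may_cover` and its parameters are hypothetical:

```python
def may_cover(trace_blocks, new_block, preds, succs, is_new):
    """A trace (given as the set `trace_blocks` of old-build blocks it covers)
    may cover `new_block` if it covers at least one predecessor and at least one
    successor.  If a neighbor is itself new, we iterate through it to its own
    neighbors, as described on the slide."""

    def reaches(start, neighbors_of):
        seen, frontier = set(), list(start)
        while frontier:
            b = frontier.pop()
            if b in seen:
                continue
            seen.add(b)
            if is_new(b):
                frontier.extend(neighbors_of(b))   # step through new blocks
            elif b in trace_blocks:
                return True                        # trace covers an old neighbor
        return False

    return reaches(preds(new_block), preds) and reaches(succs(new_block), succs)
```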

Coverage Impact Analysis: Limitations. A new block may not actually be executed even when predicted covered: if there is a path from a successor to a predecessor, or if there are changes in the control path due to data changes. 15 seconds

Echelon: Test Case Prioritization. Computes minimal sets of test cases that are likely to cover the impacted blocks (old changed and new blocks). The input is the set of traces (test cases) and the set of impacted blocks. Uses a greedy iterative algorithm for test selection. 15 seconds

Echelon: Test Selection. [diagram: an impacted block map showing which impacted blocks each of the traces T1..T5 covers, the weights 5, 2, 4, 1, 3 for T1..T5, and the resulting Set 1, Set 2, Set 3.] The weight of a trace is the number of impacted blocks it covers; a mark in the map denotes that a trace T covers that impacted block. 30 seconds
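
The greedy selection can be sketched as follows; this is an illustrative implementation under assumed data structures (traces represented as sets of the impacted blocks they cover), not Echelon's actual code:

```python
def prioritize(traces, impacted):
    """Greedy iterative test selection.  `traces` maps a trace name to the set
    of impacted blocks it covers; `impacted` is the set of impacted blocks.
    Returns an ordered list of sets; each set greedily covers as many of the
    impacted blocks as the remaining traces allow."""
    remaining = dict(traces)
    ordered_sets = []
    while remaining:
        uncovered = set(impacted)
        current = []
        while uncovered:
            # Pick the remaining trace with the highest weight, i.e. the one
            # covering the most still-uncovered impacted blocks.
            best = max(remaining, key=lambda t: len(remaining[t] & uncovered),
                       default=None)
            if best is None or not (remaining[best] & uncovered):
                break
            current.append(best)
            uncovered -= remaining[best]
            del remaining[best]
        if not current:
            # Leftover traces cover no impacted blocks; append them at the end.
            ordered_sets.append(list(remaining))
            break
        ordered_sets.append(current)
    return ordered_sets

# Illustrative data (not the slide's example): five traces over nine impacted blocks.
traces = {"T1": {1, 2, 3, 5, 9}, "T2": {2, 4}, "T3": {1, 4, 6, 7},
          "T4": {9}, "T5": {3, 6, 8}}
print(prioritize(traces, impacted=set(range(1, 10))))  # [['T1', 'T3', 'T5'], ['T2', 'T4']]
```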

Echelon: Test Selection Output. [diagram: an ordered list of traces T1, T2, T3, T4, T5, T7, T8, ..., Tm grouped into SET1, SET2, SET3, ..., SETn.] Each set contains test cases that together give maximum coverage of the impacted blocks. The approach gracefully handles the case where "main" is modified. If all the tests can be run, they should be run in this order to maximize the chances of detecting failures early. 15 seconds. Talk about options.

Analysis of results. Three measurements of interest: how many sequences of tests were formed? How effective is the algorithm in practice? How accurate is the algorithm in practice? The first measure gives us a sense of how many tests remain in each sequence, and so on. The second and third measures gauge how effective and how accurate the heuristic algorithm is in practice. The basic premise is that Echelon has to run fast and be fairly accurate (more than 80% accurate to be useful in practice). 15 seconds. The user only cares about the first and second measures.

Details of BinaryE. Version 1 (12/11/2000): 31,020 functions, 668,068 blocks, 1,097,294 arcs, PDB size 22,602,752. Version 2 (01/29/2001): 31,026 functions, 668,274 blocks, 1,097,650 arcs, PDB size 22,651,904. File size: 8,880,128. Impacted blocks: 378 (220 new, 158 old changed). Number of traces: 3128. Source lines: ~1.8 million. 10 seconds. Echelon takes ~210 seconds for this 8 MB binary.

Effectiveness of Echelon. The important measure of effectiveness is early defect detection. We measured the % of defects vs. the % of unique defects detected in each sequence; unique defects are defects not detected by the previous sequence. To measure effectiveness, we took some sample binaries from the development process and measured the percentage of defects, and of unique defects, detected by each sequence. 15 seconds

Effectiveness of Echelon. Binary of 410 KB, 2,308 functions, 22,446 blocks; 157 tests, 148 sequences (sequence 1: 6 tests, sequence 2: 4 tests, sequence 3: 2 tests, sequence 4: 1 test, ...). The first sequence detected 81% of the defects; the second sequence detected 62%, of which 6% were unique. The rest of the sequences did not detect any unique defects. 15 seconds

Effectiveness of Echelon. Binary of 400 KB, 3,675 functions, 23,153 blocks; 221 tests, 176 sequences (sequence 1: 4 tests, sequence 2: 4 tests, sequence 3: 3 tests, sequence 4: 5 tests, ...). The first sequence detected 85% of the defects; the second sequence detected 71%, of which 9% were unique; the third sequence detected 57%, of which 4% were unique. The rest did not detect any unique defects. None of the tests detected the remaining 2% of the defects. 15 seconds

Blocks predicted hit that were not hit. [chart: blocks predicted hit that were not hit.] Limitations: a new block may not be executed if there is a path from a successor to a predecessor, or if there are changes in the control path due to data changes. 10 seconds

Blocks predicted not hit that were actually hit. [chart: blocks predicted not hit that were actually hit.] Blocks that are targets of indirect calls are being predicted as not hit. 10 seconds

Echelon Results: BinaryK. Build 2470 (05/01/2001): 1,761 functions, 32,012 blocks, 47,131 arcs, file size 882,688. Build 2480 (05/23/2001): 1,774 functions, 32,135 blocks, 47,323 arcs, file size 894,464. Impacted blocks: 589 (350 new, 239 old changed). Traces: 56. 10 seconds

Echelon Results: BinaryU. Build 2470 (05/01/2001): 1,967 functions, 30,916 blocks, 46,638 arcs, file size 528,384. Build 2480 (05/23/2001): 1,970 functions, 31,003 blocks, 46,775 arcs, file size 528,896. Impacted blocks: 270 (190 new, 80 old changed). Traces: 56. 10 seconds

Summary. A binary-based test prioritization approach can effectively prioritize tests in a large-scale development environment. A simple heuristic driven by fine-granularity program changes works well in practice. The system is currently integrated into the Microsoft development process.

Coverage Impact Analysis. Echelon provides a number of options, such as control over branch prediction. Indirect calls: if N is the target of an indirect call, a trace only needs to cover at least one of its successor blocks. Future improvements include heuristic branch prediction ("Branch Prediction for Free" [Ball, Larus]). 15 seconds
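
A small sketch of that indirect-call relaxation, written as a variant of the hypothetical `may_cover` function above (again, the names are assumptions, not Echelon's API):

```python
def may_cover_indirect_aware(trace_blocks, new_block, preds, succs, is_new,
                             is_indirect_call_target):
    """Like the predecessor/successor heuristic, but if the new block is the
    target of an indirect call its predecessors cannot be relied upon, so a
    trace only needs to cover at least one successor block."""

    def reaches(start, neighbors_of):
        seen, frontier = set(), list(start)
        while frontier:
            b = frontier.pop()
            if b in seen:
                continue
            seen.add(b)
            if is_new(b):
                frontier.extend(neighbors_of(b))
            elif b in trace_blocks:
                return True
        return False

    if is_indirect_call_target(new_block):
        return reaches(succs(new_block), succs)            # successor side only
    return reaches(preds(new_block), preds) and reaches(succs(new_block), succs)
```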

Additional Motivation. And of course, to attend ISSTA and meet some great people.

Echelon: Test Selection Options. The calculation of weights can be extended: for example, traces with a strong historical fault detection record can be given additional weight, or the time each test takes can be included in the calculation. Echelon can also print the changed (modified or new) source code that may not be covered by any trace, and all source code lines that may not be covered by any trace. 15 seconds
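
One way such an extended weight might look, as a sketch under the assumption that per-trace fault history and runtime are available (the factors and names here are illustrative only, not Echelon's):

```python
def extended_weight(covered_impacted, past_faults_found, runtime_seconds,
                    fault_bonus=2.0, time_penalty=0.01):
    """Extended trace weight: start from the number of impacted blocks covered,
    reward traces that detected faults historically, and mildly penalize
    long-running traces.  The bonus/penalty factors are made up for illustration."""
    return (len(covered_impacted)
            + fault_bonus * past_faults_found
            - time_penalty * runtime_seconds)

# Example: a trace covering 12 impacted blocks, with 3 past fault detections,
# taking 90 seconds to run: 12 + 6 - 0.9 = 17.1
print(extended_weight({f"b{i}" for i in range(12)}, 3, 90))
```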