Effectively Prioritizing Tests in Development Environment

1 Effectively Prioritizing Tests in Development Environment
Amitabh Srivastava, Jay Thiagarajan (PPRC, Microsoft Research)
Hi, I am Jay Thiagarajan. I work in a group called the Programmer Productivity Research Center at Microsoft Research. Today I will be presenting work that Amitabh Srivastava and I did while developing a test case prioritization system for use in a large-scale development environment. (15 seconds)

2 Outline
Context and Motivation
Test Prioritization Systems
Measurements and Results
Summary
Let me give an outline of the talk first. I will describe the scenarios we tried to address with this system, then the existing test prioritization systems and why we diverged from them to use a simple heuristic algorithm. I will also present some results on Microsoft product binaries and conclude the talk with a question-and-answer session. (15 seconds) The talk is divided into four parts: why we did this system, what we did, and how well we did.

3 Motivation: Scenarios
Pre-checkin tests: run by developers before check-in
Build Verification Tests (BVT): run by the build system after each build
Regression tests: run by the testing team after each build
Hot fixes for critical bugs
For the sake of simplicity, let us assume a simple model of the development process. Developers write code against a specification and check it into the source control system. Before checking the code in, they run a few tests to make sure that their changes do not keep the program from being built and to catch a moderate number of defects. The build system, which builds the program daily or after every check-in, runs a few verification tests after the program is built. If those tests pass, the build goes to the testing team, which verifies the validity of the build through regression testing. Even during full testing it is advantageous to detect defects as early as possible, e.g. on day 1 rather than day 21. SWAT teams that respond to customer problems with hot fixes also do testing. To address these scenarios effectively, developers and testers must be able to run the right tests at the right time. New defects recently introduced into the system are most likely to come from recent code changes, so it is effective to focus testing effort on the parts of the program affected by those changes. (90 seconds) The most important motto is to catch bugs as early as possible: day 1 rather than day 21.

4 Using program changes
Source code differencing
S. Elbaum, A. Malishevsky and G. Rothermel, "Test case prioritization: A family of empirical studies", Feb. 2002
S. Elbaum, A. Malishevsky and G. Rothermel, "Prioritizing test cases for regression testing", Aug. 2000
F. Vokolos and P. Frankl, "Pythia: a regression test selection tool based on text differencing", May 1997
Program changes have been used both for test selection and for test prioritization. The techniques used were source code differencing, data and control flow analysis, and coarse-grained code entities to identify which parts of the program might be affected by the changes. (10 seconds)

5 Using program changes
Data and control flow analysis
T. Ball, "On the limit of control flow analysis for regression test selection", Mar. 1998
G. Rothermel and M.J. Harrold, "A Safe, Efficient Regression Test Selection Technique", Apr. 1997
Code entities
Y.F. Chen, D.S. Rosenblum and K.P. Vo, "TestTube: A System for Selective Regression Testing", May 1994
(10 seconds)

6 Analysis of various techniques
Source code differencing
Simple and fast; can be built using commonly available tools like "diff"
A simple renaming of a variable will trip it up
Will fail when a macro definition changes
To avoid these pitfalls, static analysis is needed
Data and control flow analysis
Flow analysis is difficult in languages like C/C++ with pointers, casts, and aliasing
Interprocedural data flow techniques are extremely expensive and difficult to implement in a complex environment
Problems: speed, and the need for different parsers for different languages. Solution: work on the binary. (30 seconds)

7 Our Solution
Focus on the change from the previous version
Determine the change at very fine granularity: basic block/instruction
Operates on binary code
Easier to integrate in a production environment
Scales well, computing results in minutes
Simple heuristic algorithm to predict which parts of the code are impacted by the change
(30 seconds) This is a real-world application where time and effectiveness are critical: 1.8 million lines analyzed in 210 seconds. All new bugs are due to new changes.

8 Test Effectiveness Infrastructure
[Diagram: test machines and coverage tools feed coverage into the Magellan repository; Echelon takes the old build, the new build, and a binary diff (BMAT/Vulcan), performs coverage impact analysis, and produces the test prioritization.]
This slide explains the infrastructure we leverage to build this tool. We have a system called Magellan, which is a coverage collection and storage repository. Testers run their tests on various systems and the coverage data is collected in the repository. Echelon uses the existing coverage data in the repository, together with the change information between two builds, to produce a prioritized set of tests. (30 seconds)

9 Echelon: Test Prioritization
[Diagram: the old and new builds go through block change analysis to produce binary differences (linked with the symbol server for symbols); together with coverage from the Magellan repository, coverage impact analysis computes coverage for the new build, and test prioritization produces a prioritized list of test cases.]
The idea is to leverage what has already been tested. (15 seconds)

10 Block Change Analysis: Binary Matching
[Diagram: blocks of the old build are matched against blocks of the new build; new-build blocks are classified as new, old unchanged, or old changed.]
The matching is heuristic-based; accuracy matters. BMAT: Binary Matching [Wang, Pierce and McFarling, JILP 2000]. (15 seconds)
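To make the classification step concrete, here is a minimal Python sketch. It is not BMAT: real binary matching is heuristic and symbol-assisted, so this toy stands in for it by treating blocks as label/instruction pairs; all names are illustrative.

    # Toy stand-in for BMAT-style block matching: blocks are represented as
    # label -> tuple-of-instructions, and the new build's blocks are split
    # into unchanged, changed, and new.
    def classify_blocks(old_blocks, new_blocks):
        unchanged, changed, new = set(), set(), set()
        for label, insns in new_blocks.items():
            if label not in old_blocks:
                new.add(label)                 # no match in the old build
            elif old_blocks[label] == insns:
                unchanged.add(label)           # matched and identical
            else:
                changed.add(label)             # matched but body differs
        return unchanged, changed, new

    old = {"f.entry": ("push", "mov", "ret"), "f.loop": ("cmp", "jne")}
    new = {"f.entry": ("push", "mov", "ret"), "f.loop": ("cmp", "jg"),
           "f.check": ("test", "jz")}
    print(classify_blocks(old, new))
    # -> ({'f.entry'}, {'f.loop'}, {'f.check'})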

11 Coverage Impact Analysis
Terminology
Trace: a collection of one or more test cases
Impacted blocks: old modified blocks and new blocks
Compute the coverage of traces for the new build
Coverage for old (unchanged and modified) blocks is the same as the coverage for the old build
Coverage for new blocks requires more analysis
(15 seconds)
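A small sketch of this carry-over, with made-up helper names: the impacted set is the old changed blocks plus the new blocks, and the coverage of old blocks is simply reused from the old build's stored coverage. Prediction for the new blocks comes on the next slide.

    def impacted_blocks(changed, new):
        # Impacted blocks = old modified blocks plus new blocks.
        return set(changed) | set(new)

    def carry_over_coverage(old_coverage, unchanged, changed):
        # old_coverage: trace -> set of blocks hit in the OLD build.
        # Coverage of unchanged and changed blocks carries over as-is;
        # brand-new blocks are handled by the prediction step that follows.
        keep = set(unchanged) | set(changed)
        return {trace: hits & keep for trace, hits in old_coverage.items()}

    old_coverage = {"T1": {"f.entry", "f.loop"}, "T2": {"f.entry"}}
    print(impacted_blocks(changed={"f.loop"}, new={"f.check"}))
    print(carry_over_coverage(old_coverage, {"f.entry"}, {"f.loop"}))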

12 Coverage Impact Analysis
A trace may cover a new block N if it covers at least one predecessor block (P) and at least one successor block (S); the edges may be interprocedural. If P or S is itself a new block, its predecessors or successors are used instead (an iterative process). (30 seconds)
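A minimal Python sketch of this prediction rule, assuming the predecessor and successor lists come from the new build's control flow graph; the helper names are invented for illustration and this is not Echelon's actual code.

    # A trace may cover new block n if it hits at least one predecessor and at
    # least one successor; predecessors/successors that are themselves new
    # blocks are looked through transitively (the "iterative process" above).
    def _hits_neighbour(trace_hits, start, neighbours_of, new_blocks):
        seen, work = set(), list(start)
        while work:
            b = work.pop()
            if b in seen:
                continue
            seen.add(b)
            if b in new_blocks:
                work.extend(neighbours_of[b])   # look through new blocks
            elif b in trace_hits:
                return True
        return False

    def may_cover(trace_hits, n, preds, succs, new_blocks):
        return (_hits_neighbour(trace_hits, preds[n], preds, new_blocks) and
                _hits_neighbour(trace_hits, succs[n], succs, new_blocks))

    # Tiny CFG: A -> N -> B, where N is the only new block.
    preds = {"N": ["A"], "A": [], "B": ["N"]}
    succs = {"N": ["B"], "A": ["N"], "B": []}
    print(may_cover({"A", "B"}, "N", preds, succs, {"N"}))   # True
    print(may_cover({"A"}, "N", preds, succs, {"N"}))        # False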

13 Coverage Impact Analysis
Limitations: a new node may not be executed even when predicted, if there is a path from a successor to a predecessor, or if there are changes in the control path due to data changes. (15 seconds)

14 Echelon: Test Case Prioritization
Determines minimal sets of test cases that are likely to cover the impacted blocks (old changed and new blocks)
Input is a set of traces (test cases) and a set of impacted blocks
Uses a greedy iterative algorithm for test selection
(15 seconds)

15 Echelon: Test Selection
[Diagram: an impacted block map in which each column is a trace (T1 through T5) and a mark denotes that the trace covers a given impacted block; the example weights shown are 5, 2, 4, 1, 3 for T1 through T5, and the traces are grouped into Set 1, Set 2, and Set 3.]
The weight of a trace is the number of impacted blocks it covers. (30 seconds)
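Below is a hedged Python sketch of this greedy selection, not Echelon's implementation: a trace's weight is the number of still-uncovered impacted blocks it hits, and traces are grouped into sets, each set re-covering the impacted blocks as far as the remaining traces allow. The coverage data is invented, but chosen so the initial weights match the example weights above.

    def prioritize(coverage, impacted):
        # coverage: trace -> set of blocks it covers.
        # Returns a list of sets, each an ordered list of traces.
        unused = dict(coverage)
        sequences = []
        while unused:
            remaining = set(impacted)
            current = []
            while remaining and unused:
                # Weight = impacted blocks the trace would newly cover.
                trace, hits = max(unused.items(),
                                  key=lambda kv: len(kv[1] & remaining))
                if not hits & remaining:
                    break                      # nothing left that anyone covers
                current.append(trace)
                remaining -= hits
                del unused[trace]
            if not current:                    # leftover traces add no coverage
                current, unused = list(unused), {}
            sequences.append(current)
        return sequences

    coverage = {"T1": {1, 2, 3, 4, 5}, "T2": {1, 2}, "T3": {3, 4, 5, 6},
                "T4": {6}, "T5": {2, 4, 6}}
    print(prioritize(coverage, impacted={1, 2, 3, 4, 5, 6}))
    # -> [['T1', 'T3'], ['T5', 'T2'], ['T4']]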

16 Echelon: Test Selection Output
The output is an ordered list of traces grouped into sets: SET1 (Traces T1, T2, T3), SET2 (Traces T4, T5, ..., T7), ..., SETn (Traces T8, ..., Tm). Each set contains test cases that together give maximum coverage of the impacted nodes, which gracefully handles the case where "main" is modified. If all the tests can be run, they should be run in this order to maximize the chances of detecting failures early. (15 seconds; talk about options)

17 Analysis of results Three measurements of interest
How many sequences of tests were formed?
How effective is the algorithm in practice?
How accurate is the algorithm in practice?
The first measure gives us a sense of how many tests remain in each sequence. The other two measure how well the heuristic algorithm performs in practice. The basic premise is that Echelon has to run fast and be fairly accurate (more than 80% accurate) to be useful in practice. (15 seconds) The user only cares about the first and second measures.

18 Details of BinaryE
                   Version 1      Version 2
Date               12/11/2000     01/29/2001
Functions          31,020         31,026
Blocks             668,068        668,274
Arcs               1,097,294      1,097,650
File size          8,880,128
PDB size           22,602,752     22,651,904
Impacted blocks: 378 (220 new, 158 old changed)
Number of traces: 3128
Source lines: ~1.8 million
Echelon takes ~210 seconds for this 8 MB binary. (10 seconds)

22 Effectiveness of Echelon
The important measure of effectiveness is early defect detection. We measured the percentage of defects and the percentage of unique defects detected in each sequence, where unique defects are defects not detected by the previous sequence. To measure effectiveness, we took some sample binaries from the development process. (15 seconds)
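As a small illustration of these two measures, here is a Python sketch with made-up data; "unique" is computed here as not detected by any earlier sequence, which is one reading of the definition above.

    def effectiveness(defects_by_sequence, total_defects):
        # For each sequence: (% of all defects detected, % unique defects),
        # where unique means no earlier sequence detected the defect.
        seen, rows = set(), []
        for i, detected in enumerate(defects_by_sequence, start=1):
            unique = detected - seen
            rows.append((i,
                         100.0 * len(detected) / total_defects,
                         100.0 * len(unique) / total_defects))
            seen |= detected
        return rows

    # Hypothetical example: three sequences over 10 known defects.
    seqs = [{1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 9}, {2, 3}]
    for i, pct, uniq in effectiveness(seqs, total_defects=10):
        print(f"sequence {i}: {pct:.0f}% of defects, {uniq:.0f}% unique")
    # sequence 1: 80% of defects, 80% unique
    # sequence 2: 60% of defects, 10% unique
    # sequence 3: 20% of defects, 0% unique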

23 Effectiveness of Echelon
Binary: 410 KB, 2,308 functions, 22,446 blocks; 157 tests, 148 sequences (sequence 1: 6 tests, sequence 2: 4 tests, sequence 3: 2 tests, sequence 4: 1 test, ...). The first sequence detected 81% of the defects; the second sequence detected 62% (6% unique). The rest of the sequences did not detect any unique defects. (15 seconds)

24 Effectiveness of Echelon
Binary: 400 KB, 3,675 functions, 23,153 blocks; 221 tests, 176 sequences (sequence 1: 4 tests, sequence 2: 4 tests, sequence 3: 3 tests, sequence 4: 5 tests, ...). The first sequence detected 85% of the defects; the second sequence detected 71% (9% unique); the third sequence detected 57% (4% unique). The rest did not detect any unique defects, and none of the tests detected the remaining 2% of the defects. (15 seconds)

25 Blocks predicted hit that were not hit
Limitations: a new node may not be executed even when predicted to be hit, if there is a path from a successor to a predecessor, or if there are changes in the control path due to data changes. (10 seconds)

26 Blocks predicted not hit that were actually hit
Blocks that are the target of indirect calls are being predicted as not hit. (10 seconds)

27 Echelon Results: BinaryK
                   Build 2470     Build 2480
Date               05/01/2001     05/23/2001
Functions          1,761          1,774
Blocks             32,012         32,135
Arcs               47,131         47,323
File size          882,688        894,464
Impacted blocks: 589 (350 new, 239 old changed)
Traces: 56
(10 seconds)

28 Echelon Results: BinaryU
                   Build 2470     Build 2480
Date               05/01/2001     05/23/2001
Functions          1,967          1,970
Blocks             30,916         31,003
Arcs               46,638         46,775
File size          528,384        528,896
Impacted blocks: 270 (190 new, 80 old changed)
Traces: 56
(10 seconds)

29 Summary
A binary-based test prioritization approach can effectively prioritize tests in a large-scale development environment
A simple heuristic using program changes at fine granularity works well in practice
Currently integrated into the Microsoft development process

30 Coverage Impact Analysis
Echelon provides a number of options
Control over branch prediction
Indirect calls: if N is the target of an indirect call, a trace needs to cover at least one of its successor blocks
Future improvements include heuristic branch prediction ("Branch Prediction for Free" [Ball, Larus])
(15 seconds)
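A possible reading of the indirect-call option, sketched as a variant of the earlier coverage prediction: the static CFG does not show the callers of an indirect-call target, so the check falls back to successors only. The names and the exact rule are illustrative, not Echelon's API.

    def may_cover_with_indirect(trace_hits, n, preds, succs,
                                indirect_call_targets):
        # Ordinary rule: need a covered predecessor and a covered successor.
        # Indirect-call targets: callers are not visible as CFG predecessors,
        # so require only a covered successor (transitive look-through of new
        # neighbours is omitted here for brevity).
        covers_succ = any(s in trace_hits for s in succs[n])
        if n in indirect_call_targets:
            return covers_succ
        covers_pred = any(p in trace_hits for p in preds[n])
        return covers_pred and covers_succ

    preds = {"N": []}          # callers reach N only via an indirect call
    succs = {"N": ["B"]}
    print(may_cover_with_indirect({"B"}, "N", preds, succs, {"N"}))  # True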

31 Additional Motivation
And of course to attend ISSTA and meet some great people

32 Echelon: Test Selection
Options
The calculation of weights can be extended, e.g. traces with a strong historical fault-detection record can be given additional weight
Include the time each test takes in the calculation
Print changed (modified or new) source code that may not be covered by any trace
Print all source code lines that may not be covered by any trace
(15 seconds)
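A sketch of how the first two options might look as an extended weight function; the bonus factor and the cost normalization are invented for illustration and are not Echelon's actual formula.

    def extended_weight(hits, remaining, fault_history=0,
                        runtime_seconds=1.0, history_bonus=0.5):
        # Base weight: impacted blocks this trace would newly cover.
        base = len(hits & remaining)
        # Boost traces with a history of finding faults (illustrative factor),
        # then normalize by run time so cheaper tests rank higher.
        score = base * (1.0 + history_bonus * fault_history)
        return score / max(runtime_seconds, 1e-9)

    remaining = {1, 2, 3, 4}
    print(extended_weight({1, 2}, remaining))                           # 2.0
    print(extended_weight({1, 2}, remaining, fault_history=3))          # 5.0
    print(extended_weight({1, 2, 3}, remaining, runtime_seconds=10.0))  # 0.3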

