1 Improving Cluster Selection Techniques of Regression Testing by Slice Filtering Yongwei Duan, Zhenyu Chen, Zhihong Zhao, Ju Qian and Zhongjun Yang Software Institute, Nanjing University, Nanjing, China
2 Outline Introduction Our Approach Experiment and Evaluation Future Work
3 Introduction Test selection techniques Cluster selection techniques Problems
4 Test selection techniques – Rerunning all existing test cases is costly in regression testing. – Test selection techniques choose a subset of test cases to rerun.
5 Cluster Selection Cluster selection overview: run test cases → collect execution profiles (basic block level) → clustering → clusters of test cases → sampling → a reduced test suite
6 Problems Too much data to cluster – Huge amount of execution traces – Always high-dimensional Just focus on the code fragments that are actually relevant to the program modification!
7 Our approach Overview Slice filtering Clustering analysis Sampling
8 Our approach Overview: running test cases → execution traces → trace filtering → filtered traces → cluster analysis → clusters → sampling → reduced test suite
9 Slice filtering – The execution traces are too detailed to be used directly in clustering analysis. – We use a program slice to filter out fragments that are irrelevant to the program modification.
10 Slice filtering contd – Statement 2 is changed from if(m<n) to if(m<=n). – We compute a program slice with respect to statement 2 and intersect it with each execution trace. – Given 3 test cases, we compare their execution traces and filtered execution traces.
11 Slice filtering contd
Test case | Input (m n) | Execution trace (statement no.) | Statement no. after filtering
t1 | 10 | 1,2,4,5,6,7,8,9,10,11,12,13,14 | 2,4,5,6,7,8
t2 | 0 | 1,2,3,5,6,7,8,9,10,11,12,13,14 | 2,3,5,6,7,8
t3 | 1 | 1,2,3,5,6,7,8,9 | 2,3,5,6,7,8
– Execution traces are much smaller after program slice filtering. – After filtering, the traces of t2 and t3 are identical, while the difference between t1 and t2 is magnified. – To condense the traces further, adjacent statements within a basic block are combined into one statement. – Patterns are easier to reveal in simpler execution traces.
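The intersection step on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_trace` is hypothetical, and the slice is assumed to be statements 2 through 8, which is consistent with the filtered column of the table but is not stated on the slide.

```python
def filter_trace(trace, program_slice):
    """Keep only the executed statements that also belong to the slice."""
    return [stmt for stmt in trace if stmt in program_slice]


# Assumed slice with respect to statement 2 (not given explicitly in the slides).
program_slice = {2, 3, 4, 5, 6, 7, 8}

# Execution traces of the three test cases, copied from the table.
traces = {
    "t1": [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t2": [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "t3": [1, 2, 3, 5, 6, 7, 8, 9],
}

filtered = {name: filter_trace(t, program_slice) for name, t in traces.items()}
```

Under that assumption, t2 and t3 collapse to the same filtered trace, while t1 still differs from them at the branch taken after statement 2.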
12 Slice filtering contd – But the number of test cases is still large. – If a trace is too small (below a threshold) after intersection with the program slice, it is unlikely to be a fault-revealing test case, so we remove it from the test suite.
13 Slice filtering contd Filtering rate – We define the filtering rate FR as follows: if the threshold is M and the size of the program slice is N, then FR = M / N * 100%. – When FR gets lower, the effect of filtering diminishes, i.e., fewer features can be eliminated.
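The threshold rule can be sketched as below. It is an illustration under stated assumptions: `apply_filtering_rate` is a hypothetical name, FR is passed as a fraction (0.5 rather than 50%), and the size of a filtered trace is assumed to be its number of distinct statements, which the slides do not specify.

```python
def apply_filtering_rate(filtered_traces, slice_size, fr):
    """Drop test cases whose filtered trace is smaller than M = FR * N."""
    threshold = fr * slice_size  # M, derived from FR = M / N
    return {
        name: trace
        for name, trace in filtered_traces.items()
        # Assumption: trace size = number of distinct statements retained.
        if len(set(trace)) >= threshold
    }


# Hypothetical data: a slice of 10 statements and two filtered traces.
example = {"a": [2, 3, 5, 6, 7, 8], "b": [2, 3]}
kept = apply_filtering_rate(example, slice_size=10, fr=0.5)
```

Here the threshold is 5 statements, so test case `a` is kept and `b` is removed from the suite.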
14 Slice filtering contd Why not just use dynamic slicing? – Computing dynamic slices is complex and time-consuming. – Effective dynamic slicing tools are hard to come by.
15 Clustering analysis Distance measure – A filtered trace is a vector $f_i = \langle a_{i1}, a_{i2}, \dots, a_{in} \rangle$, where $a_{ij}$ is the execution count of a basic block. The distance between two filtered traces $f_i$ and $f_j$ is:
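The distance formula itself is not preserved in this text. As a sketch only, the code below assumes the Euclidean distance between two vectors of basic-block execution counts, a common choice for this kind of clustering. The function name `euclidean_distance` is hypothetical, and the authors' actual measure may differ.

```python
import math


def euclidean_distance(f_i, f_j):
    """Euclidean distance between two equal-length execution-count vectors."""
    if len(f_i) != len(f_j):
        raise ValueError("filtered traces must cover the same basic blocks")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# Hypothetical execution counts for three basic blocks.
d = euclidean_distance([1, 0, 2], [1, 3, 6])
```

Whatever measure is used, both vectors must index the same basic blocks in the same order, which is why the sketch rejects vectors of different lengths.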
16 Sampling – We use adaptive sampling in our approach. – We first sample a certain number of test cases from each cluster. If a sampled test case is fault-revealing, the entire cluster from which it was sampled is selected. – This strategy favors small clusters and has a high probability of selecting fault-revealing test cases.
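A minimal sketch of adaptive sampling follows. Several details are assumptions rather than the authors' settings: the name `adaptive_sample`, the per-cluster sampling rate of 20% with at least one test case per cluster, and the `is_fault_revealing` callback, which stands in for actually running a sampled test case and checking its outcome.

```python
import math
import random


def adaptive_sample(clusters, is_fault_revealing, rate=0.2, rng=None):
    """Sample each cluster; pull in the whole cluster if a sample fails."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    selected = set()
    for cluster in clusters:
        if not cluster:
            continue
        # Assumption: sample a fixed fraction, but at least one test case.
        k = max(1, math.ceil(rate * len(cluster)))
        sampled = rng.sample(cluster, k)
        selected.update(sampled)
        if any(is_fault_revealing(t) for t in sampled):
            selected.update(cluster)  # select the entire cluster
    return selected


# Hypothetical clusters: the small one holds the fault-revealing test cases.
clusters = [["t1", "t2"], ["t3", "t4", "t5", "t6", "t7"]]
failing = {"t1", "t2"}
chosen = adaptive_sample(clusters, lambda t: t in failing)
```

Because every non-empty cluster contributes at least one sample, a small cluster made up mostly of fault-revealing test cases is very likely to be selected in full, which is the sense in which the strategy favors small clusters.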
17 Experiment & Evaluation Subject program – space, from SIR (Software-artifact Infrastructure Repository) – 5902 LOC – 1533 basic blocks – 38 modified versions (one real fault per version) – test cases
18 Experiment & Evaluation Subject program Measurements Experimental results Observations
19 Experiment & Evaluation 3 measurements – Precision – Reduction – Recall
20 Experiment & Evaluation Precision – If, in a certain run, the technique selects a subset of N test cases, of which M are fault-revealing, the precision of the technique is M / N * 100%. – Precision measures the extent to which a selection method omits non-fault-revealing test cases in a run.
21 Experiment & Evaluation Reduction – If a selection technique selects M test cases out of all N existing test cases in a certain run, the reduction of the technique is M / N * 100%. – Reduction measures the extent to which a technique can reduce the size of the original test suite. – A low reduction means that a selection technique greatly reduces the original test suite.
22 Experiment & Evaluation Recall – If a selection technique selects M fault-revealing test cases out of N existing fault-revealing test cases in a certain run, the recall of the technique is M / N * 100%. – Recall measures the extent to which a selection technique can include fault-revealing test cases. – Recall indicates the fault-detecting capability of a technique. A safe selection technique achieves 100% recall.
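The three measurements can be computed together, as in the sketch below. The names `percentage` and `selection_metrics` are hypothetical, and returning 0 when a denominator is empty is an assumption made only to keep the example total.

```python
def percentage(part, whole):
    """part / whole * 100%, or 0.0 when the denominator is empty (assumed)."""
    return 100.0 * part / whole if whole else 0.0


def selection_metrics(selected, all_tests, fault_revealing):
    """Precision, reduction and recall of one selection run, in percent."""
    selected = set(selected)
    fault_revealing = set(fault_revealing)
    hits = selected & fault_revealing  # selected fault-revealing test cases
    return {
        "precision": percentage(len(hits), len(selected)),
        "reduction": percentage(len(selected), len(set(all_tests))),
        "recall": percentage(len(hits), len(fault_revealing)),
    }


# Hypothetical run: 10 test cases, 3 fault-revealing, 4 selected.
all_tests = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
metrics = selection_metrics({"a", "b", "c", "d"}, all_tests, {"a", "b", "e"})
```

In this made-up run, 2 of the 4 selected test cases are fault-revealing (precision 50%), 4 of 10 test cases are kept (reduction 40%), and 2 of the 3 fault-revealing test cases are included (recall about 66.7%).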
23 Experiment & Evaluation Experimental results – A comparison between our approach and Dejavu. Dejavu is known for the high precision of its test selection. – A comparison between 2 different filtering rates: FR = 0.3 and FR = 0.5
24 Experiment & Evaluation Comparison of precision between our approach when FR=0.3 and Dejavu
25 Experiment & Evaluation Comparison of reduction between our approach when FR=0.3 and Dejavu
26 Experiment & Evaluation Comparison of recall between our approach when FR=0.3 and Dejavu. We achieve some improvement, except on versions 13, 25, 26, 35, 37 and 38.
27 Experiment & Evaluation Analysis – The key to our approach is to isolate the fault-revealing test cases into small clusters. – Failures detected on versions 13, 25, 26, 35, 37 and 38 are mostly memory access violations, which cause premature termination of the execution flow. – Program slicing cannot predict runtime execution flow changes and therefore cannot provide enough information to differentiate these test cases and separate them into different clusters.
28 Experiment & Evaluation Comparison of precision between FR=0.3 and FR=0.5
29 Experiment & Evaluation Comparison of reduction between FR=0.3 and FR=0.5
30 Experiment & Evaluation Comparison of recall between FR=0.3 and FR=0.5. If we raise FR to 0.5, some improvement in precision, reduction and recall can be achieved.
31 Experiment & Evaluation Observations – For most versions, our approach has higher precision and lower reduction (lower is better) than Dejavu. This means that we can select the fault-revealing test cases from the original test suite while selecting relatively few non-fault-revealing test cases.
32 Experiment & Evaluation Observations – The effectiveness of our approach depends largely on the degree of isolation of fault-revealing test cases. By choosing appropriate parameters such as filtering rate, sampling rate and initial cluster number, we can enhance the level of isolation.
33 Future work We will try to answer the following questions in our future work: – How do distance metrics and clustering algorithms affect the results of cluster selection techniques? – Given a program, how can we find the best filtering rate and other parameters?
34 Q & A