Regression Testing
2 So far Unit testing System testing Test coverage All of these are about the first round of testing Testing is performed time to time during the software life cycle Test cases / oracles can be reused in all rounds Testing during the evolution phase is regression testing
3 Regression Testing When we try to enhance the software We may also bring in bugs The software works yesterday, but not today, it is called “regression” Numbers Empirical study on eclipse 2005 11% of commits are bug-inducing 24% of fixing commits are bug-inducing
4 Regression Example public int[] reverse(int[] origin){ int[] target = new int[origin.length]; int index = 0; while(index < origin.length - 1){ index++; target[origin.length-index] = origin[index]; } return target; } //bug, missing origin[0] public int[] reverse(int[] origin){ int[] target = new int[origin.length]; int index = 0; while(index < origin.length - 1){ index++; target[origin.length-index] = origin[index]; } target[origin.length-1] = origin[0] return target; } Regression, now crash when length of origin is 0
5 Regression Testing Run old test cases on the new version of software It will cost a lot if we run the whole suite each time Try to save time and cost for new rounds of testing Test prioritization Test relevant code Record and replay
6 Test prioritization Rank all the test cases Run test cases according to the ranked sequence Stop when resources are used up How to rank test cases To discover bugs sooner Or approximation: to achieve higher coverage sooner
7 APFD: Measurement of Test Prioritization Average Percentage of Fault Detected (APFD) Compare two test case sequences A number of faults (bugs) are detected after each test case The following two sequences, which is better? S1: T1 (2), t2(3), t3(5) S2: T2(1), t1(3), t3(5) APFD is the average of these numbers (normalized with the total number of faults), and 0 for initial state APFD (S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5 APFD (S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
8 APFD: Illustration APFD can be deemed as the area under the TestCase-Fault curve Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)
9 Coverage-based test case prioritization Code coverage based Require recorded code-coverage information in previous testing Combination coverage based Require input model Mutation coverage based Require recorded mutation-killing stats
10 Total Strategy The simplest strategy Always select the unselected test case that has the best coverage
11 Example Consider code coverage on five test cases: T1: s1, s3, s5 T2: s2, s3, s4, s5 T3: s3, s4, s5 T4: s6, s7 T5: s3, s5, s8, s9, s10 Ranking: T5, T2, T1 / T3, T4
12 Additional Strategy An adaption of total strategy Instead of always choosing the test case with highest coverage Choose the test case that result in most extra coverage Starts from the test case with highest coverage
13 Example Consider code coverage on five test cases: T1: s1, s3, s5 T2: s2, s3, s4, s5 T3: s3, s4, s5 T4: s6, s7 T5: s3, s5, s8, s9, s10 Ranking: T5(5), T2(2, s2, s4) / T4(2, s6, s7), T1(1, s1), T3
14 Combination-coverage based prioritization Use combination coverage instead of code coverage Total strategy does not work for combination coverage, why? Use additional strategy (for n-wise combinations) Example: input model: (coke, sprite), (icy, normal), (receipt, not) Test cases: {coke, icy, not}, {coke, normal, not}, {sprite, icy, receipt}, {sprite, normal, receipt} Ranking for 2-wise prioritization: {coke, icy, not}, {sprite, icy, receipt} (+3), {coke, normal, not} (+2), {sprite, normal, receipt} (+2)
15 Combination-coverage based prioritization Multi-wise coverage based prioritization Problem It may be not reasonable to consider combinations on only certain N-wise, (sprite, normal, receipt) > (sprite, icy, receipt) Multi-wise prioritization Select the test case with best additional 1-wise prioritization If there is a tie, go to 2-wise, and then 3-wise, … Results: {coke, icy, not}, {sprite, normal, receipt} (1-wise + 3), {coke, normal, not} (2-wise + 2, 3-wise + 1), {sprite, icy, receipt} (2-wise + 2, 3-wise + 1)
16 Mutation-coverage based prioritization Similar to code coverage based prioritization Run mutation testing for the test suite Use killed mutants of each test case as criteria Work for both total and additional strategy
17 Setting the threshold Prioritization help us to find bugs earlier Due to resource limit, we do not want to execute all test cases The testing should stop at some place in the prioritized rank list Resource limit Money, time Coverage based Cover all/certain percent of statements Cover all/certain percent of n-wise combinations Cover all/certain percent of mutations
18 Test Relevant Code Basic Idea: Only use test cases that cover the changed code Can be combined with test prioritization Give more priority to the test cases that cover more code affected by the change Determine the affected code with program slicing
19 Which test case is better? Consider the following change and test cases void main() { int sum, i; sum = 0; -> sum = 1; i = read; if(i >= 12){ String rep = report(invalid, i); sendReport(rep) }else{ while ( i<11 ) { sum = add(sum, i); i = add(i, 1); } } } Test case: 0 Test case: 13 Test case: 0 is better because it covers more code in the forward slice
20 Program slicing Observation The more a test case cover code affected by a change, the results of the test case is more likely to be changed Only test the part that are related to the revision Program slicing: Locating all parts in the code base that will be affected by the value of a variable
21 Program slicing Forward slice of variable v at statement s All the code that are either control or data depend on v at statement s Backward slice of variable v at statement s All the code that v at statement s depends on (either control or data dependency)
22 Data Dependencies Data dependencies are the dependency from the usage of a variable to the definition of the variable Example: s1: x = 3; s2: if(y > 5){ s3: y = y + x; //data depend on x in s1 s4: }
23 Control Dependencies Control dependencies are the dependency from the branch basic blocks to the predicate Example: s1: x = 3; s2: if(y > 5){ s3: y = y + x; //control depend on y in s2 s4: }
24 Example: call-site -> actual arguments void main() { int sum, i; sum = 0; i = 1; while ( i<11 ) { sum = add(sum, i); i = add(i, 1); }
25 Example: program slicing static int add(int a, int b){ return a + b; }
26 Example: Inter-Procedure sum = add(sum, i); i = add(i, 1); static int add(int a, int b){ return a + b; }
27 Example: Full dependence graph
28 Program slicing for sum = 0 -> sum = 1
29 Context Sensitivity A property that measures whether an analysis is sensitive to the method-invocation context The actual in / out of a method invocation should match with each other The actual in / out of different invocations should not How to do this Bracket matching Consider actual in / out of $0 to be ‘(’ and ‘)’ Consider actual in / out of $1 to be ‘{’ and ‘}’ A real path should have all brackets matched
30 Program slicing for sum = 0 -> sum = 1
31 Program Slicing based Test Selection Retrieve the forward slice of the changed code Select test cases that will cover more statements in the forward slice void main() { int sum, i; sum = 0; -> sum = 1; i = read; if(i >= 12){ String rep = report(invalid, i); sendReport(rep) } while ( i<11 ) { sum = add(sum, i); i = add(i, 1); } Test case: 0 Test case: 13 Test case: 0 is better because it covers more code in the forward slice
32 Record and Replay A resource waste in regression testing We change the code a little bit We need to run all the unchanged code in the test execution Record and Replay For all/some of the unchanged modules Do not run the modules Use the results of previous test instead
33 Record and Replay Example Testing an expert system for finance Has two components, UI and interest calculator (based on the inputs from UI) In first round of testing, store as a map the results of interest calculator: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7% In regression testing, if the change is made on UI, you can rerun the software with the data map Recording more objects means saving more time in regression testing, should we record every object???
34 Pros & Cons Pros Saving time in regression testing Cons Be careful when recording non-deterministic components E.g., recording getSystemTime(), may conflict with another call Spend a lot of time for recording data maps Stored data map can be too huge When the stored object is changed, the data map requires updates
35 Selection of recorded modules Rules Record time consuming modules So that you save more time The recorded module should be stable E.g., libraries The interface should contain a small data flow E.g., numeric inputs and return values
36 Selection of recording modules Recording UI Components Recording Internet Components Recording components that will affect real world Sending an Transfer money from credit cards
37 Review of Regression Testing Test Prioritization Try only the most important test cases Test Relevant Code Try the most relevant test cases Record and Replay Reuse the execution results of previous test cases