So far
- Unit testing
- System testing
- Test coverage
- All of these are about the first round of testing
- Testing is performed from time to time throughout the software life cycle
- Test cases and oracles can be reused in all rounds
- Testing during the evolution phase is regression testing

Regression Testing
- When we try to enhance the software, we may also bring in bugs
- When software that worked yesterday no longer works today, it is called a “regression”
- Numbers
  - Empirical study on Eclipse (2005)
  - 11% of commits are bug-inducing
  - 24% of fixing commits are bug-inducing

Regression Example

    // Version 1: bug, origin[0] is never copied
    public int[] reverse(int[] origin) {
        int[] target = new int[origin.length];
        int index = 0;
        while (index < origin.length - 1) {
            index++;
            // element at position index goes to the mirrored position
            target[origin.length - 1 - index] = origin[index];
        }
        return target;
    }

    // Version 2: copies origin[0], but assumes the array is non-empty
    public int[] reverse(int[] origin) {
        int[] target = new int[origin.length];
        int index = 0;
        while (index < origin.length - 1) {
            index++;
            target[origin.length - 1 - index] = origin[index];
        }
        target[origin.length - 1] = origin[0];
        return target;
    }

Regression: version 2 now crashes when the length of origin is 0
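
The following is a minimal sketch of how rerunning an old test input would expose this regression. It assumes the loop writes to `target[origin.length - 1 - index]`, and the names `ReverseRegression`, `reverseV2`, and `crashes` are hypothetical helpers introduced only for illustration.

```java
import java.util.Arrays;

class ReverseRegression {
    // Version 2 of reverse (assumed indices): copies origin[0] at the end,
    // which fails when the array is empty.
    static int[] reverseV2(int[] origin) {
        int[] target = new int[origin.length];
        int index = 0;
        while (index < origin.length - 1) {
            index++;
            target[origin.length - 1 - index] = origin[index];
        }
        target[origin.length - 1] = origin[0];
        return target;
    }

    // Returns true if reverseV2 throws on the given input.
    static boolean crashes(int[] input) {
        try {
            reverseV2(input);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A non-empty input passes on version 2.
        System.out.println(Arrays.toString(reverseV2(new int[] {1, 2, 3}))); // [3, 2, 1]
        // An old test case with an empty array reveals the regression.
        System.out.println(crashes(new int[0])); // true
    }
}
```

An empty-array test case written for version 1 would pass there and fail here, which is why old test cases are worth rerunning.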

Regression Testing
- Run old test cases on the new version of the software
- Running the whole suite each time costs a lot
- Try to save time and cost in new rounds of testing
  - Test prioritization
  - Test relevant code
  - Record and replay

Test prioritization
- Rank all the test cases
- Run test cases in ranked order
- Stop when resources are used up
- How to rank test cases
  - To discover bugs sooner
  - Or, as an approximation, to achieve higher coverage sooner

APFD: Measurement of Test Prioritization
- Average Percentage of Faults Detected (APFD)
- Used to compare two test case sequences
- After each test case, a cumulative number of faults (bugs) has been detected
- Of the following two sequences, which is better? (cumulative faults detected in parentheses)
  - S1: t1(2), t2(3), t3(5)
  - S2: t2(1), t1(3), t3(5)
- APFD is the average of these numbers (normalized by the total number of faults), with 0 for the initial state
  - APFD(S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5
  - APFD(S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
- S1 has the higher APFD, so it is the better sequence
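
The averaging above can be sketched in a few lines of Java. This follows the slide's simplified definition (average of the normalized cumulative counts, plus a 0 for the initial state); the class and method names are hypothetical.

```java
class Apfd {
    // cumulativeFaults[i] is the number of faults detected after running
    // the first i + 1 test cases of the sequence.
    static double apfd(int[] cumulativeFaults, int totalFaults) {
        double sum = 0.0; // the initial state contributes 0
        for (int detected : cumulativeFaults) {
            sum += (double) detected / totalFaults;
        }
        // Divide by the number of test cases plus the initial state.
        return sum / (cumulativeFaults.length + 1);
    }

    public static void main(String[] args) {
        System.out.println("APFD(S1) = " + apfd(new int[] {2, 3, 5}, 5));
        System.out.println("APFD(S2) = " + apfd(new int[] {1, 3, 5}, 5));
    }
}
```

The two calls reproduce the values 0.5 and 0.45 up to floating-point rounding.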

APFD: Illustration
- APFD can be viewed as the area under the curve of faults detected against test cases executed
- Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)

Coverage-based test case prioritization
- Code coverage based
  - Requires code-coverage information recorded in previous testing
- Combination coverage based
  - Requires an input model
- Mutation coverage based
  - Requires recorded mutation-killing statistics

Total Strategy
- The simplest strategy
- Always select the unselected test case that has the highest coverage

Example
- Consider code coverage on five test cases:
  - T1: s1, s3, s5
  - T2: s2, s3, s4, s5
  - T3: s3, s4, s5
  - T4: s6, s7
  - T5: s3, s5, s8, s9, s10
- Ranking: T5, T2, T1 / T3, T4
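
A minimal sketch of the total strategy on this example, assuming ties are broken by input order (the slide leaves T1 / T3 unordered). The class name and the `example()` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TotalStrategy {
    // Sorts test cases by the number of statements they cover, largest first.
    // List.sort is stable, so ties keep their input order.
    static List<String> rank(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort((a, b) -> Integer.compare(coverage.get(b).size(), coverage.get(a).size()));
        return order;
    }

    // Coverage data from the example slide.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> c = new LinkedHashMap<>();
        c.put("T1", new HashSet<>(Arrays.asList("s1", "s3", "s5")));
        c.put("T2", new HashSet<>(Arrays.asList("s2", "s3", "s4", "s5")));
        c.put("T3", new HashSet<>(Arrays.asList("s3", "s4", "s5")));
        c.put("T4", new HashSet<>(Arrays.asList("s6", "s7")));
        c.put("T5", new HashSet<>(Arrays.asList("s3", "s5", "s8", "s9", "s10")));
        return c;
    }

    public static void main(String[] args) {
        System.out.println(rank(example())); // [T5, T2, T1, T3, T4]
    }
}
```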

Additional Strategy
- An adaptation of the total strategy
- Instead of always choosing the test case with the highest coverage, choose the test case that results in the most extra coverage
- Starts from the test case with the highest coverage

Example
- Consider code coverage on five test cases:
  - T1: s1, s3, s5
  - T2: s2, s3, s4, s5
  - T3: s3, s4, s5
  - T4: s6, s7
  - T5: s3, s5, s8, s9, s10
- Ranking (extra coverage in parentheses): T5 (5), T2 (2: s2, s4) / T4 (2: s6, s7), T1 (1: s1), T3 (0)
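
A minimal sketch of the additional strategy as a greedy loop, assuming ties are broken by input order (so T2 is picked before T4). The class name and the `example()` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdditionalStrategy {
    // Repeatedly picks the test case that adds the most not-yet-covered statements.
    static List<String> rank(Map<String, Set<String>> coverage) {
        List<String> remaining = new ArrayList<>(coverage.keySet());
        Set<String> covered = new HashSet<>();
        List<String> order = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = -1;
            for (String t : remaining) {
                Set<String> extra = new HashSet<>(coverage.get(t));
                extra.removeAll(covered);
                if (extra.size() > bestGain) { // strict: first test case wins a tie
                    best = t;
                    bestGain = extra.size();
                }
            }
            order.add(best);
            remaining.remove(best);
            covered.addAll(coverage.get(best));
        }
        return order;
    }

    // Coverage data from the example slide.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> c = new LinkedHashMap<>();
        c.put("T1", new HashSet<>(Arrays.asList("s1", "s3", "s5")));
        c.put("T2", new HashSet<>(Arrays.asList("s2", "s3", "s4", "s5")));
        c.put("T3", new HashSet<>(Arrays.asList("s3", "s4", "s5")));
        c.put("T4", new HashSet<>(Arrays.asList("s6", "s7")));
        c.put("T5", new HashSet<>(Arrays.asList("s3", "s5", "s8", "s9", "s10")));
        return c;
    }

    public static void main(String[] args) {
        System.out.println(rank(example())); // [T5, T2, T4, T1, T3]
    }
}
```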

Combination-coverage based prioritization
- Use combination coverage instead of code coverage
- The total strategy does not work for combination coverage. Why? (Every test case covers the same number of n-wise combinations, so all test cases tie.)
- Use the additional strategy (for n-wise combinations)
- Example
  - Input model: (coke, sprite), (icy, normal), (receipt, not)
  - Test cases: {coke, icy, not}, {coke, normal, not}, {sprite, icy, receipt}, {sprite, normal, receipt}
  - Ranking for 2-wise prioritization: {coke, icy, not}, {sprite, icy, receipt} (+3), {coke, normal, not} (+2), {sprite, normal, receipt} (+2)
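
The 2-wise ranking can be sketched with the same greedy loop, counting newly covered value pairs instead of statements. This sketch assumes ties are broken by input order, which is one of several valid choices; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PairwisePrioritization {
    // All 2-wise combinations (value pairs from two different parameters) in one test case.
    static Set<String> pairs(List<String> test) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i < test.size(); i++) {
            for (int j = i + 1; j < test.size(); j++) {
                result.add(i + "=" + test.get(i) + "," + j + "=" + test.get(j));
            }
        }
        return result;
    }

    // Additional strategy over pairs: pick the test case with the most uncovered pairs.
    static List<List<String>> rank(List<List<String>> tests) {
        List<List<String>> remaining = new ArrayList<>(tests);
        List<List<String>> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        while (!remaining.isEmpty()) {
            List<String> best = null;
            int bestGain = -1;
            for (List<String> t : remaining) {
                Set<String> extra = pairs(t);
                extra.removeAll(covered);
                if (extra.size() > bestGain) { // first test case wins a tie
                    best = t;
                    bestGain = extra.size();
                }
            }
            order.add(best);
            remaining.remove(best);
            covered.addAll(pairs(best));
        }
        return order;
    }

    public static void main(String[] args) {
        List<List<String>> tests = Arrays.asList(
            Arrays.asList("coke", "icy", "not"),
            Arrays.asList("coke", "normal", "not"),
            Arrays.asList("sprite", "icy", "receipt"),
            Arrays.asList("sprite", "normal", "receipt"));
        for (List<String> t : rank(tests)) {
            System.out.println(t);
        }
    }
}
```

With this tie-breaking rule the order is the one on the slide: {coke, icy, not}, {sprite, icy, receipt}, {coke, normal, not}, {sprite, normal, receipt}.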

Combination-coverage based prioritization
- Multi-wise coverage based prioritization
- Problem
  - It may not be reasonable to consider combinations for only one fixed N: after {coke, icy, not}, {sprite, normal, receipt} should rank above {sprite, icy, receipt}, because it covers more new single values
- Multi-wise prioritization
  - Select the test case with the best additional 1-wise coverage
  - If there is a tie, go to 2-wise, and then 3-wise, …
- Results: {coke, icy, not}, {sprite, normal, receipt} (1-wise +3), {coke, normal, not} (2-wise +2, 3-wise +1), {sprite, icy, receipt} (2-wise +2, 3-wise +1)
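
A sketch of the multi-wise tie-breaking rule: each candidate gets a vector of gains (1-wise, 2-wise, ..., k-wise) and candidates are compared lexicographically. Breaking any remaining tie by input order is an assumption of this sketch, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MultiWisePrioritization {
    // All n-wise combinations of one test case, encoded as strings.
    static Set<String> combos(List<String> test, int n) {
        Set<String> result = new HashSet<>();
        for (int mask = 0; mask < (1 << test.size()); mask++) {
            if (Integer.bitCount(mask) != n) continue;
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < test.size(); i++) {
                if ((mask & (1 << i)) != 0) key.append(i).append('=').append(test.get(i)).append(';');
            }
            result.add(key.toString());
        }
        return result;
    }

    // gains[n - 1] is the number of not-yet-covered n-wise combinations in the test case.
    static int[] gains(List<String> test, List<Set<String>> covered) {
        int[] g = new int[covered.size()];
        for (int n = 1; n <= covered.size(); n++) {
            Set<String> extra = combos(test, n);
            extra.removeAll(covered.get(n - 1));
            g[n - 1] = extra.size();
        }
        return g;
    }

    // True if a beats b lexicographically (1-wise first, then 2-wise, ...).
    static boolean better(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return a[i] > b[i];
        }
        return false;
    }

    static List<List<String>> rank(List<List<String>> tests) {
        List<List<String>> order = new ArrayList<>();
        if (tests.isEmpty()) return order;
        int maxN = tests.get(0).size();
        List<Set<String>> covered = new ArrayList<>();
        for (int n = 0; n < maxN; n++) covered.add(new HashSet<>());
        List<List<String>> remaining = new ArrayList<>(tests);
        while (!remaining.isEmpty()) {
            List<String> best = null;
            int[] bestGains = null;
            for (List<String> t : remaining) {
                int[] g = gains(t, covered);
                if (best == null || better(g, bestGains)) {
                    best = t;
                    bestGains = g;
                }
            }
            order.add(best);
            remaining.remove(best);
            for (int n = 1; n <= maxN; n++) covered.get(n - 1).addAll(combos(best, n));
        }
        return order;
    }

    public static void main(String[] args) {
        List<List<String>> tests = Arrays.asList(
            Arrays.asList("coke", "icy", "not"),
            Arrays.asList("coke", "normal", "not"),
            Arrays.asList("sprite", "icy", "receipt"),
            Arrays.asList("sprite", "normal", "receipt"));
        for (List<String> t : rank(tests)) {
            System.out.println(t);
        }
    }
}
```

On the drink example this yields {coke, icy, not}, {sprite, normal, receipt}, {coke, normal, not}, {sprite, icy, receipt}.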

Mutation-coverage based prioritization
- Similar to code coverage based prioritization
- Run mutation testing for the test suite
- Use the mutants killed by each test case as the criterion
- Works for both the total and the additional strategy

Setting the threshold
- Prioritization helps us find bugs earlier
- Due to resource limits, we do not want to execute all test cases
- Testing should stop at some point in the prioritized list
- Resource limit
  - Money, time
- Coverage based
  - Cover all or a certain percentage of statements
  - Cover all or a certain percentage of n-wise combinations
  - Cover all or a certain percentage of mutants
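
A sketch of a coverage-based threshold: walk the prioritized list and stop once the covered fraction of statements reaches the target. The names, the 80% target, and the assumption that the program has ten statements (s1 to s10) are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CoverageThreshold {
    // Returns the prefix of the ranked list that is needed to reach the threshold.
    static List<String> selectUntil(List<String> ranked, Map<String, Set<String>> coverage,
                                    int totalStatements, double threshold) {
        Set<String> covered = new HashSet<>();
        List<String> selected = new ArrayList<>();
        for (String t : ranked) {
            if ((double) covered.size() / totalStatements >= threshold) break;
            selected.add(t);
            covered.addAll(coverage.get(t));
        }
        return selected;
    }

    // Statement coverage for five hypothetical test cases.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> c = new HashMap<>();
        c.put("T1", new HashSet<>(Arrays.asList("s1", "s3", "s5")));
        c.put("T2", new HashSet<>(Arrays.asList("s2", "s3", "s4", "s5")));
        c.put("T3", new HashSet<>(Arrays.asList("s3", "s4", "s5")));
        c.put("T4", new HashSet<>(Arrays.asList("s6", "s7")));
        c.put("T5", new HashSet<>(Arrays.asList("s3", "s5", "s8", "s9", "s10")));
        return c;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("T5", "T2", "T4", "T1", "T3");
        // Coverage grows 5/10, 7/10, 9/10, so three test cases reach 80%.
        System.out.println(selectUntil(ranked, example(), 10, 0.8)); // [T5, T2, T4]
    }
}
```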

Test Relevant Code
- Basic idea:
  - Only use test cases that cover the changed code
- Can be combined with test prioritization
  - Give higher priority to the test cases that cover more code affected by the change
  - Determine the affected code with program slicing

Which test case is better?
- Consider the following change and test cases

    void main() {
        int sum, i;
        sum = 0;    // changed to: sum = 1;
        i = read;
        if (i >= 12) {
            String rep = report(invalid, i);
            sendReport(rep);
        } else {
            while (i < 11) {
                sum = add(sum, i);
                i = add(i, 1);
            }
        }
    }

- Test case: 0
- Test case: 13
- Test case 0 is better because it covers more code in the forward slice

Program slicing
- Observation
  - The more change-affected code a test case covers, the more likely its result is to change
  - Only test the parts that are related to the revision
- Program slicing:
  - Locating all parts of the code base that are affected by the value of a variable

Program slicing
- Forward slice of variable v at statement s
  - All the code that is control- or data-dependent on v at statement s
- Backward slice of variable v at statement s
  - All the code that v at statement s depends on (through control or data dependencies)

Data Dependencies
- A data dependency runs from a use of a variable to the definition of that variable
- Example:

    s1: x = 3;
    s2: if (y > 5) {
    s3:     y = y + x;   // data-dependent on x in s1
    s4: }

Control Dependencies
- A control dependency runs from the basic blocks of a branch to the branch predicate
- Example:

    s1: x = 3;
    s2: if (y > 5) {
    s3:     y = y + x;   // control-dependent on y in s2
    s4: }
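
Once data and control dependencies are recorded as edges, both slices reduce to graph reachability. The sketch below works at statement granularity on the three-statement example (s3 depends on s1 through data and on s2 through control). That granularity is a simplifying assumption, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependenceSlice {
    // Collects every node reachable from start by following the given edges.
    static Set<String> reach(String start, Map<String, List<String>> edges) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String s = work.pop();
            if (seen.add(s)) {
                for (String next : edges.getOrDefault(s, Collections.<String>emptyList())) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // Flips every edge, turning "depends on" into "is depended on by".
    static Map<String, List<String>> reverse(Map<String, List<String>> edges) {
        Map<String, List<String>> out = new HashMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            for (String to : e.getValue()) {
                out.computeIfAbsent(to, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return out;
    }

    // s3 is data-dependent on s1 and control-dependent on s2.
    static Map<String, List<String>> example() {
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("s3", Arrays.asList("s1", "s2"));
        return dependsOn;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = example();
        // Backward slice: everything s3 depends on.
        System.out.println("backward(s3) = " + reach("s3", dependsOn));
        // Forward slice: everything that depends on s1.
        System.out.println("forward(s1) = " + reach("s1", reverse(dependsOn)));
    }
}
```

The backward slice of s3 contains s1, s2, and s3; the forward slice of s1 contains s1 and s3.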

Example: call-site -> actual arguments

    void main() {
        int sum, i;
        sum = 0;
        i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
    }

Example: program slicing

    static int add(int a, int b) {
        return a + b;
    }

Example: Inter-Procedural

    sum = add(sum, i);
    i = add(i, 1);

    static int add(int a, int b) {
        return a + b;
    }

Example: Full dependence graph

Program slicing for sum = 0 -> sum = 1

Context Sensitivity
- A property that describes whether an analysis is sensitive to the method-invocation context
  - The actual-in and actual-out of the same method invocation should match each other
  - The actual-in and actual-out of different invocations should not
- How to do this
  - Bracket matching
  - Consider the actual-in / actual-out of $0 to be ‘(’ and ‘)’
  - Consider the actual-in / actual-out of $1 to be ‘{’ and ‘}’
  - A real path should have all brackets matched
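
The bracket rule can be checked with a stack. The sketch below encodes a path as a string of bracket characters and follows the rule exactly as stated above (every bracket must be matched). The string encoding and the names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketMatching {
    // Returns true if the call/return brackets along a path are all matched.
    // '(' and ')' stand for entering and leaving one call site,
    // '{' and '}' for another; any other character is ignored.
    static boolean isMatched(String path) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : path.toCharArray()) {
            if (c == '(' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == '}') {
                if (stack.isEmpty()) return false;
                char open = stack.pop();
                if ((c == ')' && open != '(') || (c == '}' && open != '{')) return false;
            }
        }
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        // Enter add through the first call site and return to the same one: a real path.
        System.out.println(isMatched("()")); // true
        // Enter through the first call site but return to the second: not a real path.
        System.out.println(isMatched("(}")); // false
    }
}
```

A context-insensitive slicer would follow the second path and wrongly conclude that the value passed in at one call site flows out at the other.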

Program slicing for sum = 0 -> sum = 1

Program Slicing based Test Selection
- Retrieve the forward slice of the changed code
- Select test cases that cover more statements in the forward slice

    void main() {
        int sum, i;
        sum = 0;    // changed to: sum = 1;
        i = read;
        if (i >= 12) {
            String rep = report(invalid, i);
            sendReport(rep);
        }
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
    }

- Test case: 0
- Test case: 13
- Test case 0 is better because it covers more code in the forward slice
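
Selection by slice overlap can be sketched as a sort on the size of the intersection between each test case's covered statements and the forward slice. The statement labels, the hand-computed slice, and the coverage sets below are assumptions made for this illustration, not the output of a slicing tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SliceBasedSelection {
    // Number of statements that a test case covers inside the forward slice.
    static int overlap(Set<String> covered, Set<String> slice) {
        Set<String> common = new HashSet<>(covered);
        common.retainAll(slice);
        return common.size();
    }

    // Ranks test cases by how much of the forward slice they cover, largest first.
    static List<String> rank(Map<String, Set<String>> coverage, Set<String> slice) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort((a, b) -> Integer.compare(
            overlap(coverage.get(b), slice), overlap(coverage.get(a), slice)));
        return order;
    }

    // Assumed forward slice of the changed statement "sum = 1".
    static Set<String> exampleSlice() {
        return new HashSet<>(Arrays.asList("sum = 1", "sum = add(sum, i)", "return a + b"));
    }

    // Assumed statement coverage of the two test inputs.
    static Map<String, Set<String>> exampleCoverage() {
        Map<String, Set<String>> c = new LinkedHashMap<>();
        c.put("input 13", new HashSet<>(Arrays.asList(
            "sum = 1", "i = read", "if", "report", "sendReport", "while")));
        c.put("input 0", new HashSet<>(Arrays.asList(
            "sum = 1", "i = read", "if", "while", "sum = add(sum, i)", "i = add(i, 1)", "return a + b")));
        return c;
    }

    public static void main(String[] args) {
        System.out.println(rank(exampleCoverage(), exampleSlice())); // [input 0, input 13]
    }
}
```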

Record and Replay
- A source of wasted resources in regression testing
  - We change the code only a little
  - Yet we still run all the unchanged code during test execution
- Record and replay
  - For all or some of the unchanged modules
  - Do not run the modules
  - Use the results of the previous test instead

Record and Replay
- Example
  - Testing an expert system for finance
  - It has two components: a UI and an interest calculator (based on the inputs from the UI)
  - In the first round of testing, store the results of the interest calculator as a map: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7%
  - In regression testing, if the change is made to the UI, you can rerun the software with the data map
- Recording more objects saves more time in regression testing; should we record every object?
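
A sketch of the data-map idea, assuming the interest calculator can be modeled as a function of two string inputs. `Recorder` wraps the real component in the first round and stores each result; `Replayer` answers from the stored map in later rounds without running the component. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class RecordReplay {
    // First round: call the real component and remember every result.
    static class Recorder implements BiFunction<String, String, Double> {
        private final BiFunction<String, String, Double> real;
        final Map<String, Double> recorded = new HashMap<>();

        Recorder(BiFunction<String, String, Double> real) {
            this.real = real;
        }

        @Override
        public Double apply(String x, String y) {
            Double result = real.apply(x, y);
            recorded.put(x + "," + y, result);
            return result;
        }
    }

    // Regression round: answer from the data map instead of running the component.
    static class Replayer implements BiFunction<String, String, Double> {
        private final Map<String, Double> recorded;

        Replayer(Map<String, Double> recorded) {
            this.recorded = recorded;
        }

        @Override
        public Double apply(String x, String y) {
            Double result = recorded.get(x + "," + y);
            if (result == null) {
                throw new IllegalStateException("no recording for (" + x + ", " + y + ")");
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the expensive interest calculator.
        BiFunction<String, String, Double> calculator = (x, y) -> 5.0;
        Recorder recorder = new Recorder(calculator);
        recorder.apply("a", "b");
        Replayer replayer = new Replayer(recorder.recorded);
        System.out.println(replayer.apply("a", "b")); // 5.0
    }
}
```

Failing loudly on an unrecorded input is a deliberate choice here: a silent fallback would hide the fact that the data map no longer matches the test.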

Pros & Cons
- Pros
  - Saves time in regression testing
- Cons
  - Be careful when recording non-deterministic components
    - E.g., a recorded getSystemTime() result may conflict with another call
  - Recording data maps takes a lot of time
  - The stored data map can become huge
  - When the recorded object changes, the data map must be updated

Selection of recorded modules
- Rules
  - Record time-consuming modules
    - So that you save more time
  - The recorded module should be stable
    - E.g., libraries
  - The interface should carry a small data flow
    - E.g., numeric inputs and return values

Selection of recorded modules
- Recording UI components
- Recording Internet components
- Recording components that affect the real world
  - Sending an email
  - Transferring money from credit cards

Review of Regression Testing
- Test Prioritization
  - Try only the most important test cases
- Test Relevant Code
  - Try the most relevant test cases
- Record and Replay
  - Reuse the execution results of previous test cases

