Data Dependence Based Testability Transformation in Automated Test Generation Presented by: Qi Zhang
Outline: Introduction to test data generation; Test data generation methods; Data dependence oriented test generation; Testability transformation in test data generation; Conclusions
Test Data Generation Problem: Given a target, find a program input on which the target is executed.
Example
#include <stdbool.h>
#include <stdio.h>

void F(int a[10], int b[10], int target) {
    int i;
    bool fa, fb;
    i = 1;
    fa = false;
    fb = false;
    while (i < 10) {
        if (a[i] == target) fa = true;
        i = i + 1;
    }
    if (fa == true) {
        i = 1;
        fb = true;
        while (i < 10) {
            if (b[i] != target) fb = false;
            i = i + 1;
        }
        if (fb == true) printf("message1");
        else printf("message2");
    }
}
target statement
Target: a statement; a branch; a path; a data flow; a multiple condition; an assertion; a specific output value; …
Applications of Test Data Generation: code-based (white-box) testing; identification of program properties; specification-based testing; testing specification conformance; …
Test Data Generation Methods: random test generation; path-oriented test generation; symbolic execution oriented test generation; execution-oriented test generation; goal-oriented test generation; chaining approach of test generation; simulated annealing; evolutionary algorithms; …
Path-Oriented Test Generation: Given a target statement S, select a path P to S and try to find an input that executes P. If no input is found, select another path. If an input is found, the result is an input that executes path P and target statement S.
Example for path-oriented test generation
input(a, n);
max = a[1];
min = a[1];
i = 2;
while (i <= n) {
    if (max < a[i]) max = a[i];
    if (min > a[i]) min = a[i];
    i = i + 1;
}
output(min, max);
Path-Oriented Test Generation: Two methods for finding an input that executes the selected path are symbolic execution oriented test generation and execution-oriented test generation.
Path-Oriented Test Generation: Problems: Selected paths are frequently non-executable. A lot of search effort is "wasted" on non-executable paths. The approach is considered restrictive in the presence of loops.
Goal-Oriented Test Generation: Paths are not selected. The approach is based on actual program execution and uses a control flow graph of the program. It solves problems (sub-goals) as they occur in order to reach the target statement. Fitness functions are used to guide the search.
Goal-Oriented Test Generation: Execute the program on any input. At each branch, the execution either may still lead to the target statement or does not lead to the target. A node at which the execution turns away from the target is a problem node.
Goal-Oriented Test Generation: example
input(a, n);
max = a[1];
min = a[1];
i = 2;
while (i <= n) {
    if (max < a[i]) max = a[i];
    if (min > a[i]) min = a[i];
    i = i + 1;
}
output(min, max);
Initial input: a = {2, 7}, n = -5. With this input the loop condition (i <= n) is false on its first evaluation, so this execution does not lead to the target statement. The fitness function at the loop condition is F = i - n = 2 - (-5) = 7. Find new values of a and n such that F <= 0.
Goal-Oriented Test Generation: Many search algorithms can be used to find a new program input based on the fitness function, for example a hill-climbing algorithm, simulated annealing, or an evolutionary algorithm.
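As an illustration of the first of these options, the sketch below applies a very simple hill-climbing search to the min/max example, where the fitness function at the loop condition is F = i - n and i is 2 when the condition is first evaluated. The names fitness and hill_climb are hypothetical, and the search varies only n. A real test generator would also vary the array a and would obtain F by executing the program.

```c
#include <assert.h>

/* F = i - n at the first evaluation of the loop condition (i <= n).
   i is always 2 at that point, so F depends only on n. The loop body
   is entered when F <= 0. */
int fitness(int n) {
    return 2 - n;
}

/* Hypothetical hill climbing over n: move to a neighbouring value
   (n + 1 or n - 1) whenever it lowers F. Stops when F <= 0, when no
   neighbour improves F, or after max_iters steps. Returns the final n. */
int hill_climb(int n, int max_iters) {
    assert(max_iters >= 0);
    for (int k = 0; k < max_iters && fitness(n) > 0; k++) {
        int current = fitness(n);
        if (fitness(n + 1) < current) {
            n = n + 1;
        } else if (fitness(n - 1) < current) {
            n = n - 1;
        } else {
            break;  /* local minimum: no neighbour improves F */
        }
    }
    return n;
}
```

Starting from the initial input n = -5 (F = 7), hill_climb(-5, 100) increases n one step at a time and stops at n = 2, where F = 0.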
Chaining Approach: The chaining approach is an extension of the goal-oriented approach. It uses the control flow graph and data flow (data dependence) information.
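/* Node numbers are given in the comments. AR is a global array and
   write() an output routine, both defined elsewhere. */
void F(int A[], int C[]) {              /* 1 */
    int i, j, top, f_exit;
    i = 1;                              /* 2 */
    j = 1;                              /* 3 */
    top = 0;                            /* 4 */
    f_exit = 0;                         /* 5 */
    while (C[j] < 5) {                  /* 6 */
        j = j + 1;                      /* 7 */
        if (C[j] == 1) {                /* 8 */
            i = i + 1;                  /* 9 */
            if (A[i] > 0) {             /* 10 */
                top = top + 1;          /* 11 */
                AR[top] = A[i];         /* 12 */
            }
        }
        if (C[j] == 2) {                /* 13 */
            if (top > 0) {              /* 14 */
                write(AR[top]);         /* 15 */
                top = top - 1;          /* 16 */
            }
        }
        if (C[j] == 3) {                /* 17 */
            if (top > 100) write(1);    /* 18, 19: target statement */
            else write(0);              /* 20 */
        }
    }                                   /* endwhile */
}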
Data Dependence Concepts: There exists a data dependence between statements S1 and S2 if (1) S1 is a definition of variable v (assigns a value to v), (2) S2 is a use of variable v (references v), and (3) there exists a path in the program from S1 to S2 along which v is not modified.
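The three conditions can be checked mechanically. The sketch below does so for straight-line code only, where the single path from S1 to S2 consists of the statements between them. The struct stmt representation, the function data_dependent, and the four-statement example are all assumptions made for illustration; programs with branches and loops need a reaching-definitions analysis over the control flow graph instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One statement of straight-line code: the variable it defines
   ('\0' if none) and the variables it uses, one letter per variable. */
struct stmt {
    char def;
    const char *uses;
};

/* Returns true if statement s2 is data dependent on statement s1:
   s1 defines v, s2 uses v, and no statement strictly between them
   redefines v. */
bool data_dependent(const struct stmt prog[], size_t s1, size_t s2) {
    assert(s1 < s2);
    char v = prog[s1].def;
    if (v == '\0' || strchr(prog[s2].uses, v) == NULL) {
        return false;
    }
    for (size_t k = s1 + 1; k < s2; k++) {
        if (prog[k].def == v) {
            return false;  /* v is modified on the path */
        }
    }
    return true;
}

/* Hypothetical example program:
   0: t = 0;   1: x = t + 1;   2: t = 5;   3: y = t;  */
const struct stmt example_prog[] = {
    {'t', ""}, {'x', "t"}, {'t', ""}, {'y', "t"},
};
```

In this example, statement 1 is data dependent on statement 0, and statement 3 is data dependent on statement 2. Statement 3 is not data dependent on statement 0, because statement 2 modifies t on the only path between them.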
Chaining Approach: It may significantly increase the chances of finding inputs compared with the goal-oriented approach. It relies on direct data dependences related to problem statements. The chaining approach does not have a "global view" of dependences in the program.
Data Dependence Based Test Generation: We present data dependence based test generation. This approach uses a data dependence graph, rather than individual data dependences, during the search.
Data Dependence Based Test Generation: Data dependence based test generation is used when the existing methods fail to find a solution. Suppose the existing methods fail at some conditional statement (predicate), which is referred to as a problem node. The approach then constructs a data dependence graph that contains the statements that influence the problem node.
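One way to picture this construction is as a backward traversal over data dependence edges, starting at the problem node. The sketch below is only that picture, not the authors' algorithm: the struct edge type, the function influencing_nodes, and the edge list are assumptions. The edge list is a partial, hand-derived reading of program F, covering variable top plus a few dependences on i and AR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODE 32  /* node numbers must lie in [0, MAX_NODE) */

struct edge { int from; int to; };  /* `to` uses a value defined at `from` */

/* Marks in influences[] every node from which `problem` can be reached
   by following data dependence edges, plus `problem` itself.
   Simple fixed-point iteration, adequate for small graphs. */
void influencing_nodes(const struct edge edges[], size_t n_edges,
                       int problem, bool influences[MAX_NODE]) {
    assert(problem >= 0 && problem < MAX_NODE);
    for (int i = 0; i < MAX_NODE; i++) influences[i] = false;
    influences[problem] = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t e = 0; e < n_edges; e++) {
            if (influences[edges[e].to] && !influences[edges[e].from]) {
                influences[edges[e].from] = true;
                changed = true;
            }
        }
    }
}

/* Assumed, partial data dependences of program F. */
const struct edge example_edges[] = {
    /* variable top */
    {4, 11}, {4, 16}, {4, 18}, {11, 11}, {11, 16}, {11, 18},
    {16, 11}, {16, 16}, {16, 18}, {11, 12},
    /* variables i and AR: these do not reach node 18 */
    {2, 9}, {9, 9}, {9, 10}, {9, 12}, {12, 15},
};
const size_t n_example_edges =
    sizeof example_edges / sizeof example_edges[0];
```

With problem node 18, the traversal marks nodes 4, 11, 16, and 18, while nodes such as 9 and 12 are left out because nothing they define flows into the predicate at node 18.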
Data Dependence Based Test Generation: The data dependence graph is used by the search engine to guide the search. The approach identifies different sequences for exploration in the data dependence graph that lead to the problem statement. The identified sequences are used in the program to guide the search.
Data Dependence Based Test Generation: Figure: data dependence graph showing the data dependences with respect to variable top.
Data Dependence Based Test Generation: Example sequence in the data dependence graph: en, 4, 11, 16, 11, 18
Data Dependence Based Test Generation: Sample sequences generated from the data dependence graph: P1: en, 4, 18; P2: en, 4, 11, 18; P3: en, 4, 16, 18; P4: en, 4, 11, 16, 18; P5: en, 4, 16, 11, 18; …
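The sample sequences can be reproduced with a depth-first enumeration over the data dependence graph. The sketch below rests on two assumptions: the edge matrix dep is a hand-derived reading of the dependences on top among nodes 4, 11, 16, and 18, and each node is visited at most once so that the enumeration stays finite. The trailing "…" in the list suggests that longer sequences with repeated nodes are generated as well, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define N_NODES 4  /* indices 0..3 stand for nodes 4, 11, 16, 18 */

static const int node_label[N_NODES] = {4, 11, 16, 18};

/* Assumed data dependence edges for variable top. Self-loops are left
   out because each node is visited at most once here. */
static const bool dep[N_NODES][N_NODES] = {
    /* to:      4      11     16     18   */
    /* 4  */ {false, true,  true,  true},
    /* 11 */ {false, false, true,  true},
    /* 16 */ {false, true,  false, true},
    /* 18 */ {false, false, false, false},
};

/* Depth-first enumeration of sequences that start at `node`, end at
   node 18, and visit each node at most once. Prints every sequence
   found and returns how many there were. */
int enumerate(int node, bool visited[N_NODES], int path[N_NODES], int len) {
    assert(node >= 0 && node < N_NODES && len < N_NODES);
    path[len++] = node;
    if (node == N_NODES - 1) {
        printf("en");
        for (int k = 0; k < len; k++) printf(", %d", node_label[path[k]]);
        printf("\n");
        return 1;
    }
    visited[node] = true;
    int count = 0;
    for (int next = 0; next < N_NODES; next++) {
        if (dep[node][next] && !visited[next]) {
            count += enumerate(next, visited, path, len);
        }
    }
    visited[node] = false;
    return count;
}

/* Counts the sequences from node 4 to node 18. */
int count_sequences(void) {
    bool visited[N_NODES] = {false};
    int path[N_NODES];
    return enumerate(0, visited, path, 0);
}
```

Under these assumptions count_sequences() finds five sequences, the same set as P1 to P5, although the depth-first order in which they are printed differs from the order in the list.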
Data Dependence Based Test Generation: The data dependence graph is used by the search engine to guide the search. The approach identifies different sequences for exploration in the data dependence graph that lead to the problem statement. The search engine then "executes" (explores) the identified sequences in the program.
Data Dependence Based Test Generation: For some programs, a large number of different sequences can be generated from the data dependence graph for exploration before the solution is found. Many sequences may not lead to the solution. It may be expensive to explore sequences in the original program. The search engine may require a lot of effort to move from one node to the next as specified by a sequence.
Testability transformation: The idea is to explore these sequences not in the original program but in a transformed program, in which it should be much easier (faster) to determine whether the fitness function associated with the problem node may evaluate to the target value for a given sequence.
Testability transformation: Figure: the original program takes input x. The transformed program takes input x and a sequence S, and computes the fitness function F.
Testability transformation: The transformed program is used to identify promising sequences. A promising sequence is a sequence for which it is possible to find a program input on which the fitness function at the problem node evaluates to the target value.
Testability transformation: Figure: the transformed program takes input x and sequence S, and computes the fitness function F. Find an input x on which F evaluates to the target value during execution of sequence S.
Testability transformation: It is inexpensive to identify promising and unpromising sequences in the transformed program. Identified promising sequences are then explored in the original program to find the solution.
Testability transformation: A data dependence graph is used to construct a "corresponding (transformed) program".
/* S[1..PathSize] holds the node sequence; R[i] is the number of times
   node S[i] is repeated. */
float TransFunc(int A[], int C[], int PathSize, int S[], int R[]) {
    int i, j, top;
    i = 1;
    while (i <= PathSize) {
        switch (S[i]) {
        case 4:
            top = 0;                                    // node 4
            break;
        case 11:
            top = top + 1;                              // node 11
            for (j = 1; j < R[i]; j++) top = top + 1;
            break;
        case 16:
            top = top - 1;                              // node 16
            for (j = 1; j < R[i]; j++) top = top - 1;
            break;
        }
        i++;
    }
    return 100 - top;  // computation of the fitness function at node 18
}
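Testability transformation: How many times should each node of the sequence be executed? In the transformed program, the repetition count for node S[i] is given by R[i].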
Testability transformation: Figure: the transformed program takes PathSize, the sequence S, and the arrays R[], A[], and C[], and computes F. Find input A[], C[], and R[] on which F < 0 during execution of sequence S.
Testability transformation: Sequence: en, 4, 11*, 18
Testability transformation: Given: S = {4, 11}, PathSize = 2. Find: R = ??, A = ??, C = ?? such that F < 0.
Testability transformation: Solution: R = {1, 101}, A = --, C = --
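This example can be checked by running the transformed program directly. The sketch below is a 0-based variant of TransFunc that returns an int and drops A[] and C[], since neither affects top. The helper find_repetitions is hypothetical: it tries repetition counts for node 11 in increasing order, which stands in for whatever search the test generator actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based variant of the transformed program. S[k] is a node number of
   the original program and R[k] is how many times that node is repeated.
   Returns the fitness value F = 100 - top computed at node 18. */
int trans_func(const int S[], const int R[], size_t path_size) {
    int top = 0;
    for (size_t i = 0; i < path_size; i++) {
        switch (S[i]) {
        case 4:
            top = 0;
            break;
        case 11:
            top = top + 1;
            for (int j = 1; j < R[i]; j++) top = top + 1;
            break;
        case 16:
            top = top - 1;
            for (int j = 1; j < R[i]; j++) top = top - 1;
            break;
        default:
            break;  /* other nodes do not modify top */
        }
    }
    return 100 - top;
}

/* Hypothetical exhaustive search: the smallest repetition count r in
   [1, max_rep] for node 11 in the sequence {4, 11} that gives F < 0,
   or -1 if there is none. */
int find_repetitions(int max_rep) {
    assert(max_rep >= 1);
    const int S[2] = {4, 11};
    for (int r = 1; r <= max_rep; r++) {
        const int R[2] = {1, r};
        if (trans_func(S, R, 2) < 0) return r;
    }
    return -1;
}
```

Under these assumptions find_repetitions(200) returns 101: with node 11 repeated 101 times, top is 101 and F = 100 - 101 = -1. A sequence such as {4, 16} can never give F < 0, because top only decreases.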
Testability transformation: Sequence: en, 4, 11 (101 times), 18
Original program F (same listing as shown earlier). Given a promising sequence: S =
How much saving? With the transformed program: at most 5 sequence explorations in the transformed program, and only one sequence is identified as promising. Without the transformed program: in the best case, over 100 sequence explorations.
Data Dependence Based Test Generation: What about multiple variables?
Data Dependence Based Test Generation: Figure: data dependence graph with multiple variables (linepos, wordlen, maxpos).
Data Dependence Based Test Generation: For data dependence graphs with multiple variables, we first identify data dependence execution graphs (rather than sequences). In the next step, sequences for exploration are generated from these data dependence execution graphs.
Data Dependence Based Test Generation: Data dependence execution graph: Each execution graph represents a different way in which the fitness function associated with a problem node may be computed. The execution graph contains all dependences that may occur during program execution. It is derived from the data dependence graph by traversing backwards from the problem node.
Data Dependence Based Test Generation: Figure: a sample data dependence execution graph over the variables wordlen, maxpos, and linepos.
Data Dependence Based Test Generation: For data dependence graphs with multiple variables, we first identify data dependence execution graphs (rather than sequences). In the next step, valid sequences for exploration are generated from these data dependence execution graphs.
Data Dependence Based Test Generation: Valid sequence: A valid sequence represents a possible sequence of executions of nodes in the execution graph. All data dependences in the execution graph are preserved.
Data Dependence Based Test Generation: Figure: a valid sequence derived from the execution graph over wordlen, maxpos, and linepos.
Testability transformation: Generated sequences are explored in the transformed program to identify promising and unpromising sequences. Identified promising sequences are then explored in the original program to find the solution.
Conclusions: Data dependence analysis is used to guide transformations that improve testability. The transformations can improve test data generation. The transformations employed do not preserve the meaning of the program, yet this is unimportant in the context of test data generation.
Conclusions: By using testability transformation, the chances of finding a solution are increased, it is much easier to explore different data dependence sequences, and the search may find a solution more efficiently.
Questions?