Dataflow Testing G. Rothermel
White Box Adequacy Criteria: statement coverage, decision coverage, condition coverage, path coverage, dataflow coverage
Comparing Criteria Analytically
Criterion A subsumes criterion B if, for any program P and test suite T for P, T being A-adequate for P implies that T is B-adequate for P.
[Figure: subsumption hierarchy over the path, statement, decision, and condition criteria]
Can we find a criterion that is stronger than decision coverage but doesn't have the problems that path coverage has?
Dataflow Testing: Motivation
Suppose that a statement assigns a value, but the use of that value is never executed under test.
We need definition-use pairs (du-pairs): associations between definitions and uses of the same variable or memory location.
[Figure: a = c + 10 followed by d = a + y on one path; the other path is annotated "a not used on this path"]
Dataflow Testing: Find the Du-Pairs Starting at Statement 1

PROGRAM GCD
begin
1   read(x)
2   read(y)
3   while (x <> y) do
4     if (x > y) then
5       x = x - y
      else
6       y = y - x
      endif
    endwhile
7   print x
end

[Figure: control flow graph of GCD with nodes Entry, read(x), read(y), while x <> y (T/F), if x > y (T/F), x = x - y, y = y - x, endif, endwhile, print x, Exit]
Introduction
Data-flow analysis provides information for dataflow testing and other tasks by computing the flow of data to points in the program.
For structured programs, data-flow analysis can be performed on an abstract syntax tree; in general, intraprocedural data-flow analysis is performed on the control flow graph.
Introduction
Compute the flow of data to points in the program, e.g.:
Where does the assignment to I in statement 1 reach?
Where does the value assigned in statement 2 reach?
Which uses of variable J are reachable from the end of B1?
Is the value of variable I used after statement 3?
Interesting points: before and after basic blocks or statements.
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Data-flow Problems (Reaching Definitions)
A definition of a variable or memory location is a point or statement where that variable gets a value, e.g., a read or assignment statement.
A use of a variable or memory location is a point or statement where that variable's value is fetched and used in a computation.
A definition of V reaches a point p if there exists a control-flow path in the CFG from the definition to p with no other definitions of V on the path (called a definition-clear path).
Such a path may exist in the graph but may not be executable (i.e., there may be no input to the program that will cause it to be executed); such a path is infeasible.
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Data-flow Problems (Reaching Definitions)
Where are the definitions in the program?
Of variable I: 1, 3
Of variable J: 2, 4, 5
Which basic blocks (before block) do these definitions reach?
Def 1 reaches B2
Def 2 reaches B1, B2, B3
Def 3 reaches B1, B3, B4
Def 4 reaches B4
Def 5 reaches Exit
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Data-flow Problems (Reaching Definitions)
Where are the definitions in the program?
Of variable I: 1, 3
Of variable J: 2, 4, 5
Which uses do these definitions reach? (B3:4 denotes the use in statement 4 of block B3.)
Def 1 reaches B2
Def 2 reaches B1, B2, B3:4
Def 3 reaches B1, B3, B4
Def 4 reaches B4:5
Def 5 reaches Exit
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Reaching Definitions Algorithm Entry 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 How can we compute this information? What would be a naïve way? Exit
Reaching Definitions Algorithm
Method: compute two kinds of local information (i.e., within a basic block).
GEN[B] is the set of definitions that are created (generated) within B.
KILL[B] is the set of definitions that, if they reach the point before B (i.e., the beginning of B), won't reach the end of B.
Instructor notes: describe the sets for reaching definitions. Ask what GEN and KILL are for blocks 1-4. Then ask how GEN and KILL could be computed, given the CFG for the program. (GEN can be computed with one pass over the program; GEN is needed to compute KILL.)
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
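A minimal Python sketch of how GEN and KILL might be computed for the five-definition example. The `DEFS` table and the `gen_kill` helper are illustrative names, not part of the slides. KILL here excludes the definitions a block itself generates; some presentations include them, which leaves OUT unchanged because GEN adds them back.

```python
# Definition number -> (basic block, variable defined), transcribed from the example.
DEFS = {1: ("B1", "I"), 2: ("B1", "J"), 3: ("B2", "I"), 4: ("B3", "J"), 5: ("B4", "J")}


def gen_kill(defs):
    """Return (GEN, KILL): dicts mapping each block to a set of definition numbers."""
    gen, kill = {}, {}
    for block in sorted({b for b, _ in defs.values()}):
        # One pass over the block in statement order: a later definition of the
        # same variable replaces an earlier one, so only the last one is generated.
        last_def = {}
        for d in sorted(defs):
            b, var = defs[d]
            if b == block:
                last_def[var] = d
        gen[block] = set(last_def.values())
        # Any other definition of a variable defined here cannot survive the block.
        kill[block] = {
            d for d, (_, var) in defs.items()
            if var in last_def and d not in gen[block]
        }
    return gen, kill


GEN, KILL = gen_kill(DEFS)
for block in sorted(GEN):
    print(block, "GEN =", sorted(GEN[block]), "KILL =", sorted(KILL[block]))
```

KILL is derived from GEN in this sketch, which mirrors the note that GEN must be available before KILL can be computed.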
Reaching Definitions Algorithm Entry Method (cont’d): 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Now what can we do with these sets to get the reaching definitions? Discuss intuitive methods. Exit
Reaching Definitions Algorithm
Method (cont'd): compute two other sets by propagation.
IN[B] is the set of definitions that reach the beginning of B.
OUT[B] is the set of definitions that reach the end of B.
How can we initialize these sets?
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Reaching Definitions Algorithm Entry Method (cont’d): 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Now what? Exit
Reaching Definitions Algorithm
Method (cont'd): propagation method.
Initialize the IN[B], OUT[B] sets for all B.
Iterate over all B until there are no changes to the IN[B], OUT[B] sets.
On each iteration, visit all B, and compute IN[B], OUT[B] as
  IN[B] = union of OUT[P], for each P that is a predecessor of B
  OUT[B] = GEN[B] union (IN[B] - KILL[B])
[Figure: control flow graph: Entry; B1 (1. I := 2; 2. J := I + 1); B2 (3. I := 1); B3 (4. J := J + 1); B4 (5. J := J - 4); Exit]
Reaching Definitions Algorithm

algorithm ReachingDefinitions
Input: CFG with GEN[B], KILL[B] for all B
Output: IN[B], OUT[B] for all B
begin ReachingDefinitions
  IN[B] = empty; OUT[B] = GEN[B], for all B
  change = true
  while change do
  begin
    change = false
    foreach B do
    begin
      IN[B] = union of OUT[P], for each P that is a predecessor of B
      oldout = OUT[B]
      OUT[B] = GEN[B] union (IN[B] - KILL[B])
      if OUT[B] != oldout then change = true
    endfor
  endwhile
end ReachingDefinitions
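The iteration can be sketched in Python as follows. The flow graph figure did not survive in this text, so the predecessor map `PREDS` is an assumption (edges B1→B2, B2→B1, B2→B3, B3→B4) chosen to be consistent with the reaching-definitions answers given earlier. The function and variable names are illustrative.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate IN/OUT to a fixed point, following the pseudocode's update rules."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT[P] over all predecessors P of B.
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


BLOCKS = ["B1", "B2", "B3", "B4"]
PREDS = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}  # assumed edges
GEN = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}}
KILL = {"B1": {3, 4, 5}, "B2": {1}, "B3": {2, 5}, "B4": {2, 4}}

IN, OUT = reaching_definitions(BLOCKS, PREDS, GEN, KILL)
for b in BLOCKS:
    print(b, "IN =", sorted(IN[b]), "OUT =", sorted(OUT[b]))
```

Under these assumed edges the loop stabilizes after the second pass, with definitions 2 and 3 reaching the beginning of B1 and definitions 3 and 5 leaving B4.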
Reaching Definitions Algorithm Data-flow for example (set approach) All entries are sets; sets in red indicate changes from last iteration thus, requiring another iteration of the algorithm 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Init GEN KILL IN OUT Iter1 Iter2 1 2 3 4
Reaching Definitions Algorithm Data-flow for example (set approach) 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Init GEN KILL IN OUT Iter1 Iter2 1 1,2 1,2,34,5 -- 3 2,3 2 1,3 4 2,4,5 3,4 5 3,5
Reaching Definitions Algorithm Data-flow for example (bit-vector approach) 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Init GEN KILL IN OUT Iter1 1 2 3 4
Reaching Definitions Algorithm Data-flow for example (bit-vector approach) 1. I := 2 2. J := I + 1 3. I := 1 4. J := J + 1 5. J := J - 4 B1 B2 B3 B4 Init GEN KILL IN OUT Iter1 1 11000 11111 00000 00100 2 10100 01100 3 00010 01011 00110 4 00001 00101
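A sketch of the bit-vector approach, assuming Python integers as bit vectors with definition d stored as the d-th bit from the left of a five-bit string. Union becomes bitwise OR and set difference becomes AND with the complement. The edge set in `PREDS` is an assumption (B1→B2, B2→B1, B2→B3, B3→B4), and KILL excludes each block's own definitions, which does not affect OUT.

```python
N_DEFS = 5  # definitions 1..5; definition d is the d-th bit from the left


def bits(defs):
    """Encode a set of definition numbers as an integer bit vector."""
    value = 0
    for d in defs:
        value |= 1 << (N_DEFS - d)
    return value


def show(value):
    """Render a bit vector as a fixed-width binary string."""
    return format(value, "0{}b".format(N_DEFS))


GEN = {"B1": bits({1, 2}), "B2": bits({3}), "B3": bits({4}), "B4": bits({5})}
KILL = {"B1": bits({3, 4, 5}), "B2": bits({1}), "B3": bits({2, 5}), "B4": bits({2, 4})}
PREDS = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}  # assumed edges


def solve(gen, kill, preds):
    in_bits = {b: 0 for b in gen}
    out_bits = dict(gen)
    changed = True
    while changed:
        changed = False
        for b in gen:
            acc = 0
            for p in preds[b]:
                acc |= out_bits[p]               # union is bitwise OR
            in_bits[b] = acc
            new_out = gen[b] | (acc & ~kill[b])  # difference is AND NOT
            if new_out != out_bits[b]:
                out_bits[b] = new_out
                changed = True
    return in_bits, out_bits


IN, OUT = solve(GEN, KILL, PREDS)
for b in GEN:
    print(b, "IN =", show(IN[b]), "OUT =", show(OUT[b]))
```

Each update is a constant number of word operations per block when the number of definitions fits in a machine word, which is the usual motivation for the bit-vector form.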
Conservatism and Approximation
Computing exact solutions to most dataflow problems is undecidable. Thus, we compute approximations.
Approximate analysis can overestimate the solution: the solution contains the actual information plus some spurious information, but does not omit information. This type of information is safe, or conservative.
Approximate analysis can underestimate the solution: the solution may not contain all actual information. This type of information is unsafe.
For optimization, we need conservative, safe analysis.
For software engineering tasks, we may be able to use unsafe analysis information.
Definition-Use Pairs A definition-use pair (DU-pair) consists of a definition D of variable v and a use U of v that D reaches.
Definition-Use Pairs
[Figure: control flow graph with nodes entry, B1-B6, exit and statements Z > 1; X = 1; Z > 2; Y = X + 1; X = 2; Z = X - 3; X = 4; Z = X + 7]
DU-pairs for (2:X): {(2:X,3:X),(2:X,5:X)}
DU-pairs for (4:X): {(4:X,5:X)}
DU-pairs for (5:X): {(5:X,6:X)}
DU-pairs for (3:Y): {}
DU-pairs for (5:Z):
DU-pairs for (6:Z):
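One way to compute du-pairs mechanically is a forward search from each definition along definition-clear paths. The sketch below assumes a statement-level view of the six-block example. The figure's edges are not recoverable from this text, so the `SUCCS` map (B1→B2, B1→B4, B2→B3, B2→B5, B3→B5, B4→B5, B5→B6) and the placement of the predicate Z > 2 in B2 are assumptions, chosen so that the X and Y results match the answers listed above. All names are illustrative.

```python
# Node -> (block number, variable defined or None, variables used).
NODES = {
    "1":  (1, None, {"Z"}),  # predicate on Z
    "2a": (2, "X", set()),   # constant assignment to X
    "2b": (2, None, {"Z"}),  # predicate on Z (assumed to sit in B2)
    "3":  (3, "Y", {"X"}),   # Y = X + 1
    "4":  (4, "X", set()),   # constant assignment to X
    "5a": (5, "Z", {"X"}),   # Z = X - 3
    "5b": (5, "X", set()),   # X = 4
    "6":  (6, "Z", {"X"}),   # Z = X + 7
}
# Assumed control flow between statement-level nodes.
SUCCS = {
    "1": ["2a", "4"], "2a": ["2b"], "2b": ["3", "5a"], "3": ["5a"],
    "4": ["5a"], "5a": ["5b"], "5b": ["6"], "6": [],
}


def du_pairs(start):
    """Return the du-pairs of the definition at node `start`."""
    block, var, _ = NODES[start]
    pairs, seen, stack = set(), set(), list(SUCCS[start])
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        use_block, defined, used = NODES[node]
        if var in used:
            # The use is evaluated before any assignment in the same statement.
            pairs.add(((block, var), (use_block, var)))
        if defined != var:
            # No redefinition here, so the path stays definition-clear.
            stack.extend(SUCCS[node])
    return pairs


for start in ["2a", "4", "5b", "3"]:
    print(start, sorted(du_pairs(start)))
```

With an acyclic graph like the assumed one, the definitions of Z in B5 and B6 reach no uses; a different edge set (for example a back edge to B1) would change that.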
Data Dependence Graph
A data dependence graph has a node for every basic block and edges representing the flow of data between nodes.
Different types of data dependence:
Flow: def to use
Anti: use to def
Output: def to def
[Figure: control flow graph with nodes entry, B1-B6, exit and statements Z > 1; X = 1; Z > 2; Y = X + 1; X = 2; Z = X - 3; X = 4; Z = X + 7]
Data Dependence Graph
[Figure: the control flow graph (entry, B1-B6, exit) shown beside its data dependence graph over the same blocks, with statements Z > 1; X = 1; Z > 2; Y = X + 1; X = 2; Z = X - 3; X = 4; Z = X + 7]
Data Flow Testing
Data flow testing involves covering du-pairs (or covering data dependence edges in a data dependence graph).
To make this stronger than branch coverage, we distinguish predicate uses (p-uses) from computation uses (c-uses), and say that to cover a du-pair ending in a p-use, a test suite must exercise all outcomes of the predicate.
Having done that, which pairs do we need to cover?
All-defs coverage: test each def to some use.
All-uses coverage: test each def to each use by some path.
All-paths coverage: test each def to each use by all acyclic paths (often called all-du-paths coverage in the literature).
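A small sketch of how all-defs and all-uses adequacy might be checked once the required du-pairs and the pairs exercised by a test suite are known. The function names and the example `covered` sets are hypothetical; the required pairs are the ones listed for X in the six-block example. The extra p-use rule (exercising every predicate outcome) is not modeled here.

```python
def all_defs_adequate(du_pairs, covered):
    """Every definition that has a use is covered to at least one of its uses."""
    defs = {d for d, _ in du_pairs}
    return all(
        any(pair in covered for pair in du_pairs if pair[0] == d)
        for d in defs
    )


def all_uses_adequate(du_pairs, covered):
    """Every du-pair is covered by some test."""
    return du_pairs <= covered


REQUIRED = {
    ("2:X", "3:X"), ("2:X", "5:X"),
    ("4:X", "5:X"),
    ("5:X", "6:X"),
}

# Hypothetical coverage reports from two test suites.
suite_a = {("2:X", "3:X"), ("4:X", "5:X"), ("5:X", "6:X")}
suite_b = set(REQUIRED)

print(all_defs_adequate(REQUIRED, suite_a), all_uses_adequate(REQUIRED, suite_a))
print(all_defs_adequate(REQUIRED, suite_b), all_uses_adequate(REQUIRED, suite_b))
```

Suite A reaches some use of every definition but misses the pair (2:X,5:X), so it is all-defs adequate without being all-uses adequate, which illustrates why all-uses is the stronger criterion.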
Comparing Criteria Analytically
Criterion A subsumes criterion B if, for any program P and test suite T for P, T being A-adequate for P implies that T is B-adequate for P.
[Figure: subsumption hierarchy over the path, condition, all-uses, all-defs, decision, and statement criteria]
Comparing Criteria Empirically (Hutchins et al., ICSE 94)

Strategy         Mean Cases   Faults Found
Random Testing   100          79.5%
Branch Testing   34           85.5%
All Uses         84           90.0%
Dataflow Testing: Find the Du-Pairs Starting at Statement 5

PROGRAM GCD
begin
1   read(x)
2   read(y)
3   while (x <> y) do
4     if (x > y) then
5       x = x - y
      else
6       y = y - x
      endif
    endwhile
7   print x
end

[Figure: control flow graph of GCD with nodes Entry, read(x), read(y), while x <> y (T/F), if x > y (T/F), x = x - y, y = y - x, endif, endwhile, print x, Exit]
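To see which du-pairs a given test actually exercises, the GCD program can be instrumented to remember the statement that last defined each variable. The sketch below is one hypothetical way to do this in Python. The `gcd_traced` function and its tuple format (defining statement, using statement, variable) are illustrative, and the inputs are assumed to be positive integers so that the loop terminates. It does not distinguish p-uses from c-uses.

```python
def gcd_traced(x, y):
    """Run the GCD program; return (result, set of exercised du-pairs)."""
    last_def = {"x": 1, "y": 2}  # statements 1 and 2: read(x), read(y)
    pairs = set()

    def use(stmt, *variables):
        # Record a du-pair from each variable's latest definition to this use.
        for v in variables:
            pairs.add((last_def[v], stmt, v))

    while True:
        use(3, "x", "y")          # statement 3: while (x <> y)
        if x == y:
            break
        use(4, "x", "y")          # statement 4: if (x > y)
        if x > y:
            use(5, "x", "y")      # statement 5: x = x - y
            x = x - y
            last_def["x"] = 5
        else:
            use(6, "x", "y")      # statement 6: y = y - x
            y = y - x
            last_def["y"] = 6
    use(7, "x")                   # statement 7: print x
    return x, pairs


result, exercised = gcd_traced(4, 2)
print(result, sorted(exercised))
from_stmt_5 = {p for p in exercised if p[0] == 5}
print(sorted(from_stmt_5))
```

For the input (4, 2), the only pairs exercised from the definition at statement 5 are those ending at statements 3 and 7; the pairs from statement 5 to the uses of x in statements 4, 5, and 6 need inputs that go around the loop again after the subtraction, such as (6, 2) or (3, 2).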