A System to Generate Test Data and Symbolically Execute Programs
Lori A. Clarke
September 1976
Existing Approach
- The programmer manually generates test data and tests until satisfied that the program is correct.
- Proposed alternative methods:
  - Program correctness: formal mathematical proofs are used to prove that a program is correct.
  - Program validation: encompasses a wide range of automated tools that analyze and evaluate programs.
Existing Approach - Problems
- Success depends on the programmer's expertise and on system complexity.
- What criteria do we use to generate tests?
- The approach is inadequate and costly.
- Program correctness:
  - Frequent human intervention is required.
  - Complex and tedious; infeasible for large systems.
- Program validation:
  - Aids in testing, but does not guarantee that the program is correct.
Goals of Proposed System
- Generate test data that drives execution down a specific path; the tester specifies which path.
- Detect non-executable program paths.
- Create a symbolic representation of the program's output variables.
- Detect certain types of program errors.
System Overview
System Phases
Phase 1: Preprocessor
- Uses DAVE (Osterweil and Fosdick), without its sophisticated features.
Control Flow Graph
Control Path
- One way of "going" from one point in the program to another: a path that control could take.
- There may be several control paths between the same two points.
Execution Path
- A control path that can be executed: some input values drive execution along it.
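To make the definition of a control path concrete, here is a minimal sketch that enumerates every control path through a small graph. The graph itself is an assumption: its edges were chosen only so that the path names used later in this presentation ("1-5, 7, 9" and "1-3, 6-9") are valid paths through it. The function name `control_paths` is hypothetical.

```python
# Hypothetical control flow graph: node -> list of successor nodes.
# The edges are an assumption chosen to match the path names
# "1-5, 7, 9" and "1-3, 6-9" used in this presentation.
CFG = {
    1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [7],
    6: [7], 7: [8, 9], 8: [9], 9: [],
}

def control_paths(graph, node, goal):
    """Return every control path from node to goal (graph must be acyclic)."""
    if node == goal:
        return [[node]]
    paths = []
    for successor in graph[node]:
        for tail in control_paths(graph, successor, goal):
            paths.append([node] + tail)
    return paths

all_paths = control_paths(CFG, 1, 9)
for path in all_paths:
    print(path)
```

Enumeration only yields control paths. Whether a given control path is also an execution path depends on whether its branch conditions can all hold for some input, which the graph alone cannot tell us.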
Phase 2: Symbolic Execution
Path Selection
Two methods:
- Static: designed to accept automatically generated paths.
- Interactive: designed to aid a human user in selecting a path.
Symbolic Execution Example
Expressions, not values, are assigned.
Input fragment:
    READ(UNIT) B, C, D
    A = B + C * D
    C = A * 3 + 5
    WRITE C
How is it done?
    B = I1, C = I2, D = I3
    A = I1 + I2 * I3
    C = (I1 + I2 * I3) * 3 + 5    (symbolic output)
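The idea of assigning expressions instead of values can be sketched with operator overloading. This is an illustration only, not the system's implementation: the `Sym` class is hypothetical, and the sketch assumes the fragment's second assignment reads C = A * 3 + 5, which matches the `*3+5` in the symbolic output.

```python
class Sym:
    """A symbolic expression: its text plus the precedence of its top operator."""

    def __init__(self, text, prec=3):
        self.text = text   # printable form of the expression
        self.prec = prec   # 1 = sum, 2 = product, 3 = atom (name or constant)

    @staticmethod
    def lift(value):
        # Turn a plain constant into a symbolic atom.
        return value if isinstance(value, Sym) else Sym(str(value))

    def _wrap(self, min_prec):
        # Parenthesize when this expression binds more loosely than its context.
        return self.text if self.prec >= min_prec else f"({self.text})"

    def __add__(self, other):
        other = Sym.lift(other)
        return Sym(f"{self._wrap(1)} + {other._wrap(1)}", 1)

    def __mul__(self, other):
        other = Sym.lift(other)
        return Sym(f"{self._wrap(2)} * {other._wrap(2)}", 2)

    def __repr__(self):
        return self.text

# READ(UNIT) B, C, D binds each variable to a fresh input symbol.
B, C, D = Sym("I1"), Sym("I2"), Sym("I3")
A = B + C * D      # A = I1 + I2 * I3
C = A * 3 + 5      # C = (I1 + I2 * I3) * 3 + 5
print("A =", A)
print("C =", C)
```

Tracking operator precedence is what keeps the multiplication by 3 applied to the whole of A rather than to its last term.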
Why Symbolic Execution?
- Creates a human-readable symbolic representation.
- Facilitates error detection.
- Aids in assertion generation.
- Produces the path constraints used in test data generation.
Finding Constraints with Symbolic Execution
- Initially, J = I1 and K = I2.
- J becomes I1 + 1.
- For control to go through path 1-5, 7, 9:
    I1 + 1 <= I2
    [J becomes I2 - (I1 + 1)]
    I2 - (I1 + 1) > -1
- These two inequalities are the path constraints.
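The way constraints accumulate along a path can be sketched as follows. The program being analyzed is not reproduced in this text, so the statements below are an assumption reconstructed from the constraints shown: J = J + 1, a branch on J <= K, J = K - J, and a branch on J > -1. The function `symbolic_walk` and the step format are hypothetical.

```python
def symbolic_walk(steps, inputs):
    """Symbolically execute one path; return (final state, path constraints)."""
    state = dict(inputs)          # variable name -> symbolic expression (text)
    constraints = []
    for step in steps:
        if step[0] == "assign":
            _, var, template = step
            # Substitute current symbolic values into the right-hand side.
            state[var] = template.format(**state)
        else:
            # "assume": the branch outcome that keeps control on this path.
            constraints.append(step[1].format(**state))
    return state, constraints

# Assumed statements along path 1-5, 7, 9 (parentheses are written by hand).
PATH_1_5_7_9 = [
    ("assign", "J", "{J} + 1"),
    ("assume", "{J} <= {K}"),
    ("assign", "J", "{K} - ({J})"),
    ("assume", "{J} > -1"),
]

state, constraints = symbolic_walk(PATH_1_5_7_9, {"J": "I1", "K": "I2"})
for constraint in constraints:
    print(constraint)
```

Each branch condition is recorded in terms of the input symbols I1 and I2 at the moment it is evaluated, which is why the second constraint mentions I2 - (I1 + 1) rather than J.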
Error Checking
- Artificial constraints are created to aid in finding some types of errors, for instance array bounds violations.
- When element X(i) of a 100-element array is referenced, the constraints S(i) < 1 and S(i) > 100 are created, where S(i) is the symbolic value of i.
- If either of these constraints is consistent with the existing path constraints, some input drives the subscript out of bounds, and we have a problem.
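The bounds check can be sketched as a consistency test: add each artificial constraint to the path constraints and look for a satisfying value. This sketch treats the subscript as a single integer input and uses a bounded brute-force search, which is an assumption made for brevity and not the system's method. The names `find_witness` and `bounds_violation` are hypothetical.

```python
DIMENSION = 100
SEARCH = range(-1000, 1001)   # assumed bounded search space for the subscript

def find_witness(constraints, search_range):
    """Return the first integer satisfying all constraints, else None."""
    for i in search_range:
        if all(check(i) for check in constraints):
            return i
    return None

def bounds_violation(path_constraints):
    """Return a subscript value that violates the array bounds, or None."""
    artificial = [lambda i: i < 1, lambda i: i > DIMENSION]
    for extra in artificial:
        # Temporarily add one artificial constraint to the path constraints.
        witness = find_witness(path_constraints + [extra], SEARCH)
        if witness is not None:
            return witness
    return None

# Path reaches X(i) only when 1 <= i <= 100: no subscript error is possible.
safe_path = [lambda i: i >= 1, lambda i: i <= 100]
# Path reaches X(i) whenever i >= 50: i = 101 is an out-of-bounds reference.
unsafe_path = [lambda i: i >= 50]

print(bounds_violation(safe_path))
print(bounds_violation(unsafe_path))
```

A returned value is a concrete input that triggers the error; `None` means neither artificial constraint is consistent with the path within the searched range.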
End of Phase 2: Symbolic Execution
- Generate symbolic representation.
- Detect some types of errors.
Phase 3: Inequality Solver
Completed in Phase 2: generate symbolic representation, detect some types of errors.
How the Inequality Solver works
- Input: the constraints from the previous phase, for example I1 + 1 <= I2.
- Finds values that satisfy the constraints, using a linear programming algorithm (Glover).
- These sets of values are our test data.
How the Inequality Solver works
Constraints to be satisfied:
    I1 + 1 <= I2
    I2 - (I1 + 1) > -1
- Is it possible to find values? Yes: I1 = 0 and I2 = 1, for instance.
- So the constraints are consistent.
- So control path 1-5, 7, 9 is executable for any values that satisfy the constraints.
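How the Inequality Solver works
Constraints to be satisfied:
    I1 + 1 > I2
    I1 + 1 - I2 <= -1
- Is it possible to find values? No: the first constraint gives I1 + 1 - I2 > 0, which contradicts the second.
- So the constraints are inconsistent.
- So control path 1-3, 6-9 is non-executable for any values of J and K.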
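The two cases above can be reproduced with a small consistency check. This is a sketch under stated assumptions: it replaces the linear programming algorithm with a bounded brute-force search over integers, and it assumes the second constraint of path 1-3, 6-9 reads I1 + 1 - I2 <= -1. The function name `solve` and the search bounds are hypothetical.

```python
from itertools import product

def solve(constraints, low=-10, high=10):
    """Return the first (I1, I2) in the grid satisfying every constraint, else None."""
    for i1, i2 in product(range(low, high + 1), repeat=2):
        if all(c(i1, i2) for c in constraints):
            return i1, i2
    return None

path_1_5_7_9 = [
    lambda i1, i2: i1 + 1 <= i2,
    lambda i1, i2: i2 - (i1 + 1) > -1,
]
path_1_3_6_9 = [
    lambda i1, i2: i1 + 1 > i2,
    lambda i1, i2: i1 + 1 - i2 <= -1,   # assumed reading of the second constraint
]

print(solve(path_1_5_7_9))   # a satisfying pair: usable as test data
print(solve(path_1_3_6_9))   # None: no satisfying pair in the grid
```

A pair returned by `solve` is test data for that path. A `None` result from a bounded search only shows that no solution exists inside the grid; for path 1-3, 6-9 the constraints contradict each other algebraically, so no larger grid would help.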
End of Phase 3: Inequality Solver
- Phase 2: generate symbolic representation, detect some types of errors.
- Phase 3: generate test data, find non-executable paths.
Limitations
- The system requires each path to be completely specified.
- Path constraints must be linear.
- Input and output statements are ignored.
Related Work
- DAVE (Osterweil, Fosdick): analyzes data flow and finds data flow anomalies between subprograms.
- PET (Stucki): maintains relevant information (execution count, min and max values) about statements.
- ACES (Ramamoorthy et al.): detects unreliable program constructs.
- EFFIGY (King): represents a path's computations by symbolically executing the path.
- SELECT (Stanford Research Institute): attempts to generate test data and verify assertions for program inputs.