Putting Static Analysis to Work for Verification: A Case Study
Tal Lev-Ami, Thomas Reps, Mooly Sagiv, Reinhard Wilhelm

Program Verification
Mathematically prove that the program is “partially” correct on all inputs
Example (Hoare-style verification): x = n {x := x + 1} x = n + 1

Why Use Program Verification?
Debugging programs is hard
Testing can only show the presence of errors, not their absence
Can provide counterexamples...

Obstacles to Program Verification
Hard to specify software
Does not “scale”
  – Limited program size
  – Programmer needs to provide loop invariants
  – Pointers and dynamically allocated objects are not handled

Our Goals
Handle pointers and dynamically allocated objects (unbounded memory and/or multi-threading)
No loop invariants
Input: pre {Procedure} post
Output:
  – A safe approximation p to the strongest postcondition
  – A warning if p ⇏ post (p does not imply post)
Conservative:
  – Never misses an error
  – May yield false warnings

#include <stddef.h>

typedef struct node {
    int data;
    struct node *n;
} *L;

/* pre: list(x) */
L insert_sort(L x) {
    L r, pr, rn, l, pl;
    r = x;
    pr = NULL;
    while (r != NULL) {
        l = x;
        rn = r->n;
        pl = NULL;
        while (l != r) {
            if (l->data > r->data) {
                /* unlink r and re-insert it just before l */
                pr->n = rn;
                r->n = l;
                if (pl == NULL)
                    x = r;
                else
                    pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
/* post: olist(x) */

int main() {
    L x, y, z, w;
    L create(), insert_sort(L);
    L merge(L, L), reverse(L);
    x = create();         /* list(x) */
    x = insert_sort(x);   /* olist(x) */
    y = create();         /* list(y) */
    y = insert_sort(y);   /* olist(y) */
    z = merge(x, y);      /* olist(z) */
    w = reverse(z);       /* rolist(w) */
}

Conventional Verification
Formulae over program variables express pre- and post-conditions
The assignment rule is used to generate the strongest postcondition for non-destructive updates
Programmer provides loop invariants

Our Approach
A finite set of descriptors expresses pre- and post-conditions
Predicate-update formulae specify a safe set of descriptors (abstract semantics)
Iteratively explore all the descriptors at every program point (abstract interpretation)
The ADT designer can provide domain-specific information via instrumentation

Outline of the Rest of this Talk
Concentrate on sorting
Descriptors
  – Compact representation of stores
State-space exploration via abstract interpretation
Prototype implementation in TVLA (Three-Valued Logic Analyzer)
Conclusions

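Logical representation of stores
[Figure: a singly linked list pointed to by x, with data fields, n links and a terminating null, next to its encoding as a logical structure: the first node has p[x]=1, the other nodes have p[x]=0, and the nodes are connected by n and dle edges.]
Predicates: p[x](v), n(v1, v2), dle(v1, v2)
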
Three-Valued Logic
1: True
0: False
½ = {0, 1}: Unknown
A join semi-lattice: 0 ⊔ 1 = ½
Information order: 0 ⊑ ½ and 1 ⊑ ½

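The lattice is small enough to write out in full. The C sketch below is illustrative only: the type kleene, the encoding of 0, ½, 1 as the integers 0, 1, 2, and the function names are assumptions made for this example, not TVLA identifiers. It shows the information order and the join, so that joining the two definite values yields the unknown value.

```c
#include <assert.h>

/* Hypothetical encoding: 0, 1/2, 1 are stored as 0, 1, 2. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

/* Information order: a is below b iff they are equal or b is 1/2. */
int info_leq(kleene a, kleene b) {
    return a == b || b == K_HALF;
}

/* Join: least upper bound in the information order. */
kleene join(kleene a, kleene b) {
    return a == b ? a : K_HALF;
}

/* Printable name of a truth value. */
const char *show(kleene v) {
    assert(v == K_FALSE || v == K_HALF || v == K_TRUE);
    return v == K_FALSE ? "0" : v == K_TRUE ? "1" : "1/2";
}
```

For instance, join(K_FALSE, K_TRUE) evaluates to K_HALF, while join(K_TRUE, K_TRUE) stays K_TRUE.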
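Blurred Representation of Stores
[Figure: the same list and its logical structure (nodes with p[x]=1, p[x]=0, p[x]=0, connected by n and dle edges), together with the blurred structure obtained from it: one node with p[x]=1 and one summary node with p[x]=0, connected by n and dle edges.]
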
Parametric Abstraction (Blur)
Merge all the nodes with the same unary “abstraction” predicate values into a single summary node
Join predicate values
Convert a structure of arbitrary size into a 3-valued structure of bounded size

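The three steps of blur can be sketched in C. This is a minimal illustration under stated assumptions, not TVLA's implementation: the structure type, the fixed array bounds, the single binary predicate, and the summary flag array are all hypothetical, and every unary predicate is treated as an abstraction predicate.

```c
#include <assert.h>
#include <string.h>

enum { MAX_NODES = 8, MAX_PREDS = 2 };   /* illustrative bounds */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

typedef struct {
    int n_nodes, n_unary;
    kleene unary[MAX_PREDS][MAX_NODES];    /* unary[p][v], e.g. p[x](v) */
    kleene binary[MAX_NODES][MAX_NODES];   /* one binary predicate, e.g. n */
    int summary[MAX_NODES];                /* 1 if the node stands for several cells */
} structure;

kleene join(kleene a, kleene b) { return a == b ? a : K_HALF; }

/* Nodes are merged when they agree on every unary abstraction predicate. */
int same_name(const structure *s, int u, int v) {
    for (int p = 0; p < s->n_unary; p++)
        if (s->unary[p][u] != s->unary[p][v]) return 0;
    return 1;
}

void blur(const structure *in, structure *out) {
    int rep[MAX_NODES];                /* rep[v]: abstract node that v maps to */
    int seen[MAX_NODES][MAX_NODES];    /* has out->binary[a][b] been set yet? */
    assert(in->n_nodes <= MAX_NODES && in->n_unary <= MAX_PREDS);
    memset(out, 0, sizeof *out);
    memset(seen, 0, sizeof seen);
    out->n_unary = in->n_unary;
    for (int v = 0; v < in->n_nodes; v++) {
        int a = -1;
        for (int u = 0; u < v; u++)
            if (same_name(in, u, v)) { a = rep[u]; break; }
        if (a < 0) {                   /* first node with this name */
            a = out->n_nodes++;
            for (int p = 0; p < in->n_unary; p++)
                out->unary[p][a] = in->unary[p][v];
            out->summary[a] = in->summary[v];
        } else {
            out->summary[a] = 1;       /* at least two nodes were merged */
        }
        rep[v] = a;
    }
    /* Join the binary predicate over all pairs mapped to the same pair. */
    for (int u = 0; u < in->n_nodes; u++)
        for (int v = 0; v < in->n_nodes; v++) {
            int a = rep[u], b = rep[v];
            out->binary[a][b] = seen[a][b]
                ? join(out->binary[a][b], in->binary[u][v])
                : in->binary[u][v];
            seen[a][b] = 1;
        }
}
```

Applied to a four-cell list whose first cell is pointed to by x, this yields two nodes: the cell with p[x]=1 and a summary node with p[x]=0. The n edge from the first node to the summary node becomes ½, because the first cell points to only one of the merged cells.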
Instrumentation
Explicitly maintains information about distinctions among cells
Leads to less blurring when used as abstraction predicates
Unary predicates defined via a first-order formula + transitive closure
Example: “local order”
  – inOrder[n](v) = ∀v1 : n(v, v1) → dle(v, v1)
  – inROrder[n](v) = ∀v1 : n(v, v1) → dle(v1, v)

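On a concrete list each cell has at most one n-successor, so the local-order predicates reduce to a comparison with the next cell. The C sketch below assumes that inOrder[n](v) reads "every n-successor v1 of v satisfies dle(v, v1)" and that dle compares data fields with <=; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int data;
    struct node *n;
} *L;

/* inOrder[n](v): holds vacuously when v has no successor. */
int in_order(const struct node *v) {
    assert(v != NULL);
    return v->n == NULL || v->data <= v->n->data;
}

/* inROrder[n](v): the same check with the comparison reversed. */
int in_r_order(const struct node *v) {
    assert(v != NULL);
    return v->n == NULL || v->n->data <= v->data;
}

/* An acyclic list is sorted exactly when every cell is locally ordered. */
int all_in_order(const struct node *x) {
    for (; x != NULL; x = x->n)
        if (!in_order(x)) return 0;
    return 1;
}
```

The last function shows why a purely local predicate is useful: if the analysis establishes inOrder[n]=1 for every cell of an acyclic list, the list as a whole is sorted.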
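Blurred Representation of Stores
Predicates: p[x](v), n(v1, v2), dle(v1, v2), inOrder[n](v1)
[Figure: a logical structure whose nodes carry p[x]=1, p[x]=0, p[x]=0, each with inOrder[n]=1, connected by n and dle edges, together with its blurred structure: one node with p[x]=1, inOrder[n]=1 and one summary node with p[x]=0, inOrder[n]=1.]
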
Arbitrary Lists
[Figure: the blurred structure of an arbitrary list (nodes p[x]=1 and p[x]=0, both with inOrder[n]=½) vs. the blurred structure of an ordered list (nodes p[x]=1 and p[x]=0, both with inOrder[n]=1).]

Abstract Interpretation
Iteratively compute a set of structures at every program location
Conservatively interpret statements (conditions) on blurred structures
Must terminate, since the number of blurred structures is finite for a given program
Fully automatic
Guaranteed to be sound
But may be overly conservative

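Abstract Interpretation of Insertion Sort
[Figure: 3-valued structures from the analysis of insertion sort: one with nodes p[x]=1 and p[x]=0 at inOrder[n]=½, and one with nodes p[x]=1 and p[x]=0 at inOrder[n]=1, connected by n and dle edges.]
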
The Key Problem
How to interpret statements (conditions) on blurred structures?
Difficult to provide a conservative (and reasonably precise) interpretation
  – It is difficult to show that specific abstractions are conservative (Sagiv, Reps, Wilhelm, TOPLAS 98)
  – Long and intimidating proofs
  – Or no proofs (and bugs)

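The best conservative interpretation (Cousot & Cousot 1979)
[Figure: commuting diagram. Concretization maps an abstract representation to a set of states; the operational semantics of statement s maps that set of states to a set of states; abstraction maps the result back to an abstract representation. The abstract semantics of statement s is the composite map from abstract representation to abstract representation.]
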
The 3-Valued Logic Approach
Automatically derives a conservative interpretation of statements and conditions from:
  – structural operational semantics written using logical formulae
  – global properties
  – abstraction predicates
An experimental system (TVLA)
Correct by construction

Condition x->d ≤ y->d
∃v1, v2 : p[x](v1) ∧ p[y](v2) ∧ dle(v1, v2)
[Figure: 3-valued structures before and after evaluating the condition on the true branch; the nodes are labelled with p[x], p[y] and inOrder[n] values and connected by n and dle edges.]

From Local Outlook to Global Outlook
(Safety) Every time control reaches a given point:
  – there are no garbage memory cells
  – the list is acyclic
  – each cell is locally ordered
(History) The list is a permutation of the original list

Bugs Found
Pointer manipulations
  – null dereferences
  – memory leaks
Forgetting to sort the first element
Swapping equal elements in bubble sort (non-termination)

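L insert_sort_b2(L x) {
    L r, pr, rn, l, pl;
    if (x == NULL)
        return NULL;
    pr = x;
    r = x->n;
    while (r != NULL) {
        pl = x;
        rn = r->n;
        l = x->n;   /* bug: the scan starts at the second cell, so the first element is never sorted */
        while (l != r) {
            if (l->data > r->data) {
                pr->n = rn;
                r->n = l;
                pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
[Figure: the structure reported by the analysis: the node with p[x]=1 has inOrder[n]=½ and the node with p[x]=0 has inOrder[n]=1, connected by n and dle edges.]
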
Running Times
Properties Not Proved
(Liveness) Termination
Stability

Related Work
Temporal-logic model checking
  – Manually extracts a finite-state machine
  – Does not handle dynamically allocated data
  – But proves stronger properties, e.g., liveness
Bourdoncle 93
  – Handles integer arithmetic
  – Cannot handle pointers

Further Work
Recursive programs (Quicksort)
Experiment with other ADTs (AVL trees)
Automatically derive predicate-update formulae for instrumentation predicates
Scaling to larger programs
  – User annotations
  – Class-level analysis
  – Modular analysis
  – Space optimizations
  – Smart front-end that precomputes “cheap” information

Conclusions
It is possible to automatically verify non-trivial properties of complex C programs that manipulate dynamically allocated memory without providing loop invariants
The implementation is automatically generated from TVLA
But scaling is an issue

Other Applications of TVLA
Verifying “cleanness” properties of C programs (Dor, Rodeh, Sagiv 2000)
  – null dereferences
  – memory leaks
Verifying safety properties of Mobile Ambients (Nielson, Nielson, Sagiv 2000)
Verifying safety properties of multithreaded Java programs (Yahav 2000)
  – Deadlocks
  – Nested monitors
  – Read/Write interference

Boolean Connectives [Kleene]
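Kleene's strong three-valued connectives can be written compactly when the truth values are ordered 0 < ½ < 1: conjunction is the minimum, disjunction is the maximum, and negation mirrors the value around ½. The C sketch below assumes an integer encoding 0, 1, 2 for 0, ½, 1; the type and function names are illustrative, not taken from TVLA.

```c
#include <assert.h>

/* Hypothetical encoding: 0, 1/2, 1 are stored as 0, 1, 2. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

/* Conjunction: the minimum of the two values. */
kleene k_and(kleene a, kleene b) { return a < b ? a : b; }

/* Disjunction: the maximum of the two values. */
kleene k_or(kleene a, kleene b) { return a > b ? a : b; }

/* Negation: swaps 0 and 1 and leaves 1/2 unchanged. */
kleene k_not(kleene a) {
    assert(a == K_FALSE || a == K_HALF || a == K_TRUE);
    return (kleene)(K_TRUE - a);
}
```

A definite operand can still decide the result: k_and(K_FALSE, K_HALF) is K_FALSE and k_or(K_TRUE, K_HALF) is K_TRUE, whereas k_and(K_TRUE, K_HALF) and k_not(K_HALF) remain K_HALF.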
The Operational Semantics of x = t->n
x’(v2) = ∃v1 : t(v1) ∧ n(v1, v2)

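The same update formula can be evaluated on a 3-valued structure simply by interpreting the connectives in Kleene's logic. The C sketch below reads the formula as x'(v2) = ∃v1 : t(v1) ∧ n(v1, v2), with ∃ taken as a disjunction over all nodes. The array layout, the bound MAX_NODES, and the function names are assumptions made for this example.

```c
#include <assert.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;
enum { MAX_NODES = 8 };   /* illustrative bound */

kleene k_and(kleene a, kleene b) { return a < b ? a : b; }
kleene k_or(kleene a, kleene b)  { return a > b ? a : b; }

/* Computes x'(v2) = exists v1 : t(v1) and n(v1, v2) for every node v2. */
void assign_x_t_next(int size, const kleene t[],
                     kleene n[][MAX_NODES], kleene x_new[]) {
    assert(size >= 0 && size <= MAX_NODES);
    for (int v2 = 0; v2 < size; v2++) {
        kleene acc = K_FALSE;                 /* empty disjunction */
        for (int v1 = 0; v1 < size; v1++)
            acc = k_or(acc, k_and(t[v1], n[v1][v2]));
        x_new[v2] = acc;
    }
}
```

On a blurred two-node list where t points to the first node and the n edge into the summary node is ½, the result is x'=0 for the first node and x'=½ for the summary node: the answer is conservative, but it no longer says exactly which cell x points to.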
The Operational Semantics of x->n = NULL
n’(v1, v2) = n(v1, v2) ∧ ¬x(v1)
inOrder’[dle, n](v) = inOrder[dle, n](v) ∨ x(v)
inROrder’[dle, n](v) = inROrder[dle, n](v) ∨ x(v)

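A destructive update changes the core predicate n and the instrumentation predicates together. The C sketch below reads the update formulae for x->n = NULL as n'(v1, v2) = n(v1, v2) ∧ ¬x(v1) and inOrder'(v) = inOrder(v) ∨ x(v), evaluated in Kleene's logic. The array layout and all names are hypothetical.

```c
#include <assert.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;
enum { MAX_NODES = 8 };   /* illustrative bound */

kleene k_and(kleene a, kleene b) { return a < b ? a : b; }
kleene k_or(kleene a, kleene b)  { return a > b ? a : b; }
kleene k_not(kleene a)           { return (kleene)(K_TRUE - a); }

/* Applies x->n = NULL in place. Each new value depends only on the old
   value at the same position and on x, so no temporary copy is needed. */
void set_next_null(int size, const kleene x[],
                   kleene n[][MAX_NODES], kleene in_order[]) {
    assert(size >= 0 && size <= MAX_NODES);
    for (int v1 = 0; v1 < size; v1++) {
        /* inOrder'(v) = inOrder(v) or x(v): no successor, so vacuously ordered */
        in_order[v1] = k_or(in_order[v1], x[v1]);
        for (int v2 = 0; v2 < size; v2++)
            /* n'(v1, v2) = n(v1, v2) and not x(v1): drop x's outgoing edge */
            n[v1][v2] = k_and(n[v1][v2], k_not(x[v1]));
    }
}
```

The inROrder update has the same shape as the inOrder one. Note how the instrumentation predicate gains precision here: a cell whose inOrder value was ½ becomes definitely ordered once its outgoing n edge is removed.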
The Operational Semantics of x->n = t
n’(v1, v2) = n(v1, v2) ∨ (x(v1) ∧ t(v2))
inOrder’[dle, n](v) = (x(v) ? ∃v1 : t(v1) ∧ dle(v, v1) : inOrder[dle, n](v))
inROrder’[dle, n](v) = (x(v) ? ∃v1 : t(v1) ∧ dle(v1, v) : inROrder[dle, n](v))