Program Verification using Templates over Predicate Abstraction Saurabh Srivastava University of Maryland, College Park Sumit Gulwani Microsoft Research, Redmond Saurabh Srivastava University of Maryland, College Park Sumit Gulwani Microsoft Research, Redmond
What the technique will let you do! A. Infer invariants with arbitrary quantification and boolean structure ∀ ∀∃ j j.. A old ∀k∃j∀k∃j.. ∀ k 1,k 2 k1k1 k2k2 for i = 0..n { for j = i..n { find min index } if (min != i) swap A[i] with A[min] } for i = 0..n { for j = i..n { find min index } if (min != i) swap A[i] with A[min] } Selection Sort: k1k1 k2k2 < : E.g. Sortedness : E.g. Permutation B. Infer weakest preconditions A new k Worst case behavior: swap every time it can Weakest conditions on input:
Improves the state-of-art Can infer very expressive invariants Quantification – E.g. ∀ k ∃ j : (k (A[k]=B[j] ∧ j<n) Disjunction – E.g. ∀ k : (k n ∨ A[k]=0) – Previous techniques are too specialized to particular types of quantification or to quantifier-free disjunction Can infer weakest preconditions Good for debugging: can discover worst case inputs Good for analysis: can infer missing precondition facts – No satisfactory solutions known
Key facilitators Templates – Task of inferring conjunctive facts for the holes remains Predicate Abstraction – Allows us to efficiently compute solutions for the holes – E.g., {i j, i≤j, i≥j, i j+1… } Unknown holes Relational operator, ≥ … Relational operator, ≥ … Arithmetic operator +, - … Arithmetic operator +, - … Program variables, array elements Program variables, array elements Limited constants Limited constants
Outline Three fixed-point inference algorithms – Iterative fixed-point Greatest Fixed-Point (GFP) Least Fixed-Point (LFP) – Constraint-based (CFP) Optimal Solutions – Built over a clean theorem proving interface Weakest Precondition Generation Experimental evaluation (LFP) (GFP) (CFP)
Outline Three fixed-point inference algorithms – Iterative fixed-point Greatest Fixed-Point (GFP) Least Fixed-Point (LFP) – Constraint-based (CFP) Optimal Solutions – Built over a clean theorem proving interface Weakest Precondition Generation Experimental evaluation (LFP) (GFP) (CFP)
Analysis Setup post pre I1I1 I2I2 Loop headers (with invariants) split program into simple paths. Simple paths induce program constraints (verification conditions) vc(pre,I 1 ) vc(I 1,post) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) true ∧ i=0 => I 1 I 1 ∧ i≥n => sorted array I 1 ∧ i I 2 Sorted array i<n true j<n Find min index if (min != i) swap A[i], A[min] i:=0 j:=i I1I1 I2I2 E.g. Selection Sort:
I 1 ∧ i≥n => ∀ k 1,k 2 : 0≤k 1 A[k 1 ] ≤A[k 2 ] I 1 ∧ i≥n => ∀ k 1,k 2 : 0≤k 1 A[k 1 ] ≤A[k 2 ] Analysis Setup post pre I1I1 I2I2 Loop headers (with invariants) split program into simple paths. Simple paths induce program constraints (verification conditions) vc(pre,I 1 ) vc(I 1,post) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) true ∧ i=0 => I 1 I 1 ∧ i I 2 Simple FOL formulae over I 1, I 2 ! We will exploit this. Simple FOL formulae over I 1, I 2 ! We will exploit this.
Candidate Solution Iterative Fixed-point: Overview post pre I1I1 I2I2 vc(pre,I 1 ) vc(I 1,post) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) { vc(pre,I 1 ),vc(I 1,I 2 ) } VCs that are not satisfied ✗✓ ✓ ✓ ✗ Values for invariants
Computable because: – Templates make ∨ explicit – Predicate abstraction restricts search to finite space Iterative Fixed-point: Overview post pre I1I1 I2I2 vc(pre,I 1 ) vc(I 1,post) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) Improve candidate – Pick unsat VC – Use theorem prover to compute optimal change – Results in new candidates { … } Set of candidates Improve candidate Candidate satisfies all VCs
Backwards Iterative (GFP) post pre Backward: Always pick the source invariant of unsat constraint to improve Start with Backward: Always pick the source invariant of unsat constraint to improve Start with Candidate Sols Unsat constraints { vc(I 1,post) } ⊤ ⊤ ⊤ ⊤ ✓ ✓ ✓ ✗ ✓
Backwards Iterative (GFP) post pre ✓ ✓ Backward: Always pick the source invariant of unsat constraint to improve Start with Backward: Always pick the source invariant of unsat constraint to improve Start with Candidate Sols Unsat constraints { vc(I 1,post) } { vc(I 2,I 1 ) } ✓ Optimally strengthen so vc(pre,I 1 ) ok unless no sol n ✗ ✓
Backwards Iterative (GFP) post pre ✓ ✓ Backward: Always pick the source invariant of unsat constraint to improve Start with Backward: Always pick the source invariant of unsat constraint to improve Start with Candidate Sols Unsat constraints { vc(I 1,post) } { vc(I 2,I 1 ) } { vc(I 2,I 2 ) } { vc(I 2,I 2 ),vc(I 1,I 2 ) } Multiple orthogonal optimal sols ✓ Optimally strengthen so vc(pre,I 1 ) ok unless no sol n ✗ ✗
Backwards Iterative (GFP) post pre ✓ ✓ ✓ Backward: Always pick the source invariant of unsat constraint to improve Start with Backward: Always pick the source invariant of unsat constraint to improve Start with Candidate Sols Unsat constraints { vc(I 1,post) } { vc(I 2,I 1 ) } { vc(I 2,I 2 ) } { vc(I 2,I 2 ),vc(I 1,I 2 ) } { vc(I 2,I 2 ) } { vc(I 1,I 2 ) } Multiple orthogonal optimal sols ✓ Optimally strengthen so vc(pre,I 1 ) ok unless no sol n ✗
Backwards Iterative (GFP) post pre ✓ ✓ ✓ ✓ Backward: Always pick the source invariant of unsat constraint to improve Start with Backward: Always pick the source invariant of unsat constraint to improve Start with Candidate Sols Unsat constraints { vc(I 1,post) } { vc(I 2,I 1 ) } { vc(I 2,I 2 ) } { vc(I 2,I 2 ),vc(I 1,I 2 ) } { vc(I 2,I 2 ) } { vc(I 1,I 2 ) } { vc(I 2,I 2 ) } none Multiple orthogonal optimal sols ✓ Optimally strengthen so vc(pre,I 1 ) ok unless no sol n : GFP solution
Forward Iterative (LFP) Forward: Same as GFP except Pick the destination invariant of unsat constraint to improve Start with Forward: Same as GFP except Pick the destination invariant of unsat constraint to improve Start with
Constraint-based over Predicate Abstraction post pre I1I1 I2I2 vc(pre,I 1 ) vc(I 1,post) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 )
Constraint-based over Predicate Abstraction vc(pre,I 1 ) vc(I 1,post ) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) pred(I 1 ) template p 1,p 2 …p r unknown 1 unknown 2 predicates b 1 1,b 2 1 …b r 1 b 1 2,b 2 2 …b r 2 boolean indicators I1I1 Remember: VCs are FOL formulae over I 1, I 2
Constraint-based over Predicate Abstraction vc(pre,I 1 ) vc(I 1,post ) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) pred(I 1 ) pred(A) : A to unknown predicate indicator variables pred(A) : A to unknown predicate indicator variables Remember: VCs are FOL formulae over I 1, I 2
Constraint-based over Predicate Abstraction vc(pre,I 1 ) vc(I 1,post ) vc(I 1,I 2 ) vc(I 2,I 2 ) vc(I 2,I 1 ) boolc(pred(I 1 ), pred(I 2 )) Optimal solutions from SMT solver to impose minimal constraints Optimal solutions from SMT solver to impose minimal constraints pred(A) : A to unknown predicate indicator variables pred(A) : A to unknown predicate indicator variables boolc(pred(I 1 )) boolc(pred(I 2 ), pred(I 1 )) boolc(pred(I 2 )) Boolean constraint to satisfying sol n (SAT Solver) Boolean constraint to satisfying sol n (SAT Solver) Invariant sol n Invariant sol n ∧ boolc(pred(I 1 )) Program constraint to boolean constraint Program constraint to boolean constraint SAT formulae over predicate indicators Local reasoning Fixed-Point Computation
Outline Three fixed-point inference algorithms – Iterative fixed-point Greatest Fixed-Point (GFP) Least Fixed-Point (LFP) – Constraint-based (CFP) Optimal Solutions – Built over a clean theorem proving interface Weakest Precondition Generation Experimental evaluation (LFP) (GFP) (CFP)
Optimal Solutions Key: Polarity of unknowns in formula Φ – Positive or negative: Value of positive unknown stronger => Φ stronger Value of negative unknown stronger => Φ weaker Optimal Soln: Maximally strong positives, maximally weak negatives Assume theorem prover interface: OptNegSol – Optimal solutions for formula with only negative unknowns – Built using a lattice search by querying SMT Solver ¬ negative u1u1 positive ∨ u 2 ) ∧ u 3 ( positive ∀ x : ∃ y : Φ =
Optimal Solutions using OptNegSol formula Φ contains unknowns: Repeat until set stable Optimal Solutions for formula Φ Optimal Solutions for formula Φ Opt Opt’ Opt’’ Merge … Merge positive tuples Φ[] ] ] u 1 …u P positive u 1 …u N negative … a 1 …a P a’ 1 …a’ P a’’ 1 …a’’ P P-tuple that assigns a single predicate to each positive unknown P-tuple that assigns a single predicate to each positive unknown a 1 …a P,S 1 …S N a’ 1 …a’ P,S’ 1 …S’ N … S i sol n for the negative unknowns S i sol n for the negative unknowns a’’ 1 …a’’ P,S’’ 1 …S’’ N OptNegSol … P x Size of predicate set
Outline Three fixed-point inference algorithms – Iterative fixed-point Greatest Fixed-Point (GFP) Least Fixed-Point (LFP) – Constraint-based (CFP) Optimal Solutions – Built over a clean theorem proving interface Weakest Precondition Generation Experimental evaluation (LFP) (GFP) (CFP)
Computing maximally weak preconditions Iterative: – Backwards algorithm: Computes maximally-weak invariants in each step – Precondition generation using GFP: Output disjuncts of entry fixed-point in all candidates Constraint-based: – Generate one solution for precondition – Assert constraint that ensures strictly weaker pre – Iterate till unsat—last precondition generated is maximally weakest
Outline Three fixed-point inference algorithms – Iterative fixed-point Greatest Fixed-Point (GFP) Least Fixed-Point (LFP) – Constraint-based (CFP) Optimal Solutions – Built over a clean theorem proving interface Weakest Precondition Generation Experimental evaluation (LFP) (GFP) (CFP)
Implementation C Program Templates Predicate Set CFG Invariants Preconditions Invariants Preconditions Microsoft’s Phoenix Compiler Verification Conditions Iterative Fixed-point GFP/LFP Constraint- based Fixed-Point Candidate Solutions Boolean Constraint Z3 SMT Solver
Verifying Sorting Algorithms Benchmarks – Considered difficult to verify – Require invariants with quantifiers – Sorting, ideal benchmark programs 5 major sorting algorithms – Insertion Sort, Bubble Sort (n 2 version and termination checking version), Selection Sort, Quick Sort and Merge Sort Properties: – Sort elements ∀ k : 0≤k A[k] ≤A[k+1] --- output array is non-decreasing – Maintain permutation, when elements distinct ∀ k ∃ j : 0≤k A[k]=A old [j] & 0≤j<n --- no elements gained ∀ k ∃ j : 0≤k A old [k]=A[j] & 0≤j<n --- no elements lost
Runtimes: Sortedness seconds Tool can prove sortedness for all sorting algorithms!
Runtimes: Permutation ∞ ∞ … seconds …Permutations too!
Inferring Preconditions Given a property (worst case runtime or functional correctness) what is the input required for the property to hold Tool automatically infers non-trivial inputs/preconditions Worst case input (precondition) for sorting: – Selection Sort: sorted array except last element is the smallest – Insertion, Quick Sort, Bubble Sort (flag): Reverse sorted array Inputs (precondition) for functional correctness: – Binary search requires sorted input array – Merge in Merge sort requires sorted inputs – Missing initializers required in various other programs
Runtimes (GFP): Inferring worst case inputs for sorting seconds Tool infers worst case inputs for all sorting algorithms! Nothing to infer as all inputs yield the same behavior
Conclusions Powerful invariant inference over predicate abstraction – Can infer quantifier invariants Three algorithms with different strengths – Iterative: Least fixed-point and Greatest fixed-point – Constraint-based Extend to maximally-weak precondition inference – Worst case inputs – Preconditions for functional correctness Techniques builds on SMT Solvers, so exploit their power Successfully verified/inferred preconditions – All major sorting algorithms – Other difficult benchmarks Project Webpage: