Quantified Invariant Generation using an Interpolating Saturation Prover
Ken McMillan, Cadence Research Labs

Introduction

Interpolants derived from proofs can provide an effective relevance heuristic for constructing inductive invariants:
– They provide a way of generalizing proofs about bounded behaviors to the unbounded case
– They exploit a prover's ability to focus on relevant facts
– They have been used in various applications, including hardware verification (propositional case), predicate abstraction (quantifier-free), and program verification (quantifier-free)

This talk:
– Moving to the first-order case, including FO(TC)
– Modifying SPASS to create an interpolating first-order prover
– Applying it to program verification with arrays and linked lists

Invariants from unwindings

Consider this very simple approach:
– Partially unwind a program into a loop-free, in-line program
– Construct a Floyd/Hoare proof for the in-line program
– See if this proof contains an inductive invariant proving the property

Example program:

    x = y = 0;
    while(*)      { x++; y++; }
    while(x != 0) { x--; y--; }
    assert(y == 0);

Loop invariant (for both loops): {x == y}
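
As an illustration of the unwinding step, here is a small Python sketch (illustration only; the function name unwind is invented) that prints the loop-free, in-line program annotated on the next slide, for a given number of unwindings of each loop.

    # Unwind the example's two loops k1 and k2 times into a loop-free,
    # in-line program.  Guards are written in brackets, and the final
    # assertion is negated, so refuting the path proves it safe.
    def unwind(k1, k2):
        prog = ["x = y = 0;"]
        prog += ["x++; y++;"] * k1                 # k1 copies of the first loop body
        prog += ["[x != 0]; x--; y--;"] * k2       # k2 copies of the second loop body
        prog += ["[x == 0];", "[y != 0];"]         # loop exit and negated assertion
        return "\n".join(prog)

    print(unwind(2, 2))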

Unwind the loops

    x = y = 0;
    x++; y++;
    x++; y++;
    [x != 0]; x--; y--;
    [x != 0]; x--; y--;
    [x == 0];
    [y != 0];

One annotation of this in-line program uses the loop invariant {x = y} at the intermediate points:

    {True}  {x = 0 ∧ y = 0}  {x = y} ...  {x = 0 ⇒ y = 0}  {False}

Another annotation simply tracks the value of y, and such assertions diverge as we unwind:

    {True}  {y = 0}  {y = 1}  {y = 2}  {y = 1}  {y = 0}  {False}

– A proof of the in-line program can contain invariants for both loops
– But assertions may diverge as we unwind
– A practical method must somehow prevent this kind of divergence!

Interpolation Lemma [Craig, 1957]

If A ∧ B = false, there exists an interpolant A' for (A, B) such that:
– A implies A'
– A' is inconsistent with B
– A' is expressed over the common vocabulary of A and B

A variety of techniques exist for deriving an interpolant from a refutation of A ∧ B generated by a theorem prover.
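
To make the three conditions concrete, here is a small self-contained propositional example checked by brute force (a sanity check of the definition only, not how interpolants are actually extracted from proofs; the formulas are chosen for illustration).

    from itertools import product

    def all_envs(vs):
        for bits in product([False, True], repeat=len(vs)):
            yield dict(zip(vs, bits))

    A      = lambda e: e['p'] and e['q']          # A  = p ∧ q
    B      = lambda e: (not e['q']) and e['r']    # B  = ¬q ∧ r
    Aprime = lambda e: e['q']                     # candidate interpolant A' = q

    vs = ['p', 'q', 'r']
    assert all(Aprime(e) for e in all_envs(vs) if A(e))       # A implies A'
    assert not any(Aprime(e) and B(e) for e in all_envs(vs))  # A' is inconsistent with B
    # A' mentions only q, the one symbol common to A and B
    print("q is an interpolant for (p ∧ q, ¬q ∧ r)")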

Interpolants as Floyd-Hoare proofs

Proving in-line programs: program → SSA sequence → prover → proof → interpolation → Hoare proof.

    Program:        x = y;       y++;           [x == y]
    SSA sequence:   x1 = y0      y1 = y0 + 1    x1 = y1
    Interpolants:   True  ⇒  x1 = y0  ⇒  y1 > x1  ⇒  False
    Hoare proof:    {True}  {x = y}  {y > x}  {False}

1. Each formula implies the next
2. Each is over the common symbols of its prefix and suffix
3. The sequence begins with True and ends with False
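
This annotation can be double-checked by brute force over a small range of integers; the sketch below (illustration only) verifies that each interpolant, conjoined with the next SSA constraint, implies the following interpolant.

    from itertools import product

    A  = [lambda s: s['x1'] == s['y0'],          # x1 = y0
          lambda s: s['y1'] == s['y0'] + 1,      # y1 = y0 + 1
          lambda s: s['x1'] == s['y1']]          # guard [x == y]
    Ip = [lambda s: True,
          lambda s: s['x1'] == s['y0'],
          lambda s: s['y1'] > s['x1'],
          lambda s: False]

    R = range(-3, 4)
    for x1, y0, y1 in product(R, R, R):
        s = {'x1': x1, 'y0': y0, 'y1': y1}
        for i in range(3):
            # interpolant i, together with constraint i, implies interpolant i+1
            assert not (Ip[i](s) and A[i](s)) or Ip[i + 1](s)
    print("True, x1 = y0, y1 > x1, False annotates the path consistently")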

Need for quantified interpolants

    for(i = 0; i < N; i++) a[i] = i;
    for(j = 0; j < N; j++) assert(a[j] == j);

Invariant of the first loop: {∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}

– Existing interpolating provers cannot produce quantified interpolants
– Problem: how do we prevent the number of quantifiers from diverging in the same way that constants diverge when we unwind the loops?
– For linked structures we also require a theory of reachability (in effect, transitive closure)

Can we build an interpolating prover for full FOL that handles reachability and avoids divergence?

Clausal provers

A clausal refutation prover takes a set of clauses and returns a proof of unsatisfiability (i.e., a refutation) if possible.

A prover is based on inference rules of the form

    P1 ... Pn
    ---------
        C

where P1 ... Pn are the premises and C is the conclusion.

A typical inference rule is resolution, of which this is an instance:

    p(a)    p(U) → q(U)
    -------------------
           q(a)

This is accomplished by unifying p(a) with p(U), then dropping the complementary literals.
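
The resolution instance above can be reproduced in a few lines of Python; this is a toy sketch for illustration (invented term encoding, no occurs check), not the data structures of a real prover.

    # Terms are ('f', arg1, ...) tuples or strings; variables start uppercase.
    def is_var(t): return isinstance(t, str) and t[:1].isupper()

    def walk(t, sub):
        while is_var(t) and t in sub:
            t = sub[t]
        return t

    def unify(s, t, sub):
        s, t = walk(s, sub), walk(t, sub)
        if s == t: return sub
        if is_var(s): return {**sub, s: t}
        if is_var(t): return {**sub, t: s}
        if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t) and s[0] == t[0]:
            for a, b in zip(s[1:], t[1:]):
                sub = unify(a, b, sub)
                if sub is None: return None
            return sub
        return None

    def subst(t, sub):
        if is_var(t): return subst(sub[t], sub) if t in sub else t
        if isinstance(t, tuple): return (t[0],) + tuple(subst(a, sub) for a in t[1:])
        return t

    # Clauses as lists of (sign, atom) literals.
    c1 = [(True, ('p', 'a'))]                        # p(a)
    c2 = [(False, ('p', 'U')), (True, ('q', 'U'))]   # ¬p(U) ∨ q(U), i.e. p(U) → q(U)

    sub = unify(c1[0][1], c2[0][1], {})              # unify p(a) with p(U): {U ↦ a}
    resolvent = [(sign, subst(atom, sub)) for sign, atom in c1[1:] + c2[1:]]
    print(resolvent)                                 # [(True, ('q', 'a'))], i.e. q(a)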

Superposition calculus

Modern first-order provers are based on the superposition calculus. An example superposition inference:

    Q(a)    P → (a = c)
    -------------------
         P → Q(c)

– This is just substitution of equals for equals
– In practice this approach generates a lot of substitutions!
– A reduction order is used to reduce the number of inferences

Reduction orders

A reduction order ≻ is:
– a total, well-founded order on ground terms
– with the subterm property: f(a) ≻ a
– and monotonicity: a ≻ b implies f(a) ≻ f(b)

Example: Recursive Path Ordering with Status (RPOS)
– start with a precedence on symbols: a ≻ b ≻ c ≻ f
– this induces a reduction ordering on ground terms: f(f(a)) ≻ f(a) ≻ a ≻ f(b) ≻ b ≻ c ≻ f
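
A minimal sketch of one such ordering: a ground lexicographic path ordering (a simplified relative of RPOS, shown only for illustration) induced by the precedence a ≻ b ≻ c ≻ f. The term encoding and names are invented.

    prec = {'a': 3, 'b': 2, 'c': 1, 'f': 0}      # precedence a ≻ b ≻ c ≻ f

    def head(t): return t[0] if isinstance(t, tuple) else t
    def args(t): return list(t[1:]) if isinstance(t, tuple) else []

    def lpo_gt(s, t):
        if s == t: return False
        # (1) some argument of s already dominates (or equals) t
        if any(si == t or lpo_gt(si, t) for si in args(s)): return True
        # (2) higher-precedence head, and s dominates every argument of t
        if prec[head(s)] > prec[head(t)]:
            return all(lpo_gt(s, tj) for tj in args(t))
        # (3) equal heads: lexicographic comparison of arguments
        if head(s) == head(t):
            if not all(lpo_gt(s, tj) for tj in args(t)): return False
            for si, ti in zip(args(s), args(t)):
                if si != ti: return lpo_gt(si, ti)
        return False

    f = lambda x: ('f', x)
    chain = [f(f('a')), f('a'), 'a', f('b'), 'b', 'c']
    assert all(lpo_gt(x, y) for x, y in zip(chain, chain[1:]))
    print("f(f(a)) ≻ f(a) ≻ a ≻ f(b) ≻ b ≻ c under this ordering")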

Ordering constraint

Constrains rewrites to be downward in the reduction order:

    Q(a)    P → (a = c)
    -------------------
         P → Q(c)

In this example, the inference is only possible if a ≻ c; the rewritten terms must be maximal in their clauses.

Thm: Superposition with the ordering constraint is complete for refutation in FOL with equality.

So how do we get interpolants from these proofs?

Local proofs

A proof is local for a pair of clause sets (A, B) when every inference step uses only symbols from A or only symbols from B.

From a local refutation of (A, B), we can derive an interpolant for (A, B) in linear time. This interpolant is a Boolean combination of formulas in the proof.
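
The locality condition itself is just a vocabulary check; the small sketch below (illustration only) applies it to the example on the next slide, where A = {x = y, f(x) = c} and B = {f(y) = d, c ≠ d}.

    # An inference step is local for (A, B) if the symbols it mentions all
    # lie in A's vocabulary or all lie in B's vocabulary.
    def is_local(step_symbols, sig_A, sig_B):
        return step_symbols <= sig_A or step_symbols <= sig_B

    sig_A = {'x', 'y', 'f', 'c', '='}
    sig_B = {'y', 'f', 'c', 'd', '='}

    print(is_local({'x', 'y', 'f', 'c', '='}, sig_A, sig_B))  # True:  x = y, f(x) = c ⊢ f(y) = c
    print(is_local({'y', 'f', 'c', 'd', '='}, sig_A, sig_B))  # True:  f(y) = c, f(y) = d ⊢ c = d
    print(is_local({'x', 'd', '='}, sig_A, sig_B))            # False: mixes A-only x with B-only d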

Reduction orders and locality

A reduction order is oriented for (A, B) when s ≻ t for every s ∉ L(B) and t ∈ L(B).

Intuition: rewriting eliminates the symbols local to A first, then the B symbols.

Example, with the oriented order x ≻ y ≻ c ≻ d ≻ f:

    A:  x = y,  f(x) = c
    B:  f(y) = d,  c ≠ d

    x = y,  f(x) = c     ⊢  f(y) = c
    f(y) = c,  f(y) = d  ⊢  c = d
    c = d,  c ≠ d        ⊢  ⊥

Local!

Orientation is not enough

    A:  Q(a),  ¬Q(b)
    B:  a = c,  b = c
    precedence:  Q ≻ a ≻ b ≻ c

Local superposition gives only c = c. Solution: replace the non-local superposition

    Q(a)    a = c
    -------------
         Q(c)

with two inferences:

    Q(a)                     a = U → Q(U)    a = c
    --------------           ---------------------
    a = U → Q(U)                      Q(c)

The second inference can be postponed until after resolving with ¬Q(b). This procrastination step is an example of a reduction rule, and it preserves completeness.

Completeness of local inference

Thm: Local superposition with procrastination is complete for refutation of pairs (A, B) such that:
– (A, B) has a universally quantified interpolant
– the reduction order is oriented for (A, B)

This gives us a complete method for generating universally quantified interpolants for arbitrary first-order formulas!

The method is easily extensible to interpolants for sequences of formulas, hence we can use it to generate Floyd/Hoare proofs for in-line programs.

Avoiding divergence

As argued earlier, we still need to prevent interpolants from diverging as we unwind the program further.

Idea: stratify the clause language.

Example: let L_k be the set of clauses with at most k variables and nesting depth at most k. Note that each L_k is a finite language.

Stratified saturation prover (sketched below):
– Initially let k = 1
– Restrict the prover to generate only clauses in L_k
– When the prover saturates, increase k by one and continue

The stratified prover is complete, since every proof is contained in some L_k.
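
A sketch of that loop in Python, with saturate_within standing in for a prover restricted to the clause language L_k (both names are invented; the real change is made inside SPASS):

    def stratified_refute(clauses, saturate_within, max_k=10):
        """saturate_within(clauses, k) is assumed to return ('refuted', proof)
        or ('saturated', None), generating only clauses in L_k."""
        k = 1
        while k <= max_k:
            status, proof = saturate_within(clauses, k)
            if status == 'refuted':
                return proof          # refutation found within L_k
            k += 1                    # saturated in L_k: enlarge the language
        return None                   # gave up (the procedure itself keeps increasing k)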

Completeness for universal invariants

Lemma: For every safety program M with a ∀ (universally quantified) safety invariant, and every stratified saturation prover P, there exists an integer k such that P refutes every unwinding of M in L_k, provided the reduction ordering is oriented properly.

This means that as we unwind further, eventually all the interpolants are contained in L_k, for some k.

Theorem: Under the above conditions, there is some unwinding of M for which the interpolants generated by P contain a safety invariant for M.

This means we have a complete procedure for finding universally quantified safety invariants whenever these exist!

In practice

We have proved theoretical convergence. But does the procedure converge in practice, in reasonable time?

Modify SPASS, an efficient superposition-based saturation prover:
– Generate oriented precedence orders
– Add the procrastination rule to SPASS's reduction rules
– Drop all non-local inferences
– Add stratification (SPASS already has something similar)

Add axiomatizations of the necessary theories:
– An advantage of a full FOL prover is that we can add axioms!
– As argued earlier, we need a theory of arrays and reachability (transitive closure)
– Since this theory is not finitely axiomatizable, we use an incomplete axiomatization that is intended to handle typical operations in list-manipulating programs

Simple example

    for(i = 0; i < N; i++) a[i] = i;
    for(j = 0; j < N; j++) assert(a[j] == j);

Invariant of the first loop: {∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}
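
As a quick sanity check (not part of the method), the claimed invariant can be tested at the head of the first loop for small values of N:

    def check_invariant(N):
        a = [None] * N
        for i in range(N + 1):                        # loop head, i = 0 .. N
            assert all(a[x] == x for x in range(i))   # ∀x. 0 ≤ x < i ⇒ a[x] = x
            if i < N:
                a[i] = i                              # loop body

    for N in range(6):
        check_invariant(N)
    print("invariant holds at the loop head for N = 0..5")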

Unwinding the simple example

Unwind the loops twice:

    i = 0;
    [i < N]; a[i] = i; i++;
    [i < N]; a[i] = i; i++;
    [i >= N];
    j = 0;
    [j < N]; j++;
    [j < N]; a[j] != j;

SSA constraints:

    i0 = 0
    i0 < N ∧ a1 = update(a0, i0, i0) ∧ i1 = i0 + 1
    i1 < N ∧ a2 = update(a1, i1, i1) ∧ i2 = i1 + 1
    i2 ≥ N ∧ j0 = 0
    j0 < N ∧ j1 = j0 + 1
    j1 < N ∧ select(a2, j1) ≠ j1

Interpolants along the path (compare the invariant on the previous slide):

    {i0 = 0}
    {0 ≤ U ∧ U < i1 ⇒ select(a1, U) = U}
    {0 ≤ U ∧ U < i2 ⇒ select(a2, U) = U}
    {j0 ≤ U ∧ U < N ⇒ select(a2, U) = U}

Note: stratification prevents the constants from diverging as 0, succ(0), succ(succ(0)), ...

List deletion example

    a = create_list();
    while(a){
      tmp = a->next;
      free(a);
      a = tmp;
    }

Invariant synthesized with 3 unwindings (after some simplification):

    {rea(next, a, nil) ∧ ∀x (rea(next, a, x) → x = nil ∨ alloc(x))}

That is, a is acyclic, and every cell is allocated. Note that interpolation can synthesize Boolean structure.

More small examples

This shows that divergence can be controlled. But can we scale to large programs?...

A slightly larger program

    main(){
      node *a = create_list();
      node *b = create_list();
      node *c = create_list();
      node *p = * ? a : * ? b : c;
      while(p){
        assert(alloced(p));
        p = p->next;
      }
    }

We have to track a, b and c to prove this property.
– Let's look at what happens with canonical heap abstractions...

After creating a

[Shape-graph diagram: the canonical abstract heaps for list a (a null, a one-cell list, a longer list), described by the predicates below.]

Predicates: Pt_a, Rea_a, is_null, alloced. Relations: next.

After creating b

[Shape-graph diagram: the canonical abstract heaps after creating both a and b, i.e. the product of the cases for each list.]

After creating c

[Picture of 27 abstract heaps here]

Problem: the abstraction scales exponentially with the number of independent data structures.

Independent analyses

Suppose we do a Cartesian product of 3 independent analyses for a, b, c.

[Diagram: the canonical abstract heaps for a, conjoined with those for b and those for c.]

How do we know we can decompose the analysis in this way and prove the property?
– What if some correlations are needed between the analyses?

For non-heap properties, one good answer is to compute interpolants.

Abstraction from interpolants

Interpolants contain inductive invariants after unrolling the loop 3 times.

    main(){
      node *a = create_list();
      node *b = create_list();
      node *c = create_list();
      node *p = * ? a : * ? b : c;
      while(p){
        assert(alloced(p));
        p = p->next;
      }
    }

Interpolant after creating c:

    (a ≠ 0 ⇒ alloced(a)) ∧
    (b ≠ 0 ⇒ alloced(b)) ∧
    (c ≠ 0 ⇒ alloced(c)) ∧
    ∀x. (x ≠ 0 ∧ alloced(x) ⇒ alloced(next(x)))

Shape of the interpolant

    (a ≠ 0 ⇒ alloced(a)) ∧
    (b ≠ 0 ⇒ alloced(b)) ∧
    (c ≠ 0 ⇒ alloced(c)) ∧
    ∀x. (x ≠ 0 ∧ alloced(x) ⇒ alloced(next(x)))

[Diagram: a, b and c point into the set of alloced cells, which is closed under next; null lies outside.]

The invariant says that the allocated cells are closed under the next relation. Notice also that the size of this formula is linear in the number of lists, not exponential like the set of shape graphs.

Suggests decomposition

    (a ≠ 0 ⇒ alloced(a)) ∧
    (b ≠ 0 ⇒ alloced(b)) ∧
    (c ≠ 0 ⇒ alloced(c)) ∧
    ∀x. (x ≠ 0 ∧ alloced(x) ⇒ alloced(next(x)))

Each conjunct suggests a canonical abstract domain:

    Predicates             Relations
    a = 0, alloced(n)      (none)
    b = 0, alloced(n)      (none)
    c = 0, alloced(n)      (none)
    n = 0, alloced(n)      next

Each of these analyses proves one conjunct of the invariant.

Conclusion

Interpolants and invariant generation:
– Computing interpolants from proofs allows us to generalize from special cases such as loop-free unwindings
– Interpolation can extract relevant facts from proofs of these special cases
– Divergence must be avoided

Quantified invariants:
– Needed for programs that manipulate arrays or heaps
– A first-order equality prover was modified to produce local proofs (hence interpolants); it is complete for universal invariants
– It can construct invariants of simple array- and list-manipulating programs, using a partial axiomatization of FO(TC); language stratification prevents divergence
– It might also be used as a relevance heuristic for shape analysis, IPA

Expressiveness hierarchy

[Diagram: interpolant languages ordered by expressiveness (QF, ∀ FO, ∀ FO(TC)), matched against parameterized abstract domains (predicate abstraction, indexed predicate abstraction, canonical heap abstractions).]

Interpolants for sequences

Let A_1, ..., A_n be a sequence of formulas. A sequence A'_0, ..., A'_n is an interpolant for A_1, ..., A_n when:
– A'_0 = True
– A'_{i-1} ∧ A_i ⇒ A'_i, for i = 1..n
– A'_n = False
– and finally, A'_i ∈ L(A_1 ... A_i) ∩ L(A_{i+1} ... A_n)

[Diagram: the chain True ⇒ A'_1 ⇒ A'_2 ⇒ ... ⇒ False interleaved with A_1, ..., A_n.]

In other words, the interpolant is a structured refutation of A_1, ..., A_n.

Need for reachability

    ...
    node *a = create_list();
    while(a){
      assert(alloc(a));
      a = a->next;
    }
    ...

Invariant: ∀x (rea(next, a, x) ∧ x ≠ nil → alloc(x))

This condition is needed to prove memory safety (no use after free). It cannot be expressed in FO:
– We need some predicate identifying a closed set of nodes that is allocated
– We require a theory of reachability (in effect, transitive closure)

Can we build an interpolating prover for full FOL that handles reachability and avoids divergence?
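
On a concrete finite heap, rea(next, a, x) is just reachability along the next pointers; the sketch below computes it directly (illustration only; the point of the slide is that this closure is not definable by a first-order formula):

    def rea(next_map, a):
        """Cells reachable from a by following next (nil modeled as None)."""
        seen, cur = set(), a
        while cur is not None and cur not in seen:
            seen.add(cur)
            cur = next_map.get(cur)
        return seen

    heap  = {1: 2, 2: 3, 3: None}     # a three-cell list: 1 -> 2 -> 3 -> nil
    alloc = {1, 2, 3}
    assert rea(heap, 1) <= alloc      # every reachable non-nil cell is allocated
    print("invariant holds on this concrete heap")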

Partially axiomatizing FO(TC)

Axioms of the theory of arrays (with select and store, written update here):

    ∀(A, I, V). select(update(A, I, V), I) = V
    ∀(A, I, J, V). I ≠ J → select(update(A, I, V), J) = select(A, J)

Axioms for reachability (rea):

    ∀(L, E, X). rea(L, select(L, E), X) → rea(L, E, X)          [if e->link reaches x, then e reaches x]
    ∀(L, E). rea(L, E, E)                                       [e reaches e]
    ∀(L, E, X). rea(L, E, X) → E = X ∨ rea(L, select(L, E), X)  [if e reaches x, then e = x or e->link reaches x]
    etc...

Since FO(TC) cannot be completely axiomatized in first-order logic, these axioms must be incomplete.
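
The two array axioms can be made concrete on a simple functional-array model (Python dicts); this is only a sanity check of the select/update semantics, not part of what is given to the prover:

    def update(a, i, v):
        return {**a, i: v}       # functional update: a copy of a with index i set to v

    def select(a, i):
        return a[i]

    a = {0: 'p', 1: 'q'}
    assert select(update(a, 0, 'r'), 0) == 'r'             # read-over-write, same index
    assert select(update(a, 0, 'r'), 1) == select(a, 1)    # read-over-write, distinct index
    print("array axioms hold on the dict model")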