Relevance Heuristics for Program Analysis
Ken McMillan, Cadence Research Labs

Introduction
Program analysis
– Based on abstract interpretation
– A useful tool for optimization and verification
– Strong tension between precision and cost
Relevance heuristics
– Tailor the abstract domain to the property
– Key to scaling while maintaining enough information to prove useful properties
This talk
– General principles underlying relevance heuristics
– Applying these ideas to program analysis using Craig interpolation
– Some recent research on analysis of heap-manipulating programs

Static Analysis
Compute the least fixed point of an abstract transformer
– This is the strongest inductive invariant the analysis can provide
Inexpensive analyses:
– interval analysis
– affine equalities, etc.
These analyses lose information at a merge: joining x = y with x = z yields ⊤ (top).
Such an analysis is inexpensive, but insufficient if the disjunction is needed to prove the desired property (see the fragment below).
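As a concrete illustration of the lossy merge described above, here is a minimal, hypothetical C fragment (not taken from the talk); the function name and the use of rand() as a stand-in for nondeterministic choice are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fragment (not from the talk) illustrating the lossy merge. */
static int merge_example(int y, int z) {
    int x;
    if (rand() % 2)   /* stand-in for a nondeterministic branch */
        x = y;        /* on this branch the analysis knows x == y */
    else
        x = z;        /* on this branch it knows x == z */
    /* At the join, a domain of affine equalities must merge {x == y} and
       {x == z}.  No single equality holds on both branches, so the join is
       T (top) and the disjunction x == y || x == z is lost.               */
    return x;
}

int main(void) {
    printf("%d\n", merge_example(1, 2));
    return 0;
}
```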

Predicate abstraction
Abstract transformer:
– strongest Boolean postcondition over the given predicates
Advantage: does not lose information at a merge
– the join is disjunction: joining x = y with x = z yields x = y ∨ x = z
Disadvantages:
– the abstract state is exponential in size in the number of predicates
– the abstract domain has exponential height
Result:
– must use only predicates relevant to proving the property

Relevance Heuristics
Iterative refinement approach
– Analyze the failure of the abstraction to prove the property (typically using failed program traces, as in CEGAR)
– Add relevant information to the abstraction; it must be sufficient to rule out the failure
Key questions
– How do we decide what program-state information is relevant?
– Is relevance even a well-defined notion?
These questions have been well studied in the context of the Boolean satisfiability problem, and we can actually give some fairly concrete answers.

Principles
Relevance:
– A relevant predicate is one that is used in a parsimonious proof of the desired property
Generalization principle:
– Facts used in the proof of special cases tend to be relevant to the overall proof

Relevance principles and SAT
The Boolean Satisfiability Problem (SAT)
– Input: a Boolean formula in CNF
– Output: a satisfying assignment, or UNSAT
The DPLL approach:
– Branch (assign values to variables)
– Propagate (make deductions by unit resolution, or BCP)
– Learn (deduce new clauses in response to conflicts)
Resolution rule: from p ∨ Γ and ¬p ∨ Δ, derive Γ ∨ Δ.

DPLL approach
[Diagram: a small DPLL run in which decisions and BCP assignments lead to a conflict, and resolving the conflicting clauses yields a learned clause.]
– BCP guides clause learning by resolution
– Learning generalizes failures
– Learning guides decisions (VSIDS)
A simplified sketch of the branch-and-propagate core appears below.
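The following is a minimal sketch in C of the Branch and Propagate steps named above, without the Learn step (no clause learning, no VSIDS); the example CNF, the variable count, and all identifiers are made up for illustration.

```c
/* Minimal DPLL sketch: branching plus unit propagation only.
   Literals: variable v is the int v, its negation is -v.        */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NVARS 4
#define NCLAUSES 3
#define MAXLEN 3

static int clauses[NCLAUSES][MAXLEN] = {   /* 0 ends a clause early   */
    { 1, -2,  0 },                         /* (x1 | ~x2)              */
    { 2,  3,  0 },                         /* (x2 |  x3)              */
    {-1, -3,  4 },                         /* (~x1 | ~x3 | x4)        */
};

/* Value of a literal under partial assignment a: 1 true, -1 false, 0 unset */
static int lit_val(const int a[], int lit) {
    int v = a[abs(lit)];
    return lit > 0 ? v : -v;
}

/* Propagate: unit resolution (BCP).  Returns 0 on conflict, 1 otherwise.   */
static int bcp(int a[]) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int c = 0; c < NCLAUSES; c++) {
            int unassigned = 0, last = 0, satisfied = 0;
            for (int j = 0; j < MAXLEN && clauses[c][j]; j++) {
                int v = lit_val(a, clauses[c][j]);
                if (v == 1) { satisfied = 1; break; }
                if (v == 0) { unassigned++; last = clauses[c][j]; }
            }
            if (satisfied) continue;
            if (unassigned == 0) return 0;           /* conflict            */
            if (unassigned == 1) {                   /* unit clause: assign */
                a[abs(last)] = last > 0 ? 1 : -1;
                changed = 1;
            }
        }
    }
    return 1;
}

/* Branch: pick an unassigned variable and try both polarities.             */
static int dpll(int a[]) {
    int saved[NVARS + 1];
    if (!bcp(a)) return 0;
    for (int v = 1; v <= NVARS; v++) {
        if (a[v] != 0) continue;
        memcpy(saved, a, sizeof saved);
        a[v] = 1;                                    /* decide v = true     */
        if (dpll(a)) return 1;
        memcpy(a, saved, sizeof saved);
        a[v] = -1;                                   /* decide v = false    */
        if (dpll(a)) return 1;
        memcpy(a, saved, sizeof saved);
        return 0;
    }
    return 1;                    /* every variable assigned, no conflict    */
}

int main(void) {
    int a[NVARS + 1] = {0};
    if (dpll(a)) {
        printf("SAT\n");
        for (int v = 1; v <= NVARS; v++)
            printf("x%d = %s\n", v, a[v] == 1 ? "true" : "false");
    } else {
        printf("UNSAT\n");
    }
    return 0;
}
```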

Two kinds of deduction
[Diagram: two interacting kinds of deduction, case splits (case-based) and propagation (lightweight, exhaustive), linked in a loop by generalization that guides the case splits.]
Closing this loop focuses the solver on relevant deductions
– Allows SAT solvers to handle millions of clauses
– Generates parsimonious proofs in case of unsatisfiability
What lessons can we learn from this architecture for program analysis?

Invariants from unwindings
Consider this very simple approach:
– Partially unwind a program into a loop-free, inline program
– Construct a Floyd/Hoare proof for the inline program
– See if this proof contains an inductive invariant proving the property
Example program (the required loop invariant is {x == y}):
  x = y = 0;
  while(*) { x++; y++; }
  while(x != 0) { x--; y--; }
  assert (y == 0);

Unwind the loops
The unwound, inline program:
  x = y = 0;
  x++; y++;
  [x != 0]; x--; y--;
  [x != 0]; x--; y--;
  [x == 0];
  [y != 0];
One Floyd/Hoare annotation of this program uses assertions such as {True}, {x = 0 ∧ y = 0}, {x = y}, {x = 0 ⇒ y = 0}, {False}; another uses {True}, {y = 0}, {y = 1}, {y = 2}, {y = 1}, {y = 0}, {False}.
– The proof of the inline program contains invariants for both loops
– But assertions may diverge as we unwind (for example, y = 0, y = 1, y = 2, ...)
– A practical method must somehow prevent this kind of divergence!

Interpolation Lemma
Notation: L(Σ) is the set of FO formulas using
– the uninterpreted symbols of Σ (predicates and functions)
– the logical symbols ∧, ∨, ¬, ∃, ∀, (), ...
If A ∧ B = false, there exists an interpolant A' for (A,B) such that:
– A ⇒ A'
– A' ∧ B = false
– A' ∈ L(A) ∩ L(B)
Example: A = p ∧ q, B = ¬q ∧ r, A' = q   [Craig, 57]   (checked below)
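As a quick sanity check of the example on this slide, the three interpolant conditions can be verified directly (this worked check is added here and is not part of the original slide):

```latex
% A' = q is an interpolant for A = p /\ q and B = ~q /\ r.
\begin{align*}
&A = p \land q, \qquad B = \lnot q \land r, \qquad A' = q\\
&\text{(1)}\quad A \Rightarrow A' \quad \text{since } p \land q \Rightarrow q\\
&\text{(2)}\quad A' \land B = q \land \lnot q \land r = \text{false}\\
&\text{(3)}\quad A' \in \mathcal{L}(A) \cap \mathcal{L}(B) \quad \text{since } q \text{ is the only symbol common to } A \text{ and } B
\end{align*}
```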

Interpolants for sequences
Let A_1 ... A_n be a sequence of formulas.
A sequence A'_0 ... A'_n is an interpolant for A_1 ... A_n when
– A'_0 = True
– A'_{i-1} ∧ A_i ⇒ A'_i, for i = 1..n
– A'_n = False
– and finally, A'_i ∈ L(A_1 ... A_i) ∩ L(A_{i+1} ... A_n)
[Diagram: True = A'_0 ⇒ A'_1 ⇒ A'_2 ⇒ ... ⇒ A'_n = False, where each implication step conjoins the next A_i.]
In other words, the interpolant is a structured refutation of A_1 ... A_n.

Interpolants as Floyd-Hoare proofs
Proving inline programs: the program becomes an SSA sequence, the prover produces a proof, and interpolation turns that proof into a Hoare proof.
Example path: x = y; y++; [x == y]
SSA sequence: x_1 = y_0;  y_1 = y_0 + 1;  x_1 = y_1
Interpolants: True,  x_1 = y_0,  y_1 > x_1,  False
Corresponding Hoare proof: {True} x = y {x = y} y++ {y > x} [x == y] {False}
1. Each formula implies the next (spelled out below)
2. Each is over the common symbols of its prefix and suffix
3. The sequence begins with True and ends with False
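Spelling out condition 1 for this example, each interpolant conjoined with the next SSA constraint implies the next interpolant; this worked check is added for clarity and is not part of the original slide:

```latex
\begin{align*}
\mathit{True} \land (x_1 = y_0) &\;\Rightarrow\; x_1 = y_0\\
(x_1 = y_0) \land (y_1 = y_0 + 1) &\;\Rightarrow\; y_1 > x_1\\
(y_1 > x_1) \land (x_1 = y_1) &\;\Rightarrow\; \mathit{False}
\end{align*}
```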

FOCI: An Interpolating Prover
Proof-generating decision procedure for quantifier-free FOL
– Equality with uninterpreted function symbols
– Theory of arrays
– Linear rational arithmetic, integer difference bounds
SAT Modulo Theories approach
– Boolean reasoning performed by a SAT solver
– Exploits SAT relevance heuristics
Quantifier-free interpolants from proofs
– Linear-time construction [TACAS 04]
– From Q-F interpolants, we can derive atomic predicates for Predicate Abstraction [Henzinger et al., POPL 04]
– Allows counterexample-based refinement
– Integrated with software verification tools: Berkeley BLAST, Cadence IMPACT

But won't we diverge?
Programs are infinite-state, so convergence to a fixed point is not guaranteed. What would prevent us from computing an infinite sequence of interpolants, say x=0, x=1, x=2, ..., as we unwind the loops further?
Limited completeness result
– Stratify the logical language L into a hierarchy of finite languages
– Compute minimal interpolants in this hierarchy
– If an inductive invariant proving the property exists in L, you must eventually converge to one
Interpolation provides a means of static analysis in abstract domains of infinite height. Though we cannot compute a least fixed point, we can compute a fixed point implying a given property if one exists.

Experiments
Windows DDK device drivers (* pre-processed) [POPL 04, CAV 06]

  Driver     LOC*    Previous time   Time with FOCI
  kbfiltr    12k     1m12s           3m48s
  floppy     17k     7m10s           25m20s
  diskperf   14k     5m36s           13m32s
  cdaudio    18k     20m18s          23m51s
  parport    61k     DNF             74m58s
  parclass   138k    DNF             77m40s

(The slide's table also lists total and average predicate counts and pure-interpolation times for each driver.)

Relevance heuristics
Relevance heuristics are key to managing the precision/cost tradeoff
– In general, less information is better
– Effective relevance heuristics improve scaling behavior
– Based on the principle of generalization from special cases
Interpolation approach
– Yields Floyd-Hoare proofs for loop-free program fragments
– Provides an effective relevance heuristic if we can solve the divergence problem
– Exploits the prover's ability to focus on a small set of relevant facts

Expressiveness hierarchy
[Diagram: parameterized abstract domains paired with the corresponding interpolant languages, in order of increasing expressiveness:
  Predicate Abstraction / QF
  Indexed Predicate Abstraction / ∀ FO
  Canonical Heap Abstractions / ∀ FO(TC)]

Need for quantified interpolants
Existing interpolating provers cannot produce quantified interpolants.
Example:
  for(i = 0; i < N; i++) a[i] = i;
  for(j = 0; j < N; j++) assert(a[j] == j);
The required loop invariant is quantified: {∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}
Problem: how do we prevent the number of quantifiers from diverging in the same way that constants diverge when we unwind the loops?

Need for Reachability
Example:
  ... node *a = create_list();
  while(a){ assert(alloc(a)); a = a->next; } ...
invariant: ∀x (rea(next,a,x) ∧ x ≠ nil → alloc(x))
This condition is needed to prove memory safety (no use after free), but it cannot be expressed in FO.
– We need some predicate identifying a closed set of nodes that is allocated
– We require a theory of reachability (in effect, transitive closure)
Can we build an interpolating prover for full FOL that handles reachability and avoids divergence?

Clausal provers
A clausal refutation prover takes a set of clauses and returns a proof of unsatisfiability (i.e., a refutation) if possible.
A prover is based on inference rules of this form:
  P_1 ... P_n
  -----------
       C
where P_1 ... P_n are the premises and C the conclusion.
A typical inference rule is resolution, of which this is an instance:
  p(a)    p(U) → q(U)
  -------------------
         q(a)
This step was accomplished by unifying p(a) with p(U), then dropping the complementary literals.

Superposition calculus
Modern FOL provers are based on the superposition calculus.
Example superposition inference:
  Q(a)    P → (a = c)
  -------------------
       P → Q(c)
– this is just substitution of equals for equals
– in practice this approach generates a lot of substitutions!
– use a reduction order to reduce the number of inferences

Reduction orders
A reduction order ≻ is:
– a total, well-founded order on ground terms
– subterm property: f(a) ≻ a
– monotonicity: a ≻ b implies f(a) ≻ f(b)
Example: Recursive Path Ordering (with Status) (RPOS)
– start with a precedence on symbols: a ≻ b ≻ c ≻ f
– this induces a reduction ordering on ground terms: f(f(a)) ≻ f(a) ≻ a ≻ f(b) ≻ b ≻ c

Ordering Constraint
Constrains rewrites to be downward in the reduction order:
  Q(a)    P → (a = c)
  -------------------
       P → Q(c)
Example: this inference is only possible if a ≻ c (the rewritten terms must be maximal in their clauses).
Thm: Superposition with the ordering constraint is complete for refutation in FOL with equality.
So how do we get interpolants from these proofs?

Local Proofs
A proof is local for a pair of clause sets (A,B) when every inference step uses only symbols from A or only symbols from B.
From a local refutation of (A,B), we can derive an interpolant for (A,B) in linear time.
This interpolant is a Boolean combination of formulas occurring in the proof.

Reduction orders and locality
A reduction order is oriented for (A,B) when:
– s ≻ t for every s ∉ L(B) and t ∈ L(B)
Intuition: rewriting eliminates first A variables, then B variables.
Example, with the oriented order x ≻ y ≻ c ≻ d ≻ f:
  A: x = y,  f(x) = c        B: f(y) = d,  c ≠ d
Derivation:
  x = y,  f(x) = c  ⊢  f(y) = c
  f(y) = c,  f(y) = d  ⊢  c = d
  c = d,  c ≠ d  ⊢  ⊥
Every inference is local!

Orientation is not enough
Example: A = { Q(a), a = c },  B = { ¬Q(b), b = c }, with precedence Q ≻ a ≻ b ≻ c.
Local superposition gives only c = c.
Solution: replace the non-local superposition
  Q(a),  a = c  ⊢  Q(c)
with two inferences:
  Q(a)  ⊢  a = U → Q(U)        (procrastination)
  a = U → Q(U),  a = c  ⊢  Q(c)
The second inference can be postponed until after resolving with ¬Q(b).
This procrastination step is an example of a reduction rule, and it preserves completeness.

Completeness of local inference
Thm: Local superposition with procrastination is complete for refutation of pairs (A,B) such that:
– (A,B) has a universally quantified interpolant
– the reduction order is oriented for (A,B)
This gives us a complete method for generating universally quantified interpolants for arbitrary first-order formulas!
It is easily extensible to interpolants for sequences of formulas, hence we can use the method to generate Floyd/Hoare proofs for inline programs.

Avoiding Divergence
As argued earlier, we still need to prevent interpolants from diverging as we unwind the program further.
Idea: stratify the clause language.
Example: let L_k be the set of clauses with at most k variables and nesting depth at most k. Note that each L_k is a finite language.
Stratified saturation prover (sketched in code below):
– Initially let k = 1
– Restrict the prover to generate only clauses in L_k
– When the prover saturates, increase k by one and continue
The stratified prover is complete, since every proof is contained in some L_k.
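A minimal sketch of the stratified saturation loop just described. The helper saturate_within is a hypothetical placeholder standing in for a saturation engine restricted to L_k; it is not part of any real prover's API, and the bound at which it reports a refutation is invented for illustration.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for the saturation engine.  A real implementation
   would perform (local) superposition inferences while keeping only clauses
   in L_k, i.e. clauses with at most k variables and nesting depth k.       */
static bool saturate_within(int k) {
    /* ... run saturation restricted to L_k, return true on refutation ...  */
    return k >= 3;   /* pretend a refutation is found once k reaches 3      */
}

int main(void) {
    for (int k = 1; ; k++) {                  /* initially k = 1             */
        if (saturate_within(k)) {             /* refutation found in L_k     */
            printf("refutation found in L_%d\n", k);
            return 0;
        }
        /* saturated within L_k without a refutation: widen the language     */
    }
}
```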

Completeness for universal invariants
Lemma: For every safety program M with a universally quantified (∀) safety invariant, and every stratified saturation prover P, there exists an integer k such that P refutes every unwinding of M in L_k, provided:
– the reduction ordering is oriented properly
This means that as we unwind further, eventually all the interpolants are contained in L_k, for some k.
Theorem: Under the above conditions, there is some unwinding of M for which the interpolants generated by P contain a safety invariant for M.
This means we have a complete procedure for finding universally quantified safety invariants whenever they exist!

In practice
We have proved theoretical convergence. But does the procedure converge in practice in a reasonable time?
Modify SPASS, an efficient superposition-based saturation prover:
– Generate oriented precedence orders
– Add the procrastination rule to SPASS's reduction rules
– Drop all non-local inferences
– Add stratification (SPASS already has something similar)
Add axiomatizations of the necessary theories
– An advantage of a full FOL prover is that we can add axioms!
– As argued earlier, we need a theory of arrays and reachability (TC)

Partially Axiomatizing FO(TC)
Axioms of the theory of arrays (with select and store):
  ∀(A,I,V) (select(update(A,I,V), I) = V)
  ∀(A,I,J,V) (I ≠ J → select(update(A,I,V), J) = select(A,J))
Axioms for reachability (rea):
  ∀(L,E) rea(L,E,E)
  ∀(L,E,X) (rea(L,select(L,E),X) → rea(L,E,X))   [if e->link reaches x then e reaches x]
  ∀(L,E,X) (rea(L,E,X) → E = X ∨ rea(L,select(L,E),X))   [if e reaches x then e = x or e->link reaches x]
  etc...
Since FO(TC) is incomplete, these axioms must be incomplete.

Simple example
  for(i = 0; i < N; i++) a[i] = i;
  for(j = 0; j < N; j++) assert(a[j] == j);
invariant: {∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}

Unwinding simple example
Unwind the loops twice:
  i = 0;
  [i < N]; a[i] = i; i++;
  [i < N]; a[i] = i; i++;
  [i >= N]; j = 0;
  [j < N]; j++;
  [j < N]; a[j] != j;
SSA sequence:
  i_0 = 0
  i_0 < N ∧ a_1 = update(a_0, i_0, i_0) ∧ i_1 = i_0 + 1
  i_1 < N ∧ a_2 = update(a_1, i_1, i_1) ∧ i_2 = i_1 + 1
  i_2 ≥ N ∧ j_0 = 0
  j_0 < N ∧ j_1 = j_0 + 1
  j_1 < N ∧ select(a_2, j_1) ≠ j_1
Interpolants (these contain the quantified loop invariant):
  {i_0 = 0}
  {0 ≤ U ∧ U < i_1 ⇒ select(a_1, U) = U}
  {0 ≤ U ∧ U < i_2 ⇒ select(a_2, U) = U}
  {j ≤ U ∧ U < N ⇒ select(a_2, U) = U}
Note: stratification prevents the constants from diverging as 0, succ(0), succ(succ(0)), ...

List deletion example
  a = create_list();
  while(a){ tmp = a->next; free(a); a = tmp; }
Invariant synthesized with 3 unwindings (after some simplification):
  {rea(next,a,nil) ∧ ∀x (rea(next,a,x) → x = nil ∨ alloc(x))}
That is, a is acyclic, and every cell in it is allocated.
Note that interpolation can synthesize Boolean structure.

More small examples
This shows that divergence can be controlled. But can we scale to large programs?...

Conclusion
Relevance heuristics are essential for scaling richer program-analysis domains to large programs.
Relevance heuristics are based on a generalization principle:
– Relevant facts are those used in parsimonious proofs
– Facts relevant to special cases are likely to be useful in the general case
Relevance heuristics for program analysis
– Special cases can be program paths or loop-free unwindings
– Interpolation can extract relevant facts from proofs of special cases
– Must avoid divergence
Quantified invariants
– Needed for programs that manipulate arrays or heaps
– FO equality prover modified to produce local proofs (hence interpolants); complete for universal invariants
– May be used as a relevance heuristic for shape analysis, IPA