The Evolution of Symbolic Model Checking
Ken McMillan
Cadence Berkeley Labs

What is Symbolic Model Checking?
Trading in one difficulty...
–The state explosion problem
...for another difficulty.
–PSPACE completeness of QBF
Theoreticians are nonplused...

Another view
Abstract interpretation
–Compute an approximation of the collecting semantics as a fixed point
Symbolic model checking
–Compute the exact collecting semantics as a fixed point, using some compact representation
As a working definition, we'll say SMC is computing exact fixed points with compact representations.

Example: CTL model checking
Fixed point characterization of operators ($\nu$ is the greatest fixed point, $\mu$ the least)
$AG\,p = \nu Q.\; p \wedge AX\,Q$
$AF\,p = \mu Q.\; p \vee AX\,Q$
$EF\,p = \mu Q.\; p \vee EX\,Q$
$EG\,p = \nu Q.\; p \wedge EX\,Q$
Image operators
$EX\,p = \lambda V.\; \exists V'.\; (p(V') \wedge T(V,V'))$
$AX\,p = \lambda V.\; \forall V'.\; (T(V,V') \Rightarrow p(V'))$
Trick is to reduce these QBF expressions to some compact normal form (hopeless, but interesting...)
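
The fixed-point characterization can be tried out on an explicit state graph before any symbolic representation enters the picture. The following is a minimal sketch, assuming states are small integers and the transition relation is a set of pairs; the names `ex`, `ax`, `lfp`, `gfp`, `UNIVERSE` and `TRANS` are illustrative and not from the talk. A symbolic model checker would run the same iterations with BDDs in place of Python sets.

```python
from typing import Callable, FrozenSet, Set, Tuple

State = int
States = FrozenSet[State]
Relation = Set[Tuple[State, State]]


def ex(trans: Relation, p: States) -> States:
    """EX p: states with at least one successor in p."""
    return frozenset(s for (s, t) in trans if t in p)


def ax(trans: Relation, universe: States, p: States) -> States:
    """AX p: states all of whose successors are in p."""
    return frozenset(
        s for s in universe if all(t in p for (u, t) in trans if u == s)
    )


def lfp(f: Callable[[States], States]) -> States:
    """Least fixed point: iterate upward from the empty set."""
    q: States = frozenset()
    while f(q) != q:
        q = f(q)
    return q


def gfp(f: Callable[[States], States], universe: States) -> States:
    """Greatest fixed point: iterate downward from the full state set."""
    q = universe
    while f(q) != q:
        q = f(q)
    return q


def ef(trans, p):
    return lfp(lambda q: p | ex(trans, q))


def af(trans, universe, p):
    return lfp(lambda q: p | ax(trans, universe, q))


def eg(trans, universe, p):
    return gfp(lambda q: p & ex(trans, q), universe)


def ag(trans, universe, p):
    return gfp(lambda q: p & ax(trans, universe, q), universe)


# Hypothetical four-state example: 0 -> 1 -> 2 -> 2, 0 -> 3 -> 3.
UNIVERSE: States = frozenset({0, 1, 2, 3})
TRANS: Relation = {(0, 1), (0, 3), (1, 2), (2, 2), (3, 3)}
```

In this example state 0 can reach state 2 but can also escape to the sink 3, so it satisfies EF of {2} but not AF of {2}.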

The magic of BDDs
BDDs provide the following desiderata for SMC:
–Efficient Boolean operations
–Quantifier elimination (exponential, but efficient in practice)
–Efficient reduction to canonical form
Note, canonical form is useful, but not necessary for detecting fixed points. Main advantage is that it prevents explosion of the representation as we iterate.
BDDs exploit low mutual information of components
[Figure: Component A and Component B separated by a cut; cut width determined by mutual information]
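
To make the three desiderata concrete, here is a minimal reduced ordered BDD sketch, assuming a fixed variable order 0, 1, 2, ... and integer node ids. The class and method names (`BDD`, `mk`, `apply`, `restrict`, `exists`) are illustrative and do not correspond to any particular BDD package. Canonical form comes from the unique table in `mk`: two equivalent functions always end up with the same node id.

```python
from operator import and_, or_


class BDD:
    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Ids 0 and 1 are the terminals; they carry var = num_vars so that
        # every real variable index sorts before them.
        self.nodes = [(num_vars, None, None), (num_vars, None, None)]
        self.unique = {}  # (var, low, high) -> node id

    def mk(self, var, low, high):
        """Return the canonical node for 'if var then high else low'."""
        if low == high:  # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def _cofactors(self, u, var):
        uvar, low, high = self.nodes[u]
        return (low, high) if uvar == var else (u, u)

    def apply(self, op, u, v, memo=None):
        """Combine two BDDs with a binary Boolean operator."""
        if memo is None:
            memo = {}
        if u <= 1 and v <= 1:
            return int(op(bool(u), bool(v)))
        if (u, v) in memo:
            return memo[(u, v)]
        top = min(self.nodes[u][0], self.nodes[v][0])
        u0, u1 = self._cofactors(u, top)
        v0, v1 = self._cofactors(v, top)
        r = self.mk(top, self.apply(op, u0, v0, memo),
                    self.apply(op, u1, v1, memo))
        memo[(u, v)] = r
        return r

    def restrict(self, u, var, value):
        """Substitute a constant for var (no memoisation, for brevity)."""
        uvar, low, high = self.nodes[u]
        if uvar > var:  # var cannot occur below this node
            return u
        if uvar == var:
            return high if value else low
        return self.mk(uvar, self.restrict(low, var, value),
                       self.restrict(high, var, value))

    def exists(self, var, u):
        """Existential quantification: u[var:=0] OR u[var:=1]."""
        return self.apply(or_, self.restrict(u, var, False),
                          self.restrict(u, var, True))
```

With this sketch, building (a AND b) OR c and (a OR c) AND (b OR c) yields the same node id, so an equivalence check is an integer comparison.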

Application domain
Symbolic model checking with BDDs is appropriate when
–State space is dense, or
–Branching factor is high
...otherwise explicit state tends to be more efficient
Hardware model checking is in the sweet spot
–Very fine-grain parallelism, hence dense state space
–Branching factor exponential in number of inputs
Protocol verification is a poor application
–Few states reachable
–Branching factor linear in number of processes (interleaving)

SMC as Paradigm
The idea of computing fixed points with compact representations, and its embodiment as BDD-based SMC, have several qualities of Kuhn's notion of a paradigm:
–Responds to a crisis
  the state explosion problem
–Solves some specific problem previously unsolvable
  say, verify the Encore Gigamax cache protocols
–Shows potential to solve many more problems
  advantage of magic – can't prove it doesn't work
–Leaves many more problems unsolved than solved
  leaves room for future research
Let's follow the history of the elaboration of this paradigm

Development of a paradigm
[Figure: BDD-based SMC branching into: Better BDD algorithms; More compact forms; Other applications; Abstraction methods, etc.]
Finally, a research paradigm becomes just a tool in the toolbox

Image computation
An intractable problem spins off many intractable subproblems.
–Main problem: transition relation cannot be expressed as one BDD
Approaches to image computation
–Coudert and Madre
  Use vector function representation for transition relation
  Use constrain operation to reduce to range computation
  Case splitting strategies (over range, domain)
–Burch and Long
  Leave transition relation as implicit conjunction or disjunction
  Early quantification – push quantifiers inside $\wedge$ and $\vee$
  Many optimization approaches
Each of these creates its own intractable subproblems

Quantification scheduling
Push quantifiers inside as we build the conjunction
–Try to minimize number of intermediate variables
$\exists V.\; (P \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$
becomes, when $v_1$ does not occur in $T_2, \ldots, T_n$,
$\exists V.\; (\exists v_1.\; (P \wedge T_1) \wedge T_2 \wedge \cdots \wedge T_n)$
US Patent Nr. 6,131,078
This basic idea spins off interesting optimization problems
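
The effect of pushing quantifiers inward can be seen without BDDs. The sketch below is an illustration under stated assumptions: each conjunct is an explicit relation (a tuple of variable names plus a set of rows), conjunction is a natural join, and existential quantification is projection. The helper names (`join`, `project_out`, `image`) and the three-bit example are hypothetical. With `early=True` a variable is projected out as soon as no later conjunct mentions it, which keeps the intermediate result narrower.

```python
def join(r, s):
    """Conjunction of two relations given as (variables, rows)."""
    (rv, rrows), (sv, srows) = r, s
    shared = [v for v in rv if v in sv]
    out_vars = rv + tuple(v for v in sv if v not in rv)
    rows = set()
    for a in rrows:
        ra = dict(zip(rv, a))
        for b in srows:
            sb = dict(zip(sv, b))
            if all(ra[v] == sb[v] for v in shared):
                merged = {**ra, **sb}
                rows.add(tuple(merged[v] for v in out_vars))
    return out_vars, rows


def project_out(r, var):
    """Existentially quantify var by dropping its column."""
    rv, rrows = r
    if var not in rv:
        return r
    keep = [i for i, v in enumerate(rv) if v != var]
    return (tuple(rv[i] for i in keep),
            {tuple(row[i] for i in keep) for row in rrows})


def image(p, parts, quantified, early):
    """Return (result, peak number of variables in any intermediate)."""
    acc, peak = p, len(p[0])
    for i, t in enumerate(parts):
        acc = join(acc, t)
        peak = max(peak, len(acc[0]))
        if early:
            later = {v for tv, _ in parts[i + 1:] for v in tv}
            for v in quantified:
                if v not in later:  # safe to quantify now
                    acc = project_out(acc, v)
    for v in quantified:  # quantify whatever is left
        acc = project_out(acc, v)
    return acc, peak


# Hypothetical example: current-state bits x1..x3, next-state bits y1..y3.
P = (("x1", "x2", "x3"), {(0, a, b) for a in (0, 1) for b in (0, 1)})
T1 = (("x1", "y1"), {(0, 1), (1, 0)})  # y1 = not x1
T2 = (("x2", "y2"), {(0, 0), (1, 1)})  # y2 = x2
T3 = (("x3", "y3"), {(0, 1), (1, 0)})  # y3 = not x3
QUANTIFIED = ["x1", "x2", "x3"]
```

Both schedules compute the same image, but the naive one builds an intermediate over all six variables while the early one never exceeds four. Choosing the order of the conjuncts to minimize that peak is the optimization problem the slide refers to.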

Optimizing Quantification
This by itself is a hard optimization problem
–Many heuristic approaches (greedy, simulated annealing, etc.)
Cuts at gate level also possible
–How to decompose? Fine grain or coarse grain?
Case splitting can improve cut width
Find a series of cuts minimizing communication
These problems were never fully solved, but a consensus approach developed.

Variable ordering
Optimizing BDD variable order is also an intractable problem
–Structural methods (many authors)
  Many variations, good for circuits, but not very effective for SMC
–Hill-climbing method (Rudell)
  Many variations (windows, etc.) – very time consuming
  Very tricky space/time tradeoffs
–Optimal methods seem out of reach
Search ordering
–BFS often leads to large intermediate BDDs
–Many heuristic strategies possible
Again, we never solved these problems, just declared victory.
The standard approaches to variable ordering and image computation bring us essentially to the current state-of-the-art.

More compact representations
BDDs were first, but there is no reason to think they are the only representation
–Some simple structures are hard for BDDs (e.g., pointers)
–Decision diagrams provide a nice paradigm
Result: bewildering array of decision diagrams (*DDs)
–Different node interpretations (ZDDs, Kronecker DDs, etc.)
–Decompositions
  Conjunctive, disjunctive, disjoint, etc.
–Representations based on minimal automata
  Tree BDDs, cube sets, word automata, etc.
Many of these can be shown to be more compact than BDDs for some motivating class of examples. Does this mean they are useful for SMC?

BDDs and DFAs
BDD is, approximately, a minimal DFA over fixed-length words
[Figure: the word 01010 read over variables v1, v2, v3, v4, v5]
Tree BDD – extend by analogy to tree automata, for fixed trees
[Figure: a fixed tree over variables v1 to v6, labeled 0 1 1 0 1 0]

Extending the analogy
BMDs – word encodes a monomial term
[Figure: the word 01010 over variables v1, ..., v5 encodes the monomial $v_2 v_4$]
Useful for binary arithmetic, though limited use for SMC
Encode cubes with words
–ZDD representation of prime implicants
Extension to unbounded/infinite words and trees
–Regular model checking, QDDs, etc.
–Breaks paradigm – requires acceleration or widening
The right analogy often provides novel generalizations, as well as a unifying view.

Space/Time tradeoffs
More compact representations typically require
–Greater overhead to reduce to canonical form
–Greater difficulty in optimizing representation parameters (e.g., order)
BDDs seem to be a sweet spot
–Substantial space reduction
–Fast reduction to canonical form
–Moderate cost to find a good (not best) variable order
For this reason, surprisingly, BDDs have remained the representation of choice for finite-state SMC over nearly two decades.

Beyond Decision Diagrams
Quantifier elimination using SAT solvers
$\exists V.\; (P \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$
Iterative approach
–Find a set of satisfying assignments to free variables
–Block that set
–Repeat until unsatisfiable
Different possible representations
–Cubes
–Circuit cofactors
This approach avoids the difficulty of large intermediate BDDs. For technical reasons, works well only for reverse image.
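
The find, block, repeat loop can be sketched as follows. This is an illustration only: `solve` is a brute-force stand-in for a real SAT solver, formulas are plain Python predicates over an assignment dictionary, and each blocked set is a single full assignment to the free variables rather than an enlarged cube or a circuit cofactor. All names are hypothetical.

```python
from itertools import product


def solve(variables, formula, blocked):
    """Brute-force stand-in for a SAT solver: return a model or None.

    A model must satisfy the formula and must not fall inside any
    blocked cube (a partial assignment given as a dict).
    """
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        in_blocked = any(
            all(model[v] == val for v, val in cube.items())
            for cube in blocked
        )
        if formula(model) and not in_blocked:
            return model
    return None


def eliminate(free_vars, quantified_vars, formula):
    """Return cubes over free_vars whose disjunction equals
    'exists quantified_vars. formula'."""
    cubes = []
    all_vars = list(free_vars) + list(quantified_vars)
    while True:
        model = solve(all_vars, formula, cubes)
        if model is None:  # unsatisfiable: every solution is covered
            return cubes
        # Keep only the free variables and block that assignment.
        cubes.append({v: model[v] for v in free_vars})


def example(m):
    """Hypothetical formula: (x or z) and (y or not z)."""
    return (m["x"] or m["z"]) and (m["y"] or not m["z"])


CUBES = eliminate(["x", "y"], ["z"], example)
```

For the example formula, eliminating z leaves the three assignments of (x, y) that satisfy x OR y. Practical implementations differ mainly in how far they enlarge each blocked set beyond a single assignment.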

Comparison with BDDs
Note low correlation between the two methods. SAT based method may be a good alternative when BDDs fail.
This is typical of algorithms for intractable problems. It opens up another class of research problems – how to efficiently combine methods when no one method dominates.

New applications
Timed automata
–KRONOS, COSPAN, UPPAAL
Matrix problems
–Probabilistic verification [PRISM]
–Worst-case power estimation
Parameterized/infinite state systems
–Regular model checking
–Real/rational variables
–QDDs
–Invisible invariants
Each new application of the paradigm opens many new research problems. How are we to apply the paradigm in any given case?

DDs and timed automata
There are many ways in which DDs might be applied to timed automata:
[Figure: two alternatives, each drawn over the bits 01011010: (1) binary timer value; (2) DBMs at leaves, with constraints $t_i \le t_j$]
According to Kuhn, much of normal science is devoted to such puzzles: how to apply a paradigm in a given situation.

Abstraction
Crisis: SMC approach fails to scale to large designs
Major critique of symbolic model checking
–Computing exact fixed points is too eager for many applications
–Weaker approximations may be sufficient to prove property
May still be appropriate to compute exact fixed points in abstract models chosen in advance
–localization abstraction
–predicate abstraction
–compositional approaches

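CEGAR loop
[Figure: flowchart. Choose initial $T^\#$. Model check abstraction $T^\#$ (SMC): if true, done. Otherwise take the counterexample (Cex) and ask whether it can be extended from $T^\#$ to $T$: if yes, report Cex; if no, refine $T^\#$ and model check again.]
SMC is typically used in the CEGAR loop, but we no longer view finding the right symbolic representation as the key to scalability.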
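
The loop in the flowchart can be sketched on a toy system. The following assumes a localization-style abstraction in which states are bit tuples and the abstraction simply hides some bits; the function names, the explicit-state search and the refinement heuristic (make the first hidden bit visible) are all illustrative choices, not the method of any particular tool.

```python
from collections import deque
from itertools import product


def find_path(init, trans, bad):
    """Breadth-first search for a path from init to bad; None if none."""
    parent = {s: None for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        if s in bad:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for (u, v) in trans:
            if u == s and v not in parent:
                parent[v] = s
                queue.append(v)
    return None


def cegar(n_bits, init, succ, bad, visible):
    """Return ("safe", visible bits) or ("unsafe", abstract path)."""
    states = list(product((0, 1), repeat=n_bits))
    visible = set(visible)
    while True:
        shown = sorted(visible)

        def alpha(s):
            # Abstraction map: keep only the visible bits.
            return tuple(s[i] for i in shown)

        # Model check the abstraction.
        a_init = {alpha(s) for s in init}
        a_trans = {(alpha(s), alpha(t)) for s in states for t in succ(s)}
        a_bad = {alpha(s) for s in states if bad(s)}
        path = find_path(a_init, a_trans, a_bad)
        if path is None:
            return "safe", visible
        # Try to extend the abstract counterexample to the concrete system.
        frontier = {s for s in init if alpha(s) == path[0]}
        for a in path[1:]:
            frontier = {t for s in frontier for t in succ(s) if alpha(t) == a}
        if any(bad(s) for s in frontier):
            return "unsafe", path
        # Spurious counterexample: refine by revealing one more bit.
        hidden = [i for i in range(n_bits) if i not in visible]
        visible = visible | {hidden[0]}


# Hypothetical system over bits (a, b, c): a is constant, b copies a,
# c toggles. The error condition is b == 1.
INIT = {(0, 0, 0)}


def succ_safe(s):
    return [(s[0], s[0], 1 - s[2])]


def succ_unsafe(s):
    return [(s[0], 1 - s[0], 1 - s[2])]


def is_bad(s):
    return s[1] == 1
```

Starting with only bit b visible, the safe system produces one spurious counterexample, after which revealing bit a is enough to prove the property; bit c is never needed.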

Interpolants
Interpolant-based model checking is SMC, but breaks paradigm
–No canonical or reduced representation
–No exact fixed point computation
Answers the critique that exact image computation is too strong
–Here we see the paradigm breaking down in response to a crisis
[Figure: unfolding of T from t=0 to t=k between P and F, partitioned into A and B, with interpolant A']

End of a paradigm?
New research in symbolic model checking continues, but most breaks the paradigm in some way
–SAT-based image computations
–Interpolation
–Assume/guarantee via machine learning
–Infinite-state/probabilistic/etc.
SMC is primarily a tool now
–Most hardware verification tools apply it in some form
–Software model checkers use it
  SLAM, BLAST, SATABS, FSOFT
–Variety of other tools
  KRONOS, PRISM, TLV

Ballistic trajectory
[Figure: Cadence SMV downloads, with the point "Development ended" marked]

The new paradigms
BMC, clearly
–responds to a crisis, solves some problems, leaves many open!
CEGAR
Hybridization
Perhaps the most important paradigm in model checking today is the combination of tools from many disciplines. Few tools today apply just one algorithm or technique. SMC has become primarily a component in more complex hybrid schemes.

Persistent ideas
Early quantification
Canonical acceptors
–Mona
Conditional independence

Lazy abstraction and interpolants
Lazy abstraction [Henzinger et al., 02]
–Refines predicate abstraction locally, as needed
–Avoids "big loop" in CEGAR
–Avoids computing unnecessary state information
Interpolation-based model checking [McMillan, 03]
–Avoids expense of image computation
–Derives image approximations from refutations of bounded unfoldings
In this talk, we will see how to use interpolants as an alternative to predicate abstraction in the lazy abstraction paradigm for software model checking. This avoids the expense of image computation in predicate abstraction, resulting in a large performance improvement.

An example
Program fragment:
    do {
        lock();
        old = new;
        if (*) {
            unlock();
            new++;
        }
    } while (new != old);
[Figure: control-flow graph with edges "L=0", "L=1; old=new", "[L!=0]", "L=0; new++", "[new==old]", "[new!=old]"]
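
The property of interest in this fragment is that lock() is never called while the lock is already held (the "[L!=0]" edge in the control-flow graph leads to the error state). The sketch below simulates the fragment in Python, resolving each nondeterministic `if (*)` from an explicit list of choices. The names `run`, `all_runs_safe` and `LockError`, the initial value of `new`, and the depth bound are assumptions made for illustration. Exhaustive bounded simulation like this is only a sanity check; it is not a proof, which is what the unwinding on the following slides provides.

```python
from itertools import product


class LockError(Exception):
    """Raised when lock() is called while the lock is already held."""


def run(choices):
    """Execute the fragment; each entry of `choices` resolves one `if (*)`.

    Returns the lock state when the loop exits, or when the supplied
    choices run out.
    """
    locked, new, old = False, 0, 0  # L=0 initially; new assumed to start at 0
    for take_branch in choices:
        if locked:
            raise LockError("lock() called while the lock is held")
        locked = True  # lock()
        old = new
        if take_branch:  # if (*)
            locked = False  # unlock()
            new += 1
        if new == old:  # loop condition (new != old) is false: exit
            return locked
    return locked


def all_runs_safe(depth):
    """Check every resolution of `depth` nondeterministic choices."""
    for choices in product([False, True], repeat=depth):
        try:
            run(choices)
        except LockError:
            return False
    return True
```

The reason no run fails is visible in the code: the loop only repeats when the branch was taken, and taking the branch releases the lock.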

Unwinding the CFG
[Figure: the control-flow graph beside a partial unwinding (vertices 0 to 2) along edges "L=0" and "[L!=0]"; labels start as T and are refined so that the error vertex is labeled F]
Label error state with false, by refining labels on path

Unwinding the CFG
[Figure: the control-flow graph beside a partial unwinding (vertices 0 to 6)]
Cutoff: state 5 is subsumed by state 1.

Unwinding the CFG
[Figure: the control-flow graph beside the finished unwinding (vertices 0 to 11)]
Another cutoff. Unwinding is now complete.

Comparisons
Compared to CEGAR...
–Refinements are local
–Do not restart model checking after each refinement
–More refinements required
Compared to lazy predicate abstraction [Henzinger et al. 02]...
–Extremely lazy.
–Does not require predicate image or "post" computation
  avoid exponential number of decision procedure calls
  avoid additional refinement of image approximation
Compared to interpolation-based model checking [McMillan 03]...
–Exploits sequential control-flow structure
–Prover is not applied to full program unwinding.

Interpolation Lemma (Craig, 57)
Notation: $\mathcal{L}(\phi)$ is the set of FO formulas over the symbols of $\phi$
If $A \wedge B = \mathit{false}$, there exists an interpolant $A'$ for $(A,B)$ such that:
$A \Rightarrow A'$
$A' \wedge B = \mathit{false}$
$A' \in \mathcal{L}(A) \cap \mathcal{L}(B)$
Example:
–$A = p \wedge q$, $B = \neg q \wedge r$, $A' = q$
Interpolants from proofs
–in certain quantifier-free theories, we can obtain an interpolant for a pair A,B from a refutation in linear time. [McMillan 05]
–in particular, we can have linear arithmetic, uninterpreted functions, and restricted use of arrays
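
The three conditions of the lemma can be checked by brute force for the propositional example. This sketch assumes the example reads A = p AND q, B = (NOT q) AND r, A' = q (the connectives are lost in the transcript, so this reading is an assumption), and represents each formula as a pair of its symbol set and a Python predicate. The name `check_interpolant` is illustrative.

```python
from itertools import product


def check_interpolant(a, b, itp):
    """Check the three interpolant conditions by truth-table enumeration.

    Each argument is (symbols, predicate), where the predicate takes a
    dict from symbol names to booleans.
    """
    (a_syms, a_fn), (b_syms, b_fn), (i_syms, i_fn) = a, b, itp
    # Vocabulary condition: the interpolant uses only shared symbols.
    if not i_syms <= (a_syms & b_syms):
        return False
    symbols = sorted(a_syms | b_syms)
    models = [
        dict(zip(symbols, vals))
        for vals in product([False, True], repeat=len(symbols))
    ]
    implied = all(i_fn(m) for m in models if a_fn(m))  # A implies A'
    inconsistent = not any(i_fn(m) and b_fn(m) for m in models)  # A' and B unsat
    return implied and inconsistent


A = ({"p", "q"}, lambda m: m["p"] and m["q"])
B = ({"q", "r"}, lambda m: (not m["q"]) and m["r"])
GOOD = ({"q"}, lambda m: m["q"])
NOT_SHARED = ({"p"}, lambda m: m["p"])
TOO_WEAK = (set(), lambda m: True)
```

Here `GOOD` passes all three checks, `NOT_SHARED` is rejected because p does not occur in B, and `TOO_WEAK` is implied by A but is still consistent with B.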

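Interpolants for sequences
Let $A_1 \ldots A_n$ be a sequence of formulas
A sequence $A'_0 \ldots A'_n$ is an interpolant for $A_1 \ldots A_n$ when
–$A'_0 = \mathit{True}$
–$A'_{i-1} \wedge A_i \Rightarrow A'_i$, for $i = 1..n$
–$A'_n = \mathit{False}$
–and finally, $A'_i \in \mathcal{L}(A_1 \ldots A_i) \cap \mathcal{L}(A_{i+1} \ldots A_n)$
[Figure: $\mathit{True} \Rightarrow A'_1 \Rightarrow A'_2 \Rightarrow \cdots \Rightarrow A'_{k-1} \Rightarrow \mathit{False}$, threaded through $A_1, A_2, A_3, \ldots, A_k$]
In other words, the interpolant is a structured refutation of $A_1 \ldots A_n$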
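
The four conditions can be checked mechanically on a small instance. The sketch below assumes integer variables restricted to a small finite domain, so the implications are only verified over that domain, and uses a hypothetical three-step path (x = 0; x++; assume x == 0) written in SSA form with variables x0 and x1. The checker name and the example formulas are illustrative.

```python
from itertools import product


def is_sequence_interpolant(formulas, interpolants, domain):
    """formulas: [(symbols, predicate)] for A_1..A_n.
    interpolants: [(symbols, predicate)] for A'_0..A'_n.
    Implications are checked only over assignments drawn from `domain`."""
    n = len(formulas)
    if len(interpolants) != n + 1:
        return False
    # Vocabulary: A'_i may only mention symbols common to prefix and suffix.
    for i, (syms, _) in enumerate(interpolants):
        prefix = set().union(*(s for s, _ in formulas[:i]))
        suffix = set().union(*(s for s, _ in formulas[i:]))
        if not syms <= (prefix & suffix):
            return False
    symbols = sorted(set().union(*(s for s, _ in formulas)))
    models = [
        dict(zip(symbols, vals))
        for vals in product(domain, repeat=len(symbols))
    ]
    first, last = interpolants[0][1], interpolants[n][1]
    if not all(first(m) for m in models):  # A'_0 = True
        return False
    if any(last(m) for m in models):  # A'_n = False
        return False
    for i in range(1, n + 1):  # A'_{i-1} and A_i imply A'_i
        prev, step, cur = (interpolants[i - 1][1], formulas[i - 1][1],
                           interpolants[i][1])
        if any(prev(m) and step(m) and not cur(m) for m in models):
            return False
    return True


# Hypothetical path in SSA form: x0 = 0; x1 = x0 + 1; assume x1 == 0.
PATH = [
    ({"x0"}, lambda m: m["x0"] == 0),
    ({"x0", "x1"}, lambda m: m["x1"] == m["x0"] + 1),
    ({"x1"}, lambda m: m["x1"] == 0),
]
GOOD = [
    (set(), lambda m: True),
    ({"x0"}, lambda m: m["x0"] == 0),
    ({"x1"}, lambda m: m["x1"] == 1),
    (set(), lambda m: False),
]
BAD = GOOD[:2] + [({"x1"}, lambda m: m["x1"] == 0)] + GOOD[3:]
DOMAIN = range(-2, 3)
```

Each element of `GOOD` is exactly the kind of vertex label the later slides attach along an infeasible path: a fact about the current variables that follows from the prefix and rules out the suffix.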

Path refinements are interpolants
[Figure: a program path written as an SSA sequence of formulas, annotated with an interpolant sequence running from True to False]
1. Each formula implies the next
2. Each is over common symbols of prefix and suffix
3. Begins with true, ends with false
Path refinement procedure
[Figure: SSA sequence, then Prover (produces a proof), then Interpolation (produces a structured proof), then Path Refinement]

Unwinding the CFG
An unwinding is a tree with an embedding in the CFG
[Figure: an unwinding tree mapped into the control-flow graph by a vertex map $M_v$ and an edge map $M_e$]

Expansion
Every non-leaf vertex of the unwinding must be fully expanded...
[Figure: edge "L=0" from vertex 0 to vertex 1 under the maps $M_v$ and $M_e$, annotated: if this is not a leaf... ...and this exists... ...then this exists.]
...but we allow unexpanded leaves (i.e., we are building a finite prefix of the infinite unwinding)

Labeled unwinding
A labeled unwinding is equipped with...
–a labeling function $\psi : V \rightarrow \mathcal{L}(S)$
–a covering relation $\rhd \subseteq V \times V$
[Figure: labeled unwinding of the example (vertices 0 to 7) with a covering arc]
These two nodes are covered (have an ancestor at the tail of a covering arc)...

Well-labeled unwinding
An unwinding is well-labeled when...
–$\psi(\epsilon) = \mathit{True}$ at the root $\epsilon$
–every edge is a valid Hoare triple
–if $x \rhd y$ then $y$ not covered
[Figure: labeled unwinding of the example (vertices 0 to 7)]

Safe and complete
An unwinding is
–safe if every error vertex is labeled False
–complete if every nonterminal leaf is covered
[Figure: safe and complete labeled unwinding of the example (vertices 0 to 10)]
Theorem: A CFG with a safe complete unwinding is safe.

Unwinding steps
Three basic operations:
–Expand a nonterminal leaf
–Cover: add a covering arc
–Refine: strengthen labels along a path so error vertex labeled False

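Covering step
If $\psi(x) \Rightarrow \psi(y)$...
–add covering arc $x \rhd y$
–remove all $z \rhd w$ for $w$ a descendant of $x$ (these vertices are now covered, and a covered vertex may not cover another)
[Figure: covering example with vertex labels x ≤ y and x = y; a removed covering arc is marked X]
We restrict covers to be descending in a suitable total order on vertices. This prevents covering from diverging.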
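
A covering step can be sketched on an explicit data structure. The following makes several simplifying assumptions that are not from the talk: vertices are integers numbered in creation order (which serves as the total order), a label is represented by the finite set of states satisfying it so that implication becomes set inclusion, and arcs are dropped when their target lies at or below the newly covered vertex, which is how this sketch reads the removal rule in light of the well-labeledness condition. The class and method names are hypothetical.

```python
class Unwinding:
    def __init__(self):
        self.parent = {}       # vertex -> parent vertex (None for the root)
        self.location = {}     # vertex -> CFG location it is mapped to
        self.label = {}        # vertex -> frozenset of states satisfying it
        self.covering = set()  # (x, y) means x is covered by y

    def add(self, v, parent, location, label):
        self.parent[v] = parent
        self.location[v] = location
        self.label[v] = frozenset(label)

    def ancestors_or_self(self, v):
        while v is not None:
            yield v
            v = self.parent[v]

    def is_covered(self, v):
        """A vertex is covered if it or an ancestor is the tail of an arc."""
        tails = {x for x, _ in self.covering}
        return any(a in tails for a in self.ancestors_or_self(v))

    def cover(self, x, y):
        """Try to add the arc 'x covered by y'; return True on success."""
        if not (y < x and self.location[x] == self.location[y]):
            return False  # covers must descend in the total order
        if self.is_covered(x) or self.is_covered(y):
            return False
        if not self.label[x] <= self.label[y]:  # label(x) implies label(y)
            return False
        self.covering.add((x, y))
        # Vertices at or below x are now covered, so they may no longer
        # cover anything: drop arcs that point at them.
        self.covering = {
            (z, w) for (z, w) in self.covering
            if x not in self.ancestors_or_self(w)
        }
        return True


def example():
    """Hypothetical unwinding: 0 -> 1 -> 2 -> 3 and 0 -> 4."""
    u = Unwinding()
    u.add(0, None, "a", {1, 2})
    u.add(1, 0, "b", {1})
    u.add(2, 1, "a", {1, 2, 3})
    u.add(3, 2, "b", {1, 2})
    u.add(4, 0, "b", {1})
    return u
```

In the example, vertex 4 can first be covered by vertex 3. If the label of vertex 2 is later strengthened so that vertex 0 covers it, vertex 3 becomes covered and the earlier arc from 4 is removed, so vertex 4 has to be reconsidered.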

Refinement step
Label an error vertex False by refining the path to that vertex with an interpolant for that path.
By refining with interpolants, we avoid predicate image computation.
[Figure: a path whose labels T are strengthened by an interpolant until the error vertex is labeled F; a cutoff is crossed out]
Refinement may remove cutoffs

Forced cutoff
Try to refine a sub-path to force a cutoff
–show that path from nearest common ancestor of x,y proves $\psi(x)$ at y
[Figure: the same unwinding, with the sub-path to refine marked "refine this path"]
Forced cutoffs allow us to efficiently handle nested control structure

Overall algorithm
1. Do as much covering as possible
2. If a leaf can't be covered, try forced covering
3. If the leaf still can't be covered, expand it
4. Label all error states False by refining with an interpolant
5. Continue until unwinding is safe and complete

Experiments
Windows device driver benchmarks from BLAST benchmark suite
–programs flattened to "simple goto programs"
Compare performance against BLAST, a lazy predicate abstraction tool
name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     26.3       3.15        8.3
diskperf  14K         3.9K     102        20.0        5.1
cdaudio   44K         6.3K     310        19.1        16.2
floppy    18K         8.7K     455        17.8        25.6
parclass  138K        8.8K     5511       26.2        210
parport   61K         13K      8084       37.1        224
Almost all BLAST time spent in predicate image operation.

The Saga Continues
After these results, Ranjit Jhala modified BLAST
–vertices inherit predicates from their parents, reducing refinements
–fewer refinements allows more predicate localization
Impact also made more eager, using some static analysis
name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     11.9       0.35        34
diskperf  14K         3.9K     117        2.37        49
cdaudio   44K         6.3K     202        1.51        134
floppy    18K         8.7K     164        4.09        41
parclass  138K        8.8K     463        3.84        121
parport   61K         13K      324        6.47        50

Conclusions
Caveats
–Comparing different implementations is dangerous
–More and better software model checking benchmarks are needed
Tentative conclusion
–For control-dominated codes, predicate abstraction is too "eager"
By lazy abstraction with interpolants, we can
–Avoid the expense of the abstract "post" operator
–Avoid re-running the model checker with each refinement
–Avoid applying decision procedure to full program unwindings
Result is an efficient procedure for checking control-dominated software
–Three orders of magnitude speedup in lazy model checking in 6 months!

Future work
Procedure summaries
–Many similar subgraphs in unwinding due to procedure expansions
–Cannot handle recursion
–Can we use interpolants to compute approximate procedure summaries?
Quantified interpolants
–Can be used to generate program invariants with quantifiers
–Works for simple examples, but need to prevent number of quantifiers from increasing without bound
Richer theories
–In this work, all program variables modeled by integers
–Need an interpolating prover for bit vector theory
Concurrency...
