Swerve: Semester in Review
Topics
  Symbolic pointer analysis
  Model checking
    –C programs
    –Abstract counterexamples
  Symbolic simulation and execution
  Cousot: the Galois connection
Pointer Analysis (in 2001)
P.A. Terminology
  Context-sensitivity: do we take the calling context into account?
    –Doing so leads to very precise but non-polynomial algorithms
  Flow-sensitivity: is the analysis sensitive to control flow?
    –Flow-insensitive, equality = unification-based = Steensgaard
      Almost linear, but not very precise
    –Flow-insensitive, subset = inclusion-based = Andersen
      Polynomial, but more precise
    –Sensitive analyses are even more expensive
P.A.: Flow-sensitivity
P.A.: Problem Formulation
  Phase one: find constraints in the code
    –Depends on sensitivities (context, flow)
    –Examine stores, loads, etc.
  Phase two: solve the system of constraints for the complete points-to relation
    –Explicit: Steensgaard using union-find
    –Implicit: Andersen-style using BDDs
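The explicit, unification-based solver can be sketched with a union-find structure. This is a minimal illustration, not Steensgaard's full algorithm: it assumes only two statement forms, `p = &x` (tagged `"addr"`) and `p = q` (tagged `"copy"`), and every name in it (`UnionFind`, `steensgaard`, `points_to`) is hypothetical.

```python
class UnionFind:
    """Disjoint sets with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
        return rb


def steensgaard(statements):
    """statements: iterable of (kind, lhs, rhs) with kind in {"addr", "copy"}."""
    uf = UnionFind()
    pointee = {}  # class representative -> a node in the class it points to

    def target(v):
        # The class that v points to; create a placeholder node if unknown.
        r = uf.find(v)
        if r not in pointee:
            pointee[r] = ("deref", r)
        return pointee[r]

    def unify(a, b):
        # Merge two classes and, recursively, the classes they point to.
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            return
        ta, tb = pointee.pop(ra, None), pointee.pop(rb, None)
        r = uf.union(ra, rb)
        if ta is not None and tb is not None:
            pointee[r] = ta
            unify(ta, tb)
        elif ta is not None or tb is not None:
            pointee[r] = ta if ta is not None else tb

    for kind, lhs, rhs in statements:
        if kind == "addr":    # lhs = &rhs
            unify(target(lhs), rhs)
        elif kind == "copy":  # lhs = rhs
            unify(target(lhs), target(rhs))

    def points_to(v):
        r = uf.find(v)
        if r not in pointee:
            return set()
        t = uf.find(pointee[r])
        # Placeholder nodes are tuples; only named variables are reported.
        return {x for x in list(uf.parent)
                if isinstance(x, str) and uf.find(x) == t}

    return points_to
```

For `p = &x; q = &y; p = q`, this sketch reports `{x, y}` for both `p` and `q`. A subset-based analysis would keep `q` at `{y}`, which is the precision that unification gives up in exchange for near-linear time.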
Pointer Analysis Example
  Program
    h1: v1 = new Object();
    h2: v2 = new Object();
    v1.f = v2;
    v3 = v1.f;
  Input Relations
    vPointsTo(v1, h1)
    vPointsTo(v2, h2)
    Store(v1, f, v2)
    Load(v1, f, v3)
  Output Relations
    hPointsTo(h1, f, h2)
    vPointsTo(v3, h2)
  [Figure: points-to graph with edges v1 → h1, v2 → h2, v3 → h2, and h1 → h2 labeled f]
Zhu: Symbolic P.A.
  Points-to relation can be huge, but BDDs are great at implicitly representing relations
Berndl et al.: Symbolic P.A.
  Subset-based formulation using BDDs
  Variable ordering experiments
    –Sets of heap objects (“pointed to”) tend to be large and regular: putting them together at the end of the ordering helps
    –Interleaving the bits for sets of variables (“pointers”) helps a little
    –In general, it is important to partition the bits of the different sets in the relations
Whaley & Lam: Datalog, bddbddb
  All these symbolic pointer analyses devote a lot of implementation time to getting the BDD part correct and fast
  Datalog: a declarative language for expressing (possibly recursive) relations
  bddbddb: a tool to convert Datalog operations (join, project, rename, recursion) into BDD operations
  Points-to analyses can now be described much more concisely in Datalog
Inference Rule in Datalog
  Stores: v1.f = v2;
  hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2).
  [Figure: v1 → h1 and v2 → h2, with the derived edge h1 → h2 labeled f]
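To see the store rule at work, the sketch below evaluates it to a fixed point over explicit Python sets, using the relations from the Pointer Analysis Example. The load rule is an assumption, since the slides state only the store rule: `vPointsTo(v2, h2) :- Load(v1, f, v2), vPointsTo(v1, h1), hPointsTo(h1, f, h2)`. The function name `solve` is hypothetical, and a tool such as bddbddb would run the same joins on BDDs rather than on explicit tuples.

```python
def solve(v_points_to, store, load):
    """Naive fixed-point evaluation of the store and (assumed) load rules."""
    v_pt = set(v_points_to)   # (variable, heap object)
    h_pt = set()              # (heap object, field, heap object)
    changed = True
    while changed:
        changed = False
        # Store rule: v1.f = v2
        for v1, f, v2 in store:
            for a, h1 in list(v_pt):
                if a != v1:
                    continue
                for b, h2 in list(v_pt):
                    if b == v2 and (h1, f, h2) not in h_pt:
                        h_pt.add((h1, f, h2))
                        changed = True
        # Load rule (assumed): v2 = v1.f, written Load(v1, f, v2)
        for v1, f, v2 in load:
            for a, h1 in list(v_pt):
                if a != v1:
                    continue
                for h, g, h2 in list(h_pt):
                    if h == h1 and g == f and (v2, h2) not in v_pt:
                        v_pt.add((v2, h2))
                        changed = True
    return v_pt, h_pt


v_pt, h_pt = solve(
    v_points_to={("v1", "h1"), ("v2", "h2")},
    store={("v1", "f", "v2")},
    load={("v1", "f", "v3")},
)
```

On this input, `h_pt` contains only `("h1", "f", "h2")`, and `v_pt` gains `("v3", "h2")`, matching the output relations of the example.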
Whaley & Lam: With Context
  Context-sensitive analysis by cloning methods and doing a context-insensitive analysis on the new call graph
  Can use Datalog to express the constraints necessary to determine the call graph
  Cloned call graph is exponentially bigger, but a clever encoding lets BDDs handle it well
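A small sketch shows why the cloned call graph grows exponentially. It assumes an acyclic call graph given as one `(caller, callee)` pair per call site and counts one clone of a method per call path from the root. The name `count_contexts` is hypothetical, and recursion is not handled.

```python
from functools import lru_cache


def count_contexts(call_edges, root):
    """Number of clones per method = number of call paths from root (DAG only)."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, []).append(caller)

    @lru_cache(maxsize=None)
    def contexts(method):
        if method == root:
            return 1
        # One clone per (calling context of the caller, call site) pair.
        return sum(contexts(c) for c in callers.get(method, []))

    methods = {root} | {m for edge in call_edges for m in edge}
    return {m: contexts(m) for m in methods}


# Ten layers, each method calling the next one from two call sites.
edges = [(f"m{i}", f"m{i + 1}") for i in range(10) for _ in range(2)]
clones = count_contexts(edges, "m0")
```

Here `clones["m10"]` is 1024: the number of contexts doubles at every layer, which is why an explicit representation of the cloned graph does not scale.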
CBMC: Prototype Tool
  [Figure: tool flow with stages labeled ANSI-C Model, VHDL/Verilog Product, parsing and type checking, convert, BV Logic (Tree), BV Logic Decision Problem, CNF, Chaff]
  Equivalence reduced to a bit-vector logic decision problem
  Tool requires a decision procedure for large bit-vector problems
  BV problems are HUGE: passed directly to Chaff in CNF
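The reduction of equivalence to a CNF decision problem can be illustrated on single bits. The sketch below is not CBMC's encoding: it assumes a Tseitin-style encoding of each gate, a "miter" that asserts the two outputs differ, and a brute-force loop standing in for Chaff. Literals are signed integers in the DIMACS convention, and all function and variable names are hypothetical.

```python
from itertools import product


def and_gate(out, a, b):
    """Clauses for out <-> (a AND b)."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def or_gate(out, a, b):
    """Clauses for out <-> (a OR b)."""
    return [[out, -a], [out, -b], [-out, a, b]]


def xor_gate(out, a, b):
    """Clauses for out <-> (a XOR b)."""
    return [[-out, a, b], [-out, -a, -b], [out, -a, b], [out, a, -b]]


def satisfiable(clauses, num_vars):
    """Brute-force stand-in for a SAT solver; only viable for tiny inputs."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


A, B, SPEC, G, MITER = 1, 2, 3, 4, 5


def miter(impl_clauses, impl_out):
    """Specification a AND b versus an implementation; asserts outputs differ."""
    clauses = and_gate(SPEC, A, B) + impl_clauses
    clauses += xor_gate(MITER, SPEC, impl_out)
    return clauses + [[MITER]]


# Implementation 1: NOT(NOT a OR NOT b), which equals a AND b.
good = miter(or_gate(G, -A, -B), -G)
# Implementation 2: a OR b, which does not.
bad = miter(or_gate(G, A, B), G)

equivalent_good = not satisfiable(good, 5)
equivalent_bad = not satisfiable(bad, 5)
```

An unsatisfiable miter means no input distinguishes the two circuits, so `equivalent_good` is `True` and `equivalent_bad` is `False`. Real bit-vector problems apply the same idea to every bit of every operator, which is where the size comes from.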
Example
Explaining Counterexamples
  Counterexamples provided by model checkers are often difficult to understand and to locate within the code
  Previous work: find a concrete execution “close to” the counterexample by some distance metric
  This work: find an abstract execution, which provides more meaningful explanations
Distance Metric
  Execution = (state, action) sequence
    –State = (control location, predicate)
  Metric: compare two executions a and b
    –Don’t just compare a_i to b_i, since small changes in control flow can yield “misalignment”
    –Distance is defined as the number of changes (in predicates and actions) needed to convert a to b
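The misalignment problem can be seen with a standard edit distance. This sketch is an assumption about the flavor of the metric, not its exact definition: it treats each whole (state, action) step as one unit, whereas the metric above counts changes in predicates and actions. The names `distance` and `positional_mismatches` are hypothetical.

```python
def distance(a, b):
    """Levenshtein distance between two executions (lists of steps)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # keep or replace
        prev = cur
    return prev[-1]


def positional_mismatches(a, b):
    """Naive comparison of a_i with b_i, plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Each step is ((control location, predicate), action).
a = [(("L1", "x>0"), "assign"), (("L2", "x>0"), "call"), (("L3", "x>0"), "ret")]
b = [(("L0", "x>0"), "branch")] + a  # same execution with one extra leading step

naive = positional_mismatches(a, b)
aligned = distance(a, b)
```

One extra step at the front shifts everything, so the naive comparison reports 4 differences while the alignment-based distance reports 1.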
Symbolic Execution
Quasi-symbolic simulation
  Symbolic simulation externally, scalar values internally
    –A simulation run requires constant memory
  Key ideas
    –Don’t compute an exact value unless necessary: there are many don’t cares in large designs
    –Trade time for memory: multiple runs to generate exact values
  Reliability of directed testing with efficiency closer to that of symbolic methods
Don’t care logic: Basic Algorithm
  [Figure: network of AND gates over inputs X_a, X_b, X_c]
  Symbolic variable
    –X_a combined with its complement gives 0: obeys the law of excluded middle!
  Don’t care variables
    –“Traditional” X value: X combined with X yields X
    –Conservative approximation
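The difference between a symbolic variable and a "traditional" X can be sketched with three-valued logic. The encoding (the string `"X"` for unknown, integers 0 and 1 for known values) and the helper names `t_not` and `t_and` are assumptions made for illustration.

```python
X = "X"  # the "traditional" unknown value


def t_not(a):
    """Three-valued NOT: the complement of an unknown is still unknown."""
    return X if a == X else 1 - a


def t_and(a, b):
    """Three-valued AND: a controlling 0 wins even against X."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


# Don't care variable: X AND NOT X is X, a conservative approximation,
# because the simulator does not know both X's are the same signal.
dont_care_result = t_and(X, t_not(X))

# Symbolic variable: enumerating a over {0, 1} shows a AND NOT a is always 0.
symbolic_results = {t_and(a, t_not(a)) for a in (0, 1)}
```

`dont_care_result` is `"X"`, while `symbolic_results` is `{0}`. The X answer is never wrong, only less informative, which is what makes it safe to start with every variable as a don't care.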
Decision Procedure
  [Figure: case split on an input whose value is X, with branches a=0 and a=1, over a network of AND and OR gates with inputs X_a and X_b]
  Variable selection heuristic: pick a relevant variable by propagating from the inputs
  Test is Unsatisfiable!
BDDs with Approximate Values
  Generic approximate BDD apply algorithm
  Approx_Apply(F, G)
    find top variable V
    compute L = left(F, G), R = right(F, G)
    if node(V, L, R) exists
      return it
    else if want_exact(V, L, R)
      create node(V, L, R)
      return node
    else  /* approximate */
      return X
Classification Algorithm
  Simulator’s classification
    –Care
    –Don’t Care
  Algorithm
    –Initially, all variables are Don’t Care
    –Simulate using sub-domain values only
    –Re-classify one variable as Care
    –Repeat until sufficient variables are classified
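The classification loop can be sketched as repeated scalar simulation. Several simplifications are assumed here: Care variables are enumerated over {0, 1}, Don't Care variables are simulated as X, the next variable is promoted in list order rather than by the propagation heuristic, and "sufficient" means that no run produces X. The circuit and all names are hypothetical.

```python
from itertools import product

X = "X"


def t_not(a):
    return X if a == X else 1 - a


def t_and(a, b):
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def circuit(env):
    """Example design: (a AND NOT a) AND b, which is always 0."""
    return t_and(t_and(env["a"], t_not(env["a"])), env["b"])


def classify(circuit, variables):
    """Promote variables to Care until every simulation run is exact."""
    care = []
    dont_care = list(variables)  # initially, all variables are Don't Care
    while True:
        outputs = set()
        # One scalar simulation run per assignment to the Care variables.
        for values in product((0, 1), repeat=len(care)):
            env = {v: X for v in dont_care}
            env.update(zip(care, values))
            outputs.add(circuit(env))
        if X not in outputs or not dont_care:
            return care, outputs
        care.append(dont_care.pop(0))  # re-classify one variable as Care


care, outputs = classify(circuit, ["a", "b"])
```

Only `a` ends up classified as Care, and the two runs with `a=0` and `a=1` both give 0, so `b` never needs an exact value. Each run stores only scalar values, which is the time-for-memory trade described for quasi-symbolic simulation.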
Review
  What we’ve done:
    –Symbolic pointer analysis
    –Symbolic simulations and executions
    –Model checking C programs
      Abstract explanations
  Where do we go from here?