Partially Disjunctive Heap Abstraction
  Roman Manevich, Mooly Sagiv (Tel Aviv University)
  G. Ramalingam, John Field (IBM T.J. Watson)
Motivation
  Analysis of object-oriented programs is hard
    Recursive data structures
    Unbounded number of objects
    Destructive update of references
  Scalable heap analyses exist (e.g., flow-insensitive analyses)
    Not precise enough for verification
  Precise heap analyses exist (e.g., SRW shape analysis)
    Scaling is very challenging
Motivating example: verifying mark phase of GC

  // @Ensures marked == REACH(root)
  void mark(Node root, NodeSet marked) {
    Node x;
    if (root != null) {
      // Worklist of nodes that were discovered but not yet processed
      NodeSet pending = new NodeSet();
      pending.add(root);
      marked.clear();
      while (!pending.isEmpty()) {
        x = pending.selectAndRemove();
        marked.add(x);
        if (x.left != null)
          if (!marked.contains(x.left))
            pending.add(x.left);
        if (x.right != null)
          if (!marked.contains(x.right))
            pending.add(x.right);
      }
    }
  }

  This is a reachability-based algorithm
Motivating example: verifying mark phase of GC
  [Animation: heap with nodes u1..u6 connected by left/right edges; root points to u1, and x points to the node being processed]
  Start:    pending = {root}    marked = {}
  x = u1:   pending = {u3,u2}   marked = {u1}
  x = u3:   pending = {u4,u2}   marked = {u1,u3}
  x = u4:   pending = {u2}      marked = {u1,u3,u4}
  x = u2:   pending = {}        marked = {u1,u3,u4,u2}   DONE
  The unmarked nodes u5 and u6 are garbage; only u1..u4 remain
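The mark procedure and the pending/marked states on the preceding slides can be tried out with a small runnable sketch. The `Node` class, the `MarkDemo` class, the `reach` helper, and the concrete edges of `exampleHeap` are assumptions made for illustration (the slides do not give the exact edges); `java.util` collections stand in for the slide's `NodeSet`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical node type; the slides only show the fields left and right.
class Node {
    final String name;
    Node left, right;
    Node(String name) { this.name = name; }
}

class MarkDemo {
    // Worklist marking as on the slide. A Deque replaces the NodeSet "pending";
    // a node may be queued twice, which is harmless because marked is a set.
    static void mark(Node root, Set<Node> marked) {
        if (root != null) {
            Deque<Node> pending = new ArrayDeque<>();
            pending.add(root);
            marked.clear();
            while (!pending.isEmpty()) {
                Node x = pending.remove(); // plays the role of selectAndRemove()
                marked.add(x);
                if (x.left != null && !marked.contains(x.left)) pending.add(x.left);
                if (x.right != null && !marked.contains(x.right)) pending.add(x.right);
            }
        }
    }

    // Reference definition of REACH(root) by recursive traversal, used only for checking.
    static void reach(Node n, Set<Node> seen) {
        if (n == null || !seen.add(n)) return;
        reach(n.left, seen);
        reach(n.right, seen);
    }

    // Assumed heap in the spirit of the slides: u1..u4 reachable from root (u1),
    // u5 and u6 unreachable. Index 0 is unused so that u[i] is node ui.
    static Node[] exampleHeap() {
        Node[] u = new Node[7];
        for (int i = 1; i <= 6; i++) u[i] = new Node("u" + i);
        u[1].left = u[2];
        u[1].right = u[3];
        u[3].right = u[4];
        u[4].left = u[2];
        u[6].left = u[1]; // garbage may point into the live heap
        u[5].left = u[6];
        return u;
    }
}
```

Calling `MarkDemo.mark(u[1], marked)` on this heap and comparing `marked` with the set computed by `reach` checks the `@Ensures` clause on one concrete input; the point of the talk is to verify it for all inputs.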
Motivating example: verifying mark phase of GC
  Powerset heap abstraction: 584 seconds, 189,772 abstract heaps
    Definitely too expensive
  Can we verify more efficiently?
  Partially disjunctive heap abstraction: 3 seconds, 1,133 abstract heaps
  TVLA system
  The same phenomenon also occurs in many other examples
Overview and main results
  New (parametric) heap abstraction
    Uses a heap similarity criterion
    Merges “similar” heaps
  Robust implementation
    Abstraction of choice among TVLA users
    Suitable for other shape analysis systems
  Empirical results
    Significant speedups (2 orders of magnitude)
    Precise in most cases
Talk outline
  Shape analysis background
    Representing heaps via logical structures
    Disjunctive (powerset) heap abstraction
  Partially disjunctive heap abstraction
    Via universe congruence similarity
  Empirical results
  Related work
  Future work
  Conclusions
Shape analysis via first-order logic
  SRW 2002: Parametric shape analysis via 3-valued logic
  Concrete heaps represented by 2-valued structures over predicate symbols P
    A set of individuals (nodes) U
    Interpretation of predicate symbols in P:
      $p_0() \to \{0,1\}$
      $p_1(v) \to \{0,1\}$
      $p_2(u,v) \to \{0,1\}$
Concrete heap
  [Figure: concrete heap drawn as a 2-valued structure]
  Unary predicates: x, root, set[marked], set[pending], r[root]
  Binary predicates: left, right
3-valued structures
  2-valued structures are abstracted into 3-valued structures by merging individuals
    $p_0() \to \{0,1,1/2\}$
    $p_1(v) \to \{0,1,1/2\}$
    $p_2(u,v) \to \{0,1,1/2\}$
  Kleene's partially ordered set of logical values: $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$, so $0 \sqcup 1 = 1/2$
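The three logical values and their join can be written down directly. The following is a minimal sketch; the enum name `Kleene` and the method names `leq` and `join` are hypothetical, and the order shown is the information order in which 1/2 sits above both 0 and 1.

```java
// The three Kleene values: 0, 1, and 1/2 (unknown).
enum Kleene {
    FALSE, TRUE, UNKNOWN;

    // Information order: a is below b if they are equal or b is 1/2.
    boolean leq(Kleene other) {
        return this == other || other == UNKNOWN;
    }

    // Join in the information order: equal values stay, different values give 1/2.
    Kleene join(Kleene other) {
        return this == other ? this : UNKNOWN;
    }
}
```

Merging two individuals that disagree on a predicate therefore yields 1/2 for that predicate, which is how information is lost when individuals are merged.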
Canonical abstraction
  Merge individuals that have the same values for all unary predicates in A (their canonical name)
  Bounded structure with at most $2^{|A|}$ individuals
    A = set of unary predicates used for abstraction (in general, a subset of the unary predicates)
Canonical abstraction (example)
  A = {x(v), root(v), set[marked](v), set[pending](v), r[root](v)}
  [Animation: the canonical name of each individual of the concrete heap is computed in turn]
  Canonical names shown:
    x=0, root=0, r[root]=1, set[marked]=1, set[pending]=0 (two individuals)
    x=0, root=0, r[root]=0, set[marked]=0, set[pending]=0 (two individuals)
  Individuals with the same canonical name are merged
Abstract heap
  Bounded number of individuals
  Retained definite values of unary predicates
  [Figure: abstract heap obtained by canonical abstraction, with nodes labeled r[root], set[marked] and left/right edges]
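Canonical abstraction as described on these slides can be sketched in a few lines. This is an illustration under stated assumptions, not TVLA's implementation: the class `CanonicalAbstraction`, its `Abstract` result type, and the encoding of a concrete heap as boolean tables are all hypothetical. Individuals are grouped by canonical name, and each binary predicate value between two groups is the join of the values over all member pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CanonicalAbstraction {
    // Kleene values encoded as ints: 0 (false), 1 (true), 2 (unknown, i.e. 1/2).
    static final int F = 0, T = 1, U = 2;
    static int join(int a, int b) { return a == b ? a : U; }

    // Abstract heap: one individual per canonical name.
    static class Abstract {
        final List<String> names = new ArrayList<>();     // canonical names
        final List<Boolean> summary = new ArrayList<>();  // represents more than one individual?
        final Map<String, int[][]> binary = new TreeMap<>();
    }

    // Canonical name of individual v: its values for the abstraction predicates in a.
    static String canonicalName(int v, List<String> a, Map<String, boolean[]> unary) {
        StringBuilder sb = new StringBuilder();
        for (String p : a) sb.append(p).append('=').append(unary.get(p)[v] ? 1 : 0).append(' ');
        return sb.toString().trim();
    }

    // n individuals 0..n-1; unary.get(p)[v] and binary.get(p)[u][v] give the 2-valued interpretation.
    static Abstract abstractHeap(int n, List<String> a,
                                 Map<String, boolean[]> unary,
                                 Map<String, boolean[][]> binary) {
        // Group individuals by canonical name, keeping first-seen order.
        Map<String, List<Integer>> classes = new LinkedHashMap<>();
        for (int v = 0; v < n; v++)
            classes.computeIfAbsent(canonicalName(v, a, unary), k -> new ArrayList<>()).add(v);

        Abstract out = new Abstract();
        List<List<Integer>> members = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : classes.entrySet()) {
            out.names.add(e.getKey());
            out.summary.add(e.getValue().size() > 1);
            members.add(e.getValue());
        }

        // Each abstract binary value is the join over all pairs of concrete members.
        int m = members.size();
        for (Map.Entry<String, boolean[][]> e : binary.entrySet()) {
            int[][] table = new int[m][m];
            for (int i = 0; i < m; i++)
                for (int j = 0; j < m; j++) {
                    int val = -1; // -1 means "no pair seen yet"
                    for (int s : members.get(i))
                        for (int t : members.get(j)) {
                            int b = e.getValue()[s][t] ? T : F;
                            val = (val < 0) ? b : join(val, b);
                        }
                    table[i][j] = val;
                }
            out.binary.put(e.getKey(), table);
        }
        return out;
    }
}
```

For a three-node list whose head is pointed to by x and A = {x}, the two tail nodes share the canonical name `x=0` and collapse into one summary individual, and the `left` edge from the head to that individual becomes 1/2. The number of abstract individuals is bounded by the number of distinct canonical names, independent of the size of the concrete heap.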
Powerset heap abstraction
  $\beta$ = canonical abstraction
  $\alpha_{pow}(X) = \{\beta(s) \mid s \in X\}$
  LUB (join) is set union
  Worst case is doubly exponential in |A|
  Can make unnecessary distinctions
Partially disjunctive heap abstraction
  Use a heap-similarity criterion
    We define similarity by universe congruence
  Merge similar heaps
  Avoid merging dissimilar heaps
  A particular similarity criterion yields a particular kind of partially disjunctive abstraction
Universe congruent heaps
  [Figure: two abstract heaps with the same canonical names (nodes labeled r[root], set[marked]) but different left/right edges]
  These structures are not merged by the powerset abstraction, but they are merged by the partially disjunctive abstraction
Result of merge
  [Figure: the single abstract heap obtained by merging the two universe-congruent heaps]
Non-congruent heaps: no merge
  [Figure: two abstract heaps with different canonical names; one contains a node labeled r[root], set[pending], the other only nodes labeled r[root], set[marked]]
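Definition of partially disjunctive heap abstraction
  Two heaps are similar iff they are universe congruent (same canonical names)
  $\sqcup_{pi} C$ = merge of the universe-congruent heaps in C
  $\alpha_{pi}(X) = \{\sqcup_{pi} C \mid C \text{ is a class of universe-congruent heaps in } \alpha_{pow}(X)\}$, where $\alpha_{pow}(X)$ is the powerset abstraction of X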
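The definition above amounts to grouping abstract heaps by their set of canonical names and joining each group into one heap. The sketch below shows that grouping under simplifying assumptions: the names `Heap3`, `PartiallyDisjunctive`, `merge`, and `abstractSet` are hypothetical, predicate values are keyed by strings such as `left(a,b)`, and a missing entry is read as 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// A 3-valued structure whose individuals are identified by canonical name.
class Heap3 {
    final TreeSet<String> universe = new TreeSet<>();
    // Predicate values keyed by e.g. "left(a,b)": 0, 1, or 2 for 1/2; absent means 0.
    final TreeMap<String, Integer> facts = new TreeMap<>();
}

class PartiallyDisjunctive {
    static int join(int a, int b) { return a == b ? a : 2; }

    // Merge two universe-congruent heaps: same universe, pointwise join of predicate values.
    static Heap3 merge(Heap3 s1, Heap3 s2) {
        Heap3 out = new Heap3();
        out.universe.addAll(s1.universe);
        TreeSet<String> keys = new TreeSet<>(s1.facts.keySet());
        keys.addAll(s2.facts.keySet());
        for (String k : keys)
            out.facts.put(k, join(s1.facts.getOrDefault(k, 0), s2.facts.getOrDefault(k, 0)));
        return out;
    }

    // Keep one heap per universe; heaps with equal universes are merged, others stay separate.
    static List<Heap3> abstractSet(List<Heap3> heaps) {
        Map<TreeSet<String>, Heap3> byUniverse = new LinkedHashMap<>();
        for (Heap3 h : heaps)
            byUniverse.merge(h.universe, h, PartiallyDisjunctive::merge);
        return new ArrayList<>(byUniverse.values());
    }
}
```

Because the result holds at most one heap per distinct universe, the set can be much smaller than under the powerset abstraction, at the price of turning disagreeing predicate values into 1/2.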
Characteristics of the partially disjunctive heap abstraction
  3-valued structures are partially ordered
    No LUB over singleton structure sets in general
    A “single” LUB exists for partially isomorphic structures
  If $S_1$ and $S_2$ are partially isomorphic, they are represented by the single structure $\sqcup_{pi}\{S_1,S_2\}$; otherwise both are kept, $\{S_1,S_2\}$, as in the powerset abstraction
  Retains definite values of unary predicates
  Size of set can be reduced exponentially
Running times
Space consumption
Related work
  Reducing cost of powerset-based analysis
    Function space domain construction
      ESP [PLDI 02]
      Deutsch [PLDI 94]
    Widening operators [Bagnara et al., VMCAI 03]
Future work
  Experiment with other similarity criteria
    Structures with different universes
  Deflating operators
  Widening operators
Conclusions
  A new (parametric) heap abstraction
    Partially disjunctive
    Merges similar abstract heap descriptors
  Significantly more efficient than full powerset
    Essential for many TVLA analyses
  Often no loss of precision in practice
The End
Parametric partial isomorphism
  Structures $S_1 = \langle U_1, I_1 \rangle$ and $S_2 = \langle U_2, I_2 \rangle$
  Isomorphic iff:
    There exists a bijection $f : U_1 \to U_2$
    f preserves all predicate values
  Partially isomorphic relative to R iff:
    f preserves the values of the relational predicates in R
    $A \subseteq R \subseteq P$
No LUB over singletons
  [Figure: 3-valued structures whose individuals are labeled with values of the unary predicates p, q, and z; C and D are both upper bounds of the same structures, but they are incomparable, so no least upper bound exists]