Data Structures and Algorithms for Efficient Shape Analysis by Roman Manevich Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv
Motivation: TVLA is a powerful and general abstract interpretation system. Abstract interpretation in TVLA: operational semantics is expressed with first-order logic + TC formulae; program states are represented as sets of evolving first-order structures. Efficiency is an issue.
Outline: quick introduction to Shape Analysis; compactly representing structures; tuning abstraction to improve performance.
What is Shape Analysis? It determines shape invariants for imperative programs and can be used to verify a wide range of properties over different programming languages.
reverse Example
    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *List;
    /* print.c */
    #include "list.h"
    List reverse(List x) {
        List y, t;
        y = NULL;
        while (x != NULL) {
            t = y;
            y = x;
            x = x->n;
            y->n = t;
        }
        return y;
    }
reverse Example [Figure: shape before, x points to the head of a list linked by n; shape after, y points to the head of a list linked by n]
Definition of a First-Order Logical Structure: $S = \langle U, \iota \rangle$, where U is a set of individuals ("node set") and $\iota$ is a mapping $p^{(r)} \mapsto (U^r \to \{0,1\})$, the "interpretation" of p.
Three-Valued Logic: 1: True; 0: False; 1/2: Unknown. A join semi-lattice under the information order: $0 \sqcup 1 = 1/2$.
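The three-valued domain on this slide can be sketched in a few lines. This is an illustrative sketch, not TVLA code: the names `HALF`, `join`, `leq_info`, `k_and`, `k_or` and `k_not` are hypothetical, and the connectives are the standard Kleene ones (minimum, maximum, complement), which the slide itself does not spell out.

```python
HALF = 0.5  # the "unknown" value 1/2 (exactly representable as a float)


def join(a, b):
    """Information-order join: equal values stay, different values become 1/2."""
    return a if a == b else HALF


def leq_info(a, b):
    """a is at least as precise as b in the information order."""
    return a == b or b == HALF


def k_and(a, b):
    """Kleene conjunction."""
    return min(a, b)


def k_or(a, b):
    """Kleene disjunction."""
    return max(a, b)


def k_not(a):
    """Kleene negation; 1/2 stays 1/2."""
    return 1 - a
```

Encoding the values as 0, 1/2 and 1 lets the logical connectives reuse numeric ordering, while the information order needs its own comparison because 1/2 sits above both definite values.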
Canonical Abstraction: partition the individuals into equivalence classes based on the values of their unary predicates. Collapse other predicates via $p^{S}(u'_1, \ldots, u'_k) = \bigsqcup \{ p^{B}(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$. At most $3^n$ abstract individuals.
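Canonical Abstraction Example [Figure: a list of four nodes u0, u1, u2, u3 linked by n, with x pointing to u0 and r[n,x] holding on every node, is abstracted to node u0 (pointed to by x) and node u, with n edges from u0 to u and from u to itself]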
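The example above can be reproduced with a small sketch of canonical abstraction. It assumes a concrete (two-valued) input structure and a hypothetical encoding that is not taken from TVLA: `unary` maps a predicate name to `{node: value}`, `binary` maps a predicate name to `{(u, v): value}`, absent entries mean 0, and abstract nodes are named by their tuple of abstraction-predicate values.

```python
from functools import reduce

HALF = 0.5  # the "unknown" value 1/2


def join(a, b):
    """Information-order join of two Kleene values."""
    return a if a == b else HALF


def canonical_abstraction(nodes, unary, binary, abstraction_preds):
    """Merge nodes that agree on all abstraction predicates and join the rest."""
    # The canonical name of a node is its vector of abstraction-predicate values.
    def name(u):
        return tuple(unary[p].get(u, 0) for p in abstraction_preds)

    classes = {}
    for u in nodes:
        classes.setdefault(name(u), []).append(u)

    abs_unary = {
        p: {c: reduce(join, [table.get(u, 0) for u in members])
            for c, members in classes.items()}
        for p, table in unary.items()
    }
    # A class with more than one member becomes a summary node (sm = 1/2).
    abs_unary["sm"] = {c: (HALF if len(m) > 1 else 0) for c, m in classes.items()}

    abs_binary = {
        p: {(c1, c2): reduce(join, [table.get((u, v), 0) for u in m1 for v in m2])
            for c1, m1 in classes.items() for c2, m2 in classes.items()}
        for p, table in binary.items()
    }
    return list(classes), abs_unary, abs_binary


# The four-cell list from the slide: x points to u0, n links consecutive cells.
nodes = ["u0", "u1", "u2", "u3"]
unary = {"x": {"u0": 1}, "r[n,x]": {u: 1 for u in nodes}}
binary = {"n": {("u0", "u1"): 1, ("u1", "u2"): 1, ("u2", "u3"): 1}}
abs_nodes, abs_unary, abs_binary = canonical_abstraction(
    nodes, unary, binary, ["x", "r[n,x]"])
```

With these inputs the three cells not pointed to by x collapse into one summary node, and both n edges in the abstract structure get the value 1/2.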
Compactly Representing First-Order Logical Structures: space is a major bottleneck, since the analysis explores many logical structures. Reduce space by sharing information across structures.
Desired Properties: sparse data structures; sharing of common sub-structures (inherited sharing, and incidental sharing due to program invariants); but feasible time performance (phase-sensitive data structures).
Chapter Outline Background First-order structure representations Base representation (TVLA 0.91) BDD representation Empirical evaluation Conclusion
First-Order Logical Structures: generalize shape graphs, with an arbitrary set of individuals and an arbitrary set of predicates on individuals. Dynamically evolving, usually by small changes. Properties are extracted by evaluating first-order formulae, e.g. ∃v1, v: x(v1) ∧ n(v1, v). The join operator requires isomorphism testing.
First-Order Structure ADT
    Structure  : new()                                    /* empty structure */
    SetOfNodes : nodeSet(Structure)
    Node       : newNode(Structure)
                 removeNode(Structure, node)
    Kleene     : eval(Structure, p^(r), <u1, ..., ur>)
                 update(Structure, p^(r), <u1, ..., ur>, Kleene)
    Structure  : copy(Structure)
print_all Example
    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *L;
    /* print.c */
    #include "list.h"
    void print_all(L y) {
        L x;
        x = y;
        while (x != NULL) {
            /* assert(x != NULL) */
            printf("elem=%d", x->data);
            x = x->n;
        }
    }
print_all Example: x = y, with update formula x'(v) := y(v). ADT calls: copy(S0) : S1; nodeSet(S0) : {u1, u}; eval(S0, y, u1) : 1; update(S1, x, u1, 1); eval(S0, y, u) : 0; update(S1, x, u, 0). [Figure: S0 has u1 (y=1) and u (sm=½) with n=½; S1 has u1 (x=1, y=1) and u (sm=½) with n=½]
print_all Example: while (x != NULL), precondition: ∃v x(v). x = x->n, focus: ∃v1 x(v1) ∧ n(v1, v); update: x'(v) := ∃v1 x(v1) ∧ n(v1, v). [Figure: S1 (u1 with x=1, y=1; u with sm=½; n=½) yields S2.0 (u1 with y=1; u with sm=½; n=½), S2.1 (u1 with y=1; u with x=1; n=1, n=½) and S2.2 (u1 with y=1; u.1 with x=1; u.0 with sm=½; n=1, n=½)]
Overview and Main Results: 1. Two novel representations of first-order structures: a new BDD representation and a new representation using functional maps. 2. Implementation techniques. 3. Empirical evaluation: comparison of different representations; space is reduced by a factor of 4 to 10; the new representations scale better.
Base Representation (Tal Lev-Ami, SAS 2000): two-level map, Predicate → (Node Tuple → Kleene). Sparse representation. Limited inherited sharing by "Copy-On-Write".
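A minimal sketch of a two-level map with copy-on-write may make the base representation concrete. It is an illustration under assumptions, not the TVLA 0.91 implementation: the class `Structure`, its field names, and the per-predicate sharing granularity are all hypothetical, and absent entries stand for the value 0.

```python
HALF = 0.5  # the "unknown" value 1/2


class Structure:
    """Two-level map: predicate -> (node tuple -> Kleene value), stored sparsely."""

    def __init__(self):
        self.nodes = set()
        self.tables = {}     # predicate name -> {node tuple: non-zero value}
        self.shared = set()  # predicates whose inner table may be shared
        self._next = 0

    def new_node(self):
        node = self._next
        self._next += 1
        self.nodes.add(node)
        return node

    def eval(self, pred, nodes):
        return self.tables.get(pred, {}).get(nodes, 0)

    def update(self, pred, nodes, value):
        if pred in self.shared:
            # Copy-on-write: duplicate the inner table before the first change.
            self.tables[pred] = dict(self.tables[pred])
            self.shared.discard(pred)
        table = self.tables.setdefault(pred, {})
        if value == 0:
            table.pop(nodes, None)  # keep the map sparse
        else:
            table[nodes] = value

    def copy(self):
        other = Structure()
        other.nodes = set(self.nodes)
        other._next = self._next
        other.tables = dict(self.tables)  # shallow copy: inner tables are shared
        other.shared = set(self.tables)
        self.shared = set(self.tables)
        return other


# The x = y step from the print_all example: x'(v) := y(v).
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), HALF)
s0.update("n", (u1, u), HALF)
s0.update("n", (u, u), HALF)

s1 = s0.copy()
for v in s0.nodes:
    s1.update("x", (v,), s0.eval("y", (v,)))
```

Only the table for x is created in the copy; the tables for y, sm and n remain the same objects in both structures, which is the inherited sharing the slide refers to.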
BDDs in a Nutshell (Bryant 86): Ordered Binary Decision Diagrams. A data structure for Boolean functions. Functions are represented as (unique) DAGs. [Figure: decision diagram for a function f over x1, x2, x3]
BDDs in a Nutshell (Bryant 86): Ordered Binary Decision Diagrams. A data structure for Boolean functions. Functions are represented as (unique) DAGs. Also achieve sharing across functions. [Figure: three diagrams over x1, x2, x3 labelled Duplicate Terminals, Duplicate Nonterminals, Redundant Tests]
Encoding Structures Using Integers: static encoding of predicates and Kleene values; dynamic encoding of nodes as 0, 1, …, n-1. Encode predicate p's values as $e_p(p).e_n(u_1).e_n(u_2).\ldots.e_n(u_n).e_k(\mathit{Kleene})$.
BDD Representation of Integer Sets: characteristic function. S = {1, 5}, 1 = 001, 5 = 101, $S = (\neg x_1 \wedge \neg x_2 \wedge x_3) \vee (x_1 \wedge \neg x_2 \wedge x_3)$. [Figure: BDD for S over x1, x2, x3 with terminals 1 and 0]
BDD Representation of Integer Sets (continued) [Figure: the BDD for S = {1, 5} over x1, x2, x3, drawn with only the 1 terminal]
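The integer-set encoding can be sketched with a tiny hash-consed BDD. This is an illustrative sketch only, not the BDD package used in the work: the class `BDD` and its methods `mk`, `from_set` and `contains` are hypothetical names, x1 is assumed to be the most significant bit, and the terminals are the integers 0 and 1.

```python
class BDD:
    """Reduced ordered BDD over a fixed number of bits, with a unique table."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.unique = {}            # (var, lo, hi) -> node id
        self.nodes = [None, None]   # ids 0 and 1 are the terminals

    def mk(self, var, lo, hi):
        if lo == hi:                # redundant test: skip the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:  # hash-consing: one node per distinct triple
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def from_set(self, values, var=0):
        """Build the characteristic function of a set of integers."""
        if not values:
            return 0
        if var == self.num_bits:
            return 1
        shift = self.num_bits - 1 - var
        lo = {v for v in values if not (v >> shift) & 1}
        hi = {v for v in values if (v >> shift) & 1}
        return self.mk(var, self.from_set(lo, var + 1), self.from_set(hi, var + 1))

    def contains(self, root, value):
        node = root
        while node > 1:
            var, lo, hi = self.nodes[node]
            bit = (value >> (self.num_bits - 1 - var)) & 1
            node = hi if bit else lo
        return node == 1


bdd = BDD(3)
s = bdd.from_set({1, 5})   # 001 and 101
t = bdd.from_set({1})      # reuses the nodes already built for s
```

Because 1 and 5 differ only in x1, the test on x1 is redundant and the reduced diagram for S needs two internal nodes. Building a second set in the same manager reuses those nodes, and equal sets yield the same node id, so equality is an integer comparison.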
BDD Representation Example [Figure: BDD rooted at S0] S0: u1 (y=1), u (sm=½), n=½
BDD Representation Example: x = y [Figure: BDDs rooted at S0 and S1] S0: u1 (y=1), u (sm=½), n=½; S1: u1 (x=1, y=1), u (sm=½), n=½
BDD Representation Example: x = y, then x = x->n [Figure: BDDs rooted at S0, S1 and S2.2] S0: u1 (y=1), u (sm=½), n=½; S1: u1 (x=1, y=1), u (sm=½), n=½; S2.2: u1 (y=1), u.1 (x=1), u.0 (sm=½), n=1, n=½
Improved BDD Representation: using this representation directly doesn't save space, because canonicity doesn't carry over from propositional to first-order logic. Observation: node names can be arbitrarily remapped without affecting the ADT semantics. Our heuristics: use canonic node names to encode nodes and obtain a canonic representation. This increases incidental sharing, reduces the isomorphism test to a pointer comparison, and gives a space reduction by a factor of 4 to 10.
Reducing Time Overhead: the current implementation is not optimized; formula evaluation is expensive. Hybrid representation: distinguish between phases (mutable phase, Join, immutable phase) and dynamically switch representations.
Functional Representation: an alternative representation for first-order structures. Structures are represented by maps from integers to Kleene values. Tailored for representing first-order structures; achieves better results than BDDs. Techniques are similar to the BDD representation. More details in the thesis.
Introduction to Functional Maps: a mapping $\mathbb{N} \to \{0, \tfrac{1}{2}, 1\}$. Nodes contain a fixed number of values. Hierarchical maps.
Introduction to Functional Maps: sparse maps. [Figure: hierarchical map with levels of size 9 and size 27]
Introduction to Functional Maps: share unique sub-maps. [Figure: hierarchical map with levels of size 9 and size 27, with identical sub-maps shared]
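The idea of hierarchical maps with shared unique sub-maps can be sketched as a persistent trie whose nodes are interned. This is a sketch under assumptions rather than the thesis implementation: the fan-out of 3 (giving sizes 3, 9, 27, ...), the global intern table `_table`, and the functions `intern`, `empty`, `get` and `update` are all hypothetical.

```python
HALF = 0.5   # the "unknown" value 1/2
FANOUT = 3   # every node holds a fixed number of entries
_table = {}  # intern table: one tuple object per distinct sub-map


def intern(children):
    """Return the unique stored tuple equal to `children` (hash-consing)."""
    return _table.setdefault(children, children)


def empty(depth):
    """A map of size FANOUT**depth in which every index maps to 0."""
    node = 0
    for _ in range(depth):
        node = intern((node,) * FANOUT)
    return node


def get(node, depth, index):
    for level in reversed(range(depth)):
        node = node[(index // FANOUT ** level) % FANOUT]
    return node


def update(node, depth, index, value):
    """Return a new map; untouched sub-maps are shared with the old one."""
    if depth == 0:
        return value
    digit = (index // FANOUT ** (depth - 1)) % FANOUT
    children = list(node)
    children[digit] = update(node[digit], depth - 1, index, value)
    return intern(tuple(children))


m0 = empty(3)                  # size 27, all zeros
m1 = update(m0, 3, 5, HALF)    # only the path to index 5 is rebuilt
m2 = update(m1, 3, 5, 0)       # writing 0 back restores the original map
```

Since every sub-map is interned, two maps with the same contents are the same object, so comparing maps is an identity test. An all-zero map of any size costs one node per level, which is where the sparseness comes from.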
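Functional Representation Example [Figure: functional maps for S0 with nullary, unary and binary levels; unary entries (y, x, sm) = (1, 0, 0) and (0, 0, ½); binary entry n = ½; sub-map sizes 9 and 27] S0: u1 (y=1), u (sm=½), n=½
Functional Representation Example [Figure: functional maps for S0 and S1 with nullary, unary and binary levels; unary entries (y, x, sm) = (1, 0, 0), (0, 0, ½) and (1, 1, 0); binary entry n = ½; sub-map sizes 9 and 27] S0: u1 (y=1), u (sm=½), n=½; S1: u1 (x=1, y=1), u (sm=½), n=½
Functional Representation Example [Figure: functional maps for S0, S1 and S2.2 with nullary, unary and binary levels; unary entries (y, x, sm) = (1, 0, 0), (0, 0, ½), (0, 1, 0) and (1, 1, 0); binary entries n = ½ and n = 1; sub-map sizes 9, 27 and 81] S0: u1 (y=1), u (sm=½), n=½; S1: u1 (x=1, y=1), u (sm=½), n=½; S2.2: u1 (y=1), u.1 (x=1), u.0 (sm=½), n=1, n=½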
Reducing Time Overhead: "lazy" normalization is used to balance time/space performance.
Empirical Evaluation. Benchmarks: Cleanness Analysis (SAS 2000); Garbage Collector; CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks; Mobile Ambients (ESOP 2000). Stress testing the representations: we use "relational analysis" and save structures in every CFG location.
Space Results
Abstract Counters: ignore language/implementation details. A more reliable measurement technique: count only crucial space information, independent of C/Java.
Abstract Counters Results
Trends in the Cleanness Analysis Benchmark
Conclusions: two novel representations of first-order structures (a new BDD representation and a new representation using functional maps). Implementation techniques: substantially better than inherited sharing; structure canonization is crucial; normalization via hash-consing is the key technique.
Conclusions: the use of BDDs for static analysis is not a panacea for space saving; domain-specific encoding is crucial for saving space. Failed attempts: original implementation of Veith's encoding; PAG.
Tuning Abstraction for Improved Performance Analysis can be very costly Explores many structures GC example explores >180,000 structures
Existing Analysis Modes. Relational analysis: doubly-exponential in the worst case; our most precise method. Single-structure analysis (Tal Lev-Ami, SAS 2000): singly-exponential in the worst case; can be very efficient; can be very imprecise; sometimes very inefficient.
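Single-Structure Analysis [Figure: S0 (u1 pointed to by x) and S1 (u1 pointed to by x, with an n edge to u) are joined into a single structure S0 ⊔ S1 in which u may exist]
Single-Structure Analysis: Active property. ac=0: exists in no concrete structure; ac=1: exists in every concrete structure; ac=1/2: may exist in some concrete structure. [Figure: S0 (u1 with ac=1, pointed to by x) and S1 (u1 with ac=1, pointed to by x, with an n edge to u with ac=1) are joined into S0 ⊔ S1 (u1 with ac=1, u with ac=1/2)]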
Single-Structure Analysis Sometimes overly imprecise Refine analysis by using nullary predicates to distinguish between different structures
Is there a "sweet spot"? [Figure: Relational Analysis placed on axes of efficiency and precision]
Chapter Outline Removing embedded structures Merging structures with same set of canonical names Staged analysis to localize abstraction Merging pseudo-embedded structures
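Order Relations on Structures and Sets of Structures. For $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every predicate p: 1. $p^{S}(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$; 2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$. For $X, X' \in 2^{\text{3-STRUCT}}$: $X \sqsubseteq X'$ if every $S \in X$ has an $S' \in X'$ with $S \sqsubseteq S'$.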
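The two conditions of the order on structures can be checked directly once a candidate mapping ƒ is given. The sketch below assumes a hypothetical encoding that is not from the source: a structure is a dictionary with a node list and a table per predicate (arity plus sparse values), `sm` is stored as an ordinary unary predicate, and `embeds` only verifies a supplied ƒ rather than searching for one.

```python
from itertools import product

HALF = 0.5  # the "unknown" value 1/2


def leq(a, b):
    """Information order on Kleene values: a is at least as precise as b."""
    return a == b or b == HALF


def embeds(s, s2, f):
    """Check conditions 1 and 2 for the node mapping f (a dict from s to s2)."""
    # Condition 1: every predicate value in s is approximated by its image in s2.
    for name, (arity, table) in s["preds"].items():
        table2 = s2["preds"].get(name, (arity, {}))[1]
        for us in product(s["nodes"], repeat=arity):
            image = tuple(f[u] for u in us)
            if not leq(table.get(us, 0), table2.get(image, 0)):
                return False
    # Condition 2: a node of s2 with several pre-images must be a summary node.
    sm2 = s2["preds"].get("sm", (1, {}))[1]
    for u2 in s2["nodes"]:
        merged = 1 if sum(1 for u in s["nodes"] if f[u] == u2) > 1 else 0
        if not leq(merged, sm2.get((u2,), 0)):
            return False
    return True


# A concrete three-cell list and an abstract list with one summary node.
concrete = {"nodes": ["a", "b", "c"],
            "preds": {"x": (1, {("a",): 1}),
                      "sm": (1, {}),
                      "n": (2, {("a", "b"): 1, ("b", "c"): 1})}}
abstract = {"nodes": ["A", "B"],
            "preds": {"x": (1, {("A",): 1}),
                      "sm": (1, {("B",): HALF}),
                      "n": (2, {("A", "B"): HALF, ("B", "B"): HALF})}}
good = {"a": "A", "b": "B", "c": "B"}
bad = {"a": "B", "b": "A", "c": "B"}
```

Here `good` satisfies both conditions, while `bad` fails condition 1 because it sends the node pointed to by x to a node where x is 0.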
Compacting Transformations: we look for a transformation $T: 2^{\text{3-STRUCT}} \to 2^{\text{3-STRUCT}}$ with the following properties: 1. Compacting: $|T(X)| \le |X|$; 2. Conservative: $T(X) \sqsupseteq X$. Without sacrificing precision.
Removing Embedded Structures [Figure: structures S0 and S1, each with nodes u0 (x, r[n,x]), u1 and u2 (r[n,t], r[n,y]), n edges, and variables y and t; ƒ maps the nodes of one structure to the other. Captions: reversing a list with exactly 3 cells; reversing a list with at least 3 cells]
Detecting Embedding is hard: in general, as hard as GRAPH ISOMORPHISM. Conditions for a unique mapping: canonical abstraction; definite values. Polynomial time check.
Results (#structures explored)
Canonical Names Method: canonical abstraction merges individuals with the same canonical names (unary abstraction predicate values). Merge structures with the same set of canonical names. Both transformations preserve "definity" of abstraction predicates, but the merge ignores the precision of non-abstraction predicates.
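Canonical Abstraction Example [Figure: a list of four nodes u0, u1, u2, u3 linked by n, with x pointing to u0 and r[n,x] holding on every node, is abstracted to node u0 (pointed to by x) and node u, with n edges from u0 to u and from u to itself]
Merging Structures with Same Canonical Names Example [Figure: structures S0 and S1, each with nodes u0 and u (both r[n,x]) and x pointing to u0, are merged into S0 ⊔ S1]
Merging Structures with Same Canonical Names Example [Figure: structures S0 and S1 over nodes u0 and u, with x and n, are merged into S0 ⊔ S1]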
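The merge step can be sketched as grouping structures by their set of canonical names and joining each group pointwise. The encoding is a simplifying assumption, not the TVLA one: each already-abstracted structure is a pair of a frozenset of canonical names and a sparse fact table keyed by `(predicate, name, ...)`, and the node names `"u0"` and `"u"` stand in for canonical names.

```python
HALF = 0.5  # the "unknown" value 1/2


def join(a, b):
    """Information-order join of two Kleene values."""
    return a if a == b else HALF


def merge_same_names(structures):
    """Join all structures that have the same set of canonical names."""
    merged = {}
    for names, facts in structures:
        if names not in merged:
            merged[names] = dict(facts)
            continue
        acc = merged[names]
        # Absent facts count as 0, so a fact present on one side only becomes 1/2.
        for key in set(acc) | set(facts):
            acc[key] = join(acc.get(key, 0), facts.get(key, 0))
    return list(merged.items())


both = frozenset({"u0", "u"})
s0 = (both, {("n", "u0", "u"): 1})
s1 = (both, {("n", "u0", "u"): HALF, ("n", "u", "u"): HALF})
s2 = (frozenset({"u0"}), {})
result = dict(merge_same_names([s0, s1, s2]))
```

Three input structures become two. The values of abstraction predicates are unaffected because they are fixed by the canonical names, whereas n(u0, u) drops from 1 to 1/2, which illustrates the loss of precision on non-abstraction predicates noted on the Canonical Names Method slide.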
Results (#structures explored)
Localizing Abstraction: find an appropriate subset of abstraction predicates for every CFG node. Observation: programs contain dead variables; exploit this to make the corresponding predicates "dead". Compute "predicate liveness" to determine the subset of abstraction predicates.
reverse Example
    List reverse(List x) {
    L0:   List y, t;
    L1:   y = NULL;
    L2:   while (x != NULL) {
    L3:     t = y;
    L4:     y = x;
    L5:     x = x->n;
    L6:     y->n = t;
          }
    L7:   return y;
    }
    Liveness annotations on the slide: y dead, t dead, all dead
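One way to obtain such annotations is a classical backward live-variable analysis over the control-flow graph of reverse. The sketch below uses that textbook analysis as a stand-in for the "predicate liveness" computation, which the slides do not detail; the CFG encoding, the use/def sets and the function `live_variables` are assumptions made for illustration.

```python
def live_variables(succ, use, define):
    """Backward dataflow: live_in[n] = use[n] | (live_out[n] - define[n])."""
    live_in = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            live_out = set()
            for s in succ[n]:
                live_out |= live_in[s]
            new_in = use[n] | (live_out - define[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in


# CFG of reverse, using the labels from the slide (L0 only declares variables).
succ = {"L1": ["L2"], "L2": ["L3", "L7"], "L3": ["L4"], "L4": ["L5"],
        "L5": ["L6"], "L6": ["L2"], "L7": []}
use = {"L1": set(), "L2": {"x"}, "L3": {"y"}, "L4": {"x"},
       "L5": {"x"}, "L6": {"y", "t"}, "L7": {"y"}}
define = {"L1": {"y"}, "L2": set(), "L3": {"t"}, "L4": {"y"},
          "L5": {"x"}, "L6": set(), "L7": set()}

live_in = live_variables(succ, use, define)
# Candidate abstraction predicates at a label: those of the live pointer variables.
dead_at = {n: {"x", "y", "t"} - live for n, live in live_in.items()}
```

Under these assumptions y is dead on entry to L4 (it is about to be overwritten), t is dead at the loop head L2, and only y is live at L7, so the predicates for the dead variables could be dropped from the abstraction at those points.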
Results (#structures explored)
Compaction via Pseudo-Embedding. Pseudo-embedding: similar to embedding, but with respect to abstraction predicates only. For $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every abstraction predicate p: 1. $p^{S}(u) \sqsubseteq p^{S'}(f(u))$; 2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$.
Modified blur. Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every abstraction predicate p, $p^{S}(u_1) \sqsubseteq p^{S'}(u_2)$. blur' merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$.
blur' Example [Figure: blur' applied to a structure with nodes u0 and u (both r[n,x]), with x and n edges]
Merging Pseudo-Embedded Structures Example: abstraction predicates = {x, y}; non-abstraction predicates = {r[n,x], r[n,y], n}. [Figure: structures S0 and S1 over nodes u0 (r[n,x]) and u (r[n,y], r[n,x]) are merged into S0 ⊔ S1, in which r[n,y] = 1/2 on u]
Results (#structures explored)
Empirical Evaluation. Benchmarks: Garbage Collector; Mobile Ambients (ESOP 2000); sorting procedures (ISSTA 2000). MA + J2: completed without instrumentation predicates and without messages.
Results (#structures explored). Legend: false alarms; out of memory; out of time.
Conclusion: the new method is usually much more efficient (by orders of magnitude), doesn't lose precision on the benchmarks, and its performance is more stable than that of the other methods.
Future and Ongoing Work. Time optimizations: symbolic (BDD) execution of TVLA operations; compactly represent sets of structures. Improving abstraction locality: truly live predicates; analyzing liveness for core predicates and deriving it for instrumentation predicates. Experiment with other compacting transformations. Achieve polynomial complexity.
The End