Data Structures and Algorithms for Efficient Shape Analysis by Roman Manevich Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv.

Motivation
- TVLA is a powerful and general abstract interpretation system
- Abstract interpretation in TVLA
  - Operational semantics is expressed with first-order logic + TC (transitive closure) formulae
  - Program states are represented as sets of Evolving First-Order Structures
- Efficiency is an issue

Outline
- Shape Analysis quick intro
- Compactly representing structures
- Tuning abstraction to improve performance

What is Shape Analysis
- Determines shape invariants for imperative programs
- Can be used to verify a wide range of properties over different programming languages

reverse Example

    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *List;

    /* print.c */
    #include <stddef.h>   /* NULL */
    #include "list.h"

    /* Reverses the list in place and returns the new head. */
    List reverse(List x) {
        List y, t;
        y = NULL;
        while (x != NULL) {
            t = y;
            y = x;
            x = x->n;
            y->n = t;
        }
        return y;
    }

reverse Example
[Figure: shape before, a list pointed to by x and linked through n; shape after, a list pointed to by y and linked through n]

Definition of a First-Order Logical Structure
$S = \langle U, \iota \rangle$
- $U$: a set of individuals ("node set")
- $\iota$: a mapping $p^{(r)} \mapsto (U^r \to \{0,1\})$, the "interpretation" of $p$

Three-Valued Logic
- 1: True
- 0: False
- 1/2: Unknown
- A join semi-lattice: $0 \sqcup 1 = 1/2$
- Information order: $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$
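The lattice above can be written down in a few lines. This is a minimal sketch: the function names are made up for illustration, and the connectives use the standard Kleene definitions (minimum for conjunction, maximum for disjunction), which the slide does not spell out.

```python
from fractions import Fraction

# Kleene values: 0 (false), 1 (true), 1/2 (unknown).
ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)


def join(a, b):
    """Information-order join: equal values stay, conflicting values become 1/2."""
    return a if a == b else HALF


def leq_info(a, b):
    """a ⊑ b in the information order: b is a itself or the unknown value."""
    return a == b or b == HALF


def k_and(a, b):
    """Kleene conjunction."""
    return min(a, b)


def k_or(a, b):
    """Kleene disjunction."""
    return max(a, b)


def k_not(a):
    """Kleene negation; unknown stays unknown."""
    return ONE - a
```

Using exact fractions keeps 1/2 free of floating-point concerns; plain floats would work as well.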

Canonical Abstraction
- Partition the individuals into equivalence classes based on the values of their unary predicates
- Collapse other predicates via $\sqcup$:
  $p^{S}(u'_1, \ldots, u'_k) = \bigsqcup \{\, p^{B}(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \,\}$
- At most $3^n$ abstract individuals, for $n$ unary abstraction predicates

Canonical Abstraction Example
[Figure: a concrete list of four nodes u0, u1, u2, u3, each with r[n,x], linked by n and pointed to by x, and its canonical abstraction with nodes u0 and u]
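A minimal sketch of canonical abstraction on the four-node list from the example. The dictionary layout and all function names are assumptions made for illustration, not TVLA's API; Kleene values are written as 0, 0.5 and 1.

```python
from itertools import product

HALF = 0.5


def join(values):
    """Join a non-empty collection of Kleene values (0, 0.5, 1)."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF


def canonical_abstraction(nodes, unary, binary):
    """
    nodes  : list of concrete node names
    unary  : {pred: {node: value}}, the abstraction predicates
    binary : {pred: {(n1, n2): value}}, missing pairs mean 0
    Returns (abstract nodes, f, abstract binary predicates, sm).
    """
    preds = sorted(unary)
    # The canonical name of a node is its vector of unary predicate values.
    f = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    abs_nodes = sorted(set(f.values()))
    members = {a: [u for u in nodes if f[u] == a] for a in abs_nodes}
    abs_binary = {
        p: {
            (a, b): join(table.get((u, v), 0)
                         for u in members[a] for v in members[b])
            for a, b in product(abs_nodes, repeat=2)
        }
        for p, table in binary.items()
    }
    # A node that stands for several concrete nodes is a summary node.
    sm = {a: HALF if len(members[a]) > 1 else 0 for a in abs_nodes}
    return abs_nodes, f, abs_binary, sm


# Hypothetical encoding of the example: u0 -> u1 -> u2 -> u3, x points to u0.
nodes = ["u0", "u1", "u2", "u3"]
unary = {"x": {"u0": 1}, "r[n,x]": {u: 1 for u in nodes}}
binary = {"n": {("u0", "u1"): 1, ("u1", "u2"): 1, ("u2", "u3"): 1}}
abs_nodes, f, abs_binary, sm = canonical_abstraction(nodes, unary, binary)
HEAD, REST = (1, 1), (1, 0)   # canonical names ordered as (r[n,x], x)
```

In this run the three tail nodes share a canonical name and collapse into one summary node, and every n edge that touches it becomes 1/2.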

Compactly Representing First-Order Logical Structures
- Space is a major bottleneck
- Analysis explores many logical structures
- Reduce space by sharing information across structures

Desired Properties
- Sparse data structures
- Share common sub-structures
  - Inherited sharing
  - Incidental sharing due to program invariants
- But feasible time performance
- Phase sensitive data structures

Chapter Outline
- Background
- First-order structure representations
  - Base representation (TVLA 0.91)
  - BDD representation
- Empirical evaluation
- Conclusion

First-Order Logical Structures
- Generalize shape graphs
  - Arbitrary set of individuals
  - Arbitrary set of predicates on individuals
- Dynamically evolving
  - Usually small changes
- Properties are extracted by evaluating first-order formulae, e.g. $\exists v_1 : x(v_1) \land n(v_1, v)$
- Join operator requires isomorphism testing

First-Order Structure ADT

    Structure  : new()                                  /* empty structure */
    SetOfNodes : nodeSet(Structure)
    Node       : newNode(Structure)
                 removeNode(Structure, node)
    Kleene     : eval(Structure, p(r), <u1, ..., ur>)
                 update(Structure, p(r), <u1, ..., ur>, Kleene)
    Structure  : copy(Structure)

print_all Example

    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *L;

    /* print.c */
    #include <stdio.h>
    #include "list.h"

    /* Prints the data field of every cell reachable from y. */
    void print_all(L y) {
        L x;
        x = y;
        while (x != NULL) {
            /* assert(x != NULL) */
            printf("elem=%d", x->data);
            x = x->n;
        }
    }

print_all Example
Statement: x = y, with update formula x'(v) := y(v)

    copy(S0) : S1
    nodeSet(S0) : {u1, u}
    eval(S0, y, u1) : 1     update(S1, x, u1, 1)
    eval(S0, y, u) : 0      update(S1, x, u, 0)

[Figure: S0 has node u1 with y=1 and summary node u with sm=½, n=½; S1 is the same structure with x=1 added on u1]
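The ADT calls in this step can be replayed with a small sketch. The class below is a hypothetical, unoptimized reading of the ADT (a sparse predicate-to-table map with a deep copy), not TVLA's implementation; the placement of the n=½ edge on the summary node is an assumption, since the figure itself is not recoverable.

```python
import copy


class Structure:
    """Sparse three-valued structure: predicate -> {node tuple -> Kleene value}."""

    def __init__(self):
        self.nodes = set()
        self.preds = {}
        self._next = 0

    def node_set(self):
        return set(self.nodes)

    def new_node(self):
        node = self._next
        self._next += 1
        self.nodes.add(node)
        return node

    def remove_node(self, node):
        self.nodes.discard(node)
        for table in self.preds.values():
            for tup in [t for t in table if node in t]:
                del table[tup]

    def eval(self, pred, tup):
        # Entries that are not stored default to 0 (this is the sparse part).
        return self.preds.get(pred, {}).get(tup, 0)

    def update(self, pred, tup, value):
        table = self.preds.setdefault(pred, {})
        if value == 0:
            table.pop(tup, None)
        else:
            table[tup] = value

    def copy(self):
        return copy.deepcopy(self)


def assign_var(s, lhs, rhs):
    """Apply `lhs = rhs` via the update formula lhs'(v) := rhs(v) on a copy of s."""
    out = s.copy()
    for u in s.node_set():
        out.update(lhs, (u,), s.eval(rhs, (u,)))
    return out


# S0: u1 with y=1; summary node u with sm=1/2 and an n=1/2 self edge (assumed).
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), 0.5)
s0.update("n", (u, u), 0.5)

s1 = assign_var(s0, "x", "y")   # the statement x = y
```

The deep copy is the simplest correct choice here; it is exactly the cost that the sharing schemes discussed later try to avoid.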

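print_all Example
Statement: x = x->n
- focus: $\exists v_1 : x(v_1) \land n(v_1, v)$
- update: $x'(v) := \exists v_1 : x(v_1) \land n(v_1, v)$
Loop condition: while (x != NULL), with precondition $\exists v : x(v)$
[Figure: S1 (u1 with x=1, y=1; summary node u with sm=½, n=½) yields S2.0, S2.1 and S2.2. In S2.1, x=1 holds on u, reached from u1 by an n=1 edge. In S2.2 the summary node is split into u.1 (x=1, reached by an n=1 edge) and u.0 (sm=½)]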

Overview and Main Results
1. Two novel representations of first-order structures
   - New BDD representation
   - New representation using functional maps
2. Implementation techniques
3. Empirical evaluation
   - Comparison of different representations
   - Space is reduced by a factor of 4 to 10
   - New representations scale better

Base Representation (Tal Lev-Ami SAS 2000)
- Two-Level Map: Predicate → (Node Tuple → Kleene)
- Sparse Representation
- Limited inherited sharing by "Copy-On-Write"
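One way to read "limited inherited sharing by copy-on-write" is sketched below: a copy shares every per-predicate table with its parent, and a table is cloned only on the first write. The class and method names are hypothetical and the granularity (one table per predicate) is an assumption, not a description of the TVLA 0.91 code.

```python
class CowStructure:
    """Two-level map with per-predicate copy-on-write."""

    def __init__(self, tables=None):
        # predicate -> {node tuple -> Kleene value}; inner dicts may be shared.
        self._tables = dict(tables or {})
        self._owned = set()   # predicates whose inner dict is private to us

    def eval(self, pred, tup):
        return self._tables.get(pred, {}).get(tup, 0)

    def update(self, pred, tup, value):
        if pred not in self._owned:
            # First write since the last copy: clone only this predicate's table.
            self._tables[pred] = dict(self._tables.get(pred, {}))
            self._owned.add(pred)
        if value == 0:
            self._tables[pred].pop(tup, None)
        else:
            self._tables[pred][tup] = value

    def copy(self):
        # After copying, both structures share all inner tables,
        # so neither side may keep writing in place.
        self._owned.clear()
        return CowStructure(self._tables)

    def table(self, pred):
        """Expose the inner table so that sharing can be observed."""
        return self._tables.get(pred)


s0 = CowStructure()
s0.update("y", (1,), 1)
s0.update("n", (2, 2), 0.5)
s1 = s0.copy()
s1.update("x", (1,), 1)   # only the table of x is created for s1
```

The sharing is "inherited" because it only exists between a structure and the copies derived from it; two structures that happen to be equal but were built independently share nothing.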

BDDs in a Nutshell (Bryant 86)
- Ordered Binary Decision Diagrams
- Data structure for Boolean functions
- Functions are represented as (unique) DAGs

    x1 x2 x3 | f
     0  0  0 | 0
     0  0  1 | 0
     0  1  0 | 0
     0  1  1 | 1
     1  0  0 | 0
     1  0  1 | 1
     1  1  0 | 0
     1  1  1 | 1

[Figure: the decision tree of f, testing x1, then x2, then x3]

BDDs in a Nutshell (Bryant 86)
- Ordered Binary Decision Diagrams
- Data structure for Boolean functions
- Functions are represented as (unique) DAGs
- Also achieve sharing across functions
[Figure: the decision tree is reduced in three steps: merging duplicate terminals, merging duplicate nonterminals, and removing redundant tests]
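The three reduction rules can be folded into a single node constructor backed by a unique table. The sketch below is a toy builder that enumerates all assignments (exponential, for illustration only); the class name, the integer node ids and the sample function f = (x1 ∨ x2) ∧ x3 are assumptions.

```python
class BDD:
    """Reduced ordered BDD nodes, kept unique through a hash-consing table."""

    def __init__(self):
        self._unique = {}   # (var, low, high) -> node id; ids 0 and 1 are terminals

    def mk(self, var, low, high):
        if low == high:
            # Redundant test: both branches agree, so skip the node.
            return low
        key = (var, low, high)
        if key not in self._unique:
            # Duplicate nonterminals are never created: equal keys reuse one id.
            self._unique[key] = len(self._unique) + 2
        return self._unique[key]

    def build(self, fn, nvars, var=1, env=()):
        """Shannon-expand fn over variables var..nvars, tested in that order."""
        if var > nvars:
            # Terminals are the shared ids 0 and 1 (no duplicate terminals).
            return int(bool(fn(*env)))
        low = self.build(fn, nvars, var + 1, env + (0,))
        high = self.build(fn, nvars, var + 1, env + (1,))
        return self.mk(var, low, high)

    def size(self):
        """Number of nonterminal nodes stored so far."""
        return len(self._unique)


bdd = BDD()
f_root = bdd.build(lambda x1, x2, x3: (x1 or x2) and x3, 3)
f_size = bdd.size()
# A second function built in the same table reuses the existing x3 node.
g_root = bdd.build(lambda x1, x2, x3: x3, 3)
```

Because equal sub-functions always get the same id, testing two functions for equality is a comparison of root ids, and functions built in the same table share nodes for free.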

Encoding Structures Using Integers
- Static encoding of
  - Predicates
  - Kleene values
- Dynamic encoding of nodes: 0, 1, …, n-1
- Encode predicate p's values as $e_p(p).e_n(u_1).e_n(u_2).\,\cdots\,.e_n(u_n).e_k(\mathit{Kleene})$

BDD Representation of Integer Sets
- Characteristic function
- $S = \{1, 5\}$, with $1 = 001_2$ and $5 = 101_2$
- $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$
[Figure: the BDD of $\chi_S$ over x1, x2, x3]
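The characteristic function for S = {1, 5} can be checked mechanically. In this sketch the helper names are made up, and x1 is assumed to be the most significant bit, which is the reading under which the slide's formula describes 1 and 5.

```python
def bits(value, width):
    """Most-significant-bit-first binary encoding of value."""
    return tuple((value >> (width - 1 - i)) & 1 for i in range(width))


def characteristic(members, width):
    """Return chi_S as a Python predicate over `width` Boolean variables."""
    codes = {bits(m, width) for m in members}
    return lambda *xs: int(tuple(xs) in codes)


def chi_slide(x1, x2, x3):
    """The formula given on the slide for S = {1, 5}."""
    return int((not x1 and not x2 and x3) or (x1 and not x2 and x3))


chi = characteristic({1, 5}, 3)
members = [v for v in range(8) if chi(*bits(v, 3))]
```

Both disjuncts agree on x2 and x3 and differ only in x1, so the formula simplifies to ¬x2 ∧ x3, which is why the corresponding BDD does not need to test x1.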

BDD Representation Example
[Figure: BDD representations of S0, S1 (after x = y) and S2.2 (after x = x->n), drawn over a common terminal 1. S0: u1 with y=1; summary node u with sm=½, n=½. S1: as S0, with x=1 on u1. S2.2: u1 with y=1; u.1 with x=1; u.0 with sm=½; n=1 and n=½ edges]

Improved BDD Representation
- Using this representation directly doesn't save space: canonicity doesn't carry over from propositional to first-order logic
- Observation
  - Node names can be arbitrarily remapped without affecting the ADT semantics
- Our heuristics
  - Use canonic node names to encode nodes and obtain a canonic representation
  - Increases incidental sharing
  - Reduces isomorphism test to pointer comparison
  - Space is reduced by a factor of 4 to 10

Reducing Time Overhead
- Current implementation not optimized
- Expensive formula evaluation
- Hybrid representation
  - Distinguish between phases: mutable phase → Join → immutable phase
  - Dynamically switch representations

Functional Representation
- Alternative representation for first-order structures
- Structures represented by maps from integers to Kleene values
- Tailored for representing first-order structures
- Achieves better results than BDDs
- Techniques similar to the BDD representation
- More details in the thesis

Introduction to Functional Maps
- A mapping $\mathbb{N} \to \{0, ½, 1\}$
- Nodes contain a fixed number of values
- Hierarchical maps
[Figure: a node with 3 entries, holding ½, 0, 1 for keys 0, 1, 2]

Introduction to Functional Maps
- Sparse maps
- Share unique sub-maps
[Figure: a hierarchical map with levels of size 9 and size 27. Keys 0 to 2 hold ½, 0, 1; keys 3 to 5 hold 0, 0, 0; keys 6 to 8 hold ½, 0, 1. The all-zero sub-map is dropped (sparse map) and the two identical sub-maps are stored once (shared sub-map)]
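A minimal sketch of a sparse, hash-consed hierarchical map in the spirit of these slides. It assumes nodes of 3 entries, map sizes that are powers of 3, and a default value of 0; the function names and the tuple-based node layout are illustrative, not the thesis implementation.

```python
ARITY = 3
_nodes = {}   # hash-consing table: node contents -> the unique node object


def _mk(items):
    """Return the unique node for `items`, or None if it holds only defaults."""
    if all(item is None or item == 0 for item in items):
        return None   # sparse: all-zero sub-maps are not stored
    return _nodes.setdefault(items, items)


def get(node, size, key):
    """Look up key in a map covering keys 0..size-1; missing entries are 0."""
    while node is not None and size > ARITY:
        size //= ARITY
        node, key = node[key // size], key % size
    return 0 if node is None else node[key]


def put(node, size, key, value):
    """Return a new map with key set to value; the old map is left untouched."""
    if size == ARITY:
        items = list(node or (0,) * ARITY)
        items[key] = value
        return _mk(tuple(items))
    sub = size // ARITY
    items = list(node or (None,) * ARITY)
    items[key // sub] = put(items[key // sub], sub, key % sub, value)
    return _mk(tuple(items))


# Keys 0..8: 0 -> 1/2, 2 -> 1, 6 -> 1/2, 8 -> 1, everything else 0.
m0 = None
m1 = put(m0, 9, 0, 0.5)
m2 = put(m1, 9, 2, 1)
m3 = put(m2, 9, 6, 0.5)
m4 = put(m3, 9, 8, 1)
```

After these updates the root of `m4` has three slots: the first and third point to the same leaf object and the middle one is empty. Older versions such as `m2` stay valid, which is what makes the map functional.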

Functional Representation Example
[Figure: S0, S1 and S2.2 represented as functional maps, each with nullary, unary and binary parts (map sizes 9, 27 and 81). S0: u1 with y=1; summary node u with sm=½, n=½. S1: as S0, with x=1 on u1. S2.2: u1 with y=1; u.1 with x=1; u.0 with sm=½; n=1 and n=½ edges]

Reducing Time Overhead
- "Lazy" normalization is used to balance time/space performance

Empirical Evaluation
- Benchmarks:
  - Cleanness Analysis (SAS 2000)
  - Garbage Collector
  - CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
  - Mobile Ambients (ESOP 2000)
- Stress testing the representations
  - We use "relational analysis"
  - Save structures in every CFG location

Space Results

Abstract Counters
- Ignore language/implementation details
- A more reliable measurement technique
- Count only crucial space information
- Independent of C/Java

Abstract Counters Results

Trends in the Cleanness Analysis Benchmark

Conclusions
- Two novel representations of first-order structures
  - New BDD representation
  - New representation using functional maps
- Implementation techniques
- Substantially better than inherited sharing
- Structure canonization is crucial
- Normalization via hash-consing is the key technique

Conclusions
- The use of BDDs for static analysis is not a panacea for space saving
- Domain-specific encoding crucial for saving space
- Failed attempts
  - Original implementation of Veith's encoding
  - PAG

Tuning Abstraction for Improved Performance
- Analysis can be very costly
- Explores many structures
- GC example explores >180,000 structures

Existing Analysis Modes
- Relational analysis
  - Doubly-exponential in worst case
  - Our most precise method
- Single-structure analysis (Tal Lev-Ami SAS 2000)
  - Singly-exponential in worst case
  - Can be very efficient
  - Can be very imprecise
  - Sometimes very inefficient

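Single-Structure Analysis
[Figure: two structures S0 and S1, one with u1 (pointed to by x) and a node u linked by n, the other with u1 only, and their join $S_0 \sqcup S_1$, in which u "may exist"]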

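Single-Structure Analysis
- Active property
  - ac=0: the node does not exist in any concrete structure
  - ac=1: the node exists in every concrete structure
  - ac=1/2: the node may exist in some concrete structure
[Figure: S0 with u1 (ac=1, pointed to by x) and u (ac=1) linked by n; S1 with u1 (ac=1, pointed to by x); in $S_0 \sqcup S_1$ the node u has ac=1/2]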

Single-Structure Analysis
- Sometimes overly imprecise
- Refine analysis by using nullary predicates to distinguish between different structures

Is there a "sweet spot"?
[Figure: axes Efficiency and Precision, with Relational Analysis marked]

Chapter Outline
- Removing embedded structures
- Merging structures with same set of canonical names
- Staged analysis to localize abstraction
- Merging pseudo-embedded structures

Order Relations on Structures and Sets of Structures
- $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every predicate $p$
  1. $p^{S}(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$
  2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$
- $X, X' \in 2^{\text{3-STRUCT}}$: $X \sqsubseteq X'$ if every $S \in X$ has an $S' \in X'$ such that $S \sqsubseteq S'$
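For a given mapping f, the two embedding conditions translate directly into a check. This is a sketch under stated assumptions: the dictionary layout is made up, both structures are assumed to list the same predicates (including sm), and absent table entries are read as 0.

```python
from itertools import product

HALF = 0.5


def leq(a, b):
    """Information order on Kleene values: a ⊑ b iff a == b or b == 1/2."""
    return a == b or b == HALF


def embeds(s, t, f):
    """
    Check S ⊑^f S' for structures given as
    {"nodes": [...], "preds": {name: (arity, {tuple: value})}}.
    """
    # Condition 1: every predicate value in S is approximated by its image in S'.
    for name, (arity, table) in s["preds"].items():
        _, t_table = t["preds"][name]
        for tup in product(s["nodes"], repeat=arity):
            image = tuple(f[u] for u in tup)
            if not leq(table.get(tup, 0), t_table.get(image, 0)):
                return False
    # Condition 2: a node of S' that is the image of several nodes must be a summary node.
    _, sm = t["preds"]["sm"]
    for target in t["nodes"]:
        merged = sum(1 for u in s["nodes"] if f[u] == target) > 1
        if not leq(1 if merged else 0, sm.get((target,), 0)):
            return False
    return True


# A concrete four-cell list and an abstract structure with one summary node.
concrete = {
    "nodes": [0, 1, 2, 3],
    "preds": {
        "x": (1, {(0,): 1}),
        "n": (2, {(0, 1): 1, (1, 2): 1, (2, 3): 1}),
        "sm": (1, {}),
    },
}
abstract = {
    "nodes": ["a", "s"],
    "preds": {
        "x": (1, {("a",): 1}),
        "n": (2, {("a", "s"): 0.5, ("s", "s"): 0.5}),
        "sm": (1, {("s",): 0.5}),
    },
}
good_f = {0: "a", 1: "s", 2: "s", 3: "s"}
bad_f = {0: "a", 1: "a", 2: "s", 3: "s"}
```

`good_f` collapses the tail of the list onto the summary node and passes both conditions; `bad_f` maps node 1 onto the node pointed to by x, which violates the first condition for x.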

Compacting Transformations
We look for a transformation $T : 2^{\text{3-STRUCT}} \to 2^{\text{3-STRUCT}}$ with the following properties:
1. Compacting: $|T(x)| \le |x|$
2. Conservative: $T(x) \sqsupseteq x$
Without sacrificing precision

Removing Embedded Structures
[Figure: two structures S0 and S1 over nodes u0 (pointed to by x, with r[n,x]), u1 and u2 (with r[n,t] and r[n,y]), with y, t and n edges, related by an embedding f. One arises when reversing a list with exactly 3 cells, the other when reversing a list with at least 3 cells; the former is embedded in the latter]

Detecting Embedding is hard
- In general, as hard as GRAPH ISOMORPHISM
- Conditions for a unique mapping:
  - Canonical abstraction
  - Definite values
- Polynomial time check
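When nodes are identified by canonical names with definite values, the only candidate mapping sends each node to the node with the same name, so the check becomes a pointwise comparison. The sketch below assumes that setting and, as a simplification, requires both structures to have the same set of canonical names; the representation and function names are hypothetical.

```python
HALF = 0.5


def leq(a, b):
    """Information order on Kleene values."""
    return a == b or b == HALF


def embedded_in(s, t):
    """
    s, t: {"nodes": frozenset of canonical names,
           "vals": {(pred, name, ...): value}}, absent entries read as 0.
    The mapping is the identity on canonical names, so no search is needed.
    """
    if s["nodes"] != t["nodes"]:
        return False
    keys = set(s["vals"]) | set(t["vals"])
    return all(leq(s["vals"].get(k, 0), t["vals"].get(k, 0)) for k in keys)


def remove_embedded(structs):
    """Drop every structure that is embedded in another one of the set."""
    kept = []
    for i, s in enumerate(structs):
        dominated = any(
            embedded_in(s, t) and (not embedded_in(t, s) or j < i)
            for j, t in enumerate(structs)
            if j != i
        )
        if not dominated:
            kept.append(s)
    return kept


ab = frozenset({"a", "b"})
s0 = {"nodes": ab, "vals": {("n", "a", "b"): 1}}
s1 = {"nodes": ab, "vals": {("n", "a", "b"): 0.5}}
s2 = {"nodes": frozenset({"a"}), "vals": {}}
result = remove_embedded([s0, s1, s2])
```

The check is quadratic in the number of structures and linear in their size; of two mutually embedded (equal) structures the earlier one is kept.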

Results (#structures explored)

Canonical Names Method
- Canonical abstraction merges individuals with same canonical names (unary abstraction predicate values)
- Merge structures with same set of canonical names
- Both transformations preserve "definity" of abstraction predicates
- But ignores precision of non-abstraction predicates
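Merging structures with the same set of canonical names amounts to grouping them by node set and joining predicate values pointwise. The sketch below assumes structures are already keyed by canonical names; the layout and function names are illustrative.

```python
from collections import defaultdict

HALF = 0.5


def join(a, b):
    """Kleene join: equal values stay, different values become 1/2."""
    return a if a == b else HALF


def merge_same_names(structs):
    """
    structs: list of {"nodes": frozenset of canonical names,
                      "vals": {(pred, name, ...): value}}, absent entries are 0.
    Returns one structure per distinct set of canonical names.
    """
    groups = defaultdict(list)
    for s in structs:
        groups[s["nodes"]].append(s["vals"])
    merged = []
    for nodes, tables in groups.items():
        vals = {}
        for key in set().union(*tables):
            value = tables[0].get(key, 0)
            for table in tables[1:]:
                value = join(value, table.get(key, 0))
            if value != 0:
                vals[key] = value
        merged.append({"nodes": nodes, "vals": vals})
    return merged


names = frozenset({"u0", "u"})
s0 = {"nodes": names, "vals": {("n", "u0", "u"): 0.5, ("n", "u", "u"): 0.5}}
s1 = {"nodes": names, "vals": {("n", "u0", "u"): 1}}
s2 = {"nodes": frozenset({"u0"}), "vals": {}}
merged = merge_same_names([s0, s1, s2])
```

The abstraction predicates keep their definite values because they are part of the names themselves; only the other predicates, such as n here, can degrade to 1/2.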

Canonical Abstraction Example
[Figure: a concrete list of four nodes u0, u1, u2, u3, each with r[n,x], linked by n and pointed to by x, and its canonical abstraction with nodes u0 and u]

Merging Structures with Same Canonical Names Example
[Figure: two structures S0 and S1 over the same canonical names (u0, pointed to by x, and u, both with r[n,x]) that differ in their n edges are merged into $S_0 \sqcup S_1$]

Localizing Abstraction
- Find an appropriate subset of abstraction predicates for every CFG node
- Observation: programs contain dead variables; exploit this to make the corresponding predicates "dead"
- Compute "predicate liveness" to determine the subset of abstraction predicates

reverse Example

    List reverse(List x) {
    L0:  List y, t;
    L1:  y = NULL;
    L2:  while (x != NULL) {
    L3:      t = y;
    L4:      y = x;
    L5:      x = x->n;
    L6:      y->n = t;
         }
    L7:  return y;
    }

Liveness annotations shown on the slide: "y dead", "t dead", "all dead"
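The dead variables of reverse can be computed with a standard backward liveness analysis over the labelled statements. This sketch works on program variables as a stand-in for the predicate liveness mentioned in the talk; the CFG encoding and the def/use sets are written by hand and are assumptions of the example.

```python
def liveness(cfg):
    """
    cfg: {label: (defs, uses, successors)}.
    Returns the set of variables live on entry to each label.
    """
    live_in = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n, (defs, uses, succs) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = set(uses) | (live_out - set(defs))
            if new_in != live_in[n]:
                live_in[n], changed = new_in, True
    return live_in


VARS = {"x", "y", "t"}
REVERSE_CFG = {
    "L1": ({"y"}, set(), ["L2"]),          # y = NULL
    "L2": (set(), {"x"}, ["L3", "L7"]),    # while (x != NULL)
    "L3": ({"t"}, {"y"}, ["L4"]),          # t = y
    "L4": ({"y"}, {"x"}, ["L5"]),          # y = x
    "L5": ({"x"}, {"x"}, ["L6"]),          # x = x->n
    "L6": (set(), {"y", "t"}, ["L2"]),     # y->n = t
    "L7": (set(), {"y"}, []),              # return y
}

live = liveness(REVERSE_CFG)
dead = {label: VARS - live[label] for label in REVERSE_CFG}
```

On this encoding y is dead on entry to L4, t is dead on entry to L2 and L3, and only y is still live at L7, so predicates tied to the other variables need not be used for abstraction at those points.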

Compaction via Pseudo-Embedding
- Pseudo-embedding: similar to embedding, but only with respect to abstraction predicates
- $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq'^{f} S'$ if for every abstraction predicate $p$
  1. $p^{S}(u) \sqsubseteq p^{S'}(f(u))$
  2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$

Modified blur
- Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every abstraction predicate $p$, $p^{S}(u_1) \sqsubseteq p^{S'}(u_2)$
- blur' merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$

blur' Example
[Figure: blur' applied to a structure with nodes u0 and u, both with r[n,x], connected by n and pointed to by x]

Merging Pseudo-Embedded Structures Example
- Abstraction predicates = {x, y}
- Non-abstraction predicates = {r[n,x], r[n,y], n}
[Figure: structures S0 and S1 with nodes pointed to by x and y and linked by n are merged into $S_0 \sqcup S_1$, in which a node u has r[n,x] and r[n,y]=1/2]

Empirical Evaluation
- Benchmarks:
  - Garbage Collector
  - Mobile Ambients (ESOP 2000)
  - Sorting procedures (ISSTA 2000)
- MA + J2: completed without instrumentation predicates and without messages

Results (#structures explored)
[Table not recovered; its legend marks false alarms, out of memory, and out of time]

Conclusion
- New method is usually much more efficient (by orders of magnitude)
- Doesn't lose precision on benchmarks
- Performance more stable than other methods

Future and Ongoing Work
- Time optimizations
- Symbolic (BDD) execution of TVLA operations
- Compactly represent sets of structures
- Improving abstraction locality
  - Truly live predicates
  - Analyzing liveness for core predicates and deriving for instrumentation predicates
- Experiment with other compacting transformations
- Achieve polynomial complexity

The End
