Data Structures and Algorithms for Efficient Shape Analysis by Roman Manevich Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv.

1 Data Structures and Algorithms for Efficient Shape Analysis by Roman Manevich Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv

2 Motivation: TVLA is a powerful and general abstract interpretation system. Abstract interpretation in TVLA: operational semantics is expressed with first-order logic + TC formulae; program states are represented as sets of evolving first-order structures. Efficiency is an issue.

3 Outline: Shape analysis quick intro; compactly representing structures; tuning abstraction to improve performance.

4 What is Shape Analysis? Determines shape invariants for imperative programs. Can be used to verify a wide range of properties over different programming languages.

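5 reverse Example
/* list.h */
#include <stddef.h> /* NULL */
typedef struct node { struct node *n; int data; } *List;
/* print.c */
#include "list.h"
List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;      /* remember the already reversed prefix */
        y = x;      /* y now points to the current cell */
        x = x->n;   /* advance x along the original list */
        y->n = t;   /* link the current cell in front of the prefix */
    }
    return y;
}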

6 reverse Example [Figure: shape before, x points to a list linked by n; shape after, y points to a list linked by n.]

7 Definition of a First-Order Logical Structure: $S = \langle U, \iota \rangle$. $U$: a set of individuals ("node set"). $\iota$: a mapping $p^{(r)} \to (U^r \to \{0,1\})$, the "interpretation" of $p$.

8 Three-Valued Logic: 1: True; 0: False; 1/2: Unknown. A join semi-lattice: $0 \sqcup 1 = 1/2$. Information order: $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$.
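A minimal sketch of the three-valued domain on this slide. The join and the information order follow the slide; the connectives `k_and`, `k_or` and `k_not` are the standard Kleene ones and are added here only for illustration. All function names are hypothetical.

```python
# Kleene values: 0 = false, 1 = true, 0.5 = unknown.
FALSE, HALF, TRUE = 0.0, 0.5, 1.0


def join(a, b):
    """Information-order join: equal values are kept, different values become 1/2."""
    return a if a == b else HALF


def info_leq(a, b):
    """a is below b in the information order: b equals a, or b is unknown."""
    return a == b or b == HALF


def k_and(a, b):
    """Standard Kleene conjunction (not stated on the slide)."""
    return min(a, b)


def k_or(a, b):
    """Standard Kleene disjunction (not stated on the slide)."""
    return max(a, b)


def k_not(a):
    """Standard Kleene negation (not stated on the slide)."""
    return 1.0 - a
```

For instance, `join(FALSE, TRUE)` yields `HALF`, matching 0 ⊔ 1 = 1/2.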

9 Canonical Abstraction: Partition the individuals into equivalence classes based on the values of their unary predicates. Collapse other predicates via $\sqcup$: $p^{S}(u'_1, \ldots, u'_k) = \bigsqcup \{ p^{B}(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$. At most $3^n$ abstract individuals.
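The abstraction step can be sketched as follows, assuming a structure is given as plain dictionaries (a hypothetical layout, not TVLA's). Each node is mapped to its canonical name, the vector of its unary predicate values, and binary predicates are collapsed with the join. The sample input is the four-cell list used in the canonical abstraction example.

```python
from itertools import product

HALF = 0.5  # the "unknown" Kleene value


def join(a, b):
    """Information-order join: agreeing values are kept, others become 1/2."""
    return a if a == b else HALF


def canonical_abstraction(nodes, unary, binary):
    """nodes: list of individuals.
    unary:  {predicate: {node: value}}, missing entries mean 0.
    binary: {predicate: {(node, node): value}}, missing entries mean 0.
    Returns the mapping f, the summary predicate sm, and the abstract binary tables."""
    preds = sorted(unary)
    # f maps each node to its canonical name: its unary predicate values.
    f = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    classes = {}
    for u in nodes:
        classes.setdefault(f[u], []).append(u)
    # An abstract node standing for more than one individual is a summary node.
    sm = {c: (HALF if len(members) > 1 else 0) for c, members in classes.items()}
    abs_binary = {}
    for p, table in binary.items():
        out = {}
        for u1, u2 in product(nodes, repeat=2):
            v = table.get((u1, u2), 0)
            key = (f[u1], f[u2])
            out[key] = join(out[key], v) if key in out else v
        abs_binary[p] = out
    return f, sm, abs_binary


nodes = ["u0", "u1", "u2", "u3"]
unary = {"x": {"u0": 1}, "r[n,x]": {u: 1 for u in nodes}}
binary = {"n": {("u0", "u1"): 1, ("u1", "u2"): 1, ("u2", "u3"): 1}}
f, sm, abs_binary = canonical_abstraction(nodes, unary, binary)
```

With predicates sorted as (`r[n,x]`, `x`), `u0` gets the name `(1, 1)` and the other three cells collapse into one summary node named `(1, 0)`; both `n` edges touching the summary node become 1/2.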

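10 Canonical Abstraction Example [Figure: a concrete list u0, u1, u2, u3 linked by n, with x pointing to u0 and r[n,x] holding on every node, is abstracted to u0 (x, r[n,x]) and a node u (r[n,x]) with n edges.]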

11 Compactly Representing First-Order Logical Structures: Space is a major bottleneck. Analysis explores many logical structures. Reduce space by sharing information across structures.

12 Desired Properties: Sparse data structures. Share common sub-structures: inherited sharing; incidental sharing due to program invariants. But feasible time performance: phase-sensitive data structures.

13 Chapter Outline: Background; first-order structure representations; base representation (TVLA 0.91); BDD representation; empirical evaluation; conclusion.

14 First-Order Logical Structures: Generalize shape graphs: arbitrary set of individuals; arbitrary set of predicates on individuals. Dynamically evolving: usually small changes. Properties are extracted by evaluating first-order formulae: $\exists v_1, v : x(v_1) \land n(v_1, v)$. Join operator requires isomorphism testing.

15 First-Order Structure ADT
Structure : new() /* empty structure */
SetOfNodes : nodeSet(Structure)
Node : newNode(Structure)
removeNode(Structure, Node)
Kleene : eval(Structure, p^(r), <u_1, ..., u_r>)
update(Structure, p^(r), <u_1, ..., u_r>, Kleene)
Structure : copy(Structure)

16 print_all Example
/* list.h */
typedef struct node { struct node *n; int data; } *L;
/* print.c */
#include <stdio.h>
#include "list.h"
void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;   /* advance to the next cell */
    }
}

17 print_all Example: x = y, with update formula $x'(v) := y(v)$. ADT calls: copy(S0) : S1; nodeSet(S0) : {u1, u}; eval(S0, y, u1) : 1; update(S1, x, u1, 1); eval(S0, y, u) : 0; update(S1, x, u, 0). [Figure: S0 has u1 with y=1 and a node u with sm=½ and n=½; S1 is the same with x=1 added on u1.]
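The ADT calls on this slide can be replayed with a small sketch. The class below is a hypothetical dictionary-based stand-in for the First-Order Structure ADT, not TVLA's implementation; the exact `n` edges of S0 are an assumption, since the slide only shows `n=½`.

```python
import copy

HALF = 0.5


class Structure:
    """Sparse first-order structure: only non-zero predicate values are stored."""

    def __init__(self):
        self.nodes = set()
        self.preds = {}   # predicate name -> {node tuple: non-zero Kleene value}
        self._next = 0

    def node_set(self):
        return set(self.nodes)

    def new_node(self):
        node = self._next
        self._next += 1
        self.nodes.add(node)
        return node

    def remove_node(self, node):
        self.nodes.discard(node)
        for table in self.preds.values():
            for t in [t for t in table if node in t]:
                del table[t]

    def eval(self, pred, nodes):
        return self.preds.get(pred, {}).get(nodes, 0)

    def update(self, pred, nodes, value):
        table = self.preds.setdefault(pred, {})
        if value == 0:
            table.pop(nodes, None)   # keep the table sparse
        else:
            table[nodes] = value

    def copy(self):
        return copy.deepcopy(self)


def assign_var(s, lhs, rhs):
    """Transfer function for `lhs = rhs`, i.e. lhs'(v) := rhs(v)."""
    out = s.copy()
    for v in s.node_set():
        out.update(lhs, (v,), s.eval(rhs, (v,)))
    return out


s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), HALF)
s0.update("n", (u1, u), HALF)   # assumed edge
s0.update("n", (u, u), HALF)    # assumed edge
s1 = assign_var(s0, "x", "y")
```

After the call, `s1` has `x` set on `u1` only, while `s0` is unchanged.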

18 print_all Example: while (x != NULL), precondition: $\exists v : x(v)$. x = x->n, focus: $\exists v_1 : x(v_1) \land n(v_1, v)$; update: $x'(v) := \exists v_1 : x(v_1) \land n(v_1, v)$. [Figure: S1 (u1 with x=1, y=1; u with sm=½, n=½) yields S2.0 (u1 with y=1; u with sm=½, n=½), S2.1 (u1 with y=1; u with x=1, n=1, n=½) and S2.2 (u1 with y=1; u.1 with x=1, n=1, n=½; u.0 with sm=½).]

19 Overview and Main Results: 1. Two novel representations of first-order structures: a new BDD representation; a new representation using functional maps. 2. Implementation techniques. 3. Empirical evaluation: comparison of different representations; space is reduced by a factor of 4 to 10; new representations scale better.

20 Base Representation (Tal Lev-Ami, SAS 2000): Two-level map: Predicate → (Node Tuple → Kleene). Sparse representation. Limited inherited sharing by "copy-on-write".
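A rough sketch of a two-level map with copy-on-write, under the assumption that sharing happens at the granularity of one predicate table. The class and method names are hypothetical, and the actual TVLA 0.91 code may differ.

```python
class CowStructure:
    """Two-level map: predicate -> (node tuple -> Kleene value), shared on copy."""

    def __init__(self, preds=None):
        # The outer map is private; the inner tables may be shared with copies.
        self._preds = dict(preds) if preds else {}
        self._owned = set()   # predicates whose table may be mutated in place

    def eval(self, pred, nodes):
        return self._preds.get(pred, {}).get(nodes, 0)

    def update(self, pred, nodes, value):
        if pred not in self._owned:
            # First write since the last copy: take a private copy of the table.
            self._preds[pred] = dict(self._preds.get(pred, {}))
            self._owned.add(pred)
        table = self._preds[pred]
        if value == 0:
            table.pop(nodes, None)   # sparse: zeros are not stored
        else:
            table[nodes] = value

    def copy(self):
        # After copying, every table is shared, so neither side owns any.
        self._owned.clear()
        return CowStructure(self._preds)

    def shares_table(self, other, pred):
        return pred in self._preds and self._preds[pred] is other._preds.get(pred)


s0 = CowStructure()
s0.update("y", (1,), 1)
s0.update("n", (1, 2), 0.5)
s1 = s0.copy()
s1.update("x", (1,), 1)   # only the table of x is created for s1
```

Here `s1` inherits the `y` and `n` tables of `s0` untouched, which is the "inherited sharing" the slide calls limited: sharing is lost for a predicate as soon as either structure writes to it.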

21 BDDs in a Nutshell (Bryant 86): Ordered Binary Decision Diagrams. Data structure for Boolean functions. Functions are represented as (unique) DAGs. [Figure: truth table of a function f over x1, x2, x3 (000→0, 001→0, 010→0, 011→1, 100→0, 101→1, 110→0, 111→1) and the corresponding decision tree.]

22 BDDs in a Nutshell (Bryant 86): Ordered Binary Decision Diagrams. Data structure for Boolean functions. Functions are represented as (unique) DAGs. Also achieve sharing across functions. [Figure: the decision tree over x1, x2, x3 is reduced in three steps: duplicate terminals, duplicate nonterminals, redundant tests.]

23 Encoding Structures Using Integers: Static encoding of predicates and Kleene values. Dynamic encoding of nodes: 0, 1, …, n-1. Encode predicate p's values as $e_p(p).e_n(u_1).e_n(u_2).\ldots.e_n(u_n).e_k(\mathrm{Kleene})$.
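One way to read the concatenated encoding is as fixed-width bit fields packed into a single integer. The field widths and the Kleene codes below are assumptions made for this sketch; the slides do not give them.

```python
NODE_BITS = 4      # hypothetical width of a node code
KLEENE_BITS = 2    # enough for the three Kleene values
KLEENE_CODE = {0: 0, 1: 1, 0.5: 2}                      # assumed static encoding
KLEENE_DECODE = {code: value for value, code in KLEENE_CODE.items()}


def encode(pred_id, nodes, value):
    """Pack e_p(p).e_n(u_1)...e_n(u_k).e_k(value) into one integer."""
    code = pred_id
    for u in nodes:
        code = (code << NODE_BITS) | u
    return (code << KLEENE_BITS) | KLEENE_CODE[value]


def decode(code, arity):
    """Inverse of encode for a predicate of known arity."""
    value = KLEENE_DECODE[code & ((1 << KLEENE_BITS) - 1)]
    code >>= KLEENE_BITS
    nodes = []
    for _ in range(arity):
        nodes.append(code & ((1 << NODE_BITS) - 1))
        code >>= NODE_BITS
    return code, tuple(reversed(nodes)), value


fact = encode(2, (1, 3), 0.5)   # predicate 2 holds with value 1/2 on nodes (1, 3)
```

A structure then becomes a set of such integers, which is the form a BDD of integer sets can represent.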

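24 BDD Representation of Integer Sets: Characteristic function. S = {1, 5}, with 1 = 001 and 5 = 101: $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$. [Figure: BDD for $\chi_S$ over x1, x2, x3 with terminals 1 and 0.]

25 BDD Representation of Integer Sets: Characteristic function. S = {1, 5}, with 1 = 001 and 5 = 101: $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$. [Figure: BDD for $\chi_S$ over x1, x2, x3, drawn with the 1 terminal only.]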
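The reduction rules and the characteristic function of S = {1, 5} can be tried out with a toy reduced ordered BDD. This is a sketch under simplifying assumptions (functions are built by brute-force Shannon expansion, nodes are integer ids); it is not the BDD package used in the work.

```python
class BDD:
    """Toy reduced ordered BDD with a unique table (hash-consing)."""

    def __init__(self):
        self.unique = {}                  # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}   # ids 0 and 1 are the shared terminals

    def mk(self, var, low, high):
        if low == high:                   # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:        # duplicate nonterminals are merged
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, nvars, var=1, env=()):
        """Shannon expansion of f over x1..xn in that variable order."""
        if var > nvars:
            return 1 if f(*env) else 0
        low = self.build(f, nvars, var + 1, env + (False,))
        high = self.build(f, nvars, var + 1, env + (True,))
        return self.mk(var, low, high)

    def evaluate(self, node, env):
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if env[var - 1] else low
        return node


def bits(n):
    """3-bit encoding, x1 is the most significant bit."""
    return tuple(bool((n >> shift) & 1) for shift in (2, 1, 0))


bdd = BDD()
chi = lambda x1, x2, x3: (not x1 and not x2 and x3) or (x1 and not x2 and x3)
root = bdd.build(chi, 3)
members = [n for n in range(8) if bdd.evaluate(root, bits(n)) == 1]
shared = bdd.build(lambda x1, x2, x3: x3, 3)   # a second function in the same table
```

The test on x1 is redundant for this set, so the reduced diagram needs only two nonterminal nodes, and the second function reuses the existing x3 node instead of allocating a new one.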

26 BDD Representation Example [Figure: BDD for S0 (u1 with y=1; u with sm=½, n=½).]

27 BDD Representation Example: x = y [Figure: BDDs for S0 and S1 (u1 with x=1, y=1; u with sm=½, n=½).]

28 BDD Representation Example: x = y; x = x->n [Figure: BDDs for S0, S1 and S2.2 (u1 with y=1; u.1 with x=1, n=1, n=½; u.0 with sm=½).]

29 BDD Representation Example: x = y; x = x->n [Figure: same as slide 28.]

30 Improved BDD Representation: Using this representation directly doesn't save space: canonicity doesn't carry over from propositional to first-order logic. Observation: node names can be arbitrarily remapped without affecting the ADT semantics. Our heuristics: use canonic node names to encode nodes and obtain a canonic representation; this increases incidental sharing and reduces the isomorphism test to a pointer comparison. Space is reduced by a factor of 4 to 10.
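The idea of canonic node names can be illustrated without BDDs. The sketch below assumes a structure that has already been canonically abstracted, so every node has a distinct vector of unary predicate values; it renames nodes to those vectors and interns the result, which turns the isomorphism test into an identity check. The function name and data layout are hypothetical and this is not the encoding used in the thesis.

```python
_interned = {}   # normalized structure -> the one shared representative


def canonize(nodes, unary, binary):
    """Rename nodes to canonical names and return a shared, hashable structure."""
    preds = sorted(unary)
    name = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    if len(set(name.values())) != len(name):
        raise ValueError("two nodes share a canonical name; abstract the structure first")
    facts = {("preds", tuple(preds))}
    facts.update(("node", name[u]) for u in nodes)
    for p, table in binary.items():
        for (a, b), v in table.items():
            if v != 0:
                facts.add((p, name[a], name[b], v))
    key = frozenset(facts)
    # Hash-consing: equal structures are mapped to the same object.
    return _interned.setdefault(key, key)


a = canonize([0, 1], {"x": {0: 1}, "sm": {1: 0.5}}, {"n": {(0, 1): 0.5, (1, 1): 0.5}})
b = canonize([7, 3], {"x": {7: 1}, "sm": {3: 0.5}}, {"n": {(7, 3): 0.5, (3, 3): 0.5}})
c = canonize([0, 1], {"x": {0: 1}, "sm": {1: 0.5}}, {"n": {(0, 1): 1}})
```

`a` and `b` differ only in their node names, so `a is b` holds, while `c` is a different object.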

31 Reducing Time Overhead: Current implementation not optimized: expensive formula evaluation. Hybrid representation: distinguish between phases (mutable phase → Join → immutable phase); dynamically switch representations.

32 Functional Representation: Alternative representation for first-order structures. Structures are represented by maps from integers to Kleene values. Tailored for representing first-order structures. Achieves better results than BDDs. Techniques similar to the BDD representation. More details in the thesis.

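33 Introduction to Functional Maps: A mapping $\mathbb{N} \to \{0, 1/2, 1\}$. Nodes contain a fixed number of values. Hierarchical maps. [Figure: a node of 3 entries, indices 2, 1, 0 holding 1, 0, ½.]

34 Introduction to Functional Maps: Sparse maps. [Figure: level of size 9 with sub-maps for indices 2, 1, 0 (values 1, 0, ½), indices 5, 4, 3 (values 0, 0, 0) and indices 8, 7, 6 (values 1, 0, ½); level of size 27 above it.]

35 Introduction to Functional Maps: Share unique sub-maps. [Figure: level of size 9 with sub-maps for indices 2, 1, 0 (values 1, 0, ½) and indices 8, 7, 6 (values 1, 0, ½); level of size 27 above it.]

36 Introduction to Functional Maps: Share unique sub-maps. [Figure: level of size 9 with a single sub-map for indices 2, 1, 0 (values 1, 0, ½); level of size 27 above it.]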
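The hierarchical maps on these slides can be approximated by a persistent trie with three entries per node, where every node is interned so that equal sub-maps are stored once. This is a sketch under that assumption; the names `share`, `empty`, `get` and `put` are hypothetical.

```python
FANOUT = 3    # each node holds a fixed number of values or children
_table = {}   # every node ever built, for sharing


def share(node):
    """Return the unique stored copy of an equal node (hash-consing)."""
    return _table.setdefault(node, node)


def empty(size):
    """All-zero map over indices 0..size-1; size must be a power of FANOUT."""
    node = share((0,) * FANOUT)
    span = FANOUT
    while span < size:
        node = share((node,) * FANOUT)
        span *= FANOUT
    return node


def get(node, size, i):
    while size > FANOUT:
        size //= FANOUT
        node, i = node[i // size], i % size
    return node[i]


def put(node, size, i, value):
    """Functional update: returns a new map and leaves the old one intact."""
    if size == FANOUT:
        return share(node[:i] + (value,) + node[i + 1:])
    sub = size // FANOUT
    k = i // sub
    child = put(node[k], sub, i % sub, value)
    return share(node[:k] + (child,) + node[k + 1:])


m0 = empty(9)
m = put(m0, 9, 0, 0.5)
m = put(m, 9, 2, 1)
m = put(m, 9, 6, 0.5)
m = put(m, 9, 8, 1)
```

Indices 0 to 2 and 6 to 8 end up with the same values, so both positions in `m` point to one shared leaf, and `m0` is still the all-zero map.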

37 Functional Representation Example [Figure: S0 (u1 with y=1; u with sm=½, n=½) stored as nullary, unary and binary maps; unary entries over y, x, sm hold 1, 0, 0 and 0, 0, ½; binary entry n=½; levels of size 9 and 27.]

38 Functional Representation Example [Figure: S0 and S1 (u1 with x=1, y=1; u with sm=½, n=½), each with nullary, unary and binary maps; unary entries over y, x, sm hold 1, 0, 0; 0, 0, ½; and 1, 1, 0; binary entry n=½; levels of size 9 and 27.]

39 Functional Representation Example [Figure: S0, S1 and S2.2 (u1 with y=1; u.1 with x=1, n=1, n=½; u.0 with sm=½), each with nullary, unary and binary maps; unary entries over y, x, sm hold 1, 0, 0; 0, 0, ½; 0, 1, 0; and 1, 1, 0; binary entries n=½ and n=1; levels of size 9, 27 and 81.]

40 Reducing Time Overhead: "Lazy" normalization is used to balance time/space performance.

41 Empirical Evaluation: Benchmarks: Cleanness Analysis (SAS 2000); Garbage Collector; CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks; Mobile Ambients (ESOP 2000). Stress testing the representations: we use "relational analysis"; save structures in every CFG location.

42 Space Results

43

44 Abstract Counters: Ignore language/implementation details. A more reliable measurement technique. Count only crucial space information. Independent of C/Java.

45 Abstract Counters Results

46 Trends in the Cleanness Analysis Benchmark

47 Conclusions: Two novel representations of first-order structures: a new BDD representation; a new representation using functional maps. Implementation techniques. Substantially better than inherited sharing. Structure canonization is crucial. Normalization via hash-consing is the key technique.

48 Conclusions: The use of BDDs for static analysis is not a panacea for space saving. Domain-specific encoding is crucial for saving space. Failed attempts: original implementation of Veith's encoding; PAG.

49 Tuning Abstraction for Improved Performance: Analysis can be very costly. Explores many structures: the GC example explores >180,000 structures.

50 Existing Analysis Modes: Relational analysis: doubly-exponential in worst case; our most precise method. Single-structure analysis (Tal Lev-Ami, SAS 2000): singly-exponential in worst case; can be very efficient; can be very imprecise; sometimes very inefficient.

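51 Single-Structure Analysis [Figure: S0 and S1, one with u1 (pointed to by x) and a node u linked by n, the other with u1 (pointed to by x) only, are merged into $S_0 \sqcup S_1$, where the extra node is marked "may exist".]

52 Single-Structure Analysis: Active property: ac=0, does not exist in any concrete structure; ac=1, exists in every concrete structure; ac=1/2, may exist in some concrete structure. [Figure: S0 (u1 with ac=1, pointed to by x; u with ac=1; n edge) and S1 (u1 with ac=1, pointed to by x) are merged into $S_0 \sqcup S_1$, where u has ac=1/2.]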

53 Single-Structure Analysis Sometimes overly imprecise Refine analysis by using nullary predicates to distinguish between different structures

54 Is there a "sweet spot"? [Figure: efficiency versus precision, with relational analysis marked.]

55 Chapter Outline: Removing embedded structures; merging structures with the same set of canonical names; staged analysis to localize abstraction; merging pseudo-embedded structures.

56 Order Relations on Structures and Sets of Structures: For $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every predicate $p$: 1. $p^{S}(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$; 2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$. For $X, X' \in 2^{\text{3-STRUCT}}$: $X \sqsubseteq X'$ if every $S \in X$ has $S' \in X'$ and $S \sqsubseteq S'$.
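The two embedding conditions can be checked directly for a given mapping f. The sketch below assumes both structures use the same predicate vocabulary and a hypothetical dictionary layout; it checks only the two conditions listed on the slide and does not test whether f is surjective.

```python
from itertools import product

HALF = 0.5


def info_leq(a, b):
    """Information order on Kleene values: b equals a, or b is unknown."""
    return a == b or b == HALF


def embeds(s, s2, f):
    """s, s2: {"nodes": [...], "preds": {name: (arity, {node tuple: value})}}.
    f: dict mapping nodes of s to nodes of s2. Missing table entries mean 0."""
    # Condition 1: every predicate value of s is below its image in s2.
    for p, (arity, table) in s["preds"].items():
        table2 = s2["preds"][p][1]
        for t in product(s["nodes"], repeat=arity):
            image = tuple(f[u] for u in t)
            if not info_leq(table.get(t, 0), table2.get(image, 0)):
                return False
    # Condition 2: a node of s2 that absorbs several nodes must be a summary node.
    sm2 = s2["preds"]["sm"][1]
    for u2 in s2["nodes"]:
        absorbed = sum(1 for u in s["nodes"] if f[u] == u2)
        if not info_leq(1 if absorbed > 1 else 0, sm2.get((u2,), 0)):
            return False
    return True


concrete = {"nodes": ["u0", "u1", "u2"],
            "preds": {"x": (1, {("u0",): 1}),
                      "sm": (1, {}),
                      "n": (2, {("u0", "u1"): 1, ("u1", "u2"): 1})}}
abstract = {"nodes": ["a", "b"],
            "preds": {"x": (1, {("a",): 1}),
                      "sm": (1, {("b",): HALF}),
                      "n": (2, {("a", "b"): HALF, ("b", "b"): HALF})}}
f = {"u0": "a", "u1": "b", "u2": "b"}
ok = embeds(concrete, abstract, f)
```

The three-cell list embeds into the abstract list because both tail cells map to the summary node `b`; without `sm(b) = 1/2`, condition 2 would fail.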

57 Compacting Transformations: We look for a transformation $T : 2^{\text{3-STRUCT}} \to 2^{\text{3-STRUCT}}$ with the following properties: 1. Compacting: $|T(X)| \le |X|$. 2. Conservative: $T(X) \sqsupseteq X$. Without sacrificing precision.

58 Removing Embedded Structures [Figure: structures S0 and S1 over u0 (r[n,x], pointed to by x) and u1, u2 (r[n,t], r[n,y]; pointed to by y, t; linked by n), with an embedding f between their nodes.]

59 Removing Embedded Structures [Figure: the same S0 and S1, captioned "Reversing a list with exactly 3 cells" and "Reversing a list with at least 3 cells".]

60 Detecting Embedding is Hard: In general, as hard as GRAPH ISOMORPHISM. Conditions for a unique mapping: canonical abstraction; definite values. Polynomial-time check.

61 Results (#structures explored)

62

63 Canonical Names Method: Canonical abstraction merges individuals with the same canonical names (unary abstraction predicate values). Merge structures with the same set of canonical names. Both transformations preserve "definity" of abstraction predicates. But this ignores precision of non-abstraction predicates.
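A possible reading of the merge step, assuming every structure is already keyed by canonical names (opaque strings here) and only non-abstraction predicate tables need joining. The data layout and the function name are hypothetical.

```python
HALF = 0.5


def join(a, b):
    """Information-order join on Kleene values."""
    return a if a == b else HALF


def merge_by_canonical_names(structures):
    """structures: list of {"nodes": frozenset of canonical names,
                            "preds": {pred: {tuple of names: value}}}.
    Structures with the same node set are joined pointwise; missing entries are 0."""
    groups = {}
    for s in structures:
        groups.setdefault(s["nodes"], []).append(s)
    merged = []
    for names, group in groups.items():
        keys = {(p, t) for s in group for p, table in s["preds"].items() for t in table}
        preds = {}
        for p, t in keys:
            value = None
            for s in group:
                w = s["preds"].get(p, {}).get(t, 0)
                value = w if value is None else join(value, w)
            preds.setdefault(p, {})[t] = value
        merged.append({"nodes": names, "preds": preds})
    return merged


s0 = {"nodes": frozenset({"u0", "u"}),
      "preds": {"n": {("u0", "u"): HALF, ("u", "u"): HALF}}}
s1 = {"nodes": frozenset({"u0", "u"}),
      "preds": {"n": {("u0", "u"): 1}}}
s2 = {"nodes": frozenset({"u0"}), "preds": {}}
result = merge_by_canonical_names([s0, s1, s2])
```

`s0` and `s1` collapse into one structure whose `n` edges are both 1/2, which shows the loss of precision on non-abstraction predicates that the slide warns about; `s2` has a different set of names and is kept apart.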

64 Canonical Abstraction Example [Figure: a concrete list u0, u1, u2, u3 linked by n, with x pointing to u0 and r[n,x] holding on every node, is abstracted to u0 (x, r[n,x]) and a node u (r[n,x]) with n edges.]

65 Merging Structures with Same Canonical Names Example [Figure: S0 and S1, both over u0 (x, r[n,x]) and u (r[n,x]) but with different n edges, are merged into $S_0 \sqcup S_1$.]

66 Merging Structures with Same Canonical Names Example [Figure: S0 and S1 over u0, u with x and n, and their merge $S_0 \sqcup S_1$.]

67 Results (#structures explored)

68 Localizing Abstraction: Find an appropriate subset of abstraction predicates for every CFG node. Observation: programs contain dead variables; exploit this to make the corresponding predicates "dead". Compute "predicate liveness" to determine the subset of abstraction predicates.

69 reverse Example
List reverse (List x) {
L0:   List y, t;
L1:   y = NULL;
L2:   while (x != NULL) {
L3:     t = y;
L4:     y = x;
L5:     x = x->n;
L6:     y->n = t;
      }
L7:   return y;
}
[Annotations on the program: y dead; t dead; all dead.]
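Variable liveness for this program can be computed with a standard backward data-flow iteration. The CFG encoding below (uses, definitions and successors per label) was written by hand for this sketch and is an assumption; how liveness is lifted to predicates is not shown.

```python
def live_variables(cfg):
    """cfg: {label: (uses, defs, successors)}.
    Returns the set of variables live on entry to each label."""
    live_in = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n, (uses, defs, succs) in cfg.items():
            # Live on exit: anything live on entry to some successor.
            out = set().union(*(live_in[s] for s in succs))
            new_in = set(uses) | (out - set(defs))
            if new_in != live_in[n]:
                live_in[n], changed = new_in, True
    return live_in


# Hand-written CFG of reverse; L0 (declarations) is omitted.
reverse_cfg = {
    "L1": ([], ["y"], ["L2"]),            # y = NULL
    "L2": (["x"], [], ["L3", "L7"]),      # while (x != NULL)
    "L3": (["y"], ["t"], ["L4"]),         # t = y
    "L4": (["x"], ["y"], ["L5"]),         # y = x
    "L5": (["x"], ["x"], ["L6"]),         # x = x->n
    "L6": (["y", "t"], [], ["L2"]),       # y->n = t
    "L7": (["y"], [], []),                # return y
}
live = live_variables(reverse_cfg)
```

Under this encoding `t` is not live on entry to L3 and `y` is not live on entry to L4, so the predicates for those variables need not be abstraction predicates at those points.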

70 Results (#structures explored)

71 Compaction via Pseudo-Embedding: Pseudo-embedding: similar to embedding, but with respect to abstraction predicates only. For $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq'^{f} S'$ if for every abstraction predicate $p$: 1. $p^{S}(u) \sqsubseteq p^{S'}(f(u))$; 2. $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$.

72 Modified blur: Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every abstraction predicate $p$, $p^{S}(u_1) \sqsubseteq p^{S'}(u_2)$. blur' merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$.

73 blur' Example [Figure: a structure over u0 (r[n,x]) and u (r[n,x]) with x and n, before and after blur'.]

74 Merging Pseudo-Embedded Structures Example: Abstraction predicates = {x, y}. Non-abstraction predicates = {r[n,x], r[n,y], n}. [Figure: S0 and S1 over u0 (r[n,x]) and u (r[n,y], r[n,x]) with x, y and n are merged into $S_0 \sqcup S_1$, where r[n,y]=1/2 on u.]

75 Results (#structures explored)

76 Empirical Evaluation: Benchmarks: Garbage Collector; Mobile Ambients (ESOP 2000); sorting procedures (ISSTA 2000). MA + J2: completed without instrumentation predicates and without messages.

77 Results (#structures explored) [Legend: false alarms; out of memory; out of time.]

78 Conclusion: New method is usually much more efficient (by orders of magnitude). Doesn't lose precision on benchmarks. Performance is more stable than other methods.

79 Future and Ongoing Work: Time optimizations: symbolic (BDD) execution of TVLA operations. Compactly represent sets of structures. Improving abstraction locality: truly live predicates; analyzing liveness for core predicates and deriving it for instrumentation predicates. Experiment with other compacting transformations. Achieve polynomial complexity.

80 The End

