Shape & Alias Analyses Jaehwang Kim and Jaeho Shin Programming Research Laboratory Seoul National University
Overview K-limited Graph Graph Type Shape Type Memory Type Shape Grammar May-Alias Analysis with Access Paths Shape Analysis via 3-valued Logic
k-limited Graph [Jones&Muchnick79] Parts of AST Motivation –Optimization Storage allocation Garbage collection –Memory error detection
Store = Set of Graphs Semantics of assignments 2 Store node: Var -> Store -> Node AS[[ X := atom]] = add a new leaf node labeled a; move X to the new node AS[[ X := Y ]] = move X to (node Y ) AS[[ X := cons(Y,Z)]] = make a new node n and move X to label n; add hd, tl edge to (node Y ) and (node Z ) respectively; X Y Z a bc hd tl
Share = Set of k-limited Graphs k-limited graph All nodes are accessible by k or less steps from at least one variable Each node may be labeled by one or more variables unknown nodes labeled ? –node represents a set of nodes whose internal structure is not known –?s: node may contains sharings –?c: node may contains cycles Edges from unknown nodes are represented by ---> ?c ?s ? Z X Y W Finite variables, finite size, finite selectors => Share is finite!
AS[[assign]] 2 Share -> Share [[s]] D = [ 2 D clean(next[[s]] ) –next[[s]] : creates a node or edges moves a variable to other nodes –clean( ): make graphs k-limited 1.U( ) = subgraph inaccessible from any variables by path of length k-1 or less 2.remove inaccessible nodes 3.partition U( ) into SCCs, coalesce each SCC unknown node labeled ?c 4.partition U( ) into CC, coalesce each CC unknown node labeled ?c, ?s, or ?
Example: clean for 2-limited graph X Y Z ?s remove garbage coalesce cycles coalesce CC X Y Z ?s X Y Z ?c X Y Z ?s ?c
Graph Types [Klarlund&Schwartzbach93] Recursive data type is useful for tree- shaped value L ! nonempty(head:Int, tail:L) | empty() 123 Recursive data type can’t represent graph shaped value!
Routing expression Graph type = Recursive data type + Routing expression –T ! v(…, a:T[R], …) –R = { a, ", " a, ^, $}* Examples –List with head H ! (first:L, last:L[ first ( tail)*$ " ]) L ! (head:Int, tail:L) | () –Cyclic lilst C ! (next:C) | (next:C[ " *^])
Routing Expression (cont) Well-formedness –Every routing expression always defines a unique destination –Checked by the monadic second-order logic Evaluation –Linear-time evaluation of routing expression by finite automata –Runtime evaluation cost is major weakness of graph type
Shape Type [Fradet&Metayer97] Context free graph grammar H = NT : ranked non-terminal T : ranked terminal PR : productions l ! r s.t. l 2 NT, r 2 2 Term Term : built from NT, T and set of variables Terminal : T and set of variables O : origin or derivation Shape(H) = {M| M ! *{O} and M 2 Terminal} X + ( r) ! * X + ( l) iff l ! r 2 PR and (Var( r) – Var( l)) Å Var(X) = ; : variable substitution
Example a1a2a3 p pred next pred Doubly -> p x, pred x x, L x L x -> next x y, pred y x, L y | next x x
Shape-C Transformer P = (C => A)If Check(C,A,PR,O), then for all X and , X + ( C) ! * {O} => X + ( A) ! * {O} shape int cir { pt x, L x x; L x y = L x z, L z y; L x y = next x y; }; … cir s = [|=> pt x; next x x; $x=1; |]; … s:[| pt x; next x y; => pt x; next x z; next z y; $z=i; |] pt x; next x z; next z y pt x; L x z; L z y pt x; L x y
Memory Type[LeYaYi03] Goal: Separating reusable heap cells from others fun insert i l = case l of [] => i::[] | h::t => if i<h then i::l else let z = insert i t in h::z fun insert b i l = case l of [] => i::[] | h::t => if i<h then i::l else let z = insert b i t in free l when b; h::z
Multiset Formula A term for an abstract multiset of locations: L 2 Locations ! {0,1, 1 }
Memory Type For trees, For functions which have tree ! tree type, left right root resultusage input new
Estimating Memory Usage
X (1) Memory-Usage Analysis fun copyleft a = case a of Leaf => Leaf | Node(al,ar) => let r = copyleft al (1) in Node(r,ar) (2) X (2) A.rightA.left A.root copyleft : resultusage Result: Usage: A R
Constructing Dynamic Flags 1/2
Find condition to free “ ” under two preservation constraints: Constructing Dynamic Flags 2/2
Final copyleft fun copyleft [ , ns ] a = case a of Leaf => Leaf | Node(al,ar) => let r = copyleft [ ^ ns, ns ] al in (free a when ; Node(r,ar)) : whether it is safe to free the input tree excluding the result ns :whether the input tree has no shared sub- parts
Shape Grammar[LeYaYi05] Language Abstract Domain & Semantics Correctness: is provable by the separation logic
Normalization rmjunkfold unifysimplyfy a cb x a x ::= a cb x d fe x nil a cb x ::= R( ) [ R( ) ::= R( ) [ {nil} a x y b nil a x y a’a’ ::= nil | | ::= ::= ::= nil | ::= | {/}{/} = { /nil} [ { ::= nil} = ::= nil | ::= | nil { / } = { ::= nil | }
Cyclic Structure Non-terminals have parameters Ex. Doubly linked list dll ::= nil | c ba x c ba x c (a) c ::=
Conclusion
May-alias Analysis for Pointers: Beyond k- limiting A. Deutsch PLDI’94
Symbolic Access Path String of the form e1.e2. ….en, ei is either: –a selector: name of a variable or structure field, dereference operator –an expression of form B k where B is the basis of some recursive type e.g. –X (tl ) i hd –T {left ,right } j key Note: x f for a shorthand of (*x).f
Symbolic Alias Pairs (, K) –Symbolic access paths f1 and f2 are aliases where equations K of numeric domain satisfy –e.g. (, {k1=k2})
Program Semantics Every operation a statement applies to an access path is also applied to all aliases e.g.
But This Approach Might be effective for finding aliases But not appropriate for memory leak detection Since unreachable/leaking memory cells (i.e. cells that have no access paths) cannot be expressed, hence ignored
Shape Analysis via 3-valued Logic M. Sagiv, T. Reps, R. Wilhelm POPL’99
Representing Stores Using predicates –x(u): Is u pointed by x? –n(u,u’): Is u’ pointed by field n of u? –sm(u): Is u summarizing multiple cells?
3-valued Logic Definite/indefinite values –x(u) = 1 means x points u –x(u) = 0 means x does not point u –x(u) = ½ means x may or may not point u First order formulae with transitive closure are used Unbiased analysis is possible 10 ½
3-valued Structures 3-valued structure ∈ 3-STRUCT[P] –P: Some given predicates (vocabularies) –U: universe of cells (individuals) –i: mapping 3-value for predicates over cells e.g. i(x)(u) = ½ is the value for x(u) Ordered by existence of embedding –f embeds S= in S’= if i S (p)(u 1, …, u k ) ≤ i S’ (p)(u 1, …, u k ) (|{u | f(u) = u’}| > 1) ≤ i S’ (sm)(u’)
Abstraction Renaming cells to their canonical names –partitions cells by their properties –bounds the size of a structure finite x y u1u1 unun u n+ 1 x y u {x},{y} u {y},{x} … t_embed c
Abstract Semantics 1/2 Semantics of each statement –is given by predicate update formulae which defines the new value of each predicate after running the statement e.g. st: x = t->n
Abstract Semantics 2/2 3-STRUCT[P] 3-STRUCT[P] transformer associated with st –Especially, x = malloc() changes the universe
Shape Analysis Algorithm Computes the least fixpoint of set of structures –For each node v in control flow graph G
Precision Improvements Decompose transformer for st into 3 steps 1.Focus 2.Transform (the original transformer) 3.Coerce
Focus For given formulae, Focus returns set of structures that each makes the formulae evaluate to a definite value Abstract interpretation continues separately with each case Improves precision
Coerce Compatible structures –Remove impossible structures –Recover indefinite values to definite –e.g. single variable cannot point two cells x(u) = 1 and x(u’) = ½ then x(u’) = 0 Improves precision
Partially Isomorphism Heap Abstraction R. Manevich et al., SAS2004 Merges structures with same universes Hence, reduces the size of sets used in fixpoint computation Improves analysis speed 3~100 times Reduces memory usage up to 1/200
What we can learn Sharing is not a simple problem We might need relation rather than function for points-to information …