Download presentation
Presentation is loading. Please wait.
1
Program Analysis and Verification 0368-4479
Noam Rinetzky Lecture 11: Shape Analysis + Interprocedural Analysis Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav
2
3-Value logic based shape analysis
3
Sequential Stack Want to Verify No Null Dereference
void push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x; } int pop() { if (Top == NULL) return EMPTY; Node *s = Top->n; int r = Top->d; Top = s; return r; } Want to Verify No Null Dereference Underlying list remains acyclic after each operation
4
Shape Analysis via 3-valued Logic
1) Abstraction 3-valued logical structure canonical abstraction 2) Transformers via logical formulae soundness by construction embedding theorem, [SRW02]
5
Concrete State represent a concrete state as a two-valued logical structure Individuals = heap allocated objects Unary predicates = object properties Binary predicates = relations parametric vocabulary n n Top u1 u2 u3 (storeless, no heap addresses)
6
Concrete State S = <U, > over a vocabulary P U – universe
- interpretation, mapping each predicate from p to its truth value in S Top n u1 u2 u3 U = { u1, u2, u3} P = { Top, n } (n)(u1,u2) = 1, (n)(u1,u3)=0, (n)(u2,u1)=0,… (Top)(u1)=1, (Top)(u2)=0, (Top)(u3)=0
7
Formulae for Observing Properties
void push (int v) { Node x = malloc(sizeof(Node)); xd = v; xn = Top; Top = x; } Top u1 u2 u3 Top != null w: x(w) w:Top(w) 1 w: x(w) No node precedes Top v1,v2: n(v1, v2) Top(v2) 1 No Cycles 1 v1,v2: n(v1, v2) n*(v2, v1) v1,v2: n(v1, v2) n*(v2, v1) v1,v2: n(v1, v2) Top(v2)
8
Concrete Interpretation Rules
Statement Update formula x =NULL x’(v)= 0 x= malloc() x’(v) = IsNew(v) x=y x’(v)= y(v) x=y next x’(v)= w: y(w) n(w, v) x next=y n’(v, w) = (x(v) n(v, w)) (x(v) y(w))
9
Example: s = Topn Top Top s u1 u2 u3 u1 u2 u3
s’(v) = v1: Top(v1) n(v1,v) u1 u2 u3 Top u1 1 u2 u3 n u1 u2 U3 1 u3 Top u1 1 u2 u3 n u1 u2 U3 1 u3 s u1 u2 u3 s u1 u2 1 u3
10
Collecting Semantics { st(w)(S) | S CSS[w] }
if v = entry { <,> } { st(w)(S) | S CSS[w] } (w,v) E(G), w Assignments(G) { S | S CSS[w] } (w,v) E(G), w Skip(G) CSS [v] = { S | S CSS[w] and S cond(w)} othrewise (w,v) True-Branches(G) { S | S CSS[w] and S cond(w)} (w,v) False-Branches(G)
11
Collecting Semantics At every program point – a potentially infinite set of two-valued logical structures Representing (at least) all possible heaps that can arise at the program point Next step: find a bounded abstract representation
12
3-Valued Logic 1 = true 0 = false 1/2 = unknown
A join semi-lattice, 0 1 = 1/2 1/2 information order 1 logical order
13
3-Valued Logical Structures
A set of individuals (nodes) U Relation meaning Interpretation of relation symbols in P (p0)() {0,1, 1/2} (p1)(v) {0,1, 1/2} (p2)(u,v) {0,1, 1/2} A join semi-lattice: 0 1 = 1/2
14
Boolean Connectives [Kleene]
15
Property Space 3-struct[P] = the set of 3-valued logical structures over a vocabulary (set of predicates) P Abstract domain (3-Struct[P]) is
16
Embedding Order Given two structures S = <U, >, S’ = <U’, ’> and an onto function f : U U’ mapping individuals in U to individuals in U’ We say that f embeds S in S’ (denoted by S S’) if for every predicate symbol p P of arity k: u1, …, uk U, (p)(u1, ..., uk) ’(p)(f(u1), ..., f (uk)) and for all u’ U’ ( | { u | f (u) = u’ } | > 1) ’(sm)(u’) We say that S can be embedded in S’ (denoted by S S’) if there exists a function f such that S f S’
17
Tight Embedding S’ = <U’, ’> is a tight embedding of S=< U, > with respect to a function f if: S’ does not lose unnecessary information ’(u’1,…, u’k) = {(u1 ..., uk) | f(u1)=u’1,..., f(uk)=u’k} One way to get tight embedding is canonical abstraction
18
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 [Sagiv, Reps, Wilhelm, TOPLAS02]
19
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 Top [Sagiv, Reps, Wilhelm, TOPLAS02]
20
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 Top
21
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 Top
22
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 Top
23
Canonical Abstraction
Top u1 u1 u2 u2 u3 u3 Top
24
Canonical Abstraction ()
Merge all nodes with the same unary predicate values into a single summary node Join predicate values ’(u’1 ,..., u’k) = { (u1 ,..., uk) | f(u1)=u’1 ,..., f(uk)=u’k } Converts a state of arbitrary size into a 3-valued abstract state of bounded size (C) = { (c) | c C }
25
Information Loss Top n Canonical abstraction Top Top n Top Top Top n
26
Instrumentation Predicates
Record additional derived information via predicates rx(v) = v1: x(v1) n*(v1,v) c(v) = v1: n(v1, v) n*(v, v1) Canonical abstraction Top n Top c Top n c Top
27
Embedding Theorem: Conservatively Observing Properties
Top rTop rTop No Cycles No cycles (derived) 1/2 1 v1,v2: n(v1, v2) n*(v2, v1) v:c(v)
28
Operational Semantics
void push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x; } Top n u1 u2 u3 int pop() { if (Top == NULL) return EMPTY; Node *s = Top->n; int r = Top->d; Top = s; return r; } s’(v) = v1: Top(v1) n(v1,v) s = Top->n Top n u1 u2 u3 s
29
? Abstract Semantics s = Topn s’(v) = v1: Top(v1) n(v1,v) Top Top
rTop s Top rTop rTop s’(v) = v1: Top(v1) n(v1,v) s = Top->n
30
Best Transformer (s = Topn)
rTop Top rTop Concrete Semantics Top s rTop Top rTop s’(v) = v1: Top(v1) n(v1,v) . . Canonical Abstraction ? Top s rTop Abstract Semantics Top s rTop Top rTop rTop
31
Semantic Reduction Improve the precision of the analysis by recovering properties of the program semantics A Galois connection (C, , , A) An operation op:AA is a semantic reduction when lL2 op(l)l and (op(l)) = (l) l op C A
32
The Focus Operation Focus: Formula((3-Struct) (3-Struct))
Generalizes materialization For every formula Focus()(X) yields structure in which evaluates to a definite values in all assignments Only maximal in terms of embedding Focus() is a semantic reduction But Focus()(X) may be undefined for some X
33
Partial Concretization Based on Transformer (s=Topn)
rTop Top rTop Abstract Semantics Top rTop Top s rTop s’(v) = v1: Top(v1) n(v1,v) Canonical Abstraction Focus (Topn) Partial Concretization u: top(u) n(u, v) Top s rTop Abstract Semantics Top rTop Top rTop s
34
Partial Concretization
Locally refine the abstract domain per statement Soundness is immediate Employed in other shape analysis algorithms [Distefano et.al., TACAS’06, Evan et.al., SAS’07, POPL’08]
35
The Coercion Principle
Another Semantic Reduction Can be applied after Focus or after Update or both Increase precision by exploiting some structural properties possessed by all stores (Global invariants) Structural properties captured by constraints Apply a constraint solver
36
Apply Constraint Solver
Top rTop Top rTop x Top rTop x x rx rx,ry n y x rx rx,ry n y
37
Sources of Constraints
Properties of the operational semantics Domain specific knowledge Instrumentation predicates User supplied
38
Example Constraints x(v1) x(v2)eq(v1, v2)
n(v, v1) n(v,v2)eq(v1, v2) n(v1, v) n(v2,v)eq(v1, v2)is(v) n*(v1, v2)t[n](v1, v2)
39
Abstract Transformers: Summary
Kleene evaluation yields sound solution Focus is a statement-specific partial concretization Coerce applies global constraints
40
Abstract Semantics { t_embed(coerce(st(w)3(focusF(w)(SS[w] ))))
if v = entry { <,> } { t_embed(coerce(st(w)3(focusF(w)(SS[w] )))) (w,v) E(G), w Assignments(G) { S | S SS[w] } othrewise (w,v) E(G), w Skip(G) SS [v] = { t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v) True-Branches(G) { t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v) False-Branches(G)
41
Recap Abstraction Transformers canonical abstraction
recording derived information Transformers partial concretization (focus) constraint solver (coerce) sound information extraction
42
Stack Push … v: x(v) v: x(v) v:c(v) v1,v2: n(v1, v2) Top(v2)
emp … Top void push (int v) { Node x = alloc(sizeof(Node)); xd = v; xn = Top; Top = x; } x x Top v: x(v) v: x(v) x x Top x Top x Top v1,v2: n(v1, v2) Top(v2) Top v:c(v) Top
43
What about procedures?
44
Procedural program void main() { int x; x = p(7); x = p(9); }
int p(int a) { return a + 1; }
45
Effect of procedures foo() bar() call bar() The effect of calling a procedure is the effect of executing its body
46
Interprocedural Analysis
foo() bar() call bar() goal: compute the abstract effect of calling a procedure
47
Reduction to intraprocedural analysis
Procedure inlining Naive solution: call-as-goto
48
Reminder: Constant Propagation
Variable not a constant -1 1 - … … No information Z
49
Reminder: Constant Propagation
L = (Var Z , ) 1 2 iff x: 1(x) ’ 2(x) ’ ordering in the Z lattice Examples: [x , y 42, z ] [x , y 42, z 73] [x , y 42, z 73] [x , y 42, z ] 13
50
Reminder: Constant Propagation
Conservative Solution Every detected constant is indeed constant But may fail to identify some constants Every potential impact is identified Superfluous impacts
51
Procedure Inlining void main() { int x; x = p(7); x = p(9); }
int p(int a) { return a + 1; }
52
Procedure Inlining void main() { int x; x = p(7); x = p(9); }
int p(int a) { return a + 1; } void main() { int a, x, ret; a = 7; ret = a+1; x = ret; a = 9; ret = a+1; x = ret; } [a ⊥, x ⊥, ret ⊥] [a 7, x 8, ret 8] [a 9, x 10, ret 10]
53
Procedure Inlining Pros Cons Simple Does not handle recursion
Exponential blow up Reanalyzing the body of procedures p1 { call p2 … } p2 { call p3 … } p3{ }
54
A Naive Interprocedural solution
Treat procedure calls as gotos
55
Simple Example int p(int a) { return a + 1; } void main() { int x ;
x = p(7); x = p(9) ; }
56
Simple Example int p(int a) { [a 7] return a + 1; } void main() {
int x ; x = p(7); x = p(9) ; }
57
Simple Example int p(int a) { [a 7] return a + 1; [a 7, $$ 8] }
void main() { int x ; x = p(7); x = p(9) ; }
58
Simple Example int p(int a) { [a 7] return a + 1; [a 7, $$ 8] }
void main() { int x ; x = p(7); [x 8] x = p(9) ; }
59
Simple Example int p(int a) { [a 7] return a + 1; [a 7, $$ 8] }
void main() { int x ; x = p(7); [x 8] x = p(9) ; }
60
Simple Example int p(int a) { [a 7] [a 9] return a + 1;
} void main() { int x ; x = p(7); [x 8] x = p(9) ; }
61
Simple Example int p(int a) { [a ] return a + 1; [a 7, $$ 8] }
void main() { int x ; x = p(7); [x 8] x = p(9) ; }
62
Simple Example int p(int a) { [a ] return a + 1; [a , $$ ] }
void main() { int x ; x = p(7); [x 8] x = p(9); }
63
Simple Example int p(int a) { [a ] return a + 1; [a , $$ ] }
void main() { int x ; x = p(7) ; [x ] x = p(9) ; }
64
A Naive Interprocedural solution
Treat procedure calls as gotos Pros: Simple Usually fast Cons: Abstract call/return correlations Obtain a conservative solution
65
why was the naive solution less precise?
analysis by reduction Procedure inlining Call-as-goto void main() { int x ; x = p(7) ; [x ] x = p(9) ; } int p(int a) { [a ] return a + 1; [a , $$ ] } void main() { int a, x, ret; a = 7; ret = a+1; x = ret; a = 9; ret = a+1; x = ret; } [a ⊥, x ⊥, ret ⊥] [a 7, x 8, ret 8] [a 9, x 10, ret 10] why was the naive solution less precise?
66
Stack regime R P P() { … R(); } Q() { … R(); } R(){ … } call call
return return R P
67
Guiding light Exploit stack regime Precision Efficiency
Main idea / guiding light
68
Simplifying Assumptions
Parameter passed by value No procedure nesting No concurrency Recursion is supported
69
Topics Covered Procedure Inlining The naive approach Valid paths
The callstring approach The Functional Approach IFDS: Interprocedural Analysis via Graph Reachability IDE: Beyond graph reachability The trivial modular approach
70
Join-Over-All-Paths (JOP)
Let paths(v) denote the potentially infinite set paths from start to v (written as sequences of edges) For a sequence of edges [e1, e2, …, en] define f [e1, e2, …, en]: L L by composing the effects of basic blocks f [e1, e2, …, en](l) = f(en) (… (f(e2) (f(e1) (l)) …) JOP[v] = {f [e1, e2, …,en]() | [e1, e2, …, en] paths(v)}
71
Join-Over-All-Paths (JOP)
Paths transformers: f[e1,e2,e3,e4] f[e1,e2,e7,e8] f[e5,e6,e7,e8] f[e5,e6,e3,e4] f[e1,e2,e3,e4,e9, e1,e2,e3,e4] f[e1,e2,e7,e8,e9, e1,e2,e3,e4,e9,…] … e1 e5 e9 e2 e6 e3 e7 JOP: f[e1,e2,e3,e4](initial) f[e1,e2,e7,e8](initial) f[e5,e6,e7,e8](initial) f[e5,e6,e3,e4](initial) … e4 e8 Number of program paths is unbounded due to loops 9
72
The lfp computation approximates JOP
JOP[v] = {f [e1, e2, …,en]() | [e1, e2, …, en] paths(v)} LFP[v] = {f [e](LFP[v’]) | e =(v’,v)} JOP LFP - for a monotone function f(x y) f(x) f(y) JOP = LFP - for a distributive function f(x y) = f(x) f(y) LFP[v0] = JOP may not be precise enough for interprocedural analysis!
73
Interprocedural analysis
Q P R Entry node sP sQ sR Call node fc2e f1 Call node fc2e f1 f1 f1 n4 n6 n1 n2 Call Q Call Q f2 f3 f1 n5 n3 n7 Return node Return node fx2r f5 f3 fx2r f6 eP eR eQ Supergraph Exit node
74
Paths paths(n) the set of paths from s to n
( (s,n1), (n1,n3), (n3,n1) ) f1 f1 n1 n2 f2 f3 f1 n3 f3 e
75
Interprocedural Valid Paths
f1 callq ret f2 fk-1 fk ( ) f3 fk-2 enterq exitq f4 fk-3 f5 IVP: all paths with matching calls and returns And prefixes
76
Interprocedural Valid Paths
IVP set of paths Start at program entry Only considers matching calls and returns aka, valid Can be defined via context free grammar matched ::= matched (i matched )i | ε valid ::= valid (i matched | matched paths can be defined by a regular expression
77
Join Over All Paths (JOP)
start n ⟦fk o ... o f1⟧ L L JOP[v] = {[[e1, e2, …,en]]() | (e1, …, en) paths(v)} JOP LFP Sometimes JOP = LFP precise up to “symbolic execution” Distributive problem
78
The Join-Over-Valid-Paths (JVP)
vpaths(n) all valid paths from program start to n JVP[n] = {[[e1, e2, …, e]]() | (e1, e2, …, e) vpaths(n)} JVP JOP In some cases the JVP can be computed (Distributive problem)
79
The Call String Approach
The data flow value is associated with sequences of calls (call string) Use Chaotic iterations over the supergraph
80
supergraph Q P R sP sQ sR n4 n6 n1 n2 Call Q Call Q n5 n3 n7 eP eR eQ
Entry node sP sQ sR Call node fc2e f1 Call node fc2e f1 f1 f1 n4 n6 n1 n2 Call Q Call Q f2 f3 f1 n5 n3 n7 Return node Return node fx2r f5 f3 fx2r f6 eP eR eQ Exit node
81
Simple Example int p(int a) { void main() { int x ; return a + 1;
} void main() { int x ; c1: x = p(7); c2: x = p(9) ; }
82
Simple Example int p(int a) { c1: [a 7] return a + 1; } void main() {
int x ; c1: x = p(7); c2: x = p(9) ; }
83
Simple Example int p(int a) { c1: [a 7] return a + 1;
} void main() { int x ; c1: x = p(7); c2: x = p(9) ; }
84
Simple Example int p(int a) { c1: [a 7] return a + 1;
} void main() { int x ; c1: x = p(7); ε: x 8 c2: x = p(9) ; }
85
Simple Example int p(int a) { c1:[a 7] return a + 1; c1:[a 7, $$ 8]
} void main() { int x ; c1: x = p(7); ε: [x 8] c2: x = p(9) ; }
86
Simple Example int p(int a) { c1:[a 7] c2:[a 9] return a + 1;
} void main() { int x ; c1: x = p(7); ε : [x 8] c2: x = p(9) ; }
87
Simple Example int p(int a) { c1:[a 7] c2:[a 9] return a + 1;
} void main() { int x ; c1: x = p(7); ε : [x 8] c2: x = p(9) ; }
88
Simple Example int p(int a) { c1:[a 7] c2:[a 9] return a + 1;
} void main() { int x ; c1: x = p(7); ε : [x 8] c2: x = p(9) ; ε : [x 10] }
89
The Call String Approach
The data flow value is associated with sequences of calls (call string) Use Chaotic iterations over the supergraph To guarantee termination limit the size of call string (typically 1 or 2) Represents tails of calls Abstract inline
90
Another Example (|cs|=2)
void main() { int x ; c1: x = p(7); ε : [x 16] c2: x = p(9) ; ε :: [x 20] } int p(int a) { c1:[a 7] c2:[a 9] return c3: p1(a + 1); c1:[a 7, $$ 16] c2:[a 9, $$ 20] } int p1(int b) { c1.c3:[b 8] c2.c3:[b 10] return 2 * b; c1.c3:[b 8,$$16] c2.c3:[b 10,$$20] }
91
Another Example (|cs|=1)
void main() { int x ; c1: x = p(7); ε : [x ] c2: x = p(9) ; } int p(int a) { c1:[a 7] c2:[a 9] return c3: p1(a + 1); c1:[a 7, $$ ] c2:[a 9, $$ ] } int p1(int b) { (c1|c2)c3:[b ] return 2 * b; (c1|c2)c3:[b , $$] }
92
Handling Recursion void main() { c1: p(7); ε : [x ] } int p(int a) {
c1: [a 7] c1.c2+: [a ] if (…) { c1: [a 7] c1.c2+: [a ] a = a -1 ; c1: [a 6] c1.c2+: [a ] c2: p (a); c1.c2*: [a ] a = a + 1; } x = -2*a + 5; c1.c2*: [a , x]
93
Summary Call String Easy to implement
Efficient for very small call strings Limited precision Often loses precision for recursive programs For finite domains can be precise even with recursion (with a bounded callstring) Order of calls can be abstracted Related method: procedure cloning
94
The Functional Approach
The meaning of a procedure is mapping from states into states The abstract meaning of a procedure is function from an abstract state to abstract states Relation between input and output In certain cases can compute JVP
95
The Functional Approach
Two phase algorithm Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values) Compute the dataflow values at every point using the functional values
96
Phase 1 void main() { int p(int a) { p(7); if (…) { [a a0, x x0] }
p (a); a = a + 1; } x = -2*a + 5; [a a0, x x0] [a a0, x x0] [a a0-1, x x0] [a a0-1, x -2a0+7] [a a0, x -2a0+7] [a a0, x x0] [a a0, x ] [a a0, x -2*a0+5] p(a0,x0) = [a a0, x -2a0 + 5]
97
Phase 2 void main() { p(7); } int p(int a) { if (…) { a = a -1 ;
p (a); a = a + 1; } x = -2*a + 5; [a 7, x 0] [a , x 0] [x -9] [a 7, x 0] [a , x 0] [a 6, x 0] [a 6, x 0] [a , x 0] [a 6, x -7] [a , x ] [a 7, x -7] [a , x ] p(a0,x0) = [a a0, x -2a0 + 5] [a 7, x 0] [a , x ] [a 7, x -9] [a , x ]
98
Summary Functional approach
Computes procedure abstraction Sharing between different contexts Rather precise Recursive procedures may be more precise/efficient than loops But requires more from the implementation Representing (input/output) relations Composing relations
99
Issues in Functional Approach
How to guarantee that finite height for functional lattice? It may happen that L has finite height and yet the lattice of monotonic function from L to L do not Efficiently represent functions Functional join Functional composition Testing equality
100
Tabulation fstart,start= (,) ; otherwise (,)
Special case: L is finite Data facts: d L L Initialization: fstart,start= (,) ; otherwise (,) S[start, ] = Propagation of (x,y) over edge e = (n,n’) Maintain summary: S[n’,x] = S[n’,x] n (y)) n intra-node: n’: (x, n (y)) n call-node: n’: (y,y) if S[n’,y] = and n’= entry node n’: (x,z) if S[exit(call(n),y] = z and n’= ret-site-of n n return-node: n’: (u,y) ; nc = call-site-of n’, S[nc,u]=x
101
CFL-Graph reachability
Special cases of functional analysis Finite distributive lattices Provides more efficient analysis algorithms Reduce the interprocedural analysis problem to finding context free reachability
102
IDFS / IDE IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Reps, Horowitz, and Sagiv, FASE’95, TCS’96 More general solutions exist
103
Possibly Uninitialized Variables
{} Start {w,x,y} x = 3 {w,y} if . . . {w,y} y = x {w,y} y = w {w} w = 8 {w,y} {} printf(y) {w,y}
104
IFDS Problems Finite subset distributive
Lattice L = (D) is is Transfer functions are distributive Efficient solution through formulation as CFL reachability
105
Encoding Transfer Functions
Enumerate all input space and output space Represent functions as graphs with 2(D+1) nodes Special symbol “0” denotes empty sets (sometimes denoted ) Example: D = { a, b, c } f(S) = (S – {a}) U {b} a b c 207 a b c
106
Efficiently Representing Functions
Let f:2D2D be a distributive function Then: f(X) = f() ( { f({z}) | z X }) f(X) = f() ( { f({z}) \ f() | z X })
107
Representing Dataflow Functions
b c Identity Function a b c Constant Function
108
Representing Dataflow Functions
b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function
109
x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p
110
Composing Dataflow Functions
b c a b c a a b c
111
( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3
Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p
112
The Tabulation Algorithm
Worklist algorithm, start from entry of “main” Keep track of Path edges: matched paren paths from procedure entry Summary edges: matched paren call-return paths At each instruction Propagate facts using transfer functions; extend path edges At each call Propagate to procedure entry, start with an empty path If a summary for that entry exits, use it At each exit Store paths from corresponding call points as summary paths When a new summary is added, propagate to the return node
113
Interprocedural Dataflow Analysis via CFL-Reachability
Graph: Exploded control-flow graph L: L(unbalLeft) unbalLeft = valid Fact d holds at n iff there is an L(unbalLeft)-path from
114
Asymptotic Running Time
CFL-reachability Exploded control-flow graph: ND nodes Running time: O(N3D3) Exploded control-flow graph Special structure Running time: O(ED3) Typically: E N, hence O(ED3) O(ND3) “Gen/kill” problems: O(ED)
115
IDE Goes beyond IFDS problems Requires special form of the domain
Can handle unbounded domains Requires special form of the domain Can be much more efficient than IFDS
116
Example Linear Constant Propagation
Consider the constant propagation lattice The value of every variable y at the program exit can be represented by: y = {(axx + bx )| x Var* } c ax ,c Z {, } bx Z Supports efficient composition and “functional” join [z := a * y + b] What about [z:=x+y]?
117
Linear constant propagation
Point-wise representation of environment transformers
118
IDE Analysis Point-wise representation closed under composition
CFL-Reachability on the exploded graph Compose functions
121
Costs O(ED3) Class of value transformers F LL
idF Finite height Representation scheme with (efficient) Application Composition Join Equality Storage
122
Conclusion Handling functions is crucial for abstract interpretation
Virtual functions and exceptions complicate things But scalability is an issue Small call strings Small functional domains Demand analysis
123
Challenges in Interprocedural Analysis
Respect call-return mechanism Handling recursion Local variables Parameter passing mechanisms The called procedure is not always known The source code of the called procedure is not always available
124
A trivial treatment of procedure
Analyze a single procedure After every call continue with conservative information Global variables and local variables which “may be modified by the call” have unknown values Can be easily implemented Procedures can be written in different languages Procedure inline can help
125
Disadvantages of the trivial solution
Modular (object oriented and functional) programming encourages small frequently called procedures Almost all information is lost
126
Bibliography Textbook 2.5
Patrick Cousot & Radhia Cousot. Static determination of dynamic properties of recursive procedures In IFIP Conference on Formal Description of Programming Concepts, E.J. Neuhold, (Ed.), pages , St-Andrews, N.B., Canada, North-Holland Publishing Company (1978). Two Approaches to interprocedural analysis by Micha Sharir and Amir Pnueli IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Sagiv, Reps, Horowitz, and TCS’96
127
A Semantics for Procedure Local Heaps and its Abstractions
Noam Rinetzky Tel Aviv University Jörg Bauer Universität des Saarlandes Thomas Reps University of Wisconsin Mooly Sagiv Tel Aviv University Reinhard Wilhelm Universität des Saarlandes
128
Motivation Interprocedural shape analysis Challenge
Conservative static pointer analysis Heap intensive programs Imperative programs with procedures Recursive data structures Challenge Destructive update Localized effect of procedures
129
Main idea Local heaps x x x y t g x call p(x); y g t
130
Main idea Local heaps Cutpoints x x x y t g x call p(x); y g t
131
Main Results Concrete operational semantics Abstractions Large step
Functional analysis Storeless Shape abstractions Local heap Observationally equivalent to “standard” semantics Java and “clean” C Abstractions Shape analysis [Sagiv, Reps, Wilhelm, TOPLAS ‘02] May-alias [Deutsch, PLDI ‘94] …
132
Outline Motivating example Why semantics
Local heaps Cutpoints Why semantics Local heap storeless semantics Shape abstraction
133
Example … static void main() { } static List reverse(List t) { } n t n
q n q List x = reverse(p); n p x t r n t r n List y = reverse(q); List z = reverse(x); return r;
134
Example static void main() { } static List reverse(List t) { }
List x = reverse(p); n t n p x n p x n t n n q List y = reverse(q); n t r n t r n q y List z = reverse(x); return r;
135
Example n t n static void main() { } static List reverse(List t) { }
List x = reverse(p); n t List y = reverse(q); n p x n p x n t q y n q y n List z = reverse(x); p n x z t r n t r n return r;
136
Cutpoints Separating objects Not pointed-to by a parameter
137
Cutpoints proc(x) Separating objects Not pointed-to by a parameter
Stack sharing
138
Cutpoints proc(x) proc(x) Separating objects
Not pointed-to by a parameter proc(x) proc(x) n n n n n n n n n n x p x n n y Stack sharing Heap sharing
139
Cutpoints proc(x) proc(x) Separating objects
Not pointed-to by a parameter Capture external sharing patterns proc(x) proc(x) n n n n n n n n n n x p x n n y Stack sharing Heap sharing
140
Example static void main() { } static List reverse(List t) { }
List x = reverse(p); List y = reverse(q); n p x n t n n n p x q y n q y n List z = reverse(x); p n z x r t n r t n return r;
141
Outline Motivating example Why semantics
Local heap storeless semantics Shape abstraction
142
Abstract Interpretation [Cousot and Cousot, POPL ’77]
Operational semantics Abstract transformer
143
Introducing local heap semantics
Operational semantics ~ Local heap Operational semantics ’ ’ Abstract transformer
144
Outline Motivating example Why semantics
Local heap storeless semantics Shape abstraction
145
Programming model Single threaded Procedures Heap Value parameters
Recursion Heap Recursive data structures Destructive update No explicit addressing (&) No pointer arithmetic
146
Simplifying assumptions
No primitive values (only references) No globals Formals not modified
147
Storeless semantics No addresses Memory state: y=x Alias analysis
Object: 2Access paths Heap: 2Object Alias analysis x n x x.n x.n.n y=x x n y x.n y.n x.n.n y.n.n x=null y n y.n y.n.n
148
Example p? static void main() { } static List reverse(List t) {
return r; } List x = reverse(p); t n List y = reverse(q); t.n.n.n t.n.n t.n t t.n.n n t.n.n.n t t.n n n n p x.n.n.n p x.n.n x.n x x y.n.n q n y y.n y.n.n q n y y.n List z = reverse(x); z.n n z x z.n.n.n z.n.n z x r.n n r t r.n.n.n r.n.n r.n n r t r.n.n.n r.n.n p?
149
Example static void main() { } static List reverse(List t) { return r;
List x = reverse(p); L t n List y = reverse(q); t.n.n.n L t.n.n t.n t t.n.n n t.n.n.n L t t.n n n n p x.n.n.n p x.n.n x.n x x y.n.n q n y y.n y.n.n q n y y.n List z = reverse(x); p.n z.n n p z x p.n.n.n z.n.n.n p.n.n z.n.n p z x p.n p p.n.n p.n.n.n L.n r.n n L r t L.n.n.n r.n.n.n L.n.n r.n.n t L.n r.n n L r t L.n.n.n r.n.n.n L.n.n r.n.n t
150
Cutpoint labels Relate pre-state with post-state Additional roots
Mark cutpoints at and throughout an invocation
151
Cutpoint labels L {t.n.n.n} t L
Cutpoint label: the set of access paths that point to a cutpoint when the invoked procedure starts t.n.n.n L t.n.n t.n t L t L {t.n.n.n}
152
Sharing patterns L {t.n.n.n} Cutpoint labels encode sharing patterns
w.n w w p Stack sharing Heap sharing L {t.n.n.n}
153
Observational equivalence
L L (Local-heap Storeless Semantics) G G (Global-heap Store-based Semantics) L and G observationally equivalent when for every access paths AP1, AP2 AP1 = AP2 (L) AP1 = AP2 (G)
154
Main theorem: semantic equivalence
L L (Local-heap Storeless Semantics) G G (Global-heap Store-based Semantics) L and G observationally equivalent st, L ’L st, G ’G LSL GSB ’L and ’G are observationally equivalent
155
Corollaries Preservation of invariants Detection of memory leaks
Assertions: AP1 = AP2 Detection of memory leaks
156
Applications Develop new static analyses
Shape analysis Justify soundness of existing analyses
157
Related work Storeless semantics Jonkers, Algorithmic Languages ‘81
Deutsch, ICCL ‘92
158
Shape abstraction Shape descriptors represent unbounded memory states
Conservatively In a bounded way Two dimensions Local heap (objects) Sharing pattern (cutpoint labels)
159
A Shape abstraction L={t.n.n.n} r n n n t L r L r.n L.n r.n.n L.n.n
t, r.n.n.n L.n.n.n t L
160
A Shape abstraction L=* r n n n t L r L r.n L.n r.n.n L.n.n t, r.n.n.n
L.n.n.n t L
161
A Shape abstraction L t L=* n L=* r n n n t L r t, r.n L.n r.n r L r.n
162
A Shape abstraction L r t, r.n L.n r.n t L=* n
163
A Shape abstraction L={t.n.n.n} r n n n t L L=* n t L r L r.n L.n
L.n.n t, r.n.n.n L.n.n.n t L L r t, r.n L.n r.n t L=* n
164
A Shape abstraction L1={t.n.n.n} L2={g.n.n.n} d n n n g L2 r n n n t
L2.n d.n.n L2.n.n g, d.n.n.n L2.n.n.n g L2 r n n n r L1 r.n L1.n r.n.n L1.n.n t, r.n.n.n L1.n.n.n t L1 L=* n d n n d L d.n L.n t, d.n L.n t n L n n r L r.n L.n t, r.n L.n t r
165
Cutpoint-Freedom
166
How to tabulate procedures?
Procedure input/output relation Not reachable Not effected proc: local (reachable) heap local heap main() { append(y,z); } p q append(List p, List q) { … } p q x n t x n t n y z y n z p q n n
167
How to handle sharing? External sharing may break the functional view
main() { append(y,z); } p q n append(List p, List q) { … } p q n n n y t t x z y n z p q n x n
168
What’s the difference? n n n n x y append(y,z); append(y,z); t t y z x
1st Example 2nd Example append(y,z); append(y,z); x n t n n n y y t z x z
169
Cutpoints An object is a cutpoint for an invocation
Reachable from actual parameters Not pointed to by an actual parameter Reachable without going through a parameter append(y,z) append(y,z) n n n n y y n n t t x z z
170
Cutpoint freedom Cutpoint-free Invocation: has no cutpoints
Execution: every invocation is cutpoint-free Program: every execution is cutpoint-free append(y,z) append(y,z) n n n n y x t x t y z z
171
Interprocedural shape analysis for cutpoint-free programs using 3-Valued Shape Analysis
172
Memory states: 2-Valued Logical Structure
A memory state encodes a local heap Local variables of the current procedure invocation Relevant part of the heap Relevant Reachable main append p q n n x t y z
173
Memory states Represented by first-order logical structures Predicate
Meaning x(v) Variable x points to v n(v1,v2) Field n of object v1 points to v2
174
Memory states Represented by first-order logical structures Predicate
Meaning x(v) Variable x points to v n(v1,v2) Field n of object v1 points to v2 p u1 1 u2 q u1 u2 1 n u1 u2 1 p q n u1 u2
175
Operational semantics
Statements modify values of predicates Specified by predicate-update formulae Formulae in FO-TC
176
Procedure calls append(p,q) 1. Verify cutpoint freedom Compute input
z x n p q 1. Verify cutpoint freedom Compute input … Execute callee … 3 Combine output append(y,z) append body p n q y z x n
177
Procedure call: 1. Verifying cutpoint-freedom
An object is a cutpoint for an invocation Reachable from actual parameters Not pointed to by an actual parameter Reachable without going through a parameter append(y,z) append(y,z) n n n n y y x n z x z n t t Cutpoint free Not Cutpoint free
178
Procedure call: 1. Verifying cutpoint-freedom
Invoking append(y,z) in main R{y,z}(v)=v1:y(v1)n*(v1,v) v1:z(v1)n*(v1,v) isCPmain,{y,z}(v)= R{y,z}(v) (y(v)z(v1)) ( x(v) t(v) v1: R{y,z}(v1)n(v1,v)) (main’s locals: x,y,z,t) n n n n y y x n z x z n t t Cutpoint free Not Cutpoint free
179
Procedure call: 2. Computing the input local heap
Retain only reachable objects Bind formal parameters Call state p n q Input state n n y x z n t
180
Procedure body: append(p,q)
Input state Output state p n q n n p q
181
Procedure call: 3. Combine output
Call state Output state n n p n q n y x z n t
182
Procedure call: 3. Combine output
Call state Output state n n n n n y p x z q n t Auxiliary predicates y n z x t inUc(v) inUx(v)
183
Observational equivalence
CPF CPF (Cutpoint free semantics) GSB GSB (Standard semantics) CPF and GSB observationally equivalent when for every access paths AP1, AP2 AP1 = AP2 (CPF) AP1 = AP2 (GSB)
184
Observational equivalence
For cutpoint free programs: CPF CPF (Cutpoint free semantics) GSB GSB (Standard semantics) CPF and GSB observationally equivalent It holds that st, CPF ’CPF st, GSB ’GSB ’CPF and ’GSB are observationally equivalent
185
Introducing local heap semantics
Operational semantics ~ Local heap Operational semantics ’ ’ Abstract transformer
186
Shape abstraction Abstract memory states represent unbounded concrete memory states Conservatively In a bounded way Using 3-valued logical structures
187
3-Valued logic 1 = true 0 = false 1/2 = unknown
A join semi-lattice, 0 1 = 1/2
188
Canonical abstraction
y z n n n n n x n n t y z x n t
189
Instrumentation predicates
Record derived properties Refine the abstraction Instrumentation principle [SRW, TOPLAS’02] Reachability is central! Predicate Meaning rx(v) v is reachable from variable x robj(v1,v2) v2 is reachable from v1 ils(v) v is heap-shared c(v) v resides on a cycle
190
Abstract memory states (with reachability)
z n n n n n x rx rx rx rx rx rx rx,ry rx,ry rz rz rz rz rz rz n n t rt rt rt rt rt rt y rx rx,ry n z rz x rt t
191
The importance of reachability: Call append(y,z)
x rx rx rx rx,ry rz rz rz n n t rt rt rt y z n y z x n t n n n n x rx rx rx,ry rz rz n n t rt rt
192
Abstract semantics Conservatively apply statements on abstract memory states Same formulae as in concrete semantics Soundness guaranteed [SRW, TOPLAS’02]
193
Procedure calls append(p,q) 1. Verify cutpoint freedom Compute input
z x n p q 1. Verify cutpoint freedom Compute input 2 … Execute callee … 3 Combine output append(y,z) append body p n q y z x n
194
Conservative verification of cutpoint-freedom
Invoking append(y,z) in main R{y,z}(v)=v1:y(v1)n*(v1,v) v1:z(v1)n*(v1,v) isCPmain,{y,z}(v)= R{y,z}(v) (y(v)z(v1)) ( x(v) t(v) v1: R{y,z}(v1)n(v1,v)) n n n n n y ry ry rz y ry ry ry rx rz n n n z n x z rt rt t rt rt t Cutpoint free Not Cutpoint free
195
Interprocedural shape analysis
Tabulation exits p p call f(x) y x x y
196
Interprocedural shape analysis
Analyze f p p Tabulation exits p p call f(x) y x x y
197
Interprocedural shape analysis
Procedure input/output relation Input Output q q rq rq p q p q n rp rq rp rq rp p n n … q p q n n n rp rp rq rp rp rp rq
198
Interprocedural shape analysis
Reusable procedure summaries Heap modularity q p rp rq n y n z x append(y,z) rx ry rz y x z n rx ry rz append(y,z) g h i k n rg rh ri rk append(h,i)
199
Plan Cutpoint freedom Non-standard concrete semantics
Interprocedural shape analysis Prototype implementation
200
Prototype implementation
TVLA based analyzer Soot-based Java front-end Parametric abstraction Data structure Verified properties Singly linked list Cleanness, acyclicity Sorting (of SLL) + Sortedness Unshared binary trees Cleaness, tree-ness
201
Iterative vs. Recursive (SLL)
585
202
Inline vs. Procedural abstraction
// Allocates a list of // length 3 List create3(){ … } main() { List x1 = create3(); List x2 = create3(); List x3 = create3(); List x4 = create3();
203
Call string vs. Relational vs
Call string vs. Relational vs. CPF [Rinetzky and Sagiv, CC’01] [Jeannet et al., SAS’04]
204
Summary Cutpoint freedom Non-standard operational semantics
Interprocedural shape analysis Partial correctness of quicksort Prototype implementation
205
Application Properties proved Recursive Iterative
Absence of null dereferences Listness preservation API conformance Recursive Iterative Procedural abstraction
206
Related Work Interprocedural shape analysis Local Reasoning
Rinetzky and Sagiv, CC ’01 Chong and Rugina, SAS ’03 Jeannet et al., SAS ’04 Hackett and Rugina, POPL ’05 Rinetzky et al., POPL ‘05 Local Reasoning Ishtiaq and O’Hearn, POPL ‘01 Reynolds, LICS ’02 Encapsulation Noble et al. IWACO ’03 ...
207
Summary Operational semantics Applications Storeless Local heap
Cutpoints Equivalence theorem Applications Shape analysis May-alias analysis
208
Project 1-2 Students in a group Theoretical + Practical
3-4: Bigger projects Theoretical + Practical Your choice of topic Contact me in 2 weeks Submission – 15/Sep Code + Examples Document 20 minutes presentation
209
Past projects JavaScript Dominator Analysis
Attribute Analysis for JavaScript Simple Pointer Analysis for C Adding program counters to Past Abstraction Verification of Asynchronous programs Verifying SDNs using TVLA Verifying independent accesses to arrays in GO
210
Past projects Detecting index out of bound errors in C programs
Lattice-Based Semantics for Combinatorial Models Evolution
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.