Interprocedural Analysis Noam Rinetzky Mooly Sagiv Tel Aviv University Textbook Chapter 2.5
Outline u Challenges in interprocedural analysis u The trivial solution u Why isn’t it adequate u Simplifying assumptions u A naive solution u Join over valid paths u The call-string approach u The functional approach –A case study linear constant propagation –Context free reachability u Modularity issues u Other solutions
Challenges in Interprocedural Analysis u Respect call-return mechanism u Handling recursion u Local variables u Parameter passing mechanisms: value, value- result, reference, by name u Procedure nesting u The called procedure is not always known u The source code of the called procedure is not always available –separate compilation –vendor code –...
Extended Syntax of While a := x | n | a 1 op a a 2 b := true | false | not b | b 1 op b b 2 | a 1 op r a 2 S := [x := a] l | [call p(a, z)] l l’ | [skip] l | S 1 ; S 2 | if [b] l then S 1 else S 2 | while [b] l do S P := begin D S end D := proc id(val id*, res id*) is l S end l’ | D D
A Trivial treatment of procedures u Analyze a single procedure u After every call continue with conservative information –Global variables and local variables which “may be modified by the call” are mapped to
A Trivial treatment of procedures begin proc p() is 1 [x := 1] 2 end 3 [call p()] 4 5 [print x] 6 end [a , x ] [a , x 1] [x 0] [x][x] [x][x]
Advantages of the trivial solution u Can be easily implemented u Procedures can be written in different languages u Procedure inline can help u Side-effect analysis can help
Disadvantages of the trivial solution u Modular (object oriented and functional) programming encourages small frequently called procedures u Optimization –Modern machines allows the compiler to schedule many instructions in parallel –Need to optimize many instructions –Inline can be a bad solution u Software engineering – Many bugs result from interface misuse –Procedures define partial functions
Simplifying Assumptions u All the code is available u Simple parameter passing u The called procedure is syntactically known u No nesting u Procedure names are syntactically different from variables u Procedures are uniquely defined u Recursion is supported
Constant Example begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end
A naive Interprocedural solution u Treat procedure calls as gotos u Obtain a conservative solution u Find the least fixed point of the system: u Use Chaotic iterations DF entry (s) = DF entry (v) = {f(e)(DFentry(u) : (u, v) E}
Simple Example begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end proc p x=a+1 end a=7 call p 5 call p 6 print x a=9 call p 9 call p 10 print a [x 0, a 0] [x 0, a 7] [x 8, a 7] [x 8, a 9] [x , a ]
Simple Example begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end proc p x=a+1 end a=7 call p 5 call p 6 print x a=9 call p 9 call p 10 print a [x 0, a 0] [x 0, a 7] [x , a ] [x , a 9] [x , a ]
We want something better … u Let paths(v) denote the potentially infinite set paths from start to v (written as sequences of labels) u For a sequence of edges [e 1, e 2, …, e n ] define f [e 1, e 2, …, e n ]: L L by composing the effects of basic blocks f [e 1, e 2, …, e n ](l) = f(e n ) (… (f(e 2 ) (f(e 1 ) (l)) …) u JOP[v] = {f[e 1, e 2, …,e n ]( ) [e 1, e 2, …, e n ] paths(v)}
Valid Paths () f1f1 f2f2 f k-1 fkfk f3f3 f4f4 f5f5 f k-2 f k-3 call q enter q exit q ret
void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } return; } Invalid Path int x; void main() { x = 5; p(); return; }
A More Precise Solution u Only considers matching calls and returns (valid) u Can be defined via context free grammar u Every call is a different letter u Matching calls and returns Matched | Matched Matched |( c Matched ) c for all [call p()] lc lr in P Valid Matched | l c Valid for all [call p()] lc lr in P
A More Precise Solution u Only considers matching calls and returns (valid) u Can be defined via context free grammar u Every call is a different letter u Matching calls and returns Intra | (l i,l j ) Intra for all l i, l j in Lab*\ Lab IP Matched | Intra | Matched Matched | (l c,l n ) Matched (l x,l r ) for all [call p()] lc lr and p is ln S lx Valid Matched | (l c,l n) Valid for all [call p()] lc lr for all [call p()] lc lr and p is ln S lx Let Lab* = all the labels in the program Lab IP ={l c,l r : [call p()] lc lr in the program}
The Join-Over-Valid-Paths (JVP) u For a sequence of edges [e 1, e 2, …, e n ] define f [e 1, e 2, …, e n ]: L L by composing the effects of basic statements –f[](s)=s –f [e, p](s) = f[p] (f e (s)) u JVP l = {f[e 1, e 2, …, e]( ) [e 1, e 2, …, e] vpaths(l), e = (*,l)} u Compute a safe approximation to JVP u In some cases the JVP can be computed –Distributivity of f –Functional representation
The Call String Approach for Approximating JVP u No assumptions u Record at every node a pair (l, c) where l L is the dataflow information and c is a suffix of unmatched calls u Use Chaotic iterations u To guarantee termination limit the size of c (typically 1 or 2) u Emulates inline (but no code growth) u Exponential in C u For a finite lattice there exists a C which leads to join over all valid paths
Simple Example begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end proc p x=a+1 end a=7 call p 5 call p 6 print x a=9 call p 9 call p 10 print a [x 0, a 0] [x 0, a 7] 5,[x 0, a 7] 5,[x 8, a 7] [x 8, a 7] [x 8, a 9] 9,[x 8, a 9] 9,[x 10, a 9] [x 10, a 9] 5,[x 8, a 7] 9,[x 10, a 9]
begin 0 proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end 13 Recursive Example a=7 Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end 10:[x 0, a 7] [x 0, a 7] [x 0, a 0] 10:[x 0, a 7] 10:[x 0, a 6] 4:[x 0, a 6] 4:[x -7, a 6] 10:[x -7, a 6] 4:[x -7, a 6] 4:[x -7, a 7] 4:[x , a ]
The Functional Approach u The meaning of a function is mapping from states into states u The abstract meaning of a function is function from an abstract state to abstract states
begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end Motivating Example a=7 Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end [x 0, a 7] [x 0, a 0] e.[x -2e(a)+5, a e(a)] [x -9, a 7]
begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [read(a)] 9 ; [call p()] ; [print(x)] 12 end Motivating Example read(a) Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end [x 0, a ] [x 0, a 0] e.[x -2e(a)+5, a e(a)] [x , a ]
The Functional Approach u Main idea: Iterate on the abstract domain of functions from L to L u Two phase algorithm –Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values) –Compute the dataflow values at every point using the functional values u Can compute the JVP
Example: Constant propagation u L = Var N { , } u Domain: F:L L –(f 1 f 2 )(x) = f 1 (x) f 2 (x) x=7 y=x+1 x=y env.env[x 7] env.env[y env(x)+1] Id= env L.env env.env[x 7] ○ env.env env.env[y env(x)+1] ○ env.env[x 7] ○ env.env
Example: Constant propagation u L = Var N { , } u Domain: F:L L –(f 1 f 2 )(x) = f 1 (x) f 2 (x) x=7y=x+1 x=y env.env[x 7] env.env[y env(x)+1] Id= env.env env.env[y env(x)+1] env.env[x 7]
Running Example 1 init a=7 9 Call p 10 Call p 11 print(x) 12 p1p1 If( … ) 2 a=a-1 3 Call p 4 Call p 5 a=a+1 6 x=-2a+5 7 end 8 begin 0 end 13 NFunction 0 e.[x e(x), a e(a)]=id e.
Running Example 1 NFunction 1 e. [x e(x), a e(a)]=id 2id 7 8 e.[x -2e(a)+5, a e(a)] 3id 4 e.[x e(x), a e(a)-1] 5 f 8 ○ e.[x e(x), a e(a)-1] = e.[x -2(e(a)-1)+5, a e(a)-1] 6 7 e.[x -2(e(a)-1)+5, a e(a)] e.[x e(x), a e(a)] 8 a, x.[x -2e(a)+5, a e(a)] a=7 9 Call p 10 Call p 11 print(x) 12 p1p1 If( … ) 2 a=a-1 3 Call p 4 Call p 5 a=a+1 6 x=-2a+5 7 end 8 begin 0 end 13 0 e.[x e(x), a e(a)]=id 10 e.[x e(x), a 7] 11 a, x.[x -2e(a)+5, a e(a)] ○ f 10
Running Example 2 NFunction 1 [x 0, a 7] [x -9, a 7] 3 [x 0, a 7] 4 [x 0, a 6] 1 [x -7, a 6] 6 [x -7, a 7] 7 [x , a 7] 8 [x -9, a 7] 1 [x , a ] a=7 9 Call p 10 Call p 11 print(x) 12 p1p1 If( … ) 2 a=a-1 3 Call p 4 Call p 5 a=a+1 6 x=-2a+5 7 end 8 begin 0 end 13 0 [x 0, a 0] 10 [x 0, a 7] 11 [x -9, a 7]
Issues in Functional Approach u How to guarantee that finite height for functional lattice? –It may happen that L has finite height and yet the lattice of monotonic function from L to L do not u Efficiently represent functions –Functional join –Functional composition –Testing equality –Usually non-trivial –But can be done for distributive functions
Example Linear Constant Propagation u Consider the constant propagation lattice u The value of every variable y at the program exit can be represented by: y = {(a x x + b x )| x Var * } c a x,c Z { , } b x Z u Supports efficient composition and “functional” join –[z := a * y + b] –What about [z:=x+y]? u Computes JVP
Functional Approach via Context Free Reachablity u The problem of computing reachability in a graph restricted by a context free grammar can be solved in cubic time u Can be used to compute JVP in arbitrary finite distributive data flow problems (not just bitvector) u Nodes in the graph correspond to individual facts u Efficient implementations exit (MOPED)
Conclusion u Handling functions is crucial for abstract interpretation u Virtual functions and exceptions complicate things u But scalability is an issue u Assume-guarantee helps –But relies on specifications