Lecture 06 – Inter-procedural Analysis, Eran Yahav


Previously  Verifying absence of buffer overruns  (requires) Heap abstractions  (requires) Combining heap and numerical information 2

Today  LFP computation and join-over-all-paths  Inter-procedural analysis  Acks  Some slides adapted from Noam Rinetzky and Mooly Sagiv  DNA of some slides traces back to Tom Reps 3

Join over all paths (JOP)  Propagate analysis information along paths  A path is a sequence of edges in the CFG [e1, e2, …, en]  Can define the transfer function for a path via transformer composition  let fi denote f(ei)  f[e1, e2, …, en] = fn ∘ … ∘ f2 ∘ f1  Consider the (potentially infinite) set of paths reaching a label in the program from entry. Denote by paths-to-in(entry,l) the set of paths from entry to label l, similarly for paths-to-out  The result at program point l can be defined as JOPin(l) = ⊔ { f[p](initial) | p ∈ paths-to-in(entry,l) } JOPout(l) = ⊔ { f[p](initial) | p ∈ paths-to-out(entry,l) } 4

The lfp computation approximates JOP  JOPout(l) = ⊔ { f[p](initial) | p ∈ paths-to-out(entry,l) }  The set of paths is potentially infinite (not computable)  The lfp computation takes joins “earlier”, merging sub-paths  For a monotone function f: f(x ⊔ y) ⊒ f(x) ⊔ f(y) 5
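A standard consequence (the classical Kam–Ullman result, not spelled out on the slide): lfp(l) ⊒ JOP(l) at every label l, and lfp(l) = JOP(l) when all transfer functions are distributive, i.e. f(x ⊔ y) = f(x) ⊔ f(y).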

lfp computation and JOP 6 e1 e2 e3 e4 e5 e6 e7 e8 Path transformers: f[e1,e2,e3,e4] f[e1,e2,e7,e8] f[e5,e6,e7,e8] f[e5,e6,e3,e4] JOP: f[e1,e2,e3,e4](initial) ⊔ f[e1,e2,e7,e8](initial) ⊔ f[e5,e6,e7,e8](initial) ⊔ f[e5,e6,e3,e4](initial) “lfp” computation: j1 = f[e1,e2](initial) ⊔ f[e5,e6](initial) j2 = f[e3,e4](j1) ⊔ f[e7,e8](j1) (“lfp” is in quotes just to emphasize that no iteration is required here, since there are no loops)
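To see how joining at j1 can lose precision, take constant propagation on this diamond (a standard illustration; the concrete transformers are my own choice, not from the slide): let f[e1,e2] produce [x ↦ 2, y ↦ 3], let f[e5,e6] produce [x ↦ 3, y ↦ 2], and let both f[e3,e4] and f[e7,e8] compute z := x + y. Every one of the four paths ends with z ↦ 5, so the JOP maps z to 5; the “lfp” computation first joins the two branches into j1 = [x ↦ ⊤, y ↦ ⊤] and therefore ends with z ↦ ⊤.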

Interprocedural Analysis  The effect of calling a procedure is the effect of executing its body 7 Call bar() foo() bar()

Interprocedural Analysis  Stack can grow without a bound  Matching of call/return 8

Solution Attempt #1  Inline callees into callers  End up with one big procedure  CFGs of individual procedures are duplicated many times  Good: it is precise  distinguishes different calls to the same function  Bad  exponential blow-up, not efficient  doesn’t work with recursion 9

Simple Example int p(int a) { return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example void main() { int x ; x = 7; x = x + 1; x = 9; x = x + 1; } [x ↦ 8] [x ↦ 10]

Inlining: Exponential Blowup 12 main() { f1(); } f1() { f2(); } … fn() {... } (the blow-up becomes exponential when each fi contains more than one call site to fi+1: with two calls per procedure, inlining produces 2^n copies of the innermost body)

Solution Attempt #2  Build a “supergraph” = inter-procedural CFG  Replace each call from P to Q with  An edge from the point before the call (call point) to Q’s entry point  An edge from Q’s exit point to the point after the call (return point)  Add assignments of actuals to formals, and an assignment of the return value  Good: efficient  The graph of each function is included exactly once in the supergraph  Works for recursive functions (although local variables need additional treatment)  Bad: imprecise, “context-insensitive”  The “unrealizable paths problem”: dataflow facts can propagate along infeasible control paths 13

Simple Example int p(int a) { return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a ↦ 7] return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a ↦ 7] return a + 1; [a ↦ 7, $$ ↦ 8] } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a ↦ 7] return a + 1; [a ↦ 7, $$ ↦ 8] } void main() { int x ; x = p(7); [x ↦ 8] x = p(9) ; [x ↦ 8] }

Simple Example int p(int a) { [a ↦ 7] return a + 1; [a ↦ 7, $$ ↦ 8] } void main() { int x ; x = p(7); [x ↦ 8] x = p(9) ; [x ↦ 8] }

Simple Example int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ 7, $$ ↦ 8] } void main() { int x ; x = p(7); [x ↦ 8] x = p(9) ; [x ↦ 8] }

Simple Example int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ ⊤, $$ ↦ ⊤] } void main() { int x ; x = p(7); [x ↦ 8] x = p(9); [x ↦ 8] }

Simple Example int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ ⊤, $$ ↦ ⊤] } void main() { int x ; x = p(7) ; [x ↦ ⊤] x = p(9) ; [x ↦ ⊤] }

Unrealizable Paths 22 Call bar() foo() bar() Call bar() zoo()

IVP: Interprocedural Valid Paths (figure: a path f1 f2 … fk containing a call q / enter q … exit q / ret segment) IVP: all paths with matching calls and returns, and prefixes of such paths

Valid Paths 24 Call bar() foo() bar() Call bar() zoo() (1 )1 (2 )2

Interprocedural Valid Paths  IVP: the set of paths that  start at the program entry  only consider matching calls and returns  aka, valid  Can be defined via a context-free grammar  matched ::= matched (i matched )i | ε  valid ::= valid (i matched | matched  (in contrast, the set of all paths in the CFG can be defined by a regular expression)
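For example (my own illustration of the grammar): with two call sites labelled (1 and (2, the path (1 (2 )2 )1 is matched, the prefix (1 (2 is valid (two pending, unreturned calls), while (1 )2 is neither, since the return does not match the pending call.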

The Join-Over-Valid-Paths (JVP)  vpaths(n): all valid paths from the program start to n  JVP[n] = ⊔ { ⟦e1, e2, …, en⟧(initial) | (e1, e2, …, en) ∈ vpaths(n) }  JVP ⊑ JFP  In some cases the JVP can be computed  (Distributive problems)

Sharir and Pnueli ‘82  Call String approach  Blend interprocedural flow with intra-procedural flow  Tag every dataflow fact with its call history  Functional approach  Determine the effect of a procedure  E.g., an in/out map  Treat procedure invocations as “super ops”

The Call String Approach  Record at every node a pair (l, c) where l ∈ L is the dataflow information and c is a suffix of unmatched calls  Use chaotic iterations  To guarantee termination, limit the size of c (typically 1 or 2)  Emulates inlining (but without code growth)  Exponential in the size of c
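More concretely (the Sharir–Pnueli formulation paraphrased in my own notation, not copied from the slide): let Δ be the set of call-site labels and k the chosen bound; the analysis works over facts tagged with call strings from Δ≤k, i.e. over the domain Δ≤k → L. At a call site c, a fact tagged with call string δ flows to the callee entry tagged with the length-≤k suffix of δ·c; at the callee exit, a fact whose call string ends in c flows back, with c popped (only approximately so under truncation), to the return site of c.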

begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end (CFG: proc p: x=a+1, end; main: a=7, call p 5, call p 6, print x, a=9, call p 9, call p 10, print a) [x ↦ 0, a ↦ 0] [x ↦ 0, a ↦ 7] 5,[x ↦ 0, a ↦ 7] 5,[x ↦ 8, a ↦ 7] [x ↦ 8, a ↦ 7] [x ↦ 8, a ↦ 9] 9,[x ↦ 8, a ↦ 9] 9,[x ↦ 10, a ↦ 9] [x ↦ 10, a ↦ 9] 5,[x ↦ 8, a ↦ 7] 9,[x ↦ 10, a ↦ 9]

begin 0 proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end 13 (CFG: a=7, Call p 10, Call p 11, print(x); p: If( … ), a=a-1, Call p 4, Call p 5, a=a+1, x=-2a+5, end) 10:[x ↦ 0, a ↦ 7] [x ↦ 0, a ↦ 7] [x ↦ 0, a ↦ 0] 10:[x ↦ 0, a ↦ 7] 10:[x ↦ 0, a ↦ 6] 4:[x ↦ 0, a ↦ 6] 4:[x ↦ -7, a ↦ 6] 10:[x ↦ -7, a ↦ 6] 4:[x ↦ -7, a ↦ 6] 4:[x ↦ -7, a ↦ 7] 4:[x ↦ ⊤, a ↦ ⊤]

Simple Example int p(int a) { return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] return a + 1; c1: [a ↦ 7, $$ ↦ 8] } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] return a + 1; c1: [a ↦ 7, $$ ↦ 8] } void main() { int x ; c1: x = p(7); ε: x ↦ 8 c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] return a + 1; c1: [a ↦ 7, $$ ↦ 8] } void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] } void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] c2: [a ↦ 9, $$ ↦ 10] } void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] c2: [a ↦ 9, $$ ↦ 10] } void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; ε: [x ↦ 10] }

Another Example int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return c3: p1(a + 1); c1: [a ↦ 7, $$ ↦ ⊤] c2: [a ↦ 9, $$ ↦ ⊤] } void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; ε: [x ↦ 10] } int p1(int b) { (c1|c2)c3: [b ↦ ⊤] return 2 * b; (c1|c2)c3: [b ↦ ⊤, $$ ↦ ⊤] }

int p(int a) { c1: [a ↦ 7] c1.c2+: [a ↦ ⊤] if (…) { c1: [a ↦ 7] c1.c2+: [a ↦ ⊤] a = a -1 ; c1: [a ↦ 6] c1.c2+: [a ↦ ⊤] c2: p (a); c1.c2*: [a ↦ ⊤] a = a + 1; c1.c2*: [a ↦ ⊤] } c1.c2*: [a ↦ ⊤] x = -2*a + 5; c1.c2*: [a ↦ ⊤, x ↦ ⊤] } void main() { c1: p(7); ε: [x ↦ ⊤] } Recursion

Summary: Call String  Simple  Only feasible for very short call strings  exponential in call-string length  Often loses precision under recursion  although can still be precise in some cases 41

The Functional Approach  The meaning of a procedure is a mapping from states to states  The abstract meaning of a procedure is a function from abstract states to abstract states

Functional Approach: Main Idea  Iterate on the abstract domain of functions from L to L  Two-phase algorithm  Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values)  Compute the dataflow values at every point using the functional values  Computes the JVP for distributive problems
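In equation form (my paraphrase of the two phases, not taken verbatim from the slides): phase 1 solves, in the lattice of monotone functions L → L, a system φp = ⟦body of p⟧ in which every call to a procedure q inside the body is interpreted as applying φq; phase 2 then runs an ordinary intraprocedural analysis over abstract values, replacing each call to q by the transfer function φq, starting from the initial value at the entry of main.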

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; p (a); a = a + 1; } x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; p (a); a = a + 1; } [a ↦ a0, x ↦ x0] x = -2*a + 5; } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; p (a); a = a + 1; } [a ↦ a0, x ↦ x0] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; [a ↦ a0-1, x ↦ x0] p (a); a = a + 1; } [a ↦ a0, x ↦ x0] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; [a ↦ a0-1, x ↦ x0] p (a); [a ↦ a0-1, x ↦ -2a0+7] a = a + 1; } [a ↦ a0, x ↦ x0] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; [a ↦ a0-1, x ↦ x0] p (a); [a ↦ a0-1, x ↦ -2a0+7] a = a + 1; [a ↦ a0, x ↦ -2a0+7] } [a ↦ a0, x ↦ x0] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; [a ↦ a0-1, x ↦ x0] p (a); [a ↦ a0-1, x ↦ -2a0+7] a = a + 1; [a ↦ a0, x ↦ -2a0+7] } [a ↦ a0, x ↦ ⊤] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a ↦ a0, x ↦ x0] if (…) { a = a -1 ; [a ↦ a0-1, x ↦ x0] p (a); [a ↦ a0-1, x ↦ -2a0+7] a = a + 1; [a ↦ a0, x ↦ -2a0+7] } [a ↦ a0, x ↦ ⊤] x = -2*a + 5; [a ↦ a0, x ↦ -2a0 + 5] } void main() { p(7); }

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; p (a); a = a + 1; } x = -2*a + 5; } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; p (a); a = a + 1; } [a ↦ 7, x ↦ 0] x = -2*a + 5; } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; p (a); a = a + 1; } [a ↦ 7, x ↦ 0] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); a = a + 1; } [a ↦ 7, x ↦ 0] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; } [a ↦ 7, x ↦ 0] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ 0] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ 7, x ↦ 0] [a ↦ 6, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ 6, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ ⊤, x ↦ 0] p (a); [a ↦ 6, x ↦ -7] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ ⊤, x ↦ 0] p (a); [a ↦ ⊤, x ↦ ⊤] a = a + 1; [a ↦ 7, x ↦ -7] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ ⊤, x ↦ 0] p (a); [a ↦ ⊤, x ↦ ⊤] a = a + 1; [a ↦ ⊤, x ↦ ⊤] } [a ↦ 7, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5]

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ ⊤, x ↦ 0] p (a); [a ↦ ⊤, x ↦ ⊤] a = a + 1; [a ↦ ⊤, x ↦ ⊤] } [a ↦ ⊤, x ↦ ⊤] x = -2*a + 5; [a ↦ 7, x ↦ -9] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

int p(int a) { [a ↦ ⊤, x ↦ 0] if (…) { a = a -1 ; [a ↦ ⊤, x ↦ 0] p (a); [a ↦ ⊤, x ↦ ⊤] a = a + 1; [a ↦ ⊤, x ↦ ⊤] } [a ↦ ⊤, x ↦ ⊤] x = -2*a + 5; [a ↦ ⊤, x ↦ ⊤] } void main() { p(7); [x ↦ -9] } p(a0,x0) = [a ↦ a0, x ↦ -2a0 + 5] Phase 2

Issues in the Functional Approach  How to guarantee a finite height for the functional lattice?  It is possible that L has finite height and yet the lattice of monotone functions from L to L does not  How to efficiently represent functions  Functional join  Functional composition  Testing equality

Tabulation  Special case: L is finite  Data facts: d ∈ L × L  Initialization:  f start,start = (⊥,⊥) ; otherwise (⊤,⊤)  S[start, ⊥] = ⊥  Propagation of (x,y) over edge e = (n,n’)  Maintain summary: S[n’,x] = S[n’,x] ⊔ (f n (y))  n intra-node: → n’: (x, f n (y))  n call-node: → n’: (y,y) if S[n’,y] = ⊥ and n’ = entry node; → n’: (y,z) if S[n’,y] = z and n’ = ret-site-of n  n return-node: → n’: (u,y) ; n c = call-site-of n’, S[n c,u] = x

IFDS  IFDS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95

IFDS Problems  Finite subset, distributive  Lattice L = ℘(D)  ⊑ is ⊆  ⊔ is ∪  Transfer functions are distributive  Efficient solution through formulation as CFL reachability
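The reason distributivity matters (a standard observation, not stated explicitly on the slide): a distributive function f over ℘(D) is fully determined by its values on ∅ and on the singleton sets, since f(S) = f(∅) ∪ ⋃ { f({d}) | d ∈ S }. This is what allows each transfer function to be encoded as the small graph shown on the following slides.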

Possibly Uninitialized Variables Start x = 3 if … y = x y = w w = 8 printf(y) {w,x,y} {w,y} {w} {w,y} {} {w,y} {}  λV.V  λV.V – { x }  λV.{ w, x, y }  λV. if x ∈ V then V ∪ { y } else V – { y }  λV. if w ∈ V then V ∪ { y } else V – { y }  λV.V – { w }
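As a concrete (hypothetical) illustration of these gen/kill-style transfer functions, here is a small C sketch that encodes the fact set as a bitset and applies a few of the functions listed above. The ordering of statements follows the figure; the intermediate sets are computed by the program itself, not copied from the slide.

#include <stdio.h>

/* Facts = subsets of {w, x, y}, encoded as bits of an unsigned int. */
enum { W = 1u << 0, X = 1u << 1, Y = 1u << 2 };

static unsigned f_x_eq_3(unsigned V) { return V & ~X; }                         /* x = 3: x no longer uninitialized */
static unsigned f_y_eq_x(unsigned V) { return (V & X) ? (V | Y) : (V & ~Y); }   /* y = x: y uninit iff x was */
static unsigned f_y_eq_w(unsigned V) { return (V & W) ? (V | Y) : (V & ~Y); }   /* y = w: y uninit iff w was */
static unsigned f_w_eq_8(unsigned V) { return V & ~W; }                         /* w = 8: w no longer uninitialized */

static void print_set(const char *at, unsigned V) {
    printf("%s: {%s%s%s }\n", at,
           (V & W) ? " w" : "", (V & X) ? " x" : "", (V & Y) ? " y" : "");
}

int main(void) {
    unsigned start = W | X | Y;                 /* at Start, everything may be uninitialized */
    unsigned after_x3 = f_x_eq_3(start);
    /* the two branches of the 'if' are joined by set union (may-analysis) */
    unsigned after_if = f_y_eq_x(after_x3) | f_y_eq_w(after_x3);
    unsigned before_print = f_w_eq_8(after_if);
    print_set("after x = 3      ", after_x3);
    print_set("after the if     ", after_if);
    print_set("before printf(y) ", before_print);
    return 0;
}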

Encoding Transfer Functions  Enumerate the whole input and output space  Represent functions as graphs with 2(D+1) nodes  The special symbol “0” denotes the empty set (sometimes denoted Λ)  Example: D = { a, b, c }, f(S) = (S – {a}) ∪ {b} 72 (figure: the function graph, with nodes 0 a b c in the top row and 0 a b c in the bottom row)

Representing Dataflow Functions: Identity Function, Constant Function (figure: function graphs over nodes 0, a, b, c)

Representing Dataflow Functions: “Gen/Kill” Function, Non-“Gen/Kill” Function (figure: function graphs over nodes 0, a, b, c)
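These graphs are the standard representation relation of Reps, Horwitz and Sagiv ’95 (the formulation below is my paraphrase, not copied from the slides): a distributive f : ℘(D) → ℘(D) is drawn as the edge set R_f = {(0,0)} ∪ {(0,d) | d ∈ f(∅)} ∪ {(d,d′) | d′ ∈ f({d}) and d′ ∉ f(∅)}, and f can be read back as f(S) = {d′ | (0,d′) ∈ R_f} ∪ {d′ | (d,d′) ∈ R_f for some d ∈ S}.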

Exploded Supergraph  Exploded supergraph  Start with supergraph  Replace each node by its graph representation  Add edges between corresponding elements in D at consecutive program points  CFL reachability  Finding MOVP solution is equivalent to computing CFL reachability over the exploded supergraph using the valid parentheses grammar.

x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b

The Tabulation Algorithm  Worklist algorithm, starting from the entry of “main”  Keep track of  Path edges: matched-paren paths from the procedure entry  Summary edges: matched-paren call-return paths  At each instruction  Propagate facts using transfer functions; extend path edges  At each call  Propagate to the procedure entry, starting with an empty path  If a summary for that entry exists, use it  At each exit  Store paths from the corresponding call points as summary paths  When a new summary is added, propagate to the return node
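In more detail (my paraphrase of the rules in Reps, Horwitz and Sagiv ’95, using ⟨n,d⟩ for exploded-graph nodes): a path edge ⟨s_p, d1⟩ → ⟨n, d2⟩ records that d2 may hold at n when d1 held at the entry s_p of the enclosing procedure p. The worklist starts from ⟨s_main, 0⟩ → ⟨s_main, 0⟩. Along an intraprocedural edge from n to m, a path edge ending at ⟨n, d2⟩ is extended to ⟨m, d3⟩ for every pair (d2, d3) in the representation of that edge’s transfer function. At a call to q, the edge is extended both into the callee, creating the self-loop path edge ⟨s_q, d3⟩ → ⟨s_q, d3⟩, and across the call, through the call-to-return edges and through any summary edge already recorded for that call. When a path edge reaches the callee’s exit, a summary edge from each matching call node to its return site is recorded, and the affected callers are re-processed.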

Composing Dataflow Functions (figure: composition of two function graphs over nodes 0, a, b, c)

x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( )

Interprocedural Dataflow Analysis via CFL-Reachability  Graph: exploded control-flow graph  L: L(unbalLeft)  unbalLeft = valid  Fact d holds at n iff there is an L(unbalLeft)-path from ⟨start_main, 0⟩ to ⟨n, d⟩

Asymptotic Running Time  CFL-reachability  Exploded control-flow graph: ND nodes  Running time: O(N³D³)  Exploded control-flow graph  Special structure  Running time: O(ED³)  Typically: E ≈ N, hence O(ED³) ≈ O(ND³)  “Gen/kill” problems: O(ED)

Recap  inter-procedural analysis  inlining  call-string approach  functional approach


begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end (CFG: a=7, Call p 10, Call p 11, print(x); p: If( … ), a=a-1, Call p 4, Call p 5, a=a+1, x=-2a+5, end) [x ↦ 0, a ↦ 7] [x ↦ 0, a ↦ 0] λe.[x ↦ -2e(a)+5, a ↦ e(a)] [x ↦ -9, a ↦ 7]

begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [read(a)] 9 ; [call p()] ; [print(x)] 12 end (CFG: read(a), Call p 10, Call p 11, print(x); p: If( … ), a=a-1, Call p 4, Call p 5, a=a+1, x=-2a+5, end) [x ↦ 0, a ↦ ⊤] [x ↦ 0, a ↦ 0] λe.[x ↦ -2e(a)+5, a ↦ e(a)] [x ↦ ⊤, a ↦ ⊤]

Example: Constant propagation  L = Var → ℕ ∪ { ⊥, ⊤ }  Domain: F: L → L  (f1 ⊔ f2)(x) = f1(x) ⊔ f2(x) x=7 y=x+1 x=y  λenv.env[x ↦ 7]  λenv.env[y ↦ env(x)+1]  Id = λenv ∈ L.env  λenv.env[x ↦ 7] ∘ λenv.env  λenv.env[y ↦ env(x)+1] ∘ λenv.env[x ↦ 7] ∘ λenv.env

Example: Constant propagation  L = Var → ℕ ∪ { ⊥, ⊤ }  Domain: F: L → L  (f1 ⊔ f2)(x) = f1(x) ⊔ f2(x) x=7 y=x+1 x=y  λenv.env[x ↦ 7]  λenv.env[y ↦ env(x)+1]  Id = λenv.env  λenv.env[y ↦ env(x)+1] ∘ λenv.env[x ↦ 7]
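A small hypothetical C sketch of these transformers (my own encoding, not from the slides): environments map x and y to ⊥, a constant, or ⊤, and composing the transformers for x = 7 and y = x + 1 on the initial environment yields [x ↦ 7, y ↦ 8].

#include <stdio.h>

/* A value is bottom, a known constant, or top (unknown). */
typedef enum { BOT, CONST, TOP } Kind;
typedef struct { Kind k; int c; } Val;
typedef struct { Val x, y; } Env;   /* abstract environment for the two variables */

/* Transformer for "x = 7" */
static Env f_x_7(Env e) { e.x = (Val){CONST, 7}; return e; }

/* Transformer for "y = x + 1": constant if x is constant, otherwise copy x's kind */
static Env f_y_x_plus_1(Env e) {
    e.y = (e.x.k == CONST) ? (Val){CONST, e.x.c + 1} : e.x;
    return e;
}

int main(void) {
    Env init = { {TOP, 0}, {TOP, 0} };          /* nothing known initially */
    Env out = f_y_x_plus_1(f_x_7(init));        /* (f_{y=x+1} o f_{x=7})(init) */
    printf("x -> %d, y -> %d\n", out.x.c, out.y.c);   /* prints x -> 7, y -> 8 */
    return 0;
}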

init N Function: 0 λe.[x ↦ e(x), a ↦ e(a)] = id; all other nodes λe.⊥ (CFG: a=7 9, Call p 10, Call p 11, print(x) 12, p 1, If( … ) 2, a=a-1 3, Call p 4, Call p 5, a=a+1 6, x=-2a+5 7, end 8, begin 0, end 13)

Solution (F) N Function: 1 λe.[x ↦ e(x), a ↦ e(a)] = id; 2 id; 7 8 λe.[x ↦ -2e(a)+5, a ↦ e(a)]; 3 id; 4 λe.[x ↦ e(x), a ↦ e(a)-1]; 5 f8 ∘ λe.[x ↦ e(x), a ↦ e(a)-1] = λe.[x ↦ -2(e(a)-1)+5, a ↦ e(a)-1]; 6 7 λe.[x ↦ -2(e(a)-1)+5, a ↦ e(a)] ⊔ λe.[x ↦ e(x), a ↦ e(a)]; 8 λe.[x ↦ -2e(a)+5, a ↦ e(a)]; 0 λe.[x ↦ e(x), a ↦ e(a)] = id; 10 λe.[x ↦ e(x), a ↦ 7]; 11 λe.[x ↦ -2e(a)+5, a ↦ e(a)] ∘ f10 (CFG: a=7 9, Call p 10, Call p 11, print(x) 12, p 1, If( … ) 2, a=a-1 3, Call p 4, Call p 5, a=a+1 6, x=-2a+5 7, end 8, begin 0, end 13)

Solution (D) N Value: 1 [x ↦ 0, a ↦ 7]; 2 7 8 [x ↦ -9, a ↦ 7]; 3 [x ↦ 0, a ↦ 7]; 4 [x ↦ 0, a ↦ 6]; 1 [x ↦ -7, a ↦ 6]; 6 [x ↦ -7, a ↦ 7]; 7 [x ↦ ⊤, a ↦ 7]; 8 [x ↦ -9, a ↦ 7]; 1 [x ↦ ⊤, a ↦ ⊤]; 0 [x ↦ 0, a ↦ 0]; 10 [x ↦ 0, a ↦ 7]; 11 [x ↦ -9, a ↦ 7] (CFG: a=7 9, Call p 10, Call p 11, print(x) 12, p 1, If( … ) 2, a=a-1 3, Call p 4, Call p 5, a=a+1 6, x=-2a+5 7, end 8, begin 0, end 13)

Interprocedural Analysis  Extend language with begin/end and with [call p()] clab rlab  Call label clab, and return label rlab 91 begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end

Work-list Algorithm 92
Chaotic(G(V, E): CFG, s: Node, L: Lattice, ι: L, f: E → (L → L)) {
  for each v in V do
    df_entry[v] := ⊥
  df_entry[s] := ι
  WL = {s}
  while (WL ≠ ∅) do
    select and remove an element u ∈ WL
    for each v such that e = (u, v) ∈ E do
      temp = f(e)(df_entry[u])
      new := df_entry[v] ⊔ temp
      if (new ≠ df_entry[v]) then
        df_entry[v] := new; WL := WL ∪ {v}
}
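A small hypothetical C instantiation of this chaotic worklist (my own example, not from the slides): facts are bitsets (join = union, ⊥ = the empty set), the CFG is hard-coded, and each edge's transfer function is a gen/kill pair of masks.

#include <stdio.h>

#define N 4   /* nodes 0..3, node 0 is the entry s */

typedef struct { int from, to; unsigned gen, kill; } Edge;

static const Edge edges[] = {
    {0, 1, 0x1, 0x0},   /* edge s->1 generates fact bit 0 */
    {1, 2, 0x2, 0x1},   /* edge 1->2 generates bit 1, kills bit 0 */
    {1, 3, 0x4, 0x0},   /* edge 1->3 generates bit 2 */
    {2, 1, 0x0, 0x0},   /* back edge 2->1 (a loop) */
};
static const int n_edges = sizeof edges / sizeof edges[0];

int main(void) {
    unsigned df[N] = {0};          /* bottom (empty set) everywhere */
    df[0] = 0x8;                   /* iota: the initial fact at the entry */
    int wl[64], top = 0;
    wl[top++] = 0;                 /* WL = {s} */
    while (top > 0) {
        int u = wl[--top];
        for (int i = 0; i < n_edges; i++) {
            if (edges[i].from != u) continue;
            unsigned temp = (df[u] & ~edges[i].kill) | edges[i].gen;  /* f(e)(df[u]) */
            unsigned nw = df[edges[i].to] | temp;                     /* join with old value */
            if (nw != df[edges[i].to]) {
                df[edges[i].to] = nw;
                wl[top++] = edges[i].to;                              /* re-add the changed node */
            }
        }
    }
    for (int v = 0; v < N; v++) printf("df[%d] = 0x%x\n", v, df[v]);
    return 0;
}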

Complexity of Chaotic Iterations  Parameters:  n, the number of CFG nodes  k, the maximum out-degree of edges  a lattice of height h  c, the maximum cost of applying f(e), computing ⊔, and comparing values in L  Complexity: O(n * h * c * k) 93 (slide from Mooly Sagiv)