Lecture 16 – Dataflow Analysis Eran Yahav 1 www.cs.technion.ac.il/~yahave/tocs2011/compilers-lec16.pptx Reference: Dragon 9, 12.

Slides:



Advertisements
Similar presentations
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Advertisements

Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Dataflow Analysis Introduction Guo, Yao Part of the slides are adapted from.
1 Data flow analysis Goal : collect information about how a procedure manipulates its data This information is used in various optimizations For example,
Lecture 15 – Dataflow Analysis Eran Yahav 1
Lecture 02 – Structural Operational Semantics (SOS) Eran Yahav 1.
Lecture 17 – Dataflow Analysis (and more!) Eran Yahav 1 Reference: Dragon 9, 12.
1 Lecture 06 – Inter-procedural Analysis Eran Yahav.
Lecture 03 – Background + intro AI + dataflow and connection to AI Eran Yahav 1.
Lecture 04 – Numerical Abstractions Eran Yahav 1.
Foundations of Data-Flow Analysis. Basic Questions Under what circumstances is the iterative algorithm used in the data-flow analysis correct? How precise.
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
1 Data flow analysis Goal : –collect information about how a procedure manipulates its data This information is used in various optimizations –For example,
CS 536 Spring Global Optimizations Lecture 23.
Control Flow Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Data Flow Analysis Compiler Design Nov. 3, 2005.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
1 Control Flow Analysis Mooly Sagiv Tel Aviv University Textbook Chapter 3
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv Tel Aviv University Textbook Chapter 2.5.
Intraprocedural Points-to Analysis Flow functions:
Data Flow Analysis Compiler Design Nov. 8, 2005.
Prof. Fateman CS 164 Lecture 221 Global Optimization Lecture 22.
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis Emery Berger University.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Prof. Aiken CS 294 Lecture 11 Program Analysis. Prof. Aiken CS 294 Lecture 12 The Purpose of this Course How are the following related? –Program analysis.
Ben Livshits Based in part of Stanford class slides from
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
1 CS 201 Compiler Construction Data Flow Analysis.
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Data Flow Analysis. 2 Source code parsed to produce AST AST transformed to CFG Data flow analysis operates on control flow graph (and other intermediate.
Dataflow Analysis Topic today Data flow analysis: Section 3 of Representation and Analysis Paper (Section 3) NOTE we finished through slide 30 on Friday.
MIT Introduction to Program Analysis and Optimization Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
Jeffrey D. Ullman Stanford University. 2 boolean x = true; while (x) {... // no change to x }  Doesn’t terminate.  Proof: only assignment to x is at.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Program Analysis and Verification
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
1 Data Flow Analysis Data flow analysis is used to collect information about the flow of data values across basic blocks. Dominator analysis collected.
Lecture 13 – Dataflow Analysis (and more!) Eran Yahav 1 Reference: Dragon 9, 12.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 8: Static Analysis II Roman Manevich Ben-Gurion University.
Inter-procedural analysis
Dataflow Analysis CS What I s Dataflow Analysis? Static analysis reasoning about flow of data in program Different kinds of data: constants, variables,
Code Optimization Data Flow Analysis. Data Flow Analysis (DFA)  General framework  Can be used for various optimization goals  Some terms  Basic block.
Code Optimization More Optimization Techniques. More Optimization Techniques  Loop optimization  Code motion  Strength reduction for induction variables.
Data Flow Analysis Suman Jana
Simone Campanoni Dependences Simone Campanoni
Spring 2016 Program Analysis and Verification
G. Ramalingam Microsoft Research, India & K. V. Raghavan
Topic 10: Dataflow Analysis
University Of Virginia
1. Reaching Definitions Definition d of variable v: a statement d that assigns a value to v. Use of variable v: reference to value of v in an expression.
Data Flow Analysis Compiler Design
Pointer analysis.
Static Single Assignment
Live variables and copy propagation
Presentation transcript:

Lecture 16 – Dataflow Analysis Eran Yahav 1 Reference: Dragon 9, 12

Last time… Dataflow Analysis  Information flows along (potential) execution paths  Conservative approximation of all possible program executions  Can be viewed as a sequence of transformations on program state  Every statement (block) is associated with two abstract states: input state, output state  Input/output state represents all possible states that can occur at the program point  Representation is finite  Different problems typically use different representations 2

Control-Flow Graph 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 3 1: y:=x 2: z:=1 3: y > 0 4: z=z*y 5: y=y-1 6: y:=0

Executions 4 1: y:=x 2: z:=1 3: y > 0 4: z=z*y 5: y=y-1 6: y:=0 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0

Input/output Sets 5 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 1: y:=x 2: z:=1 3: y > 0 4: z=z*y 5: y=y-1 6: y:=0 in(1) out(1) in(2) out(2) in(3) out(3) in(4) out(4) in(5) out(5) in(6) out(6)

Transfer Functions 1: y:=x 2: z:=1 3: y > 0 4: z=z*y 5: y=y-1 6: y:=0 out(1) = in(1) \ { (y,l) | l  Lab } U { (y,1) } out(2) = in(2) \ { (z,l) | l  Lab } U { (z,2) } in(1) = { (x,?), (y,?), (z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) in(6) = out(3) out(4) = in(4) \ { (z,l) | l  Lab } U { (z,4) } out(5) = in(5) \ { (y,l) | l  Lab } U { (y,5) } out(6) = in(6) \ { (y,l) | l  Lab } U { (y,6) } out(3) = in(3) 6

Kill/Gen formulation for Reaching Definitions 7 Blockout (lab) [x := a] lab in(lab) \ { (x,l) | l  Lab } U { (x,lab) } [skip] lab in(lab) [b] lab in(lab) Blockkillgen [x := a] lab { (x,l) | l  Lab }{ (x,lab) } [skip] lab  [b] lab  For each program point, which assignments may have been made and not overwritten, when program execution reaches this point along some path.

Solving Gen/Kill Equations  Designated block Entry with OUT[Entry]=   pred(B) = predecessor nodes of B in the control flow graph 8 OUT[ENTRY] =  ; for (each basic block B other than ENTRY) OUT[B] =  ; while (changes to any OUT occur) { for (each basic block B other than ENTRY) { OUT[B]= (IN[B]  killB)  genB IN[B] =  p  pred(B) IN[p] }

Available Expressions Analysis 9 For each program point, which expressions must have already been computed, and not later modified, on all paths to the program point [x := a+b] 1 ; [y := a*b] 2 ; while [y > a+b] 3 ( [a := a + 1] 4 ; [x := a + b] 5 ) (a+b) always available at label 3

Some required notation 10 blocks : Stmt  P(Blocks) blocks([x := a] lab ) = {[x := a] lab } blocks([skip] lab ) = {[skip] lab } blocks(S1; S2) = blocks(S1)  blocks(S2) blocks(if [b] lab then S1 else S2) = {[b] lab }  blocks(S1)  blocks(S2) blocks(while [b] lab do S) = {[b] lab }  blocks(S) FV: (BExp  AExp)  Var Variables used in an expression AExp(a) = all non-unit expressions in the arithmetic expression a similarly AExp(b) for a boolean expression b

Available Expressions Analysis  Property space  in AE, out AE : Lab   (AExp)  Mapping a label to set of arithmetic expressions available at that label  Dataflow equations  Flow equations – how to join incoming dataflow facts  Effect equations - given an input set of expressions S, what is the effect of a statement 11

 in AE (lab) =   when lab is the initial label   { out AE (lab’) | lab’  pred(lab) } otherwise  out AE (lab) = … 12 Blockout (lab) [x := a] lab in(lab) \ { a’  AExp | x  FV(a’) } U { a’  AExp(a) | x  FV(a’) } [skip] lab in(lab) [b] lab in(lab) U AExp(b) Available Expressions Analysis From now on going to drop the AE subscript when clear from context

Transfer Functions 1: x = a+b 2: y:=a*b 3: y > a+b 4: a=a+1 5: x=a+b out(1) = in(1) U { a+b } out(2) = in(2) U { a*b } in(1) =  in(2) = out(1) in(3) = out(2)  out(5) in(4) = out(3) in(5) = out(4) out(4) = in(4) \ { a+b,a*b,a+1 } out(5) = in(5) U { a+b } out(3) = in(3) U { a+ b } 13 [x := a+b] 1 ; [y := a*b] 2 ; while [y > a+b] 3 ( [a := a + 1] 4 ; [x := a + b] 5 )

Solution 1: x = a+b 2: y:=a*b 3: y > a+b 4: a=a+1 5: x=a+b in(2) = out(1) = { a + b } out(2) = { a+b, a*b } out(4) =  out(5) = { a+b } in(4) = out(3) = { a+ b } 14 in(1) =  in(3) = { a + b }

Kill/Gen 15 Blockout (lab) [x := a] lab in(lab) \ { a’  AExp | x  FV(a’) } U { a’  AExp(a) | x  FV(a’) } [skip] lab in(lab) [b] lab in(lab) U AExp(b) Blockkillgen [x := a] lab { a’  AExp | x  FV(a’) } { a’  AExp(a) | x  FV(a’) } [skip] lab  [b] lab  AExp(b) out(lab) = in(lab) \ kill(B lab ) U gen(B lab ) B lab = block at label lab

Reaching Definitions Revisited 16 Blockout (lab) [x := a] lab in(lab) \ { (x,l) | l  Lab } U { (x,lab) } [skip] lab in(lab) [b] lab in(lab) Blockkillgen [x := a] lab { (x,l) | l  Lab }{ (x,lab) } [skip] lab  [b] lab  For each program point, which assignments may have been made and not overwritten, when program execution reaches this point along some path.

Why solution with smallest sets? 17 1: z = x+y 2: true 3: skip out(1) = ( in(1) \ { (z,?) } ) U { (z,1) } in(1) = { (x,?),(y,?),(z,?) } in(2) = out(1) U out(3) in(3) = out(2) out(3) = in(3) out(2) = in(2) [z := x+y] 1 ; while [true] 2 ( [skip] 3 ; ) in(1) = { (x,?),(y,?),(z,?) } After simplification: in(2) = in(2) U { (x,?),(y,?),(z,1) } Many solutions: any superset of { (x,?),(y,?),(z,1) } in(2) = out(1) U out(3) in(3) = out(2)

Live Variables 18 For each program point, which variables may be live at the exit from the point. [x :=2] 1 ; [y:=4] 2 ; [x:=1] 3 ; (if [y>x] 4 then [z:=y] 5 else [z:=y*y] 6 ); [x:=z] 7

[x :=2] 1 ; [y:=4] 2 ; [x:=1] 3 ; (if [y>x] 4 then [z:=y] 5 else [z:=y*y] 6 ); [x:=z] 7 Live Variables 1: x := 2 2: y:=4 4: y > x 5: z := y 7: x := z 6: z = y*y 3: x:=1

1: x := 2 2: y:=4 4: y > x 5: z := y 7: x := z 20 [x :=2] 1 ; [y:=4] 2 ; [x:=1] 3 ; (if [y>x] 4 then [z:=y] 5 else [z:=y*y] 6 ); [x:=z] 7 6: z = y*y Live Variables Blockkillgen [x := a] lab { x }{ FV(a) } [skip] lab  [b] lab  FV(b) 3: x:=1

1: x := 2 2: y:=4 4: y > x 5: z := y 7: x := z 21 [x :=2] 1 ; [y:=4] 2 ; [x:=1] 3 ; (if [y>x] 4 then [z:=y] 5 else [z:=y*y] 6 ); [x:=z] 7 6: z = y*y Live Variables: solution Blockkillgen [x := a] lab { x }{ FV(a) } [skip] lab  [b] lab  FV(b) 3: x:=1 in(1) =  out(1) = in(2) =  out(2) = in(3) = { y } out(3) = in(4) = { x,y } in(7) = { z } out(7) =  out(6) = { z } in(6) = { y } out(5) = { z } in(5) = { y } in(4) = { x,y } out(4) = { y }

Why solution with smallest set? 22 1: x>1 2: skip out(1) = in(2) U in(3) out(2) = in(1) out(3) =  in(2) = out(2) while [x>1] 1 ( [skip] 2 ; ) [x := x+1] 3 ; After simplification: in(1) = in(1) U { x } Many solutions: any superset of { x } 3: x := x + 1 in(3) = { x } in(1) = out(1)  { x }

Monotone Frameworks   is  or   CFG edges go either forward or backwards  Entry labels are either initial program labels or final program labels (when going backwards)  Initial is an initial state (or final when going backwards)  f lab is the transfer function associated with the blocks B lab 23 In(lab) =  { out(lab’) | (lab’,lab)  CFG edges } Initialwhen lab  Entry labels otherwise out(lab) = f lab (in(lab))

Forward vs. Backward Analyses 24 1: x := 2 2: y:=4 4: y > x 5: z := y 7: x := z 6: z = y*y 1: x := 2 2: y:=4 4: y > x 5: z := y 7: x := z 6: z = y*y {(x,1), (y,?), (z,?) } { (x,?), (y,?), (z,?) } {(x,1), (y,2), (z,?) }  { z } { y }

Must vs. May Analyses  When  is  - must analysis  Want largest sets that solves the equation system  Properties hold on all paths reaching a label (exiting a label, for backwards)  When  is  - may analysis  Want smallest sets that solve the equation system  Properties hold at least on one path reaching a label (existing a label, for backwards) 25

Example: Reaching Definition  L =  (Var×Lab) is partially ordered by    is   L satisfies the Ascending Chain Condition because Var × Lab is finite (for a given program) 26

Example: Available Expressions  L =  (AExp) is partially ordered by    is   L satisfies the Ascending Chain Condition because AExp is finite (for a given program) 27

Analyses Summary 28 Reaching Definitions Available Expressions Live Variables L  (Var x Lab)  (AExp)  (Var)    AExp  Initial{ (x,?) | x  Var}  Entry labels{ init } final DirectionForward Backward F{ f: L  L |  k,g : f(val) = (val \ k) U g } f lab f lab (val) = (val \ kill)  gen

Analyses as Monotone Frameworks  Property space  Powerset  Clearly a complete lattice  Transformers  Kill/gen form  Monotone functions (let’s show it) 29

Monotonicity of Kill/Gen transformers  Have to show that x  x’ implies f(x)  f(x’)  Assume x  x’, then for kill set k and gen set g (x \ k) U g  (x’ \ k) U g  Technically, since we want to show it for all functions in F, we also have to show that the set is closed under function composition 30

Distributivity of Kill/Gen transformers  Have to show that f(x  y)  f(x)  f(y)  f(x  y) = ((x  y) \ k) U g = ((x \ k)  (y \ k)) U g = (((x \ k) U g)  ((y \ k) U g)) = f(x)  f(y)  Used distributivity of  and U  Works regardless of whether  is U or  31

Points-to Analysis  Many flavors  PWHILE language 32 S ::= [x := a] lab | [skip] lab | S1;S2 | if [b] lab then S1 else S2 | while [b] lab do S | x = malloc p  PExp pointer expressions a ::= x | n | a1 op a a2 | &x | *x | nil

Points-to Analysis  Aliases  Two pointers p and q are aliases if they point to the same memory location  Points-to pair  (p,q) means p holds the address of q  Points-to pairs and aliases  (p,q) and (r,q) means that p and r are aliases  Challenge: no a priori bound on the set of heap locations 33

Terminology Example 34 zx y [x := &z] 1 [y := &z] 2 [w := &y] 3 [r := w] 4 w r Points-to pairs: (x,z), (y,z), (w,y), (r,y) Aliases: (x,y), (r,w)

(May) Points-to Analysis  Property Space  L = (  (Var x Var), , , , , Var x Var )  Transfer functions 35 Statementout(lab) [p = &x] lab in(lab)  { (p,x) } [p = q] lab in(lab) U { (p,x) | (q,x)  in(lab) } [*p = q] lab in(lab) U { (r,x) | (q,x)  in(lab) and (p,r)  in(lab) } [p = *q] lab in(lab) U { (p,r) | (q,x)  in(lab) and (x,r)  in(lab) }

(May) Points-to Analysis  What to do with malloc?  Need some static naming scheme for dynamically allocated objects  Single name for the entire heap   [p = malloc] lab  (S) = S  { (p,H) }  Name based on static allocation site   [p = malloc] lab  (S) = S  { (p,lab) } 36

(May) Points-to Analysis 37 [x :=malloc] 1 ; [y:=malloc] 2 ; (if [x==y] 3 then [z:=x] 4 else [z:=y] 5 );  { (x,H) } { (x,H), (y,H) } { (x,H), (y,H), (z,H) } Single name H for the entire heap { (x,H), (y,H), (z,H) }

Allocation Sites  Divide the heap into a fixed partition based on allocation site  All objects allocated at the same program point represented by a single “abstract object” 38 AS2 AS3 AS1 AS2 AS1

(May) Points-to Analysis 39 [x :=malloc] 1 ; // A1 [y:=malloc] 2 ; // A2 (if [x==y] 3 then [z:=x] 4 else [z:=y] 5 );  { (x,A1) } { (x,A1), (y,A2) } { (x,A1), (y,A2), (z,A1) } { (x,A1), (y,A2), (z,A2) } Allocation-site based naming (using A lab instead of just “lab” for clarity) { (x,A1), (y,A2), (z,A1), (z,A2) }

Weak Updates 40 Statementout(lab) [p = &x] lab in(lab)  { (p,x) } [p = q] lab in(lab) U { (p,x) | (q,x)  in(lab) } [*p = q] lab in(lab) U { (r,x) | (q,x)  in(lab) and (p,r)  in(lab) } [p = *q] lab in(lab) U { (p,r) | (q,x)  in(lab) and (x,r)  in(lab) } [x :=malloc] 1 ; // A1 [y:=malloc] 2 ; // A2 [z:=x] 3 ; [z:=y] 4 ;  { (x,A1) } { (x,A1), (y,A2) } { (x,A1), (y,A2), (z,A1) } { (x,A1), (y,A2), (z,A1), (z,A2) }

(May) Points-to Analysis  Fixed partition of the (unbounded) heap to static names  Allocation sites  Types  Calling contexts  …  What we saw so far – flow-insensitive  Ignoring the structure of the flow in the program 41

Flow-sensitive vs. Flow-insensitive Analyses  Flow sensitive: respect program flow  a separate set of points-to pairs for every program point  the set at a point represents possible may-aliases on some path from entry to the program point  Flow insensitive: assume all execution orders are possible, abstract away order between statements 42 [x :=malloc] 1 ; [y:=malloc] 2 ; (if [x==y] 3 then [z:=x] 4 else [z:=y] 5 ); 1: x := malloc 2: y:=malloc 3: x == y 4: z := x6: z = y

So far…  Intra-procedural analysis  How are we going to deal with procedures?  Inter-procedural analysis 43

Interprocedural Analysis  The effect of calling a procedure is the effect of executing its body 44 Call bar() foo() bar()

Call Graph 45 int (*pf)(int); int fun1 (int x) { if (x < 10) return (*pf) (x+l); // C1 else return x; } int fun2(int y) { pf = &fun1; return (*pf) (y) ; // C2 } void main() { pf = &fun2; (*pf )(5) ; // C3 } c1 c2 c3 fun1 fun2 main c1 c2 c3 fun1 fun2 main type basedmore precise

Context Sensitivity 46 main() { for (i = 0; i < n; i++) { t1 = f(0); // C1 t2 = f(42); // C2 t3 = f(42); // C3 X[i] = t1 + t2 + t3; } int f(int v) return (v+1); } i=0 if i<n goto L v = 0 // C1 t1 = retval v = 42 // C2 t2 = retval v = 42 // C3 retval=v + 1 \\f t3 = retval t4 = t1+t2 t5 = t4+ t3 x[i] = t5 i = i + 1 B1 B2 B3 B4 B5 B6 B7

Solution Attempt #1  Inline callees into callers  End up with one big procedure  CFGs of individual procedures = duplicated many times  Good: it is precise  distinguishes different calls to the same function  Bad  exponential blow-up, not efficient  doesn’t work with recursion 47 main() { f(); f(); } f() { g(); g(); } g() { h(); h(); } h() {... }

Inlining 48 main() { for (i = 0; i < n; i++) { t1 = f(0); // C1 t2 = f(42); // C2 t3 = f(42); // C3 X[i] = t1 + t2 + t3; } int f(int v) return (v+1); } i=0 if i<n goto L t1 = 1 t2 = t3 = t4 = t1+t2 t5 = t4+ t3 x[i] = t5 i = i + 1 B1 B2 B3 B4 B5 B7

Solution Attempt #2  Build a “supergraph” = inter-procedural CFG  Replace each call from P to Q with  An edge from point before the call (call point) to Q’s entry point  An edge from Q’s exit point to the point after the call (return pt)  Add assignments of actuals to formals, and assignment of return value  Good: efficient  Graph of each function included exactly once in the supergraph  Works for recursive functions (although local variables need additional treatment)  Bad: imprecise, “context-insensitive”  The “unrealizable paths problem”: dataflow facts can propagate along infeasible control paths 49

Unrealizable Paths 50 Call bar() foo() bar() Call bar() zoo()

Interprocedural Analysis  Extend language with begin/end and with [call p()] clab rlab  Call label clab, and return label rlab 51 begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end

IVP: Interprocedural Valid Paths () f1f1 f2f2 f k-1 fkfk f3f3 f4f4 f5f5 f k-2 f k-3 call q enter q exit q ret IVP: all paths with matching calls and returns  And prefixes

Valid Paths 53 Call bar() foo() bar() Call bar() zoo() (1 )1 (2 )2

Interprocedural Valid Paths  IVP set of paths  Start at program entry  Only considers matching calls and returns  aka, valid  Can be defined via context free grammar  matched ::= matched ( i matched ) i | ε  valid ::= valid ( i matched | matched  paths can be defined by a regular expression

The Join-Over-Valid-Paths (JVP)  vpaths(n) all valid paths from program start to n  JVP[n] =  {  e 1, e 2, …, e  (initial) (e 1, e 2, …, e)  vpaths(n)}  JVP  JFP  In some cases the JVP can be computed  (Distributive problem)

Sharir and Pnueli ‘82  Call String approach  Blend interprocedural flow with intra procedural flow  Tag every dataflow fact with call history  Functional approach  Determine the effect of a procedure  E.g., in/out map  Treat procedure invocations as “super ops”

The Call String Approach  Record at every node a pair (l, c) where l  L is the dataflow information and c is a suffix of unmatched calls  Use Chaotic iterations  To guarantee termination limit the size of c (typically 1 or 2)  Emulates inline (but no code growth)  Exponential in size of c

begin proc p() is 1 [x := a + 1] 2 end 3 [a=7] 4 [call p()] 5 6 [print x] 7 [a=9] 8 [call p()] 9 10 [print a] 11 end proc p x=a+1 end a=7 call p 5 call p 6 print x a=9 call p 9 call p 10 print a [x  0, a  0] [x  0, a  7] 5,[x  0, a  7] 5,[x  8, a  7] [x  8, a  7] [x  8, a  9] 9,[x  8, a  9]  9,[x  10, a  9]  [x  10, a  9] 5,[x  8, a  7] 9,[x  10, a  9] 

begin 0 proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end 13 a=7 Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end 10:[x  0, a  7] [x  0, a  7] [x  0, a  0] 10:[x  0, a  7] 10:[x  0, a  6] 4:[x  0, a  6] 4:[x  -7, a  6] 10:[x  -7, a  6] 4:[x  -7, a  6] 4:[x  -7, a  7] 4:[x , a  ]

The Functional Approach  The meaning of a procedure is mapping from states into states  The abstract meaning of a procedure is function from an abstract state to abstract states

begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [a=7] 9 ; [call p()] ; [print(x)] 12 end a=7 Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end [x  0, a  7] [x  0, a  0] e.[x  -2e(a)+5, a  e(a)] [x  -9, a  7]

begin proc p() is 1 if [b] 2 then ( [a := a -1] 3 [call p()] 4 5 [a := a + 1] 6 ) [x := -2* a + 5] 7 end 8 [read(a)] 9 ; [call p()] ; [print(x)] 12 end read(a) Call p 10 Call p 11 print(x) p If( … ) a=a-1 Call p 4 Call p 5 a=a+1 x=-2a+5 end [x  0, a  ] [x  0, a  0] e.[x  -2e(a)+5, a  e(a)] [x , a  ]

Functional Approach: Main Idea  Iterate on the abstract domain of functions from L to L  Two phase algorithm  Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values)  Compute the dataflow values at every point using the functional values  Computes JVP for distributive problems