Advanced Compiler Techniques LIU Xianhua School of EECS, Peking University Inter-procedural Analysis
Topics
  Up to now: intra-procedural analysis
    – Dataflow analysis
    – PRE
    – Loops
    – SSA
    – Just for individual procedures
  Today: inter-procedural analysis
    – Across / between procedures
Modularity is a Virtue
  – Decomposing programs into procedures aids readability and maintainability
  – Object-oriented languages have pushed this trend even further
  – In a good design, a procedure should be:
      – An interface
      – A black box
The Catch
  – This inhibits optimization!
  – The compiler must assume:
      – The called procedure may use or change any accessible variable
      – The procedure's caller provides arbitrary values as parameters
  – Interprocedural optimizations use the calling relationships between procedures to optimize one or both of them
Recall
  – Function calls can affect our points-to sets

      p1 = &x;
      p2 = &p1;
      ...
      foo();

  – Being conservative loses a lot of information
Applications of IPA
  – Virtual method invocation
  – Pointer alias analysis
  – Parallelization
  – Detection of software errors and vulnerabilities
      – SQL injection
      – Buffer overflow analysis & protection
Basic Concepts
  – Procedure (Function)
  – Caller / Callee
  – Call Site
  – Call Graph
  – Call Context
  – Call Strings
  – Formal Arguments
  – Actual Arguments
Terminology
  Goal: avoid making overly conservative assumptions about the effects of procedures and the state at call sites

    int a, e                  // globals
    procedure foo(var b, c)   // formal args
      b := c
    end
    program main
      int d                   // locals
      foo(a, d)               // call site with actual args
    end

  In the procedure body:
    – Formals and/or globals may be aliased (two names refer to the same location)
    – Formals may have constant values
  At a procedure call:
    – Global variables may be modified or used
    – Actual arguments may be modified or used
Interprocedural Analysis vs. Interprocedural Optimization
  Interprocedural analysis
    – Gathers information across multiple procedures (typically across the entire program)
    – This information can improve intraprocedural analysis and optimization (e.g., CSE)
  Interprocedural optimizations
    – Optimizations that involve multiple procedures, e.g., inlining, procedure cloning, interprocedural register allocation
    – Optimizations that use interprocedural analysis
The Call Graph
  – Represent procedure call relationships by a call graph G = (V, E, start)
  – Each procedure is a unique vertex
  – Call site = edge between caller & callee
      – (u, v) = call from u to v (u may call v)
      – Can be labelled with the source line
  – Cycles represent recursion
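The call graph definition above can be sketched in a few lines of Python. This is a minimal illustration, not any particular compiler's representation: the triple format `(caller, callee, source_line)`, the function names, and the example program are all assumptions made for the sketch.

```python
from collections import defaultdict


def build_call_graph(call_sites):
    """Build adjacency lists from (caller, callee, source_line) triples."""
    graph = defaultdict(list)
    for caller, callee, line in call_sites:
        graph[caller].append((callee, line))  # edge labelled with its source line
        graph.setdefault(callee, [])          # leaf procedures are vertices too
    return dict(graph)


def recursive_procedures(graph):
    """Return the procedures that can reach themselves, i.e. lie on a cycle."""
    def reachable(start):
        seen, work = set(), [callee for callee, _ in graph[start]]
        while work:
            node = work.pop()
            if node not in seen:
                seen.add(node)
                work.extend(callee for callee, _ in graph[node])
        return seen

    return {p for p in graph if p in reachable(p)}


# Hypothetical program: main calls foo and baz; foo and bar call each other.
example_calls = [("main", "foo", 3), ("main", "baz", 4),
                 ("foo", "bar", 7), ("bar", "foo", 12)]
example_graph = build_call_graph(example_calls)
```

Here `recursive_procedures(example_graph)` reports `foo` and `bar`, the two vertices on the cycle, which is the "cycles represent recursion" point from the slide.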
Call Graph [figure]
Super Graph [figure]
Validity of Interprocedural Control Flow Paths [figure]
Safety, Precision, and Efficiency of Data Flow Analysis
  – Data flow analysis uses a static representation of programs to compute summary information along paths
  – Ensuring safety: all valid paths must be covered
  – Ensuring precision: only valid paths should be covered (subject to merging data flow values at shared program points without creating invalid paths)
  – Ensuring efficiency: only relevant valid paths should be covered
  – Valid path: a path which represents legal control flow
  – Relevant path: a path which yields information that affects the summary information
Flow and Context Sensitivity
  – Flow-sensitive analysis: considers intraprocedurally valid paths
  – Context-sensitive analysis: considers interprocedurally valid paths
  – For maximum statically attainable precision, analysis must be both flow and context sensitive
Context Sensitivity in Interprocedural Analysis [figure]
Example of Context Sensitivity [figure]
Staircase Diagrams of Interprocedurally Valid Paths [figure]
  – “You can descend only as much as you have ascended!”
  – Every descending step must match a corresponding ascending step.
Context Sensitivity in Presence of Recursion [figure]
  – For a path from u to v, g must be applied exactly the same number of times as f.
  – For a prefix of the above path, g can be applied at most as many times as f.
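The matching rule for calls and returns can be checked with a stack, much like balanced parentheses. The sketch below is an illustration under assumptions: a path is encoded as a list of `("call", site)` and `("ret", site)` events, and the function name `calling_context` is hypothetical. It returns the pending call sites for a valid path (the remaining calling context) or `None` for an invalid one.

```python
def calling_context(events):
    """Return the unmatched call sites of a valid path, or None if the path is invalid.

    A return is valid only if it matches the most recent unmatched call,
    so no prefix of the path may contain more returns than calls.
    """
    pending = []
    for kind, site in events:
        if kind == "call":
            pending.append(site)          # one ascending step
        elif kind == "ret":
            if not pending or pending.pop() != site:
                return None               # descended further than ascended, or to the wrong caller
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return pending


# A call at c1 followed by a return to c1 is valid and leaves no pending context.
matched = calling_context([("call", "c1"), ("ret", "c1")])
# Returning to c2 after calling from c1 is not an interprocedurally valid path.
mismatched = calling_context([("call", "c1"), ("ret", "c2")])
# A valid prefix: the call from c1 is still pending.
prefix = calling_context([("call", "c1"), ("call", "c2"), ("ret", "c2")])
```

With recursion the same call site may appear on the stack many times; the check is unchanged, which mirrors the requirement that returns never outnumber calls in any prefix.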
Staircase Diagrams of Interprocedurally Valid Paths (continued) [figure]
Interprocedural Analysis
  Goals
    – Enable standard optimizations even with procedure calls
    – Reduce call overhead for procedures
    – Enable optimizations not possible for single procedures
  Optimizations
    – Register allocation
    – Loop transformations
    – CSE, etc.
Analysis Sensitivity
  Flow-insensitive
    – What may happen (on at least one path)
    – Linear-time
  Flow-sensitive
    – Considers control flow (what must happen)
    – Iterative data-flow: possibly exponential
  Context-insensitive
    – A call is treated the same regardless of caller
    – “Monovariant” analysis
  Context-sensitive
    – Reanalyze the callee for each caller
    – “Polyvariant” analysis
  Path-sensitive vs. path-insensitive
    – Path-sensitive: computes one answer for every execution path
    – Subsumes flow-sensitivity
    – Extremely expensive
  More sensitivity: more accuracy, but more expensive
Increasing Precision in Data Flow Analysis [figure]
  Callout on the figure: “actually, only caller sensitive”
Precision of IPA [figure]
  – Flow-insensitive: result not affected by control flow in the procedure
  – Flow-sensitive: result affected by control flow in the procedure
Context Sensitivity
  – Re-analyze the callee as if the procedure were inlined
      – Too expensive in space & time
      – Recursion?
  – Approximate context sensitivity: reanalyze the callee for k levels of calling context

    a = id(3); b = id(4);
    id(x) { return x; }                                    // x is 3 at one call, 4 at the other

    a = min(3, 4); s = min("aardvark", "vacuum");
    min(x, y) { if (x <= y) return x; else return y; }     // called with ints and with strings
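The `id` example can be made concrete with a toy constant-propagation sketch. Everything here is an assumption made for illustration: the `NAC` marker, the `meet` function, and the encoding of each call as `(target, call_site, constant_argument)`. The context-insensitive version merges the arguments from all call sites into one summary; the per-call-site version (one level of calling context) keeps them apart.

```python
NAC = "not-a-constant"   # bottom of the toy constant-propagation lattice


def meet(a, b):
    """Combine two lattice values; None stands for 'no information yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else NAC


# a = id(3); b = id(4);  encoded as (target, call site, constant argument)
id_calls = [("a", "site1", 3), ("b", "site2", 4)]


def analyze_context_insensitive(calls):
    """One summary for id: its parameter is the meet over all call sites."""
    merged = None
    for _, _, arg in calls:
        merged = meet(merged, arg)
    # id returns its parameter, so every caller sees the merged value.
    return {target: merged for target, _, _ in calls}


def analyze_per_call_site(calls):
    """One summary per call site (k = 1 levels of calling context)."""
    return {target: meet(None, arg) for target, _, arg in calls}


insensitive = analyze_context_insensitive(id_calls)
sensitive = analyze_per_call_site(id_calls)
```

The insensitive result marks both `a` and `b` as not constant, while the per-call-site result recovers `a = 3` and `b = 4`. The price is one analysis of the callee per context, which is what grows quickly as k increases.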
Path Sensitivity [figure]
  Path-sensitive analysis
    – Computes an answer for every path:
    – x is 4 at the end of the left path
    – x is 5 at the end of the right path
  Path-insensitive analysis
    – Computes one answer for all paths:
    – x is not constant
Key Challenges for Interprocedural Analysis
  Compilation time, memory
    – Key problem: scalability to large programs
    – Dominated by analysis time / memory
    – Flow-sensitive analysis: bottleneck often memory, not time
    – Often limited to fast but imprecise analysis
  Multiple calling environments
    – Different calls to P() have different properties:
        – Known constants
        – Aliases
        – Surrounding execution context (e.g., enclosing loops)
        – Function pointer arguments
        – Frequency of the call
  Recursion
Brute Force: Full Context-Sensitive Interprocedural Analysis
  Invocation Graph [Emami 94]
    – Use an invocation graph, which distinguishes all calling chains
    – Re-analyze the callee for all distinct calling paths
    – Pro: precise
    – Cons: exponentially expensive, recursion is tricky
Middle Ground: Use Call Graph and Compute Summaries
  Goal
    – Represent procedure call relationships
  Definition
    – If program P consists of n procedures p_1, ..., p_n,
      the static call graph of P is G_P = (N, S, E, r), where
        – N = {p_1, ..., p_n}
        – S = {call-site labels}
        – E ⊆ N × N × S
        – r ∈ N is the start node
Summary Information
  Compute summary information for each procedure
    – Summarize the effect of a called procedure for its callers
    – Summarize the effect of callers for the called procedure
  Store summaries in a database
    – Use later when optimizing procedures
  Pros
    + Concise
    + Can be fast to compute and use
    + Separate compilation practical
  Cons
    – Imprecise if there is only one summary per procedure
Two Types of Information
  Track info that flows into procedures: “propagation problems”, e.g.:
    – Which formals are constant?
    – Which formals are aliased to globals?
  Track info that flows out of procedures: “side effect problems”, e.g.:
    – Which globals are defined / used by the procedure?
    – Which locals are defined / used by the procedure?
    – Which actual parameters are defined by the procedure?

    proc(x, y) { ... }
Propagation Summaries: Examples
  – MAY-ALIAS: formals that may be aliased to globals
  – MUST-ALIAS: formals definitely aliased to globals
  – CONSTANT: formals that are definitely constant
Side-Effect Summaries: Examples
  – MOD: variables possibly modified (defined) by the procedure call
  – REF: variables possibly referenced (used) by the procedure
  – KILL: variables that are definitely killed in the procedure
Computing Summaries
  – Bottom-up (MOD, REF, KILL): summarizes call effects
  – Top-down (MAY-ALIAS): summarizes information about the caller
  – Bi-directional (AVAIL, CONSTANT): info to / from caller & callee
Side-Effect Summarization
  At procedure boundaries:
    – Translate formal args to actuals at the call site
  Compute:
    – GMOD, GREF = procedure side effects
    – MOD, REF = effects at the call site (possibly specific to the call)
Parameter Binding
  At procedure boundaries, we need to translate the formal arguments of a procedure to the actual arguments at the call site

    int a,b
    program main             // MOD(foo) = b
      foo(b)                 // REF(foo) = a,b
    end
    procedure foo (var c)    // GMOD(foo)= b
      int d                  // GREF(foo)= a,b
      d := b
      bar(b)                 // MOD(bar) = b
    end                      // REF(bar) = a
    procedure bar (var d)
      if (...)               // GMOD(bar)= d
        d := a               // GREF(bar)= a
    end
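The annotations in the parameter-binding example can be reproduced with a small fixed-point computation. This is a sketch under stated assumptions, not the algorithm from any specific compiler: the dictionary encoding of procedures, the field names, and the helpers `bind` and `summarize` are hypothetical, parameters are passed by reference, and aliasing is ignored.

```python
# Hypothetical encoding of the slide's program: directly modified/referenced
# names plus the calls each procedure makes, as (callee, actual arguments).
procs = {
    "main": {"formals": [], "locals": set(), "mod": set(), "ref": set(),
             "calls": [("foo", ["b"])]},
    "foo": {"formals": ["c"], "locals": {"d"}, "mod": {"d"}, "ref": {"b"},
            "calls": [("bar", ["b"])]},
    "bar": {"formals": ["d"], "locals": set(), "mod": {"d"}, "ref": {"a"},
            "calls": []},
}


def bind(names, formals, actuals):
    """Translate a callee's summary into the caller's names (formal -> actual)."""
    mapping = dict(zip(formals, actuals))
    return {mapping.get(name, name) for name in names}


def summarize(procs):
    """Iterate to a fixed point so that recursion would also be handled."""
    gmod = {p: set() for p in procs}
    gref = {p: set() for p in procs}
    changed = True
    while changed:
        changed = False
        for name, proc in procs.items():
            mod, ref = set(proc["mod"]), set(proc["ref"])
            for callee, actuals in proc["calls"]:
                formals = procs[callee]["formals"]
                mod |= bind(gmod[callee], formals, actuals)   # MOD at this call site
                ref |= bind(gref[callee], formals, actuals)   # REF at this call site
            mod -= proc["locals"]   # effects on locals are invisible to callers
            ref -= proc["locals"]
            if mod != gmod[name] or ref != gref[name]:
                gmod[name], gref[name] = mod, ref
                changed = True
    return gmod, gref


gmod, gref = summarize(procs)
# Effects of the call bar(b) inside foo, expressed in foo's names.
mod_bar_call = bind(gmod["bar"], procs["bar"]["formals"], ["b"])
ref_bar_call = bind(gref["bar"], procs["bar"]["formals"], ["b"])
```

The results agree with the slide: GMOD(bar) = {d}, GREF(bar) = {a}, GMOD(foo) = {b}, GREF(foo) = {a, b}, and the call `bar(b)` has MOD = {b} and REF = {a}.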
Constructing Summary Flow Functions Iteratively [figure]
  – Termination is possible only if all function compositions and confluences can be reduced to a finite set of functions
An Example of Interprocedural Liveness Analysis [figures, five slides]
An Example of Interprocedural Liveness Analysis (continued) [figure]
  – e ∈ In_{S_p} but e ∉ In_{c_1}
Interprocedural Validity and Calling Contexts [figure]
  – “You can descend only as much as you have ascended!”
  – Every descending step must match a corresponding ascending step.
  – Calling context is represented by the remaining descending steps.
Available Expressions Analysis Using Call Strings Approach [figure]
  Is a * b available?

    int a, b, t;
    void p()
    {
      if (a == 0)
      {
        a = a-1;
        p();
        t = a * b;
      }
    }

  YES!
Available Expressions Analysis Using Call Strings Approach (continued) [figure]
Alternatives to IPA: Inlining
  – Replaces calls to procedures with copies of their bodies
  – Converts calls from opaque objects to local code
      – Exposes the “effects” of the called procedure
      – Extends the compilation region
  – Language support: the inline attribute
      – But the compiler can decide per call site, rather than per procedure
Inlining Decisions
  Must be based on
    – Heuristics, or
    – Profile information
  Considerations
    – The size of the procedure body (smaller = better)
    – Number of call sites (1 = usually wins)
    – Whether the call site is in a loop (yes = more optimizations)
    – Constant-valued parameters
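The considerations above can be combined into a per-call-site heuristic. The sketch below is purely illustrative: the `CallSite` fields, the size budget of 40, the doubling for loops, and the bonus of 10 per constant argument are invented numbers, not values taken from any real compiler.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    callee_size: int        # e.g. number of IR instructions in the callee body
    callee_call_sites: int  # how many call sites target this callee
    in_loop: bool           # is this call site inside a loop?
    constant_args: int      # number of constant-valued actual parameters


def should_inline(site, size_budget=40):
    """Decide per call site, using hypothetical thresholds."""
    if site.callee_call_sites == 1:
        return True                      # single call site: no duplication of the body
    budget = size_budget
    if site.in_loop:
        budget *= 2                      # calls in loops are worth more code growth
    budget += 10 * site.constant_args    # constants may enable further simplification
    return site.callee_size <= budget


only_caller = should_inline(CallSite(200, 1, False, 0))
too_big = should_inline(CallSite(100, 5, False, 0))
hot_loop = should_inline(CallSite(70, 5, True, 0))
```

A production inliner would typically replace the fixed thresholds with profile data or a cost model, but the shape of the decision (size, call-site count, loop nesting, constant arguments) is the same as on the slide.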
Inlining Policies
  The hard question
    – How do we decide which calls to inline?
  Many possible heuristics
    – Only inline small functions
    – Let the programmer decide using an inline directive
    – Use a code expansion budget [Ayers, et al ’97]
    – Use profiling or instrumentation to identify hot paths, and inline along the hot paths [Chang, et al ’92]
        – JIT compilers do this
    – Use inlining trials for object-oriented languages [Dean & Chambers ’94]
        – Keep a database of functions, their parameter types, and the benefit of inlining
        – Keeps track of the indirect benefit of inlining
        – Effective in an incrementally compiled language
Study on Real Compilers
  Cooper, Hall, Torczon (92)
    – Eight programs, five compilers, five processors
    – Eliminated 99% of dynamic calls in 5 of the programs
    – Measured speed of original vs. transformed code
  What do you expect?
Results on Real Compilers [figure]
What happened?
  Input code violated assumptions made by compiler writers
    – Longer procedures
    – More names
    – Different code shapes
  Exacerbated problems that are unimportant on “normal” code
    – Imprecise analysis
    – Algorithms that scale poorly
    – Tradeoffs between global and local speed
    – Limitations in the implementations
  The compiler writers were surprised!
Inlining: Summary
  Pros
    + Exposes context & side effects
    + Simple
  Cons
    - Code bloat (bad for caches, branch predictor)
    - Can't always be decided statically for object-oriented programs
    - Library source?
    - Recursion?
    - How do we decide when to inline?
Alternatives to IPA: Cloning
  Cloning: customize a procedure for certain call sites
    – Partition call sites to procedure p into equivalence classes
        e.g., {{call 3, call 1}, {call 4}}
    – Equivalence based on optimization
        Constant propagation: partition based on parameter value
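Partitioning call sites by parameter value can be sketched as grouping on a key. In this illustration the mapping from call-site label to a tuple of argument values, with `None` standing for a non-constant argument, is an assumed encoding, and the constants are made up to reproduce the slide's classes.

```python
from collections import defaultdict


def partition_call_sites(call_sites):
    """Group call sites of one procedure by their tuple of constant arguments.

    Each resulting class can share one specialized clone of the procedure.
    """
    classes = defaultdict(list)
    for label, args in call_sites.items():
        classes[args].append(label)
    return dict(classes)


# Hypothetical calls to p: call 1 and call 3 pass the constant 0,
# call 4 passes a value that is not a compile-time constant.
p_call_sites = {"call 1": (0,), "call 3": (0,), "call 4": (None,)}
p_classes = partition_call_sites(p_call_sites)
```

This yields two classes, {call 1, call 3} and {call 4}, so one clone specialized for the constant argument would serve the first class while the original body serves the second.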
Cloning
  Pros
    + Compromise between inlining & IPA
    + Less code bloat compared to inlining
    + No problem with recursion
    + Better caller / callee optimization potential (compared to IPA)
  Cons
    - Some code bloat (compared to IPA)
    - May have to do interprocedural analysis anyway
        e.g., interprocedural constant propagation can guide cloning
Summary
  Interprocedural analysis
    – Difficult and expensive
    – Needs source code, recompilation analysis
    – Trade-offs for precision & speed / space
    – Better than inlining
    – Useful for many optimizations
  IPA and cloning likely to become more important
    – Java: many small procedures
Summary
  Most compilers avoid interprocedural analysis
    – It’s expensive and complex
    – Not beneficial for most classical optimizations
    – Separate compilation + interprocedural analysis requires recompilation analysis [Burke and Torczon ’93]
    – Can’t analyze library code
  When is it useful?
    – Pointer analysis
    – Constant propagation
    – Object-oriented class analysis
    – Security and error checking
    – Program understanding and re-factoring
    – Code compaction
    – Parallelization
  Callout on the slide: “Modern” uses of compilers
Trends
  Cost of procedures is growing
    – More of them and they’re smaller (OO languages)
    – Modern machines demand precise information (memory op aliasing)
  Cost of inlining is growing
    – Code bloat degrades efficacy of many modern structures
    – Procedures are being used more extensively
  Programs are becoming larger
  Cost of interprocedural analysis is shrinking
    – Faster machines
    – Better methods
Next Time
  – Homework: convert program to SSA form
  – Exercise
  – Pointer Analysis
  – Reading: Dragon chapter 12
  – Mid-term Review