Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Comp 512 Spring 2011 1COMP 512, Rice University

2 Enough Analysis, What Can We Do? There are interprocedural optimizations Choosing custom procedure linkages Interprocedural common subexpression elimination Interprocedural code motion Interprocedural register allocation Memo-function implementation Cross-jumping Procedure recognition & abstraction

COMP 512, Rice University3 Linkage Tailoring Should choose the best linkage for circumstances Inline, clone, semi-open, semi-closed, closed Estimate execution frequencies & improvements Assign styles to call sites  The choices interact

COMP 512, Rice University4 Linkage Tailoring Linkage styles Traditional closed linkage p q p q Open linkage (inlined ) Eliminate the call overhead Create tailored private copy of callee

COMP 512, Rice University5 Linkage Tailoring Linkage styles Procedure Cloning t sp q0q0 q1q1 r Partition call sites by environment Create a location where desired facts can be true

COMP 512, Rice University6 Linkage Tailoring Linkage styles Traditional closed linkage p q Semi-open linkage Move prolog & epilog across call & tailor them For call in loop, parts of linkage are loop invariant p q Think of this as an aggressive closed linkage

COMP 512, Rice University7 Linkage Tailoring Should choose the best linkage for circumstances Inline, clone, semi-open, semi-closed, closed Estimate execution frequencies & improvements Assign styles to call sites Practical approach Limit choices ( standard, cloned, inlined ) Clone for better information & to specialize ( based on idfa ) Inline for high-payoff optimizations Adopt an aggressive standard linkage  Move parameter addressing code out of callee ( & out of loop ) The choices interact

COMP 512, Rice University8 Linkage Tailoring A low-level, post-compilation idea Compute register summaries for each procedure  How many registers are really used? Rotate assignment to move them into caller-saves registers  Leave callee-saves registers unused, if possible  Avoid saving them in the callee Leaf routine with little demand for registers Adjust caller-saves/callee-saves boundary Remove caller’s stores for registers unused in callee Some authors call this interprocedural register allocation

COMP 512, Rice University9 Interprocedural Register Allocation Some work has been done Chow’s compiler for the M IPS machine, “shrink wrapping”  Often slowed down the code Wall and Goodwin at D EC S RC did link-time allocation  Target machine had scads of registers  Essentially, adjusted caller-saves & callee-saves What about full-blown, allocate the whole thing, allocation? Arithmetic of costs is pretty complex Requires good profile or frequency information Need a fair basis for comparing different uses for r i Real issue would be spilling ( “spill everywhere” would be a disaster )

COMP 512, Rice University10 Interprocedural Common Subexpression Elimination Sounds good, but consider the real opportunities Procedures only share parameters, globals, & constants No local variables in an I CSE, by definition Not a very large set of expressions Possible schemes Create a global data area to hold I CSE s  Sidestep register pressure issue Ellide unnecessary parameters or pass-through parameters  “commoning” in Cocke’s terminology  Shrink the pre-call sequence

COMP 512, Rice University11 Invocation Invariant Expressions Do iie’s exist? (yes) Is moving them profitable? (it should be) Can we engineer this into a compiler? (this is tougher) 8.9% to 0.2% 1.66% on average Static counts, not dynamic counts

COMP 512, Rice University12 Interprocedural Code Motion What could we do? What could we find? Find and mark loop nests in the call graph Compute interprocedural A VAIL ? Move code across procedure boundaries Two ideas Invocation invariant expressions  Expression whose value is determined at point of call  Hoist them to the prolog, or hoist them across the call Loop embedding and extraction  Move control-flow for entire nest into one procedure  Enable optimization using intra-procedural techniques } All are difficult

COMP 512, Rice University13 Interprocedural Code Motion Moving loops across a call do i = 1 to 100 do j = 1 to 100 call fee(a,b,c,i,j) end fee(x,y,z,i,j) do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end call fee(a,b,c) fee(x,y,z) do i = 1 to 100 do j = 1 to 100 do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end Loop Embedding Think of this as dual of partial inlining ( partial outlining?)

COMP 512, Rice University14 Interprocedural Code Motion Moving loops across a call do i = 1 to 100 do j = 1 to 100 call fee(a,b,c,i,j) end fee(x,y,z,i,j) do k = 1 to 100 x(i,j) = x(i,j) + y(i,k) * z(k,j) end do i = 1 to 100 do j = 1 to 100 do k = 1 to 100 call fee(a,b,c,i,j,k) end fee(x,y,z,i,j,k) x(i,j) = x(i,j) + y(i,k) * z(k,j) Loop Extraction Think of this as partial inlining

COMP 512, Rice University15 Memo-function implementation Idea Find pure functions & turn them into hashed lookups Implementation Use interprocedural analysis to identify pure functions Insert stub with lookup between call & evaluation Benefits Replace evaluations with table lookup Potential for substantial run-time savings Should share table implementation with other functions

COMP 512, Rice University16 Cross-jumping Idea Procedure epilogs come in two flavors  Returned value & no returned value Eliminate duplicates & save space Implementation At start of each block, compare ops before predecessor branch If identical, move it across the branch Repeat until code stops changing Presents new challenges to the debugger

COMP 512, Rice University17 Procedure Abstraction Idea Reocgnize common instruction sequences Replace them with (very) cheap calls Experience Need to abstract register names & local labels Use suffix trees ( bio-informatics ) Produces smaller code with longer paths  ~ 1% slower for each 2% smaller Wreaks havoc with the debugger Cooper & McIntosh, P LDI 99, May 1999

COMP 512, Rice University18 Putting It All Together Key questions that a compiler writer must answer? What should the optimizer & back end do? How should the compiler represent the program? Broad answers Figure out what the real problems are Learn everything you can about the target machine Attack each problem with a specific plan Separate concerns as long as possible

COMP 512, Rice University19 Lessons Experience is a tough teacher Understand what the code looks like  On input to the compiler and on output from the compiler  Hard to see problems in large procedures A good compiler need not do everything  Study the code & figure out what’s needed  Do those things well & do them thoroughly Hand simulation before implementation pays off  Avoid implementing things that solve no problem Find the right level of abstraction for each optimization  Constant propagation can be done at source level  Redundancy elimination should be done at a low-level

COMP 512, Rice University20 Lessons Experience is a tough teacher Intermediate representations are important  Determines level of exposed detail  Every pass traverses it, so efficient manipulation is critical Code shape and name spaces are important  Determine opportunities & size of data structures  Has large impact on effectiveness of algorithms Compile-time space counts  Limits size of procedures that can be compiled  Compiler touches all that space  More complex analyses take lots of space

COMP 512, Rice University21 And that’s the end of my story... Comp 512

COMP 512, Rice University22 Lessons Experience is a tough teacher Separation of concerns is vital to progress  Resource constraints from rearrangement  Allocation from optimization  Parsing from code generation Separation of concerns interferes with optimization  Allocation & scheduling  Evaluation order & redundancy elimination  Code shape decisions affect later passes { Loop shape, case stmts. Annotations ( ILOC tags)

Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Similar presentations

Presentation on theme: "Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Similar presentations

Presentation on theme: "Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled."— Presentation transcript:

Similar presentations

About project

Feedback