Presentation is loading. Please wait.

Presentation is loading. Please wait.

Profile Guided Code Positioning C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.

Similar presentations


Presentation on theme: "Profile Guided Code Positioning C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved."— Presentation transcript:

1 Profile Guided Code Positioning C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. See paper of the same name by Karl Pettis & Robert C. Hansen in PLDI 90, SIGPLAN Notices 25(6), pages 16–27

2 COMP 512, Fall 20032 The Problem The placement of program text in memory matters Large working sets cause excessive TLB & page misses Bad placement can increase instruction cache misses  Pettis & Hansen report that on some benchmarks, 1 of 3 cycles was a cache miss (for PA-RISC ) Random placement leaves these effects to chance The plan Discover execution-time paths Rearrange the code to keep those paths in contiguous memory Make heavy use of execution profiles

3 COMP 512, Fall 20033 Does this work? Examples within HP Pascal compiler  Moved frequently executed blocks to top of procedure  40% reduction in instruction cache misses  5% improvement in compiler’s running time Fortran compiler  Rearranged object files before linking  Attempt to improve locality on calls  20% system throughput improvement So, they believed...

4 COMP 512, Fall 20034 Two Major Issues Procedure placement If A calls B, would like A & B in adjacent locations  On same page means smaller working set  adjacent locations limit I-cache conflicts Unfortunately, many procedures might call B (& A) This is an issue for the linker Block placement Same effects occur on a smaller scale Fall through branches create an additional incentive Rarely executed code fills up the cache, too! This is an issue for the compiler & optimizer Both could be handled in a binary-to-binary postpass.

5 COMP 512, Fall 20035 Procedure Placement Simple principles Build the call graph Annotate edges with execution frequencies Use “closest is best” placement  A calls B most often  place A next to B  Keeps branches short ( advantage on PA-RISC )  Direct mapped I-cache  A & B unlikely to overlap in I-cache Profiling the call graph Linker inserts a stub for each call that bumps a counter Counters are kept in statically initialized storage ( set to zero ) Adds overhead to execution, but only in training runs As much as 80 to 98% reduction in executed long branches Ignored indirect calls (through a pointer)

6 COMP 512, Fall 20036 Procedure Placement Computing an order Combine all edges from A to B Select highest weight edge, say X  Y  Combine X & Y, along with their common edges, X  Z & Y  Z  Place X next to Y Repeat until graph cannot be reduced further X Y Z 10 2 4 XY Z 6 May have disconnected subgraphs Must add new procedures at end > W  X and Y  Z with WZ & XY > Use weights in original graph > Largest weight closest The algorithm is straight forward

7 COMP 512, Fall 20037 Block Placement Targets branches with unequal execution frequencies Make likely case the “fall through” case Move unlikely case out-of-line & out-of-sight Potential benefits Longer branch-free code sequences More executed operations per cache line Denser instruction stream  fewer cache misses Moving unlikely code  denser page use & fewer page faults

8 COMP 512, Fall 20038 Block Placement Moving infrequently executed code B1B1 B4B4 B2B2 B3B3 B1B1 B4B4 B2B2 B3B3 1000 1 1 Unlikely path gets fall through (cheap) case Likely path gets an extra branch Would like this to become B1B1 B4B4 B3B3 B2B2 Long distance In another page,... This branch goes away Denser instruction stream

9 COMP 512, Fall 20039 Block Placement Principles Goal is to eliminate taken branches  Build up traces – single paths Work from profile data  Edges are better than blocks Use a greedy, bottom-up strategy to combine blocks Gathering profile data Insert code to count edges Split critical edges Use name mangling to separate data for different procedures

10 COMP 512, Fall 200310 Block Placement The Idea Form chains that should be placed to form straight-line code The Algorithm 1.Make each block a degenerate chain & set its priority to # blocks 2.P  1 3.  edge e = in the CFG, in order by decreasing frequency if x is the tail of a chain, a, and y is the head of a chain, b then merge a and b else set priority(y) to min(priority(y),P ++) { Point is to place targets after their sources, to make forward branches

11 COMP 512, Fall 200311 Block Placement Now, to lay out the code WorkList  chain containing the entry node, n 0 While (WorkList ≠ Ø) Pick the chain c with lowest priority(c) from WorkList Place it next in the code  edge leaving c add z to WorkList Intuitions Entry node first Tries to make edge from chain i to chain j a forward branch  Predicted not-taken on target machine  Edge remains only if it is lower probability choice

12 COMP 512, Fall 200312 Going Further – Procedure Splitting Any code that has zero profile is “fluff” Move fluff into the distance  It rarely executes  Get more useful operations into I cache  Increase effective density of I cache Slower execution for rarely executed code Implementation Create a linkage-less procedure with an invented name Give it a priority that the linker will sort to the code’s end Replace the branch with a call ( a stub that does the call )  Branch to call at the end of the procedure to maintain density

13 COMP 512, Fall 200313 Putting It Together Procedure placement is done in the linker Block placement is done in the optimizer  Allows branch elision due to fluff, other tailoring Speedups ranged from 2 to 10%, depending on cache size This idea became popular on early 1990s PCs  Long cache lines  Slow page faults  Microsoft insiders suggested it was most important optimization for codes like Office ( Word, PowerPoint, Excel )


Download ppt "Profile Guided Code Positioning C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved."

Similar presentations


Ads by Google