1
Register Allocation and Spilling via Graph Coloring
G. J. Chaitin
IBM Research, 1982
2
Motivation
• Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose registers
• The symbolic registers must be mapped to real registers in a way that avoids conflicts
• Symbolic registers that cannot be mapped to real registers must be spilled to memory
• We need an algorithm to map registers with minimal spilling cost
3
Paper Overview
• Register allocation overview
• Subsumption algorithm
• Interference graph coloring algorithm
• Spilling algorithm
4
Register Allocation Steps
1. Determine which registers are live at any point in the intermediate language (IL) program
2. Build a register interference graph
  • Nodes represent symbolic registers
  • Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph
5. Decide which registers to spill if necessary
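Steps 1 and 2 can be sketched for straight-line code with a single backward pass. This is a minimal illustration, not Chaitin's implementation: the `(dest, uses, is_copy)` tuple format, the function name `build_interference`, and the sample program are assumptions made for this example, and a real allocator needs data-flow analysis across basic blocks.

```python
def build_interference(instrs, live_out):
    """Build interference edges for straight-line code.

    instrs:   list of (dest, uses, is_copy) tuples (hypothetical IL format)
    live_out: registers still live after the last instruction
    """
    live = set(live_out)
    edges = set()
    for dest, uses, is_copy in reversed(instrs):
        # A defined register conflicts with everything live after the
        # definition. For a copy, the target is not made to conflict
        # with its source, which is what lets subsumption merge them.
        others = live - {dest}
        if is_copy:
            others -= set(uses)
        edges |= {frozenset((dest, other)) for other in others}
        # Liveness before this instruction: kill dest, add the uses.
        live = (live - {dest}) | set(uses)
    return edges


program = [
    ("A", [], False),     # A = 1
    ("B", ["A"], True),   # B = A
    ("B", ["B"], False),  # B = B + 1
    ("C", ["B"], True),   # C = B
    ("D", ["A"], True),   # D = A
]
edges = build_interference(program, live_out={"C", "D"})
```

For this sample, A conflicts with B and C, and C conflicts with D, while the pairs A/D and B/C do not conflict.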
5
Subsumption
• If the source and destination of a register copy do not interfere, they may be coalesced into a single node
• For each register copy in IL, determine whether the registers interfere
• If not, coalesce the two nodes into one
• After first pass, rewrite IL code
• Repeat until no more coalescing is possible
6
Subsumption Example
Instructions   Live   Dead
A = 1          A
B = A          B
B = B + 1
C = B          C      B
D = A          D      A
…              C, D
[Figure: interference graph with nodes A, B, C, D]
7
Subsumption Example
Instructions   Live     Dead
AD = 1         AD
BC = AD        BC
BC = BC + 1
…              AD, BC
[Figure: interference graph with nodes AD, BC]
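The coalescing loop from the subsumption slides can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `coalesce`, the dictionary-of-sets graph format, and the edge set for the example program are choices made here. Instead of rewriting the IL and rebuilding the graph after each pass, the sketch merges adjacency sets directly, which is a conservative shortcut.

```python
def coalesce(graph, copies):
    """Merge the two ends of every copy whose registers do not interfere.

    graph:  dict mapping register -> set of interfering registers
    copies: list of (dst, src) pairs, one per copy instruction
    """
    graph = {n: set(adj) for n, adj in graph.items()}
    alias = {}  # merged-away register -> register it was folded into

    def find(reg):
        while reg in alias:
            reg = alias[reg]
        return reg

    changed = True
    while changed:  # repeat until no more coalescing is possible
        changed = False
        for dst, src in copies:
            keep, drop = find(dst), find(src)
            if keep == drop or drop in graph[keep]:
                continue  # already merged, or the registers interfere
            for n in graph.pop(drop):
                graph[n].discard(drop)
                graph[n].add(keep)
                graph[keep].add(n)
            alias[drop] = keep
            changed = True
    return graph, {reg: find(reg) for reg in alias}


# Interference edges assumed for the example program: A-B, A-C, C-D.
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
copies = [("B", "A"), ("C", "B"), ("D", "A")]
merged, names = coalesce(graph, copies)
```

Here B is folded into C and A into D, leaving two nodes that interfere with each other, which corresponds to the AD and BC nodes in the second table.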
8
Finding a 32-Coloring
• Each symbolic register is assigned a color representing a real register
• If no adjacent nodes have the same color, then the coloring succeeds
• Assume that G has a node N with degree < 32
• Then G is 32-colorable iff the reduced graph from which N and all its edges have been omitted is 32-colorable
• Algorithm throws away nodes of degree < 32 until all nodes have been removed
• Algorithm fails if, at some point, no remaining node has degree < 32
9
3-coloring example
Instructions   Live   Dead
A = 1          A
B = 2          B
C = 3          C
? = A                 A
D = 4          D
? = B                 B
? = C                 C
? = D                 D
[Figure: interference graph with nodes A, B, C, D]
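The remove-and-recolor procedure can be sketched as below. The function name `color_graph` and the graph encoding are assumptions made for this illustration. The edge set is derived here from the live ranges in the instruction listing (A, B, and C overlap pairwise; D overlaps B and C but not A), not copied from the slide's figure.

```python
def color_graph(graph, k):
    """Return a dict register -> color in range(k), or None on failure.

    graph: dict mapping register -> set of interfering registers
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Pick any node with fewer than k neighbors left in the graph.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None  # every remaining node has degree >= k
        for neighbor in work.pop(node):
            work[neighbor].discard(node)
        stack.append(node)
    colors = {}
    # Reinsert nodes in reverse order; a free color always exists
    # because each node had fewer than k neighbors when removed.
    for node in reversed(stack):
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors


example = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
three = color_graph(example, 3)
two = color_graph(example, 2)
```

With three colors the procedure succeeds and A and D end up sharing a register. With two colors it fails immediately, since no node has degree below 2.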
10
Spilling
• If the 32-coloring fails, then nodes must be spilled to memory
• Spilled registers are stored to memory, then loaded momentarily when their results are needed
• Every time spill code is generated, the interference graph must be rebuilt
• Usually recoloring succeeds after spilling, but sometimes several passes are required
11
Spilling
• NP-complete problem
• Heuristic: spill the node that minimizes
  – (cost of spilling) / (degree of node)
• Cost of spilling
  – One unit per definition point and per use point, each weighted by the execution frequency of that point
• In some cases, a spilled node can be reloaded for an extended interval
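The spill heuristic can be sketched as a small selection function. The name `choose_spill`, the input format, and the frequency numbers are hypothetical and serve only to illustrate the cost/degree ratio.

```python
def choose_spill(graph, point_freqs):
    """Pick the register with the smallest spill cost per unit of degree.

    graph:       dict mapping register -> set of interfering registers
    point_freqs: dict mapping register -> list with one estimated
                 execution frequency per definition or use point
    """
    def ratio(node):
        cost = sum(point_freqs[node])  # frequency-weighted defs + uses
        return cost / len(graph[node])

    # Nodes with no neighbors never block coloring, so skip them.
    candidates = [n for n in sorted(graph) if graph[n]]
    return min(candidates, key=ratio)


graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
# Made-up frequencies: C is defined and used inside a hot loop.
point_freqs = {"A": [1, 1], "B": [1, 1], "C": [10, 10], "D": [1, 1]}
victim = choose_spill(graph, point_freqs)
```

B is chosen here: it is as cheap to spill as A or D but has more neighbors, so removing it relieves more pressure on the graph.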
12
Conclusion
• The graph coloring and spilling algorithms should produce faster code
• The register allocation algorithm is efficient
  – Graph coloring is O(N)
  – But uses O(N²) space
13
Compile-time Copy Elimination
Peter Schnorf, Mahadevan Ganapathi, John Hennessy
Stanford, 1993
14
Motivation
• Single-assignment languages simplify dependency checking
• Which simplifies automatic detection and exploitation of parallelism
• But single-assignment languages require a large number of copies
• Previous implementations eliminate copies at runtime
• Increased efficiency if copies can be eliminated at compile time
15
Paper Overview
• Single-assignment languages
• Code generation
• Compile-time copy elimination techniques
  – Substitution
  – Pattern matching
  – Substructure sharing
  – Substructure targeting
• Results: success!
  – Eliminated all copies in bubble sort
16
Single-assignment languages
• Functional languages (LISP, Haskell, SISAL)
• Simpler dependency checking
  – True dependencies (write, then read): b = f(c), a = f(b)
  – Anti-dependencies (read, then write): a = f(b), b = f(c)
  – Output dependencies (write, then write): a = f(b), a = f(c)
  – Aliasing: caused by pointers, array indexes
• To avoid aliasing, all inputs and outputs are passed by value
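The three dependency kinds can be checked mechanically for a pair of statements. The sketch below is an illustration only: the function name `classify` and the `(written, read_set)` encoding of a statement are assumptions, and aliasing is ignored.

```python
def classify(first, second):
    """Classify the dependencies from `first` to a later statement `second`.

    Each statement is a pair (written_variable, set_of_read_variables).
    """
    (w1, r1), (w2, r2) = first, second
    kinds = set()
    if w1 in r2:
        kinds.add("true")    # write, then read
    if w2 in r1:
        kinds.add("anti")    # read, then write
    if w1 == w2:
        kinds.add("output")  # write, then write
    return kinds


true_dep = classify(("b", {"c"}), ("a", {"b"}))    # b = f(c), a = f(b)
anti_dep = classify(("a", {"b"}), ("b", {"c"}))    # a = f(b), b = f(c)
output_dep = classify(("a", {"b"}), ("a", {"c"}))  # a = f(b), a = f(c)
```

In a single-assignment program no variable is written twice, so the anti and output cases cannot arise and only true dependencies remain to be checked.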
17
Example – Swap(A,i,j)
• Data flow diagram
  – Edges transport values
  – Simple nodes are operations
• Pick any feasible node evaluation order at random
• Naïve implementation
  – Each edge has its own memory
  – Swap uses 5 array copies!
• Optimized implementation
  – Swap array updates are done in place
[Figure: data-flow graph for Swap with Input, AElement, and AReplace nodes]
18
Example: BubbleSort(A)
• Compound nodes represent control flow
• Loops are implemented using recursion to avoid multiple assignment of the iteration variable
• Naïve implementation
  – Bubble sort requires O(n²) array copies
• Optimized implementation
  – All array updates are done in place
  – But parallelism is decreased
19
Code Generation Overview
• Input is from compiler front-end
  – IF1: intermediate data-flow graph representation
• Code generator eliminates copies
• Output is in C
  – Compiled into machine code using an optimizing C compiler
20
Vertical Substitution
• If input and output have the same type and size, they can share memory
  – Updates are done in place
[Figure: data-flow graph with Input, AElement, and AReplace nodes, labeled 4, 3, 2, 1]
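The effect of letting an AReplace node share memory with its input can be sketched with two versions of the operation. The Python function names are hypothetical stand-ins for the AElement and AReplace nodes, and the sketch counts only the copies made by AReplace itself, not the per-edge copies of the naïve scheme.

```python
def a_element(array, i):
    """AElement: read one element."""
    return array[i]


def a_replace_copy(array, i, value):
    """AReplace without substitution: the output gets its own memory."""
    out = list(array)
    out[i] = value
    return out


def a_replace_in_place(array, i, value):
    """AReplace with vertical substitution: output shares the input's memory."""
    array[i] = value
    return array


def swap(array, i, j, replace):
    # Both reads happen before any update, so in-place updates are safe.
    x = a_element(array, i)
    y = a_element(array, j)
    return replace(replace(array, i, y), j, x)


original = [1, 2, 3]
copied = swap(original, 0, 2, a_replace_copy)

shared = [1, 2, 3]
updated = swap(shared, 0, 2, a_replace_in_place)
```

With `a_replace_copy` the input list is left untouched and two fresh lists are allocated. With `a_replace_in_place` the result is the same list object as the input.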
21
Horizontal Substitution
• If an output has several destinations, the output edges can share memory
[Figure: data-flow graph with Input, AElement, and AReplace nodes, labeled 4, 3, 2, 1]
22
Horizontal and Vertical Substitution
• Horizontal and vertical substitution can interfere with each other
  – A node along the substitution chain modifies the shared object before its last use
• Edges can be marked as read-only if they are shared and this is not the last use
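The hazard can be shown with a swap in which one in-place update runs before the last read of the shared array. This is a hypothetical illustration; both function names are made up here, and plain list indexing stands in for the AElement and AReplace nodes.

```python
def swap_update_too_early(array, i, j):
    """An in-place update runs before the last read of the shared array."""
    y = array[j]
    array[i] = y   # modifies the shared object...
    x = array[i]   # ...so this read sees the overwritten value
    array[j] = x
    return array


def swap_reads_first(array, i, j):
    """All reads of the shared array finish before the first update."""
    x = array[i]
    y = array[j]
    array[i] = y
    array[j] = x
    return array


wrong = swap_update_too_early([1, 2, 3], 0, 2)
right = swap_reads_first([1, 2, 3], 0, 2)
```

Marking the edge into the early read as read-only is what forbids the first ordering: the update may not reuse the memory until that read has been evaluated.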
23
Horizontal and Vertical Substitution
[Figure: two data-flow graphs with Input, AElement, and AReplace nodes, labeled 4, 3, 2, 1 in the first and 4, 2, 3, 1 in the second]
24
Interprocedural Substitution
• Previous discussion concerned simple nodes that can be analyzed at compiler design time
• Information about a function is needed in order to use substitution
  – Does the function modify an input?
  – Will an input be chained to an output?
25
Intersubgraph Substitution
• Substitution analysis is done for each construct
• Same basic principles
26
Determining the Evaluation Order
• Evaluation order can impact efficiency of substitution
• Naïve implementation selects the next node to evaluate at random
• Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible
• Hints are ad hoc?
27
Pattern Matching
• Replace hard-to-optimize pieces of code
• Patterns are language-specific
• Patterns are detected using “ad hoc” methods
28
Substructure Sharing
• Allow substructures to be referenced without copies
• AElement can be treated as a NoOp
• Happens after substitution analysis, so it is less important
• Same principles as substitution analysis
29
Substructure Targeting
• Allow structures to be built from substructures without copies
• Similar to substructure sharing
30
Results
• Compared optimizations versus naïve implementation
• Optimizations eliminate all copies for bubble sort
• Informal comparison to run-time optimizer shows improvements
31
Results
32
Conclusions
• Substitution, pattern matching and substructure sharing can almost eliminate unnecessary copies in a single-assignment language.
• Copy elimination no longer has to be done at run-time.
• Single-assignment languages should be more efficient for parallel programs.