Global Register Allocation via Graph Coloring Comp 412

Comp 412, Fall 2010
This lecture focuses on the Chaitin-Briggs approach, which EaC calls the bottom-up global algorithm.
Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Global Register Allocation
Taking a global approach
- Abandon the distinction between local & global
- Make systematic use of registers or memory
- Adopt a general scheme to approximate a good allocation
Graph coloring paradigm (Lavrov & (later) Chaitin)
- Build an interference graph GI for the procedure
  - Computing LIVE is harder than in the local case
  - GI is not an interval graph
- (Try to) construct a k-coloring
  - Minimal coloring is NP-Complete
  - Spill placement becomes a critical issue
- Map colors onto physical registers

Graph Coloring (A Background Digression)
The problem
- A graph G is said to be k-colorable iff the nodes can be labeled with integers 1 … k so that no edge in G connects two nodes with the same label
- Each color can be mapped to a distinct physical register
Examples (figures not reproduced): one 2-colorable graph and one 3-colorable graph
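The definition above can be checked mechanically. The following is a minimal sketch, not part of the lecture; the function name `is_valid_k_coloring` and the two example graphs are assumptions chosen for illustration.

```python
def is_valid_k_coloring(edges, coloring, k):
    """Return True if every node has a label in 1..k and no edge joins equal labels."""
    nodes = {n for edge in edges for n in edge}
    # Every node that appears in an edge needs a label in 1..k.
    if any(coloring.get(n) not in range(1, k + 1) for n in nodes):
        return False
    # No edge may connect two nodes with the same label.
    return all(coloring[u] != coloring[v] for u, v in edges)


# Hypothetical examples: a 4-cycle is 2-colorable, a triangle needs 3 colors.
square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]

print(is_valid_k_coloring(square, {"a": 1, "b": 2, "c": 1, "d": 2}, 2))
print(is_valid_k_coloring(triangle, {"a": 1, "b": 2, "c": 1}, 2))
```

Checking a given coloring is cheap; it is finding a minimal one that is hard.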

Building the Interference Graph
What is an “interference”? (or conflict)
- Two values interfere if there exists an operation where both are simultaneously live
- If x and y interfere, they cannot occupy the same register
- To compute interferences, we must know where values are “live”
The interference graph, GI = (NI, EI)
- Nodes in GI represent values, or live ranges
- Edges in GI represent individual interferences
  - For x, y ∈ NI, <x,y> ∈ EI iff x and y interfere
- A k-coloring of GI can be mapped into an allocation to k registers
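One common way to collect these edges is a backward walk over each block that starts from the block's LIVEOUT set. The sketch below assumes a hypothetical operation format, `(defined_name, [used_names])`, and handles a single block; it illustrates the idea and is not the lecture's own code.

```python
def build_interference(ops, live_out):
    """ops: list of (defined_name, [used_names]) in execution order.

    Returns a set of frozenset({x, y}) edges for one basic block.
    """
    edges = set()
    live_now = set(live_out)
    for target, uses in reversed(ops):
        # The value defined here interferes with everything live after the op.
        for other in live_now:
            if other != target:
                edges.add(frozenset((target, other)))
        # Walking backward: the definition ends the live range, the uses extend it.
        live_now.discard(target)
        live_now.update(uses)
    return edges


# Hypothetical block: a and b are both live at the definition of c, b and c at d.
ops = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c", "b"])]
edges = build_interference(ops, live_out={"d"})
print(sorted(sorted(e) for e in edges))
```

In this block a and c never interfere, because a is last used by the operation that defines c, so the two could share a register.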

The Worklist Iterative Algorithm
Computing LIVE Sets
The compiler can solve these equations with a simple algorithm (the world’s quickest introduction to data-flow analysis!)

  WorkList ← { all blocks }
  while ( WorkList ≠ Ø )
    remove a block b from WorkList
    compute LIVEOUT(b)
    compute LIVEIN(b)
    if LIVEIN(b) changed
      then add pred(b) to WorkList

Why does this work?
- The LIVEOUT and LIVEIN sets are drawn from 2^Names (subsets of Names)
- UEVAR, VARKILL are constants for b
- Equations are monotone
- Finite # of additions to sets, so the algorithm will reach a fixed point!
Speed of convergence depends on the order in which blocks are “removed” & their sets recomputed
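The worklist algorithm can be sketched in a few lines. The equations themselves are not shown on this slide, so the sketch assumes the usual formulation: LIVEOUT(b) is the union of LIVEIN(s) over the successors s of b, and LIVEIN(b) = UEVAR(b) ∪ (LIVEOUT(b) − VARKILL(b)). The function name, the dictionary-based CFG encoding, and the three-block example are assumptions for illustration.

```python
from collections import deque


def live_sets(succ, uevar, varkill):
    """succ: {block: [successor blocks]}; uevar/varkill: {block: set of names}."""
    blocks = list(succ)
    pred = {b: [] for b in blocks}
    for b, successors in succ.items():
        for s in successors:
            pred[s].append(b)

    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    worklist = deque(blocks)
    while worklist:
        b = worklist.popleft()
        # Assumed equations: LIVEOUT from successors, LIVEIN from UEVAR/VARKILL.
        live_out[b] = set().union(*(live_in[s] for s in succ[b]))
        new_in = uevar[b] | (live_out[b] - varkill[b])
        if new_in != live_in[b]:
            live_in[b] = new_in
            # LIVEIN(b) changed, so the predecessors' LIVEOUT sets may change too.
            for p in pred[b]:
                if p not in worklist:
                    worklist.append(p)
    return live_in, live_out


# Hypothetical CFG: B0 initializes i, B1 is a loop that uses i and n, B2 uses i.
succ = {"B0": ["B1"], "B1": ["B1", "B2"], "B2": []}
uevar = {"B0": set(), "B1": {"i", "n"}, "B2": {"i"}}
varkill = {"B0": {"i"}, "B1": {"i"}, "B2": set()}
live_in, live_out = live_sets(succ, uevar, varkill)
print(live_in, live_out)
```

Because this is a backward problem, removing blocks in an order that visits successors before predecessors tends to converge in fewer iterations.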

Observation on Coloring for Register Allocation
Suppose you have k registers: look for a k-coloring
- Any vertex n that has fewer than k neighbors in the interference graph (degree(n) < k) can always be colored!
  - Pick any color not used by its neighbors; there must be one
Ideas behind Chaitin’s algorithm:
- Pick any vertex n such that degree(n) < k and put it on the stack
- Remove that vertex and all edges incident to it from the interference graph
  - This may make additional nodes have fewer than k neighbors
- At the end, if some vertex n still has k or more neighbors, then spill the live range associated with n
- Otherwise successively pop vertices off the stack and color them in the lowest color not used by some neighbor

Chaitin’s Algorithm
1. While ∃ vertices with < k neighbors in GI:
   - Pick any vertex n such that degree(n) < k and put it on the stack
   - Remove that vertex and all edges incident to it from GI (this lowers the degree of n’s neighbors)
2. If GI is non-empty (all vertices have k or more neighbors) then:
   - Pick a vertex n (using some heuristic) and spill the live range associated with n
   - Remove vertex n from GI, along with all edges incident to it, and put it on the “spill list”
   - If this causes some vertex in GI to have fewer than k neighbors, then go to step 1; otherwise, repeat step 2
3. If the spill list is not empty, insert spill code, then rebuild the interference graph and try to allocate again
4. Otherwise, successively pop vertices off the stack and color them in the lowest color not used by some neighbor
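The steps above can be sketched as a single simplify-then-select pass. This is an illustrative sketch under stated assumptions: the graph is a dictionary of neighbor sets, k ≥ 1, the spill heuristic is "lowest spill cost divided by current degree" (one common choice, assumed here), and inserting spill code and rebuilding the graph are left to the caller. The names `chaitin_color`, `spill_cost`, and the example graphs are hypothetical.

```python
def chaitin_color(graph, k, spill_cost):
    """graph: {node: set(neighbors)}. Returns (colors, spill_list)."""
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy of GI
    stack, spills = [], []
    while work:
        trivial = [n for n in work if len(work[n]) < k]
        if trivial:
            n = trivial[0]
            stack.append(n)
        else:
            # Every remaining node has k or more neighbors: choose a spill candidate.
            n = min(work, key=lambda m: spill_cost[m] / len(work[m]))
            spills.append(n)
        # Removing n lowers the degree of its neighbors.
        for m in work.pop(n):
            work[m].discard(n)
    if spills:
        # The caller would insert spill code, rebuild the graph, and try again.
        return {}, spills
    colors = {}
    for n in reversed(stack):
        used = {colors[m] for m in graph[n] if m in colors}
        colors[n] = min(c for c in range(1, k + 1) if c not in used)
    return colors, spills


# Hypothetical five-node graph that simplifies completely with k = 3.
g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d", "e"},
     "d": {"b", "c", "e"}, "e": {"c", "d"}}
cost = {n: 1.0 for n in g}
colors, spills = chaitin_color(g, 3, cost)
print(colors, spills)

# A complete graph on four nodes cannot be simplified with k = 3.
k4 = {n: set("wxyz") - {n} for n in "wxyz"}
print(chaitin_color(k4, 3, {n: 1.0 for n in k4}))
```

A node popped from the stack always finds a free color, because it had fewer than k neighbors remaining when it was pushed.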

Chaitin’s Algorithm in Practice
Example with 3 registers on a five-node interference graph, nodes 1 to 5 (figures not reproduced)
- Initially, 1 is the only node with degree < 3; push it on the stack and remove it
- Now, 2 & 3 have degree < 3; push one of them and remove it
- Now all nodes have degree < 3; push the remaining nodes one at a time
- Pop the nodes off the stack one at a time and give each the lowest of colors 1, 2, 3 not used by a neighbor
- When the stack is empty, every node has a color

Improvement in Coloring Scheme
Optimistic Coloring (Briggs et al, PLDI 89; also TOPLAS 1994)
- If Chaitin’s algorithm reaches a state where every node has k or more neighbors, it chooses a node to spill
- Briggs said, take that same node and push it on the stack
  - When you pop it off, a color might be available for it!
  - For example, a node n might have k+2 neighbors, but those neighbors might only use 3 (< k) colors
  - Degree is a loose upper bound on colorability
Example with 2 registers (figure not reproduced): Chaitin’s algorithm immediately spills one of these nodes

Improvement in Coloring Scheme
Optimistic Coloring
- If Chaitin’s algorithm reaches a state where every node has k or more neighbors, it chooses a node to spill
- Briggs said, take that same node and push it on the stack
  - When you pop it off, a color might be available for it!
  - For example, a node n might have k+2 neighbors, but those neighbors might use just one color (or any number < k)
  - Degree is a loose upper bound on colorability
Example with 2 registers (figure not reproduced): the graph is 2-colorable, and Briggs’ algorithm finds an available color
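The difference from Chaitin's scheme is small in code: spill candidates are pushed rather than spilled, and the spill decision is deferred until no color is found at pop time. The sketch below assumes a dictionary-of-sets graph, k ≥ 1, and a cost-over-degree heuristic for choosing candidates; the function name and the 4-cycle example are assumptions, picked because every node of a 4-cycle has degree 2, so a pessimistic allocator with 2 registers would spill.

```python
def briggs_color(graph, k, spill_cost):
    """graph: {node: set(neighbors)}. Returns (colors, spill_list)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        trivial = [n for n in work if len(work[n]) < k]
        if trivial:
            n = trivial[0]
        else:
            # Chaitin would spill this node now; Briggs pushes it optimistically.
            n = min(work, key=lambda m: spill_cost[m] / len(work[m]))
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)

    colors, spills = {}, []
    while stack:
        n = stack.pop()
        used = {colors[m] for m in graph[n] if m in colors}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            colors[n] = free[0]
        else:
            spills.append(n)  # only now is the node actually spilled
    return colors, spills


# Hypothetical 4-cycle with 2 registers: every node has degree 2 = k.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
unit = {n: 1.0 for n in cycle}
print(briggs_color(cycle, 2, unit))

# A triangle with 2 registers still needs one spill.
tri = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(briggs_color(tri, 2, {n: 1.0 for n in tri}))
```

On the 4-cycle, the two neighbors of the optimistically pushed node end up sharing a color, so a color remains for it.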

Other Improvements to Chaitin-Briggs
Spilling partial live ranges [Bergner PLDI 97]
- Bergner introduced interference region spilling
- Limits spilling to regions of high demand for registers
Splitting live ranges [Simpson CC 98, Eckhardt Icplc 05]
- Simple idea: break up one or more live ranges
- Allocator can use different registers for distinct subranges
- Allocator can spill subranges independently (use 1 spill location)
Iterative coalescing [George & Appel]
- Use conservative coalescing because it is “safe”
- Simplify the graph until only non-trivial nodes remain
- Coalesce & try again
- If coalescing does not reveal trivial nodes, then spill

Chaitin-Briggs Allocator (Bottom-up Global)
Strengths
- Precise interference graph
- Strong coalescing mechanism
- Handles register assignment well
- Runs fairly quickly
Weaknesses
- Known to overspill in tight cases
- Interference graph has no geography
- Spills a live range everywhere
- Long blocks devolve into spilling by use counts
Is improvement still possible?
- Rising spill costs, aggressive transformations, & long blocks ⇒ yes, but the returns are getting rather small

What about Top-down Coloring?
The Big Picture
- Use high-level priorities to rank live ranges (use spill costs as the priority function!)
- Allocate registers for them in priority order
- Use coloring to assign specific registers to live ranges
The Details
- Separate constrained from unconstrained live ranges
  - A live range is constrained if it has ≥ k neighbors in GI
  - Unconstrained live ranges must receive a color!
- Color constrained live ranges first
- Reserve pool of local registers for spilling (or spill & iterate)
- Chow split live ranges before spilling them
  - Split into block-sized pieces
  - Recombine pieces as long as the combined live range’s degree stays within the bound set by k
Peixotto’s 2007 MS thesis shows that top-down, in general, produces worse results unless we add an (expensive) adaptive feedback loop
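The priority scheme can be sketched as follows. The sketch assumes a dictionary-of-sets graph and a precomputed priority (for example, spill cost) per live range; it spills any constrained live range that finds no color and omits live-range splitting and the reserved spill pool entirely. The function name and the example graph are hypothetical.

```python
def top_down_color(graph, k, priority):
    """graph: {live_range: set(neighbors)}. Returns (colors, spill_list)."""
    # Constrained live ranges (degree >= k) go first, highest priority first.
    constrained = sorted((n for n in graph if len(graph[n]) >= k),
                         key=lambda n: -priority[n])
    # Unconstrained live ranges have fewer than k neighbors, so a color always remains.
    unconstrained = [n for n in graph if len(graph[n]) < k]

    colors, spills = {}, []
    for n in constrained + unconstrained:
        used = {colors[m] for m in graph[n] if m in colors}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            colors[n] = free[0]
        else:
            spills.append(n)  # a real allocator might split the live range instead
    return colors, spills


# Hypothetical graph: a, b, c, d are mutually adjacent; e touches only d.
g = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
     "d": {"a", "b", "c", "e"}, "e": {"d"}}
prio = {"a": 10, "b": 8, "c": 5, "d": 1, "e": 3}
colors, spills = top_down_color(g, 3, prio)
print(colors, spills)
```

With 3 registers, the lowest-priority member of the four-node clique is the one left without a color.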

What about Top-down Coloring?
The Big Picture
- Use high-level priorities to rank live ranges
- Allocate registers for them in priority order
- Use coloring to assign specific registers to live ranges
More Details
- Chow used an imprecise interference graph
  - <x,y> ∈ GI ⇔ x, y ∈ Live(b) for some block b
  - Cannot coalesce live ranges, since a copy x → y ⇒ <x,y> ∈ GI
  - Quicker to build imprecise graph
- Chow’s allocator may run faster on small codes, where demand for registers is also likely to be lower

Linear Scan Allocation
Approximate Global Allocation
- Coloring allocators are often viewed as too expensive for use in JIT environments, where compile time occurs at runtime (Sun’s HotSpot server compiler, however, uses a complete Chaitin-Briggs allocator)
- Linear scan allocators use an approximate interference graph and a version of the bottom-up local algorithm
  - Interference graph is an interval graph
  - Optimal coloring (without spilling) in linear time
  - Spilling handled well by bottom-up local allocator
  - Algorithm does allocation in a “linear” scan of the graph
- Linear scan produces allocations faster, albeit less precisely
Linear scan allocators hit a different point on the curve of cost versus performance
Live Ranges in LS (figure not reproduced): the interference graph of a set of intervals is an interval graph

Linear Scan Allocation
Building the Interval Graph
- Consider the procedure as a linear list of operations
- A live range for some name is an interval (x,y)
  - x and y are the indices of two operations in the list, with x < y
  - Every operation where name is live falls between x & y, inclusive
- Precision of live computation can vary with cost
- Interval graph overestimates interference
The Algorithm
- Use Best’s algorithm (bottom-up local)
- Distance to next use is well defined
- Algorithm is fast & produces reasonable allocations
- Variations have been proposed that build on this scheme
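A linear scan over intervals can be sketched as below. Note the substitution: instead of Best's distance-to-next-use rule named on the slide, this sketch spills the interval whose end point is furthest away, a common linear-scan heuristic used here as a coarse stand-in. It assumes k ≥ 1, one interval per name, and hypothetical names throughout.

```python
def linear_scan(intervals, k):
    """intervals: {name: (start, end)}. Returns (register_of, spilled)."""
    free = list(range(k))   # register numbers not currently in use
    active = []             # names that currently hold a register
    reg, spilled = {}, []
    for name in sorted(intervals, key=lambda n: intervals[n][0]):
        start, end = intervals[name]
        # Expire intervals that ended before this one starts.
        for old in [a for a in active if intervals[a][1] < start]:
            active.remove(old)
            free.append(reg[old])
        if free:
            reg[name] = free.pop()
            active.append(name)
            continue
        # No register left: spill whichever interval ends furthest away.
        victim = max(active, key=lambda a: intervals[a][1])
        if intervals[victim][1] > end:
            reg[name] = reg.pop(victim)
            active.remove(victim)
            active.append(name)
            spilled.append(victim)
        else:
            spilled.append(name)
    return reg, spilled


# Hypothetical live intervals over operation indices, with 2 registers.
ivals = {"a": (0, 10), "b": (1, 3), "c": (2, 5), "d": (4, 6)}
reg, spilled = linear_scan(ivals, 2)
print(reg, spilled)
```

The long interval is the one spilled, which frees its register for the shorter intervals that follow.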

Global Coloring from SSA Form
Observation: The interference graph of a program in SSA form is a chordal graph.
Observation: Chordal graphs can be colored in O(N) time.
- Chordal graph: every cycle of length > 3 has a chord
These two facts suggest allocation using an interference graph built from SSA form
- Chaitin-Briggs works from live ranges that are a coalesced version of SSA names
- SSA allocators use raw SSA names as live ranges
- Allocate live ranges, then insert copies for φ-functions
SSA-based allocation has created a lot of excitement in the last couple of years.
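One way to color a chordal graph optimally is to visit nodes in maximum cardinality search order, which is the reverse of a perfect elimination ordering when the graph is chordal, and to color greedily in that order. The sketch below is a simple quadratic version for illustration, not a linear-time implementation; the function names and the example graph are assumptions.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the node with most visited neighbors."""
    weight = {n: 0 for n in graph}
    order = []
    while weight:
        n = max(weight, key=weight.get)
        order.append(n)
        del weight[n]
        for m in graph[n]:
            if m in weight:
                weight[m] += 1
    return order


def greedy_color(graph, order):
    """Give each node, in the given order, the lowest color unused by its neighbors."""
    colors = {}
    for n in order:
        used = {colors[m] for m in graph[n] if m in colors}
        c = 1
        while c in used:
            c += 1
        colors[n] = c
    return colors


# Hypothetical chordal graph: two triangles sharing the edge b-c.
g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"}}
colors = greedy_color(g, mcs_order(g))
print(colors)
```

The largest clique here is a triangle, so three colors are both necessary and sufficient.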

SSA Name Space
SSA encodes facts about flow of values into the name space
Two principles
- Each name is defined by exactly one operation
- Each operand refers to exactly one definition
To reconcile these principles with real code
- Add subscripts to variable names for uniqueness
- Insert φ-functions at merge points to reconcile name space
Example: two paths that each assign x ← ... and then merge at a use ... ← x + ... become x0 ← ... on one path and x1 ← ... on the other, with x2 ← φ(x0,x1) at the merge point ahead of the use

SSA Name Space
These φ-functions are unusual constructs …
- A φ-function only occurs at the start of a block
- A φ-function has one argument for each CFG edge entering the block
- A φ-function returns the argument that corresponds to the edge along which control flow entered the block
- All φ-functions in a block execute concurrently
  - All read their arguments, then all perform their assignments, in parallel
- Since machines do not support φ-functions, we must translate back out of SSA form before we produce executable code
Using SSA form leads to simpler or better formulations of many optimizations (an alternative to global data-flow analysis)

Building SSA
SSA Form
- Each name is defined exactly once
- Each use refers to exactly one name
What’s Hard?
- Straight-line code is easy
- Split points are easy
- Merge points are hard
(Sloppy) Construction Algorithm
- Insert a φ-function for each variable at each merge point
- Rename all values for uniqueness (using subscripts)
This approach
- Inserts too many φ-functions
- Inserts φ-functions in too many places
The rest, however, is optimization & beyond the scope of today’s lecture. (See §9 in EaC)

Coloring from SSA Names has its advantages
- If the graph is k-colorable, it finds the coloring
  - (Opinion) An SSA-based allocator will find more k-colorable graphs than a live-range based allocator because SSA names are shorter and, thus, have fewer interferences.
- Allocator should be faster than a live-range allocator
  - Cost of live analysis is folded into SSA construction, where it is amortized over other passes
  - Biggest expense in Chaitin-Briggs is the Build-Coalesce phase, which an SSA allocator avoids because coalescing destroys the chordal property of the graph

Coloring from SSA Names has its disadvantages
- Coloring is rarely the problem
  - Most non-trivial codes spill; on trivial codes, both SSA allocator and classic Chaitin-Briggs are overkill. (Try linear scan?)
- SSA form provides no obvious help on spilling
  - Shorter live ranges will produce local spilling (good & bad)
  - May increase spills inside loops: a loop-carried value cannot spill before the loop, since its name is only live inside the loop and after the loop
- After allocation, code is still in SSA form
  - Need out-of-SSA translation
  - Introduce copies after allocation
  - Swap problem may require an extra register
- Must run a post-allocation coalescing phase
  - Algorithms exist that do not use an interference graph
  - They are not as powerful as the Chaitin-Briggs coalescing phase
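The swap problem comes from the parallel semantics of φ-functions: when the copies that replace them form a cycle, no ordering of plain copies is correct without a temporary. The sketch below shows one way to sequentialize a set of parallel copies; the function name, the `{dst: src}` encoding, and the temporary name `tmp` are assumptions for illustration.

```python
def sequentialize(parallel_copies, temp="tmp"):
    """parallel_copies: {dst: src}, all sources read before any destination is written.

    Returns an equivalent list of (dst, src) copies to execute in order.
    """
    pending = {d: s for d, s in parallel_copies.items() if d != s}
    seq = []
    while pending:
        # A copy is safe once no other pending copy still reads its destination.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            d = ready[0]
            seq.append((d, pending.pop(d)))
        else:
            # Only cycles remain: save one destination in the temporary and
            # redirect its readers there, which turns the cycle into a chain.
            d = next(iter(pending))
            seq.append((temp, d))
            for other, s in pending.items():
                if s == d:
                    pending[other] = temp
    return seq


# Hypothetical parallel copy: swap a and b, and also copy the old a into c.
copies = sequentialize({"a": "b", "b": "a", "c": "a"})
regs = {"a": 1, "b": 2, "c": 3}
for dst, src in copies:
    regs[dst] = regs[src]
print(copies, regs)
```

The temporary is the extra register the slide refers to; if none is free at that point, the allocator needs some other way to break the cycle.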

Hybrid Approach?
How can the compiler attain both speed and precision?
- Observation: lots of procedures are small & do not spill
- Observation: some procedures are hard to allocate
Possible solution: try different algorithms
- First, try linear scan
  - It is cheap and it may work
- If linear scan fails, try heavyweight allocator of choice
  - Might be Chaitin-Briggs, SSA, or some other algorithm
  - Use expensive allocator only when cheap one spills
This approach would not help with the speed of a complex compilation, but it might compensate on simple compilations

An Even Stronger Global Allocator
Hierarchical Register Allocation (Koblenz & Callahan)
- Analyze control-flow graph to find hierarchy of tiles
- Perform allocation on individual tiles, innermost to outermost
- Use summary of tile to allocate surrounding tile
- Insert compensation code at tile boundaries (LRx → LRy)
- Anecdotes suggest it is fairly effective
- Target machine is multi-threaded multiprocessor (Tera MTA)
Strengths
- Decisions are largely local
- Use specialized methods on individual tiles
- Allocator runs in parallel
Weaknesses
- Decisions are made on local information
- May insert too many copies
Still, a promising idea
Eckhardt’s MS (Rice, 2005) shows that K&C produces better allocations than C&B, but is much slower

Regional Approaches to Allocation
Probabilistic Register Allocation (Proebsting & Fischer)
- Attempt to generalize from Best’s algorithm (bottom-up, local)
- Generalizes “furthest next use” to a probability
- Perform an initial local allocation using estimated probabilities
- Follow this with a global phase
  - Compute a merit score for each LR as (benefit from x in a register = probability it stays in a register)
  - Allocate registers to LRs in priority order, by merit score, working from inner loops to outer loops
  - Use coloring to perform assignment among allocated LRs
- Little direct experience (either anecdotal or experimental)
- Combines top-down global with bottom-up local
This idea predated Linear Scan and tried to achieve many of the same benefits.

Regional Approaches to Allocation
Register Allocation via Fusion (Lueh, Adl-Tabatabai, Gross)
- Use regional information to drive global allocation
- Partition CFGs into regions & build interference graphs
- Ensure that each region is k-colorable
- Merge regions by fusing them along CFG edges
  - Maintain k-colorability by splitting along fused edge
  - Fuse in priority order computed during the graph partition (i.e., execution frequency)
- Assign registers using interference graphs
Strengths
- Flexibility
- Fusion operator splits on low-frequency edges
Weaknesses
- Choice of regions is critical
- Breaks down if many values are live across region boundaries

