Global Register Allocation via Graph Coloring Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.

Slides:

Advertisements

Similar presentations

Register Allocation COS 320 David Walker (with thanks to Andrew Myers for many of these slides)

Advertisements

Register Allocation Consists of two parts: Goal : minimize spills

8. Static Single Assignment Form Marcus Denker. © Marcus Denker SSA Roadmap  Static Single Assignment Form (SSA)  Converting to SSA Form  Examples.

School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.

P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.

Register allocation Morgensen, Torben. "Register Allocation." Basics of Compiler Design. pp from (

Register Allocation Zach Ma.

Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)

Coalescing Register Allocation CS153: Compilers Greg Morrisett.

COMPILERS Register Allocation hussein suleman uct csc305w 2004.

Register Allocation CS 671 March 27, CS 671 – Spring Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.

A Deeper Look at Data-flow Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University.

Program Representations. Representing programs Goals.

SSA-Based Constant Propagation, SCP, SCCP, & the Issue of Combining Optimizations 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon,

Introduction to Code Optimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.

6/9/2015© Hal Perkins & UW CSEU-1 CSE P 501 – Compilers SSA Hal Perkins Winter 2008.

1 CS 201 Compiler Construction Lecture 12 Global Register Allocation.

Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.

Register Allocation (Slides from Andrew Myers). Main idea Want to replace temporary variables with some fixed set of registers First: need to know which.

1 Register Allocation Consists of two parts: –register allocation What will be stored in registers –Only unambiguous values –register assignment Which.

Prof. Bodik CS 164 Lecture 171 Register Allocation Lecture 19.

Register Allocation (via graph coloring)

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.

Register Allocation (via graph coloring). Lecture Outline Memory Hierarchy Management Register Allocation –Register interference graph –Graph coloring.

1 Liveness analysis and Register Allocation Cheng-Chia Chen.

Improving Code Generation Honors Compilers April 16 th 2002.

Introduction to Optimization Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.

Instruction Scheduling II: Beyond Basic Blocks Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.

4/29/09Prof. Hilfinger CS164 Lecture 381 Register Allocation Lecture 28 (from notes by G. Necula and R. Bodik)

Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.

Code Optimization, Part III Global Methods Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.

Structural Data-flow Analysis Algorithms: Allen-Cocke Interval Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students.

Introduction to Optimization, II Value Numbering & Larger Scopes Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.

Graph-Coloring Register Allocation: Beyond the Treatment in C OMP 412 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon, all rights.

U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Register Allocation John Cavazos University.

CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.

Global Register Allocation, Part II Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.

Building SSA Form, III 1COMP 512, Rice University This lecture presents the problems inherent in out- of-SSA translation and some ways to solve them. Copyright.

Global Redundancy Elimination: Computing Available Expressions Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Cleaning up the CFG Eliminating useless nodes & edges C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon,

Global Register Allocation via Graph Coloring Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.

Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.

Dead Code Elimination This lecture presents the algorithm Dead from EaC2e, Chapter 10. That algorithm derives, in turn, from Rob Shillner’s unpublished.

Boolean & Relational Values Control-flow Constructs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.

Register Allocation CS 471 November 12, CS 471 – Fall 2007 Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.

Cleaning up the CFG Eliminating useless nodes & edges This lecture describes the algorithm Clean, presented in Chapter 10 of EaC2e. The algorithm is due.

2/22/2016© Hal Perkins & UW CSEP-1 CSE P 501 – Compilers Register Allocation Hal Perkins Winter 2008.

1 Liveness analysis and Register Allocation Cheng-Chia Chen.

Global Register Allocation Based on

Introduction to Optimization

Register Allocation Hal Perkins Autumn 2009

Introduction to Optimization

Register Allocation Hal Perkins Autumn 2011

Introduction to Code Generation

Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.

Global Register Allocation, Part II

Register Allocation Hal Perkins Summer 2004

Global Register Allocation via Graph Coloring Comp 412

Introduction to Optimization

Lecture 16: Register Allocation

Lecture 17: Register Allocation via Graph Colouring

The Partitioning Algorithm for Detecting Congruent Expressions COMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper.

Fall Compiler Principles Lecture 13: Summary

CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019

(via graph coloring and spilling)

Presentation transcript:

Global Register Allocation via Graph Coloring Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 412 FALL 2010 This lecture focuses on the Chaitin- Briggs approach, which EaC calls the bottom-up global algorithm.

Notes on the Final Exam Closed-notes, closed-book exam Exam available Wednesday. Three hour time limit —I aimed for a two-hour exam, but I don’t want you to feel time pressure. You may take one break of up to fifteen minutes apiece. You are responsible for the entire course —Exam focuses primarily on material since the midterm —Chapters 5, 6, 7, 8, 9.1, 9.2, 11, 12, & 13 —All the lecture notes Return the exam to DH 3080 (Penny Anderson’s office) by 5PM on the last day of exams – December 15, 2010 If you must leave, you can me a Word file or a PDF document. Comp 412, Fall 20101

2 Register Allocation Part of the compiler’s back end Critical properties Produce correct code that uses k (or fewer) registers Minimize added loads and stores Minimize space used to hold spilled values Operate efficiently O(n), O(n log 2 n), maybe O(n 2 ), but not O(2 n ) Register Allocation Errors IR Instruction Selection k register asm Instruction Scheduling m register asm m register asm

Comp 412, Fall Global Register Allocation The Big Picture At each point in the code 1 Determine which values will reside in registers 2 Select a register for each such value The goal is an allocation that “minimizes” running time Most modern, global allocators use a graph-coloring paradigm Build a “conflict graph” or “interference graph” Find a k-coloring for the graph, or change the code to a nearby problem that it can k-color Register Allocator m register code k register code Optimal global allocation is NP-Complete, under almost any assumptions.

Comp 412, Fall What Makes Global Register Allocation Hard? What’s harder across multiple blocks? Could replace a load with a move Good assignment would obviate the move Must build a control-flow graph to understand inter-block flow Can spend an inordinate amount of time adjusting the allocation... store r4  x load x  r1... This is an assignment problem, not an allocation problem ! This is an assignment problem, not an allocation problem !

Comp 412, Fall What Makes Global Register Allocation Hard? A more complex scenario Block with multiple predecessors in the control-flow graph Must get the “right” values in the “right” registers in each predecessor In a loop, a block can be its own predecessor This adds tremendous complications... store r4  x load x  r1... store r5  x What if one block has x in a register, but not the other?

Comp 412, Fall Global Register Allocation Taking a global approach Abandon the distinction between local & global Make systematic use of registers or memory Adopt a general scheme to approximate a good allocation Graph coloring paradigm (Lavrov & (later) Chaitin ) 1 Build an interference graph G I for the procedure —Computing LIVE is harder than in the local case —G I is not an interval graph 2 (try to) construct a k-coloring —Minimal coloring is NP-Complete —Spill placement becomes a critical issue 3 Map colors onto physical registers

Comp 412, Fall Graph Coloring ( A Background Digression ) The problem A graph G is said to be k-colorable iff the nodes can be labeled with integers 1 … k so that no edge in G connects two nodes with the same labelExamples Each color can be mapped to a distinct physical register 2-colorable3-colorable

Comp 412, Fall Building the Interference Graph What is an “interference” ? (or conflict) Two values interfere if there exists an operation where both are simultaneously live If x and y interfere, they cannot occupy the same register To compute interferences, we must know where values are “live” The interference graph, G I = (N I,E I ) Nodes in G I represent values, or live ranges Edges in G I represent individual interferences —For x, y  N I,  E I iff x and y interfere A k-coloring of G I can be mapped into an allocation to k registers

Comp 412, Fall Building the Interference Graph To build the interference graph 1 Discover live ranges > Construct the SSA form of the procedure > At each ø -function, take the union of the arguments > Rename to reflect these new “live ranges” 2 Compute L IVE sets over live ranges for each block > Use an iterative data-flow solver > Solve equations for LIVE over domain of live range names 3 Iterate over each block, from bottom to top > Track the current LIVE set > At each operation, add appropriate edges & update LIVE  Add an edge from result to each value in LIVE  Remove result from LIVE  Add each operand to LIVE Update the LIVE sets No SSA yet?

Comp 412, Fall Computing L IVE Sets A value v is live at p if  a path from p to some use of v along which v is not re-defined Data-flow problems are expressed as simultaneous equations L IVE O UT (b) =  s  succ(b) L IVE I N (s) L IVE I N (b) = UEV AR (b)  (L IVE O UT (b)  V AR K ILL (b)) L IVE O UT (n f ) =  where UEV AR (b) is the set of names used in block b before being defined in b V AR K ILL (b) is the set of names defined in b Solve the equations using a fixed-point iterative scheme § in EaC1e § in EaC2e § in EaC1e § in EaC2e

Comp 412, Fall Computing L IVE Sets The compiler can solve these equations with a simple algorithm The world’s quickest introduction to data-flow analysis ! WorkList  { all blocks } while ( WorkList ≠ Ø) remove a block b from WorkList Compute LIVEOUT(b) Compute LIVEIN(b) if LIVEIN(b) changed then add pred (b) to WorkList WorkList  { all blocks } while ( WorkList ≠ Ø) remove a block b from WorkList Compute LIVEOUT(b) Compute LIVEIN(b) if LIVEIN(b) changed then add pred (b) to WorkList The Worklist Iterative Algorithm Why does this work?  L IVE O UT, L IVE I N  2 Names  UEV AR, V AR K ILL are constants for b  Equations are monotone  Finite # of additions to sets  will reach a fixed point ! Speed of convergence depends on the order in which blocks are “removed” & their sets recomputed Why does this work?  L IVE O UT, L IVE I N  2 Names  UEV AR, V AR K ILL are constants for b  Equations are monotone  Finite # of additions to sets  will reach a fixed point ! Speed of convergence depends on the order in which blocks are “removed” & their sets recomputed

Comp 412, Fall Observation on Coloring for Register Allocation Suppose you have k registers—look for a k coloring Any vertex n that has fewer than k neighbors in the interference graph (n  < k) can always be colored ! —Pick any color not used by its neighbors — there must be one Ideas behind Chaitin’s algorithm: —Pick any vertex n such that n  < k and put it on the stack —Remove that vertex and all edges incident from the interference graph  This may make additional nodes have fewer than k neighbors —At the end, if some vertex n still has k or more neighbors, then spill the live range associated with n —Otherwise successively pop vertices off the stack and color them in the lowest color not used by some neighbor

Comp 412, Fall Chaitin’s Algorithm 1. While  vertices with < k neighbors in G I > Pick any vertex n such that n  < k and put it on the stack > Remove that vertex and all edges incident to it from G I 2. If G I is non-empty ( all vertices have k or more neighbors ) then: > Pick a vertex n (using some heuristic) and spill the live range associated with n > Remove vertex n from G I, along with all edges incident to it and put it on the “spill list” > If this causes some vertex in G I to have fewer than k neighbors, then go to step 1; otherwise, repeat step 2 3. If the spill list is not empty, insert spill code, then rebuild the interference graph and try to allocate, again 4. Otherwise, successively pop vertices off the stack and color them in the lowest color not used by some neighbor Lowers degree of n’s neighbors

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 is the only node with degree < 3

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 Now, 2 & 3 have degree < 3

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 2 Now all nodes have degree < 3

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 2 4

Comp 412, Fall Chaitin’s Algorithm in Practice 3 Registers Stack Colors: 1: 2: 3:

Comp 412, Fall Chaitin’s Algorithm in Practice 5 3 Registers Stack Colors: 1: 2: 3:

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack Colors: 1: 2: 3:

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 2 Colors: 1: 2: 3:

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack 1 Colors: 1: 2: 3:

Comp 412, Fall Chaitin’s Algorithm in Practice Registers Stack Colors: 1: 2: 3:

Comp 412, Fall Improvement in Coloring Scheme Optimistic Coloring If Chaitin’s algorithm reaches a state where every node has k or more neighbors, it chooses a node to spill. Briggs said, take that same node and push it on the stack —When you pop it off, a color might be available for it! —For example, a node n might have k+2 neighbors, but those neighbors might only use 3 (<k) colors  Degree is a loose upper bound on colorability 2 Registers: Chaitin’s algorithm immediately spills one of these nodes Briggs et al, PLDI 89 (Also, TOPLAS 1994)

Comp 412, Fall Improvement in Coloring Scheme Optimistic Coloring If Chaitin’s algorithm reaches a state where every node has k or more neighbors, it chooses a node to spill. Briggs said, take that same node and push it on the stack —When you pop it off, a color might be available for it! —For example, a node n might have k+2 neighbors, but those neighbors might only use just one color (or any number < k )  Degree is a loose upper bound on colorability 2 Registers: 2-Colorable Briggs algorithm finds an available color

Comp 412, Fall Chaitin-Briggs Algorithm 1. While  vertices with < k neighbors in G I > Pick any vertex n such that n  < k and put it on the stack > Remove that vertex and all edges incident to it from G I  This action often creates vertices with fewer than k neighbors 2. If G I is non-empty ( all vertices have k or more neighbors ) then: > Pick a vertex n (using some heuristic condition), push n on the stack and remove n from G I, along with all edges incident to it > If this causes some vertex in G I to have fewer than k neighbors, then go to step 1; otherwise, repeat step 2 3. Successively pop vertices off the stack and color them in the lowest color not used by some neighbor > If some vertex cannot be colored, then pick an uncolored vertex to spill, spill it, and restart at step 1

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack No node has degree < 2 Chaitin would spill a node Briggs picks the same node & stacks it

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack Pick a node, say 1

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack 1 Pick a node, say 1

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack 1 Now, both 2 & 3 have degree < 2 Pick one, say 3

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack 1 3 Both 2 & 4 have degree < 2. Take them in order 2, then 4.

Comp 412, Fall Chaitin-Briggs in Practice 4 2 Registers Stack 1 3 2

Comp 412, Fall Chaitin-Briggs in Practice 2 Registers Stack Now, rebuild the graph

Comp 412, Fall Chaitin-Briggs in Practice 4 2 Registers Stack Colors: 1: 2:

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack 1 3 Colors: 1: 2:

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack 1 Colors: 1: 2:

Comp 412, Fall Chaitin-Briggs in Practice Registers Stack Colors: 1: 2:

Comp 412, Fall Chaitin-Briggs Allocator (Bottom-up Coloring) renumber build coalesce spill costs simplify select spill Build SSA, build live ranges, rename Build the interference graph Fold unneeded copies LR x  LR y, and  G I  combine LR x & LR y Remove nodes from the graph Spill uncolored definitions & uses While stack is non-empty pop n, insert n into G I, & try to color it Estimate cost for spilling each live range Briggs’ algorithm (1989) while N is non-empty if  n with n  < k then push n onto stack else pick n to spill push n onto stack remove n from G I while N is non-empty if  n with n  < k then push n onto stack else pick n to spill push n onto stack remove n from G I

Comp 412, Fall Chaitin’s Allocator (Bottom-up Coloring) renumber build coalesce spill costs simplify select spill Build SSA, build live ranges, rename Build the interference graph Fold unneeded copies LR x  LR y, and  G I  combine LR x & LR y Remove nodes from the graph Spill uncolored definitions & uses While stack is non-empty pop n, insert n into G I, & try to color it Estimate cost for spilling each live range Chaitin’s algorithm For contrast, Chaitin’s algorithm (1981) Quick Aside … while N is non-empty if  n with n  < k then push n onto stack else pick n to spill mark n for spill pass remove n from G I while N is non-empty if  n with n  < k then push n onto stack else pick n to spill mark n for spill pass remove n from G I

Comp 412, Fall Other Improvements to Chaitin-Briggs Spilling partial live ranges [ Bergner P LDI 97 ] Bergner introduced interference region spilling Limits spilling to regions of high demand for registers Splitting live ranges [ Simpson CC 98, Eckhardt I CPLC 05 ] Simple idea — break up one or more live ranges Allocator can use different registers for distinct subranges Allocator can spill subranges independently (use 1 spill location) Iterative coalescing [George & Appel ] Use conservative coalescing because it is “safe” Simplify the graph until only non-trivial nodes remain Coalesce & try again If coalescing does not reveal trivial nodes, then spill

Comp 412, Fall Chaitin-Briggs Allocator ( Bottom-up Global ) Strengths & Weaknesses  Precise interference graph  Strong coalescing mechanism  Handles register assignment well  Runs fairly quickly  Known to overspill in tight cases  Interference graph has no geography  Spills a live range everywhere  Long blocks devolve into spilling by use counts Is improvement still possible ? Rising spill costs, aggressive transformations, & long blocks  yes, but the returns are getting rather small

Comp 412, Fall What about Top-down Coloring? The Big Picture Use high-level priorities to rank live ranges Allocate registers for them in priority order Use coloring to assign specific registers to live ranges The Details Separate constrained from unconstrained live ranges >A live range is constrained if it has ≥ k neighbors in G I Color constrained live ranges first Reserve pool of local registers for spilling (or spill & iterate) Chow split live ranges before spilling them > Split into block-sized pieces > Recombine as long as  k Use spill costs as priority function ! Unconstrained must receive a color ! Peixotto’s 2007 MS thesis shows that top-down, in general, produces worse results unless we add an (expensive) adaptive feedback loop

Comp 412, Fall What about Top-down Coloring? The Big Picture Use high-level priorities to rank live ranges Allocate registers for them in priority order Use coloring to assign specific registers to live ranges More Details Chow used an imprecise interference graph —  G I  x,y  Live(b) for some block b —Cannot coalesce live ranges since x  y   G I Quicker to build imprecise graph —Chow’s allocator may run faster on small codes, where demand for registers is also likely to be lower

Comp 412, Fall Tradeoffs in Global Coloring Allocator Design Top-down versus bottom-up Top-down uses high-level information Bottom-up uses low-level structural information Spilling Reserve registers versus iterative coloring Precise versus imprecise graph Precision allows coalescing Imprecision speeds up graph construction Several JITs use GCRA HotSpot Server JIT uses Chaitin-Briggs Dasgupta reduced costs by 35% Several JITs use GCRA HotSpot Server JIT uses Chaitin-Briggs Dasgupta reduced costs by 35%

Linear Scan Allocation Coloring allocators are often viewed as too expensive for use in JIT environments, where compile time occurs at runtime Linear scan allocators use an approximate interference graph and a version of the bottom-up local algorithm Interference graph is an interval graph —Optimal coloring (without spilling) in linear time —Spilling handled well by bottom-up local allocator Algorithm does allocation in a “linear” scan of the graph Linear scan produces faster, albeit less precise, allocations Linear scan allocators hit a different point on the curve of cost versus performance Comp 412, Fall Sun’s HotSpot server compiler uses a complete Chaitin-Briggs allocator. Approximate Global Allocation Live Ranges in LS Interference graph of a set of intervals is an interval graph. Live Ranges in LS Interference graph of a set of intervals is an interval graph.

Linear Scan Allocation Building the Interval Graph Consider the procedure as a linear list of operations A live range for some name is an interval (x,y) —x and y are the indices of two operations in the list, with x < y —Every operation where name is live falls between x & y, inclusive  Precision of live computation can vary with cost —Interval graph overestimates interference The Algorithm Use Best’s algorithm — bottom-up local Distance to next use is well defined Algorithm is fast & produces reasonable allocations Variations have been proposed that build on this scheme Comp 412, Fall

Global Coloring from SSA Form Observation: The interference graph of a program in SSA form is a chordal graph. Observation: Chordal graphs can be colored in O( N ) time. These two facts suggest allocation using an interference graph built from SSA Form Chaitin-Briggs works from live ranges that qre a coalesced version of SSA names SSA allocators use raw SSA names as live ranges Allocate live ranges, then insert copies for φ-functions SSA-based allocation has created a lot of excitement in the last couple of years. Comp 412, Fall Chordal Graph Every cycle of length > 3 has a chord Chordal Graph Every cycle of length > 3 has a chord

Global Coloring from SSA Form Coloring from SSA Names has its advantages If graph is k-colorable, it finds the coloring —(Opinion ) An SSA-based allocator will find more k-colorable graphs than a live-range based allocator because SSA names are shorter and, thus, have fewer interferences. Allocator should be faster than a live-range allocator —Cost of live analysis folded into SSA construction, where it is amortized over other passes —Biggest expense in Chaitin-Briggs is the Build-Coalesce phase, which SSA allocator avoids, as it destroys the chordal graph Comp 412, Fall

Global Coloring from SSA Form Coloring from SSA Names has its disadvantages Coloring is rarely the problem —Most non-trivial codes spill; on trivial codes, both SSA allocator and classic Chaitin-Briggs are overkill. (Try linear scan?) SSA form provides no obvious help on spilling —Shorter live ranges will produce local spilling (good & bad) —May increase spills inside loops After allocation, code is still in SSA form —Need out-of-SSA translation —Introduce copies after allocation —Swap problem may require and extra register —Must run a post-allocation coalescing phase  Algorithms exist that do not use an interference graph  They are not as powerful as the Chaitin-Briggs coalescing phase Comp 412, Fall Loop-carried value cannot spill before the loop, since its name is only live inside the loop and after the loop.

Hybrid Approach ? How can the compiler attain both speed and precision? Observation: lots of procedures are small & do not spill Observation: some procedures are hard to allocate Possible solution: Try different algorithms First, try linear scan —It is cheap and it may work If linear scan fails, try heavyweight allocator of choice —Might be Chaitin-Briggs, SSA, or some other algorithm —Use expensive allocator only when cheap one spills This approach would not help with the speed of a complex compilation, but it might compensate on simple compilations Comp 412, Fall

Comp 412, Fall An Even Stronger Global Allocator Hierarchical Register Allocation (Koblenz & Callahan) Analyze control-flow graph to find hierarchy of tiles Perform allocation on individual tiles, innermost to outermost Use summary of tile to allocate surrounding tile Insert compensation code at tile boundaries ( LR x  LR y ) Anecdotes suggest it is fairly effective Target machine is multi-threaded multiprocessor ( Tera MTA ) Strengths  Decisions are largely local  Use specialized methods on individual tiles  Allocator runs in parallel Strengths  Decisions are largely local  Use specialized methods on individual tiles  Allocator runs in parallel Weaknesses  Decisions are made on local information  May insert too many copies Still, a promising idea Weaknesses  Decisions are made on local information  May insert too many copies Still, a promising idea Eckhardt’s MS (Rice, 2005) shows that K&C produces better allocations than C&B, but is much slower

Comp 412, Fall Regional Approaches to Allocation Probabilistic Register Allocation ( Proebsting & Fischer ) Attempt to generalize from Best’s algorithm ( bottom-up, local ) Generalizes “furthest next use” to a probability Perform an initial local allocation using estimated probabilities Follow this with a global phase —Compute a merit score for each LR as (benefit from x in a register = probability it stays in a register) —Allocate registers to LR s in priority order, by merit score, working from inner loops to outer loops —Use coloring to perform assignment among allocated LR s Little direct experience (either anecdotal or experimental) Combines top-down global with bottom-up local This idea predated Linear Scan and tried to achieve many of the same benefits.

Comp 412, Fall Regional Approaches to Allocation Register Allocation via Fusion ( Lueh, Adl-Tabatabi, Gross ) Use regional information to drive global allocation Partition CFG s into regions & build interference graphs Ensure that each region is k-colorable Merge regions by fusing them along CFG edges —Maintain k-colorability by splitting along fused edge —Fuse in priority order computed during the graph partition Assign registers using interference graphs i.e., execution frequency Strengths Flexibility Fusion operator splits on low- frequency edges Strengths Flexibility Fusion operator splits on low- frequency edges Weaknesses Choice of regions is critical Breaks down many values are live across region boundaries Weaknesses Choice of regions is critical Breaks down many values are live across region boundaries

Comp 412, Fall Extra Slides Start Here

Comp 412, Fall SSA Name Space SSA encodes facts about flow of values into the name space Two principles Each name is defined by exactly one operation Each operand refers to exactly one definition To reconcile these principles with real code Add subscripts to variable names for uniqueness Insert  -functions at merge points to reconcile name space x   x +... x 0 ... x 1 ... x 2  ( x 0, x 1 )  x becomes

Comp 412, Fall SSA Name Space These  -functions are unusual constructs … A  -function only occurs at the start of a block A  -function has one argument for each CFG edge entering the block A  -function returns the argument that corresponds to the edge along which control flow entered the block —All  -functions in the block execute concurrently —Since machines do not support  -functions, must translate back out of SSA form before we produce executable code All  -functions in a block execute concurrently —All read their argument, all perform assignment in parallel Using SSA form leads to simpler or better formulations of many optimizations (alternative to global data-flow analysis )

Comp 412, Fall Building SSA SSA Form Each name is defined exactly once Each use refers to exactly one name What’s Hard? Straight-line code is easy Split points are easy Merge points are hard (Sloppy) Construction Algorithm Insert a  -function for each variable at each merge point Rename all values for uniqueness ( using subscripts ) This approach  Inserts too many  - functions  Inserts  -functions in too many places The rest, however, is optimization & beyond the scope of today’s lecture. ( See §9 in EaC ) This approach  Inserts too many  - functions  Inserts  -functions in too many places The rest, however, is optimization & beyond the scope of today’s lecture. ( See §9 in EaC ) Back

Slides on Rematerialization Cannot be taught without Wegman-Zadeck Sparse Simple Constant Propagation. Comp 412, Fall

Other Improvements to Chaitin-Briggs Consider the following example Comp 412, Fall p varies in loop p is loop invariant p ← Label y ← y + *p p ← p++ Original Code High pressure in this loop makes p spill Rematerialization