Spring 2014 - Jim Hogg - UW - CSE - P501


CSE P501 – Compiler Construction
Register allocation constraints
Local allocation
  Fast, but poorer code
Global allocation
  Register coloring

k
IR typically assumes an infinite number of virtual registers
Real machine has k physical registers available
  x86: k = 8
  x64: k = 16
Goals
  Produce correct code that uses k or fewer registers
  Minimize added loads and stores
  Minimize space needed for spilled values
  Do this efficiently: O(n), O(n log n), maybe O(n^2)

Register Allocation
At each point in the code, pick which values to keep in registers
Insert code to move values between registers and memory
  No additional transformations: scheduling has already run
  But will usually re-run scheduling afterwards (to include the effects of spilling)
Minimize inserted saves/restores ("spill" code)

Allocation vs Assignment
Allocation: which values to keep in registers?
Assignment: which specific registers?
Compiler must do both

Local Register Allocation
Applied to basic blocks (ie: "local")
Produces good register usage inside a block
  But can have inefficiencies at boundaries between blocks
Two variations: top-down, bottom-up

"Top-down" Local Allocation We assume a RISC target machine ("load/store architecture") load operands into two registers (if not already there) perform instruction (eg: add, lshift) store result back to memory (if required) ie: operands for instructions MUST be in registers, rather than memory (note, by contrast, that x86 allows operands directly from memory) Principle: keep most heavily used values in registers Priority = # of times virtual-register used (destination and/or source) in block If more virtual registers than physical (the common case) Reserve some registers for values allocated to memory Need enough to address and load two operands and store result (R1 = R2 + R3) Other registers dedicated to “hot” values But are tied up for entire block with particular value, even if only needed for part of the block Spring 2014Jim Hogg - UW - CSE - P501P-6

"Bottom-up" Local Allocation, 1 Keep a list of free physical registers initially all physical registers at beginning of block except frame-pointer, stack-pointer - pre-allocated Scan code, instruction-by-instruction Allocate a register when one is needed if one is free, allocate it otherwise, pick a victim - eg: VR3 using R7 save contents of R7 allocate R7 to requesting virtual register Emit instructions to use physical register Free physical register as soon as possible In x = y op z, free y and z if no longer needed before allocating x Spring 2014Jim Hogg - UW - CSE - P501P-7

"Bottom-up" Local Allocation, 2
How to pick a victim?
  Scan ahead through the code
  Look for the next use of each currently allocated virtual register
  Pick the virtual register whose next use is furthest off. Why?
  Save the victim register's contents to memory (restore later)
Example: CPU with 4 available registers
  Tables tracked: for each virtual register (VR), its next-use instruction; for each physical register, the VR it currently holds

Local "bottom-up" Register Allocation
 1. ; load v2 from memory
 2. ; load v3 from memory
 3. v1 = v2 + v3
 4. ; load v5, v6 from memory
 5. v4 = v5 - v6
 6. v7 = v2 - 29
 7. ; load v9 from memory
 8. v8 = - v9
 9. v10 = v6 * v4
10. v11 = v10 - v3
Still in LIR, so lots of (too many!) virtual registers are required (v2, etc).
Grey instructions (1, 2, 4, 7) load operands from memory into virtual registers. We will ignore these going forward and focus on mapping virtual to physical.

Local "bottom-up" Register Allocation, 0
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg     v1  v2  v3  v4  v5  v6  v7  v8  v9  v10  v11
NextRef  1   1   1   2   2   2   3   4   4   5    6

pReg  R1  R2  R3  R4
vReg  -   -   -   -

Local "bottom-up" Register Allocation, 1
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg     v1  v2  v3  v4  v5  v6  v7  v8  v9  v10  v11
NextRef  ∞   3   6   2   2   2   3   4   4   5    6

pReg  R1  R2  R3  R4
vReg  v2  v3  v1  -

Code so far:
  R3 = R1 + R2

Local "bottom-up" Register Allocation, 2
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg     v1  v2  v3  v4  v5  v6  v7  v8  v9  v10  v11
NextRef  ∞   3   6   5   ∞   5   3   4   4   5    6

pReg  R1  R2             R3             R4
vReg  v2  v4 (was v3)    v6 (was v1)    v5

Code so far:
  R3 = R1 + R2
  ; spill R3
  ; spill R2? - no - still clean
  R2 = R4 - R3

Local "bottom-up" Register Allocation, 3
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg     v1  v2  v3  v4  v5  v6  v7  v8  v9  v10  v11
NextRef  ∞   ∞   6   5   ∞   5   ∞   4   4   5    6

pReg  R1  R2  R3  R4
vReg  v2  v4  v6  v7 (was v5)

Code so far:
  R3 = R1 + R2
  ; spill R3
  ; spill R2? - no!
  R2 = R4 - R3
  ; spill R4? - no!
  R4 = R1 - 29

And so on...
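The bottom-up scheme walked through above can be sketched as follows. This is a minimal sketch, not the slides' exact procedure: the `(dest, sources)` encoding, the helper names and the emitted pseudo-instruction strings are hypothetical, each virtual register is assumed to be defined at most once in the block, and at least three physical registers are assumed. A value computed in the block is treated as "dirty" and stored when its register is reclaimed; a value that was only loaded is "clean" and is simply dropped. Because dead operands are freed eagerly, the register choices need not match the slides' trace step for step.

```python
import math

BLOCK = [
    ("v1", ["v2", "v3"]),    # v1 = v2 + v3
    ("v4", ["v5", "v6"]),    # v4 = v5 - v6
    ("v7", ["v2"]),          # v7 = v2 - 29
    ("v8", ["v9"]),          # v8 = - v9
    ("v10", ["v6", "v4"]),   # v10 = v6 * v4
    ("v11", ["v10", "v3"]),  # v11 = v10 - v3
]

def next_use(block, vreg, i):
    """Index of the first instruction after i that reads vreg, else infinity."""
    for j in range(i + 1, len(block)):
        if vreg in block[j][1]:
            return j
    return math.inf

def bottom_up_allocate(block, physical_regs):
    free = list(physical_regs)   # currently free physical registers
    where = {}                   # virtual register -> physical register
    dirty = set()                # values computed here and not yet stored
    code = []                    # emitted pseudo-instructions

    def release(vreg):
        """Give vreg's register back, storing the value first if it is dirty."""
        reg = where.pop(vreg)
        if vreg in dirty:
            code.append(f"store {reg} -> [{vreg}]")
            dirty.discard(vreg)
        free.append(reg)

    def allocate(vreg, i, pinned):
        if not free:
            # Victim: the resident value whose next use is furthest off.
            candidates = [v for v in where if v not in pinned]
            release(max(candidates, key=lambda v: next_use(block, v, i)))
        where[vreg] = free.pop(0)
        return where[vreg]

    for i, (dest, sources) in enumerate(block):
        for src in sources:
            if src not in where:
                reg = allocate(src, i, pinned=sources)
                code.append(f"load {reg} <- [{src}]")
        operands = [where[s] for s in sources]
        # Free operands that are never read again, before allocating dest.
        for src in dict.fromkeys(sources):
            if next_use(block, src, i) == math.inf:
                release(src)
        reg = allocate(dest, i, pinned=())
        dirty.add(dest)
        code.append(f"{reg} = op {', '.join(operands)}")
    return code

for line in bottom_up_allocate(BLOCK, ["R1", "R2", "R3", "R4"]):
    print(line)
```

As on the slides, the first victim is v1: it has no further use in the block, so its next use is the furthest off of all resident values.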

Bottom-Up Allocator
Invented about once per decade
  Sheldon Best, 1955, for Fortran I
  Laszlo Belady, 1965, for analyzing paging algorithms
  William Harrison, 1975, ECS compiler work
  Chris Fraser, 1989, LCC compiler
  Vincenzo Liberatore, 1997, Rutgers
Will be reinvented again, no doubt
Many arguments for optimality of this

Global Register Allocation
Standard technique is graph coloring
Use the control-flow graph and dataflow information to derive the interference graph
  Nodes are live ranges (not registers!)
  Edge between nodes when live at same time
  Connected nodes cannot use same register
Then color the nodes in the graph
  Two nodes connected by an edge may not have the same color (ie: be allocated to the same physical register)
  If more than k colors are needed, insert spill code

Graph Coloring, 1
Assign a color to each node, such that no adjacent nodes have the same color

Graph Coloring, 2
2 colors are enough for this graph
  called a 2-coloring
  this graph's chromatic number = 2

Graph Coloring, 3
A small change forces a 3rd color
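The slide figures are not part of this transcript, so here is a stand-in illustration with two made-up graphs: a square, which has a 2-coloring, and the same square with one extra diagonal edge, which no longer does. The sketch tries to 2-color a graph by breadth-first search; the function name `two_color` and both example graphs are hypothetical.

```python
from collections import deque

def two_color(graph):
    """Return a node -> 0/1 coloring, or None if two colors are not enough."""
    color = {}
    for start in graph:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in color:
                    color[nbr] = 1 - color[node]   # neighbor gets the other color
                    queue.append(nbr)
                elif color[nbr] == color[node]:
                    return None                    # adjacent nodes would clash
    return color

# A 4-cycle p-q-r-s: chromatic number 2.
square = {"p": ["q", "s"], "q": ["p", "r"], "r": ["q", "s"], "s": ["r", "p"]}
# Adding the diagonal p-r creates a triangle, which forces a 3rd color.
with_diagonal = {"p": ["q", "s", "r"], "q": ["p", "r"],
                 "r": ["q", "s", "p"], "s": ["r", "p"]}

print(two_color(square))
print(two_color(with_diagonal))
```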

Live Ranges
 1 a = ...
 2 b = ...
 3 c = ...
 4 a = a + b + c
 5 if a < 10 then
 6   d = c + 9
 7   print(c)
 8 elseif a < 20 then
 9   e = 10
10   d = e + a
11   print(e)
12 else
13   f = 12
14   d = f + a
15   print(f)
16 endif
17 print(d)
Live Range of a variable is between a def and all subsequent possible uses
  if a < 10, then a is live lines 1..5
  if a < 20, then a is live lines 1..10
  otherwise, a is live lines 1..14
  Mmm - clearly nonsense! We need a flowgraph for a correct specification
Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction
This example is simplified: variables are never re-def'd. And there are no temps

Live Ranges in Flowgraphs
a = ...
b = ...
c = ...
a = a + b + c
if a < 10 then
  d = c + 9
  print(c)
elseif a < 20 then
  e = 10
  d = e + a
  print(e)
else
  f = 12
  d = f + a
  print(f)
endif
print(d)
Flowgraph nodes:
  a = ... ; b = ... ; c = ... ; a = a + b + c
  a < 10
  a < 20
  d = c + 9 ; print(c)
  e = 10 ; d = e + a ; print(e)
  f = 12 ; d = f + a ; print(f)
  print(d)
(Figure: live ranges of a, b and c marked on the flowgraph)
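One common way to compute what is live where on a flowgraph is iterative backward dataflow analysis. The sketch below applies it to the example above; it is an illustration, not something the slides spell out. The block names (`entry`, `test10`, and so on), the `(defs, uses)` instruction encoding and the successor edges are assumptions read off the listing.

```python
# Each block is a list of (defs, uses) pairs, one per instruction.
BLOCKS = {
    "entry":  [(["a"], []), (["b"], []), (["c"], []), (["a"], ["a", "b", "c"])],
    "test10": [([], ["a"])],                               # a < 10
    "then":   [(["d"], ["c"]), ([], ["c"])],               # d = c + 9; print(c)
    "test20": [([], ["a"])],                               # a < 20
    "elif":   [(["e"], []), (["d"], ["e", "a"]), ([], ["e"])],
    "else":   [(["f"], []), (["d"], ["f", "a"]), ([], ["f"])],
    "exit":   [([], ["d"])],                               # print(d)
}
SUCC = {
    "entry": ["test10"], "test10": ["then", "test20"], "test20": ["elif", "else"],
    "then": ["exit"], "elif": ["exit"], "else": ["exit"], "exit": [],
}

def liveness(blocks, succ):
    """Return (live_in, live_out): block name -> set of live variables."""
    use, define = {}, {}
    for name, instrs in blocks.items():
        use[name], define[name] = set(), set()
        for defs, uses in instrs:
            use[name] |= set(uses) - define[name]   # read before written in block
            define[name] |= set(defs)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                                  # iterate to a fixed point
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succ[b]))
            inn = use[b] | (out - define[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

live_in, live_out = liveness(BLOCKS, SUCC)
for name in BLOCKS:
    print(name, sorted(live_in[name]), sorted(live_out[name]))
```

Here a and c are both live on exit from the first block, b is not, and only d is live on entry to print(d).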

Register Interference Graph
(Figure: interference graph on nodes a, b, c, d, e, f)
If the live ranges of 2 variables overlap, or interfere, they cannot use the same register
  Eg: a and b cannot both be stored in register R5
  Eg: c and e can be stored in register R5 (at different, non-overlapping times)
How many registers will we need? Guesses?

Interference Graph - Coloring
(Figure: interference graph on nodes a, b, c, d, e, f)
Problem: color this interference graph, where each color represents a different physical register
Solution: simplify the graph: recursively remove the easiest nodes (lowest degree)

Node    a  b  c  d  e  f
Degree  4  2  3  3  2  2

Interference Graph - Remove b
Node    a  c  d  e  f
Degree  3  2  3  2  2
Removed: b

Interference Graph - Remove a
Node    c  d  e  f
Degree  1  3  1  1
Removed: b a

Interference Graph - Remove c
Node    d  e  f
Degree  2  1  1
Removed: b a c

Interference Graph - Remove e
Node    d  f
Degree  1  1
Removed: b a c e

Interference Graph - Color!
Removed: b a c e

Interference Graph - Add e
Removed: b a c

Interference Graph - Add c
Removed: b a

Interference Graph - Add a
Removed: b

Interference Graph - Add b
3 colors are enough
  eg: red = R1, green = R2, blue = R3
If fewer registers are available, we need to spill

Live Ranges: More Realistic Example
 1. v_arp = ...
 2. v_w = [v_arp + 0]
 3. v_2 = 2
 4. v_x = [v_arp + ...]
 5. v_y = [v_arp + ...]
 6. v_z = [v_arp + ...]
 7. v_w = v_w * v_2
 8. v_w = v_w * v_x
 9. v_w = v_w * v_y
10. v_w = v_w * v_z
11. [v_arp + 0] = v_w

Register  Interval
v_arp     1..11
v_w       2..7
v_w       7..8
v_w       8..9
v_w       9..10
v_w       10..11
v_2       3..7
v_x       4..8
v_y       5..9
v_z       6..10

In this example, virtual registers are re-def'd. Eg: v_w participates in 5 live ranges
Eg: v_w [7..8] interferes with v_x [4..8] but not v_w [9..10]
But no control flow!
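With no control flow, each live range is just an interval from its def to its last use, so interference reduces to an interval-overlap test. The sketch below is a hedged illustration: the tuple encoding and function names are hypothetical, and interval endpoints that are not legible in the transcript are inferred from the listing (def to last use). A range that ends at instruction n does not interfere with one that starts at n, since the old value dies as the new one is born.

```python
# (register, first instruction, last instruction) for each live range.
RANGES = [
    ("v_arp", 1, 11), ("v_w", 2, 7), ("v_w", 7, 8), ("v_w", 8, 9),
    ("v_w", 9, 10), ("v_w", 10, 11), ("v_2", 3, 7), ("v_x", 4, 8),
    ("v_y", 5, 9), ("v_z", 6, 10),
]

def interferes(r1, r2):
    """True if the two live ranges overlap for more than a shared endpoint."""
    (_, start1, end1), (_, start2, end2) = r1, r2
    return start1 < end2 and start2 < end1

def interference_edges(ranges):
    """All interfering pairs of live ranges."""
    return [(a, b) for i, a in enumerate(ranges)
            for b in ranges[i + 1:] if interferes(a, b)]

edges = interference_edges(RANGES)
print(interferes(("v_w", 7, 8), ("v_x", 4, 8)))    # the slide's first example
print(interferes(("v_w", 7, 8), ("v_w", 9, 10)))   # the slide's second example
print(len(edges))
```

v_arp is live across the whole block, so it interferes with every other range.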

Build Live Ranges, 1
Use dataflow information to build the interference graph
  Nodes = live ranges
  Add an edge for each pair of live ranges that overlap
  Beware copy instructions: r_x = r_y does not create interference between r_x and r_y: they can use the same register if the ranges do not otherwise interfere
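A common way to build the graph is to walk each block backwards from its live-out set and add an edge from every defined name to everything live just after the def, skipping the source of a copy. The sketch below shows this for one made-up block; the `(dest, sources, is_copy)` encoding, the function name and the example instructions are assumptions, not taken from the slides.

```python
from collections import defaultdict

def build_interference(instrs, live_out):
    """instrs: list of (dest, sources, is_copy); live_out: names live after the block."""
    graph = defaultdict(set)
    live = set(live_out)
    for dest, sources, is_copy in reversed(instrs):
        for other in live:
            # A copy rx = ry does not make rx and ry interfere.
            if other == dest or (is_copy and other in sources):
                continue
            graph[dest].add(other)
            graph[other].add(dest)
        live.discard(dest)      # dest is dead above its definition
        live.update(sources)    # operands are live above their use
    return dict(graph)

# Hypothetical block; a and b are defined elsewhere, so edges between
# them would be added at their own definition points.
INSTRS = [
    ("ry", ["a", "b"], False),   # ry = a + b
    ("rx", ["ry"], True),        # rx = ry   (copy)
    ("c", ["rx", "ry"], False),  # c = rx + ry
]
graph = build_interference(INSTRS, live_out={"c", "a"})
for node in sorted(graph):
    print(node, sorted(graph[node]))
```

Although rx and ry are both live at the last instruction, they hold the same value, so no edge is added between them and they remain candidates for sharing a register.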

Build Live Ranges, 2
Construct the interference graph
Find live ranges using SSA
  Build SSA form of IR
  Live range initially includes a single SSA name
  At a φ-function, form the union of live ranges
  Either rewrite code to use live-range names or keep a mapping between SSA names and live-range names
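The "form union at a φ-function" step maps naturally onto a union-find structure. The sketch below is one possible way to do it; the SSA names (`d1` to `d4`, `a1`, `a2`) are hypothetical, loosely modeled on the earlier example where d is defined on three branches and merged before print(d).

```python
def find(parent, x):
    """Representative of x's set, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def live_ranges_from_phis(ssa_names, phis):
    """phis: list of (target, [arguments]). Returns the live ranges as name lists."""
    parent = {name: name for name in ssa_names}   # each name starts alone
    for target, args in phis:
        for arg in args:
            root_t, root_a = find(parent, target), find(parent, arg)
            if root_t != root_a:
                parent[root_a] = root_t           # union the two live ranges
    groups = {}
    for name in ssa_names:
        groups.setdefault(find(parent, name), []).append(name)
    return sorted(groups.values())

SSA_NAMES = ["a1", "a2", "d1", "d2", "d3", "d4"]
PHIS = [("d4", ["d1", "d2", "d3"])]               # d4 = φ(d1, d2, d3)
print(live_ranges_from_phis(SSA_NAMES, PHIS))
```

The `parent` dictionary doubles as the mapping between SSA names and live-range names that the slide mentions as the alternative to rewriting the code.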

Simplify & Color Algorithm
Simplify:
  while graph cannot be colored
    remove easiest node (fewest neighbors = lowest degree)
    if degree of easiest node > k
      insert spill code
    remember that node by pushing it onto a stack
    recalculate degree of each remaining node
Color:
  color the sole remaining node
  while stack is non-empty
    pop node from stack
    color newly expanded graph
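A runnable sketch of simplify and color, applied to the six-node example. It makes two assumptions worth flagging: the edge list is derived from the code listing rather than read off the slide figure, and the spill decision is deferred to the color phase (a node is reported as spilled only if no color is left when it is popped), which is a simplification of the pseudocode above. The removal order it picks also differs from the one shown on the slides.

```python
def simplify_and_color(graph, k):
    """graph: node -> set of neighbors. Returns (coloring, spilled nodes)."""
    degree = {n: len(nbrs) for n, nbrs in graph.items()}
    remaining = set(graph)
    stack = []
    # Simplify: repeatedly remove the lowest-degree node and remember it.
    while remaining:
        node = min(sorted(remaining), key=lambda n: degree[n])
        remaining.remove(node)
        stack.append(node)
        for nbr in graph[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    # Color: put nodes back in reverse order, lowest free color first.
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[n] for n in graph[node] if n in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)   # no register left: this range must spill
    return coloring, spilled

EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("a", "e"), ("d", "e"), ("a", "f"), ("d", "f")]
GRAPH = {n: set() for n in "abcdef"}
for x, y in EDGES:
    GRAPH[x].add(y)
    GRAPH[y].add(x)

print(simplify_and_color(GRAPH, k=3))
print(simplify_and_color(GRAPH, k=2))
```

With k = 3 every node gets a color; with k = 2 one node is left over and would need spill code.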

Coalescing Live Ranges
Idea: if two live ranges are connected by a copy operation (r_x = r_y) and do not otherwise interfere, then coalesce
  Rewrite all references to r_x to use r_y
  Remove the copy instruction
Then fix up the interference graph

Advantages of Coalescing?
Makes the code smaller, faster (no copy operation)
Shrinks number of live ranges
Reduces the degree of any live range that interfered with both live ranges r_x and r_y
But: coalescing two live ranges can prevent coalescing of others, so ordering matters
  coalesce most frequently executed ranges first (eg: inner loops)
Can have a substantial payoff: do it!
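Coalescing on the interference graph can be sketched as a node merge. This is a minimal illustration with hypothetical names: `copies` is assumed to be already sorted with the most frequently executed copies first, and the small graph is made up (live range a interferes with c, rx and ry; rx and ry are linked by a copy and do not interfere).

```python
def coalesce(graph, copies):
    """graph: node -> set of neighbors; copies: list of (rx, ry) for rx = ry."""
    graph = {n: set(nbrs) for n, nbrs in graph.items()}   # work on a copy
    alias = {}                                            # merged name -> survivor

    def resolve(name):
        while name in alias:
            name = alias[name]
        return name

    removed = []
    for rx, ry in copies:
        x, y = resolve(rx), resolve(ry)
        if x == y:
            removed.append((rx, ry))       # already the same live range
            continue
        if y in graph[x]:
            continue                       # ranges interfere: keep the copy
        # Merge x into y: y inherits all of x's neighbors.
        for nbr in graph.pop(x):
            graph[nbr].discard(x)
            graph[nbr].add(y)
            graph[y].add(nbr)
        alias[x] = y
        removed.append((rx, ry))
    return graph, alias, removed

GRAPH = {"a": {"c", "rx", "ry"}, "c": {"a"}, "rx": {"a"}, "ry": {"a"}}
new_graph, alias, removed = coalesce(GRAPH, [("rx", "ry")])
print(new_graph)   # rx is gone; a now has one neighbor fewer
print(alias)       # references to rx are rewritten to ry
print(removed)     # copy instructions that can be deleted
```

In this example a interfered with both rx and ry, so merging them lowers a's degree from 3 to 2, which is the effect the slide describes.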

Overall Structure
Find live ranges -> Build interference graph -> Coalesce -> Spill costs -> Find coloring
  After Coalesce: if more coalescing is possible, build the interference graph again
  After Find coloring: no spills, done; spills, insert spill code and repeat

Real-Life Complications
Need to deal with irregularities in the register set
  Some operations require dedicated registers
    Eg: idiv in x86
    Eg: split address/data registers in M68k
  Register conventions like function results
    Eg: use of registers across calls (return in EAX)
  Different register classes
    Eg: integer, floating-point, SIMD
  Overlapping registers
    Eg: x86 AL, AH, AX, EAX, RAX
Model by precoloring nodes, adding constraints in the graph, etc.

Graph Representation
Recall: register allocation is one of the most important optimizations
  Finding the optimum solution for a typical interference graph would take a million years
  So we use heuristics to find a good solution a million times each day
Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler)
Not unknown to have ~5k nodes and ~1M edges