Register Allocation Mooly Sagiv html://www.math.tau.ac.il/~msagiv/courses/wcc02.html.


Basic Compiler Phases
Source program (string) → lexical analysis → Tokens → syntax analysis → Abstract syntax tree → semantic analysis → Translate (with Frame) → Intermediate representation → Instruction selection → Assembly → Register Allocation → Fin. Assembly

Register Allocation
Input:
– Sequence of machine code instructions (assembly)
– Unbounded number of temporary registers
Output:
– Sequence of machine code instructions (assembly)
– Machine registers
– Some MOVE instructions removed
– Missing prologue and epilogue

Register Allocation Process
Repeat
    Construct the interference graph
    Color graph nodes with machine registers
    – Adjacent nodes are not colored by the same register
    Spill a temporary into the activation record
Until no more spills

l3: beq t128, $0, l0          /* $0, t128 */
l1: or t131, $0, t128         /* $0, t128, t131 */
    addi t132, t128, -1       /* $0, t131, t132 */
    or $4, $0, t132           /* $0, $4, t131 */
    jal nfactor               /* $0, $2, t131 */
    or t130, $0, $2           /* $0, t130, t131 */
    or t133, $0, t131         /* $0, t130, t133 */
    mult t133, t130           /* $0, t133 */
    mflo t133                 /* $0, t133 */
    or t129, $0, t133         /* $0, t129 */
l2: or t103, $0, t129         /* $0, t103 */
    b lend                    /* $0, t103 */
l0: addi t129, $0, 1          /* $0, t129 */
    b l2                      /* $0, t129 */
[Figure: interference graph over $0, $2, $4, t103, t128, t129, t130, t131, t132, t133]
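The interference graph for a listing like the one above can be derived from the live sets written after each instruction. The following is a minimal sketch, not the course's implementation: the `(defs, live_out, move_src)` tuple encoding and the function name `build_interference` are assumptions made for illustration. It uses the usual rule that a defined temporary interferes with everything live after the instruction, except the source of a move.

```python
from collections import defaultdict

def build_interference(instrs):
    """instrs: list of (defs, live_out, move_src) tuples (hypothetical encoding).
    move_src is the source of a register-to-register move, else None."""
    graph = defaultdict(set)
    for defs, live_out, move_src in instrs:
        for d in defs:
            graph[d]  # touch the key so that isolated nodes still appear
            for t in live_out:
                # A move does not make its source and target interfere.
                if t != d and t != move_src:
                    graph[d].add(t)
                    graph[t].add(d)
    return graph

# Three instructions from the listing, with the live sets from its comments.
instrs = [
    ({"t131"}, {"$0", "t128", "t131"}, "t128"),  # or t131, $0, t128 (a move)
    ({"t132"}, {"$0", "t131", "t132"}, None),    # addi t132, t128, -1
    ({"$4"},   {"$0", "$4", "t131"},   "t132"),  # or $4, $0, t132 (a move)
]
g = build_interference(instrs)
```

Here t131 ends up adjacent to $0, t132 and $4 but not to t128, so the move `or t131, $0, t128` stays a candidate for coalescing.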


Challenges
– The coloring problem is computationally hard
– The number of machine registers may be small
– Avoid too many MOVEs
– Handle “pre-colored” nodes

Coloring by Simplification [Kempe 1879]
K – the number of machine registers
G(V, E) – the interference graph
Consider a node v ∈ V with fewer than K neighbors:
– Color G – v in K colors
– Color v in a color different from those of its (colored) neighbors
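Kempe's observation translates directly into a recursive procedure. The sketch below is illustrative only; the dictionary-of-sets graph encoding and the name `kempe_color` are assumptions, and the function simply gives up (returns None) when no node with fewer than K neighbors exists.

```python
def kempe_color(graph, k):
    """Color an undirected graph {node: set(neighbors)} with colors 0..k-1.
    Returns a dict node -> color, or None if simplification gets stuck."""
    if not graph:
        return {}
    # Pick any node with fewer than k neighbors.
    v = next((n for n in graph if len(graph[n]) < k), None)
    if v is None:
        return None  # every remaining node has degree >= k
    # Color G - v recursively.
    rest = {n: graph[n] - {v} for n in graph if n != v}
    coloring = kempe_color(rest, k)
    if coloring is None:
        return None
    # v has fewer than k neighbors, so some color is always free.
    used = {coloring[n] for n in graph[v]}
    coloring[v] = next(c for c in range(k) if c not in used)
    return coloring

square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
coloring = kempe_color(square, 3)
stuck = kempe_color(triangle, 2)
```

The 4-cycle is colored with K=3, while the triangle with K=2 has no node of degree below K, so the heuristic alone cannot proceed; that case is what spilling handles.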

Graph Coloring by Simplification
Build: Construct the interference graph
Simplify: Recursively remove nodes with fewer than K neighbors; push removed nodes onto the stack
Potential-Spill: Spill some nodes and remove them; push removed nodes onto the stack
Select: Assign actual registers (from the simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
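The Simplify, Potential-Spill and Select phases can be sketched as a single function. This is a simplified illustration under stated assumptions: the name `color_with_spills`, the `spill_cost` dictionary, and the choice of the cheapest node as the spill candidate are not from the slides, and rewriting the program after an actual spill is left out.

```python
def color_with_spills(graph, k, spill_cost):
    """graph: {node: set(neighbors)}. Returns (colors, actual_spills)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        low = [n for n in work if len(work[n]) < k]
        if low:
            v = low[0]                                  # Simplify
        else:
            v = min(work, key=lambda n: spill_cost[n])  # Potential spill
        stack.append(v)
        for u in work.pop(v):
            work[u].discard(v)
    colors, spilled = {}, []
    while stack:                                        # Select
        v = stack.pop()
        used = {colors[u] for u in graph[v] if u in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)                           # Actual spill
    return colors, spilled

square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
sq_colors, sq_spills = color_with_spills(square, 2, {"a": 1, "b": 10, "c": 10, "d": 10})
tri_colors, tri_spills = color_with_spills(triangle, 2, {"x": 1, "y": 2, "z": 3})
```

With K=2 the 4-cycle has no low-degree node, so `a` is pushed as a potential spill, yet it still receives a color during Select. In the triangle the potential spill `x` becomes an actual spill.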

Artificial Example (K=2)
[Figure: interference graph over the temporaries t1, t2, t3, t4, t5, t6, t7, t8]

Coalescing
– MOVs can be removed if the source and the target share the same register
– The source and the target of the move can be merged into a single node (unifying the sets of neighbors)
– May require more registers
Conservative Coalescing
– Merge nodes only if the resulting node has fewer than K neighbors with degree ≥ K (in the resulting graph)
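The conservative criterion can be checked before any merging is done. The sketch below assumes a dictionary-of-sets graph and a hypothetical helper name `conservative_ok`; degrees are measured as they would be in the merged graph, so a node adjacent to both move operands counts one edge less.

```python
def conservative_ok(graph, a, b, k):
    """True if merging a and b gives a node with fewer than k
    neighbors of degree >= k (degrees taken in the merged graph)."""
    merged_adj = (graph[a] | graph[b]) - {a, b}
    heavy = 0
    for n in merged_adj:
        degree = len(graph[n])
        # A neighbor of both a and b loses one edge after the merge.
        if a in graph[n] and b in graph[n]:
            degree -= 1
        if degree >= k:
            heavy += 1
    return heavy < k

# x is adjacent to both a and b, so its degree drops to 1 after merging.
safe = {"a": {"x"}, "b": {"x", "y"}, "x": {"a", "b"}, "y": {"b"}}
# x and y each keep degree 2 after merging, giving two heavy neighbors.
risky = {"a": {"x", "y"}, "b": set(), "x": {"a", "p"}, "y": {"a", "q"},
         "p": {"x"}, "q": {"y"}}
ok_safe = conservative_ok(safe, "a", "b", 2)
ok_risky = conservative_ok(risky, "a", "b", 2)
```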

Constrained Moves
– An instruction T ← S is constrained if S and T interfere
– May happen after coalescing
– Constrained MOVs are not coalesced
[Figure: interference graph over X, Y, Z for the move X ← Y /* X, Y, Z */]

Graph Coloring with Coalescing
Build: Construct the interference graph
Simplify: Recursively remove non-MOVE nodes with fewer than K neighbors; push removed nodes onto the stack
Potential-Spill: Spill some nodes and remove them; push removed nodes onto the stack
Select: Assign actual registers (from the simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
Coalesce: Conservatively merge unconstrained MOV-related nodes with fewer than K “heavy” neighbors
Freeze: Give up coalescing on some low-degree MOV-related nodes

Spilling
Many heuristics exist
– Maximal degree
– Live-ranges
– Number of uses in loops
The whole process needs to be repeated after an actual spill

Pre-Colored Nodes
Some registers in the intermediate language are pre-colored:
– They correspond to real registers (stack-pointer, frame-pointer, parameters, …)
– Cannot be simplified, coalesced, or spilled (infinite degree)
– Interfere with each other
– But normal temporaries can be coalesced into pre-colored registers
Register allocation is completed when all the nodes are pre-colored

Caller-Save and Callee-Save Registers
Callee-save registers (MIPS 16-23)
– Saved by the callee when modified
– Values are automatically preserved across calls
Caller-save registers
– Saved by the caller when needed
– Values are not automatically preserved
Usually the architecture defines caller-save and callee-save registers
– Separate compilation
– Interoperability between code produced by different compilers/languages
But compilers can decide when to use caller-save and callee-save registers

Caller-Save vs. Callee-Save Registers
int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}
void bar(int y) {
    int x = y + 1;
    f2(y);
    g2(2);
}

Saving Callee-Save Registers
Before:
    enter: def(r7)
    …
    exit: use(r7)
After:
    enter: def(r7)
    t231 ← r7
    …
    r7 ← t231
    exit: use(r7)

A Complete Example
r1, r2: caller-save; r3: callee-save
enter:                      /* r2, r1, r3 */
    c := r3                 /* c, r2, r1 */
    a := r1                 /* a, c, r2 */
    b := r2                 /* a, c, b */
    d := 0                  /* a, c, b, d */
    e := a                  /* e, c, b, d */
loop:
    d := d+b                /* e, c, b, d */
    e := e-1                /* e, c, b, d */
    if e>0 goto loop        /* c, d */
    r1 := d                 /* r1, c */
    r3 := c                 /* r1, r3 */
    return                  /* r1, r3 */
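The live sets in the comments of this example can be reproduced with a standard backward dataflow iteration. The sketch below is an illustration, not the course's code: the `(defs, uses, successors)` encoding and the function name `liveness` are assumptions, and instructions are numbered 0 to 10 in listing order.

```python
def liveness(instrs):
    """instrs: list of (defs, uses, successors). Returns (live_in, live_out)."""
    n = len(instrs)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in reversed(range(n)):
            defs, uses, succs = instrs[i]
            # out[i] is the union of in[s] over all successors s.
            out = set().union(*(live_in[s] for s in succs))
            # in[i] = use[i] ∪ (out[i] − def[i])
            inn = uses | (out - defs)
            if out != live_out[i] or inn != live_in[i]:
                live_out[i], live_in[i] = out, inn
                changed = True
    return live_in, live_out

program = [
    ({"c"}, {"r3"}, [1]),        # 0: c := r3
    ({"a"}, {"r1"}, [2]),        # 1: a := r1
    ({"b"}, {"r2"}, [3]),        # 2: b := r2
    ({"d"}, set(), [4]),         # 3: d := 0
    ({"e"}, {"a"}, [5]),         # 4: e := a
    ({"d"}, {"d", "b"}, [6]),    # 5: loop: d := d+b
    ({"e"}, {"e"}, [7]),         # 6: e := e-1
    (set(), {"e"}, [5, 8]),      # 7: if e>0 goto loop
    ({"r1"}, {"d"}, [9]),        # 8: r1 := d
    ({"r3"}, {"c"}, [10]),       # 9: r3 := c
    (set(), {"r1", "r3"}, []),   # 10: return
]
live_in, live_out = liveness(program)
```

Under this encoding the set `/* c, d */` written after the branch corresponds to what is live on the fall-through edge, that is `live_in[8]`; the branch's full live-out set also contains everything live at `loop`.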


Spill priorities for the example
Table columns, one row per temporary a, b, c, d, e: use+def outside loop | use+def within loop | deg | spill priority
spill priority = (uo + 10 ui)/deg
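The priority formula is easy to evaluate mechanically. In the sketch below the function name `spill_priority` is hypothetical, and the counts are not read from the slide's table: the use/def counts were recounted here from the program listing, and the degrees were derived from its live sets, treating a move's source and target as non-interfering.

```python
def spill_priority(uo, ui, deg, loop_weight=10):
    """(uses+defs outside loop + loop_weight * uses+defs inside loop) / degree."""
    return (uo + loop_weight * ui) / deg

# (uo, ui, deg) per temporary, recounted from the listing (an assumption).
counts = {
    "a": (2, 0, 4),
    "b": (1, 1, 4),
    "c": (2, 0, 6),
    "d": (2, 2, 4),
    "e": (1, 3, 3),
}
priority = {t: spill_priority(*v) for t, v in counts.items()}
candidate = min(priority, key=priority.get)
```

With these counts the lowest priority belongs to c, which agrees with the choice of c as the spill candidate in the steps that follow.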

First round (steps on the interference graph):
1. Potential spill of c; push c. Stack: c
2. Coalescing a+e. Stack: c
3. Coalescing b+r2. Stack: c
4. Coalescing ae+r1. Stack: c. r1ae and d are constrained
5. Simplifying d; push d. Stack: d, c
6. Pop d: d is assigned to r3. Stack: c
7. Pop c: actual spill!

Program after spilling c:
enter:                      /* r2, r1, r3 */
    c1 := r3                /* c1, r2, r1 */
    M[c_loc] := c1          /* r2 */
    a := r1                 /* a, r2 */
    b := r2                 /* a, b */
    d := 0                  /* a, b, d */
    e := a                  /* e, b, d */
loop:
    d := d+b                /* e, b, d */
    e := e-1                /* e, b, d */
    if e>0 goto loop        /* d */
    r1 := d                 /* r1 */
    c2 := M[c_loc]          /* r1, c2 */
    r3 := c2                /* r1, r3 */
    return                  /* r1, r3 */

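Second round (steps on the new interference graph):
1. Coalescing c1+r3; c2+c1r3. Stack: empty
2. Coalescing a+e; b+r2. Stack: empty
3. Coalescing ae+r1; simplify d. Stack: d
4. Pop d. Stack: empty
Resulting assignment: a → r1, b → r2, c1 → r3, c2 → r3, d → r3, e → r1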

Applying the assignment a → r1, b → r2, c1 → r3, c2 → r3, d → r3, e → r1:
enter:
    r3 := r3
    M[c_loc] := r3
    r1 := r1
    r2 := r2
    r3 := 0
    r1 := r1
loop:
    r3 := r3+r2
    r1 := r1-1
    if r1>0 goto loop
    r1 := r3
    r3 := M[c_loc]
    r3 := r3
    return                  /* r1, r3 */
After deleting the moves whose source and target are the same register:
enter:
    M[c_loc] := r3
    r3 := 0
loop:
    r3 := r3+r2
    r1 := r1-1
    if r1>0 goto loop
    r1 := r3
    r3 := M[c_loc]
    return                  /* r1, r3 */
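This last rewriting step is mechanical: substitute registers for temporaries, then drop every move of the form r := r. The sketch below works on the textual form of the example; the function name `assign_registers` and the string-based instruction encoding are assumptions chosen to keep the illustration short.

```python
import re

def assign_registers(lines, assignment):
    """Rename temporaries to registers, dropping moves that become r := r."""
    names = "|".join(map(re.escape, assignment))
    pattern = re.compile(r"\b(" + names + r")\b")
    out = []
    for line in lines:
        renamed = pattern.sub(lambda m: assignment[m.group(1)], line)
        move = re.fullmatch(r"(\w+) := (\w+)", renamed)
        if move and move.group(1) == move.group(2):
            continue  # redundant move, removed
        out.append(renamed)
    return out

spilled_program = [
    "c1 := r3", "M[c_loc] := c1", "a := r1", "b := r2", "d := 0", "e := a",
    "loop: d := d+b", "e := e-1", "if e>0 goto loop",
    "r1 := d", "c2 := M[c_loc]", "r3 := c2", "return",
]
assignment = {"a": "r1", "b": "r2", "c1": "r3", "c2": "r3", "d": "r3", "e": "r1"}
final = assign_registers(spilled_program, assignment)
```

The five coalesced moves disappear, leaving the eight-instruction program shown in the second listing.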

Graph Coloring Implementation
Two kinds of queries on the interference graph
– Get all the nodes adjacent to node X
– Are X and Y adjacent?
Significant savings are obtained by not explicitly representing adjacency lists for machine registers
– They would need to be traversed when nodes are coalesced
Heuristic condition for coalescing temporary X with machine register R:
– For every T that is a neighbor of X, one of the following is true (this guarantees that the number of neighbors of T is not increased from <K to =K):
– T already interferes with R
– T is a machine register
– Degree(T) < K
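The coalescing condition for a machine register can be written without ever reading the register's own adjacency list. In this sketch the name `can_coalesce_with_register` and the graph encoding are assumptions; adjacency sets exist only for temporaries, and the caller is expected to have already ruled out a constrained move (X interfering with R).

```python
def can_coalesce_with_register(graph, x, r, k, machine_regs):
    """graph holds adjacency sets for temporaries only.
    True if every neighbor T of x satisfies one of the three conditions."""
    for t in graph[x]:
        if t in machine_regs:      # T is a machine register
            continue
        if len(graph[t]) < k:      # Degree(T) < K
            continue
        if r in graph[t]:          # T already interferes with R
            continue
        return False
    return True

regs = {"r1", "r2"}
# t has degree 3 >= K but already interferes with r1, so merging x into r1 is safe.
g1 = {"x": {"t", "r2"}, "t": {"x", "r1", "u"}, "u": {"t"}}
# v has degree 3 >= K and does not interfere with r1, so the test fails.
g2 = {"x": {"v"}, "v": {"x", "u", "w"}, "u": {"v"}, "w": {"v"}}
ok1 = can_coalesce_with_register(g1, "x", "r1", 2, regs)
ok2 = can_coalesce_with_register(g2, "x", "r1", 2, regs)
```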

Worklist Management
– Use work-lists to avoid expensive iterative queries
– Maintain various invariants as data structures are updated
– Implement work-lists as doubly linked lists to allow element deletion

Graph Coloring Expression Trees
– Drastically simpler
– No need for dataflow analysis
– Optimal solution can be computed (using dynamic programming)

Simple Register Allocation
int n := 0
function SimpleAlloc(t) {
    for each non trivial tile u that is a child of t
        SimpleAlloc(u)
    for each non trivial tile u that is a child of t {
        n := n - 1
    }
    n := n + 1
    assign r_n to hold the value of t
}
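A direct transcription of SimpleAlloc into Python might look as follows. The `Tile` class, the `state` dictionary standing in for the global n, and the tuple format of the emitted instructions are all assumptions made for this sketch; the tree is the expression M[a] + M[b] * M[c] with each memory fetch treated as one leaf tile.

```python
class Tile:
    def __init__(self, op, *children, trivial=False):
        self.op = op
        self.children = children
        self.trivial = trivial
        self.reg = None

def simple_alloc(t, state, code):
    """state['n'] plays the role of the global counter n in the pseudocode."""
    kids = [u for u in t.children if not u.trivial]
    for u in kids:
        simple_alloc(u, state, code)
    state["n"] -= len(kids)        # release the children's registers
    state["n"] += 1                # take one register for t itself
    state["max"] = max(state["max"], state["n"])
    t.reg = f"r{state['n']}"
    code.append((t.op, t.reg, *[u.reg for u in kids]))

tree = Tile("add", Tile("load a"), Tile("mul", Tile("load b"), Tile("load c")))
state, code = {"n": 0, "max": 0}, []
simple_alloc(tree, state, code)
```

On this tree the counter reaches 3, because the left operand is loaded first and its register stays occupied while the heavier right subtree is evaluated.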

Bad Example
Tree: BINOP(PLUS, MEM(NAME a), BINOP(TIMES, MEM(NAME b), MEM(NAME c)))
Trace of simpleAlloc():
    n = 1    load r1 M[a]
    n = 2    load r2 M[b]
    n = 3    load r3 M[c]
    n = 2    mul r2, r3, r2
    n = 1    add r1, r2, r1

Two Phase Solution: Dynamic Programming (Sethi & Ullman)
Bottom-up (labeling)
– Compute for every subtree the minimal number of registers needed
Top-down
– Generate the code using the labeling, preferring “heavier” subtrees (larger labeling)

The Labeling Algorithm
void label(t) {
    for each tile u that is a child of t do
        label(u)
    if t is trivial then
        need[t] := 0
    else if t has two children, left and right then
        if need[left] = need[right] then
            need[t] := need[left] + 1
        else
            need[t] := max(1, need[left], need[right])
    else if t has one child, c then
        need[t] := max(1, need[c])
    else if t has no children then
        need[t] := 1
}
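The labeling pass maps onto a short recursive function. The `Tile` class below is a hypothetical stand-in for the course's tile representation, and the example tree is M[a] + M[b] * M[c] with each memory fetch as a non-trivial leaf tile.

```python
class Tile:
    def __init__(self, op, *children, trivial=False):
        self.op = op
        self.children = children
        self.trivial = trivial
        self.need = None

def label(t):
    """Bottom-up: store in t.need the number of registers needed for t."""
    for u in t.children:
        label(u)
    if t.trivial:
        t.need = 0
    elif len(t.children) == 2:
        left, right = t.children
        if left.need == right.need:
            t.need = left.need + 1
        else:
            t.need = max(1, left.need, right.need)
    elif len(t.children) == 1:
        t.need = max(1, t.children[0].need)
    else:
        t.need = 1

times = Tile("mul", Tile("load b"), Tile("load c"))
tree = Tile("add", Tile("load a"), times)
label(tree)
```

The TIMES subtree gets need 2 because both operands need one register each, and the root also gets need 2 since its operands have different needs.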

int n := 0
void SethiUllman(t) {
    if t has two children, left and right then
        if need[left] > K and need[right] > K then
            SethiUllman(right); n := n - 1
            spill: emit instruction to store reg[right]
            SethiUllman(left)
            unspill: emit instruction to fetch reg[right]
        else if need[left] ≥ need[right] then
            SethiUllman(left); SethiUllman(right); n := n - 1
        else if need[right] > need[left] then
            SethiUllman(right); SethiUllman(left); n := n - 1
        reg[t] := r_n; emit(OPER, instruction[t], reg[t], reg[left], reg[right])
    else if t has one child c then
        SethiUllman(c); reg[t] := r_n; emit(OPER, instruction[t], reg[t], reg[c])
    else if t is non trivial but has no children then
        n := n + 1; reg[t] := r_n; emit(OPER, instruction[t], reg[t])
    else if t is a trivial node TEMP(r_i) then
        reg[t] := r_i
}

Tree: BINOP(PLUS, MEM(NAME a), BINOP(TIMES, MEM(NAME b), MEM(NAME c)))
Labels: need = 1 for MEM(NAME a), need = 2 for the TIMES subtree
Trace of SethiUllman():
    n = 1    load r1 M[b]
    n = 2    load r2 M[c]
    n = 1    MUL r1, r2, r1
    n = 2    load r2 M[a]
    n = 1    ADD r2, r1, r1
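The trace can be reproduced with a reduced version of the code generator. This sketch makes several simplifying assumptions that are not in the slides: every leaf is a non-trivial load tile, there are always enough registers (so the spill branch is omitted), and the `Tile` class and tuple instruction format are hypothetical.

```python
class Tile:
    def __init__(self, op, *children):
        self.op, self.children = op, children
        self.need, self.reg = None, None

def label(t):
    for u in t.children:
        label(u)
    if len(t.children) == 2:
        l, r = t.children
        t.need = l.need + 1 if l.need == r.need else max(1, l.need, r.need)
    elif len(t.children) == 1:
        t.need = max(1, t.children[0].need)
    else:
        t.need = 1

def sethi_ullman(t, state, code):
    """Top-down pass: evaluate the subtree with the larger label first."""
    if len(t.children) == 2:
        left, right = t.children
        order = (left, right) if left.need >= right.need else (right, left)
        for u in order:
            sethi_ullman(u, state, code)
        state["n"] -= 1                      # two operands collapse into one
        t.reg = f"r{state['n']}"
        code.append((t.op, t.reg, left.reg, right.reg))
    elif len(t.children) == 1:
        sethi_ullman(t.children[0], state, code)
        t.reg = f"r{state['n']}"
        code.append((t.op, t.reg, t.children[0].reg))
    else:
        state["n"] += 1                      # a leaf takes a fresh register
        state["max"] = max(state["max"], state["n"])
        t.reg = f"r{state['n']}"
        code.append((t.op, t.reg))

tree = Tile("add", Tile("load a"), Tile("mul", Tile("load b"), Tile("load c")))
label(tree)
state, code = {"n": 0, "max": 0}, []
sethi_ullman(tree, state, code)
```

Evaluating the TIMES subtree first keeps the counter at 2 or below, one register fewer than the simple allocator needs on the same tree.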

Interprocedural Allocation
Allocate registers to multiple procedures
Potential saving
– Caller/callee save registers
– Parameter passing
– Return values
But may increase compilation cost
Function inlining can help

main:    addiu $sp, $sp, -K1
L4:      sw $2, 0+K1($sp)
         or $25, $0, $31
         sw $25, -4+K1($sp)
         addiu $25, $sp, 0+K1
         or $2, $0, $25
         addi $25, $0, 10
         or $4, $0, $25
         jal nfactor
         lw $25, -4+K1
         or $31, $0, $25
         b L3
L3:      addiu $sp, $sp, K1
         j $31

nfactor: addiu $sp, $sp, -K2
L6:      sw $2, 0+K2($sp)
         or $25, $0, $4
         or $24, $0, $31
         sw $24, -4+K2($sp)
         sw $30, -8+K2($sp)
         beq $25, $0, L0
L1:      or $30, $0, $25
         lw $24, 0+K2
         or $2, $0, $24
         addi $25, $25, -1
         or $4, $0, $25
         jal nfactor
         or $25, $0, $2
         mult $30, $25
         mflo $30
L2:      or $2, $0, $30
         lw $30, -4+K2($sp)
         or $31, $0, $30
         lw $30, -8+K2($sp)
         b L5
L0:      addi $30, $0, 1
         b L2
L5:      addiu $sp, $sp, K2
         j $31

Summary
Two Register Allocation Methods
– Local, for every IR tree
    Simultaneous instruction selection and register allocation
    Optimal (under certain conditions)
– Global, for every function
    Applied after instruction selection
    Performs well for machines with many registers
    Can handle instruction level parallelism
Missing
– Interprocedural allocation