Register Allocation: Graph Coloring. Compiler. Baojian Hua, bjhua@ustc.edu.cn

Middle and Back End (pipeline diagram): AST -> [translation] -> IR1 -> [translation] -> IR2 -> [other IRs and translations] -> asm

Back-end Structure (diagram labels): IR, instruction selector, Assem, register allocator, TempMap, instruction scheduler

Instruction Selection
  Source program:
    int f (int x, int y)
    {
      int a; int b; int c; int d;
      a = x + y;
      b = a + 4;
      c = b * 2;
      d = c / 8;
      return d;
    }
  Argument positions: x at 8(%ebp), y at 12(%ebp). Positions for a, b, c, d can not be determined during this phase.
  Code after instruction selection:
    int f (int x, int y){
      int a,b,c,d;
      int t1, t2;
      pushl %ebp            // prolog
      movl %esp, %ebp
      movl 8(%ebp), t1
      movl 12(%ebp), t2
      movl t1, a
      addl t2, a
      movl a, b
      addl $4, b
      movl b, %eax
      imult $2
      movl %eax, c
      movl c, %eax
      cltd
      idivl $8
      movl %eax, d
      movl d, %eax
      leave                 // epilog
      ret
    }

Register allocation
  After instruction selection, there may be some variables left.
  Basic idea: put as many of these variables as possible into registers (speed!), and into memory only if the registers are out of supply.
  This process is called register allocation: the most popular and important optimization in modern compilers.

Register Allocation
  Take the code that instruction selection produced for f, in which a, b, c, d, t1 and t2 are still unallocated.
  Suppose that register allocation determines the following (we will discuss how to do this a little later):
    a  => %eax
    b  => %eax
    c  => %eax
    d  => %eax
    t1 => %eax
    t2 => %edx
  This data structure is called a temp map.

Rewriting
  With the given temp map (a, b, c, d, t1 => %eax; t2 => %edx), we can rewrite the code accordingly to generate the final assembly code:
    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), t1      becomes   movl 8(%ebp), %eax
      movl 12(%ebp), t2     becomes   movl 12(%ebp), %edx
      movl t1, a            becomes   movl %eax, %eax
      addl t2, a            becomes   addl %edx, %eax
      ...
  The rest is left to you!
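The rewriting step can be sketched as a plain textual substitution. This is only an illustration: `apply_temp_map` is a hypothetical helper name, and it assumes instructions are strings in which each temp appears as a whole word.

```python
import re

def apply_temp_map(lines, temp_map):
    """Replace every temp (matched as a whole word) by its register."""
    if not temp_map:
        return list(lines)
    names = sorted(temp_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [pattern.sub(lambda m: temp_map[m.group(1)], line) for line in lines]

# The temp map from the slide.
temp_map = {"a": "%eax", "b": "%eax", "c": "%eax", "d": "%eax",
            "t1": "%eax", "t2": "%edx"}
code = ["movl 8(%ebp), t1", "movl 12(%ebp), t2", "movl t1, a", "addl t2, a"]
rewritten = apply_temp_map(code, temp_map)
```

Matching whole words keeps the `a` in `addl` and the `b` in `%ebp` untouched.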

Peep-hole Optimization
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), %eax
      movl 12(%ebp), %edx
      movl %eax, %eax
      addl %edx, %eax
      movl %eax, %eax
      addl $4, %eax
      movl %eax, %eax
      imult $2
      movl %eax, %eax
      cltd
      idivl $8
      movl %eax, %eax
      leave
      ret
  Peep-hole optimizations try to improve the code by examining it through a small code window; they work in a local manner.
  For example, we can use a code window of width 1 to eliminate the obvious redundancy of the form: movl r, r
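A width-1 window looks at one instruction at a time, so the `movl r, r` rule can be sketched in a few lines. `remove_self_moves` is a hypothetical name, and the string format of the instructions is an assumption.

```python
def remove_self_moves(lines):
    """Width-1 peephole pass: drop every 'movl r, r'."""
    kept = []
    for line in lines:
        parts = line.replace(",", " ").split()
        is_self_move = (len(parts) == 3 and parts[0] == "movl"
                        and parts[1] == parts[2])
        if not is_self_move:
            kept.append(line)
    return kept

body = ["movl 8(%ebp), %eax", "movl 12(%ebp), %edx", "movl %eax, %eax",
        "addl %edx, %eax", "movl %eax, %eax", "addl $4, %eax"]
optimized = remove_self_moves(body)
```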

Final Assembly
    // This function does
    // NOT need a (stack)
    // frame!
    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      movl 8(%ebp), %eax
      movl 12(%ebp), %edx
      addl %edx, %eax
      addl $4, %eax
      imult $2
      cltd
      idivl $8
      leave
      ret
  Source program, for comparison:
    int f (int x, int y)
    {
      int a; int b; int c; int d;
      a = x + y;
      b = a + 4;
      c = b * 2;
      d = c / 8;
      return d;
    }

Register Allocation
  Register allocation determines a temp map for the code of f: a, b, c, d, t1 => %eax; t2 => %edx.
  How to generate such a temp map?
  Key observation: two variables can reside in one register iff they are NOT live simultaneously.

Liveness Analysis
  So, we can perform liveness analysis on the code of f to calculate the live variable information. Between each two statements we mark the liveOut set. For the last instructions:
      ...                   liveOut = {…}
      idivl $8              liveOut = {eax}
      movl %eax, d          liveOut = {d}
      movl d, %eax          liveOut = {eax}
      leave
      ret
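For straight-line code like f, the liveOut sets can be computed in one backward pass. The sketch below assumes each instruction is given as a hypothetical (defs, uses) pair and that only %eax is live at exit. The def/use sets written for `idivl` are an assumption about that instruction.

```python
def live_out_sets(instrs, live_at_exit=()):
    """Backward pass over straight-line code; returns liveOut per instruction."""
    live = set(live_at_exit)
    result = [set() for _ in instrs]
    for i in range(len(instrs) - 1, -1, -1):
        defs, uses = instrs[i]
        result[i] = set(live)                   # live just after instruction i
        live = (live - set(defs)) | set(uses)   # live just before instruction i
    return result

tail_of_f = [
    ({"%eax", "%edx"}, {"%eax", "%edx"}),  # idivl $8
    ({"d"}, {"%eax"}),                     # movl %eax, d
    ({"%eax"}, {"d"}),                     # movl d, %eax
]
live_out = live_out_sets(tail_of_f, live_at_exit={"%eax"})
```

Code with branches needs the usual fixed-point iteration over the control-flow graph instead of a single pass.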

Interference Graph (IG)
  For the code of f, liveness analysis gives the interference edges: t2 interferes with t1, and a interferes with t2.
  IG nodes: a, b, c, d, t1, t2.
  Register allocation determines that (the temp map): a, b, c, d, t1 => %eax; t2 => %edx.

Steps in Register Allocator
  1. Do liveness analysis.
  2. Build the interference graph (IG): draw an edge between any two variables that are live simultaneously.
  3. Color the IG with K colors (registers).
     K is the number of available registers on the machine.
     This is a classical problem in graph theory: NP-complete for K >= 3, so one must use heuristics.
  4. Allocate physical registers to variables.
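Steps 1 and 2 can be sketched together for straight-line code. This uses the common construction in which each defined variable interferes with everything live out of its definition. The (defs, uses) encoding and the name `build_interference_graph` are assumptions, and the sample program is the six-statement one used in the Chaitin walk-through later in these slides, with f assumed live at exit.

```python
from collections import defaultdict

def build_interference_graph(instrs, live_at_exit=()):
    """instrs: list of (defs, uses) pairs. Returns node -> set of neighbours."""
    graph = defaultdict(set)
    live = set(live_at_exit)
    for defs, uses in reversed(instrs):
        for d in defs:
            graph[d]                      # make sure d appears as a node
            for v in live:
                if v != d:                # d interferes with all other live-out vars
                    graph[d].add(v)
                    graph[v].add(d)
        live = (live - set(defs)) | set(uses)
        for u in uses:
            graph[u]                      # make sure every used variable is a node
    return dict(graph)

program = [
    ({"a"}, set()),        # a = 1
    ({"b"}, set()),        # b = 2
    ({"c"}, {"a", "b"}),   # c = a+b
    ({"d"}, {"a", "c"}),   # d = a+c
    ({"e"}, {"a", "b"}),   # e = a+b
    ({"f"}, {"d", "e"}),   # f = d+e
]
ig = build_interference_graph(program, live_at_exit={"f"})
```

A production allocator usually treats move instructions specially so that the source and destination of a move do not interfere; that refinement is left out here.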

History
  Early work by Cocke suggested that register allocation can be viewed as a graph coloring problem (1971).
  The first working allocator is Chaitin's, for the IBM PL/1 compiler (1981).
  Later: the IBM PL.8 compiler.
  This work had some impact on RISC.

History, cont.
  A more recent graph coloring allocator is due to Briggs (1992).
  For now, graph coloring is the most popular approach to register allocation, used in many production compilers, e.g., GCC.
  But more advanced allocators have been invented in recent years. So, is graph coloring a lesson to be abandoned?
  More in the next few lectures …

Graph coloring
  Once we have the interference graph, we can try to color the graph with K colors.
    K: the number of machine registers.
    Adjacent nodes must get different colors.
  But this is an NP-complete problem (for K >= 3), so we must use some heuristics.

Kempe's Allocator

Kempe's Theorem
  [Kempe] Given a graph G with a node n such that degree(n) < K, G is K-colorable iff (G - {n}) is K-colorable. (G - {n}: remove n and all edges connected to n.)
  Proof?
  (Figure: a node n and its neighbors, with degree(n) < K.)

Kempe's Algorithm
    kempe(graph G, int K)
      while (there is any node n with degree(n) < K)
        remove this node n
        assign a color to the removed node n    // greedy
      if (G is empty)                           // i.e., G is K-colorable
        return success;
      return failure;
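A minimal sketch of this algorithm follows. It rests on one stated assumption: colors are handed out in reverse removal order, because then each node has fewer than K already-colored neighbors and a free color always exists. The function name `kempe_color`, the dictionary-of-sets graph encoding and the sample graph are hypothetical.

```python
def kempe_color(graph, k):
    """graph: node -> set of neighbours (symmetric).
    Returns node -> color in 1..k, or None when no node of degree < k is left."""
    work = {n: set(adj) for n, adj in graph.items()}
    removed = []
    while work:
        low = [n for n in sorted(work) if len(work[n]) < k]
        if not low:
            return None                 # the rule does not apply: failure
        n = low[0]
        for m in work.pop(n):           # remove n and all its edges
            work[m].discard(n)
        removed.append(n)
    coloring = {}
    for n in reversed(removed):         # color in reverse removal order
        used = {coloring[m] for m in graph[n] if m in coloring}
        coloring[n] = next(c for c in range(1, k + 1) if c not in used)
    return coloring

# Hypothetical path graph p - x - m - y, colored with K = 2.
path = {"p": {"x"}, "x": {"p", "m"}, "m": {"x", "y"}, "y": {"m"}}
colors = kempe_color(path, 2)
```

Choosing the first low-degree node by name keeps the run deterministic; any node with degree < K would do.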

Example (K = 4, colors 1, 2, 3, 4; graph nodes a, b, c, d, e)
  degree(a) = 3 < 4: remove node “a”, assign the first available color
  degree(b) = 2 < 4: remove node “b”, assign the first available color
  degree(c) = 2 < 4: remove node “c”, assign the first available color
  degree(d) = 1 < 4: remove node “d”, assign the first available color
  degree(e) = 0 < 4: remove node “e”, assign the first available color
  Here, we want to choose the node with the lowest degree. What kind of data structure should we use?

Example (K = 3, colors 1, 2, 3; graph nodes a, b, c, d, e)
  So this graph is 3-colorable. But if we have only three colors, we can NOT apply the Kempe algorithm. (Why?)
  We can refine it to the following one:
    kempe(graph G, int K)
      stack = [];
      while (G is not empty)
        if there is a node n with degree(n) < K, remove n and push it onto the stack;
        otherwise (every node has degree >= K), pick a node, remove it and push it anyway
      pop the stack and assign colors
  Essentially, this is a lazy algorithm!
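The refined, stack-based version can be sketched as follows. Picking the highest-degree node when no node has degree < K is an assumption of this sketch, since the slide does not fix the choice. The 4-cycle is a hypothetical graph in which every node has degree 2, so the plain rule cannot start with K = 2, yet the optimistic version still finds a 2-coloring.

```python
def optimistic_color(graph, k):
    """Push nodes optimistically, then pop and color lazily.
    Returns node -> color in 1..k, or None if some popped node has no free color."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        low = [n for n in sorted(work) if len(work[n]) < k]
        if low:
            n = low[0]
        else:
            # Every node is significant (degree >= k): push one anyway.
            n = max(sorted(work), key=lambda v: len(work[v]))
        for m in work.pop(n):
            work[m].discard(n)
        stack.append(n)
    coloring = {}
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if not free:
            return None                 # select failed: not enough colors
        coloring[n] = free[0]
    return coloring

cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
cycle_colors = optimistic_color(cycle, 2)
```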

Example (K = 3, colors 1, 2, 3; graph nodes a, b, c, d, e; one node is marked “significant” in the figure)
  Simplify:
    remove node “a”, push onto the stack      stack: a
    remove node “b”, push onto the stack      stack: a b
    remove node “c”, push onto the stack      stack: a b c
    remove node “d”, push onto the stack      stack: a b c d
    remove node “e”, push onto the stack      stack: a b c d e
  Select: pop the stack, assign suitable colors
    pop “e”                                   stack: a b c d
    pop “d”                                   stack: a b c
    pop “c”                                   stack: a b
    pop “b”                                   stack: a
    pop “a”                                   stack: (empty)

Moral
  Kempe's algorithm:
    step #1: simplify. Remove graph nodes; be optimistic.
    step #2: select. Assign a color to each node; be lazy.
  You should use this algorithm for your lab6 first.
  But what if the select phase fails? Not enough colors (registers)!

Example (K = 2, colors 1, 2; graph nodes a, b, c, d, e)
  remove node “a”, push onto the stack

Failure
  It's often the case that Kempe's algorithm fails: no K-coloring of the IG is found.
  The basic idea is to generate spill code: some variables should be put into memory instead of into registers.
  Usually, spilled variables reside in the call stack.
  The code using such variables must be modified:
    for a variable use: read from memory
    for a variable def: store into memory

Spill code generation
  The effect of spill code is to turn long live ranges into shorter ones.
  This may introduce more temporaries.
  The register allocator should start over after generating spill code.
  We'll talk about this shortly.

Chaitin's Allocator

Chaitin's Algorithm
  Build: build the interference graph (IG).
  Simplify: simplify the graph.
  Spill: for a significant node, mark it as a potential spill (ps), remove it and continue.
  Select: pop nodes and try to assign colors; if this fails for a potential spill node, mark the potential spill as an actual spill and continue.
  Start over: generate spill code for the actual spills and start over from step #1 (build).

Chaitin's Algorithm (flow diagram): build -> simplify -> potential spill -> select -> actual spill -> back to build

Step 1: build the IG
    a = 1
    b = 2
    c = a+b
    d = a+c
    e = a+b
    f = d+e
  K = 2 (colors 1, 2). IG nodes: a, b, c, d, e, f.

Step 2: simplification
  Nodes are removed and pushed one at a time; c is marked as a potential spill (ps). Stack after each push:
    f
    f e
    f e c(ps)
    f e c(ps) d
    f e c(ps) d a
    f e c(ps) d a b

Step 3: selection
  Nodes are popped one at a time. Stack after each pop:
    f e c(ps) d a         b popped and colored
    f e c(ps) d           a popped and colored
    f e c(ps)             d popped: no color is left, so d is an actual spill and gets a fake color
    f e                   c popped: no color is left, so c is an actual spill and gets a fake color
    f                     e popped and colored
    (empty)               f popped and colored
  There are two spills: c and d. One must rewrite the code.
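The simplify and select steps of this walk-through can be replayed with a short sketch. Two things are assumptions rather than part of the slides: the edge set of the IG (derived from the six statements with f live at exit, since the figure is not reproduced in the transcript), and the `spill_priority` list, a stand-in for the spill heuristic chosen so that the push order matches the slides. Under that edge set d is also significant when it is pushed, so the sketch records both c and d as potential spills.

```python
def chaitin_round(graph, k, spill_priority):
    """One simplify/select round. spill_priority must list every node.
    Returns (push order, potential spills, coloring, actual spills)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack, potential = [], set()
    while work:
        low = [n for n in sorted(work) if len(work[n]) < k]
        if low:
            n = min(low, key=lambda v: len(work[v]))   # lowest degree first
        else:
            n = next(v for v in spill_priority if v in work)
            potential.add(n)                           # significant node
        for m in work.pop(n):
            work[m].discard(n)
        stack.append(n)
    order = list(stack)
    coloring, actual = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            actual.append(n)                           # actual spill
    return order, potential, coloring, actual

# Assumed IG for: a=1; b=2; c=a+b; d=a+c; e=a+b; f=d+e
ig = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
      "d": {"a", "b", "e"}, "e": {"d"}, "f": set()}
order, potential, coloring, actual = chaitin_round(
    ig, 2, spill_priority=["c", "d", "a", "b", "e", "f"])
```

A different priority list can give a different set of spills for the same graph, which is why the choice of spill candidate matters in practice.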

Step 4: code rewriting (actual spill)
  Original code:
    a = 1
    b = 2
    c = a+b
    d = a+c
    e = a+b
    f = d+e
  There are two spills: c and d. Suppose the memory addresses for c and d are l_c and l_d (two integers indicating stack offsets). Then for each use, generate a read; for each def, generate a write. The affected statements are c = a+b, d = a+c and f = d+e.
  Rewritten code:
    a = 1
    b = 2
    x1 = a+b
    M[l_c] = x1
    x2 = M[l_c]
    x3 = a+x2
    M[l_d] = x3
    e = a+b
    x4 = M[l_d]
    f = x4+e
  What's special about the xi? They can NOT be spilled any more. (Why?)

Step 4: … start over
  Code after spilling c and d:
    a = 1
    b = 2
    x1 = a+b
    M[l_c] = x1
    x2 = M[l_c]
    x3 = a+x2
    M[l_d] = x3
    e = a+b
    x4 = M[l_d]
    f = x4+e
  K = 2 (colors 1, 2). IG nodes: a, b, x1, x2, x3, x4, e, f.
  Leave other steps to you. This graph can NOT be colored with 2 colors: it contains a K3 sub-graph (a triangle such as a, b, x1).
  So, we have to do another iteration to generate spill code, which is very expensive! (Keep in mind that you can NOT spill x1, x2, x3 and x4.) …

Code spill (2nd time)
  In the IG of the previous round (K = 2, colors 1, 2; nodes a, b, x1, x2, x3, x4, e, f), a is spilled.
  Before:
    a = 1
    b = 2
    x1 = a+b
    M[l_c] = x1
    x2 = M[l_c]
    x3 = a+x2
    M[l_d] = x3
    e = a+b
    x4 = M[l_d]
    f = x4+e
  After spilling a:
    x5 = 1
    M[l_a] = x5
    b = 2
    x6 = M[l_a]
    x1 = x6+b
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    e = x8+b
    x4 = M[l_d]
    f = x4+e

IG
  The IG of the code after spilling a (K = 2, colors 1, 2) has the nodes b, e, f, x1, x2, x3, x4, x5, x6, x7, x8.
  This graph is still not 2-colorable. Why?
  So we should continue to spill, and start over …
  There are 3 spillable variables remaining: b, e, f. Which one should be spilled? Suppose we spill b.

Third Round
  Code after spilling b:
    x5 = 1
    M[l_a] = x5
    x9 = 2
    M[l_b] = x9
    x6 = M[l_a]
    x10 = M[l_b]
    x1 = x6+x10
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    x11 = M[l_b]
    e = x8+x11
    x4 = M[l_d]
    f = x4+e
  K = 2 (colors 1, 2). IG nodes: e, f, x1, x2, …, x11.
  We have spilled all of a, b, c, and d. This has the effect of chopping up all long live ranges into small live ranges!

Spilling a use
  For a statement like this: t = u + v
  if we mark u as an actual spill, rewrite to:
    u' = M[l_u]
    t = u'+v
  where u' can NOT be a candidate for future spill (unspillable).

Spilling a def
  For a statement like this: t = u + v
  if we mark t as an actual spill, rewrite to:
    t' = u+v
    M[l_t] = t'
  where t' can NOT be a candidate for future spill (unspillable).
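Both rewrite rules can be sketched in one pass over three-address code. The statement encoding (destination, list of operands), the fresh names x1, x2, … and the helper name `rewrite_spills` are assumptions of this sketch. Applied to the six-statement example with c and d spilled, it reproduces the rewritten code shown in step 4.

```python
import itertools

def rewrite_spills(code, spilled):
    """code: list of (dst, operands). Returns (new code, unspillable temps)."""
    counter = itertools.count(1)
    out, fresh = [], []
    for dst, srcs in code:
        new_srcs = []
        for s in srcs:
            if s in spilled:
                t = f"x{next(counter)}"
                fresh.append(t)
                out.append((t, [f"M[l_{s}]"]))      # use: load before the statement
                new_srcs.append(t)
            else:
                new_srcs.append(s)
        if dst in spilled:
            t = f"x{next(counter)}"
            fresh.append(t)
            out.append((t, new_srcs))
            out.append((f"M[l_{dst}]", [t]))        # def: store after the statement
        else:
            out.append((dst, new_srcs))
    return out, fresh

program = [("a", [1]), ("b", [2]), ("c", ["a", "b"]),
           ("d", ["a", "c"]), ("e", ["a", "b"]), ("f", ["d", "e"])]
rewritten, unspillable = rewrite_spills(program, {"c", "d"})
```

The returned list of fresh temps is what the next round must treat as unspillable.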

Spilled temps
  Where should these variables be spilled to? Function frames!
    …
    arg1
    arg0
    ret addr
    old ebp       <- %ebp
    Spill_0
    Spill_1
    …             <- %esp
  The compiler maintains an internal counter. Each time the compiler finds an actual spill, it increases the counter and assigns a location for that spilled variable.

Frame
  Suppose we put the frame on the stack:
    .text
    .globl f
    f:
      pushl %ebp
      movl %esp, %ebp
      pushl %ebx
      pushl %edi
      pushl %esi
      subl $(n*4), %esp
  n is the number of all spills, which can only be determined after register allocation.

Some improvements
  We can speed up the graph coloring based register allocator in several ways.
  But:
    To finish first, first finish.
    KISS: keep it simple and stupid.
    Don't be too smart by half.
    Your Tiger compiler must produce correct target code first.

#1: Good data structures
  For live sets: bit-vector? Or other data structures?
  For the IG: adjacency list? Adjacency matrix? Both?
  Similarly for other data structures.
  Using a good interface will let you write dead simple code and enhance it later.

#2: frame slot allocation
    x5 = 1
    M[l_a] = x5
    x9 = 2
    M[l_b] = x9
    x6 = M[l_a]
    x10 = M[l_b]
    x1 = x6+x10
    M[l_c] = x1
    x2 = M[l_c]
    x7 = M[l_a]
    x3 = x7+x2
    M[l_d] = x3
    x8 = M[l_a]
    x11 = M[l_b]
    e = x8+x11
    x4 = M[l_d]
    f = x4+e
  Allocating every spilled temp to its own frame slot can use a lot of memory!
  A better idea is to share a frame slot between spilled temps iff they are not live simultaneously: frame slot allocation!

#2: frame slot allocation
  Build an interference graph over the frame slots of the code above; its nodes are l_a, l_b, l_c and l_d.
  How many different colors are required to color this graph?
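One way to explore the question is to treat each spill location as live from its first store to its last load and to share slots between locations whose intervals do not overlap. That interval model only holds for straight-line code and is an assumption of this sketch, as is the helper name `share_frame_slots`.

```python
import re

def share_frame_slots(code):
    """code: list of statements containing M[l_x] accesses.
    Returns spill location -> frame slot index, reusing slots when possible."""
    spans = {}
    for i, stmt in enumerate(code):
        for loc in re.findall(r"M\[(\w+)\]", stmt):
            first, _ = spans.get(loc, (i, i))
            spans[loc] = (first, i)            # (first access, last access)
    slot_of, busy_until = {}, []
    for loc, (start, end) in sorted(spans.items(), key=lambda kv: kv[1]):
        for s, last in enumerate(busy_until):
            if last < start:                   # slot s is free again
                busy_until[s] = end
                slot_of[loc] = s
                break
        else:
            busy_until.append(end)             # open a new slot
            slot_of[loc] = len(busy_until) - 1
    return slot_of

code = ["x5 = 1", "M[l_a] = x5", "x9 = 2", "M[l_b] = x9", "x6 = M[l_a]",
        "x10 = M[l_b]", "x1 = x6+x10", "M[l_c] = x1", "x2 = M[l_c]",
        "x7 = M[l_a]", "x3 = x7+x2", "M[l_d] = x3", "x8 = M[l_a]",
        "x11 = M[l_b]", "e = x8+x11", "x4 = M[l_d]", "f = x4+e"]
slots = share_frame_slots(code)
```

Under this model l_c is dead before l_d is first written, so the two can share a slot and three slots suffice instead of four.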

#3: coalescing
  Suppose we have a move statement: t = u
  What's the potential benefit of allocating both t and u to the same register r? The move becomes r = r, which is redundant and can be deleted.
  This is called coalescing.
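On the interference graph, coalescing t and u amounts to merging the two nodes, which is only possible when they do not interfere. The sketch below shows just that merge; the name `coalesce` and the sample graph are hypothetical, and it deliberately leaves out the checks a real allocator adds to avoid making the graph harder to color.

```python
def coalesce(graph, t, u):
    """Merge node u into node t. Returns the new graph,
    or None when t and u interfere (or are the same node)."""
    if t == u or u in graph[t]:
        return None
    merged = {}
    for n, adj in graph.items():
        if n == u:
            continue
        merged[n] = {t if m == u else m for m in adj}   # redirect edges from u to t
    merged[t] = merged[t] | set(graph[u])               # t inherits u's neighbours
    return merged

# Hypothetical graph: t interferes with a, u interferes with b.
g = {"t": {"a"}, "u": {"b"}, "a": {"t"}, "b": {"u"}}
g_merged = coalesce(g, "t", "u")
```

After the merge, any coloring gives t and u the same register, so the move t = u becomes r = r.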

Briggs' Allocator
