Compiler Principles, Winter 2012-2013: Loop Optimizations and Register Allocation. Mayer Goldberg and Roman Manevich, Ben-Gurion University.

Today
– Review (global) dataflow analysis: join semilattices
– Monotone dataflow frameworks: termination; distributive transfer functions, i.e. join over all paths
– Loop optimizations: reaching definitions analysis, loop-invariant code motion, (strength reduction via induction variables)
– Register allocation by graph coloring: from liveness to the register interference graph; heuristics for graph coloring

Liveness Analysis
A variable is live at a point in a program if later in the program its value will be read before it is written to again.

Join semilattice definition
A join semilattice is a pair (V, ⊔), where V is a domain of elements and ⊔ is a join operator that is:
– commutative: x ⊔ y = y ⊔ x
– associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
– idempotent: x ⊔ x = x
If x ⊔ y = z, we say that z is the join (least upper bound) of x and y.
Every join semilattice has a bottom element, denoted ⊥, such that ⊥ ⊔ x = x for all x.

Partial ordering induced by join
Every join semilattice (V, ⊔) induces an ordering relationship ⊑ over its elements: define x ⊑ y iff x ⊔ y = y.
Need to prove:
– Reflexivity: x ⊑ x
– Antisymmetry: if x ⊑ y and y ⊑ x, then x = y
– Transitivity: if x ⊑ y and y ⊑ z, then x ⊑ z

A join semilattice for liveness
Sets of live variables, with set union as the join operation:
– Idempotent: x ∪ x = x
– Commutative: x ∪ y = y ∪ x
– Associative: (x ∪ y) ∪ z = x ∪ (y ∪ z)
– Bottom element: the empty set, Ø ∪ x = x
– Ordering over elements = the subset relation
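Since this domain is just sets under union, the laws above can be spot-checked directly. A quick sanity check in Python (the example sets are arbitrary, and a check is of course not a proof):

    x, y, z = {'a'}, {'a', 'b'}, {'c'}
    assert x | x == x                     # idempotent
    assert x | y == y | x                 # commutative
    assert (x | y) | z == x | (y | z)     # associative
    assert set() | x == x                 # the empty set is the bottom element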

Join semilattice example for liveness
[Hasse diagram of the subsets of {a, b, c}: {} at the bottom (the bottom element); above it {a}, {b}, {c}; above those {a, b}, {a, c}, {b, c}; and {a, b, c} at the top.]

Dataflow framework
A global analysis is a tuple (D, V, ⊔, F, I), where:
– D is a direction (forward or backward): the order to visit statements within a basic block, NOT the order in which to visit the basic blocks
– V is a set of values (sometimes called the domain)
– ⊔ is a join operator over those values
– F is a set of transfer functions f_s : V → V (one for every statement s)
– I is an initial value

Running global analyses
Assume that (D, V, ⊔, F, I) is a forward analysis. For every statement s, maintain the value before it, IN[s], and after it, OUT[s]:
– Set OUT[s] = ⊥ for all statements s
– Set OUT[entry] = I
– Repeat until no values change: for each statement s with predecessors PRED[s] = {p1, p2, …, pn}, set IN[s] = OUT[p1] ⊔ OUT[p2] ⊔ … ⊔ OUT[pn], then set OUT[s] = f_s(IN[s])
The order of this iteration does not matter (chaotic iteration).
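A minimal sketch of this algorithm in Python, assuming the CFG is given as a list of statement ids plus a predecessor map; the names and representation here are ours, only the iteration scheme follows the slide:

    def run_forward_analysis(stmts, preds, transfer, join, bottom, init):
        """stmts:    statement ids, stmts[0] is the entry
           preds:    maps each statement to its predecessors
           transfer: maps each statement s to its function f_s
           join:     the semilattice join (e.g. set union for liveness)
           bottom:   the bottom element (identity of join)
           init:     the initial value I"""
        out = {s: bottom for s in stmts}
        out[stmts[0]] = init
        changed = True
        while changed:                       # repeat until no values change
            changed = False
            for s in stmts[1:]:              # visit order does not matter
                in_s = bottom                # IN[s] = join of OUT[p] over preds
                for p in preds[s]:
                    in_s = join(in_s, out[p])
                new_out = transfer[s](in_s)  # OUT[s] = f_s(IN[s])
                if new_out != out[s]:
                    out[s] = new_out
                    changed = True
        return out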

Proving termination
Our algorithm for running these analyses continuously loops until no changes are detected. Problem: how do we know the analyses will eventually terminate?

A non-terminating analysis
The following analysis will loop infinitely on any CFG containing a loop:
– Direction: forward
– Domain: ℕ
– Join operator: max
– Transfer function: f(n) = n + 1
– Initial value: 0

Fixed-point iteration (slides 12–25)
[Figure series: a CFG with start, a block x = y that loops back to itself, and end. Initialization sets all values to 0; then each time the loop block is chosen, its OUT value climbs: 0 to 1 (iteration 1), 1 to 2 (iteration 2), 2 to 3 (iteration 3), and so on without bound.]

Why doesn’t this terminate?
Values can increase without bound. Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers.
The height of a semilattice is the length of the longest increasing sequence in that semilattice. The dataflow framework is not guaranteed to terminate for semilattices of infinite height.
Note that a semilattice can be infinitely large but have finite height, e.g. constant propagation.

Height of a lattice
An increasing chain is a sequence of elements ⊥ ⊏ a1 ⊏ a2 ⊏ … ⊏ ak; the length of such a chain is k. The height of a lattice is the length of its maximal increasing chain.
For liveness with n program variables: {} ⊏ {v1} ⊏ {v1, v2} ⊏ … ⊏ {v1, …, vn}, so the height is n.
For available expressions the height is the number of expressions of the form a = b op c: for n program variables and m operator types, m · n³.

Another non-terminating analysis
This analysis works on a finite-height semilattice, but will not terminate on certain CFGs:
– Direction: forward
– Domain: Boolean values true and false
– Join operator: logical OR
– Transfer function: logical NOT
– Initial value: false

Fixed-point iteration (slides 29–38)
[Figure series: the same loop CFG, now with Boolean values. Starting from false, each pass through the loop block negates the value: false, true, false, true, … The values cycle forever and never stabilize.]

Why doesn’t it terminate?
Values can loop indefinitely. Intuitively, the join operator keeps pulling values up; if the transfer function can keep pushing values back down again, then the values might cycle forever: false, true, false, true, false, …
How can we fix this?

Monotone transfer functions
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y).
Intuitively, if you know less information about a program point, you can’t “gain back” more information about that program point.
Many transfer functions are monotone, including those for liveness and constant propagation.
Note: monotonicity does not mean that x ⊑ f(x) (that is a different property, called extensivity).

Liveness and monotonicity
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y).
Recall our transfer function for a = b + c: f_{a = b + c}(V) = (V – {a}) ∪ {b, c}.
Recall that our join operator is set union, which induces the ordering relationship X ⊑ Y iff X ⊆ Y.
Is this monotone?
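The answer is yes: removing {a} and adding {b, c} both preserve the subset order. A spot check in Python (the sets are arbitrary, and a check is not a proof):

    def f(v):                                 # transfer function for a = b + c
        return (v - {'a'}) | {'b', 'c'}

    x, y = {'a'}, {'a', 'd'}                  # x ⊑ y, i.e. x ⊆ y
    assert x <= y
    assert f(x) <= f(y)                       # f(x) ⊑ f(y): monotone here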

Is constant propagation monotone?
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y).
Recall our transfer functions:
– f_{x = k}(V) = V[x ↦ k] (update V by mapping x to k)
– f_{x = a + b}(V) = V[x ↦ Not-a-Constant] (assign Not-a-Constant)
Is this monotone?
[Figure: the flat lattice for one variable, with Undefined at the bottom, the constants in the middle, and Not-a-Constant at the top.]

The grand result
Theorem: a dataflow analysis with a finite-height semilattice and a family of monotone transfer functions always terminates.
Proof sketch:
– The join operator can only bring values up
– Transfer functions can never lower values back down below where they were in the past (monotonicity)
– Values cannot increase indefinitely (finite height)

An “optimality” result
A transfer function f is distributive if f(a ⊔ b) = f(a) ⊔ f(b) for all domain elements a and b.
If all transfer functions are distributive, then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinitely many) control-flow paths: the join over all paths. This is optimal if we pretend all control-flow paths can be executed by the program, i.e. if we ignore program conditions.
Which analyses use distributive functions?
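Liveness and available expressions are distributive; constant propagation is the classic example of a monotone but non-distributive analysis. A minimal sketch in Python (the encoding of Not-a-Constant and the example program, x=1; y=2 on one path and x=2; y=1 on the other followed by z = x + y, are ours):

    NAC = 'NAC'                               # Not-a-Constant

    def join_val(u, v):
        return u if u == v else NAC

    def join_env(a, b):                       # join two variable-to-value maps
        return {x: join_val(a[x], b[x]) for x in a}

    def f(env):                               # transfer function for z = x + y
        env = dict(env)
        ok = NAC not in (env['x'], env['y'])
        env['z'] = env['x'] + env['y'] if ok else NAC
        return env

    a = {'x': 1, 'y': 2, 'z': NAC}            # fact along one path
    b = {'x': 2, 'y': 1, 'z': NAC}            # fact along the other path
    print(join_env(f(a), f(b))['z'])          # 3: join over paths keeps it
    print(f(join_env(a, b))['z'])             # NAC: the fixed point loses it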

Loop optimizations
Most of a program’s computations are done inside loops, so we focus optimization effort on loops. The optimizations we’ve seen so far are independent of the control structure; some optimizations are specialized to loops:
– Loop-invariant code motion
– (Strength reduction via induction variables)
These require another type of analysis to find out where expressions get their values from: reaching definitions (also useful for improving register allocation).

Loop invariant computation
[Figure: a CFG. After start, the code sets y = …, t = …, z = …; the loop header computes y = t * 4 and tests x < y + z (exiting to end when false); the loop body is x = x + 1. The expressions t * 4 and y + z have the same value on each iteration.]

Code hoisting
[Figure: the same CFG after hoisting. y = t * 4 and w = y + z are computed once, before the loop; the loop header now just tests x < w.]

What reasoning did we use?
[Same CFG as before.] y is defined inside the loop, but it is loop-invariant since t * 4 is loop-invariant: both t and z are defined only outside of the loop, and constants are trivially loop-invariant.

What about now?
[Same CFG, but the loop body now also executes t = t + 1.] Now t is not loop-invariant, and neither are t * 4 and y.

Loop-invariant code motion
Consider a statement d: t = a1 op a2, where d is a program location. The expression a1 op a2 is loop-invariant (for a loop L) if it computes the same value in each iteration. This is hard to know in general, so we use a conservative approximation: for each operand ai,
– ai is a constant, or
– all definitions of ai that reach d are outside L, or
– only one definition of ai reaches d, and that definition is loop-invariant itself
Transformation: hoist the loop-invariant code outside of the loop.

Reaching definitions analysis
A definition d: t = … reaches a program location if there is a path from the definition to that location along which the defined variable is never redefined.
– Direction: forward
– Domain: sets of program locations that are definitions
– Join operator: union
– Transfer functions: f_{d: a = b op c}(RD) = (RD – defs(a)) ∪ {d}, and f_{d: not-a-def}(RD) = RD, where defs(a) is the set of locations defining a (statements of the form a = …)
– Initial value: {}
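A sketch of the transfer function in Python, using the definitions d1–d5 of the running example (the defs map below is read off that program):

    DEFS = {'y': {'d1', 'd4'}, 't': {'d2'},    # which locations define each var
            'z': {'d3'}, 'x': {'d5'}}

    def transfer(d, var, rd):
        """f_{d: var = ...}(RD) = (RD - defs(var)) ∪ {d}"""
        return (rd - DEFS[var]) | {d}

    print(transfer('d4', 'y', {'d1', 'd2', 'd3'}))   # {'d2', 'd3', 'd4'}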

Reaching definitions: worked example (slides 56–71)
[Figure series: the loop CFG with d1: y = …, d2: t = …, d3: z = … before the loop, the loop header d4: y = t * 4; x < y + z, and the body d5: x = x + 1. Fixed-point iteration converges after six rounds to:
– after d1: {d1}; after d2: {d1, d2}; after d3: {d1, d2, d3}
– before d4: {d1, d2, d3, d4, d5} (the join of the entry path and the back edge)
– after d4: {d2, d3, d4, d5} (d4 kills d1)
– after d5 and at end: {d2, d3, d4, d5}]

Which expressions are loop invariant?
Using the reaching-definitions results:
– t is defined only in d2, outside of the loop
– z is defined only in d3, outside of the loop
– y is defined only in d4, inside the loop, but it depends on t and 4, both loop-invariant
– x is defined only in d5, inside the loop, so it is not loop-invariant

Inferring loop-invariant expressions
For a statement s of the form t = a1 op a2, a variable ai is immediately loop-invariant if all reaching definitions IN[s] = {d1, …, dk} for ai are outside of the loop.
Initialize LOOP-INV to the immediately loop-invariant variables and the constants, then iterate until a fixed point:
LOOP-INV = LOOP-INV ∪ {x | d: x = a1 op a2, d is in the loop, and both a1 and a2 are in LOOP-INV}
An expression is loop-invariant if all of its operands are loop-invariant. (See the sketch below.)
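A minimal sketch of this fixed point in Python, on the running example. It takes the immediately loop-invariant variables as given and ignores the single-reaching-definition side condition, so it is a simplification of the rule above:

    loop_stmts = {'d4': ('y', ['t', '4'])}     # d4: y = t * 4 (in the loop)
    loop_inv = {'t', 'z'} | {'4'}              # immediately invariant + constants

    changed = True
    while changed:                             # iterate until fixed point
        changed = False
        for d, (var, ops) in loop_stmts.items():
            if var not in loop_inv and all(a in loop_inv for a in ops):
                loop_inv.add(var)              # all operands already invariant
                changed = True

    print(loop_inv)                            # {'t', 'z', '4', 'y'}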

Computing LOOP-INV (slides 74–80)
[Figure series: on the same CFG, the fixed point grows step by step: first the immediately loop-invariant variables, LOOP-INV = {t}, then {t, z}; then the constant, giving LOOP-INV = {t, z, 4}; and finally y (defined by d4: y = t * 4, all of whose operands are already in LOOP-INV), giving LOOP-INV = {t, z, 4, y}.]

Induction variables

    while (i < x) {
      j = a + 4 * i
      a[j] = j
      i = i + 1
    }

i is incremented by a loop-invariant expression on each iteration; such a variable is called an induction variable. j is a linear function of the induction variable, with multiplier 4.

Strength-reduction

    j = a + 4 * i        // prepare initial value
    while (i < x) {
      a[j] = j
      i = i + 1
      j = j + 4          // increment by the multiplier
    }

(The increment of j goes after the uses in the body, so that j = a + 4 * i still holds at the top of each iteration.)

Summary of optimizations

    Analysis              | Enabled optimizations
    Available Expressions | Common-subexpression elimination, copy propagation
    Constant Propagation  | Constant folding
    Live Variables        | Dead code elimination
    Reaching Definitions  | Loop-invariant code motion

Global Register Allocation

Registers
Most machines have a set of registers: dedicated memory locations that
– can be accessed quickly,
– can have computations performed on them, and
– exist in small quantity.
Using registers intelligently is a critical step in any compiler: a good register allocator can generate code orders of magnitude better than a bad register allocator.

Register allocation
In TAC there is an unlimited number of variables; on a physical machine there is a small number of registers:
– x86 has four general-purpose registers and a number of specialized registers
– MIPS has twenty-four general-purpose registers and eight special-purpose registers
Register allocation is the process of assigning variables to registers and managing data transfer in and out of registers.

Challenges in register allocation
Registers are scarce:
– There are often substantially more IR variables than registers
– We need to find a way to reuse registers whenever possible
Registers are complicated:
– x86: each register is made of several smaller registers; can’t use a register and its constituent registers at the same time
– x86: certain instructions must store their results in specific registers; can’t store values there if you want to use those instructions
– MIPS: some registers are reserved for the assembler or operating system
– Most architectures: some registers must be preserved across function calls

Simple approach
Straightforward solution: allocate each variable in the activation record. At each instruction, bring the values needed into registers, perform the operation, then store the result back to memory. For x = y + z:

    mov 16(%ebp), %eax
    mov 20(%ebp), %ebx
    add %ebx, %eax
    mov %eax, 24(%ebp)

Problem: program execution is very inefficient, moving data back and forth between memory and registers.

Find a register allocation

    b = a + 2
    c = b * b
    b = c + 1
    return b * a

Available registers: eax, ebx. Which register should hold each of a, b, c?

Is this a valid allocation?
Proposed allocation: a ↦ eax, b ↦ ebx, c ↦ eax. The code becomes:

    ebx = eax + 2
    eax = ebx * ebx
    ebx = eax + 1
    return ebx * eax

No: the assignment to c overwrites the previous value of ‘a’, which is also stored in eax and is still needed.

Is this a valid allocation?
Proposed allocation: a ↦ ebx, b ↦ eax, c ↦ eax. The code becomes:

    eax = ebx + 2
    eax = eax * eax
    eax = eax + 1
    return eax * ebx

Yes: the value of ‘c’ stored in eax is not needed anymore when ‘b’ is assigned, so the register can be reused.

Main idea
For every node n in the CFG we have out[n], the set of temporaries live out of n. Two variables interfere if they appear in the same out[n] of any node n; interfering variables cannot be allocated to the same register. Conversely, if two variables do not interfere with each other, they can be assigned the same register; we say they have disjoint live ranges.
How do we assign registers to variables? (A construction sketch follows.)
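A sketch of the edge construction in Python, using the live-out sets of the running example and the simplified rule above (variables in the same out[n] interfere):

    from itertools import combinations

    live_out = [{'b', 'a'}, {'a', 'c'}, {'b', 'a'}]    # out[n] per statement

    edges = set()
    for out in live_out:
        for u, v in combinations(sorted(out), 2):
            edges.add((u, v))                          # u and v interfere

    print(edges)                                       # {('a', 'b'), ('a', 'c')}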

Interference graph
The nodes of the graph are the variables; edges connect variables that interfere with one another. Each node will be assigned a color corresponding to the register assigned to that variable; two nodes joined by an edge can’t get the same color.

Interference graph construction (slides 96–100)
Live-out sets for the example, computed bottom-up:

    {a}
    b = a + 2
    {b, a}
    c = b * b
    {a, c}
    b = c + 1
    {b, a}
    return b * a

Interference graph and colored graph
[Figure: nodes a, b, c with edges a–b and a–c (b and c are never live together). Coloring with the colors eax and ebx: a ↦ ebx, b ↦ eax, c ↦ eax.]

Graph coloring
This problem is equivalent to graph coloring, which is NP-hard if there are at least three registers. No good polynomial-time algorithms (or even good approximations!) are known for this problem, so we have to be content with a heuristic that is good enough for the register interference graphs that arise in practice.

Coloring by simplification [Kempe 1879]
How to find a k-coloring of a graph. Intuition: suppose we are trying to k-color a graph and find a node with fewer than k edges. If we delete this node from the graph and color what remains, we can find a color for this node when we add it back in: with fewer than k neighbors, some color must be left over.

Coloring by simplification [Kempe 1879]
Phase 1: simplification
– Repeatedly simplify the graph; each time a variable (i.e. a graph node) with fewer than k neighbors is removed, push it on a stack
Phase 2: coloring
– Unwind the stack and reconstruct the graph: pop a variable from the stack, add it back to the graph, and color its node with a color that does not clash with its already-colored neighbors
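A sketch of the two phases in Python; it returns None when simplification gets stuck (every remaining node has k or more neighbors), which is handled by spilling later:

    def kempe_color(nodes, edges, k):
        adj = {n: set() for n in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)

        # Phase 1: simplification
        stack, remaining = [], set(nodes)
        while remaining:
            n = next((n for n in remaining
                      if len(adj[n] & remaining) < k), None)
            if n is None:
                return None              # stuck: some node must be spilled
            remaining.remove(n)
            stack.append(n)

        # Phase 2: coloring, unwinding the stack
        color = {}
        while stack:
            n = stack.pop()
            used = {color[m] for m in adj[n] if m in color}
            color[n] = next(c for c in range(k) if c not in used)
        return color

    print(kempe_color(['a', 'b', 'c'], [('a', 'b'), ('a', 'c')], k=2))
    # e.g. {'a': 0, 'b': 1, 'c': 1}: b and c share a register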

Coloring, k = 2 (slides 106–116)
[Figure series: a five-node interference graph over a, b, c, d, e with two colors (eax, ebx). Simplification removes c, e, a, b, d in turn, pushing each on the stack; the coloring phase then pops d, b, a, e, c, adding each node back and giving it a color unused by its already-colored neighbors.]

Failure of the heuristic
If the graph cannot be colored, it will eventually be simplified to a graph in which every node has at least k neighbors. Sometimes, the graph is still k-colorable! Finding a k-coloring in all situations is an NP-complete problem, so we will have to approximate to make register allocators fast enough.

Coloring, k = 2 (slides 118–122)
[Figure series: some graphs can’t be colored in k colors. On such a graph the stack is built and unwound as before, but when e is added back, both colors are already taken by its neighbors: no colors left for e!]

Chaitin’s algorithm
Choose and remove an arbitrary node, marking it “troublesome”:
– Use heuristics to choose which one
– When adding the node back in, it may still be possible to find a valid color
– Otherwise, we have to spill that node

Spilling
Phase 3: spilling. Once all nodes have k or more neighbors, pick a node for spilling:
– Many heuristics can be used to pick the node: try to pick a node not used much and not in an inner loop
– Storage is in the activation record
– Remove it from the graph; we can now repeat phases 1–2 without this node
A better approach: rewrite the code to spill the variable, recompute liveness information, and try to color again. (A sketch follows.)
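A sketch of this phase in Python, built on the kempe_color sketch above; the spill heuristic here (highest degree) is a hypothetical stand-in for “not used much, not in an inner loop”:

    def color_with_spills(nodes, edges, k):
        nodes, spilled = list(nodes), []
        while True:
            coloring = kempe_color(nodes, edges, k)
            if coloring is not None:
                return coloring, spilled
            # pick the node with the most interference edges as the victim
            victim = max(nodes,
                         key=lambda n: sum(1 for e in edges if n in e))
            spilled.append(victim)                 # keep it in memory instead
            nodes.remove(victim)
            edges = [e for e in edges if victim not in e]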

Coloring with spilling, k = 2 (slides 125–130)
[Figure series: the same un-2-colorable graph. After the failed attempt (no colors left for e), a node is spilled and removed from the graph; simplification and coloring then succeed on the remaining nodes.]

Handling precolored nodes
Some variables are pre-assigned to registers:
– E.g. mul on x86/pentium uses eax and defines eax, edx
– E.g. call on x86/pentium defines (trashes) the caller-save registers eax, ecx, edx
To properly allocate registers, treat these register uses as special temporary variables and enter them into the interference graph as precolored nodes.

Handling precolored nodes
Simplify: never remove a pre-colored node; it already has a color, i.e. it is a given register.
Coloring: once the simplified graph consists only of colored nodes, add the other nodes back in and color them, using the precolored nodes as a starting point.

Optimizing move instructions
Code generation produces a lot of extra mov instructions, such as mov t5, t9. If we can assign t5 and t9 to the same register, we can get rid of the mov; effectively, this is copy elimination at the register allocation level.
Idea: if t5 and t9 are not connected in the interference graph, coalesce them into a single variable; the move becomes redundant.
Problem: coalescing nodes can make a graph un-colorable, so use a conservative coalescing heuristic (sketched below).
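One standard conservative test, not spelled out on the slides, is Briggs coalescing: merge the two move-related nodes only if the merged node would have fewer than k neighbors of significant degree, so the merge cannot turn a k-colorable graph uncolorable. A sketch:

    def can_coalesce(adj, a, b, k):
        """Briggs test: is it safe to merge move-related nodes a and b?"""
        neighbors = (adj[a] | adj[b]) - {a, b}
        significant = 0
        for n in neighbors:
            d = len(adj[n])
            if a in adj[n] and b in adj[n]:
                d -= 1                   # a and b collapse into one neighbor
            if d >= k:
                significant += 1
        return significant < k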

Summary of material 1/2
Scanning: regular expressions; finite automata (DFA/NFA); determinization via subset construction; maximal munch and precedences; automatic scanner-generation tools (JFlex).
Parsing: context-free grammars; leftmost/rightmost derivations and parse trees; ambiguity and ambiguity-elimination tactics; LL parsing (building prediction tables from FIRST/FOLLOW, conflicts, left-recursion elimination, recursive descent, automata-based parsing); shift-reduce parsing (LR items, transition-relation construction, conflicts, SLR, LALR, resolving ambiguity via precedence, automatic parser-generation tools such as CUP).

Summary of material 2/2
Lowering to IR: three-address code and recursive lowering; Sethi-Ullman translation, minimizing the number of temporaries.
Optimizations: basic blocks and control-flow graphs; local analysis (transfer functions); local vs. global analysis; dataflow analysis (join semilattices, partial orderings, monotone transfer functions); available expressions, liveness, constant propagation, reaching definitions; common-subexpression elimination, copy propagation, constant folding, loop-invariant code motion.
Register allocation: naïve allocation; the register interference graph and its isomorphism to graph coloring; graph coloring by simplification; Chaitin’s algorithm (spilling).

Good luck with the final project and exams! I hope some of this was interesting. Advertisement for next semester’s course: Program Analysis and Verification.