Code Generation

1 Code Generation
– The target machine
– Instruction selection and register allocation
– Basic blocks and flow graphs
– A simple code generator
– Peephole optimization
– Instruction selector generator
– Graph-coloring register allocator

2 The Target Machine
– A byte-addressable machine with four bytes to a word and n general-purpose registers
– Two-address instructions of the form: op source, destination
– Six addressing modes:

    Mode                Form     Address                       Added cost
    absolute            M        M                             1
    register            R        R                             0
    indexed             c(R)     c + contents(R)               1
    indirect register   *R       contents(R)                   0
    indirect indexed    *c(R)    contents(c + contents(R))     1
    literal             #c       c                             1

3 Examples
    MOV R0, M
    MOV 4(R0), M
    MOV *R0, M
    MOV *4(R0), M
    MOV #1, R0

4 Instruction Costs
The cost of an instruction is 1 plus the costs of its source and destination addressing modes. This cost corresponds to the length (in words) of the instruction. Minimizing instruction length also tends to minimize instruction execution time.

5 Examples
    MOV R0, R1            cost 1
    MOV R0, M             cost 2
    MOV #1, R0            cost 2
    MOV 4(R0), *12(R1)    cost 3

6 An Example
Consider a := b + c. Possible instruction sequences:
1.  MOV b, R0
    ADD c, R0
    MOV R0, a
2.  MOV b, a
    ADD c, a
3.  Assuming R0, R1, R2 contain the addresses of a, b, c:
    MOV *R1, *R0
    ADD *R2, *R0
4.  Assuming R1, R2 contain the values of b, c:
    ADD R2, R1
    MOV R1, a
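The cost rule of slides 4 and 5 can be checked mechanically. Below is a minimal Python sketch. It assumes registers are spelled R0, R1, … and operands are separated by commas. The function names `operand_cost`, `instruction_cost` and `sequence_cost` are illustrative and do not come from the slides. Following the addressing-mode table, an operand scores 0 for the register and indirect-register modes and 1 for every other mode.

```python
import re

# Assumption: registers are named R0, R1, ...; "*R" is the indirect-register mode.
REGISTER_MODE = re.compile(r"\*?R\d+")

def operand_cost(operand: str) -> int:
    """Extra words for one operand: 0 for R and *R; 1 for M, c(R), *c(R) and #c."""
    return 0 if REGISTER_MODE.fullmatch(operand.strip()) else 1

def instruction_cost(instr: str) -> int:
    """1 word for the opcode plus the costs of the source and destination modes."""
    _, _, operands = instr.strip().partition(" ")
    return 1 + sum(operand_cost(op) for op in operands.split(","))

def sequence_cost(seq) -> int:
    """Total cost of a list of instructions."""
    return sum(instruction_cost(i) for i in seq)

# The four sequences for a := b + c from the example above.
for seq in (["MOV b, R0", "ADD c, R0", "MOV R0, a"],
            ["MOV b, a", "ADD c, a"],
            ["MOV *R1, *R0", "ADD *R2, *R0"],
            ["ADD R2, R1", "MOV R1, a"]):
    print(sequence_cost(seq), seq)
```

Scoring the four sequences this way gives costs 6, 6, 2 and 3.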

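7 Instruction Selection
Code skeleton for x := y + z:
    MOV y, R0
    ADD z, R0
    MOV R0, x
Applied statement by statement to a := b + c; d := a + e, the skeleton gives:
    MOV b, R0
    ADD c, R0
    MOV R0, a
    MOV a, R0
    ADD e, R0
    MOV R0, d
The fourth instruction is redundant because a is already in R0. The third is also redundant if a is not used later.
Multiple choices exist for a := a + 1:
    MOV a, R0
    ADD #1, R0
    MOV R0, a
or simply:
    INC a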

8 Register Allocation
– Register allocation: select the set of variables that will reside in registers.
– Register assignment: pick the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is NP-complete.

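9 An Example
Suppose multiplication and division use an even/odd register pair:
– MUL c, R0 multiplies the value in R1 by c and leaves the product in the pair R0:R1.
– DIV d, R0 divides the pair R0:R1 by d and leaves the quotient in R1.
– SRDA R0, 32 shifts the pair right by 32 bits, sign-extending.
Two similar sequences then need different register choices:

    t := a + b          t := a + b
    t := t * c          t := t + c
    t := t / d          t := t / d

    MOV a, R1           MOV a, R0
    ADD b, R1           ADD b, R0
    MUL c, R0           ADD c, R0
    DIV d, R0           SRDA R0, 32
    MOV R1, t           DIV d, R0
                        MOV R1, t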

10 Basic Blocks A basic block is a sequence of consecutive statements in which control enters at the beginning and leaves at the end without halt or possibility of branching except at the end

11 An Example
    (1)  prod := 0
    (2)  i := 1
    (3)  t1 := 4 * i
    (4)  t2 := a[t1]
    (5)  t3 := 4 * i
    (6)  t4 := b[t3]
    (7)  t5 := t2 * t4
    (8)  t6 := prod + t5
    (9)  prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i <= 20 goto (3)

12 Flow Graphs
A flow graph is a directed graph whose nodes are basic blocks. There is an edge from B1 to B2 iff B2 can immediately follow B1 in some execution sequence, that is, iff either:
– there is a conditional or unconditional jump from the last statement of B1 to the first statement of B2, or
– B2 immediately follows B1 in the program text and B1 does not end in an unconditional jump.
B1 is then a predecessor of B2, and B2 is a successor of B1.

13 An Example
    B0: (1)  prod := 0
        (2)  i := 1
    B1: (3)  t1 := 4 * i
        (4)  t2 := a[t1]
        (5)  t3 := 4 * i
        (6)  t4 := b[t3]
        (7)  t5 := t2 * t4
        (8)  t6 := prod + t5
        (9)  prod := t6
        (10) t7 := i + 1
        (11) i := t7
        (12) if i <= 20 goto (3)
(Figure: the flow graph, with an edge from B0 to B1 and an edge from B1 back to itself.)

14 Construction of Basic Blocks
Determine the set of leaders:
– the first statement is a leader;
– the target of a jump is a leader;
– any statement immediately following a jump is a leader.
For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.

15 Representation of Basic Blocks
Each basic block is represented by a record consisting of:
– a count of the number of statements,
– a pointer to the leader,
– a list of predecessors,
– a list of successors.
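The leader rules of slide 14 and the edge rules of slide 12 can be sketched in Python as follows. This is an illustration under stated assumptions:
– statements are plain strings numbered from 1;
– jumps are written `goto (k)`, as in the example of slide 11;
– the `Block` record is a hypothetical stand-in for the record of slide 15, with `len(stmts)` serving as the statement count.

```python
import re
from dataclasses import dataclass, field

JUMP = re.compile(r"goto \((\d+)\)")   # assumed spelling of jumps: "goto (k)"

@dataclass
class Block:
    name: str
    leader: int                          # statement number of the leader
    stmts: list                          # len(stmts) is the statement count
    preds: list = field(default_factory=list)
    succs: list = field(default_factory=list)

def basic_blocks(code):
    """code: list of statement strings, numbered 1..len(code)."""
    n = len(code)
    leaders = {1}                                    # the first statement
    for i, s in enumerate(code, start=1):
        m = JUMP.search(s)
        if m:
            leaders.add(int(m.group(1)))             # the target of a jump
            if i < n:
                leaders.add(i + 1)                   # the statement after a jump
    starts = sorted(leaders)
    blocks = []
    for k, start in enumerate(starts):
        end = starts[k + 1] - 1 if k + 1 < len(starts) else n
        blocks.append(Block(f"B{k}", start, code[start - 1:end]))
    by_leader = {b.leader: b for b in blocks}
    for k, b in enumerate(blocks):
        last = b.stmts[-1]
        m = JUMP.search(last)
        targets = [by_leader[int(m.group(1))]] if m else []
        unconditional = m is not None and last.startswith("goto")
        if not unconditional and k + 1 < len(blocks):
            targets.append(blocks[k + 1])            # fall-through edge
        for t in targets:
            if t.name not in b.succs:                # avoid duplicate edges
                b.succs.append(t.name)
                t.preds.append(b.name)
    return blocks

CODE = ["prod := 0", "i := 1", "t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i",
        "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
        "t7 := i + 1", "i := t7", "if i <= 20 goto (3)"]
blocks = basic_blocks(CODE)
for b in blocks:
    print(b.name, b.leader, len(b.stmts), b.preds, b.succs)
```

On the twelve statements of slide 11, the sketch finds the leaders (1) and (3). That gives B0 with two statements, B1 with ten, and the edges B0 → B1 and B1 → B1.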

16 Define and Use A three address statement x := y + z is said to define x and to use y and z A name is live in a basic block at a given point if its value is used after that point, perhaps in another basic block

17 Next-Use Information
    i: x := …
       …              (no assignment to x)
    j: y := … x …
Statement j uses the value of x defined at i.

18 An Example
    Statement          Live names and their next uses after the statement
    (1) a := b + c     a:(2,3,5), c:(4), d:(2)
    (2) e := a + d     a:(3,5), c:(4), e:(3)
    (3) f := e - a     a:(5), c:(4), f:(4)
    (4) e := f + c     a:(5), e:(5)
    (5) g := e - a     g:(?)
b, c and d are live at the beginning of the block: b:(1), c:(1,4), d:(2).

19 Computing Next Uses
Scan the statements “i: x := y op z” of the block backward. For each statement i:
1. Attach to statement i the information currently found in the symbol table regarding the next uses and liveness of x, y, and z.
2. In the symbol table, set x to “not live” and clear the next uses of x.
3. In the symbol table, set y and z to “live” and add i to the next uses of y and z.
Liveness at the end of the block comes from analysis among blocks; next uses are computed within blocks.
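A sketch of the backward scan in Python. It assumes each statement is reduced to the triple (x, y, z) of “i: x := y op z”. It also assumes the names live on exit (here only g) are supplied by analysis among blocks. For each statement, the sketch records the whole table of live names and their next uses, which is the form the example of slide 18 shows. `next_uses` is a hypothetical name.

```python
def next_uses(block, live_on_exit):
    """block: list of (x, y, z) for statements x := y op z, numbered from 1.

    Returns (after, entry): after[i] maps each name live just after statement i
    to its list of next uses; entry is the same information at block entry.
    """
    table = {name: (True, []) for name in live_on_exit}   # name -> (live, next uses)

    def snapshot():
        return {n: list(uses) for n, (live, uses) in table.items() if live}

    after = {}
    for i in range(len(block), 0, -1):       # scan backward
        x, y, z = block[i - 1]
        after[i] = snapshot()                # information attached to statement i
        table[x] = (False, [])               # x is not live before i; clear its next uses
        for operand in (y, z):               # y and z are used at i
            _, uses = table.get(operand, (False, []))
            table[operand] = (True, uses if uses[:1] == [i] else [i] + uses)
    return after, snapshot()

BLOCK = [("a", "b", "c"), ("e", "a", "d"), ("f", "e", "a"),
         ("e", "f", "c"), ("g", "e", "a")]
after, entry = next_uses(BLOCK, live_on_exit={"g"})
for i in sorted(after):
    print(i, after[i])
print("entry", entry)
```

Run on the five statements of slide 18 with g live on exit, it reproduces the annotations there, including b:(1), c:(1,4), d:(2) at the start of the block.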

20 A Simple Code Generator
Consider each statement in a basic block in turn, remembering whether operands are in registers. Assume that:
– each operator has a corresponding target-language operator;
– computed results can be left in registers as long as possible. They are stored only when their register is needed for another computation or at the end of the basic block.

21 Register and Address Descriptors A register descriptor keeps track of what is currently in each register An address descriptor keeps track of the location(s) where the current value of the name can be found at run time

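22 An Example
d := (a - b) + (a - c) + (a - c), as the three-address statements t := a - b; u := a - c; v := t + u; d := v + u

    Statement     Code          Register descriptor   Address descriptor
                                (empty)               (empty)
    t := a - b    MOV a, R0     R0 contains t         t in R0
                  SUB b, R0
    u := a - c    MOV a, R1     R0 contains t         t in R0
                  SUB c, R1     R1 contains u         u in R1
    v := t + u    ADD R1, R0    R0 contains v         v in R0
                                R1 contains u         u in R1
    d := v + u    ADD R1, R0    R0 contains d         d in R0
                  MOV R0, d     (empty)               (empty)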

23 Code Generation Algorithm
For an instruction of the form “x := y op z”:
1. Invoke getreg to determine the location L where the result of “y op z” will be placed.
2. Determine a current location y’ of y from the address descriptor (a register location is preferred). If y’ is not L, generate “MOV y’, L”.
3. Generate “op z’, L”, where z’ is a current location of z from the address descriptor.
4. Update the address and register descriptors for x, y, z, and L.

24 Code Generation Algorithm
For an instruction of the form “x := y”:
– If y is in a register, generate no code. Change the register and address descriptors to record that the value of x is now found in the register holding y.
– If y is in memory:
  – if x has a next use in the block, invoke getreg to find a register r, generate “MOV y, r”, and make r the location of x;
  – otherwise, generate “MOV y, x”.

25 Code Generation Algorithm
Once all statements in the basic block are processed, store the names that are live on exit and not already in their memory locations.

26 The Function getreg
For an instruction of the form “x := y op z”:
1. Suppose y is in a register r that holds the value of no other names, y is not live, and y has no next use after this statement. Then return r.
2. Otherwise, return an empty register r if there is one.
3. Otherwise, if x has a next use in the block, or op is an operator requiring a register, find an occupied register r. Store the value of r into memory if it is not already there, update the address descriptor, and return r.
4. If x has no next use, or no suitable occupied register can be found, return the memory location of x.
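Putting slides 20 to 26 together, the Python sketch below is one possible reading of the simple code generator, not a definitive implementation. Its assumptions:
– the machine has two registers, R0 and R1;
– operands are always variable names;
– every name not yet computed in the block sits in its own memory location;
– the spill choice (always R0) is arbitrary, since the slides leave it open.

The register and address descriptors are plain dictionaries.

```python
REGS = ("R0", "R1")                                   # assumption: two registers
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(block, live_on_exit):
    """block: list of (x, op, y, z) meaning x := y op z."""
    code = []
    reg = {r: set() for r in REGS}      # register descriptor: register -> names
    addr = {}                           # address descriptor: name -> locations

    def where(n):
        return addr.setdefault(n, {n})  # names not yet seen are in memory

    def best(n):                        # prefer a register location
        regs = sorted(loc for loc in where(n) if loc in REGS)
        return regs[0] if regs else n

    def needed_after(n, i):             # next use in the block, or live on exit
        for x, _, y, z in block[i + 1:]:
            if n in (y, z):
                return True
            if n == x:
                return False
        return n in live_on_exit

    def getreg(i, x, y):
        for r in REGS:                  # 1. y alone in r and dead after statement i
            if reg[r] == {y} and not needed_after(y, i):
                return r
        for r in REGS:                  # 2. an empty register
            if not reg[r]:
                return r
        if needed_after(x, i):          # 3. spill an occupied register
            r = REGS[0]
            for n in reg[r]:
                if n not in where(n):   # memory copy is missing or stale
                    code.append(f"MOV {r}, {n}")
                    where(n).add(n)
                where(n).discard(r)
            reg[r] = set()
            return r
        return x                        # 4. compute straight into x's memory

    for i, (x, op, y, z) in enumerate(block):
        L = getreg(i, x, y)
        if best(y) != L:
            code.append(f"MOV {best(y)}, {L}")
        code.append(f"{OPCODE[op]} {best(z)}, {L}")
        for r in REGS:                  # the old value of x is gone
            reg[r].discard(x)
        if L in REGS:
            for n in reg[L]:            # anything else L held is overwritten
                where(n).discard(L)
            reg[L] = {x}
        addr[x] = {L}                   # x is now only in L
        for n in (y, z):                # free registers of operands that are now dead
            if not needed_after(n, i):
                for r in [loc for loc in where(n) if loc in REGS and loc != L]:
                    reg[r].discard(n)
                    where(n).discard(r)

    for n in sorted(live_on_exit):      # store live names not already in memory
        if n not in where(n):
            code.append(f"MOV {best(n)}, {n}")
    return code

BLOCK = [("t", "-", "a", "b"), ("u", "-", "a", "c"),
         ("v", "+", "t", "u"), ("d", "+", "v", "u")]
print("\n".join(generate(BLOCK, live_on_exit={"d"})))
```

For the statements of the d := (a - b) + (a - c) + (a - c) example, with d live on exit, it emits the same seven instructions as slide 22.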

28 Indexing and Pointer Operations

    Statement    i in register Ri    i in memory Mi     i on the stack at offset Si (A points to the activation record)
    a := b[i]    MOV b(Ri), R        MOV Mi, R          MOV Si(A), R
                                     MOV b(R), R        MOV b(R), R
    a[i] := b    MOV b, a(Ri)        MOV Mi, R          MOV Si(A), R
                                     MOV b, a(R)        MOV b, a(R)

    Statement    p in register Rp    p in memory Mp     p on the stack at offset Sp (A points to the activation record)
    a := *p      MOV *Rp, R          MOV Mp, R          MOV Sp(A), R
                                     MOV *R, R          MOV *R, R
    *p := a      MOV a, *Rp          MOV Mp, R          MOV a, R
                                     MOV a, *R          MOV R, *Sp(A)

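29 Conditional Statements
Condition codes: a comparison sets the condition code, and a conditional jump tests it.
    if x < y goto z       CMP x, y
                          CJ< z
Condition-code descriptors record which name last set the condition code. If the ADD below leaves the condition code set according to x, the test x < 0 needs no CMP:
    x := y + z            MOV y, R0
    if x < 0 goto z       ADD z, R0
                          MOV R0, x
                          CJ< z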

30 Global Register Allocation Keep live variables in registers across block boundaries Keep variables frequently used in inner loops in registers

31 Loops
A loop is a collection of nodes in the flow graph such that:
– all nodes in the collection are strongly connected;
– the collection of nodes has a unique entry.
An inner loop is one that contains no other loops.

32 Variable Usage Counts
Savings:
– count a saving of one for each use of x in loop L that is not preceded by an assignment to x in the same block;
– save two units if we can avoid a store of x at the end of a block.
Costs:
– cost two units if x is live at the entry or exit of the inner loop.

33 An Example
    B1: a := b + c
        d := d - b
        e := a + f
    B2: f := a - d
    B3: b := d + f
        e := a - c
    B4: b := d + c
(Figure: the flow graph of the loop, with edges annotated by the live-variable sets b,c,d,e,f; b,c,d,f; b,c,d,e,f; c,d,e,f; b,c,d,e,f; b,d,e,f; a,c,d,f; a,c,d,e; a,c,d,e,f.)

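34 An Example
    use(a, B1) = 0, use(a, B2) = 1, use(a, B3) = 1, use(a, B4) = 0
    live(a, B1) = 1, live(a, B2) = 0, live(a, B3) = 0, live(a, B4) = 0
Here use(x, B) is the number of uses of x in B before any assignment to x in B. live(x, B) is 1 if x is assigned in B and live on exit from B, and 0 otherwise. Then
$$\text{save}(a) = \sum_{B \in L} \text{use}(a, B) + 2 \sum_{B \in L} \text{live}(a, B) = (0 + 1 + 1 + 0) + 2 \times (1 + 0 + 0 + 0) = 4$$
Similarly, save(b) = 5, save(c) = 3, save(d) = 6, save(e) = 4, save(f) = 4.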

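35 An Example
With a in R0, b in R1 and d in R2 (R3 holds temporaries), the loop compiles to:
    B1: MOV R1, R0
        ADD c, R0
        SUB R1, R2
        MOV R0, R3
        ADD f, R3
        MOV R3, e
    B2: MOV R0, R3
        SUB R2, R3
        MOV R3, f
    B3: MOV R2, R1
        ADD f, R1
        MOV R0, R3
        SUB c, R3
        MOV R3, e
    B4: MOV R2, R1
        ADD c, R1
On entry to the loop: MOV b, R1; MOV d, R2. On exit from the loop: MOV R1, b; MOV R2, d.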

36 Register Assignment for Outer Loops
– Apply the same idea used for inner loops to progressively larger loops.
– If an outer loop L1 contains an inner loop L2, a name allocated a register in L2 need not be allocated a register in L1 - L2.
– If name x is allocated a register in L1 but not in L2, x must be stored on entrance to L2 and loaded on exit from L2.
– If name x is allocated a register in L2 but not in L1, x must be loaded on entrance to L2 and stored on exit from L2.

37 Peephole Optimization Improve the performance of the target program by examining and transforming a short sequence of target instructions May need repeated passes over the code Can also be applied directly after intermediate code generation

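38 Examples
Redundant loads and stores:
    MOV R0, a
    MOV a, R0
The second instruction can be deleted, provided it has no label (so it always executes right after the first).
Algebraic simplification: statements such as x := x + 0 and x := x * 1 can be eliminated.
Constant folding: x := 2 + 3 becomes x := 5, and then y := x + 3 becomes y := 8.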
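The first two transformations can be sketched as a peephole pass over a list of (opcode, source, destination) tuples. The sketch makes two assumptions:
– no instruction is a jump target;
– no later instruction relies on the condition code set by an `ADD #0` or `MUL #1`.

`peephole` is a hypothetical name. As slide 37 suggests, the pass repeats until nothing changes.

```python
def peephole(code):
    """code: list of (opcode, source, destination) tuples.

    Assumes no instruction carries a label (is a jump target); with labels,
    deleting the second MOV of a pair would be unsafe.
    """
    changed = True
    while changed:                          # repeat passes until nothing changes
        changed = False
        out = []
        for ins in code:
            op, src, dst = ins
            # algebraic simplification: adding 0 or multiplying by 1 changes nothing
            if (op, src) in {("ADD", "#0"), ("MUL", "#1")}:
                changed = True
                continue
            # redundant load/store: the previous instruction moved dst to src
            if op == "MOV" and out and out[-1] == ("MOV", dst, src):
                changed = True
                continue
            out.append(ins)
        code = out
    return code

# Statement-by-statement code for a := b + c; d := a + e (slide 7).
naive = [("MOV", "b", "R0"), ("ADD", "c", "R0"), ("MOV", "R0", "a"),
         ("MOV", "a", "R0"), ("ADD", "e", "R0"), ("MOV", "R0", "d")]
for ins in peephole(naive):
    print(*ins)
```

Applied to the statement-by-statement code from slide 7, it deletes the redundant MOV a, R0. On MOV x, R0; ADD #0, R0; MOV R0, x, removing the ADD exposes a redundant store, so only MOV x, R0 remains.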

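39 Examples
Unreachable code:
    #define debug 0
    if (debug) { print debugging information }
After constant propagation, the test becomes:
        if 0 <> 1 goto L1
        print debugging information
    L1:
Because the condition is always true, this is:
        if 1 goto L1
        print debugging information
    L1:
The print statements are therefore unreachable and can be removed.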

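40 Examples
Flow-of-control optimization: eliminate jumps to jumps.
        goto L1                        goto L2
        …               becomes        …
    L1: goto L2                    L1: goto L2

        goto L1                        if a < b goto L2
        …               becomes        goto L3
    L1: if a < b goto L2               …
    L3:                            L3:
The second transformation assumes that the only jump to L1 is the one shown and that L1 is preceded by an unconditional jump. Under those conditions, the statement at L1 can be removed.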

41 Examples
Reduction in strength: replace expensive operations by cheaper ones.
– x² → x * x
– fixed-point multiplication and division by a power of 2 → shift
– floating-point division by a constant → floating-point multiplication by a constant

42 Examples
Use of machine idioms: hardware instructions for certain specific operations.
– auto-increment and auto-decrement addressing modes (e.g., to push or pop a stack in parameter passing)

43 DAG Representation of Blocks
A DAG makes it easy to determine:
– common subexpressions;
– names used in the block but evaluated outside the block;
– names whose values could be used outside the block.

44 DAG Representation of Blocks
– Leaves are labeled by unique identifiers (variable names or constants).
– Interior nodes are labeled by operator symbols.
– Nodes are optionally given a sequence of identifiers, which have the value represented by the node.

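45 An Example
    (1)  t1 := 4 * i
    (2)  t2 := a[t1]
    (3)  t3 := 4 * i
    (4)  t4 := b[t3]
    (5)  t5 := t2 * t4
    (6)  t6 := prod + t5
    (7)  prod := t6
    (8)  t7 := i + 1
    (9)  i := t7
    (10) if i <= 20 goto (1)
(Figure: the DAG for this block. Leaves: 4, i0, a, b, prod0, 1, 20. The node 4 * i0 is labeled t1, t3. The two [] nodes are labeled t2 and t4, and the * node t5. The + node over prod0 is labeled t6, prod, and the node i0 + 1 is labeled t7, i. The <= node is the jump to (1).)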

46 Constructing a DAG Consider x := y op z. Other statements can be handled similarly If node(y) is undefined, create a leaf labeled y and let node(y) be this leaf. If node(z) is undefined, create a leaf labeled z and let node(z) be that leaf

47 Constructing a DAG
Determine whether there is a node labeled op whose left child is node(y) and whose right child is node(z). If not, create such a node. Let n be the node found or created. Delete x from the list of attached identifiers for node(x). Append x to the list of attached identifiers for n and set node(x) to n.
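Slides 46 and 47 amount to value numbering: a node is looked up by its operator and children before it is created. Below is a minimal Python sketch; `DAG`, `assign` and `copy` are hypothetical names. The copy statements prod := t6 and i := t7 are handled by attaching the name to an existing node. The final conditional jump of the example is not modeled.

```python
class DAG:
    """Basic-block DAG built one statement at a time (a sketch of slides 46-47)."""

    def __init__(self):
        self.index = {}      # (label, child ids) -> node id, used to find existing nodes
        self.nodes = []      # node id -> (label, child ids)
        self.node = {}       # identifier -> node currently holding its value: node(x)
        self.attached = {}   # node id -> list of attached identifiers

    def _find_or_create(self, label, children=()):
        key = (label, tuple(children))
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
            self.attached[self.index[key]] = []
        return self.index[key]

    def _operand(self, name):
        if name not in self.node:              # node(name) undefined: create a leaf
            self.node[name] = self._find_or_create(("leaf", name))
        return self.node[name]

    def _attach(self, x, n):
        old = self.node.get(x)
        if old is not None and x in self.attached[old]:
            self.attached[old].remove(x)       # delete x from node(x)'s list
        self.attached[n].append(x)
        self.node[x] = n

    def assign(self, x, op, y, z):
        """x := y op z"""
        n = self._find_or_create(op, (self._operand(y), self._operand(z)))
        self._attach(x, n)

    def copy(self, x, y):
        """x := y: x simply joins the node of y."""
        self._attach(x, self._operand(y))

dag = DAG()
dag.assign("t1", "*", "4", "i")
dag.assign("t2", "[]", "a", "t1")
dag.assign("t3", "*", "4", "i")
dag.assign("t4", "[]", "b", "t3")
dag.assign("t5", "*", "t2", "t4")
dag.assign("t6", "+", "prod", "t5")
dag.copy("prod", "t6")
dag.assign("t7", "+", "i", "1")
dag.copy("i", "t7")
print(len(dag.nodes), dag.attached[dag.node["t1"]])
```

For statements (1) to (9) of slide 45 it builds 12 nodes. Three pairs end up attached to shared nodes: t1 and t3, t6 and prod, and t7 and i.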

48 Reconstructing Quadruples
– Evaluate the interior nodes in topological order.
– Assign the evaluated value to one of its attached identifiers x, preferring one whose value is needed outside the block.
– If there is no attached identifier, create a new temporary to hold the value.
– If there are additional attached identifiers y1, y2, …, yk whose values are also needed outside the block, add y1 := x, y2 := x, …, yk := x.

49 An Example
    (1) t1 := 4 * i
    (2) t2 := a[t1]
    (3) t3 := b[t1]
    (4) t4 := t2 * t3
    (5) prod := prod + t4
    (6) i := i + 1
    (7) if i <= 20 goto (1)
(Figure: the same DAG, with prod attached to the + node and i to the node i0 + 1.)

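50 Arrays, Pointers, Procedure Calls
    x := a[i]                x := a[i]
    a[j] := y      is not    a[j] := y
    z := a[i]                z := x
a[i] cannot be treated as a common subexpression here, because j may equal i; ruling this out requires range analysis. Similarly:
– *p := w may change any variable p points to, which calls for aliasing analysis;
– side effects caused by procedure calls call for inter-procedural analysis.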

51 Ordering Rules
– Any evaluation of, or assignment to, an element of array a must follow the previous assignment to an element of that array, if there is one.
– Any assignment to an element of array a must follow any previous evaluation of a.

52 Ordering Rules Any use of any identifier must follow the previous procedure call or indirect assignment through a pointer if there is one Any procedure call or indirect assignment through a pointer must follow all previous evaluations of any identifier

53 Generating Code From DAGs
    t1 := a + b
    t2 := c + d
    t3 := e - t2
    t4 := t1 - t3

    (1)  MOV a, R0
    (2)  ADD b, R0
    (3)  MOV c, R1
    (4)  ADD d, R1
    (5)  MOV R0, t1
    (6)  MOV e, R0
    (7)  SUB R1, R0
    (8)  MOV t1, R1
    (9)  SUB R0, R1
    (10) MOV R1, t4
(Figure: the DAG for this block, with leaves a0, b0, c0, d0, e0 and interior nodes t1, t2, t3, t4.)

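54 Rearranging the Order
    t2 := c + d
    t3 := e - t2
    t1 := a + b
    t4 := t1 - t3

    (1) MOV c, R0
    (2) ADD d, R0
    (3) MOV e, R1
    (4) SUB R0, R1
    (5) MOV a, R0
    (6) ADD b, R0
    (7) SUB R1, R0
    (8) MOV R0, t4
Computing t1 immediately before t4 removes the store (5) and the reload (8) of t1 from the previous order. The result is 8 instructions instead of 10.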

55 A Heuristic Ordering for DAG
Attempt, as far as possible, to make the evaluation of a node immediately follow the evaluation of its leftmost argument.

56 Node Listing Algorithm
    while unlisted interior nodes remain do begin
        select an unlisted node n, all of whose parents have been listed;
        list n;
        while the leftmost child m of n has no unlisted parents and is not a leaf do begin
            list m;
            n := m
        end
    end
The nodes are then evaluated in the reverse of the order in which they were listed.

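57 An Example
(Figure: a DAG with leaves a0, b0, c0, d0, e0 and one interior node for each of the statements below.)
Evaluating the nodes in the reverse of the listing order gives:
    t7 := d + e
    t6 := a + b
    t5 := t6 - c
    t4 := t5 * t7
    t3 := t4 - e
    t2 := t6 + t4
    t1 := t2 * t3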

58 Generating Code From Trees There exists an algorithm that determines the optimal order in which to evaluate statements in a block when the dag representation of the block is a tree Optimal order here means the order that yields the shortest instruction sequence

59 Optimal Ordering for Trees
– Label each node of the tree bottom-up with an integer denoting the fewest registers required to evaluate the tree with no stores of intermediate results.
– Generate code during a tree traversal by first evaluating the operand requiring more registers.

60 The Labeling Algorithm
    if n is a leaf then
        if n is the leftmost child of its parent then label(n) := 1
        else label(n) := 0
    else begin
        let n1, n2, …, nk be the children of n ordered by label,
            so that label(n1) ≥ label(n2) ≥ … ≥ label(nk);
        label(n) := max over 1 ≤ i ≤ k of (label(ni) + i - 1)
    end

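61 An Example
(Figure: the tree for t4 := (a + b) - (e - (c + d)), with t1 = a + b, t2 = c + d, t3 = e - t2, t4 = t1 - t3.)
For binary interior nodes whose children have labels l1 and l2:
$$\text{label}(n) = \begin{cases} \max(l_1, l_2) & \text{if } l_1 \neq l_2 \\ l_1 + 1 & \text{if } l_1 = l_2 \end{cases}$$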
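The labeling algorithm of slide 60 handles nodes with any number of children. Below is a Python sketch, assuming a tree is either a leaf name or a tuple (op, child_1, …, child_k); `label` is an illustrative name.

```python
def label(tree, leftmost=True):
    """tree: a leaf name (str) or a tuple (op, child_1, ..., child_k)."""
    if isinstance(tree, str):
        return 1 if leftmost else 0
    children = tree[1:]
    # label every child (only the first is a leftmost child), then sort descending
    labels = sorted((label(c, i == 0) for i, c in enumerate(children)), reverse=True)
    # with 0-based i this is: max over 1 <= i <= k of label(n_i) + i - 1
    return max(l + i for i, l in enumerate(labels))

T1 = ("+", "a", "b")
T3 = ("-", "e", ("+", "c", "d"))
T4 = ("-", T1, T3)
print(label(T1), label(T3), label(T4))
```

For the tree above, it labels t1 with 1, t3 with 2 and the root t4 with 2, matching the binary formula.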

62 Code Generation From a Labeled Tree Use a stack rstack to allocate registers R0, R1, …, R(r-1) The value of a tree is always computed in the top register on rstack The function swap(rstack) interchanges the top two registers on rstack Use a stack tstack to allocate temporary memory locations T0, T1,...

63 Cases Analysis
(Figure: the four cases for an interior node n with operator op and children n1 and n2:
– case 1: n2 is a leaf operand name;
– case 2: label(n1) < label(n2);
– case 3: label(n2) ≤ label(n1);
– case 4: both labels ≥ r.)

64 The Function gencode
    procedure gencode(n);
    begin
        if n is a left leaf representing operand name and n is the leftmost child of its parent then
            print 'MOV' || name || ',' || top(rstack)
        else if n is an interior node with operator op, left child n1, and right child n2 then
            if label(n2) = 0 then /* case 1 */
            else if 1 ≤ label(n1) < label(n2) and label(n1) < r then /* case 2 */
            else if 1 ≤ label(n2) ≤ label(n1) and label(n2) < r then /* case 3 */
            else /* case 4, both labels ≥ r */
    end

65 The Function gencode
    /* case 1 */
    begin
        let name be the operand represented by n2;
        gencode(n1);
        print op || name || ',' || top(rstack)
    end
    /* case 2 */
    begin
        swap(rstack);
        gencode(n2);
        R := pop(rstack);
        gencode(n1);
        print op || R || ',' || top(rstack);
        push(rstack, R);
        swap(rstack)
    end

66 The Function gencode
    /* case 3 */
    begin
        gencode(n1);
        R := pop(rstack);
        gencode(n2);
        print op || top(rstack) || ',' || R;
        push(rstack, R)
    end
    /* case 4 */
    begin
        gencode(n2);
        T := pop(tstack);
        print 'MOV' || top(rstack) || ',' || T;
        gencode(n1);
        push(tstack, T);
        print op || T || ',' || top(rstack)
    end
In case 3 the value of n1 is in R, so the result is computed into R before R is pushed back onto rstack.

67 An Example
(rstack is shown bottom to top; the rightmost register is the top.)
    gencode(t4) [R1, R0]   /* case 2 */
        gencode(t3) [R0, R1]   /* case 3 */
            gencode(e) [R0, R1]   /* case 0 */
                print MOV e, R1
            gencode(t2) [R0]   /* case 1 */
                gencode(c) [R0]   /* case 0 */
                    print MOV c, R0
                print ADD d, R0
            print SUB R0, R1
        gencode(t1) [R0]   /* case 1 */
            gencode(a) [R0]   /* case 0 */
                print MOV a, R0
            print ADD b, R0
        print SUB R1, R0
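Slides 60 to 66 can be combined into one Python sketch: binary labeling followed by gencode. Here rstack and tstack are Python lists whose last element is the top. The `Node` class, the opcode table and the temporaries T0, T1, T2 are assumptions for illustration. Case 3 prints op top(rstack), R, so that the result lands in R, the register holding n1.

```python
from dataclasses import dataclass
from typing import Optional

OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

@dataclass
class Node:
    name: str                        # operand name (leaf) or temporary name (interior)
    op: Optional[str] = None         # None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0

def set_labels(n, leftmost=True):
    """Binary labeling: max(l1, l2) if l1 != l2, else l1 + 1."""
    if n.op is None:
        n.label = 1 if leftmost else 0
    else:
        l1, l2 = set_labels(n.left, True), set_labels(n.right, False)
        n.label = max(l1, l2) if l1 != l2 else l1 + 1
    return n.label

def codegen(root, registers=("R0", "R1"), temps=("T0", "T1", "T2")):
    set_labels(root)
    r = len(registers)
    rstack = list(reversed(registers))   # rstack[-1] is top(rstack); initially R0
    tstack = list(reversed(temps))       # tstack[-1] is the next free temporary
    out = []

    def swap():
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]

    def gencode(n):
        if n.op is None:                                  # leftmost leaf
            out.append(f"MOV {n.name}, {rstack[-1]}")
            return
        op, n1, n2 = OPCODE[n.op], n.left, n.right
        if n2.label == 0:                                 # case 1: n2 is a leaf operand
            gencode(n1)
            out.append(f"{op} {n2.name}, {rstack[-1]}")
        elif 1 <= n1.label < n2.label and n1.label < r:   # case 2: evaluate n2 first
            swap()
            gencode(n2)
            R = rstack.pop()
            gencode(n1)
            out.append(f"{op} {R}, {rstack[-1]}")
            rstack.append(R)
            swap()
        elif 1 <= n2.label <= n1.label and n2.label < r:  # case 3: evaluate n1 first
            gencode(n1)
            R = rstack.pop()
            gencode(n2)
            out.append(f"{op} {rstack[-1]}, {R}")
            rstack.append(R)
        else:                                             # case 4: both labels >= r
            gencode(n2)
            T = tstack.pop()
            out.append(f"MOV {rstack[-1]}, {T}")
            gencode(n1)
            tstack.append(T)
            out.append(f"{op} {T}, {rstack[-1]}")

    gencode(root)
    return out

a, b, c, d, e = (Node(x) for x in "abcde")
t1 = Node("t1", "+", a, b)
t2 = Node("t2", "+", c, d)
t3 = Node("t3", "-", e, t2)
t4 = Node("t4", "-", t1, t3)
print("\n".join(codegen(t4)))
```

With registers R0 and R1, it prints the seven instructions of the trace above. With only R0, case 4 applies and intermediate values are stored in the temporary T0.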

68 Multiregister Operations
Some operations, such as multiplication, division, or a function call, normally require more than one register. The labeling algorithm needs to ensure that label(n) is always at least the number of registers required by the operation. For an operation requiring two registers:
$$\text{label}(n) = \begin{cases} \max(2, l_1, l_2) & \text{if } l_1 \neq l_2 \\ l_1 + 1 & \text{if } l_1 = l_2 \end{cases}$$

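69 Algebraic Properties
(Figure: using the commutativity and associativity of + to reshape trees.
– Commutativity: a + node with a leaf as left operand (label 1) and a subtree T1 of label l as right operand needs max(2, l) registers. With the operands swapped, it needs only l.
– Associativity: a chain of + nodes over subtrees T1, …, T4 can be regrouped, with the operand of largest label placed first.)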

70 Common Subexpressions
Nodes with more than one parent in a DAG are called shared nodes. Generating optimal code from a DAG is NP-complete, both for a one-register machine and for a machine with an unlimited number of registers.

71 Partitioning a DAG into Trees
– Partition a DAG into a set of trees by finding, for each root and shared node n, the maximal subtree with n as root that includes no other shared nodes, except as leaves.
– Determine a code generation ordering for the trees.
– Generate code for each tree using the algorithms for generating code from trees.

72 An Example
(Figure: a DAG with leaves a0, b0, c0, d0, e0, partitioned into trees at its shared nodes; each shared node appears as a leaf of the trees that use it.)

73 Dynamic Programming Code Generation
The dynamic programming algorithm applies to a broad class of register machines with complex instruction sets:
– the machine has r interchangeable registers;
– the machine has instructions of the form Ri := E, where E is any expression containing operators, registers, and memory locations. If E involves registers, then Ri must be one of them.

74 Dynamic Programming
The dynamic programming algorithm partitions the problem of generating optimal code for an expression into subproblems: generating optimal code for the subexpressions of the given expression. For example, optimal code for T1 + T2 is built from optimal code for T1 and for T2.

75 Contiguous Evaluation
We say a program P evaluates a tree T contiguously if it:
– first evaluates those subtrees of T that need to be computed into memory;
– then evaluates the subtrees of the root in either order;
– finally evaluates the root.

76 Optimally Contiguous Program
For the machines defined above, given any program P to evaluate an expression tree T, we can find an equivalent program P' such that:
– P' is of no higher cost than P;
– P' uses no more registers than P;
– P' evaluates the tree in a contiguous fashion.
This implies that every expression tree can be evaluated optimally by a contiguous program.

77 Dynamic Programming Algorithm Phase 1: compute bottom-up for each node n of the expression tree T an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation. C[0] is the optimal cost of computing the subtree S into memory

78 Dynamic Programming Algorithm To compute C[i] at node n, consider each machine instruction R := E whose expression E matches the subexpression rooted at node n Determine the costs of evaluating the operands of E by examining the cost vectors at the corresponding descendants of n

79 Dynamic Programming Algorithm For those operands of E that are registers, consider all possible orders in which the corresponding subtrees of T can be evaluated into registers In each ordering, the first subtree corresponding to a register operand can be evaluated using i available registers, the second using i-1 registers, and so on

80 Dynamic Programming Algorithm For node n, add in the cost of the instruction R := E that was used to match node n The value C[i] is then the minimum cost over all possible orders At each node, store the instruction used to achieve the best cost for C[i] for each i The smallest cost in the vector gives the minimum cost of evaluating T

81 Dynamic Programming Algorithm Phase 2: traverse T and use the cost vectors to determine which subtrees of T must be computed into memory Phase 3: traverse T and use the cost vectors and associated instructions to generate the final target code

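82 An Example
Consider a machine with two registers R0 and R1 and the instructions:
    Ri := Mj
    Mi := Ri
    Ri := Rj
    Ri := Ri op Rj
    Ri := Ri op Mj
(Figure: the expression tree for (a - b) + c * (d / e), each node annotated with its cost vector (C[0], C[1], C[2]): every leaf (0, 1, 1); a - b (3, 2, 2); d / e (3, 2, 2); c * (d / e) (5, 5, 4); the root (8, 8, 7).)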

83 An Example
The minimum cost at the root is C[2] = 7. The corresponding code is:
    R0 := c
    R1 := d
    R1 := R1 / e
    R0 := R0 * R1
    R1 := a
    R1 := R1 - b
    R1 := R1 + R0
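Phase 1 of the dynamic programming algorithm for the two-register machine of slide 82, sketched in Python. The sketch assumes:
– every instruction costs one unit;
– a tree is a leaf name or (op, left, right);
– only Ri := Mj, Mi := Ri, Ri := Ri op Rj and Ri := Ri op Mj are considered, since Ri := Rj is not needed for this tree.

`cost_vector` is a hypothetical name. Phases 2 and 3 are not shown.

```python
NUM_REGS = 2          # the slide's machine has R0 and R1
# Assumption: every instruction costs one unit.

def cost_vector(tree):
    """Return [C[0], C[1], ..., C[NUM_REGS]] for tree (a leaf name or (op, left, right)).

    C[0] is the cost of computing the subtree into memory; C[i] is the cost of
    computing it into a register when i registers are available.
    """
    if isinstance(tree, str):
        return [0] + [1] * NUM_REGS          # already in memory; Ri := Mj costs 1
    _, left, right = tree
    c1, c2 = cost_vector(left), cost_vector(right)
    C = [0] * (NUM_REGS + 1)
    for i in range(1, NUM_REGS + 1):
        # Ri := Ri op Mj: right operand computed into memory first, left into Ri
        best = c2[0] + c1[i] + 1
        if i >= 2:
            # Ri := Ri op Rj: both operands in registers, trying both evaluation orders
            best = min(best, c1[i] + c2[i - 1] + 1, c2[i] + c1[i - 1] + 1)
        C[i] = best
    C[0] = min(C[1:]) + 1                     # best register result, then Mi := Ri
    return C

TREE = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
print(cost_vector(TREE))
```

It reproduces the vectors (3, 2, 2), (5, 5, 4) and (8, 8, 7) of slide 82.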

84 Code Generator Generators
– A code generator generator is a tool that automatically constructs the instruction selection phase of a code generator.
– Such tools may use tree grammars or context-free grammars to describe the target machines.
– Register allocation is implemented as a separate mechanism; graph coloring is one approach to register allocation.

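85 Tree Rewriting
(Figure: the intermediate-code tree for a[i] := b + 1. Here a and i are local names addressed at offsets const_a and const_i from reg_sp:
    :=(ind(+(+(const_a, reg_sp), ind(+(const_i, reg_sp)))), +(mem_b, const_1)))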

86 Tree Rewriting
The code is generated by reducing the input tree to a single node using a sequence of tree-rewriting rules. Each tree-rewriting rule has the form
    replacement ← template { action }
where:
– replacement is a single node;
– template is a tree;
– action is a code fragment.
A set of tree-rewriting rules is called a tree-translation scheme.

87 An Example
    reg_i ← +(reg_i, reg_j)   { ADD Rj, Ri }
Each tree template represents a computation performed by the sequence of machine instructions emitted by the associated action.

88 Tree Rewriting Rules
    (1) reg_i ← const_c                    { MOV #c, Ri }
    (2) reg_i ← mem_a                      { MOV a, Ri }
    (3) mem ← :=(mem_a, reg_i)             { MOV Ri, a }
    (4) mem ← :=(ind(reg_i), reg_j)        { MOV Rj, *Ri }
    (5) reg_i ← ind(+(const_c, reg_j))     { MOV c(Rj), Ri }

89 Tree Rewriting Rules
    (6) reg_i ← +(reg_i, ind(+(const_c, reg_j)))   { ADD c(Rj), Ri }
    (7) reg_i ← +(reg_i, reg_j)                    { ADD Rj, Ri }
    (8) reg_i ← +(reg_i, const_1)                  { INC Ri }

90 An Example
Tree: :=(ind(+(+(const_a, reg_sp), ind(+(const_i, reg_sp)))), +(mem_b, const_1))
Rule (1) rewrites const_a to reg_0: { MOV #a, R0 }

91 An Example
Tree: :=(ind(+(+(reg_0, reg_sp), ind(+(const_i, reg_sp)))), +(mem_b, const_1))
Rule (7) rewrites +(reg_0, reg_sp) to reg_0: { ADD SP, R0 }

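92 An Example
Tree: :=(ind(+(reg_0, ind(+(const_i, reg_sp)))), +(mem_b, const_1))
There are two choices:
– rule (5) could rewrite ind(+(const_i, reg_sp)) to reg_1: { MOV i(SP), R1 };
– rule (6) instead rewrites the whole subtree +(reg_0, ind(+(const_i, reg_sp))) to reg_0: { ADD i(SP), R0 }.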

93 An Example
Tree: :=(ind(reg_0), +(mem_b, const_1))
Rule (2) rewrites mem_b to reg_1: { MOV b, R1 }

94 An Example
Tree: :=(ind(reg_0), +(reg_1, const_1))
Rule (8) rewrites +(reg_1, const_1) to reg_1: { INC R1 }

95 An Example
Tree: :=(ind(reg_0), reg_1)
Rule (4) reduces the tree to a single node: { MOV R1, *R0 }
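The worked example can be replayed with a swapped-in, simpler technique: maximal munch. It covers the tree top-down and tries the largest template first (rule (8) or (6) before rule (7)). This is not the pattern-matching or LR-parsing approach of slides 96 to 98, and it does no cost-based selection. The tuple encoding of trees, the register pool R0 to R2 and the names `munch` and `translate` are hypothetical. c(Rj) operands are only recognized when Rj is a register leaf such as SP.

```python
# Hypothetical tuple encoding of the slides' trees:
#   ("const", c)  ("mem", a)  ("reg", r)  ("ind", t)  ("+", l, r)  (":=", l, r)
REGISTERS = ("R0", "R1", "R2")

def is_offset(t):
    """Matches ind(+(const_c, reg_j)), i.e. the operand c(Rj) of rules (5) and (6)."""
    return (t[0] == "ind" and t[1][0] == "+"
            and t[1][1][0] == "const" and t[1][2][0] == "reg")

def munch(t, code, free):
    """Cover t with templates, largest first; return the register holding its value."""
    if t[0] == "reg":
        return t[1]
    if t[0] in ("const", "mem"):                      # rules (1) and (2)
        r = free.pop(0)
        prefix = "#" if t[0] == "const" else ""
        code.append(f"MOV {prefix}{t[1]}, {r}")
        return r
    if is_offset(t):                                  # rule (5)
        r = free.pop(0)
        code.append(f"MOV {t[1][1][1]}({t[1][2][1]}), {r}")
        return r
    if t[0] == "+":
        ri = munch(t[1], code, free)
        if ri not in REGISTERS:                       # rules (6)-(8) overwrite reg_i
            raise ValueError("left operand of + must end up in an allocatable register")
        right = t[2]
        if right == ("const", 1):                     # rule (8)
            code.append(f"INC {ri}")
        elif is_offset(right):                        # rule (6)
            code.append(f"ADD {right[1][1][1]}({right[1][2][1]}), {ri}")
        else:                                         # rule (7)
            rj = munch(right, code, free)
            code.append(f"ADD {rj}, {ri}")
        return ri
    raise ValueError(f"no rule matches {t!r}")

def translate(stmt):
    """stmt is (":=", lhs, rhs); returns the emitted instructions."""
    code, free = [], list(REGISTERS)
    _, lhs, rhs = stmt
    if lhs[0] == "mem":                               # rule (3)
        ri = munch(rhs, code, free)
        code.append(f"MOV {ri}, {lhs[1]}")
    elif lhs[0] == "ind":                             # rule (4)
        ri = munch(lhs[1], code, free)
        rj = munch(rhs, code, free)
        code.append(f"MOV {rj}, *{ri}")
    else:
        raise ValueError(f"unsupported assignment target {lhs!r}")
    return code

A_I = (":=",
       ("ind", ("+", ("+", ("const", "a"), ("reg", "SP")),
                     ("ind", ("+", ("const", "i"), ("reg", "SP"))))),
       ("+", ("mem", "b"), ("const", 1)))
print("\n".join(translate(A_I)))
```

For a[i] := b + 1 it emits the same six instructions as slides 90 to 95.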

96 Tree Pattern Matching
– The tree pattern matching algorithm can be implemented by extending the multiple-keyword pattern matching algorithm.
– Each tree template is represented by a set of strings, each of which represents a path from the root to a leaf.
– Each rule is associated with cost information.
– The dynamic programming algorithm can be used to select an optimal sequence of matches.

97 Semantic Predicates
    reg_i ← +(reg_i, const_c)   { if c = 1 then INC Ri else ADD #c, Ri }
The general use of semantic actions and predicates can provide greater flexibility and ease of description than a purely grammatical specification.

98 Pattern Matching by Parsing
Use an LR parser to do the pattern matching. The input tree can be treated as a string by using its prefix representation. For a[i] := b + 1 it is:
    := ind + + const_a reg_sp ind + const_i reg_sp + mem_b const_1
The tree-translation scheme can be converted into a syntax-directed translation scheme by replacing the tree templates with their prefix representations.

99 Syntax-Directed Translation Scheme
    (1) reg_i → const_c                        { MOV #c, Ri }
    (2) reg_i → mem_a                          { MOV a, Ri }
    (3) mem → := mem_a reg_i                   { MOV Ri, a }
    (4) mem → := ind reg_i reg_j               { MOV Rj, *Ri }
    (5) reg_i → ind + const_c reg_j            { MOV c(Rj), Ri }
    (6) reg_i → + reg_i ind + const_c reg_j    { ADD c(Rj), Ri }
    (7) reg_i → + reg_i reg_j                  { ADD Rj, Ri }
    (8) reg_i → + reg_i const_1                { INC Ri }

100 Advantages of Syntax-Directed Translation Scheme The parsing method is efficient and well understood It is relatively easy to retarget the code generator The code generator can be made more efficient by adding special-case productions

101 Disadvantages of Syntax-Directed Translation Scheme
– A left-to-right order of evaluation is fixed.
– The machine description grammar can become inordinately large.
– The context-free grammar is usually highly ambiguous.

102 Graph Coloring In the first pass, target machine instructions are selected as though there were an infinite number of symbolic registers In the second pass, physical registers are assigned to symbolic registers using graph coloring algorithms During the second pass, if a register is needed when all available registers are used, some of the used registers must be spilled

103 Interference Graph For each procedure, a register-interference graph is constructed The nodes in the graph are symbolic registers An edge connects two nodes if one is live at a point where the other is defined

104 K-Colorable Graphs A graph is said to be k-colorable if each node can be assigned one of the k colors such that no two adjacent nodes have the same color A color represents a register The problem of determining whether a graph is k-colorable is NP-complete

105 A Graph Coloring Algorithm Remove a node n and its edges if it has fewer than k neighbors Repeat the removing step above until we end up with the empty graph or a graph in which each node has k or more adjacent nodes In the latter case, a node is selected and spilled by deleting that node and its edges, and the removing step above continues

106 A Graph Coloring Algorithm The nodes in the graph can be colored in the reverse order in which they are removed Each node can be assigned a color not assigned to any of its neighbors Spilled nodes can be assigned any color
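Slides 105 and 106 describe a simplify-then-select coloring. Below is a minimal Python sketch, assuming the interference graph is a dictionary of neighbor sets. The spill choice (the node with the most neighbors) is an assumption, since the slides do not fix one. Spilled nodes are returned separately rather than given a color. The graph `G` is a made-up example, not the one in the slides' figure.

```python
def color(graph, k):
    """graph: dict node -> set of neighbours (undirected). Returns (colors, spilled)."""
    g = {n: set(adj) for n, adj in graph.items()}   # working copy we can shrink
    stack, spilled = [], []
    while g:
        n = next((m for m in g if len(g[m]) < k), None)
        if n is None:
            # every node has >= k neighbours: spill one (heuristic: the most neighbours)
            n = max(g, key=lambda m: len(g[m]))
            spilled.append(n)
        else:
            stack.append(n)
        for m in g.pop(n):                           # remove n and its edges
            g[m].discard(n)
    colors = {}
    for n in reversed(stack):                        # color in reverse order of removal
        taken = {colors[m] for m in graph[n] if m in colors}
        colors[n] = min(c for c in range(k) if c not in taken)
    return colors, spilled

# Hypothetical interference graph: a 4-cycle a-b-c-d with the chord a-c.
G = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"a", "c"}}
print(color(G, 3))
print(color(G, 2))
```

On this graph three colors suffice. With k = 2, one node must be spilled.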

107 An Example
(Figure: an interference graph.)

108 An Example
(Figure: the graph colored with three colors, R, G and B.)