1 7. Code Generation
Chih-Hung Wang
Compilers
References:
1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons.
3. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley (2nd Ed., 2006)
2 Overview
3 Interpretation An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. Two varieties: recursive and iterative.
4 Interpretation
Recursive interpretation: operates directly on the AST (attribute grammar); simple to write; thorough error checks; very slow, roughly 1000 times slower than compiled code.
Iterative interpretation: operates on intermediate code; good error checking; slow, roughly 100 times slower than compiled code.
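A recursive interpreter can be sketched in a few lines: one action per node kind, applied by walking the tree. The tuple-based AST encoding and the node kinds below are assumptions for illustration, not from the slides.

```python
# Hedged sketch of a recursive interpreter over a tiny tuple-encoded AST.
# Node kinds (assumed): ("const", v), ("var", name), ("+", l, r), ("*", l, r).

def interpret(node, env):
    kind = node[0]
    if kind == "const":                 # literal: return its value
        return node[1]
    if kind == "var":                   # variable: look up in the environment
        return env[node[1]]
    if kind == "+":                     # operator nodes: interpret children first
        return interpret(node[1], env) + interpret(node[2], env)
    if kind == "*":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError("unknown node kind: %r" % (kind,))

# The expression 7*(1+5) used later in the slides:
ast = ("*", ("const", 7), ("+", ("const", 1), ("const", 5)))
value = interpret(ast, {})              # evaluates to 42
```

The thorough error checking of a real recursive interpreter would live in the per-kind actions; here only an unknown node kind is reported.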
5 Recursive Interpretation
6 Self-identifying data must handle user-defined data types value = pointer to type descriptor + array of subvalues example: complex number re: 3.0 im: 4.0
7 Complex number representation
8 Iterative interpretation Operates on a threaded AST, with an active-node pointer and a flat loop over a case statement. (figure: threaded AST of IF condition THEN ... ELSE ... FI)
9 Sketch of the main loop
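The main loop can be sketched as a flat loop dispatching on the kind of the active node. The (opcode, argument) list format and the jump convention below are assumptions for illustration; the index pc plays the role of the active-node pointer.

```python
# Hedged sketch of an iterative interpreter's main loop: a flat loop over a
# case statement, advancing an active-node pointer through threaded code.

def run(code):
    stack, pc = [], 0                  # pc: the active-node pointer
    while pc < len(code):
        op, arg = code[pc]
        if op == "push":               # push a constant or value
            stack.append(arg)
        elif op == "add":              # combine the two top elements
            b = stack.pop()
            stack[-1] += b
        elif op == "jump_if_false":    # threaded control flow, e.g. for IF ... FI
            if not stack.pop():
                pc = arg               # follow the thread to the ELSE/FI part
                continue
        pc += 1
    return stack
```

A real iterative interpreter would dispatch on many more node kinds; the shape of the loop is the point here.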
10 Example for demo compiler
11 Code Generation Compilation produces object code from the intermediate code tree through a process called code generation Tree rewriting Replace nodes and subtrees of the AST by target code segments Produce a linear sequence of instructions from the rewritten AST
12 Example of code generation a:=(b[4*c+d]*2)+9;
13 Machine instructions Load_Addr M[Ri], C, Rd Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes Load_Byte (M+Ro)[Ri], C, Rd Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above
14 Two sample instructions with their ASTs
15 Code generation Main issues: Code selection – which template? Register allocation – too few! Instruction ordering Optimal code generation is NP-complete Consider small parts of the AST Simplify target machine Use conventions
16 Object code sequence Load_Byte (b+Rd)[Rc], 4, Rt Load_Addr 9[Rt], 2, Ra
17 Trivial code generation
18 Code for (7*(1+5))
19 Partial evaluation
20 New Code
21 Simple code generation Consider one AST node at a time. Two simplistic target machines: a pure register machine and a pure stack machine (whose stack frame, delimited by BP and SP, holds the variables).
22 Pure stack machine Instructions
23 Example of p:=p+5
Push_Local #p
Push_Const 5
Add_Top2
Store_Local #p
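A minimal simulator for the pure stack machine makes the instruction sequence above concrete. The dict standing in for the stack frame is an assumption for illustration.

```python
# Hedged sketch of a pure stack machine, enough to execute the code for
# p := p + 5. Locals live in `frame`, which stands in for the stack frame.

def run_stack_code(code, frame):
    stack = []
    for op, arg in code:
        if op == "Push_Local":        # push the value of a local variable
            stack.append(frame[arg])
        elif op == "Push_Const":      # push a constant
            stack.append(arg)
        elif op == "Add_Top2":        # replace the two top elements by their sum
            b = stack.pop()
            stack[-1] += b
        elif op == "Mul_Top2":
            b = stack.pop()
            stack[-1] *= b
        elif op == "Sub_Top2":
            b = stack.pop()
            stack[-1] -= b
        elif op == "Store_Local":     # pop into a local variable
            frame[arg] = stack.pop()
    return frame

frame = run_stack_code(
    [("Push_Local", "p"), ("Push_Const", 5),
     ("Add_Top2", None), ("Store_Local", "p")],
    {"p": 3})                          # afterwards frame["p"] == 8
```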
24 Pure register machine Instructions
25 Example of p:=p+5
Load_Mem p, R1
Load_Const 5, R2
Add_Reg R2, R1
Store_Reg R1, p
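The register-machine counterpart can be simulated the same way. The dict standing in for memory and the operand order (Add_Reg R2, R1 adds R2 into R1, as on the slide) are assumptions for illustration.

```python
# Hedged sketch of a pure register machine executing the code for p := p + 5.

def run_register_code(code, mem):
    regs = {}
    for op, a, b in code:
        if op == "Load_Mem":          # load memory location a into register b
            regs[b] = mem[a]
        elif op == "Load_Const":      # load constant a into register b
            regs[b] = a
        elif op == "Add_Reg":         # add register a into register b
            regs[b] += regs[a]
        elif op == "Store_Reg":       # store register a into memory location b
            mem[b] = regs[a]
    return mem

mem = run_register_code(
    [("Load_Mem", "p", "R1"), ("Load_Const", 5, "R2"),
     ("Add_Reg", "R2", "R1"), ("Store_Reg", "R1", "p")],
    {"p": 3})                          # afterwards mem["p"] == 8
```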
26 Simple code generation for a stack machine The AST for b*b – 4 *(a*c)
27 The ASTs for the stack machine instructions
28 The AST for b*b - 4*(a*c) rewritten
29 Simple code generation for a stack machine (demo) Example: b*b - 4*a*c (figure: threaded AST for - with subtrees b*b and 4*(a*c))
30 Simple code generation for a stack machine (demo) Example: b*b - 4*a*c (figure: the threaded AST with each node annotated with its instruction: Push_Local #b, Push_Local #a, Push_Local #c, Push_Const 4, Mul_Top2, Sub_Top2)
31 Simple code generation for a stack machine (demo) Example: b*b - 4*a*c, rewritten AST; the linearized instruction sequence:
Push_Local #b
Push_Local #b
Mul_Top2
Push_Const 4
Push_Local #a
Push_Local #c
Mul_Top2
Mul_Top2
Sub_Top2
32 Depth-first code generation
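Depth-first code generation for the stack machine is a short recursion: emit code for both children, then emit the operator's instruction. The tuple AST encoding is an assumption for illustration.

```python
# Hedged sketch of depth-first code generation for the pure stack machine.

OPS = {"+": "Add_Top2", "-": "Sub_Top2", "*": "Mul_Top2"}

def gen_stack(node, code):
    kind = node[0]
    if kind == "local":                       # leaf: push the variable
        code.append("Push_Local #" + node[1])
    elif kind == "const":                     # leaf: push the constant
        code.append("Push_Const " + str(node[1]))
    else:                                     # operator: children first (depth-first)
        gen_stack(node[1], code)
        gen_stack(node[2], code)
        code.append(OPS[kind])
    return code

# b*b - 4*(a*c), the running example:
ast = ("-", ("*", ("local", "b"), ("local", "b")),
            ("*", ("const", 4), ("*", ("local", "a"), ("local", "c"))))
code = gen_stack(ast, [])
# code == ["Push_Local #b", "Push_Local #b", "Mul_Top2", "Push_Const 4",
#          "Push_Local #a", "Push_Local #c", "Mul_Top2", "Mul_Top2", "Sub_Top2"]
```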
33 Stack configurations
34 Simple code generation for a register machine The ASTs for the register machine instructions
35 Code generation with register allocation
36 Code generation with register numbering
37 Register machine code for b*b - 4*(a*c)
38 Register contents
39 Weighted register allocation It is advantageous to generate the code for the child that requires the most registers first Weight: The number of registers required by a node
40 Register weight of a node
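The register weight of a node (the Sethi-Ullman numbering) follows directly from the rule: a leaf needs one register; an operator node needs the larger child weight if the weights differ, and one more otherwise. The tuple AST encoding is an assumption for illustration.

```python
# Hedged sketch of the register-weight computation for a binary-operator AST.

def weight(node):
    if node[0] in ("local", "const", "var"):  # leaves need one register
        return 1
    wl, wr = weight(node[1]), weight(node[2])
    # Unequal weights: evaluate the heavier child first, reuse its registers.
    # Equal weights: one extra register is needed to hold the first result.
    return max(wl, wr) if wl != wr else wl + 1

ast = ("-", ("*", ("local", "b"), ("local", "b")),
            ("*", ("const", 4), ("*", ("local", "a"), ("local", "c"))))
# weight(ast) == 3: both children of "-" have weight 2, so the root needs 3.
```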
41 AST for b*b-4*(a*c) with register weights
42 Weighted register machine code
43 Example (table: for each parameter number N, its stored weight and the registers occupied when starting parameter N; the maximum per parameter and the overall maximum of 4)
44 Example: Tree representation
45 Register spilling Too few registers? Spill register contents to memory, to be retrieved later. Heuristic: select a subtree that uses all registers, and replace it by a temporary. Example: b*b - 4*a*c with 2 registers (figure: the subtree b*b, of weight 2, is replaced by a temporary T1 of weight 1)
46 Register spilling
Load_Mem b, R1
Load_Mem b, R2
Mul_Reg R2, R1
Store_Mem R1, T1
Load_Mem a, R1
Load_Mem c, R2
Mul_Reg R2, R1
Load_Const 4, R2
Mul_Reg R1, R2
Load_Mem T1, R1
Sub_Reg R2, R1
47 Another example (figure: the tree for b*b - 4*(a*c) with the spilled subtree replaced by the temporary T1)
48 Algorithm
49 Machines with register-memory operations An instruction: Add_Mem X, R1 Adding the contents of memory location X to R1
50 Register-weighted tree for a memory-register machine
51 Code generation for basic blocks Finding the optimal rewriting of the AST with available instruction templates is NP-complete. Three techniques Basic blocks Bottom-up tree rewriting Register allocation by graph coloring
52 Basic block Improve quality of code emitted by simple code generation Consider multiple AST nodes at a time Generate code for maximal basic blocks that cannot be extended by including adjacent AST nodes basic block: a part of the control graph that contains no splits (jumps) or combines (labels)
53 Example of basic block A basic block consists of expressions and assignments Fixed sequence (;) limits code generation An AST is too restrictive
54 From AST to dependency graph AST for the simple basic block
55 Simple algorithm to convert an AST to a data dependency graph
1. Replace arcs by downward arrows (upward for the destination under an assignment)
2. Insert data dependencies from each use of a variable V to the preceding assignment to V
3. Insert data dependencies from each assignment to a variable V to the previous assignment to V
4. Add roots to the graph (the output variables)
5. Remove the ;-nodes and their connecting arrows
56 Simple data dependency graph
57 Cleaned-up graph
58 Exercise
{ int n; n = a+1; x = (b+c) * n; n = n+1; y = (b+c) * n; }
Convert the above code to a data dependency graph
59 Answer (figure: data dependency graph in which a+1 feeds the first use of n, the subexpression b+c is shared by both multiplications, and the incremented n feeds the multiplication for y; roots x and y)
60 Common subexpression elimination Simple example:
x = a*a + 2*a*b + b*b;
y = a*a - 2*a*b + b*b;
Three common subexpressions:
double quads = a*a + b*b;
double cross_prod = 2*a*b;
x = quads + cross_prod;
y = quads - cross_prod;
61 Common subexpressions Equal subexpressions in a basic block are not necessarily common subexpressions:
x = a*a + 2*a*b + b*b;
a = b = 0;
y = a*a - 2*a*b + b*b;
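One standard way to detect common subexpressions while respecting intervening assignments is value numbering. This sketch, with simple triples as an assumed input form, gives the two occurrences of a*a before an assignment to a the same number, and a fresh number afterwards, as required by the caveat above.

```python
# Hedged sketch of value numbering within a basic block. Input: a list of
# triples (dest, op, operand1, operand2). Equal (op, number, number) keys get
# equal value numbers; an assignment to a variable rebinds its number, so
# textually equal expressions across it are NOT common subexpressions.

def value_number(block):
    numbers, counter, out = {}, [0], []
    def num_of(key):
        if key not in numbers:
            numbers[key] = counter[0]
            counter[0] += 1
        return numbers[key]
    for dest, op, a, b in block:
        n = num_of((op, num_of(a), num_of(b)))
        out.append((dest, n))
        numbers[dest] = n              # dest now names this value
    return out

out = value_number([
    ("t1", "*", "a", "a"),             # a*a
    ("t2", "*", "a", "a"),             # same number as t1: common
    ("a",  "+", "b", "c"),             # a is reassigned
    ("t3", "*", "a", "a"),             # fresh number: NOT common with t1
])
```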
62 Common subexpression example (1/3)
63 Common subexpression example (2/3)
64 Common subexpression example (3/3)
65 From dependency graph to code Rewrite the nodes with machine instruction templates, and linearize the result. Instruction ordering: ladder sequences. Register allocation: graph coloring.
66 Linearization of the data dependency graph Example: (a+b)*c - d
Definition of a ladder sequence:
Each root node is a ladder sequence
A ladder sequence S ending in operator node N can be extended with the left operand of N
If operator N is commutative, then S may also be extended with the right operand of N
Load_Mem a, R1
Add_Mem b, R1
Mul_Mem c, R1
Sub_Mem d, R1
67 Code generated for a given ladder sequence
Load_Mem b, R1
Add_Reg I1, R1
Add_Mem c, R1
Store_Reg R1, x
68 Heuristic ordering algorithm To delay the issues of register allocation, use pseudo-registers during the linearization.
1. Select a ladder sequence S that has no more than one incoming dependency
2. Introduce temporary (pseudo-)registers for the non-leaf operands, which become additional roots
3. Generate code for S, using R1 as the ladder register
4. Remove S from the graph
5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code
69 Example of linearization X1
70 The code for y, *, +
Load_Reg X1, R1
Add_Const 1, R1
Mul_Mem d, R1
Store_Reg R1, y
71 Remove the ladder sequence y, *, +
72 The code for x, +, +, *
Load_Reg X1, R1
Mul_Reg X1, R1
Add_Mem b, R1
Add_Mem c, R1
Store_Reg R1, x
73 The last step
Load_Mem a, R1
Add_Const 1, R1
Load_Reg R1, X1
74 The results of code generation
75 Exercise Generate code for the following dependency graph (figure: the shared subexpressions a*a, 2*a*b, and b*b feed both x, computed with two additions, and y, computed with a subtraction and an addition)
76 Answers
1) ladder x, +, +:
Load_Reg R2, R1
Add_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, x
2) ladder y, +, -:
Load_Reg R2, R1
Sub_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, y
3) ladder R3, *, *:
Load_Const 2, R1
Mul_Reg Ra, R1
Mul_Reg Rb, R1
Load_Reg R1, R3
4) ladder R2, *:
Load_Reg Ra, R1
Mul_Reg Ra, R1
Load_Reg R1, R2
5) ladder R4, *:
Load_Reg Rb, R1
Mul_Reg Rb, R1
Load_Reg R1, R4
77 Register allocation for the linearized code Map the pseudo-registers to memory locations or to real registers (as done, for example, in the gcc compiler).
78 Code optimization in the presence of pointers Pointers cause two different problems for the dependency graph:
a = x * y; *p = 3; b = x * y;
x * y is not a common subexpression if p happens to point to x or y.
a = *p * q; b = 3; c = *p * q;
*p * q is not a common subexpression if p happens to point to b.
79 Example (1/4) Assignment under a pointer
80 Example (2/4) Data dependency graph with an assignment under a pointer
81 Example (3/4) Cleaned-up graph
82 Example (4/4) Target code *x:=R1
83 BURS code generation In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized. Machines often have several hundred different machine instructions, often each with ten or more addressing modes; it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.
84 BURS code generation Simple instruction patterns (1/2)
85 BURS code generation Simple instruction patterns (2/2)
86 Example: Input tree
87 Naïve rewrite; its cost is 17 units
88 Code resulting
89 Top-down largest-fit rewrite
90 Discussion How do we find all possible rewrites, and how do we represent them? It will be clear that we do not fancy listing them all! How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?
91 Bottom-up pattern matching The dotted trees
92 Outline code for bottom-up pattern matching
93 Label set resulting
94 Instruction selection by dynamic programming Bottom-up pattern matching with costs (labels such as #5->reg, #6->reg, #7.1, #8.1), followed by instruction selection
95 Cost evaluation Lower *: (1+3+4); higher *: (1+7+4) or (1+3+5); top +: left as an exercise
96 Code generation by bottom-up matching
97 Code generation by bottom-up matching, using commutativity
98 Pattern matching and instruction selection combined Two basic operands State S1: -> State S2: ->
99 States of the BURS
100 Creating the cost-conscious next-state table
The triplet {'+', S1, S1} = S3, with cost (1+1+1)
{'+', S1, S2} = S5
Exercise: {'+', S1, S5}
Exercise: {'*', S1, S2}
101 Cost-conscious next-state table
102 Code generation using the cost-conscious next-state table
103 Register allocation by graph coloring Procedure-wide register allocation. Only live variables require register storage. Two variables (values) interfere when their live ranges overlap. Dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead.
104 A program segment for live analysis
105 Live range of the variables
106 Graph coloring An NP-complete problem. Heuristic: color easy nodes last.
1. Find the node N with the lowest degree
2. Remove N from the graph
3. Color the simplified graph
4. Set the color of N to the first color that is not used by any of N's neighbors
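The four steps above can be sketched directly: remove lowest-degree nodes first, then color in reverse removal order. The adjacency-dict representation is an assumption for illustration.

```python
# Hedged sketch of the graph-coloring heuristic: repeatedly remove a node of
# lowest remaining degree, then color nodes in reverse removal order with the
# first color not used by an already-colored neighbor.

def color_graph(adj):
    order, work = [], {v: set(ns) for v, ns in adj.items()}
    while work:
        u = min(work, key=lambda n: len(work[n] & work.keys()))  # lowest degree
        order.append(u)
        del work[u]                    # remove N from the graph
    colors = {}
    for v in reversed(order):          # color easy nodes last
        used = {colors[n] for n in adj[v] if n in colors}
        c = 0
        while c in used:               # first color not used by a neighbor
            c += 1
        colors[v] = c
    return colors

# Triangle a-b-c plus a pendant node d: three colors suffice.
colors = color_graph({"a": {"b", "c"}, "b": {"a", "c"},
                      "c": {"a", "b", "d"}, "d": {"c"}})
```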
107 Coloring process 3 registers
108 Preprocessing the intermediate code Preprocessing of expressions:
char lower_case_from_capital(char ch) { return ch + ('a' - 'A'); }
Constant expression evaluation:
char lower_case_from_capital(char ch) { return ch + 32; }
109 Arithmetic simplification Transformations that replace an operation by a simpler one are called strength reductions. Operations that can be removed completely are called null sequences.
110 Some transformations for arithmetic simplification
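Two of the classic transformations can be sketched as a bottom-up tree rewrite; the tuple expression encoding and the particular rules shown are assumptions for illustration.

```python
# Hedged sketch of arithmetic simplification on tuple-encoded expressions:
# strength reduction (replace an operation by a cheaper one) and null
# sequences (remove the operation entirely).

def simplify(node):
    if not isinstance(node, tuple):    # leaf: variable name or constant
        return node
    op, a, b = node[0], simplify(node[1]), simplify(node[2])
    if op == "*" and b == 2:
        return ("<<", a, 1)            # strength reduction: x*2 -> x<<1
    if op == "*" and b == 1:
        return a                       # null sequence: x*1 -> x
    if op == "+" and b == 0:
        return a                       # null sequence: x+0 -> x
    return (op, a, b)
```

For example, simplify(("+", ("*", "x", 2), 0)) yields ("<<", "x", 1): the addition of 0 disappears and the multiplication is reduced to a shift.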
111 Preprocessing of if-statements and goto statements When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination. If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.
112 Stack representations
113 Stack representations (details) (figure: the stack representations of the variables x and y traced through IF condition THEN ... ELSE ... FI, including the assignment x = 7, a branch detected as dead code, and the merge at FI)
114 Preprocessing of routines In-lining method
115 In-lining result Advanced example:
{int n=3; printf("square=%d\n", n*n);}
=> {int n=3; printf("square=%d\n", 3*3);}
=> {int n=3; printf("square=%d\n", 9);}
Load_par "square=%d\n"
Load_par 9
Call printf
116 Cloning Example:
double power_series(int n, double a[], double x) {
  int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p] * (x**p);
  return result;
}
When the routine is called with x set to 1.0, it can be cloned and specialized:
double power_series(int n, double a[]) {
  int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p] * (1.0**p);
  return result;
}
double power_series(int n, double a[]) {
  int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p];
  return result;
}
(x**p is pseudo-code for x to the power p.)
117 Postprocessing the target code Stupid instruction sequences:
Load_Reg R1, R2
Load_Reg R2, R1
or
Store_Reg R1, n
Load_Mem n, R1
118 Creating replacement patterns Examples:
Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb
Load_Const 1, Ra; Add_Reg Rb, Rc | Ra=Rb, is_last_use(Rb) => Increment Rc
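Such replacement patterns are applied by sliding a window over the instruction list. The tuple instruction encoding below is an assumption for illustration; the two patterns match the "stupid sequences" shown above.

```python
# Hedged sketch of a peephole postprocessor: a two-instruction window with
# two replacement patterns from the slides.

def peephole(code):
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code):
            (op1, a1, b1), (op2, a2, b2) = code[i], code[i + 1]
            # Load_Reg Ra,Rb; Load_Reg Rc,Rd | Ra=Rd, Rb=Rc => Load_Reg Ra,Rb
            if op1 == "Load_Reg" and op2 == "Load_Reg" and a1 == b2 and b1 == a2:
                out.append(code[i])
                i += 2
                continue
            # Store_Reg R,n; Load_Mem n,R => Store_Reg R,n
            if op1 == "Store_Reg" and op2 == "Load_Mem" and a1 == b2 and b1 == a2:
                out.append(code[i])
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out
```

A real postprocessor would keep many patterns and, as slide 119 notes, locate them with an FSA built from dotted items rather than with this naive window.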
119 Locating and replacing instructions Multiple pattern matching Using FSA Dotted items
120 Homework Study sections: Machine code generation; 4.3 Assemblers, linkers, and loaders