
1 7. Code Generation Chih-Hung Wang Compilers References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986 (2nd ed., 2006).

2 2 Overview

3 3 Interpretation An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. Two varieties: recursive and iterative.

4 4 Interpretation Recursive interpretation: operates directly on the AST [attribute grammar]; simple to write; thorough error checks; very slow, roughly 1000x slower than compiled code. Iterative interpretation: operates on intermediate code; good error checking; slow, roughly 100x slower than compiled code.

5 5 Recursive Interpretation
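To make the recursive variety concrete, here is a minimal C sketch, assuming a toy AST with only integer constants and the binary operators '+' and '*'; the node layout and helper names are illustrative, not those of the book's demo compiler.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy AST: integer constants and binary operators only (illustrative layout). */
    typedef struct Node {
        char op;                  /* 'c' = constant, '+' or '*' = operator */
        int value;                /* used when op == 'c' */
        struct Node *left, *right;
    } Node;

    /* Recursive interpretation: visit the children first, then perform the
       action the language semantics prescribes for the node itself. */
    static int interpret(const Node *n) {
        switch (n->op) {
        case 'c': return n->value;
        case '+': return interpret(n->left) + interpret(n->right);
        case '*': return interpret(n->left) * interpret(n->right);
        default:  fprintf(stderr, "unknown node\n"); exit(1);
        }
    }

    static Node *leaf(int v) {
        Node *n = calloc(1, sizeof *n); n->op = 'c'; n->value = v; return n;
    }
    static Node *bin(char op, Node *l, Node *r) {
        Node *n = calloc(1, sizeof *n); n->op = op; n->left = l; n->right = r; return n;
    }

    int main(void) {
        Node *e = bin('*', leaf(7), bin('+', leaf(1), leaf(5)));   /* 7 * (1 + 5) */
        printf("%d\n", interpret(e));                              /* prints 42 */
        return 0;
    }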

6 6 Self-identifying data Must handle user-defined data types. Value = pointer to type descriptor + array of subvalues. Example: a complex number with re: 3.0 and im: 4.0.
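A possible C layout for such self-identifying values, assuming a hypothetical type_descriptor record; all names here are illustrative, not the book's.

    #include <stdio.h>
    #include <stdlib.h>

    /* A value is a pointer to a type descriptor plus an array of subvalues;
       a basic value stores its bits directly instead (illustrative layout). */
    typedef struct value value;
    typedef struct type_descriptor {
        const char  *name;          /* e.g. "real" or "complex_number" */
        int          n_fields;      /* 0 for basic types */
        const char **field_names;   /* e.g. {"re", "im"} */
    } type_descriptor;

    struct value {
        const type_descriptor *type;
        union {
            double  basic;          /* payload of a basic value */
            value **sub;            /* n_fields subvalues of a composite value */
        } u;
    };

    static const type_descriptor real_type    = { "real", 0, NULL };
    static const char *complex_fields[]       = { "re", "im" };
    static const type_descriptor complex_type = { "complex_number", 2, complex_fields };

    static value *make_real(double d) {
        value *v = malloc(sizeof *v);
        v->type = &real_type; v->u.basic = d; return v;
    }
    static value *make_complex(double re, double im) {
        value *v = malloc(sizeof *v);
        v->type = &complex_type;
        v->u.sub = malloc(2 * sizeof *v->u.sub);
        v->u.sub[0] = make_real(re); v->u.sub[1] = make_real(im);
        return v;
    }

    int main(void) {
        value *z = make_complex(3.0, 4.0);       /* the complex number from the slide */
        printf("%s: %s = %g, %s = %g\n", z->type->name,
               z->type->field_names[0], z->u.sub[0]->u.basic,
               z->type->field_names[1], z->u.sub[1]->u.basic);
        return 0;
    }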

7 7 Complex number representation

8 8 Iterative interpretation Operates on a threaded AST. Active node pointer. Flat loop over a case statement. (figure: threaded AST for IF condition THEN ... ELSE ... FI)

9 9 Sketch of the main loop
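A minimal sketch of that main loop in C, assuming a threaded AST in which every node carries a pointer to its successor; the node kinds and actions are illustrative, not those of the demo compiler.

    #include <stdio.h>

    /* Threaded-AST node: each node knows which node to visit next. */
    typedef enum { K_PUSH_CONST, K_ADD, K_PRINT, K_HALT } kind;
    typedef struct node {
        kind         k;
        int          value;       /* for K_PUSH_CONST */
        struct node *successor;   /* the thread: next node in execution order */
    } node;

    static int stack[64];
    static int sp = 0;

    /* Iterative interpretation: one flat loop, one case per node kind,
       driven by the active-node pointer. */
    static void run(node *active_node) {
        while (active_node != NULL) {
            switch (active_node->k) {
            case K_PUSH_CONST: stack[sp++] = active_node->value;     break;
            case K_ADD:        sp--; stack[sp-1] += stack[sp];       break;
            case K_PRINT:      printf("%d\n", stack[--sp]);          break;
            case K_HALT:       return;
            }
            active_node = active_node->successor;
        }
    }

    int main(void) {
        /* Thread for "print (1 + 5)". */
        node n4 = { K_HALT, 0, NULL };
        node n3 = { K_PRINT, 0, &n4 };
        node n2 = { K_ADD, 0, &n3 };
        node n1 = { K_PUSH_CONST, 5, &n2 };
        node n0 = { K_PUSH_CONST, 1, &n1 };
        run(&n0);                  /* prints 6 */
        return 0;
    }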

10 10 Example for demo compiler

11 11 Code Generation Compilation produces object code from the intermediate code tree through a process called code generation Tree rewriting Replace nodes and subtrees of the AST by target code segments Produce a linear sequence of instructions from the rewritten AST

12 12 Example of code generation a:=(b[4*c+d]*2)+9;

13 13 Machine instructions Load_Addr M[Ri], C, Rd Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes Load_Byte (M+Ro)[Ri], C, Rd Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above

14 14 Two sample instructions with their ASTs

15 15 Code generation Main issues: Code selection – which template? Register allocation – too few! Instruction ordering Optimal code generation is NP-complete Consider small parts of the AST Simplify target machine Use conventions

16 16 Object code sequence Load_Byte (b+Rd)[Rc], 4, Rt Load_Addr 9[Rt], 2, Ra
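Read as C, and assuming registers Rc and Rd already hold the values of c and d and that b is an array of byte-sized elements, the two instructions compute roughly the following; this is a sketch of the address arithmetic only, with illustrative names.

    #include <stdio.h>

    /* Sketch of what the two-instruction sequence computes. */
    static unsigned long gen_a(const unsigned char *b, long Rc, long Rd) {
        /* Load_Byte (b+Rd)[Rc], 4, Rt : byte at b + Rd + 4*Rc, i.e. b[4*c + d] */
        unsigned char Rt = b[Rd + 4 * Rc];
        /* Load_Addr 9[Rt], 2, Ra : "address" 9 + 2*Rt = b[4*c+d]*2 + 9;
           the address computation performs the multiply and add for free.     */
        unsigned long Ra = 9 + 2UL * Rt;
        return Ra;                      /* the value assigned to a */
    }

    int main(void) {
        unsigned char b[16] = { 0 }; b[6] = 10;      /* b[4*1 + 2] = 10 */
        printf("%lu\n", gen_a(b, 1, 2));             /* prints 29 = 10*2 + 9 */
        return 0;
    }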

17 17 Trivial code generation

18 18 Code for (7*(1+5))

19 19 Partial evaluation

20 20 New Code

21 21 Simple code generation Consider one AST node at a time. Two simplistic target machines: a pure register machine and a pure stack machine. (figure: stack frame with variables, addressed via the BP and SP registers)

22 22 Pure stack machine Instructions

23 23 Example of p:=p+5 Push_Local #p Push_Const 5 Add_Top2 Store_Local #p
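A minimal C sketch of such a pure stack machine, assuming the local variables live in the current frame at known offsets; the instruction names follow the slides, everything else is illustrative.

    #include <stdio.h>

    /* Pure stack machine: all operands live on the working stack. */
    static int stack[64], frame[16];     /* working stack and local frame */
    static int sp = 0;

    static void Push_Const(int c)  { stack[sp++] = c; }
    static void Push_Local(int v)  { stack[sp++] = frame[v]; }
    static void Add_Top2(void)     { sp--; stack[sp-1] += stack[sp]; }
    static void Store_Local(int v) { frame[v] = stack[--sp]; }

    int main(void) {
        enum { P = 0 };                  /* #p: offset of p in the frame */
        frame[P] = 3;

        /* p := p + 5 */
        Push_Local(P);
        Push_Const(5);
        Add_Top2();
        Store_Local(P);

        printf("p = %d\n", frame[P]);    /* prints p = 8 */
        return 0;
    }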

24 24 Pure register machine Instructions

25 25 Example of p:=p+5 Load_Mem p, R1 Load_Const 5, R2 Add_Reg R2, R1 Store_Reg R1, p

26 26 Simple code generation for a stack machine The AST for b*b – 4 *(a*c)

27 27 The ASTs for the stack machine instructions

28 28 The AST for b*b - 4*(a*c) rewritten

29 29 Simple code generation for a stack machine (demo) Example: b*b - 4*(a*c) as a threaded AST. (figure: the expression AST with the traversal thread added)

30 30 Simple code generation for a stack machine (demo) Example: b*b - 4*(a*c), threaded AST with the instruction chosen for each node. (figure: Push_Local #b, Push_Local #a, Push_Local #c and Push_Const 4 at the leaves, Mul_Top2 at the * nodes, Sub_Top2 at the root)

31 31 Simple code generation for a stack machine (demo) Example: b*b - 4*(a*c), rewritten AST, yielding the linearized code Push_Local #b Push_Local #b Mul_Top2 Push_Const 4 Push_Local #a Push_Local #c Mul_Top2 Mul_Top2 Sub_Top2

32 32 Depth-first code generation
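A minimal sketch of this depth-first rewriting for the stack machine, in C; the instruction names follow the slides, while the AST layout and helpers are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node {
        char kind;                /* 'v' variable, 'c' constant, '+', '-', '*' */
        const char *name;         /* for 'v' */
        int value;                /* for 'c' */
        struct Node *left, *right;
    } Node;

    /* Depth-first code generation: emit code for both children, then the
       instruction that combines the two top-of-stack values. */
    static void gen(const Node *n) {
        switch (n->kind) {
        case 'v': printf("Push_Local #%s\n", n->name); return;
        case 'c': printf("Push_Const %d\n", n->value); return;
        default:
            gen(n->left);
            gen(n->right);
            printf(n->kind == '+' ? "Add_Top2\n" :
                   n->kind == '-' ? "Sub_Top2\n" : "Mul_Top2\n");
        }
    }

    static Node *var(const char *s) { Node *n = calloc(1, sizeof *n); n->kind = 'v'; n->name = s; return n; }
    static Node *cst(int v)         { Node *n = calloc(1, sizeof *n); n->kind = 'c'; n->value = v; return n; }
    static Node *op(char k, Node *l, Node *r) { Node *n = calloc(1, sizeof *n); n->kind = k; n->left = l; n->right = r; return n; }

    int main(void) {
        /* b*b - 4*(a*c); prints the instruction sequence of slide 31 */
        Node *e = op('-', op('*', var("b"), var("b")),
                          op('*', cst(4), op('*', var("a"), var("c"))));
        gen(e);
        return 0;
    }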

33 33 Stack configurations

34 34 Simple code generation for a register machine The ASTs for the register machine instructions

35 35 Code generation with register allocation

36 36 Code generation with register numbering

37 37 Register machine code for b*b - 4*(a*c)

38 38 Register contents

39 39 Weighted register allocation It is advantageous to generate the code for the child that requires the most registers first Weight: The number of registers required by a node

40 40 Register weight of a node
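A sketch of that weight computation in C (this is Sethi-Ullman numbering), assuming a two-register-operand machine; the AST layout is an illustrative assumption.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node {
        char kind;                          /* 'v' or 'c' leaf; '+', '-', '*' operator */
        struct Node *left, *right;
    } Node;

    /* Register weight: a leaf needs one register; an operator node needs the
       larger of the two child weights when they differ (generate the heavier
       child first), and one register more when they are equal. */
    static int weight(const Node *n) {
        if (n->kind == 'v' || n->kind == 'c') return 1;
        int wl = weight(n->left), wr = weight(n->right);
        if (wl == wr) return wl + 1;
        return wl > wr ? wl : wr;
    }

    static Node *leaf(void) { Node *n = calloc(1, sizeof *n); n->kind = 'v'; return n; }
    static Node *op(char k, Node *l, Node *r) { Node *n = calloc(1, sizeof *n); n->kind = k; n->left = l; n->right = r; return n; }

    int main(void) {
        /* b*b - 4*(a*c): weight 3, so three registers suffice without spilling */
        Node *e = op('-', op('*', leaf(), leaf()),
                          op('*', leaf(), op('*', leaf(), leaf())));
        printf("weight = %d\n", weight(e));   /* prints weight = 3 */
        return 0;
    }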

41 41 AST for b*b-4*(a*c) with register weights

42 42 Weighted register machine code

43 43 Example
Parameter number N                             2   3   1
Stored weight                                  4   2   1
Registers occupied when starting parameter N   0   1   2
Maximum per parameter                          4   3   3
Overall maximum                                4

44 44 Example: Tree representation

45 45 Register spilling Too few registers? Spill registers to memory, to be retrieved later. Heuristic: select a subtree that uses all available registers and replace it by a temporary. Example: b*b - 4*(a*c) with 2 registers. (figure: weighted AST in which the subtree b*b is evaluated first and spilled to temporary T1)

46 46 Register spilling (figure: the weighted AST with the subtree b*b replaced by temporary T1) Resulting code: Load_Mem b, R1 Load_Mem b, R2 Mul_Reg R2, R1 Store_Mem R1, T1 Load_Mem a, R1 Load_Mem c, R2 Mul_Reg R2, R1 Load_Const 4, R2 Mul_Reg R1, R2 Load_Mem T1, R1 Sub_Reg R2, R1

47 47 Another example (figure: the weighted AST with a different subtree spilled to temporary T1)

48 48 Algorithm

49 49 Machines with register-memory operations An instruction: Add_Mem X, R1 adds the contents of memory location X to R1.

50 50 Register-weighted tree for a memory-register machine

51 51 Code generation for basic blocks Finding the optimal rewriting of the AST with available instruction templates is NP-complete. Three techniques Basic blocks Bottom-up tree rewriting Register allocation by graph coloring

52 52 Basic block Improve quality of code emitted by simple code generation Consider multiple AST nodes at a time Generate code for maximal basic blocks that cannot be extended by including adjacent AST nodes basic block: a part of the control graph that contains no splits (jumps) or combines (labels)

53 53 Example of basic block A basic block consists of expressions and assignments Fixed sequence (;) limits code generation An AST is too restrictive

54 54 From AST to dependency graph AST for the simple basic block

55 55 Simple algorithm to convert an AST to a data dependency graph: (1) replace the arcs by downward arrows (upward for a destination under an assignment); (2) insert data dependencies from each use of a variable V to the preceding assignment to V; (3) insert data dependencies from each assignment to a variable V to the previous assignment to V; (4) add roots to the graph for the output variables; (5) remove the ;-nodes and their connecting arrows.

56 56 Simple data dependency graph

57 57 Cleaned-up graph

58 58 Exercise { int n; n = a+1; x = (b+c) * n; n = n+1; y = (b+c) * n; } Convert the above codes to a data dependency graph

59 59 Answer (figure: data dependency graph; the subexpression b+c is shared by both multiplications, a+1 feeds the multiplication for x, and a+1 incremented by 1 feeds the multiplication for y)

60 60 Common subexpression elimination Simple example: x = a*a + 2*a*b + b*b; y = a*a - 2*a*b + b*b; Three common subexpressions: double quads = a*a + b*b; double cross_prod = 2*a*b; x = quads + cross_prod; y = quads - cross_prod;

61 61 Common subexpression Equal subexpressions in a basic block are not necessarily common subexpressions: x = a*a + 2*a*b + b*b; a = b = 0; y = a*a - 2*a*b + b*b;

62 62 Common subexpression example (1/3)

63 63 Common subexpression example (2/3)

64 64 Common subexpression example (3/3)

65 65 From dependency graph to code Rewrite nodes with machine instruction templates, and linearize the result. Instruction ordering: ladder sequences. Register allocation: graph coloring.

66 66 Linearization of the data dependency graph Example: (a+b)*c - d Definition of a ladder sequence: Each root node is a ladder sequence. A ladder sequence S ending in operator node N can be extended with the left operand of N. If operator N is commutative, then S may also be extended with the right operand of N. Load_Mem a, R1 Add_Mem b, R1 Mul_Mem c, R1 Sub_Mem d, R1

67 67 Code generated for a given ladder sequence Load_Mem b, R1 Add_Reg I1, R1 Add_Mem c, R1 Store_Reg R1, x

68 68 Heuristic ordering algorithm To delay the issue of register allocation, use pseudo-registers during the linearization. (1) Select a ladder sequence S with no more than one incoming dependency. (2) Introduce temporary (pseudo-)registers for non-leaf operands, which become additional roots. (3) Generate code for S, using R1 as the ladder register. (4) Remove S from the graph. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code.

69 69 Example of linearization X1

70 70 The code for y, *, + Load_Reg X1, R1 Add_Const 1, R1 Mul_Mem d, R1 Store_Reg R1, y

71 71 Remove the ladder sequence y, *, +

72 72 The code for x, +, +, * Load_Reg X1, R1 Mul_Reg X1, R1 Add_Mem b, R1 Add_Mem c, R1 Store_Reg R1, x

73 73 The Last step Load_Mem a, R1 Add_Const 1, R1 Load_Reg R1, X1

74 74 The results of code generation

75 75 Exercise Generate code for the following dependency graph. (figure: dependency graph for x = a*a + 2*a*b + b*b and y = a*a - 2*a*b + b*b, with the shared subexpressions a*a, 2*a*b and b*b)

76 76 Answers (Ra and Rb hold a and b; R2, R3 and R4 are pseudo-registers for the shared subexpressions.) 1) ladder: x, +, + Load_Reg R2, R1 Add_Reg R3, R1 Add_Reg R4, R1 Store_Mem R1, x 2) ladder: y, +, - Load_Reg R2, R1 Sub_Reg R3, R1 Add_Reg R4, R1 Store_Mem R1, y 3) ladder: R3, *, * Load_Const 2, R1 Mul_Reg Ra, R1 Mul_Reg Rb, R1 Load_Reg R1, R3 4) ladder: R2, * Load_Reg Ra, R1 Mul_Reg Ra, R1 Load_Reg R1, R2 5) ladder: R4, * Load_Reg Rb, R1 Mul_Reg Rb, R1 Load_Reg R1, R4

77 77 Register allocation for the linearized code Map the pseudo-registers to memory locations or real registers, as done for example in the gcc compiler.

78 78 Code optimization in the presence of pointers Pointers cause two different problems for the dependency graph. First: a = x * y; *p = 3; b = x * y; here x * y is not a common subexpression if p happens to point to x or y. Second: a = *p * y; b = 3; c = *p * q; here *p is not a common subexpression if p happens to point to b.

79 79 Example (1/4) Assignment under a pointer

80 80 Example (2/4) Data dependency graph with an assignment under a pointer

81 81 Example (3/4) Cleaned-up graph

82 82 Example (4/4) Target code *x:=R1

83 83 BURS code generation In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized. Machines often have several hundred different machine instructions, often each with ten or more addressing modes, and it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.

84 84 BURS code generation Simple instruction patterns (1/2)

85 85 BURS code generation Simple instruction patterns (2/2)

86 86 Example: Input tree

87 87 Naïve rewrite Its cost is 17 units: 1 + 3 + 4 + 1 + 4 + 3 + 1 = 17

88 88 Code resulting

89 89 Top-down largest-fit rewrite

90 90 Discussions How do we find all possible rewrites, and how do we represent them? It will be clear that we do not fancy listing them all!! How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated.

91 91 Bottom-up pattern matching The dotted trees

92 92 Outline code for bottom-up pattern matching

93 93 Label set resulting

94 94 Instruction selection by dynamic programming Bottom-up pattern matching with costs: #5->reg #6->reg #7.1 #8.1 Instruction selection

95 95 Cost evaluation Lower * #5->reg@7 #6->reg@8 (1+3+4) Higher * #6->reg@12 (1+7+4) #8->reg@9 (1+3+5) Top + (?) Exercise

96 96 Code generation by bottom-up matching

97 97 Code generation by bottom-up matching, using commutativity

98 98 Pattern matching and instruction selection combined Two basic operands State S1: -> cst@0 #1->reg@1 State S2: -> mem@0 #2->reg@3

99 99 States of the BURS

100 100 Creating the cost-conscious next-state table The triplet {'+', S1, S1} = S3, with S3: #4->reg@3 (1+1+1). {'+', S1, S2} = S5, with S5: #3->reg@1+0+3=4, #4->reg@1+3+1=5. Exercise: {'+', S1, S5}. Exercise: {'*', S1, S2}: #5->reg@1+0+6=7 (4), #6->reg@1+3+4=8, #7.1@0+3+0=3 (0), #8.1@0+3+0=3 (0).

101 101 Cost conscious next table

102 102 Code generation using cost-conscious next-state table

103 103 Register allocation by graph coloring Procedure-wide register allocation. Only live variables require register storage. Two variables (values) interfere when their live ranges overlap. Dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead.

104 104 A program segment for live analysis

105 105 Live range of the variables

106 106 Graph coloring NP-complete problem. Heuristic: color easy nodes last. Find the node N with the lowest degree. Remove N from the graph. Color the simplified graph. Set the color of N to the first color that is not used by any of N's neighbors.

107 107 Coloring process 3 registers
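A minimal C sketch of the heuristic described above, for an interference graph given as an adjacency matrix; the graph, the node names and the register count are illustrative assumptions.

    #include <stdio.h>

    #define N 5   /* number of live ranges (nodes of the interference graph) */

    /* Remove the lowest-degree node, color the simplified graph recursively,
       then give the removed node the first color its neighbors do not use. */
    static void color_graph(int adj[N][N], int removed[N], int color[N]) {
        int best = -1, best_degree = 0;
        for (int i = 0; i < N; i++) {
            if (removed[i]) continue;
            int d = 0;
            for (int j = 0; j < N; j++)
                if (!removed[j] && adj[i][j]) d++;
            if (best < 0 || d < best_degree) { best = i; best_degree = d; }
        }
        if (best < 0) return;                 /* graph is empty: done */

        removed[best] = 1;
        color_graph(adj, removed, color);     /* color the simplified graph */

        for (int c = 0; ; c++) {              /* first color unused by neighbors */
            int taken = 0;
            for (int j = 0; j < N; j++)
                if (adj[best][j] && !removed[j] && color[j] == c) taken = 1;
            if (!taken) { color[best] = c; break; }
        }
        removed[best] = 0;                    /* put the node back, now colored */
    }

    int main(void) {
        /* illustrative interference graph over live ranges a..e */
        int adj[N][N] = {
            {0,1,1,0,0}, {1,0,1,1,0}, {1,1,0,1,1}, {0,1,1,0,1}, {0,0,1,1,0}
        };
        int removed[N] = {0}, color[N];
        for (int i = 0; i < N; i++) color[i] = -1;
        color_graph(adj, removed, color);
        for (int i = 0; i < N; i++)           /* three registers suffice here */
            printf("%c -> R%d\n", 'a' + i, color[i] + 1);
        return 0;
    }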

108 108 Preprocessing the intermediate code Preprocessing of expressions: char lower_case_from_capital(char ch) { return ch + ('a' - 'A'); } Constant expression evaluation: char lower_case_from_capital(char ch) { return ch + 32; }

109 109 Arithmetic simplification Transformations that replace an operation by a simpler one are called strength reductions. Operations that can be removed completely are called null sequences.

110 110 Some transformations for arithmetic simplification
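A minimal C sketch of a few such rewrites on a binary expression node; the node layout is an illustrative assumption and only three of the table's transformations are shown.

    #include <stdio.h>

    typedef struct Expr {
        char op;                    /* 'c' constant, 'v' variable, '+' or '*' */
        int  value;                 /* for 'c' */
        struct Expr *left, *right;
    } Expr;

    static Expr nodes[16];
    static int  nn = 0;
    static Expr *mk(char op, int v, Expr *l, Expr *r) {
        Expr *e = &nodes[nn++]; e->op = op; e->value = v; e->left = l; e->right = r;
        return e;
    }

    static int is_const(const Expr *e, int v) { return e && e->op == 'c' && e->value == v; }

    /* Arithmetic simplification:
       E + 0 and E * 1 are null sequences (the operation is removed completely),
       E * 2 is strength-reduced to the cheaper E + E. */
    static Expr *simplify(Expr *e) {
        if (e == NULL || (e->op != '+' && e->op != '*')) return e;
        e->left  = simplify(e->left);
        e->right = simplify(e->right);
        if (e->op == '+' && is_const(e->right, 0)) return e->left;   /* E + 0 => E   */
        if (e->op == '*' && is_const(e->right, 1)) return e->left;   /* E * 1 => E   */
        if (e->op == '*' && is_const(e->right, 2)) {                 /* E * 2 => E+E */
            e->op = '+'; e->right = e->left;
        }
        return e;
    }

    static void show(const Expr *e) {
        if (e->op == 'c') { printf("%d", e->value); return; }
        if (e->op == 'v') { printf("x"); return; }
        printf("("); show(e->left); printf(" %c ", e->op); show(e->right); printf(")");
    }

    int main(void) {
        /* (x * 2) + 0  simplifies to  x + x */
        Expr *e = mk('+', 0,
                     mk('*', 0, mk('v', 0, NULL, NULL), mk('c', 2, NULL, NULL)),
                     mk('c', 0, NULL, NULL));
        show(simplify(e)); printf("\n");        /* prints (x + x) */
        return 0;
    }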

111 111 Preprocessing of if-statements and goto statements When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination. If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.

112 112 Stack representations

113 113 Stack representations (details) (figure: the stack representations of the variables y and x traced through IF condition THEN ... ELSE ... FI; the condition evaluates to a known value (T), the assignment x = 7 is recorded in the live branch, the other branch is dead code, and the representations are merged again at FI)

114 114 Preprocessing of routines In-lining method

115 115 In-lining result Advanced examples: {int n=3; printf("square=%d\n", n*n);} => {int n=3; printf("square=%d\n", 3*3);} => {int n=3; printf("square=%d\n", 9);} Load_par "square=%d\n" Load_par 9 Call printf

116 116 Cloning Example: double power_series(int n, double a[], double x) { int p; for (p=0; p<n; p++) result += a[p] * (x**p); return result; } When it is called with x set to 1.0: double power_series(int n, double a[]) { int p; for (p=0; p<n; p++) result += a[p] * (1.0**p); return result; } double power_series(int n, double a[]) { int p; for (p=0; p<n; p++) result += a[p]; return result; }

117 117 Postprocessing the target code Stupid instruction sequences Load_Reg R1, R2 Load_Reg R2, R1 or Store_Reg R1, n Load_Mem n, R1

118 118 Creating replacement patterns Example: Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb Load_Const 1, Ra; Add_Reg Rb, Rc | Ra=Rb, is_last_use(Rb) => Increment Rc
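A minimal C sketch of applying the first replacement pattern over a list of target instructions; the instruction representation and the surrounding code are illustrative assumptions.

    #include <stdio.h>
    #include <string.h>

    typedef struct { char opc[16]; char op1[8]; char op2[8]; } Instr;

    /* Peephole pass for one replacement pattern:
       Load_Reg Ra,Rb ; Load_Reg Rc,Rd | Ra=Rd, Rb=Rc  =>  Load_Reg Ra,Rb
       i.e. drop a register-to-register copy that is immediately undone. */
    static int peephole(Instr code[], int n) {
        int out = 0;
        for (int i = 0; i < n; i++) {
            code[out++] = code[i];
            if (i + 1 < n &&
                strcmp(code[i].opc, "Load_Reg") == 0 &&
                strcmp(code[i+1].opc, "Load_Reg") == 0 &&
                strcmp(code[i].op1, code[i+1].op2) == 0 &&
                strcmp(code[i].op2, code[i+1].op1) == 0)
                i++;                       /* skip the second, redundant copy */
        }
        return out;                        /* new length of the code array */
    }

    int main(void) {
        Instr code[] = {
            { "Load_Mem", "n",  "R1" },
            { "Load_Reg", "R1", "R2" },
            { "Load_Reg", "R2", "R1" },    /* undone copy: will be removed */
            { "Add_Reg",  "R2", "R1" },
        };
        int n = peephole(code, 4);
        for (int i = 0; i < n; i++)
            printf("%s %s, %s\n", code[i].opc, code[i].op1, code[i].op2);
        return 0;
    }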

119 119 Locating and replacing instructions Multiple pattern matching Using FSA Dotted items

120 120 Homework Study sections 4.2.13 Machine code generation 4.3 Assemblers, linkers and loaders

