7. Code Generation Chih-Hung Wang Compilers References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.


7. Code Generation Chih-Hung Wang Compilers References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986 (2nd ed. 2006).

2 Overview

3 Interpretation An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. Two varieties: recursive and iterative.

4 Interpretation Recursive interpretation operates directly on the AST [attribute grammar]; it is simple to write and allows thorough error checks, but it is very slow: roughly 1000 times slower than compiled code. Iterative interpretation operates on intermediate code; it still has good error checking and is slow: roughly 100 times slower than compiled code.
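As a minimal sketch of the recursive variety (the node classes here are invented for illustration, not taken from the course material), each kind of AST node gets a routine that evaluates its children and applies the semantics of the operator:

```python
# Recursive AST interpretation: dispatch on the node kind and
# recursively evaluate operands (hypothetical node classes).

class Num:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def interpret(node):
    """Visit AST nodes in the order dictated by the language semantics."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        left = interpret(node.left)    # evaluate operands recursively
        right = interpret(node.right)
        if node.op == '+':
            return left + right
        if node.op == '*':
            return left * right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"unknown node kind {type(node).__name__}")

# (7 * (1 + 5)) -- the expression used later in these slides
ast = BinOp('*', Num(7), BinOp('+', Num(1), Num(5)))
```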

5 Recursive Interpretation

6 Self-identifying data Must handle user-defined data types. A value is a pointer to a type descriptor plus an array of subvalues. Example: a complex number with re: 3.0 and im: 4.0.

7 Complex number representation

8 Iterative interpretation Operates on a threaded AST, using an active-node pointer and a flat loop over one large case statement (illustrated on the slide with an IF condition THEN ... ELSE ... FI fragment).

9 Sketch of the main loop
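The main loop can be sketched in code as follows (a simplification: the threaded AST is represented here as a flat list of operations, and the operation names are invented for this example):

```python
# Iterative interpretation: a flat loop with an active-node pointer
# over a linearized, threaded program (hypothetical instruction list).

def run(program):
    stack = []
    pc = 0                        # the active-node pointer
    while pc < len(program):      # flat loop over a case statement
        op, arg = program[pc]
        if op == 'push':
            stack.append(arg)
        elif op == 'add':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 'mul':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown operation {op!r}")
        pc += 1
    return stack.pop()

# (7 * (1 + 5)) in linearized form
code = [('push', 7), ('push', 1), ('push', 5), ('add', None), ('mul', None)]
```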

10 Example for demo compiler

11 Code Generation Compilation produces object code from the intermediate code tree through a process called code generation: tree rewriting replaces nodes and subtrees of the AST by target code segments, and a linear sequence of instructions is then produced from the rewritten AST.

12 Example of code generation a:=(b[4*c+d]*2)+9;

13 Machine instructions Load_Addr M[Ri], C, Rd Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes Load_Byte (M+Ro)[Ri], C, Rd Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above

14 Two sample instructions with their ASTs

15 Code generation Main issues: code selection (which template?), register allocation (too few registers!), and instruction ordering. Optimal code generation is NP-complete, so we consider small parts of the AST, simplify the target machine, and use conventions.

16 Object code sequence Load_Byte (b+Rd)[Rc], 4, Rt Load_Addr 9[Rt], 2, Ra

17 Trivial code generation

18 Code for (7*(1+5))

19 Partial evaluation

20 New Code

21 Simple code generation Consider one AST node at a time. Two simplistic target machines: a pure register machine and a pure stack machine (with BP and SP delimiting the stack frame that holds the variables).

22 Pure stack machine Instructions

23 Example of p:=p+5 Push_Local #p Push_Const 5 Add_Top2 Store_Local #p
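A small simulator makes the stack-machine semantics concrete (a sketch under the assumption that Add_Top2/Mul_Top2/Sub_Top2 pop two values and push the result, and that locals live in a dictionary):

```python
# Simulating the pure stack machine on p := p + 5,
# using the mnemonics from the slide.

def run_stack_machine(code, mem):
    stack = []
    for instr in code:
        op, *args = instr.split()
        if op == 'Push_Local':
            stack.append(mem[args[0]])
        elif op == 'Push_Const':
            stack.append(int(args[0]))
        elif op == 'Add_Top2':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 'Mul_Top2':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 'Sub_Top2':
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)   # second-on-stack minus top
        elif op == 'Store_Local':
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(op)
    return mem

mem = run_stack_machine(
    ["Push_Local #p", "Push_Const 5", "Add_Top2", "Store_Local #p"],
    {"#p": 3})
# mem["#p"] is now 8
```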

24 Pure register machine Instructions

25 Example of p:=p+5 Load_Mem p, R1 Load_Const 5, R2 Add_Reg R2, R1 Store_Reg R1, p
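The same assignment can be traced on the register machine; the simulator below assumes the slide's operand order, source first and destination last (so Add_Reg R2, R1 means R1 += R2):

```python
# Simulating the pure register machine on p := p + 5.

def run_register_machine(code, mem):
    regs = {}
    for instr in code:
        op, *args = instr.replace(',', '').split()
        if op == 'Load_Mem':
            regs[args[1]] = mem[args[0]]
        elif op == 'Load_Const':
            regs[args[1]] = int(args[0])
        elif op == 'Add_Reg':
            regs[args[1]] += regs[args[0]]
        elif op == 'Sub_Reg':
            regs[args[1]] -= regs[args[0]]
        elif op == 'Mul_Reg':
            regs[args[1]] *= regs[args[0]]
        elif op == 'Store_Reg':
            mem[args[1]] = regs[args[0]]
        else:
            raise ValueError(op)
    return mem

mem = run_register_machine(
    ["Load_Mem p, R1", "Load_Const 5, R2", "Add_Reg R2, R1",
     "Store_Reg R1, p"],
    {"p": 3})
# mem["p"] is now 8
```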

26 Simple code generation for a stack machine The AST for b*b – 4 *(a*c)

27 The ASTs for the stack machine instructions

28 The AST for b*b - 4*(a*c) rewritten

29 Simple code generation for a stack machine (demo) Example: b*b - 4*a*c as a threaded AST.

30 Simple code generation for a stack machine (demo) Each node of the threaded AST is annotated with its instruction: Push_Local #b at the b leaves, Push_Const 4, Push_Local #a and Push_Local #c at the other leaves, Mul_Top2 at the multiplications, and Sub_Top2 at the subtraction.

31 Simple code generation for a stack machine (demo) Reading the rewritten AST depth-first yields the code: Push_Local #b; Push_Local #b; Mul_Top2; Push_Const 4; Push_Local #a; Push_Local #c; Mul_Top2; Mul_Top2; Sub_Top2

32 Depth-first code generation

33 Stack configurations

34 Simple code generation for a register machine The ASTs for the register machine instructions

35 Code generation with register allocation

36 Code generation with register numbering

37 Register machine code for b*b - 4*(a*c)

38 Register contents

39 Weighted register allocation It is advantageous to generate the code for the child that requires the most registers first. Weight: the number of registers required by a node.

40 Register weight of a node
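The weight computation can be written down directly; the sketch below (tuple-encoded trees are an assumption for the example) follows the Sethi-Ullman observation that an operator needs one extra register only when both children have equal weight:

```python
# Register weight of an expression tree: a leaf needs one register;
# for an operator, evaluate the heavier child first and reuse its
# registers, so an extra register is needed only on equal weights.

def weight(node):
    if not isinstance(node, tuple):   # leaf: one register to load it
        return 1
    _, left, right = node
    wl, wr = weight(left), weight(right)
    if wl == wr:
        return wl + 1     # both children need wl registers: one extra
    return max(wl, wr)    # heavier child first, then reuse its registers

# b*b - 4*(a*c): b*b needs 2, 4*(a*c) needs 2, the whole tree needs 3
expr = ('-', ('*', 'b', 'b'), ('*', '4', ('*', 'a', 'c')))
```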

41 AST for b*b-4*(a*c) with register weights

42 Weighted register machine code

43 Example A table tracing the weighted algorithm: for each parameter number N it lists the stored weight, the registers occupied when starting parameter N, and the maximum per parameter; the overall maximum is 4.

44 Example: Tree representation

45 Register spilling Too few registers? Spill registers to memory, to be retrieved later. Heuristic: select a subtree that uses all registers and replace it by a temporary. Example: b*b - 4*a*c with 2 registers; the subtree b*b is computed first and replaced by the temporary T1.

46 Register spilling The resulting code: Load_Mem b, R1; Load_Mem b, R2; Mul_Reg R2, R1; Store_Mem R1, T1; Load_Mem a, R1; Load_Mem c, R2; Mul_Reg R2, R1; Load_Const 4, R2; Mul_Reg R1, R2; Load_Mem T1, R1; Sub_Reg R2, R1

47 Another example Spilling applied to the tree for b*b - 4*a*c with temporary T1.

48 Algorithm

49 Machines with register-memory operations An instruction Add_Mem X, R1 adds the contents of memory location X to R1.

50 Register-weighted tree for a memory-register machine

51 Code generation for basic blocks Finding the optimal rewriting of the AST with the available instruction templates is NP-complete. Three techniques: basic blocks, bottom-up tree rewriting, and register allocation by graph coloring.

52 Basic block Improve the quality of the code emitted by simple code generation by considering multiple AST nodes at a time: generate code for maximal basic blocks, i.e. basic blocks that cannot be extended by including adjacent AST nodes. A basic block is a part of the control graph that contains no splits (jumps) or combines (labels).

53 Example of basic block A basic block consists of expressions and assignments. The fixed sequence imposed by the ;-operator limits code generation, so an AST is too restrictive.

54 From AST to dependency graph AST for the simple basic block

55 Simple algorithm to convert AST to a data dependency graph 1) Replace arcs by downward arrows (upward for the destination under an assignment); 2) insert data dependencies from each use of a variable V to the preceding assignment to V; 3) insert data dependencies from each assignment to a variable V to the previous assignment to V; 4) add roots to the graph (the output variables); 5) remove the ;-nodes and their connecting arrows.

56 Simple data dependency graph

57 Cleaned-up graph

58 Exercise { int n; n = a+1; x = (b+c) * n; n = n+1; y = (b+c) * n; } Convert the above code to a data dependency graph.

59 Answer In the resulting data dependency graph, the subexpression b+c is shared by the multiplications for x and y; a+1 feeds the multiplication for x and, after the further +1, the multiplication for y.

60 Common subexpression elimination Simple example: x=a*a+2*a*b + b*b; y=a*a-2*a*b + b*b; Three common subexpressions: double quads = a*a + b*b; double cross_prod = 2*a*b; x = quads + cross_prod; y = quads - cross_prod;
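One standard way to detect such common subexpressions is value numbering, sketched below on tuple-encoded expressions (the encoding is an assumption for the example): identical (operator, operand, operand) triples in a basic block receive the same number and need be computed only once.

```python
# Value numbering: equal (op, left-number, right-number) triples
# get the same value number, exposing common subexpressions.

def value_number(exprs):
    table = {}   # (op, left-vn, right-vn) -> value number
    def vn(e):
        if not isinstance(e, tuple):
            return table.setdefault(('leaf', e, None), len(table))
        op, l, r = e
        key = (op, vn(l), vn(r))
        return table.setdefault(key, len(table))
    return [vn(e) for e in exprs]

# a*a occurs in both x and y, so it receives the same value number;
# b*b here stands for a genuinely different computation.
nums = value_number([('*', 'a', 'a'), ('*', 'a', 'a'), ('*', 'b', 'b')])
```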

61 Common subexpression Equal subexpressions in a basic block are not necessarily common subexpressions: in x=a*a+2*a*b + b*b; a=b=0; y=a*a-2*a*b + b*b; the intervening assignments to a and b make the second occurrences compute different values.

62 Common subexpression example (1/3)

63 Common subexpression example (2/3)

64 Common subexpression example (3/3)

65 From dependency graph to code Rewrite nodes with machine instruction templates and linearize the result. Instruction ordering: ladder sequences. Register allocation: graph coloring.

66 Linearization of the data dependency graph Example: (a+b)*c - d. Definition of a ladder sequence: each root node is a ladder sequence; a ladder sequence S ending in operator node N can be extended with the left operand of N; if operator N is commutative then S may also be extended with the right operand of N. The example yields: Load_Mem a, R1; Add_Mem b, R1; Mul_Mem c, R1; Sub_Mem d, R1

67 Code generated for a given ladder sequence Load_Mem b, R1; Add_Reg I1, R1; Add_Mem c, R1; Store_Reg R1, x

68 Heuristic ordering algorithm To delay the issues of register allocation, use pseudo-registers during the linearization. 1) Select a ladder sequence S without multiple incoming dependencies; 2) introduce temporary (pseudo-)registers for non-leaf operands, which become additional roots; 3) generate code for S, using R1 as the ladder register; 4) remove S from the graph; 5) repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code.

69 Example of linearization X1

70 The code for y, *, + Load_Reg X1, R1; Add_Const 1, R1; Mul_Mem d, R1; Store_Reg R1, y

71 Remove the ladder sequence y, *, +

72 The code for x, +, +, * Load_Reg X1, R1; Mul_Reg X1, R1; Add_Mem b, R1; Add_Mem c, R1; Store_Reg R1, x

73 The last step Load_Mem a, R1; Add_Const 1, R1; Load_Reg R1, X1

74 The results of code generation

75 Exercise Generate code for the dependency graph of x = a*a + 2*a*b + b*b and y = a*a - 2*a*b + b*b, in which the subexpressions a*a, 2*a*b and b*b are shared.

76 Answers With R2, R3 and R4 as pseudo-registers for a*a, 2*a*b and b*b: 1) ladder x, +, +: Load_Reg R2, R1; Add_Reg R3, R1; Add_Reg R4, R1; Store_Mem R1, x. 2) ladder y, +, -: Load_Reg R2, R1; Sub_Reg R3, R1; Add_Reg R4, R1; Store_Mem R1, y. 3) ladder R3, *, *: Load_Const 2, R1; Mul_Reg Ra, R1; Mul_Reg Rb, R1; Load_Reg R1, R3. 4) ladder R2, *: Load_Reg Ra, R1; Mul_Reg Ra, R1; Load_Reg R1, R2. 5) ladder R4, *: Load_Reg Rb, R1; Mul_Reg Rb, R1; Load_Reg R1, R4.
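Executing the five ladder sequences in reverse order of generation (so that the pseudo-registers R2, R3 and R4 are filled before the ladders for x and y use them) confirms the result; the simulator below assumes the source-first operand order used throughout these slides:

```python
# Executing the linearized ladder code to check that
# x = a*a + 2ab + b*b = (a+b)^2 and y = a*a - 2ab + b*b = (a-b)^2.

def exec_ladders(code, regs, mem):
    for instr in code:
        op, src, dst = instr.replace(',', '').split()
        if op == 'Load_Const':
            regs[dst] = int(src)
        elif op == 'Load_Reg':
            regs[dst] = regs[src]
        elif op == 'Add_Reg':
            regs[dst] += regs[src]
        elif op == 'Sub_Reg':
            regs[dst] -= regs[src]
        elif op == 'Mul_Reg':
            regs[dst] *= regs[src]
        elif op == 'Store_Mem':
            mem[dst] = regs[src]
        else:
            raise ValueError(op)
    return mem

code = [
    # ladders 5, 4, 3: fill the pseudo-registers R4, R2, R3
    "Load_Reg Rb, R1", "Mul_Reg Rb, R1", "Load_Reg R1, R4",
    "Load_Reg Ra, R1", "Mul_Reg Ra, R1", "Load_Reg R1, R2",
    "Load_Const 2, R1", "Mul_Reg Ra, R1", "Mul_Reg Rb, R1", "Load_Reg R1, R3",
    # ladder 2: y := R2 - R3 + R4
    "Load_Reg R2, R1", "Sub_Reg R3, R1", "Add_Reg R4, R1", "Store_Mem R1, y",
    # ladder 1: x := R2 + R3 + R4
    "Load_Reg R2, R1", "Add_Reg R3, R1", "Add_Reg R4, R1", "Store_Mem R1, x",
]
mem = exec_ladders(code, {"Ra": 3, "Rb": 5}, {})
# with a = 3, b = 5: x = (3+5)^2 = 64 and y = (3-5)^2 = 4
```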

77 Register allocation for the linearized code Map the pseudo-registers to memory locations or real registers (as, for example, the gcc compiler does).

78 Code optimization in the presence of pointers Pointers cause two different problems for the dependency graph. In a=x * y; *p = 3; b = x * y; the expression x * y is not a common subexpression if p happens to point to x or y. In a=*p * y; b = 3; c = *p * q; the expression *p * q is not a common subexpression if p happens to point to b.

79 Example (1/4) Assignment under a pointer

80 Example (2/4) Data dependency graph with an assignment under a pointer

81 Example (3/4) Cleaned-up graph

82 Example (4/4) Target code *x:=R1

83 BURS code generation In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized. Machines often have several hundred different machine instructions, often each with ten or more addressing modes, and it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.

84 BURS code generation Simple instruction patterns (1/2)

85 BURS code generation Simple instruction patterns (2/2)

86 Example: Input tree

87 Naïve rewrite Its cost is 17 units.

88 Code resulting

89 Top-down largest-fit rewrite

90 Discussions How do we find all possible rewrites, and how do we represent them? It will be clear that we do not fancy listing them all! How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?

91 Bottom-up pattern matching The dotted trees

92 Outline code for bottom-up pattern matching

93 Label set resulting

94 Instruction selection by dynamic programming Bottom-up pattern matching with costs: labels such as #5->reg, #6->reg, #7.1 and #8.1 annotate the nodes during instruction selection.

95 Cost evaluation At the lower * the cost is (1+3+4); at the higher * the alternatives cost (1+7+4) and (1+3+5); the cost at the top + is left as an exercise.

96 Code generation by bottom-up matching

97 Code generation by bottom-up matching, using commutativity

98 Pattern matching and instruction selection combined The two basic operands give rise to State S1 and State S2.

99 States of the BURS

100 Creating the cost-conscious next-state table The triplet {'+', S1, S1} = S3, with cost (1+1+1); {'+', S1, S2} = S5. Exercises: {'+', S1, S5} and {'*', S1, S2}.

101 Cost conscious next table

102 Code generation using cost- conscious next-state table

103 Register allocation by graph coloring Procedure-wide register allocation: only live variables require register storage. Two variables (values) interfere when their live ranges overlap. Dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead.

104 A program segment for live analysis

105 Live range of the variables

106 Graph coloring Graph coloring is an NP-complete problem. Heuristic: color easy nodes last. 1) Find the node N with the lowest degree; 2) remove N from the graph; 3) color the simplified graph; 4) set the color of N to the first color that is not used by any of N's neighbors.
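These four steps can be sketched as follows (the interference graph and the availability of 3 registers are assumptions made up for the example):

```python
# Graph-coloring register allocation heuristic from the slide:
# repeatedly remove a lowest-degree node, then color in reverse order.

def color_graph(adj, k):
    """adj: node -> set of interfering nodes; k: number of registers."""
    work = {n: set(nbrs) for n, nbrs in adj.items()}
    order = []
    while work:
        n = min(work, key=lambda v: len(work[v]))  # lowest-degree node
        order.append(n)
        for m in work[n]:
            work[m].discard(n)                     # remove N from the graph
        del work[n]
    colors = {}
    for n in reversed(order):                      # color easy nodes last
        used = {colors[m] for m in adj[n] if m in colors}
        free = [c for c in range(k) if c not in used]
        if not free:
            return None                            # would require a spill
        colors[n] = free[0]   # first color not used by any neighbor
    return colors

# a, b, c mutually interfere; d interferes only with c; 3 registers
interference = {'a': {'b', 'c'}, 'b': {'a', 'c'},
                'c': {'a', 'b', 'd'}, 'd': {'c'}}
colors = color_graph(interference, 3)
```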

107 Coloring process 3 registers

108 Preprocessing the intermediate code Preprocessing of expressions: char lower_case_from_capital(char ch) { return ch + ('a' - 'A'); } Constant expression evaluation: char lower_case_from_capital(char ch) { return ch + 32; }

109 Arithmetic simplification Transformations that replace an operation by a simpler one are called strength reductions. Operations that can be removed completely are called null sequences.

110 Some transformations for arithmetic simplification
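A few such transformations can be expressed as rewrite rules over tuple-encoded expressions (the encoding and the choice of rules are illustrative, not the full table from the slide):

```python
# Arithmetic simplification as bottom-up rewrite rules on
# ('op', left, right) tuples: strength reductions and null sequences.

def simplify(e):
    if not isinstance(e, tuple):
        return e
    op, l, r = e[0], simplify(e[1]), simplify(e[2])
    if op == '*' and r == 2:
        return ('<<', l, 1)   # strength reduction: x*2 -> x<<1
    if op == '*' and r == 1:
        return l              # null sequence: x*1 -> x
    if op == '*' and r == 0:
        return 0              # x*0 -> 0
    if op == '+' and r == 0:
        return l              # null sequence: x+0 -> x
    return (op, l, r)
```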

111 Preprocessing of if-statements and goto statements When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination. If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.

112 Stack representations

113 Stack representations (details) The stack representation is traced through an IF y > 0 THEN ... ELSE ... FI fragment containing the assignment x = 7; one branch turns out to be dead code, and the representations of the live branches are merged at the FI.

114 Preprocessing of routines In-lining method

115 In-lining result Advanced example: {int n=3; printf("square=%d\n", n*n);} => {int n=3; printf("square=%d\n", 3*3);} => {int n=3; printf("square=%d\n", 9);} which generates: Load_par "square=%d\n"; Load_par 9; Call printf

116 Cloning Example: double power_series(int n, double a[], double x) { double result = 0.0; int p; for (p=0; p<n; p++) result += a[p] * (x**p); return result; } is called with x set to 1.0: double power_series(int n, double a[]) { double result = 0.0; int p; for (p=0; p<n; p++) result += a[p] * (1.0**p); return result; } which simplifies to: double power_series(int n, double a[]) { double result = 0.0; int p; for (p=0; p<n; p++) result += a[p]; return result; }

117 Postprocessing the target code Stupid instruction sequences: Load_Reg R1, R2; Load_Reg R2, R1 or Store_Reg R1, n; Load_Mem n, R1

118 Creating replacement patterns Examples: Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb and Load_Const 1, Ra; Add_Reg Rb, Rc | Ra=Rb, is_last_use(Rb) => Increment Rc
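Locating such pairs can be done with a sliding window over the instruction list; the sketch below implements just the two redundancy patterns shown on the previous slide, with instructions encoded as tuples (an assumption for the example):

```python
# Peephole optimization: slide a two-instruction window over the
# target code and apply the replacement patterns.

def peephole(code):
    out = []
    i = 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        # Load_Reg R1,R2 ; Load_Reg R2,R1  =>  Load_Reg R1,R2
        if (b and a[0] == 'Load_Reg' and b[0] == 'Load_Reg'
                and a[1] == b[2] and a[2] == b[1]):
            out.append(a)
            i += 2
            continue
        # Store_Reg R1,n ; Load_Mem n,R1  =>  Store_Reg R1,n
        if (b and a[0] == 'Store_Reg' and b[0] == 'Load_Mem'
                and a[2] == b[1] and a[1] == b[2]):
            out.append(a)
            i += 2
            continue
        out.append(a)
        i += 1
    return out

code = [('Store_Reg', 'R1', 'n'), ('Load_Mem', 'n', 'R1'),
        ('Load_Reg', 'R1', 'R2'), ('Load_Reg', 'R2', 'R1')]
optimized = peephole(code)
```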

119 Locating and replacing instructions Multiple pattern matching Using FSA Dotted items

120 Homework Study sections Machine code generation 4.3 Assemblers, linkers and loaders