Published by Gerald Gregory. Modified over 9 years ago.
1
Chapter 14: Building a Runnable Program
2
- 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code Generation 14.4 Address Space Organization
3
- 2 - Where We Are...
Source code (character stream)
  → Lexical Analysis (regular expressions) → token stream
  → Syntax Analysis (grammars) → abstract syntax tree
  → Semantic Analysis (static semantics) → abstract syntax tree + symbol tables, types
  → Intermediate Code Gen → intermediate code
4
- 3 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code Generation 14.4 Address Space Organization
5
- 4 - Intermediate Representation (aka IR)
• The compiler's internal representation
  » Is language-independent and machine-independent
• Diagram: AST → IR (optimize) → Pentium, Java bytecode, Itanium, TI C5x, ARM
• Enables machine-independent and machine-dependent optimizations
6
- 5 - What Makes a Good IR?
• Captures high-level language constructs
  » Easy to translate from AST
  » Supports high-level optimizations
• Captures low-level machine features
  » Easy to translate to assembly
  » Supports machine-dependent optimizations
• Narrow interface: small number of node types (instructions)
  » Easy to optimize
  » Easy to retarget
7
- 6 - Multiple IRs
• Most compilers use 2 IRs:
  » High-level IR (HIR): language-independent but closer to the language
  » Low-level IR (LIR): machine-independent but closer to the machine
  » A significant part of the compiler is both language- and machine-independent!
• Diagram: C++, C, Fortran → AST → HIR (optimize) → LIR (optimize) → Pentium, Java bytecode, Itanium, TI C5x, ARM
8
- 7 - High-Level IR
• HIR is essentially the AST
  » Must be expressive for all input languages
• Preserves high-level language constructs
  » Structured control flow: if, while, for, switch
  » Variables, expressions, statements, functions
• Allows high-level optimizations based on properties of the source language
  » Function inlining, memory dependence analysis, loop transformations
9
- 8 - Low-Level IR
• A set of instructions that emulates an abstract machine (typically RISC)
• Has low-level constructs
  » Unstructured jumps, registers, memory locations
• Types of instructions
  » Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches
10
- 9 - Alternatives for LIR
• 3 general alternatives
  » Three-address code or quadruples: a = b OP c
    - Advantage: makes compiler analysis and optimization easier
  » Tree representation
    - Was popular for CISC architectures
    - Advantage: easier to generate machine code
  » Stack machine
    - Like Java bytecode
    - Advantage: easier to generate from AST
11
- 10 - Three-Address Code
• a = b OP c
  » Named for instructions that originally had at most 3 addresses or operands
    - This is not enforced today, e.g., MAC (multiply-accumulate): a = b * c + d
  » May have fewer operands
• Also called quadruples: (a, b, c, OP)
• Example: a = (b+c) * (-e)
    t1 = b + c
    t2 = -e
    a = t1 * t2
  » t1 and t2 are compiler-generated temporary variables
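The quadruple view of the example above can be sketched in a few lines of Python. This is only an illustration: the `Quad` tuple, its field names, and `format_quad` are hypothetical names chosen here, and a missing second operand is represented as `None` for the unary case.

```python
from collections import namedtuple

# Field order follows the slide's quadruple layout (a, b, c, OP).
# The names dst/src1/src2/op are assumptions made for this sketch.
Quad = namedtuple("Quad", ["dst", "src1", "src2", "op"])

def format_quad(q):
    """Render a quadruple in 'a = b OP c' style; src2 is None for unary ops."""
    if q.src2 is None:
        return f"{q.dst} = {q.op}{q.src1}"
    return f"{q.dst} = {q.src1} {q.op} {q.src2}"

# a = (b + c) * (-e), with compiler-generated temporaries t1 and t2.
code = [
    Quad("t1", "b", "c", "+"),
    Quad("t2", "e", None, "-"),
    Quad("a", "t1", "t2", "*"),
]

for q in code:
    print(format_quad(q))
```

Keeping the operator as a separate field is what makes this form convenient for analysis: a pass can inspect `q.op`, `q.dst`, and the sources without parsing text.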
12
- 11 - IR Instructions
• Assignment instructions
  » a = b OP c (binary op)
    - arithmetic: ADD, SUB, MUL, DIV, MOD
    - logic: AND, OR, XOR
    - comparisons: EQ, NEQ, LT, GT, LEQ, GEQ
  » a = OP b (unary op): arithmetic MINUS, logical NEG
  » a = b: copy instruction
  » a = [b]: load instruction
  » [a] = b: store instruction
  » a = addr b: symbolic address
• Flow of control
  » label L: label instruction
  » jump L: unconditional jump
  » cjump a L: conditional jump
• Function call
  » call f(a1,..., an)
  » a = call f(a1,..., an)
• IR describes the instruction set of an abstract machine
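One way to make the "abstract machine" idea concrete is a tiny interpreter for a subset of these instructions. The sketch below is an assumption-laden illustration, not part of the slides: the tuple encoding (`"bin"`, `"copy"`, `"label"`, `"jump"`, `"cjump"`), the `run` function, and the reduced operator table are all invented here, and loads, stores, and calls are left out.

```python
import operator

# A reduced operator table; names follow the slide's mnemonics.
OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "LT": lambda x, y: int(x < y),
    "EQ": lambda x, y: int(x == y),
}

def run(program, env=None):
    """Execute a list of IR tuples and return the final variable bindings."""
    env = dict(env or {})
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}

    def val(x):
        # Strings name variables or temporaries; anything else is a constant.
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        if kind == "bin":                      # a = b OP c
            _, dst, op, a, b = ins
            env[dst] = OPS[op](val(a), val(b))
        elif kind == "copy":                   # a = b
            env[ins[1]] = val(ins[2])
        elif kind == "jump":                   # jump L
            pc = labels[ins[1]]
            continue
        elif kind == "cjump":                  # cjump a L
            if val(ins[1]):
                pc = labels[ins[2]]
                continue
        elif kind != "label":
            raise ValueError(f"unknown instruction: {ins!r}")
        pc += 1
    return env

# Sum the integers 1..5 with unstructured control flow.
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "Lloop"),
    ("bin", "t1", "LT", 5, "i"),    # t1 = 5 < i
    ("cjump", "t1", "Lend"),
    ("bin", "s", "ADD", "s", "i"),
    ("bin", "i", "ADD", "i", 1),
    ("jump", "Lloop"),
    ("label", "Lend"),
]

print(run(program)["s"])
```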
13
- 12 - IR Operands
• The operands in 3-address code can be:
  » Program variables
  » Constants or literals
  » Temporary variables
• Temporary variables = new locations
  » Used to store intermediate values
  » Needed because 3-address code is not as expressive as high-level languages
14
- 13 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code Generation 14.4 Address Space Organization
15
- 14 - Translating High IR to Low IR
• May have nested language constructs
  » E.g., while nested within an if statement
• Need an algorithmic way to translate
  » Strategy for each high IR construct
  » High IR construct → sequence of low IR instructions
• Solution
  » Start from the high IR (AST-like) representation
  » Define translation for each node in high IR
  » Recursively translate nodes
16
- 15 - Notation
• Use the following notation:
  » [[e]] = the low IR representation of high IR construct e
• [[e]] is a sequence of low IR instructions
• If e is an expression (or statement expression), it represents a value
  » Denoted as: t = [[e]]
  » Low IR representation of e whose result value is stored in t
• For variable v: t = [[v]] is the copy instruction
  » t = v
17
- 16 - Translating Expressions
• Binary operations: t = [[e1 OP e2]]
  » (arithmetic, logical operations and comparisons)
    t1 = [[e1]]
    t2 = [[e2]]
    t = t1 OP t2
• Unary operations: t = [[OP e]]
    t1 = [[e]]
    t = OP t1
18
- 17 - Translating Array Accesses
• Array access: t = [[ v[e] ]]
  » (type of v is array[T] and S = size of T)
    t1 = addr v
    t2 = [[e]]
    t3 = t2 * S
    t4 = t1 + t3
    t = [t4]    /* i.e., load */
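The array-access schema can be sketched as a small Python generator of IR text. The function name `translate_array_access` is hypothetical, and the sketch assumes the index expression is a plain variable, so that [[e]] reduces to a single copy instruction.

```python
def translate_array_access(array_name, index_var, elem_size, result="t"):
    """Emit low IR for result = array_name[index_var], following the slide's schema.

    Assumes the index is a simple variable, so [[e]] is one copy instruction.
    """
    return [
        f"t1 = addr {array_name}",
        f"t2 = {index_var}",
        f"t3 = t2 * {elem_size}",   # scale the index by S = size of the element type
        "t4 = t1 + t3",             # base address plus byte offset
        f"{result} = [t4]",         # load
    ]

def element_address(base, index, elem_size):
    """The address arithmetic performed by t3 and t4."""
    return base + index * elem_size

for line in translate_array_access("A", "i", 4):
    print(line)
```

For example, with a base address of 1000 and 4-byte elements, `element_address(1000, 3, 4)` gives 1012, which is the value t4 would hold for A[3].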
19
- 18 - Translating Structure Accesses
• Structure access: t = [[ v.f ]]
  » (v is of type T, S = offset of f in T)
    t1 = addr v
    t2 = t1 + S
    t = [t2]    /* i.e., load */
20
- 19 - Translating Short-Circuit OR
• Short-circuit OR: t = [[e1 SC-OR e2]]
  » e.g., the || operator in C/C++
    t = [[e1]]
    cjump t Lend
    t = [[e2]]
    Lend:
• Semantics:
  1. Evaluate e1
  2. If e1 is true, then done
  3. Else evaluate e2
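The short-circuit OR schema also composes when the operands are themselves SC-OR expressions. The following Python sketch shows this under some assumptions: expressions are either variable names (strings) or tuples `("SC-OR", e1, e2)`, and the function `translate` and the numbered labels `Lend1`, `Lend2` are invented for the illustration so that nested labels stay distinct.

```python
import itertools

def translate(expr, target, out, labels):
    """Append low IR that leaves the value of expr in target."""
    if isinstance(expr, str):
        # Variable: the copy instruction t = v.
        out.append(f"{target} = {expr}")
        return out
    op, e1, e2 = expr
    if op != "SC-OR":
        raise ValueError(f"unsupported operator: {op}")
    lend = f"Lend{next(labels)}"
    translate(e1, target, out, labels)    # t = [[e1]]
    out.append(f"cjump {target} {lend}")  # e1 true: skip e2 entirely
    translate(e2, target, out, labels)    # t = [[e2]]
    out.append(f"{lend}:")
    return out

# a || (b || c)
code = translate(("SC-OR", "a", ("SC-OR", "b", "c")), "t", [], itertools.count(1))
for line in code:
    print(line)
```

Note that e2 is never evaluated when e1 is true, which matters when e2 has side effects or could fault.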
21
- 20 - Class Problem
• Short-circuit AND: t = [[e1 SC-AND e2]]
  » e.g., the && operator in C/C++
• Semantics:
  1. Evaluate e1
  2. If e1 is true, then evaluate e2
  3. Else done
22
- 21 - Translating Statements
• Statement sequence: [[s1; s2;...; sN]]
• IR instructions of a statement sequence = concatenation of the IR instructions of the statements
    [[ s1 ]]
    [[ s2 ]]
    ...
    [[ sN ]]
23
- 22 - Assignment Statements
• Variable assignment: [[ v = e ]]
    v = [[ e ]]
• Array assignment: [[ v[e1] = e2 ]]
    t1 = addr v
    t2 = [[e1]]
    t3 = t2 * S
    t4 = t1 + t3
    t5 = [[e2]]
    [t4] = t5    /* i.e., store */
  » Recall S = sizeof(T), where v is array(T)
24
- 23 - Translating If-Then [-Else]
• [[ if (e) then s ]]
    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    [[ s ]]
    Lend:
• [[ if (e) then s1 else s2 ]]
    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lelse
    Lthen:
    [[ s1 ]]
    jump Lend
    Lelse:
    [[ s2 ]]
    Lend:
• How could I do this more efficiently?
25
- 24 - While Statements
• [[ while (e) s ]]
• While-do translation:
    Lloop:
    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    [[ s ]]
    jump Lloop
    Lend:
• or do-while translation:
    t1 = [[ e ]]
    t2 = NOT t1
    cjump t2 Lend
    Lloop:
    [[ s ]]
    t3 = [[ e ]]
    cjump t3 Lloop
    Lend:
• Which is better and why?
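One way to compare the two translations is to count the branch instructions (jump and cjump) each one executes when the loop body runs n times. The Python sketch below does only that bookkeeping; the function names are hypothetical, and the count ignores the cost of [[e]], [[s]], and the NOT instructions.

```python
def while_do_branches(n):
    """Branches executed by the while-do form when the body runs n times.

    Each iteration executes one cjump (not taken) and one jump back to Lloop;
    the final test executes one more cjump (taken) to leave the loop.
    """
    return 2 * n + 1

def do_while_branches(n):
    """Branches executed by the do-while form when the body runs n times.

    One guard cjump before the loop, then one cjump at the bottom per iteration.
    """
    return 1 + n

for n in (0, 1, 10, 100):
    print(n, while_do_branches(n), do_while_branches(n))
```

Under this simple count the do-while form executes roughly half as many branches for long-running loops, at the price of emitting the code for e twice.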
26
- 25 - Class Problem
Convert the following code segment to IR:
    n = 0;
    while (n < 10) {
        n = n+1;
    }
27
- 26 - Switch Statements
• [[ switch (e) case v1:s1,..., case vN:sN ]]
    t = [[ e ]]
    L1:
    c = t != v1
    cjump c L2
    [[ s1 ]]
    jump Lend    /* if there is a break */
    L2:
    c = t != v2
    cjump c L3
    [[ s2 ]]
    jump Lend    /* if there is a break */
    ...
    Lend:
• Can also implement switch as a table lookup
  » The table contains the target labels, i.e., L1, L2, L3; t is used to index the table
  » Benefit: k branches reduced to 1
  » Drawback: the target of the branch is hard for the hardware to predict
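The difference between the compare chain and the table lookup can be sketched in Python. Everything here is illustrative: `chain_dispatch`, `table_dispatch`, and the assumption that case values are dense integers starting at `low` are choices made for this sketch, and the labels stand in for the code addresses a real table would hold.

```python
def chain_dispatch(t, cases):
    """Linear chain of tests, as in the translation above.

    Returns the selected label and the number of comparisons performed.
    """
    tests = 0
    for value, label in cases:
        tests += 1
        if t == value:
            return label, tests
    return "Lend", tests

def table_dispatch(t, table, low):
    """Jump table: one bounds check, then one indexed lookup of the target label.

    Assumes the case values are the dense range low .. low + len(table) - 1.
    """
    index = t - low
    if 0 <= index < len(table):
        return table[index]
    return "Lend"

cases = [(1, "L1"), (2, "L2"), (3, "L3")]
table = ["L1", "L2", "L3"]

print(chain_dispatch(3, cases))     # selected label and comparisons made
print(table_dispatch(3, table, 1))  # selected label
```

The chain needs up to k comparisons for k cases, while the table needs a single indexed jump regardless of k; sparse case values make the table large, which is one reason compilers do not always choose it.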
28
- 27 - Call and Return Statements
• [[ call f(e1, e2,..., eN) ]]
    t1 = [[ e1 ]]
    t2 = [[ e2 ]]
    ...
    tN = [[ eN ]]
    call f(t1, t2,..., tN)
• [[ return e ]]
    t = [[ e ]]
    return t
29
- 28 - Statement Expressions
• So far: statements which do not return values
• Easy extensions for statement expressions:
  » Block statements
  » If-then-else
  » Assignment statements
• t = [[ s ]] is the sequence of low IR code for statement s, whose result is stored in t
30
- 29 - Statement Expressions
• t = [[ if (e) then s1 else s2 ]]
    t1 = [[ e ]]
    cjump t1 Lthen
    t = [[ s2 ]]
    jump Lend
    Lthen:
    t = [[ s1 ]]
    Lend:
• t = [[ s1; s2;.. sN ]]
    [[ s1 ]]
    [[ s2 ]]
    ...
    t = [[ sN ]]
  » Result value of a block statement = value of the last statement in the sequence
31
- 30 - Assignment Statements
• t = [[ v = e ]]
    v = [[ e ]]
    t = v
• Result value of an assignment statement = value of the assigned expression
32
- 31 - Nested Expressions
• Translation recurses on the expression structure
• Example: t = [[ (a - b) * (c + d) ]]
    t1 = a          /* [[ (a - b) ]] */
    t2 = b
    t3 = t1 - t2
    t4 = c          /* [[ (c + d) ]] */
    t5 = d
    t6 = t4 + t5
    t = t3 * t6     /* [[ (a - b) * (c + d) ]] */
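The recursion on expression structure can be sketched as a short Python function. The encoding is an assumption of this sketch: an expression is either a variable name (a string) or a tuple `(op, e1, e2)`, and `translate_expr` allocates a fresh temporary for every node, so the final result lands in a numbered temporary rather than in a caller-chosen `t`.

```python
import itertools

def translate_expr(expr, code, temps):
    """Append low IR for expr to code; return the temporary holding its value."""
    if isinstance(expr, str):
        # Variable v: the copy instruction t = v.
        t = f"t{next(temps)}"
        code.append(f"{t} = {expr}")
        return t
    op, e1, e2 = expr
    t1 = translate_expr(e1, code, temps)   # t1 = [[e1]]
    t2 = translate_expr(e2, code, temps)   # t2 = [[e2]]
    t = f"t{next(temps)}"
    code.append(f"{t} = {t1} {op} {t2}")   # t = t1 OP t2
    return t

code = []
result = translate_expr(("*", ("-", "a", "b"), ("+", "c", "d")), code, itertools.count(1))
for line in code:
    print(line)
```

The many copies into fresh temporaries are exactly the inefficiency that later passes, such as copy propagation and register allocation, are expected to clean up.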
33
- 32 - Nested Statements
• Same for statements: recursive translation
• Example: t = [[ if c then if d then a = b ]]
    t1 = c              /* [[ if c... ]] */
    t2 = NOT t1
    cjump t2 Lend1
    t3 = d              /* [[ if d... ]] */
    t4 = NOT t3
    cjump t4 Lend2
    t5 = b              /* [[ a = b ]] */
    a = t5
    Lend2:
    Lend1:
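The same recursive scheme for statements can be sketched in Python. The statement encoding is hypothetical: `("if", cond_var, body)` for an if-then whose condition is a plain variable, and `("assign", var, src)` for a variable-to-variable assignment; `translate_stmt` and the numbered labels are names invented for the sketch.

```python
import itertools

def translate_stmt(stmt, code, temps, labels):
    """Append low IR for stmt to code, recursing into nested statements."""
    kind = stmt[0]
    if kind == "assign":
        # [[ v = e ]] where e is a variable: copy into a temporary, then assign.
        _, var, src = stmt
        t = f"t{next(temps)}"
        code.append(f"{t} = {src}")
        code.append(f"{var} = {t}")
    elif kind == "if":
        # [[ if (e) then s ]]: negate the condition and jump past the body.
        _, cond, body = stmt
        lend = f"Lend{next(labels)}"
        t1 = f"t{next(temps)}"
        t2 = f"t{next(temps)}"
        code.append(f"{t1} = {cond}")
        code.append(f"{t2} = NOT {t1}")
        code.append(f"cjump {t2} {lend}")
        translate_stmt(body, code, temps, labels)
        code.append(f"{lend}:")
    else:
        raise ValueError(f"unsupported statement: {stmt!r}")
    return code

# if c then if d then a = b
stmt = ("if", "c", ("if", "d", ("assign", "a", "b")))
code = translate_stmt(stmt, [], itertools.count(1), itertools.count(1))
for line in code:
    print(line)
```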
34
- 33 - Class Problem
Translate the following to the generic assembly code discussed:
    for (i=0; i<100; i++) {
        A[i] = 0;
    }
    if ((a > 0) && (b > 0))
        c = 2;
    else
        c = 3;
35
- 34 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code Generation 14.4 Address Space Organization
36
- 35 - Issues
• These translations are straightforward
• But inefficient:
  » Lots of temporaries
  » Lots of labels
  » Lots of instructions
• Can we do this more intelligently?
  » Should we worry about it?
37
- 36 - 2 Classes of Storage in Processor
• Registers
  » Fast access, but only a few of them
  » Address space not visible to the programmer
    - Doesn't support pointer access!
• Memory
  » Slow access, but large
  » Supports pointers
• The storage class for each variable is generally determined when mapping HIR to LIR
38
- 37 - Storage Class Selection
• Standard (simple) approach
  » Globals/statics: memory
  » Locals
    - Composite types (structs, arrays, etc.): memory
    - Scalars
      · Accessed via the '&' operator? Memory
      · Rest: virtual register. Virtual registers are mapped to true machine registers later; as a result, some local scalars may be "spilled to memory"
• All-memory approach
  » Put all variables into memory
  » Register allocation relocates some memory variables to registers
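The standard (simple) approach amounts to a three-question decision procedure, sketched below in Python. The function `storage_class`, its boolean parameters, and the sample variables are hypothetical; a real compiler would derive these properties from its symbol table.

```python
def storage_class(is_global_or_static, is_composite, address_taken):
    """Pick a storage class following the standard (simple) approach.

    Returns "memory" or "virtual register". A virtual register may still be
    spilled to memory later by the register allocator.
    """
    if is_global_or_static:
        return "memory"
    if is_composite:        # structs, arrays, etc.
        return "memory"
    if address_taken:       # scalar accessed via the '&' operator
        return "memory"
    return "virtual register"

# Hypothetical variables, described as
# (is_global_or_static, is_composite, address_taken).
examples = {
    "counter (global int)": (True, False, False),
    "buf (local array)": (False, True, False),
    "x (local int, &x passed to a call)": (False, False, True),
    "k (local int)": (False, False, False),
}

for name, props in examples.items():
    print(name, "->", storage_class(*props))
```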
39
- 38 - 4 Distinct Regions of Memory
• Code space: instructions to be executed
  » Best if read-only
• Static (or global): variables that retain their value over the lifetime of the program
• Stack: variables that live only as long as the block within which they are defined (local)
• Heap: variables that are created by calls to the system storage allocator (malloc, new)
40
- 39 - Memory Organization
• Layout (diagram): Code, Static Data, Stack, ..., Heap
• Code and static data sizes are determined by the compiler
• Stack and heap sizes vary at run-time
  » Stack grows downward
  » Heap grows upward
• Some ABIs have the stack and heap switched
41
- 40 - Class Problem
Specify whether each variable is stored in a register or in memory. For memory, which area of the memory?
    int a;
    void foo(int b, double c) {
        int d;
        struct { int e; char f; } g;
        int h[10];
        char i = 5;
        float j;
    }