CS 404 Introduction to Compiler Design, Lecture 10
Intermediate Representation: Three Address Code (TAC)
Ahmed Ezzat
Outline
- What is an Intermediate Representation (IR) and why do we need one?
- IR Classification Levels
- Linear IR: TAC
- TAC Examples
- Summary
1. Intermediate Representation (IR)
- What is IR?
  - Intermediate code used by the compiler, between source and target code
  - Try to keep machine dependencies out of the IR
- Why?
  - Support multiple front and back ends, for example, to add support for a new machine architecture
  - Subdivide and postpone tasks
  - Allow reuse of machine-independent optimizations (most important)
1. Intermediate Code Generation
- AST: Abstract Syntax Tree
1. Intermediate Code Generation
[Figure: back-end code generation from the IR. Stages shown: Trace Formation, Instruction Selection, Register Allocation, Optimizations. Asterisks (*) in the figure mark the phase ordering problem.]
1. IR Motivation
- What we have so far: an abstract syntax tree
  - With all the program information
  - Known to be correct: well-typed, nothing missing, no ambiguities
- What we need: something “executable”
  - Closer to the actual machine level of abstraction
1. IR Code
- Abstract machine code (Intermediate Representation)
- Allows machine-independent code generation and optimization
[Figure: AST -> IR (optimize) -> PowerPC, Alpha, x86]
1. Different IR Forms
- Low-level IR
  - Like RISC machine instructions
  - Uses registers, literals, simple operations
- High-level IR
  - Syntax tree
  - Postfix notation
  - Three Address Code (TAC)
1. Popular IR Forms
- Any representation between the AST and ASM:
  - TAC (Triples): low-level
  - Expression Tree: high-level
- Example: expression tree for a[i];
    MEM(+(a, BINOP(MUL, i, CONST W)))
1. What Makes a Good IR?
- Easy to translate from the AST
- Easy to translate to assembly
- Narrow interface: small number of node types (instructions)
  - Easy to optimize
  - Easy to retarget
- AST (>40 node types) -> IR (~15 node types) -> x86 (>200 opcodes)
2. IR Classification Levels
- Goal: get the program closer to machine code without losing the information needed to do useful optimizations
- This needs multiple IR stages
[Figure: AST -> HIR -> MIR -> PowerPC (LIR), Alpha (LIR), x86 (LIR), with optimization passes applied at the IR stages]
2. IR Classification Levels
- High-Level IR (HIR)
  - Used early in the process
  - Usually converted to a lower form later on
  - Preserves high-level language constructs: structured flow, variables, methods
  - Allows high-level optimizations based on properties of the source language (e.g., inlining, reuse of constant variables)
  - Example: AST
2. IR Classification Levels
- Medium-Level IR (MIR)
  - Tries to reflect the range of features in the source language in a language-independent way
  - Intermediate between AST and assembly
  - Unstructured jumps, registers, memory locations
  - Convenient for translation to high-quality machine code
- Other MIRs:
  - Tree IR: easy to generate, easy to do reasonable instruction selection
  - Quadruples: a = b OP c (easy to optimize)
  - Stack machine based (like Java bytecode)
2. IR Classification Levels
- Low-Level IR (LIR)
  - Assembly code plus extra pseudo-instructions
  - Machine dependent
  - Translation to assembly code is trivial
  - Allows optimization of code for low-level considerations: scheduling, memory layout
2. IR Classification Levels: Example
- High-level:
    for i := op1 to op2 step op3
        instructions
    endfor
- Medium-level (the loop body is repeated for the descending case, where step is negative):
    i := op1
    step := op3
    if step < 0 goto L2
    L1: if i > op2 goto L3
        instructions
        i := i + step
        goto L1
    L2: if i < op2 goto L3
        instructions
        i := i + step
        goto L2
    L3:
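The lowering from the structured loop to labels and jumps can be mechanized. The following is a minimal sketch, assuming instructions are plain strings and that the loop body has already been lowered; `lower_for` and its parameters are hypothetical names chosen for illustration, not part of the lecture.

```python
from itertools import count


def lower_for(var, start, stop, step, body, labels=None):
    """Lower `for var := start to stop step step` into label/goto form.

    `body` is a list of already-lowered instruction strings.
    """
    if labels is None:
        labels = count(1)
    up, down, end = (f"L{next(labels)}" for _ in range(3))
    code = [
        f"{var} := {start}",
        f"step := {step}",
        f"if step < 0 goto {down}",   # pick the ascending or descending loop
    ]
    # One copy of the loop per direction; only the exit test differs.
    for label, exit_test in ((up, ">"), (down, "<")):
        code.append(f"{label}: if {var} {exit_test} {stop} goto {end}")
        code.extend(body)
        code.append(f"{var} := {var} + step")
        code.append(f"goto {label}")
    code.append(f"{end}:")
    return code


lowered = lower_for("i", "op1", "op2", "op3", ["instructions"])
print("\n".join(lowered))
```

Duplicating the body keeps each loop's exit test a single comparison; a real compiler might instead emit one loop with a direction-dependent test.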
3. Linear IR: Why Use TAC?
- Makes complex expressions simple
- Makes complex flow-of-control simple
- Easy to re-arrange
- Easy to optimize
- Syntax trees or DAGs can be represented by TAC
- Close to assembly code, so it is easy to generate target code
3. Linear IR: Three-Address Code (TAC)
- Basic idea: each instruction is of the form X = Y op Z
  - X, Y, Z can only be registers, constants, or compiler-generated temporaries, i.e., similar to assembly (the destination X cannot be a constant)
  - op is an operator
- Example: the AST expression x + y * z is translated to:
    t1 = y * z
    t2 = x + t1
- Each sub-expression has a “home.”
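To make the "each sub-expression has a home" idea concrete, here is a small sketch that borrows Python's standard `ast` module as a stand-in front end and emits one TAC instruction per binary operator. The function name `to_tac` and the choice of supported operators are assumptions made for this illustration.

```python
import ast

# Operators this sketch understands, mapped to their TAC spelling.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tac(source):
    """Translate an arithmetic expression into a list of TAC strings."""
    code = []

    def visit(node):
        # Leaves are their own "home": a variable name or a constant.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            temp = f"t{len(code) + 1}"   # fresh temporary for this sub-expression
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError(f"unsupported node: {type(node).__name__}")

    visit(ast.parse(source, mode="eval").body)
    return code


print("\n".join(to_tac("x + y * z")))
```

Because the tree is walked bottom-up, the multiplication is emitted before the addition that consumes its temporary.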
3. Linear IR: Common TAC Statements
- Assignment: X := Y + Z
- Assignment: X := 2 + Z
- Assignment: X := -1
- Copy: X := Y
- Jump: goto L (L is a symbolic label; the statement labeled by L is executed next)
3. Linear IR: Common TAC Statements
- Conditional jump: if X relop Y then goto L
- Function or procedure call (the 2 is the number of parameters):
    Param x
    Param y
    Call f, 2
- Indexed assignment: x := a[10]
- Indexed assignment: a[10] := x
3. Linear IR: Common TAC Statements
- Address assignment: x := &y (x gets the location of y)
- Pointer assignment: x := *y (x gets the object pointed to by y)
- Pointer assignment: *x := y (the object pointed to by x gets the value of y)
3. Linear IR: TAC Design Tradeoffs
- IRs need to be rich enough to implement the source language
- Smaller set of operators
  - Easier to implement
  - Long sequences of statements
- Larger set of operators
  - More difficult to implement
  - Short sequences of statements
3. Linear IR: How to Generate TAC
- Use syntax-directed translation
  - Can be folded into parsing if desired
- For a non-terminal E, define the attributes:
  - E.place: the name that will hold the value of E
  - E.code: the TAC that evaluates E
- newtemp: returns a new temporary variable
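The attributes E.place and E.code can be sketched as a small record that each grammar action builds from the records of its children. This is an illustrative sketch only: the class `Attr` and the action functions `leaf`, `unary`, `binary`, and `assign` are hypothetical names, and the productions in the comments are assumed rather than taken from a specific grammar in the lecture.

```python
from dataclasses import dataclass, field
from itertools import count

_temps = count(1)


def newtemp():
    """Return a fresh temporary name: t1, t2, ..."""
    return f"t{next(_temps)}"


@dataclass
class Attr:
    place: str                                 # E.place: name holding E's value
    code: list = field(default_factory=list)   # E.code: TAC that evaluates E


def leaf(name):                 # E -> id
    return Attr(place=name)


def unary(op, e):               # E -> op E1
    place = newtemp()
    return Attr(place, e.code + [f"{place} := {op} {e.place}"])


def binary(op, e1, e2):         # E -> E1 op E2
    place = newtemp()
    return Attr(place, e1.code + e2.code + [f"{place} := {e1.place} {op} {e2.place}"])


def assign(name, e):            # S -> id := E
    return e.code + [f"{name} := {e.place}"]


def b_times_neg_c():
    return binary("*", leaf("b"), unary("-", leaf("c")))


# a := b * (-c) + b * (-c)
code = assign("a", binary("+", b_times_neg_c(), b_times_neg_c()))
print("\n".join(code))
```

Each action concatenates the code of its children before appending its own instruction, which is why operands are always computed before they are used.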
3. Linear IR: Storing TAC
- Store all instructions in a quadruple table
- Every instruction has four fields: op, arg1, arg2, result
- The label of an instruction is its index in the table
- TAC:
    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5
- Quadruple entries:
         op      arg1  arg2  result
    (0)  Uminus  c           t1
    (1)  Mult    b     t1    t2
    (2)  Uminus  c           t3
    (3)  Mult    b     t3    t4
    (4)  Plus    t2    t4    t5
    (5)  Assign  t5          a
3. Linear IR: Implementation of TAC
- Compilers can choose among quadruples, triples, and indirect triples
  - Quadruples: op, arg1, arg2, result
  - Triples: avoid temporary names by referring to the position of the statement that computes the value
  - Indirect triples: a separate list of pointers to triples gives the statement order
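The three storage schemes can be compared on the running example a := b * (-c) + b * (-c). The sketch below is an assumption-laden illustration: `Quad`, `Triple`, and `quads_to_triples` are hypothetical names, operands that are integers stand for "the result of the triple at that position", and every temporary is assumed to be assigned exactly once.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("mult", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("mult", "b", "t3", "t4"),
    Quad("plus", "t2", "t4", "t5"),
    Quad("assign", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary name by the position of the triple computing it."""
    defined_at = {}   # temporary name -> index of its defining triple
    triples = []
    for q in quads:
        arg1 = defined_at.get(q.arg1, q.arg1)
        arg2 = defined_at.get(q.arg2, q.arg2)
        if q.op == "assign":
            # An assignment names its target explicitly; it defines no temporary.
            triples.append(Triple("assign", q.result, arg1))
        else:
            defined_at[q.result] = len(triples)
            triples.append(Triple(q.op, arg1, arg2))
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list of pointers (indices here) into the
# triple table gives execution order, so statements can be reordered by
# permuting `order` without renumbering any operand.
order = list(range(len(triples)))
for position in order:
    print(position, triples[position])
```

With plain triples, moving an instruction changes its index and every reference to it must be patched; the extra `order` list is what removes that cost.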
3. Linear IR: Example: AST to TAC
- AST: a = b * (-c) + b * (-c)
- IR (TAC):
    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5
3. Linear IR: Differences of TAC Implementations
- Eventually the TAC instructions are placed in memory and passed to the next step, i.e., generating assembly code to run
- Space: quads take the most space, while triples take the least
- Optimizations: quads and indirect triples are easier to move around; triples are hard to move, because moving a triple changes its position and every reference to it must be updated
3. Linear IR: Summary - The IR Machine
- A machine with:
  - An infinite number of temporaries (think registers)
  - Simple instructions
    - 3 operands
    - Branching
    - Calls (function calls) with a simple calling convention
  - Simple code structure
    - Array of instructions
    - Labels to define the targets of branches
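The IR machine described above is simple enough to interpret directly. The following is a minimal sketch, assuming a tuple encoding of instructions invented for this example (`binop`, `copy`, `goto`, `ifz`, `label`); calls are left out to keep it short. String operands name variables or temporaries, and anything else is treated as a constant.

```python
import operator

BINOPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
}


def run(program, env):
    """Execute a list of TAC tuples, mutating and returning `env`."""
    # Labels resolve to positions in the instruction array.
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}

    def val(x):
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        pc += 1
        if kind == "binop":            # ("binop", dst, op, a, b)
            _, dst, op, a, b = ins
            env[dst] = BINOPS[op](val(a), val(b))
        elif kind == "copy":           # ("copy", dst, src)
            env[ins[1]] = val(ins[2])
        elif kind == "goto":           # ("goto", label)
            pc = labels[ins[1]]
        elif kind == "ifz":            # ("ifz", cond, label): jump if cond is zero
            if not val(ins[1]):
                pc = labels[ins[2]]
        elif kind != "label":
            raise ValueError(f"unknown instruction: {ins!r}")
    return env


# s = 1 + 2 + ... + n, written with labels and jumps only.
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "L1"),
    ("binop", "t1", "<=", "i", "n"),
    ("ifz", "t1", "L2"),
    ("binop", "s", "+", "s", "i"),
    ("binop", "i", "+", "i", 1),
    ("goto", "L1"),
    ("label", "L2"),
]
result = run(program, {"n": 4})
print(result["s"])
```

The environment dictionary plays the role of the unbounded set of temporaries: any name can be written without being declared first.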
3. Linear IR: Summary - Temporaries
- The machine has an infinite number of temporaries
  - Call them t0, t1, t2, ...
  - Temporaries can hold values of any type
  - The type of a temporary is determined by the instruction that generates its value
  - Temporaries go out of scope with each function
3. Linear IR: Summary - Mapping Names to Variables
- Variables are names for values
  - Names given by programmers in the input program
  - Names given by compilers for storing intermediate results of computation
- Reusing temporary variables can save space, but it masks context and can prevent analysis and optimizations
  - The result of t1*t2 is no longer available after t1 is reused
- Three-address code for x - 2 * y
  - Different values use distinct names:
      t1 := 2
      t2 := y
      t3 := t1*t2
      t4 := x
      t5 := t4-t3
  - Different values reuse the same name:
      t1 := 2
      t2 := y
      t1 := t1*t2
      t2 := x
      t1 := t2-t1
3. Linear IR: Summary - Mapping Storage to Variables
- Variables are placeholders for values
  - Every variable must have a location to store its value: register, stack, heap, or static storage
  - Values must be loaded into registers before use
- Three-address code for x - 2 * y:
  - x and y are in registers:
      t1 := 2
      t2 := t1*y
      t3 := x-t2
  - x and y are in memory:
      t1 := 2
      t2 := y
      t3 := t1*t2
      t4 := x
      t5 := t4-t3
- Exercise:
    void A(int b, int *p) {
        int a, d;
        a = 3;
        d = foo(a);
        *p = b + d;
    }
  - Which variables can be kept in registers?
  - Which variables must be stored in memory?
4. TAC Examples
An arithmetic expression and its translation into TAC instructions:
- Expression:
    a = b * c + b * d;
- TAC representation:
    _t1 = b * c;
    _t2 = b * d;
    _t3 = _t1 + _t2;
    a = _t3;
4. TAC Examples
An example of branching and its translation into TAC instructions:
- Expression:
    if (a < b + c)
        a = a - c;
    c = b * c;
- TAC representation:
    _t1 = b + c;
    _t2 = a < _t1;
    IfZ _t2 Goto _L0;
    _t3 = a - c;
    a = _t3;
    _L0: _t4 = b * c;
    c = _t4;
4. TAC Examples
An example of a function call and its translation into TAC instructions:
- Expression:
    n = ReadInteger();
    Binky(arr[n]);
- TAC representation:
    _t0 = LCall _ReadInteger;
    n = _t0;
    _t1 = 4;
    _t2 = _t1 * n;
    _t3 = arr + _t2;
    _t4 = *(_t3);
    PushParam _t4;
    LCall _Binky;
    PopParams 4;
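The array access and call sequence in this example follow a fixed pattern: scale the index by the element size, add the base address, dereference, then push, call, and pop. The sketch below reproduces that pattern. The `Emitter` class and its methods are hypothetical, the 4-byte element and parameter size is read off the example, and pushing arguments right to left is an assumption that the single-argument example cannot confirm.

```python
from itertools import count

ELEM_SIZE = 4   # bytes per array element and per parameter, as in the example


class Emitter:
    """Collects TAC strings and hands out fresh temporaries."""

    def __init__(self, first_temp=0):
        self.code = []
        self._temps = count(first_temp)

    def newtemp(self):
        return f"_t{next(self._temps)}"

    def emit(self, text):
        self.code.append(text)

    def load_element(self, base, index):
        """Emit code for base[index]; return the temporary holding the value."""
        size = self.newtemp()
        self.emit(f"{size} = {ELEM_SIZE};")
        offset = self.newtemp()
        self.emit(f"{offset} = {size} * {index};")
        address = self.newtemp()
        self.emit(f"{address} = {base} + {offset};")
        value = self.newtemp()
        self.emit(f"{value} = *({address});")
        return value

    def call(self, func, args):
        """Emit a call whose result is discarded."""
        for arg in reversed(args):        # assumed right-to-left push order
            self.emit(f"PushParam {arg};")
        self.emit(f"LCall _{func};")
        self.emit(f"PopParams {ELEM_SIZE * len(args)};")


em = Emitter(first_temp=1)
em.call("Binky", [em.load_element("arr", "n")])
print("\n".join(em.code))
```

Starting the temporaries at `_t1` mirrors the example, where `_t0` was already used for the result of `ReadInteger`.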
5. Summary
- IRs provide the interface between the front and back ends of the compiler
- An IR can take many forms (graphical, linear, hybrid); we focused on the linear TAC model
- An IR should be machine and language independent
- An IR should be amenable to optimization