Intermediate code generation
Intermediate code generation translates the source program into an "intermediate language" that is simple and CPU independent.
Benefits:
- Retargeting is facilitated.
- Machine-independent code optimization can be applied.
Types of Intermediate languages
The intermediate language can take many different forms, and the compiler designer decides which one to use. Common choices:
- syntax trees
- postfix notation
- three-address code
1. Syntax Trees
Example: a + a * ( b - c ) + ( b - c ) * d
[Figure: syntax tree and DAG for this expression]
Example: a := b * -c + b * -c
[Figure: syntax tree and DAG for this assignment. In the DAG, the common subexpression b * -c appears once and is shared.]
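The difference between a syntax tree and a DAG can be sketched in a few lines of Python. This is only an illustration, not a method the slides prescribe: the class name DagBuilder and the tuple encoding of nodes are assumptions made for the example. A node is created only when no identical (operator, children) combination exists yet, which is what makes the repeated b * -c collapse into a single shared node.

```python
class DagBuilder:
    """Builds expression nodes, reusing a node when an identical one exists.

    Hypothetical helper for illustration; nodes are plain tuples.
    """

    def __init__(self):
        self.nodes = []   # node id -> ("id", name) or (op, child ids...)
        self.index = {}   # node tuple -> node id

    def node(self, *key):
        # Return the existing node for this key, or create a new one.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self.node("id", name)


def build_example(builder):
    """Build the DAG for a := b * -c + b * -c and return the root id."""
    def term():
        # b * -c, requested twice but stored once
        return builder.node("*", builder.leaf("b"),
                            builder.node("uminus", builder.leaf("c")))

    left = term()
    right = term()
    total = builder.node("+", left, right)
    return builder.node("assign", builder.leaf("a"), total)


dag = DagBuilder()
root = build_example(dag)
for node_id, contents in enumerate(dag.nodes):
    print(node_id, contents)
```

The DAG ends up with 7 nodes, whereas the full syntax tree for the same assignment has 11, because both operands of + point at the same * node.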
2. Postfix Notation
Form rules:
1. If E is a variable or constant, the PN of E is E itself.
2. If E is an expression of the form E1 op E2, the PN of E is E1' E2' op, where E1' and E2' are the PN of E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), the PN of E is the same as the PN of E1.
Example: (a+b)/(c-d)
Postfix: ab+cd-/
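The three rules map directly onto a recursive function. The sketch below assumes the expression has already been parsed into nested tuples of the form (op, left, right); that encoding and the name to_postfix are choices made for this example. Rule 3 needs no code, because parentheses only guide parsing and leave no trace in the tree.

```python
def to_postfix(expr):
    """Return the postfix form of expr.

    expr is either a variable/constant name (a str) or a tuple
    (op, left, right). This tuple encoding is an assumption of the sketch.
    """
    if isinstance(expr, str):      # rule 1: a variable or constant is its own PN
        return expr
    op, left, right = expr         # rule 2: E1 op E2 becomes E1' E2' op
    return to_postfix(left) + to_postfix(right) + op


# (a+b)/(c-d)
tree = ("/", ("+", "a", "b"), ("-", "c", "d"))
print(to_postfix(tree))  # ab+cd-/
```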
3. Three address code
Statements have the general form x := y op z.
No built-up arithmetic expressions are allowed. As a result, x := y + z * w should be represented as:
t1 := z * w
t2 := y + t1
x := t2
Example: a := b * -c + b * -c
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Types of Three-Address Statements
- Assignment statement (binary operator): x := y op z
- Assignment statement (unary operator): x := op z
- Copy statement: x := z
- Unconditional jump: goto L
- Conditional jump: if x relop y goto L
- Stack operations: push / pop
More advanced:
- Procedure call: param x1, param x2, …, param xn, call p,n
- Indexed assignments: x := y[i] and x[i] := y
- Address and pointer assignments: x := &y, x := *y, *x := y
Implementations of 3-address statements
- Quadruples
- Triples
- Indirect triples
Quadruples
Example: a := b * -c + b * -c
      op      arg1  arg2  result
(0)   uminus  c           t1
(1)   *       b     t1    t2
(2)   uminus  c           t3
(3)   *       b     t3    t4
(4)   +       t2    t4    t5
(5)   :=      t5          a
Temporary names must be entered into the symbol table as they are created.
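As a rough illustration of why the explicit result field is convenient, the quadruples for a := b * -c + b * -c can be stored as records and executed one after another. The Quad record and the run_quads function are hypothetical names for this sketch, and the tiny evaluator handles only the four operators that appear in the example.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

# Quadruples for a := b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def run_quads(quads, env):
    """Execute quadruples over a name -> value mapping and return the new mapping."""
    env = dict(env)                      # do not modify the caller's mapping
    for q in quads:
        x = env[q.arg1]
        if q.op == "uminus":
            env[q.result] = -x
        elif q.op == ":=":
            env[q.result] = x
        elif q.op == "*":
            env[q.result] = x * env[q.arg2]
        elif q.op == "+":
            env[q.result] = x + env[q.arg2]
        else:
            raise ValueError(f"unsupported operator: {q.op}")
    return env


result = run_quads(quads, {"b": 3, "c": 2})
print(result["a"])  # 3 * -2 + 3 * -2 = -12
```

Because every intermediate value has a name in the result field, temporaries such as t1 live in the same mapping as program variables, which mirrors the need to enter them into the symbol table.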
Triples
Example: a := b * -c + b * -c
      op      arg1  arg2
(0)   uminus  c
(1)   *       b     (0)
(2)   uminus  c
(3)   *       b     (2)
(4)   +       (1)   (3)
(5)   assign  a     (4)
Temporary names are not entered into the symbol table. A temporary value is referred to by the position of the triple that computes it.
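One way to see how triples relate to quadruples is to convert one form into the other: each temporary name is replaced by the position of the statement that computes it. The function below is a sketch under two assumptions that are not from the slides: quadruples are plain 4-tuples, and every operator other than := writes to a temporary.

```python
def quads_to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples to (op, arg1, arg2) triples.

    An int argument in a triple refers to the triple at that position.
    Assumes every non-copy quadruple writes its result to a temporary.
    """
    position = {}   # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)    # replace a temporary by its position
        a2 = position.get(arg2, arg2)
        if op == ":=":
            triples.append(("assign", result, a1))
        else:
            position[result] = len(triples)
            triples.append((op, a1, a2))
    return triples


quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]

triples = quads_to_triples(quads)
for i, t in enumerate(triples):
    print(f"({i})", t)
```

No temporary names survive the conversion, which is why triples do not need symbol-table entries for them.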
Other types of 3-address statements
Ternary operations such as x[i] := y and x := y[i] require two or more triple entries.
x[i] := y
      op      arg1  arg2
(0)   []=     x     i
(1)   assign  (0)   y
x := y[i]
      op      arg1  arg2
(0)   =[]     y     i
(1)   assign  x     (0)
Indirect Triples
Example: a := b * -c + b * -c
The statement list holds pointers to the triples rather than the triples themselves.
      statement
(0)   (14)
(1)   (15)
(2)   (16)
(3)   (17)
(4)   (18)
(5)   (19)
      op      arg1  arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  assign  a     (18)
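The practical effect of the extra statement list can be sketched as follows: statements can be reordered by editing the list alone, while the triples and the references inside them stay untouched. The dictionary layout and the function name references_resolved are assumptions for this example; the check simply confirms that each triple refers only to triples scheduled earlier.

```python
# Triple store, keyed by the positions used in the example above.
triple_store = {
    14: ("uminus", "c", None),
    15: ("*", "b", 14),
    16: ("uminus", "c", None),
    17: ("*", "b", 16),
    18: ("+", 15, 17),
    19: ("assign", "a", 18),
}

# Statement list: the execution order, as pointers into the store.
statements = [14, 15, 16, 17, 18, 19]


def references_resolved(statements, store):
    """True if every triple only refers to triples listed earlier."""
    seen = set()
    for idx in statements:
        _op, arg1, arg2 = store[idx]
        for arg in (arg1, arg2):
            # int arguments are references to other triples
            if isinstance(arg, int) and arg not in seen:
                return False
        seen.add(idx)
    return True


# Move the second uminus ahead of the first multiplication.
reordered = [14, 16, 15, 17, 18, 19]

print(references_resolved(statements, triple_store))
print(references_resolved(reordered, triple_store))
print(references_resolved([15, 14, 16, 17, 18, 19], triple_store))
```

With plain triples the same move would change the positions of the affected triples, so every reference to them would have to be renumbered.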
Assignment Statements
S -> id := E    { ptr := lookup(id.name);
                  if ptr <> nil then emit(ptr ':=' E.place) else error }
E -> E1 + E2    { E.place := newtemp;
                  emit(E.place ':=' E1.place '+' E2.place) }
E -> E1 * E2    { E.place := newtemp;
                  emit(E.place ':=' E1.place '*' E2.place) }
E -> - E1       { E.place := newtemp;
                  emit(E.place ':=' 'uminus' E1.place) }
E -> ( E1 )     { E.place := E1.place }
E -> id         { ptr := lookup(id.name);
                  E.place := ptr }
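The semantic actions above can be tried out with a small Python sketch. It is not the scheme's official implementation: it walks an already-parsed tree of tuples instead of running inside a parser, the class name Translator is made up for the example, a plain set stands in for the symbol table, and an unknown identifier in an expression raises an error (the rule for E -> id above does not say what happens in that case).

```python
import itertools


class Translator:
    """Emits three-address code for assignments (illustrative sketch)."""

    def __init__(self, symbols):
        self.symbols = set(symbols)          # stand-in for the symbol table
        self.code = []                       # emitted statements, in order
        self._counter = itertools.count(1)

    def newtemp(self):
        return f"t{next(self._counter)}"

    def emit(self, text):
        self.code.append(text)

    def lookup(self, name):
        return name if name in self.symbols else None

    def expr(self, e):
        """Return E.place for e, emitting code as a side effect."""
        if isinstance(e, str):               # E -> id
            place = self.lookup(e)
            if place is None:                # assumption: treat as an error
                raise NameError(f"undeclared identifier: {e}")
            return place
        if e[0] == "uminus":                 # E -> - E1
            p1 = self.expr(e[1])
            place = self.newtemp()
            self.emit(f"{place} := uminus {p1}")
            return place
        op, e1, e2 = e                       # E -> E1 + E2 | E1 * E2
        p1 = self.expr(e1)
        p2 = self.expr(e2)
        place = self.newtemp()
        self.emit(f"{place} := {p1} {op} {p2}")
        return place

    def assign(self, name, e):               # S -> id := E
        place = self.expr(e)
        ptr = self.lookup(name)
        if ptr is None:
            raise NameError(f"undeclared identifier: {name}")
        self.emit(f"{ptr} := {place}")


# a := b * -c + b * -c
term = ("*", "b", ("uminus", "c"))
translator = Translator(["a", "b", "c"])
translator.assign("a", ("+", term, term))
print("\n".join(translator.code))
```

The rule E -> ( E1 ) has no counterpart in the code because parentheses do not appear in the tree; the place of the inner expression is passed up unchanged.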
Reusing temporaries
A simple algorithm:
- Keep a counter c, initialized to zero.
- Whenever a temporary name is used as an operand, decrement c by 1.
- Whenever a new temporary name is created, use $c and increment c by 1.
Example: x := a*b + c*d - e*f
Statement          Value of c
----------------------------------------------
$0 := a * b        1  (c incremented by 1)
$1 := c * d        2  (c incremented by 1)
$0 := $0 + $1      1  (c decremented twice, incremented once)
$1 := e * f        2  (c incremented by 1)
$0 := $0 - $1      1  (c decremented twice, incremented once)
x := $0            0  (c decremented once)
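The counter scheme can be checked with a short Python sketch. The names TempAllocator and gen, and the tuple encoding of the expression, are assumptions made for the example. The code generator visits operands before the operator, so temporaries are released in the reverse of the order in which they were created, and that is what lets a single counter manage them.

```python
class TempAllocator:
    """Counter-based temporary allocation (illustrative sketch)."""

    def __init__(self):
        self.c = 0

    def use(self, place):
        # Using a temporary as an operand frees it: decrement c.
        if place.startswith("$"):
            self.c -= 1

    def new(self):
        # A new temporary takes the name $c, then c is incremented.
        name = f"${self.c}"
        self.c += 1
        return name


def gen(e, alloc, code):
    """Emit code for expression e and return the place holding its value."""
    if isinstance(e, str):
        return e
    op, e1, e2 = e
    p1 = gen(e1, alloc, code)
    p2 = gen(e2, alloc, code)
    alloc.use(p1)
    alloc.use(p2)
    place = alloc.new()
    code.append(f"{place} := {p1} {op} {p2}")
    return place


# x := a*b + c*d - e*f
expr = ("-", ("+", ("*", "a", "b"), ("*", "c", "d")), ("*", "e", "f"))
alloc = TempAllocator()
code = []
place = gen(expr, alloc, code)
alloc.use(place)                 # the assignment uses the final temporary
code.append(f"x := {place}")
print("\n".join(code))
```

Only two temporary names, $0 and $1, are needed for the whole statement, and the counter is back at zero once the assignment has been emitted.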