Chap. 4, Intermediate Code Generation
Compilation in a Nutshell 1
[Figure: front-end pipeline for the statement if (b == 0) a = b;]
Source code (character stream) → Lexical analysis → Token stream: if ( b == 0 ) a = b ;
Token stream → Parsing → Abstract syntax tree (AST): an if node over == (b, 0) and = (a, b)
AST → Semantic analysis → Decorated AST: == is boolean; b and 0 are int; a is an int lvalue
Compilation in a Nutshell 2
[Figure: back-end pipeline for the same statement. Intermediate code generation lowers the decorated AST to an IR tree built from CJUMP, ==, MEM, CONST, MOVE and NOP nodes, with operands addressed as fp + 4 and fp + 8. Optimization replaces the memory operands with the registers CX and DX. Code generation then yields:]
  CMP CX, 0
  CMOVZ DX, CX
Outline Intermediate Code Representation Expressions Translation Array Element Translation Type Conversion Boolean Expression Translation Procedure Translation
Role of Intermediate Code
Closer to the target language: simplifies code generation.
Machine-independent: simplifies retargeting of the compiler.
Allows a variety of optimizations to be implemented in a machine-independent way.
Many compilers use several different intermediate representations.
Different Kinds of IRs
Graphical IRs: the program structure is represented as a graph (or tree). Examples: parse trees, syntax trees, DAGs.
Linear IRs: the program is represented as a list of instructions for some virtual machine. Example: three-address code.
Hybrid IRs: combine elements of graphical and linear IRs.
Graphical IRs 1: Parse Trees
A parse tree is a tree representation of a derivation during parsing.
Constructing a parse tree:
The root is the start symbol S of the grammar.
Given a parse tree in which X is a leaf, if the next derivation step rewrites X to X1…Xn, then the new parse tree is obtained by attaching X1, …, Xn as the children of X.
Graphical IRs 2: Abstract Syntax Trees (AST)
A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.
Each node represents a computation to be performed.
The children of the node represent what that computation is performed on.
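Graphical IRs 3: Directed Acyclic Graphs (DAGs)
A DAG is a contraction of an AST that avoids duplication of nodes. It reduces compiler memory requirements and exposes redundancies.
E.g.: for the expression (x+y)*(x+y):
AST: the * node has two separate + subtrees, each with its own x and y leaves.
DAG: the * node points twice to a single shared + node over x and y.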
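One common way to obtain a DAG while building the tree is to look up every requested node in a table and reuse it when an identical one already exists. The sketch below illustrates that idea for (x+y)*(x+y); the class and method names (DagBuilder, leaf, op) are hypothetical and not taken from the slides.

```python
# Hypothetical helper names (DagBuilder, leaf, op); not from the slides.
class DagBuilder:
    def __init__(self):
        self.nodes = []   # node id -> (label, left id, right id)
        self.index = {}   # (label, left id, right id) -> node id

    def _node(self, key):
        # Reuse an existing node with the same label and children.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._node((name, None, None))

    def op(self, label, left, right):
        return self._node((label, left, right))


b = DagBuilder()
left = b.op("+", b.leaf("x"), b.leaf("y"))
right = b.op("+", b.leaf("x"), b.leaf("y"))
root = b.op("*", left, right)

print(left == right)   # both operands of * are the same node
print(len(b.nodes))    # nodes for x, y, x+y and (x+y)*(x+y)
```

An AST for the same expression would need seven nodes; the table lookup is what exposes the repeated x+y.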
Linear IR 1: Three Address Code
Instructions are of the form ‘x = y op z’, where x, y, z are variables, constants, or “temporaries”.
At most one operator is allowed on the RHS, so there are no “built-up” expressions.
Three Address Code: Example
Source:
  if ( x + y*z > x*y + z) a = 0;
Three Address Code:
  t1 = y*z
  t2 = x+t1 // x + y*z
  t3 = x*y
  t4 = t3+z // x*y + z
  if (t2 <= t4) goto L
  a = 0
  L:
An Example Intermediate Instruction Set
Assignment:
  x = y op z (op binary)
  x = op y (op unary)
  x = y
Jumps:
  if ( x op y ) goto L (L a label)
  goto L
Pointer and indexed assignments:
  x = y[ z ]
  y[ z ] = x
  x = &y
  x = *y
  *y = x
Procedure call/return:
  param x, k (x is the kth param)
  retval x
  call p
  enter p
  leave p
  return
  retrieve x
Type Conversion:
  x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
Miscellaneous:
  label L
Three Representations of Instructions
Three ways to represent instructions in a data structure:
  Quadruples
  Triples
  Indirect triples
Quadruples
Quadruple (quad): four fields: op, arg1, arg2, result
Exceptions:
  Unary operators: no arg2
  param: no arg2 and no result
  Conditional and unconditional jumps: put the target label in result
Quadruples for a=c*b+c*b
      op   arg1   arg2   result
(1)   *    c      b      T1
(2)   *    c      b      T2
(3)   +    T1     T2     a
Quadruples for a:=(b+c)*e+(b+c)/f
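Triples
Triple: three fields: op, arg1, arg2
A triple has no result field; its value is referred to by the number of the triple, e.g. (14).
      op       arg1   arg2
(14)  uminus   c
(15)  *        b
(16)
(17)
(18)  +
(19)  assign   a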
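To make the difference between the two representations concrete, the following sketch converts the quadruples for a=c*b+c*b into triples by replacing each temporary with the position of the triple that computes it. It is only an illustration: the Quad and Triple records, the to_triples function, and the convention that temporaries are the names starting with "T" are assumptions, not part of the slides.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

# Quadruples for a = c*b + c*b.
quads = [
    Quad("*", "c", "b", "T1"),
    Quad("*", "c", "b", "T2"),
    Quad("+", "T1", "T2", "a"),
]

def to_triples(quads, is_temp=lambda name: name.startswith("T")):
    """Replace temporaries by the position of the triple that computes them."""
    triples, defined_at = [], {}

    def ref(arg):
        # A temporary becomes a reference "(i)"; other names stay as they are.
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for q in quads:
        triples.append(Triple(q.op, ref(q.arg1), ref(q.arg2)))
        pos = len(triples) - 1
        if is_temp(q.result):
            defined_at[q.result] = pos
        else:
            # Storing into a named variable needs an explicit assign triple.
            triples.append(Triple("assign", q.result, f"({pos})"))
    return triples

for i, t in enumerate(to_triples(quads)):
    print(f"({i})", t.op, t.arg1, t.arg2)
```

The three quadruples become four triples because the store into a, which the quadruple form folds into the result field, has to be written as its own assign triple.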
SDD for Expression Translation
The synthesized attribute S.code represents the three-address code for non-terminal S.
Each non-terminal E has two attributes:
  E.place represents the place that stores E’s value.
  E.code represents the three-address code for non-terminal E.
The function newtemp returns a different temporary variable, such as T1, T2, …, on each call.
Three-address Code Generation SDD for Expression Translation
Production    Semantic Rules
S→id:=E       S.code:=E.code || gen(id.place ‘:=’ E.place)
E→E1+E2       E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E→E1*E2       E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
E→-E1         E.place:=newtemp; E.code:=E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E→(E1)        E.place:=E1.place; E.code:=E1.code
E→id          E.place:=id.place; E.code:=‘ ’
Three-address Code Generation SDT for Expression Translation
SDD rules:
  S→id:=E    S.code:=E.code || gen(id.place ‘:=’ E.place)
  E→E1+E2    E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
  E→E1*E2    E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
Corresponding SDT actions:
  S→id:=E    { p:=lookup(id.name); if p≠nil then emit(p ‘:=’ E.place) else error }
  E→E1+E2    { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
  E→E1*E2    { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘*’ E2.place) }
Three-address Code Generation SDT for Expression Translation
SDD rules:
  E→-E1      E.place:=newtemp; E.code:=E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
  E→(E1)     E.place:=E1.place; E.code:=E1.code
  E→id       E.place:=id.place; E.code:=‘ ’
Corresponding SDT actions:
  E→-E1      { E.place:=newtemp; emit(E.place ‘:=’ ‘uminus’ E1.place) }
  E→(E1)     { E.place:=E1.place }
  E→id       { p:=lookup(id.name); if p≠nil then E.place:=p else error }
a:=(b+c)*e+(b+c)/f
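As a rough illustration of how the emit-style actions play out on this example, the sketch below walks an expression tree in the order a bottom-up parser would reduce it. The tuple encoding of expressions and the function name translate_assignment are assumptions made for the sketch; the symbol-table lookup is omitted, and division is handled like + and * even though the rules above list only those two operators.

```python
import itertools

def translate_assignment(target, expr):
    """Return the three-address code for `target := expr`.

    expr is a nested tuple: a name (str), ('uminus', e) or (op, e1, e2).
    """
    code = []
    counter = itertools.count(1)

    def newtemp():
        return f"T{next(counter)}"

    def place_of(e):
        if isinstance(e, str):            # E -> id
            return e
        if e[0] == "uminus":              # E -> -E1
            p1 = place_of(e[1])
            t = newtemp()
            code.append(f"{t} := uminus {p1}")
            return t
        op, e1, e2 = e                    # E -> E1 op E2
        p1, p2 = place_of(e1), place_of(e2)
        t = newtemp()
        code.append(f"{t} := {p1} {op} {p2}")
        return t

    code.append(f"{target} := {place_of(expr)}")   # S -> id := E
    return code

b_plus_c = ("+", "b", "c")
expr = ("+", ("*", b_plus_c, "e"), ("/", b_plus_c, "f"))
for line in translate_assignment("a", expr):
    print(line)
```

Note that b + c is computed twice (T1 and T3): this scheme translates a tree, not a DAG, so the shared subexpression is not detected.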
Addressing Array Elements
A: array[1..2, 1..3]
Column major: A[1, 1], A[2, 1], A[1, 2], A[2, 2], A[1, 3], A[2, 3]
Row major: A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2], A[2, 3]
Address of A[i1, i2] (row major), where n2 is the number of elements in the second dimension and w is the width of one element:
  base + ((i1 - low1) × n2 + (i2 - low2)) × w
  = ((i1 × n2) + i2) × w + (base - ((low1 × n2) + low2) × w)
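Addressing Array Elements
For an array A[low..low+n-1] with n elements, A[i] begins at: base + (i-low)×w
For k-dimensional arrays, where lowj is the lower bound and nj the number of elements of the j-th dimension, A[i1, i2, …, ik] begins at:
  ((…((i1×n2+i2)×n3+i3)…)×nk+ik)×w + base - ((…((low1×n2+low2)×n3+low3)…)×nk+lowk)×w
VARPART: ((…((i1×n2+i2)×n3+i3)…)×nk+ik)×w
CONSPART: base - ((…((low1×n2+low2)×n3+low3)…)×nk+lowk)×w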
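The split into VARPART and CONSPART can be checked numerically. The sketch below evaluates both parts for a row-major array; the function name element_address and the base address 1000 and width 4 used in the example are assumptions chosen for illustration.

```python
def element_address(base, w, bounds, index):
    """Address of A[index] for a row-major array.

    bounds is a list of (low_j, high_j) pairs, index a list of subscripts.
    """
    n = [high - low + 1 for low, high in bounds]   # n_j: length of dimension j
    lows = [low for low, _ in bounds]

    def horner(values):
        # ((...(v1*n2 + v2)*n3 + v3)...)*nk + vk
        acc = values[0]
        for v, nj in zip(values[1:], n[1:]):
            acc = acc * nj + v
        return acc

    varpart = horner(index) * w            # depends on the subscripts
    conspart = base - horner(lows) * w     # known at compile time
    return varpart + conspart

# A: array[1..2, 1..3], with an assumed base address of 1000 and w = 4.
bounds = [(1, 2), (1, 3)]
print(element_address(1000, 4, bounds, [1, 1]))   # first element: the base
print(element_address(1000, 4, bounds, [2, 3]))   # last of the six elements
```

Only the VARPART has to be computed at run time; the CONSPART depends on the declaration alone, which is why the translation scheme can fold it into a constant.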
Array Element Processing
Grammar:
  L → id [ Elist ] | id
  Elist → Elist, E | E
To facilitate processing, we rewrite the grammar as:
  L → Elist ] | id
  Elist → Elist, E | id [ E
New attributes and functions
Elist.array: symbol table entry of id.
Elist.ndim: number of dimensions seen so far.
Elist.place: a temporary variable that stores the value calculated from the index expressions.
limit(array, j): returns the length of the j-th dimension.
Each non-terminal L has two attribute values
L.place: the symbol table entry of L if L is a simple variable; the CONSPART value if L is an indexed variable.
L.offset: null if L is a simple variable; the VARPART value if L is an indexed variable.
(1) S→L:=E
(2) E→E+E
(3) E→(E)
(4) E→L
(5) L→Elist ]
(6) L→id
(7) Elist→Elist, E
(8) Elist→id [ E
(1) S→L:=E { if L.offset=null then /* L is a simple variable */ emit(L.place ‘:=’ E.place) else emit(L.place ‘[’ L.offset ‘]’ ‘:=’ E.place) }
(2) E→E1+E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
(3) E→(E1) { E.place:=E1.place }
(4) E→L { if L.offset=null then E.place:=L.place else begin E.place:=newtemp; emit(E.place ‘:=’ L.place ‘[’ L.offset ‘]’) end }
Address of A[i1, i2, …, ik]:
  ((…((i1×n2+i2)×n3+i3)…)×nk+ik)×w + base - ((…((low1×n2+low2)×n3+low3)…)×nk+lowk)×w
(8) Elist→id [ E { Elist.place:=E.place; Elist.ndim:=1; Elist.array:=id.place }
(7) Elist→Elist1, E { t:=newtemp; m:=Elist1.ndim+1; emit(t ‘:=’ Elist1.place ‘*’ limit(Elist1.array, m)); emit(t ‘:=’ t ‘+’ E.place); Elist.array:=Elist1.array; Elist.place:=t; Elist.ndim:=m }
(5) L→Elist ] { L.place:=newtemp; emit(L.place ‘:=’ Elist.array ‘-’ C); L.offset:=newtemp; emit(L.offset ‘:=’ w ‘*’ Elist.place) }
    where C = ((…((low1×n2+low2)×n3+low3)…)×nk+lowk)×w is computed at compile time
(6) L→id { L.place:=id.place; L.offset:=null }
a:=B[i,j]
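The following sketch applies actions (8), (7), (5), (4) and (1) in that order to an assignment of this shape. It is a hand-rolled illustration, not a parser: the function name translate_array_load is hypothetical, and the declaration B: array[1..10, 1..20] with 4-byte elements is an assumption, since the slide does not give B's bounds.

```python
import itertools

def translate_array_load(target, array, subscripts, limits, lows, w):
    """Emit three-address code for `target := array[subscripts]`.

    limits[j] is the length of dimension j+1, i.e. limit(array, j+1).
    """
    code = []
    counter = itertools.count(1)

    def newtemp():
        return f"T{next(counter)}"

    # (8) Elist -> id [ E
    place = subscripts[0]
    # (7) Elist -> Elist1 , E   (once per further subscript)
    for m, sub in enumerate(subscripts[1:], start=2):
        t = newtemp()
        code.append(f"{t} := {place} * {limits[m - 1]}")
        code.append(f"{t} := {t} + {sub}")
        place = t

    # C = ((...(low1*n2 + low2)...)*nk + lowk) * w, folded at compile time.
    c = lows[0]
    for low, nj in zip(lows[1:], limits[1:]):
        c = c * nj + low
    c *= w

    # (5) L -> Elist ]
    l_place = newtemp()
    code.append(f"{l_place} := {array} - {c}")
    l_offset = newtemp()
    code.append(f"{l_offset} := {w} * {place}")

    # (4) E -> L, then (1) S -> L := E with a simple variable on the left.
    e_place = newtemp()
    code.append(f"{e_place} := {l_place}[{l_offset}]")
    code.append(f"{target} := {e_place}")
    return code

# Assumed declaration: B: array[1..10, 1..20] with 4-byte elements.
for line in translate_array_load("a", "B", ["i", "j"], [10, 20], [1, 1], 4):
    print(line)
```

With these assumed bounds, C = (1×20 + 1)×4 = 84, so the CONSPART is emitted as B - 84 and only i×20 + j is computed at run time.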
Type Conversion
E.type: the data type of non-terminal E
Suppose there are two data types, integer and real, with separate operators for each: int op and real op.
The semantic action for E→E1 op E2:
{ if E1.type=integer and E2.type=integer then E.type:=integer else E.type:=real }
Type Conversion Example
x:=y+i*j, in which x, y are real and i, j are int.
Three-address code:
  T1:=i int* j
  T3:=inttoreal T1
  T2:=y real+ T3
  x:=T2
Semantic Action for E→E1+E2
{ E.place:=newtemp;
  if E1.type=integer and E2.type=integer then begin
    emit(E.place ‘:=’ E1.place ‘int+’ E2.place);
    E.type:=integer
  end
  else if E1.type=real and E2.type=real then begin
    emit(E.place ‘:=’ E1.place ‘real+’ E2.place);
    E.type:=real
  end
  else if E1.type=integer and E2.type=real then begin
    u:=newtemp;
    emit(u ‘:=’ ‘inttoreal’ E1.place);
    emit(E.place ‘:=’ u ‘real+’ E2.place);
    E.type:=real
  end
  else if E1.type=real and E2.type=integer then begin
    u:=newtemp;
    emit(u ‘:=’ ‘inttoreal’ E2.place);
    emit(E.place ‘:=’ E1.place ‘real+’ u);
    E.type:=real
  end
  else E.type:=type_error }
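A compact way to see this action at work is to run it on x:=y+i*j. The sketch below is an illustration only: the function name binary and the (place, type) pairs are assumptions, only the types int and real are modelled, and the type_error branch is left out.

```python
import itertools

_counter = itertools.count(1)
code = []

def newtemp():
    return f"T{next(_counter)}"

def binary(op, left, right):
    """left and right are (place, type) pairs; returns the result pair."""
    (p1, t1), (p2, t2) = left, right
    place = newtemp()                    # E.place := newtemp comes first
    if t1 == "int" and t2 == "int":
        result_type = "int"
    else:
        result_type = "real"
        # Widen whichever operand is still an integer.
        if t1 == "int":
            u = newtemp()
            code.append(f"{u} := inttoreal {p1}")
            p1 = u
        if t2 == "int":
            u = newtemp()
            code.append(f"{u} := inttoreal {p2}")
            p2 = u
    code.append(f"{place} := {p1} {result_type}{op} {p2}")
    return place, result_type

# x := y + i * j with x, y real and i, j int.
product = binary("*", ("i", "int"), ("j", "int"))
total = binary("+", ("y", "real"), product)
code.append(f"x := {total[0]}")
print("\n".join(code))
```

Because E.place is allocated before the conversion temporary u, the sum lands in T2 and the converted product in T3, the same numbering as in the type conversion example.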
Two translation methods
Direct translation: A or B and C=D
  (1) (=, C, D, T1)
  (2) (and, B, T1, T2)
  (3) (or, A, T2, T3)
Translation with optimization: if (x<100 or x>200 and x<>y) x:=0;
  if x<100 goto L2
  ifFalse x>200 goto L1
  ifFalse x<>y goto L1
  L2: x:=0
  L1:
Outline Three-Address Code Expressions Translation Array Element Translation Type Conversion Boolean Expression Translation Direct Translation Optimized Translation Backpatching Procedure Translation
Direct translation
a or b and not c can be translated into:
  T1:=not c
  T2:=b and T1
  T3:=a or T2
a<b can be written as if a<b then 1 else 0. Hence, it can be translated into:
  100: if a<b goto 103
  101: T:=0
  102: goto 104
  103: T:=1
  104:
Boolean Expression Direct Translation SDT
emit: writes a three-address instruction to the output file.
nextstat: the address index of the next three-address instruction.
Each call to emit generates one new instruction and so adds 1 to nextstat.
Boolean Expression Direct Translation SDT
E→E1 or E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘or’ E2.place) }
E→E1 and E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘and’ E2.place) }
E→not E1 { E.place:=newtemp; emit(E.place ‘:=’ ‘not’ E1.place) }
E→(E1) { E.place:=E1.place }
Boolean Expression Direct Translation SDT
a<b is translated into:
  100: if a<b goto 103
  101: T:=0
  102: goto 104
  103: T:=1
  104:
E→id1 relop id2 { E.place:=newtemp; emit(‘if’ id1.place relop.op id2.place ‘goto’ nextstat+3); emit(E.place ‘:=’ ‘0’); emit(‘goto’ nextstat+2); emit(E.place ‘:=’ ‘1’) }
E→id { E.place:=id.place }
Direct Translation of a<b or c<d and e<f
  100: if a<b goto 103
  101: T1:=0
  102: goto 104
  103: T1:=1
  104: if c<d goto 107
  105: T2:=0
  106: goto 108
  107: T2:=1
  108: if e<f goto 111
  109: T3:=0
  110: goto 112
  111: T3:=1
  112: T4:=T2 and T3
  113: T5:=T1 or T4
Rules used:
E→id1 relop id2 { E.place:=newtemp; emit(‘if’ id1.place relop.op id2.place ‘goto’ nextstat+3); emit(E.place ‘:=’ ‘0’); emit(‘goto’ nextstat+2); emit(E.place ‘:=’ ‘1’) }
E→id { E.place:=id.place }
E→E1 or E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘or’ E2.place) }
E→E1 and E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘and’ E2.place) }
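The listing for a<b or c<d and e<f can be reproduced mechanically. The sketch below mimics emit and nextstat with a Python list and calls the actions in the order a bottom-up parser would reduce the expression; the helper names relop and logical and the start address 100 are assumptions made for the illustration.

```python
code = []          # code[i] holds the instruction at address START + i
START = 100
_temps = 0

def nextstat():
    return START + len(code)

def emit(text):
    code.append(text)

def newtemp():
    global _temps
    _temps += 1
    return f"T{_temps}"

def relop(id1, op, id2):            # E -> id1 relop id2
    place = newtemp()
    emit(f"if {id1}{op}{id2} goto {nextstat() + 3}")
    emit(f"{place}:=0")
    emit(f"goto {nextstat() + 2}")
    emit(f"{place}:=1")
    return place

def logical(op, p1, p2):            # E -> E1 or E2 | E1 and E2
    place = newtemp()
    emit(f"{place}:={p1} {op} {p2}")
    return place

# a<b or c<d and e<f, reduced in the order a bottom-up parser would use.
t1 = relop("a", "<", "b")
t2 = relop("c", "<", "d")
t3 = relop("e", "<", "f")
t4 = logical("and", t2, t3)
t5 = logical("or", t1, t4)

for i, text in enumerate(code):
    print(f"{START + i}: {text}")
```

All three comparisons are always evaluated here; nothing is skipped even when a<b already decides the result, which is the cost the jumping-code translation avoids.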
Translation for Boolean Expression as Conditional Statement Control
if E then S1 else S2
Two exits for E: E.true and E.false
Code layout:
           E.code        (jumps to E.true or to E.false)
  E.true:  S1.code
           goto S.next
  E.false: S2.code
  S.next:  ……
Example: if a>c or b<d then S1 else S2 is translated into the following three-address code:
        if a>c goto L2    (true exit)
        goto L1
  L1:   if b<d goto L2    (true exit)
        goto L3           (false exit)
  L2:   (S1 three address code)
        goto Lnext
  L3:   (S2 three address code)
  Lnext:
newlabel: returns a new label on each call.
For each Boolean expression E, there are two labels:
  E.true is the label to reach when E is true
  E.false is the label to reach when E is false
Three Address Code Generation SDD for Boolean Expression
Production: E→E1 or E2
Semantic Rules:
  E1.true:=E.true; E1.false:=newlabel; E2.true:=E.true; E2.false:=E.false;
  E.code:=E1.code || gen(E1.false ‘:’) || E2.code
Code layout:
            E1.code    (jumps to E.true or to E1.false)
  E1.false: E2.code    (jumps to E.true or to E.false)
Three Address Code Generation SDD for Boolean Expression
Production: E→E1 and E2
Semantic Rules:
  E1.true:=newlabel; E1.false:=E.false; E2.true:=E.true; E2.false:=E.false;
  E.code:=E1.code || gen(E1.true ‘:’) || E2.code
Code layout:
           E1.code    (jumps to E.false or to E1.true)
  E1.true: E2.code    (jumps to E.true or to E.false)
Three Address Code Generation SDD for Boolean Expression
Productions    Semantic Rules
E→not E1       E1.true:=E.false; E1.false:=E.true; E.code:=E1.code
E→(E1)         E1.true:=E.true; E1.false:=E.false; E.code:=E1.code
Three Address Code Generation SDD for Boolean Expression
Productions        Semantic Rules
E→id1 relop id2    E.code:=gen(‘if’ id1.place relop.op id2.place ‘goto’ E.true) || gen(‘goto’ E.false)
E→true             E.code:=gen(‘goto’ E.true)
E→false            E.code:=gen(‘goto’ E.false)
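The inherited labels E.true and E.false map naturally onto function parameters, and the synthesized E.code onto the return value. The sketch below follows the semantic rules above on that basis; the tuple encoding of expressions, the function name gen_bool and the exit labels Ltrue and Lfalse are assumptions for the illustration.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    return f"L{next(_labels)}"

def gen_bool(e, true, false):
    """Return jumping code for e; true/false are the inherited exit labels.

    e is ('or', e1, e2), ('and', e1, e2), ('not', e1),
    ('rel', id1, op, id2), 'true' or 'false'.
    """
    if e == "true":
        return [f"goto {true}"]
    if e == "false":
        return [f"goto {false}"]
    kind = e[0]
    if kind == "or":
        e1_false = newlabel()
        return (gen_bool(e[1], true, e1_false) + [f"{e1_false}:"]
                + gen_bool(e[2], true, false))
    if kind == "and":
        e1_true = newlabel()
        return (gen_bool(e[1], e1_true, false) + [f"{e1_true}:"]
                + gen_bool(e[2], true, false))
    if kind == "not":
        return gen_bool(e[1], false, true)
    _, id1, op, id2 = e
    return [f"if {id1}{op}{id2} goto {true}", f"goto {false}"]

# Condition of: if a>c or b<d then S1 else S2
cond = ("or", ("rel", "a", ">", "c"), ("rel", "b", "<", "d"))
lines = gen_bool(cond, "Ltrue", "Lfalse")
for line in lines:
    print(line)
```

The labels must be known before the code for a subexpression is generated, which is why this formulation needs either a tree to walk or a later pass that binds labels to addresses.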
Backpatching
Key problem: matching a jump instruction with the target of the jump.
If labels are passed as inherited attributes, a separate pass is needed to bind labels to addresses.
Backpatching: pass lists of jumps as synthesized attributes and fill in each target once it is known.
One-pass Code Generation for Boolean Expression
Quadruples:
  (jnz, a, -, p)     -- if a goto p
  (jrop, x, y, p)    -- if x rop y goto p
  (j, -, -, p)       -- goto p
Translating Short-Circuit Expressions Using Backpatching
E → E or M E | E and M E | not E | ( E ) | id relop id | true | false
M → ε
Synthesized attributes:
  E.code        three-address code
  E.truelist    backpatch list for jumps on true
  E.falselist   backpatch list for jumps on false
  M.quad        location of current three-address quad
Backpatch Operations with Lists
nextquad: location of the next quadruple
makelist(i): creates a new list containing three-address location i, returns a pointer to the list
merge(p1, p2): concatenates the lists pointed to by p1 and p2, returns a pointer to the concatenated list
backpatch(p, i): inserts i as the target label for each of the statements in the list pointed to by p
Backpatching with Lists: Example
a < b or c < d and e < f
Before backpatching:
  100: if a < b goto _
  101: goto _
  102: if c < d goto _
  103: goto _
  104: if e < f goto _
  105: goto _
After backpatching:
  100: if a < b goto TRUE
  101: goto 102
  102: if c < d goto 104
  103: goto FALSE
  104: if e < f goto TRUE
  105: goto FALSE
Backpatching with Lists: Translation Scheme
M → ε { M.quad := nextquad() }
E → E1 or M E2 { backpatch(E1.falselist, M.quad); E.truelist := merge(E1.truelist, E2.truelist); E.falselist := E2.falselist }
E → E1 and M E2 { backpatch(E1.truelist, M.quad); E.truelist := E2.truelist; E.falselist := merge(E1.falselist, E2.falselist) }
E → not E1 { E.truelist := E1.falselist; E.falselist := E1.truelist }
E → ( E1 ) { E.truelist := E1.truelist; E.falselist := E1.falselist }
Backpatching with Lists: Translation Scheme (cont’d)
E → id1 relop id2 { E.truelist := makelist(nextquad()); E.falselist := makelist(nextquad() + 1); emit(‘if’ id1.place relop.op id2.place ‘goto _’); emit(‘goto _’) }
E → true { E.truelist := makelist(nextquad()); E.falselist := nil; emit(‘goto _’) }
E → false { E.falselist := makelist(nextquad()); E.truelist := nil; emit(‘goto _’) }
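The translation scheme can be tried out on a < b or c < d and e < f with a few lines of code. In the sketch below Python lists stand in for the pointer-based lists, each expression is represented by its (truelist, falselist) pair, and the actions are called by hand in reduction order; the helper names relop, or_ and and_ and the start address 100 are assumptions.

```python
quads = []      # quads[i] is the instruction at address START + i
START = 100

def nextquad():
    return START + len(quads)

def emit(text):
    quads.append(text)

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, target):
    # Replace the trailing "_" of every listed jump by the target address.
    for i in p:
        quads[i - START] = quads[i - START][:-1] + str(target)

def relop(id1, op, id2):                 # E -> id1 relop id2
    truelist = makelist(nextquad())
    falselist = makelist(nextquad() + 1)
    emit(f"if {id1} {op} {id2} goto _")
    emit("goto _")
    return truelist, falselist

def or_(e1, m_quad, e2):                 # E -> E1 or M E2
    backpatch(e1[1], m_quad)
    return merge(e1[0], e2[0]), e2[1]

def and_(e1, m_quad, e2):                # E -> E1 and M E2
    backpatch(e1[0], m_quad)
    return e2[0], merge(e1[1], e2[1])

# a < b or c < d and e < f, in bottom-up reduction order.
e1 = relop("a", "<", "b")
m1 = nextquad()                          # M before "c < d"
e2 = relop("c", "<", "d")
m2 = nextquad()                          # M before "e < f"
e3 = relop("e", "<", "f")
truelist, falselist = or_(e1, m1, and_(e2, m2, e3))

for i, text in enumerate(quads):
    print(f"{START + i}: {text}")
print("truelist:", truelist, "falselist:", falselist)
```

The jumps at 100 and 104 remain on the truelist and those at 103 and 105 on the falselist; they stay open until the enclosing statement knows where the TRUE and FALSE exits are.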
Flow-of-Control Statements and Backpatching: Grammar
S → if E then S | if E then S else S | while E do S | begin L end | A
L → L ; S | S
Synthesized attributes:
  S.nextlist    backpatch list for jumps to the next statement after S (or nil)
  L.nextlist    backpatch list for jumps to the next statement after L (or nil)
Jumps out of each statement in S1 ; S2 ; S3 ; S4 ; S5 …
  100: Code for S1    backpatch(S1.nextlist, 200)
  200: Code for S2    backpatch(S2.nextlist, 300)
  300: Code for S3    backpatch(S3.nextlist, 400)
  400: Code for S4    backpatch(S4.nextlist, 500)
  500: Code for S5
Flow-of-Control Statements and Backpatching
S → A { S.nextlist := nil }
S → begin L end { S.nextlist := L.nextlist }
S → if E then M S1 { backpatch(E.truelist, M.quad); S.nextlist := merge(E.falselist, S1.nextlist) }
L → L1 ; M S { backpatch(L1.nextlist, M.quad); L.nextlist := S.nextlist }
L → S { L.nextlist := S.nextlist }
M → ε { M.quad := nextquad() }
Flow-of-Control Statements and Backpatching (cont’d)
S → if E then M1 S1 N else M2 S2 { backpatch(E.truelist, M1.quad); backpatch(E.falselist, M2.quad); S.nextlist := merge(S1.nextlist, merge(N.nextlist, S2.nextlist)) }
S → while M1 E do M2 S1 { backpatch(S1.nextlist, M1.quad); backpatch(E.truelist, M2.quad); S.nextlist := E.falselist; emit(‘goto’ M1.quad) }
N → ε { N.nextlist := makelist(nextquad()); emit(‘goto _’) }
while (a<b) do if (c<d) then x:=y+z;
Rules used:
S→while M1 E do M2 S1 { backpatch(S1.nextlist, M1.quad); backpatch(E.truelist, M2.quad); S.nextlist:=E.falselist; emit(‘j, -, -,’ M1.quad) }
S→if E then M S1 { backpatch(E.truelist, M.quad); S.nextlist:=merge(E.falselist, S1.nextlist) }
M→ε { M.quad:=nextquad }
S→A { S.nextlist:=makelist( ) }
E→id1 relop id2 { E.truelist:=makelist(nextquad); E.falselist:=makelist(nextquad+1); emit(‘j’ relop.op ‘,’ id1.place ‘,’ id2.place ‘,’ ‘_’); emit(‘j, -, -, _’) }
S→id:=E { p:=lookup(id.name); if p≠nil then emit(p ‘:=’ E.place) else error }
E→E1+E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
while (a<b) do if (c<d) then x:=y+z;
  100 (j<, a, b, 102)
  101 (j, -, -, 107)
  102 (j<, c, d, 104)
  103 (j, -, -, 100)
  104 (+, y, z, T)
  105 (:=, T, -, x)
  106 (j, -, -, 100)
  107
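The quadruples for this loop can be regenerated by applying the statement rules by hand. The sketch below stores each quadruple as a four-element list so that its target field can be patched in place; the helper names are hypothetical, and the final backpatch to the address after the loop is an assumption standing in for whatever statement list encloses the while.

```python
quads = []                     # quads[i] is the quadruple at address START + i
START = 100

def nextquad():
    return START + len(quads)

def emit(op, arg1="-", arg2="-", result="_"):
    quads.append([op, arg1, arg2, result])

def backpatch(jumps, target):
    for i in jumps:
        quads[i - START][3] = target

def relop(id1, op, id2):                       # E -> id1 relop id2
    lists = [nextquad()], [nextquad() + 1]
    emit("j" + op, id1, id2)
    emit("j")
    return lists                               # (truelist, falselist)

def assign_sum(target, x, y, temp="T"):        # S -> id := E1 + E2
    emit("+", x, y, temp)
    emit(":=", temp, "-", target)
    return []                                  # S.nextlist is empty

def if_then(e, m_quad, s1_next):               # S -> if E then M S1
    backpatch(e[0], m_quad)
    return e[1] + s1_next

def while_do(m1_quad, e, m2_quad, s1_next):    # S -> while M1 E do M2 S1
    backpatch(s1_next, m1_quad)
    backpatch(e[0], m2_quad)
    emit("j", result=m1_quad)
    return e[1]

# while (a<b) do if (c<d) then x:=y+z;
m1 = nextquad()
e_outer = relop("a", "<", "b")
m2 = nextquad()
e_inner = relop("c", "<", "d")
m3 = nextquad()
body_next = if_then(e_inner, m3, assign_sum("x", "y", "z"))
s_next = while_do(m1, e_outer, m2, body_next)
backpatch(s_next, nextquad())                  # statement after the loop

for i, q in enumerate(quads):
    print(START + i, "(" + ", ".join(str(f) for f in q) + ")")
```

The false exit of c<d (quadruple 103) ends up on the nextlist of the if statement, so the while rule patches it back to the loop test at 100 rather than to the code after the loop.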
Translating Procedure Calls
S → call id ( Elist )
Elist → Elist , E | E
Example: foo(a+1, b, 7)
  t1 := a + 1
  t2 := 7
  param t1
  param b
  param t2
  call foo 3
Translating Procedure Calls
S → call id ( Elist ) { for each item p on queue do emit(‘param’ p); emit(‘call’ id.place |queue|) }
Elist → Elist , E { append E.place to the end of queue }
Elist → E { initialize queue to contain only E.place }
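The queue exists so that the code for all argument expressions is emitted before the first param instruction. The sketch below shows that ordering for foo(a+1, b, 7); the function name translate_call and the tuple encoding of arguments are assumptions, and copying a constant argument into a temporary simply mirrors the example above rather than any stated rule.

```python
import itertools

def translate_call(name, args):
    """Three-address code for `call name(args)`.

    Each argument is a name or constant (str) or a tuple (op, left, right).
    """
    code, queue = [], []
    counter = itertools.count(1)

    def newtemp():
        return f"t{next(counter)}"

    def place_of(e):
        if isinstance(e, tuple):
            op, left, right = e
            p1, p2 = place_of(left), place_of(right)
            t = newtemp()
            code.append(f"{t} := {p1} {op} {p2}")
            return t
        return e

    for arg in args:
        p = place_of(arg)
        if p.isdigit():
            # As in the example, a constant argument is copied into a temporary.
            t = newtemp()
            code.append(f"{t} := {p}")
            p = t
        queue.append(p)            # Elist -> E  and  Elist -> Elist , E

    # S -> call id ( Elist ): params first, then the call with |queue|.
    code.extend(f"param {p}" for p in queue)
    code.append(f"call {name} {len(queue)}")
    return code

for line in translate_call("foo", [("+", "a", "1"), "b", "7"]):
    print(line)
```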