Chapter 8 Intermediate Code Zhang Jing, Wang HaiLing College of Computer Science & Technology Harbin Engineering University
Intermediate code generation sits in the middle of the compiler. It is a bridge: the source program is first translated into an intermediate representation, which is then translated into target code. The position of intermediate code generation in the compiler is shown in Figure 8.1.
There are two advantages of using intermediate code. First, code generators for different target machines can be attached to the same front end after intermediate code generation. Second, a machine-independent code optimizer can be applied to the intermediate representation.
Intermediate code is machine independent, yet close to machine instructions. The intermediate code generator converts a program in the source language into an equivalent program in an intermediate language.
Many different languages can serve as the intermediate language, and the choice is made by the compiler designer. Postfix notation, four-address code (quadruples), three-address code, portable code and assembly code can all be used as an intermediate language. This chapter introduces each of them in detail.
Postfix Notation
If the source program can be represented in postfix notation, it is easy to translate into target code, because the order of the target instructions is the same as the order of the operators in the postfix notation.
The definition of postfix notation
The postfix notation for the expression a+b*c is abc*+. The rules for writing an expression in postfix notation are as follows:
1. The operands appear in the same order as in the original expression.
2. Each operator follows its operands, and there are no parentheses in postfix notation.
3. The operators appear in the order in which they are calculated.
For example, the postfix notation for the expression a*(b+c/d) is abcd/+*. The translation follows the rules above. First, by rule 1, the operands keep their original order: abcd. Second, by rule 2, the first operator is /, because it immediately follows its operands c and d; by rule 3, / is also the first operator to be calculated. The second operator is +: because of the parentheses in the original expression, + must be calculated before *. The last operator is *, because it is calculated last.
As another example, the postfix notation for the expression a*b+(c-d)/e is ab*cd-e/+. These examples show that translating an expression into postfix notation by hand is somewhat difficult, so the Dutch computer scientist E. W. Dijkstra devised a method to solve the problem.
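To see why postfix notation maps so directly onto target instructions, the following minimal sketch evaluates a postfix string with a single stack: every operator is applied the moment it is read, in exactly the order it appears. The function name eval_postfix, the dictionary env and the sample values are assumptions made for illustration only; operands are restricted to single letters.

```python
def eval_postfix(postfix, env):
    """Evaluate a postfix string of single-letter operands with one stack."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
        "/": lambda x, y: x / y,
    }
    stack = []
    for tok in postfix:
        if tok in ops:
            right = stack.pop()          # the second operand is on top
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(env[tok])       # operand: push its value
    return stack.pop()


# Hypothetical values for the variables used in the two examples.
env = {"a": 2, "b": 3, "c": 8, "d": 4, "e": 2}
first = eval_postfix("abcd/+*", env)      # a*(b+c/d)
second = eval_postfix("ab*cd-e/+", env)   # a*b+(c-d)/e
```

No priorities or parentheses are consulted during evaluation; the operator order alone drives the calculation.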
E. W. Dijkstra Method
The E. W. Dijkstra method uses two stacks: one stores operands and the other stores operators. The procedure is shown in Figure 8.2, and the steps of the method are as follows:
The expression is scanned from left to right. Before scanning starts, the marker # is pushed onto the bottom of the operator stack, and another # is appended to the end of the expression to mark its end. Scanning ends when the two # markers meet. The steps of scanning are:
1. If the symbol is an operand, push it onto the operand stack.
2. If the symbol is an operator, compare it with the operator on top of the operator stack. If the priority of the operator on top of the stack is greater than or equal to that of the scanned operator, pop the top operator and move it to the left side (the operand stack). If the priority of the operator on top of the stack is lower than that of the scanned operator, push the scanned operator onto the operator stack.
3. If the symbol is a left parenthesis, push it onto the operator stack and continue by comparing the operators inside the parentheses. If the symbol is a right parenthesis, pop all the operators inside the parentheses; the parentheses themselves are discarded and do not appear in the postfix notation.
4. Return to step 1 until the two # markers meet.
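The scanning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: operands are single letters or digits, the input is well formed, and the priority table PRIORITY and the function name to_postfix are hypothetical choices, not part of the original method description.

```python
# Assumed priorities: # is lowest so it is never popped by an operator,
# and "(" is lower than any arithmetic operator so it stays on the stack.
PRIORITY = {"#": 0, "(": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def to_postfix(expr):
    """Convert an infix expression to postfix using two stacks."""
    operands = []        # operand stack: ends up holding the postfix form
    operators = ["#"]    # operator stack with # at the bottom
    for ch in expr.replace(" ", "") + "#":
        if ch.isalnum():
            operands.append(ch)                    # step 1
        elif ch == "(":
            operators.append(ch)                   # step 3, left parenthesis
        elif ch == ")":
            while operators[-1] != "(":            # step 3, right parenthesis
                operands.append(operators.pop())
            operators.pop()                        # discard the "("
        elif ch == "#":
            while operators[-1] != "#":            # the two # markers meet
                operands.append(operators.pop())
        else:
            # step 2: pop while the top has greater or equal priority
            while PRIORITY[operators[-1]] >= PRIORITY[ch]:
                operands.append(operators.pop())
            operators.append(ch)
    return "".join(operands)


result = to_postfix("a*(b+c/d)")
```

Popping on equal priority is what makes operators of the same level, such as a-b-c, group from left to right.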
Example 8.1 The postfix notation for the expression a+b*c is abc*+. The translation procedure in Figure 8.3 shows that the operator order is *+, which is both the order in which the operators are popped from the operator stack and the order of calculation.
Extended postfix notation
If the expression E has the form E1:=E2, then the postfix notation for E is E1' E2' :=, where E1' and E2' are the postfix notations for E1 and E2, respectively.
Example 8.2 For the expression a:=b, the definition above gives the postfix notation ab:=.
Example 8.3 For the expression a:=5*(b+8), the postfix notation is a5b8+*:=.
Consider a program of the form:
IF u THEN S1 ELSE
BEGIN S2 END
In this program, u is a condition with two possible results, true or false; S1 and S2 are two parts of the program; l1 is the start number of S2, and l2 is the end number of S2. The postfix notation of the program is as follows:
u l1 BZ S1 l2 BR S2
where u, S1 and S2 stand for their own postfix notations. BZ means that if the value of u is not true, control jumps to l1; BR means that control jumps unconditionally to l2. The following example illustrates the translation.
Example 8.4
S:=0;
IF i<10 THEN S:=S+i ELSE
BEGIN i:=i+1 END;
The postfix notation of Example 8.4 is as follows. In this program the value of l1 is 7 and the value of l2 is 9.
S0:= i10< 7 BZ SSi+:= 9 BR (7) ii1+:= (9)
Four-Address Code
Another representation of intermediate code is four-address code (quadruples). Its definition is:
op, y, z, x
Apply the operator op to y and z, and store the result in x, where x, y and z are names, constants or compiler-generated temporaries, and op is any operator.
Example 8.5 The expression a+(-b*c+d)*e might be translated into the following four-address code sequence:
(1) ( - , b , , T1 )
(2) ( * , T1 , c , T2 )
(3) ( + , T2 , d , T3 )
(4) ( * , T3 , e , T4 )
(5) ( + , a , T4 , T5 )
Four-address code can be divided into different types according to the operator.
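One plausible way to arrive at such a sequence is to walk the postfix form of the expression with a stack and emit one quadruple per operator. The sketch below is an assumption-laden illustration, not the chapter's own procedure: the function name postfix_to_quads is hypothetical, operands are single letters, and the character ~ is an invented marker for unary minus so that it can be told apart from binary minus in the postfix string.

```python
def postfix_to_quads(postfix):
    """Emit (op, y, z, x) quadruples from a single-letter postfix string."""
    stack, quads = [], []

    def new_temp():
        # Each quadruple defines exactly one temporary: T1, T2, ...
        return f"T{len(quads) + 1}"

    for tok in postfix:
        if tok == "~":                     # hypothetical unary-minus marker
            y = stack.pop()
            t = new_temp()
            quads.append(("-", y, "", t))  # unary: the z field stays empty
            stack.append(t)
        elif tok in "+-*/":
            z = stack.pop()
            y = stack.pop()
            t = new_temp()
            quads.append((tok, y, z, t))
            stack.append(t)
        else:
            stack.append(tok)              # operand name
    return quads


# a+(-b*c+d)*e written in postfix, with ~ standing for the unary minus.
quads = postfix_to_quads("ab~c*d+e*+")
```

For this input the emitted quadruples coincide with the five quadruples of Example 8.5.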
Binary Operator: op, y, z, result
Here op is a binary arithmetic or logical operator. It is applied to y and z, and the result of the operation is stored in result. For example, the four-address code for a+b*c is:
( * , b , c , T1 )
( + , a , T1 , T2 )
Unary Operator: op, y, , result
Here op is a unary arithmetic or logical operator. It is applied to y, and the result of the operation is stored in result. For example, the four-address code for the statement S:=0 is:
( := , 0 , , S )
Unconditional Jumps: op, L, ,
Here op is BR. Control jumps to the four-address code labelled L, and execution continues from that statement. For example:
( BR , 9 , , )
Conditional Jumps: op, L, , x
Here op is BZ. If the value of x is not true, control jumps to the four-address code labelled L and execution continues from that statement. If the value is true, execution continues with the statement following the conditional jump. For example:
( BZ , 7 , , T1 )
The four-address code for Example 8.4:
(1) ( := , 0 , , S )
(2) ( < , i , 10 , T1 )
(3) ( BZ , 7 , , T1 )
(4) ( + , S , i , T2 )
(5) ( := , T2 , , S )
(6) ( BR , 9 , , )
(7) ( + , i , 1 , T3 )
(8) ( := , T3 , , i )
Four-address code is produced in much the same way as postfix notation: both rely on the E. W. Dijkstra method.
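To check how BZ and BR steer control through these eight quadruples, they can be executed by a tiny interpreter. The sketch below is illustrative only: the names run_quads and QUADS are hypothetical, and it assumes that quadruple (2) compares i with 10 using < and that quadruple (7) adds the constant 1, as the source statement IF i<10 ... i:=i+1 requires.

```python
import operator

# Quadruples for Example 8.4, numbered from 1 as in the text.
QUADS = [
    (":=", 0, "", "S"),
    ("<", "i", 10, "T1"),     # assumed comparison operator
    ("BZ", 7, "", "T1"),
    ("+", "S", "i", "T2"),
    (":=", "T2", "", "S"),
    ("BR", 9, "", ""),
    ("+", "i", 1, "T3"),      # assumed constant 1
    (":=", "T3", "", "i"),
]


def run_quads(quads, env):
    """Execute (op, y, z, x) quadruples; env maps names to values."""
    ops = {"+": operator.add, "<": operator.lt}

    def val(operand):
        # Constants are stored as ints, names are looked up in env.
        return operand if isinstance(operand, int) else env[operand]

    pc = 1
    while pc <= len(quads):
        op, y, z, x = quads[pc - 1]
        if op == "BR":                 # unconditional jump to label y
            pc = y
        elif op == "BZ":               # jump to label y if x is not true
            pc = y if not val(x) else pc + 1
        else:
            env[x] = val(y) if op == ":=" else ops[op](val(y), val(z))
            pc += 1
    return env


then_branch = run_quads(QUADS, {"i": 3})    # i<10 holds: S := S + i
else_branch = run_quads(QUADS, {"i": 12})   # i<10 fails: i := i + 1
```

Label 9 lies one past the last quadruple, so BR 9 simply ends the run, which matches l2 being the end number of S2.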
Three-Address Code
Three-address code differs from four-address code in the memory it occupies. When target code is produced, all data are assigned run-time memory, and the memory location of each datum is recorded in the symbol table. Compared with three-address code, the symbol table for four-address code carries an extra field that stores the result part of the four-address code. To use the result of a calculation in four-address code, we only need to look at the fourth part; in three-address code, we must define a temporary value that refers to the result. This makes three-address code harder to handle in an optimizing compiler.
Portable Code
Portable code is a kind of intermediate code that can be written in many programming languages. This section explains portable code written as PASCAL subprograms. Portable code involves two sections: PROCEDURE BLOCK, which forms the intermediate code, and PROCEDURE GEN, which generates the intermediate code; the code is then stored in CODE by PROCEDURE INTERPRET.
We will introduce PROCEDURE GEN in detail, using the PASCAL source program shown below.
PROGRAM main ;
PROCEDURE 1 ;
PROCEDURE 2 ;
BEGIN
READ ( i );
WHILE i>1 DO
BEGIN
IF i>10 THEN CALL 1 ELSE
BEGIN CALL 2 END ;
END
The portable code of the source program is as follows.
The portable code above shows that each instruction has three parts. The first part is the operation code, such as INT, STO, OPR, LOD, JPC and CAL. The second part is the level value, which is 0 here. The third part is a value A, such as a relative address, a number of units, a procedure entry address, the value of a constant, or the code of a special operator.
INT allocates data space on the stack. A is the number of stack units reserved for the procedure, for example 5 in line 11.
CAL calls a procedure. A is the address of the procedure.
LIT pushes a constant onto the top of the stack. A is the value of the constant.
LOD pushes a variable onto the top of the stack. A is the relative address of the variable.
STO pops the top of the stack into a storage unit. A is the relative address of that unit.
JMP jumps directly to address A.
JPC jumps to address A when the value on top of the stack is false; otherwise execution continues with the next instruction.
OPR performs an operation selected by A. When A=2, it is the calculation "+". When A=12, it is ">". When A=16, it is the "read" operation, which reads data onto the top of the stack. When A=0, it fetches the return address.
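The effect of these instructions is easiest to follow on a small stack machine. The sketch below is a deliberately simplified illustration, not PROCEDURE INTERPRET itself: it ignores the level field, omits CAL, treats A in LOD and STO as a direct index into the stack, lets OPR 0 simply stop the run, and assumes that OPR 16 pushes the next input value. The function name interpret and the sample program are hypothetical.

```python
def interpret(code, inputs=()):
    """Run (F, L, A) instructions on a single flat stack; L is ignored."""
    stack, pc = [], 0
    pending = list(inputs)
    while pc < len(code):
        f, _level, a = code[pc]
        pc += 1
        if f == "INT":
            stack.extend([0] * a)          # reserve A units of data space
        elif f == "LIT":
            stack.append(a)                # push the constant A
        elif f == "LOD":
            stack.append(stack[a])         # push the variable at address A
        elif f == "STO":
            stack[a] = stack.pop()         # pop the top into address A
        elif f == "JMP":
            pc = a
        elif f == "JPC":
            if not stack.pop():            # jump only when the top is false
                pc = a
        elif f == "OPR" and a == 0:
            break                          # simplified: return ends the run
        elif f == "OPR" and a == 2:
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif f == "OPR" and a == 12:
            y, x = stack.pop(), stack.pop()
            stack.append(int(x > y))
        elif f == "OPR" and a == 16:
            stack.append(pending.pop(0))   # assumed: read pushes the input
        else:
            raise ValueError(f"unsupported instruction {f} {a}")
    return stack


# Hypothetical program: read i; IF i > 10 THEN i := i + 1
PROGRAM = [
    ("INT", 0, 1), ("OPR", 0, 16), ("STO", 0, 0),   # 0-2: read i
    ("LOD", 0, 0), ("LIT", 0, 10), ("OPR", 0, 12),  # 3-5: i > 10
    ("JPC", 0, 11),                                 # 6: skip when false
    ("LOD", 0, 0), ("LIT", 0, 1), ("OPR", 0, 2),    # 7-9: i + 1
    ("STO", 0, 0),                                  # 10: i := i + 1
    ("OPR", 0, 0),                                  # 11: return
]
big = interpret(PROGRAM, [12])
small = interpret(PROGRAM, [5])
```

A fuller interpreter would keep a frame base and use the level value to locate variables of enclosing procedures; that part is left out here to keep the instruction semantics visible.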
Assembly code
Assembly code is also a kind of intermediate code. Compared with three-address code and four-address code, it has the following advantages:
1. It is easier to translate into machine code, because its instructions map one to one onto machine instructions.
2. Jump addresses need not be calculated, because addresses are usually represented by symbols.
3. It can represent data with various byte sizes, and no conversion is needed.
Example 8.5 The expression a+(-b*c+d)*e might be translated into the following assembly code:
Mov ax,b
Neg ax
Mov bx,c
Imul bx
Mov bx,d
Add ax,bx
Mov bx,e
Imul bx
Mov bx,a
Add ax,bx
Mov t,ax
The instructions in the assembly code above work as follows. Mov ax,b loads the value of b into register ax. Neg ax negates the value in ax. Imul bx multiplies the value in ax by the value in bx and stores the result in ax. Add ax,bx adds the value in bx to the value in ax and stores the result in ax. t is a temporary variable.
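As a quick check that the instruction sequence really computes a+(-b*c+d)*e, it can be traced with a toy simulator. This is a rough sketch under stated assumptions: the function name run_asm and the sample values are hypothetical, only the four instructions used above are modelled, and register width, overflow and the dx half of the Imul result are ignored.

```python
def run_asm(program, memory):
    """Trace Mov/Neg/Imul/Add over registers ax, bx and named variables."""
    regs = {"ax": 0, "bx": 0}

    def read(name):
        return regs[name] if name in regs else memory[name]

    def write(name, value):
        if name in regs:
            regs[name] = value
        else:
            memory[name] = value

    for line in program:
        op, _, rest = line.partition(" ")
        args = [arg.strip() for arg in rest.split(",")]
        op = op.lower()
        if op == "mov":
            write(args[0], read(args[1]))
        elif op == "neg":
            write(args[0], -read(args[0]))
        elif op == "imul":
            regs["ax"] = regs["ax"] * read(args[0])   # result kept in ax
        elif op == "add":
            write(args[0], read(args[0]) + read(args[1]))
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return memory


PROGRAM = [
    "Mov ax,b", "Neg ax", "Mov bx,c", "Imul bx", "Mov bx,d", "Add ax,bx",
    "Mov bx,e", "Imul bx", "Mov bx,a", "Add ax,bx", "Mov t,ax",
]
# Hypothetical values for the variables.
memory = run_asm(PROGRAM, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
```

With these values the temporary t ends up equal to 1+(-2*3+4)*5.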