Chapter 8 Intermediate Code
Zhang Jing, Wang HaiLing
College of Computer Science & Technology, Harbin Engineering University


Intermediate code generation sits in the middle of the compiler. It is the bridge that translates the source program into an intermediate representation, which is then translated into target code. Figure 8.1 shows the position of intermediate code generation in the compiler.

[Figure 8.1: the position of intermediate code generation in the compiler]

Using intermediate code has two advantages. First, back ends for different target machines can be attached to the same front end after the intermediate code generation phase. Second, a machine-independent code optimizer can be applied to the intermediate representation.

Intermediate code is machine independent, yet close to machine instructions. The intermediate code generator converts a program in the source language into an equivalent program in an intermediate language.

Many different languages can serve as the intermediate language, and the choice is made by the compiler designer. Postfix notation, four-address code (quadruples), three-address code, portable code and assembly code can all be used as an intermediate language. This chapter introduces each of them in detail.

Postfix Notation
If the source program can be represented in postfix notation, it is easy to translate into target code, because the order of the target instructions is the same as the order of the operators in the postfix notation.

The definition of postfix notation
For example, the postfix notation for the expression a+b*c is abc*+. The rules for writing an expression in postfix notation are as follows:
1 The operands appear in the same order as in the original expression.
2 Each operator follows its operands, and there are no parentheses in postfix notation.
3 The operators appear in the order in which they are calculated.

For example, the postfix notation for the expression a*(b+c/d) is abcd/+*. The translation follows the rules above. First, by rule 1, the operands keep their original order: abcd. Next, by rules 2 and 3, the first operator is /, because it is calculated first and directly follows its operands c and d. The second operator is +: because of the parentheses in the original expression, + must be calculated before *. The last operator is *, because it is calculated last.

As another example, the postfix notation for the expression a*b+(c-d)/e is ab*cd-e/+. These examples show that translating an expression into postfix notation by hand is somewhat difficult, so the Dutch computer scientist E. W. Dijkstra devised a method to solve the problem.
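The three rules amount to a post-order walk of the expression tree: left operand, right operand, then the operator. The sketch below illustrates this under assumptions that are not part of the chapter: expressions are encoded as nested tuples (operator, left, right), operands are single-character strings, and the function name tree_to_postfix is hypothetical.

```python
def tree_to_postfix(node):
    """Return the postfix string for a binary expression tree.

    Assumed encoding (hypothetical): an operand is a string, and an
    operation is a tuple (operator, left_subtree, right_subtree).
    """
    if isinstance(node, str):
        # An operand is emitted as-is, so operands keep their original order.
        return node
    operator, left, right = node
    # The operator is emitted after both of its operands.
    return tree_to_postfix(left) + tree_to_postfix(right) + operator


# a*(b+c/d)
first = ("*", "a", ("+", "b", ("/", "c", "d")))
# a*b+(c-d)/e
second = ("+", ("*", "a", "b"), ("/", ("-", "c", "d"), "e"))

print(tree_to_postfix(first))
print(tree_to_postfix(second))
```

The parentheses of the source expression never appear in the output because the tree structure already records the calculation order.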

E. W. Dijkstra Method
The E. W. Dijkstra method uses two stacks: one stores operands and the other stores operators. Figure 8.2 shows the procedure, and the steps of the method are as follows.

[Figure 8.2: the procedure of the E. W. Dijkstra method]

The expression is scanned from left to right. Before scanning begins, the marker # is pushed onto the bottom of the operator stack, and another # is appended to the end of the expression to mark where it terminates. Scanning ends when the two # markers meet. The steps of scanning are:
1 If the scanned symbol is an operand, push it onto the operand stack.
2 If it is an operator, compare it with the operator on top of the operator stack. If the priority of the operator on top of the stack is greater than or equal to that of the scanned operator, pop the operator on top of the stack, move it to the left side (the operand stack), and compare again with the new top. Otherwise, when the priority of the operator on top of the stack is lower, push the scanned operator onto the operator stack.
3 If it is a left parenthesis, push it onto the operator stack and then process the operators inside the parentheses. If it is a right parenthesis, pop all the operators within the parentheses; the parentheses themselves are discarded and do not appear in the postfix notation.
4 Return to step 1 until the two # markers meet.

Example 8.1 The postfix notation for the expression a+b*c is abc*+. The translation procedure in Figure 8.3 shows that the operator order is *+, which is both the order in which the operators are popped from the operator stack and the order in which they are calculated.

[Figure 8.3: the procedure for translating a+b*c into postfix notation]
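The scanning steps of the two-stack method can be sketched in a few lines. This is a minimal illustration rather than the chapter's own implementation, and it rests on stated assumptions: operands are single letters or digits, the input is well formed, only + - * / and parentheses occur, and the priority table PRIORITY and the function name to_postfix are hypothetical.

```python
# Assumed priorities: '#' is lowest so that it never forces a pop, and
# '(' is lower than every real operator so that it stays on the stack
# until its matching ')' arrives.
PRIORITY = {"#": 0, "(": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def to_postfix(expression):
    """Convert an infix expression to postfix with two stacks."""
    operands = []        # operand stack; it ends up holding the postfix form
    operators = ["#"]    # operator stack with the '#' marker at the bottom

    for symbol in expression + "#":      # '#' also terminates the input
        if symbol == " ":
            continue
        if symbol.isalnum():             # step 1: operand
            operands.append(symbol)
        elif symbol == "(":              # step 3: left parenthesis
            operators.append(symbol)
        elif symbol == ")":              # step 3: right parenthesis
            while operators[-1] != "(":
                operands.append(operators.pop())
            operators.pop()              # discard the '(' itself
        else:                            # step 2: operator or final '#'
            while (operators[-1] != "#"
                   and PRIORITY[operators[-1]] >= PRIORITY[symbol]):
                operands.append(operators.pop())
            if symbol != "#":
                operators.append(symbol)
            # When symbol is '#', the loop has emptied the stack down to
            # the bottom marker: the two '#' markers have met.

    return "".join(operands)


print(to_postfix("a+b*c"))
print(to_postfix("a*(b+c/d)"))
```

For a+b*c the operator * is pushed above + because its priority is higher, so it is popped first at the end of the scan, which gives the pop order *+ described in Example 8.1.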

Extended postfix notation
If the expression E has the form E1 := E2, then the postfix notation for E is E1' E2' :=, where E1' and E2' are the postfix notations for E1 and E2, respectively.
Example 8.2 For the expression a:=b, the postfix notation according to the definition above is ab:=.

Example 8.3 For the expression a:=5*(b+8), the postfix notation is a5b8+*:=
Consider a program of the form:
    IF u THEN S1
    ELSE
    BEGIN S2
    END

Here u is a condition with two possible results, true or false; S1 and S2 are two parts of the program; l1 is the start number of S2, and l2 is the end number of S2. The postfix notation of the program is as follows:
    u l1 BZ S1 l2 BR S2
where u, S1 and S2 are each written in postfix notation. BZ means that if the value of u is not true, control goes to l1; BR means that control goes directly to l2. The following example illustrates the translation.

Example 8.4
    S:=0;
    IF i<10 THEN S:=S+i
    ELSE
    BEGIN i:=i+1
    END;
The postfix notation of Example 8.4 is as follows; in this program the value of l1 is 7 and the value of l2 is 9.
    S0:= i10< 7 BZ SSi+:= 9 BR (7) ii1+:= (9)

Four-Address Code
Another representation of intermediate code is four-address code, defined as:
    op, y, z, x
The operator op is applied to y and z, and the result is stored in x. Here x, y and z are names, constants or compiler-generated temporaries, and op is any operator.

Example 8.5 The expression a+(-b*c+d)*e might be translated into the following four-address code sequence:
    ( 1 ) ( - , b , , T1 )
    ( 2 ) ( * , T1 , c , T2 )
    ( 3 ) ( + , T2 , d , T3 )
    ( 4 ) ( * , T3 , e , T4 )
    ( 5 ) ( + , a , T4 , T5 )
Four-address code can be divided into different types according to the operator.
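One way to obtain such a sequence is a post-order walk over the expression tree that allocates a fresh temporary for every operator. The sketch below is illustrative only: the tuple encoding of the tree (two elements for a unary operator, three for a binary one) and the function name gen_quads are assumptions, not part of the chapter.

```python
def gen_quads(tree):
    """Generate four-address code (op, y, z, result) for an expression tree.

    Assumed encoding (hypothetical): an operand is a string,
    (op, operand) is a unary operation, (op, left, right) a binary one.
    """
    quads = []

    def walk(node):
        if isinstance(node, str):
            return node                    # names are used directly
        if len(node) == 2:                 # unary operator: z stays empty
            op, operand = node
            y, z = walk(operand), ""
        else:                              # binary operator
            op, left, right = node
            y = walk(left)
            z = walk(right)
        result = f"T{len(quads) + 1}"      # next compiler-generated temporary
        quads.append((op, y, z, result))
        return result

    walk(tree)
    return quads


# a+(-b*c+d)*e
expression = ("+", "a", ("*", ("+", ("*", ("-", "b"), "c"), "d"), "e"))
for number, quad in enumerate(gen_quads(expression), start=1):
    print(number, quad)
```

Because operands are visited before their operator, each temporary is defined before the quadruple that uses it.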

Binary Operator: op, y, z, result
Here op is a binary arithmetic or logical operator. It is applied to y and z, and the result of the operation is stored in result. For example, the four-address code for a+b*c is:
    ( * , b , c , T1 )
    ( + , a , T1 , T2 )

Unary Operator: op, y, , result
Here op is a unary arithmetic or logical operator. It is applied to y, and the result of the operation is stored in result. For example, the four-address code for the expression S:=0 is:
    ( := , 0 , , S )

Unconditional Jumps: op, L, ,
Here op is BR. Control jumps to the four-address code with the label L, and execution continues from that statement. For example:
    ( BR , 9 , , )

Conditional Jumps: op, L, , x
Here op is BZ. If the value of x is not true, control jumps to the four-address code with the label L and execution continues from that statement. If the value is true, execution continues with the statement following the conditional jump. For example:
    ( BZ , 7 , , T1 )

The four-address code for Example 8.4:
    ( 1 ) ( := , 0 , , S )
    ( 2 ) ( < , i , 10 , T1 )
    ( 3 ) ( BZ , 7 , , T1 )
    ( 4 ) ( + , S , i , T2 )
    ( 5 ) ( := , T2 , , S )
    ( 6 ) ( BR , 9 , , )
    ( 7 ) ( + , i , 1 , T3 )
    ( 8 ) ( := , T3 , , i )
Four-address code is produced in much the same way as postfix notation: both use the E. W. Dijkstra method to carry out the translation.
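To see how BZ and BR steer execution, the sequence for Example 8.4 can be run by a tiny interpreter. This sketch makes several assumptions that should be read as such: quadruple 2 is taken to be the comparison i<10, the second operand of quadruple 7 is taken to be the constant 1, labels are 1-based quadruple numbers, a jump past the last quadruple ends execution, and the function name run_quads is hypothetical.

```python
def run_quads(quads, variables):
    """Execute quadruples (op, y, z, x) and return the final variable values."""
    env = dict(variables)

    def value(operand):
        # Integer operands are constants; anything else is a variable name.
        return operand if isinstance(operand, int) else env[operand]

    pc = 1                                   # labels are 1-based (assumption)
    while 1 <= pc <= len(quads):
        op, y, z, x = quads[pc - 1]
        pc += 1
        if op == ":=":
            env[x] = value(y)
        elif op == "+":
            env[x] = value(y) + value(z)
        elif op == "<":
            env[x] = value(y) < value(z)
        elif op == "BR":                     # unconditional jump to label y
            pc = y
        elif op == "BZ":                     # jump to label y if x is not true
            if not env[x]:
                pc = y
        else:
            raise ValueError(f"unknown operator: {op}")
    return env


EXAMPLE_8_4 = [
    (":=", 0, None, "S"),
    ("<", "i", 10, "T1"),
    ("BZ", 7, None, "T1"),
    ("+", "S", "i", "T2"),
    (":=", "T2", None, "S"),
    ("BR", 9, None, None),
    ("+", "i", 1, "T3"),
    (":=", "T3", None, "i"),
]

print(run_quads(EXAMPLE_8_4, {"i": 3})["S"])    # THEN branch is taken
print(run_quads(EXAMPLE_8_4, {"i": 12})["i"])   # ELSE branch is taken
```

With i = 3 the condition holds, so quadruples 4 to 6 run and BR skips the ELSE part; with i = 12 the BZ at quadruple 3 jumps straight to quadruple 7.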

Three-Address Code
Three-address code and four-address code differ in the memory they occupy. When target code is produced, every data item is assigned run-time memory, and its memory location is placed in the symbol table. Compared with three-address code, the symbol table for four-address code includes an extra field that stores the result part of the four-address code. To use the result of a calculation in four-address code, we only need to look up the fourth part; in three-address code, we must define a temporary value that refers to the result part. This makes three-address code harder to design for in an optimizing compiler.

Portable Code
Portable code is a kind of intermediate code that can be written in many programming languages. This section explains portable code written as PASCAL subprograms. Portable code involves two sections: PROCEDURE BLOCK, which forms the intermediate code, and PROCEDURE GEN, which generates the intermediate code and then stores it in CODE, where PROCEDURE INTERPRET uses it.

We will introduce PROCEDURE GEN in detail. Consider the PASCAL source program shown below.
    PROGRAM main;
    PROCEDURE 1;
    PROCEDURE 2;
    BEGIN
      READ(i);
      WHILE i>1 DO
      BEGIN
        IF i>10 THEN CALL 1
        ELSE
        BEGIN
          CALL 2
        END;
      END

The portable code of the source program is as follows.
[Portable code listing: not reproduced in this transcript]

As the portable code above shows, each portable code instruction has three parts. The first is the operation code, such as INT, STO, OPR, LOD, JPC and CAL. The second is the level value, which is 0 here. The third is a value A, such as a relative address, a number of units, a procedure entry address, the value of a constant, or the code of a special operator.

The operation codes have the following meanings:
INT allocates data space on the stack. A is the number of stack units for the procedure, for example 5 in line 11.
CAL calls a procedure. A is the address of the procedure.
LIT pushes a constant onto the top of the stack. A is the value of the constant.
LOD pushes a variable onto the top of the stack. A is the relative address of the variable.
STO pops the top of the stack into a storage unit. A is the relative address of that unit.
JMP jumps directly to address A.
JPC jumps to address A when the value on top of the stack is false; otherwise execution moves on to the next instruction.
OPR is an operator. When A=2 it is the calculation "+". When A=12 it is ">". When A=16 it is the "read" operation, which reads data from the top of the stack. When A=0 it fetches the return address.
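The flavor of these instructions can be illustrated with a very small stack-machine interpreter. This is only a sketch under loud assumptions: it covers LIT, LOD, STO, JMP, JPC and the two OPR codes 2 and 12 named above; it ignores the level field, INT, CAL and the other OPR codes; memory is a flat list indexed by A; JPC is assumed to pop the value it tests; and the names run_portable and branch_program are hypothetical.

```python
def run_portable(code, memory_size=4):
    """Run a list of (operation, level, A) instructions; return the memory."""
    stack = []
    memory = [0] * memory_size       # flat memory, level ignored (assumption)
    pc = 0                           # index of the next instruction
    while pc < len(code):
        operation, _level, a = code[pc]
        pc += 1
        if operation == "LIT":       # push the constant A
            stack.append(a)
        elif operation == "LOD":     # push the variable at address A
            stack.append(memory[a])
        elif operation == "STO":     # pop the top of the stack into address A
            memory[a] = stack.pop()
        elif operation == "JMP":     # jump directly to address A
            pc = a
        elif operation == "JPC":     # jump to A if the top of the stack is false
            if not stack.pop():
                pc = a
        elif operation == "OPR":
            right = stack.pop()
            left = stack.pop()
            if a == 2:               # "+"
                stack.append(left + right)
            elif a == 12:            # ">"
                stack.append(int(left > right))
            else:
                raise ValueError(f"OPR {a} is not covered by this sketch")
        else:
            raise ValueError(f"{operation} is not covered by this sketch")
    return memory


def branch_program(i_value):
    """Hypothetical program: i := i_value; IF i>10 THEN s := 1 ELSE s := 2."""
    return [
        ("LIT", 0, i_value), ("STO", 0, 0),   # 0-1: i is stored at address 0
        ("LOD", 0, 0), ("LIT", 0, 10),        # 2-3: push i and 10
        ("OPR", 0, 12),                       # 4: i > 10
        ("JPC", 0, 9),                        # 5: if false, go to the ELSE part
        ("LIT", 0, 1), ("STO", 0, 1),         # 6-7: s := 1 (s is at address 1)
        ("JMP", 0, 11),                       # 8: skip the ELSE part
        ("LIT", 0, 2), ("STO", 0, 1),         # 9-10: s := 2
    ]


print(run_portable(branch_program(12))[1])
print(run_portable(branch_program(5))[1])
```

The JPC and JMP pair plays the same role here as BZ and BR do in the four-address code of Example 8.4.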

Assembly code
Assembly code is also a kind of intermediate code. Compared with three-address code and four-address code, it has the following advantages:
1 It is easier to translate into machine code, because assembly instructions map one-to-one to machine instructions.
2 Jump addresses do not need to be calculated, because symbols are normally used to represent addresses.
3 It can represent data in units of various numbers of bytes, with no conversion needed.
The expression of Example 8.5, a+(-b*c+d)*e, might be translated into the following assembly code.

    Mov ax,b
    Neg ax
    Mov bx,c
    Imul bx
    Mov bx,d
    Add ax,bx
    Mov bx,e
    Imul bx
    Mov bx,a
    Add ax,bx
    Mov t,ax

The instructions in the assembly code above work as follows. Mov ax,b loads the value of b into the register ax. Neg ax negates the value in ax. Imul bx multiplies the value in ax by the value in bx and stores the result in ax. Add ax,bx adds the value in bx to the value in ax and stores the result in ax. t is a temporary variable.
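The instruction sequence can be checked by tracing it with ordinary variables standing in for the registers. The sketch below is a simplified model and not a real processor simulation: it assumes unbounded integers, so register width, overflow and any high-order product register are ignored, and the function name trace_assembly is hypothetical.

```python
def trace_assembly(a, b, c, d, e):
    """Trace the assembly sequence for a+(-b*c+d)*e with plain integers."""
    ax = b            # Mov ax,b
    ax = -ax          # Neg ax
    bx = c            # Mov bx,c
    ax = ax * bx      # Imul bx   (ax now holds -b*c)
    bx = d            # Mov bx,d
    ax = ax + bx      # Add ax,bx (ax now holds -b*c+d)
    bx = e            # Mov bx,e
    ax = ax * bx      # Imul bx   (ax now holds (-b*c+d)*e)
    bx = a            # Mov bx,a
    ax = ax + bx      # Add ax,bx (ax now holds a+(-b*c+d)*e)
    t = ax            # Mov t,ax
    return t


print(trace_assembly(1, 2, 3, 4, 5) == 1 + (-2 * 3 + 4) * 5)
```

Only two registers are needed because every intermediate result stays in ax while bx is reloaded with the next operand.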