Chapter 6 Compiler
Introduction
A translator (program) takes a source program written in a high-level language (e.g. Pascal or C) and produces an object program, or target program, in a low-level language (e.g. assembly language or machine language). A compiler is such a translator.
Need for Translator
Programming in machine language is direct communication with the computer, but it is easy to make mistakes. Because of the difficulties of machine language programming, we use assembly language. In assembly language, however, the programmer must still know the details of how a specific computer operates and must translate complex operations and data structures into sequences of low-level operations that use only the primitive data types the machine language provides.
Need for Translator…
The programmer must also be intimately concerned with how and where data is represented within the machine. To avoid these problems, high-level programming languages were developed. A high-level programming language makes the programming task simpler, but it also introduces a new requirement: we have to use a compiler to translate the high-level language into machine language.
Compiler vs Interpreter
Compilation
- Program processing is considerable.
- The resulting intermediate form, machine-specific binary executable code, is low-level.
- The interpreting mechanism is the hardware CPU.
- Program execution is relatively fast.
Interpretation
- Program processing is minimal to moderate.
- The resulting intermediate form, some system-specific data structure, is high- to medium-level.
- The interpreting mechanism is a (software) program.
- Program execution is relatively slow.
Compiling Phases
The Structure of a Compiler
Lexical Analysis
This process is called scanning and is done by the scanner, or tokenizer, which translates the input into a form that is more usable by the rest of the compiler. A token is an indivisible lexical unit. The usual tokens are:
- Keywords – DO, IF
- Identifiers – SUM, X
- Operator symbols – <, =, +
- Punctuation symbols – parentheses, commas
Lexical Analysis…
Besides that, the scanner also:
- Handles numeric and string literals.
- Removes white space and comments.
Another important task is to build the symbol table. This is a table of all the identifiers (variable names, procedures and constants) used in the program.
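The scanner's tasks described above can be sketched in a few lines of Python. This is only an illustration, not the scanner of any particular compiler: the token class names, the keyword set and the Pascal-style `{ ... }` comment syntax are assumptions made for the example.

```python
import re

# Hypothetical token classes for a small Pascal-like language.
KEYWORDS = {"DO", "IF", "WHILE", "BEGIN", "END"}
TOKEN_SPEC = [
    ("COMMENT", r"\{[^}]*\}"),            # comment, discarded
    ("NUMBER",  r"\d+"),                  # numeric literal
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"), # identifier or keyword
    ("OP",      r":=|<=|>=|<>|[<>=+\-*/]"),
    ("PUNCT",   r"[(),;]"),
    ("SKIP",    r"\s+"),                  # white space, discarded
    ("ERROR",   r"."),                    # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for the given source text."""
    tokens, symbols = [], {}
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "ID" and text.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "ID":
            # Record each identifier once, in order of first appearance.
            symbols.setdefault(text, len(symbols))
        tokens.append((kind, text))
    return tokens, symbols

tokens, symbols = tokenize("IF X < SUM { compare } DO")
print(tokens)
print(symbols)
```

The symbol table here is just a dictionary from identifier to an index; a real compiler would record more information about each name.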
Syntactic Analysis
This process is called parsing and is done by the parser, or syntactic analyzer. The parser recognizes syntactically legal programs (as defined by a grammar) and rejects illegal ones. Each computer language has its own grammar, which makes it unique. Some grammars are complex (PL/I) and others are relatively simple (Pascal).
Syntactic Analysis…
Syntactic analysis comprises two major kinds of routine:
- Parsing routines: the parser checks that the tokens appear in the correct order and then calls the semantic routines to check whether the series of tokens (a production) will make sense to the computer. Output: a parse tree and a syntax tree.
- Semantic routines: the semantic routines then reduce the production another step toward complete translation to machine code.
Grammars
A grammar over a given character set consists of:
- A set of terminals, which are strings of zero or more characters.
- A set of non terminals, which are variables representing a set of terminals.
- A set of productions, each of which has a left side consisting of a single non terminal and a right side consisting of zero or more terminals or non terminals.
- A distinguished starting non terminal.
Grammars…
- Used for description, parsing, analysis, etc.
- Based on a recursive definition of program structure.
- Many possible representations, including BNF (Backus-Naur Form), EBNF (Extended BNF), syntax charts, etc.
BNF
BNF was invented ca. 1960 and used in the formal description of Algol 60. It is just a particular notation for grammars, in which:
- Non terminals are represented by names inside angle brackets. Example: <program>, <expression>, <S>
- Terminals are represented by themselves. Example: WHILE, (, 3
- The empty string is written as <empty>
BNF Example
<program> ::= BEGIN <statement-seq> END
<statement-seq> ::= <statement>
<statement-seq> ::= <statement> ; <statement-seq>
<statement> ::= <while-statement>
<statement> ::= <for-statement>
<statement> ::= <empty>
<while-statement> ::= WHILE <expression> DO <statement-seq> END
<expression> ::= <factor>
<expression> ::= <factor> AND <factor>
<expression> ::= <factor> OR <factor>
<factor> ::= ( <expression> )
<factor> ::= <variable>
<for-statement> ::= …
<variable> ::= …
EBNF
EBNF is (any) extension of BNF, usually with these features:
- A vertical bar, |, represents a choice.
- Parentheses, ( and ), represent grouping.
- Square brackets, [ and ], represent an optional construct.
- Curly braces, { and }, represent zero or more repetitions.
- Non terminals begin with upper-case letters.
- Non-alphabetic terminal symbols are quoted, at least when necessary to avoid confusion with the meta-symbols above.
EBNF Example
Program ::= BEGIN Statement-seq END
Statement-seq ::= Statement [ ‘;’ Statement-seq ]
Statement ::= [ While-statement | For-statement ]
While-statement ::= WHILE Expression DO Statement-seq END
Expression ::= Factor { (AND | OR) Factor }
Factor ::= ‘(’ Expression ‘)’ | Variable
For-statement ::= …
Variable ::= …
Simplified Pascal Grammar
<prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END
<prog-name> ::= id
<dec-list> ::= <dec> | <dec-list> ; <dec>
<dec> ::= <id-list> : <type>
<type> ::= INTEGER
<id-list> ::= id | <id-list> , id
<stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
<stmt> ::= <assign> | <read> | <write> | <for>
<assign> ::= id := <exp>
<exp> ::= <term> | <exp> + <term> | <exp> - <term>
<term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor> ::= id | int | ( <exp> )
<read> ::= READ ( <id-list> )
<write> ::= WRITE ( <id-list> )
<for> ::= FOR <index-exp> DO <body>
<index-exp> ::= id := <exp> TO <exp>
<body> ::= <stmt> | BEGIN <stmt-list> END
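As an illustration of how a parser can follow a grammar, here is a minimal recursive-descent sketch in Python for the <exp>, <term> and <factor> productions only. It is an assumption-laden example rather than the method these slides prescribe: the left-recursive productions (such as <exp> ::= <exp> + <term>) are replaced by loops, which is the usual adaptation for recursive descent, and the names `Parser`, `tokenize` and `parse_exp` are made up for the example. The tokenizer is deliberately crude and silently skips characters it does not recognize.

```python
import re

def tokenize(text):
    # Numbers, names (including the keyword DIV) and single-character symbols.
    return re.findall(r"\d+|[A-Za-z]\w*|[-+*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp(self):
        # <exp> ::= <term> { (+ | -) <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # <term> ::= <factor> { (* | DIV) <factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # <factor> ::= id | int | ( <exp> )
        tok = self.advance()
        if tok == "(":
            node = self.exp()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", ")", "DIV"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

def parse_exp(text):
    parser = Parser(tokenize(text))
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse_exp("Initial + Rate * 60"))
```

Each grammar rule becomes one method, and the nesting of the returned tuples reflects operator priority: `*` and `DIV` bind more tightly than `+` and `-` because <term> sits below <exp> in the grammar.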
Parse Tree
Once each symbol has been identified and classified in the symbol table, the compiler must understand how the symbols are combined syntactically. Syntactic structures are expressed as phrases; phrases are groups of tokens and form the core of the language. The compiler must find the meaning of each phrase. The tree that represents the grammatical structure of a phrase, including operator priorities, is called a parse tree.
Example of Parse Tree Position := Initial + Rate * 60
Syntax Tree
Much of the information in a parse tree becomes useless once parsing is complete; keeping it takes up memory space and might slow down the computer. Only the essential information, such as variable names, arithmetic operations and constants, is kept in the syntax tree.
Example of Syntax Tree Position := Initial + Rate * 60
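The difference between the two trees can be sketched in Python for the statement Position := Initial + Rate * 60. The parse tree below is written out by hand, assuming the <exp>, <term> and <factor> productions of the simplified Pascal grammar; the tuple representation and the function names `simplify` and `count_nodes` are choices made for this example only.

```python
# A parse-tree node is (nonterminal, [children]); a leaf is a plain string.
parse_tree = ("assign", [
    "Position", ":=",
    ("exp", [
        ("exp", [("term", [("factor", ["Initial"])])]),
        "+",
        ("term", [
            ("term", [("factor", ["Rate"])]),
            "*",
            ("factor", ["60"]),
        ]),
    ]),
])

def simplify(node):
    """Reduce a parse tree to a syntax tree of (operator, left, right)."""
    if isinstance(node, str):
        return node
    _label, children = node
    kids = [simplify(child) for child in children]
    if len(kids) == 1:
        return kids[0]        # drop chain nodes such as exp -> term -> factor
    left, op, right = kids    # binary production: operand operator operand
    return (op, left, right)

def count_nodes(node):
    if isinstance(node, str):
        return 1
    # Parse-tree nodes hold a child list; syntax-tree nodes hold two operands.
    children = node[1] if isinstance(node[1], list) else node[1:]
    return 1 + sum(count_nodes(child) for child in children)

syntax_tree = simplify(parse_tree)
print(syntax_tree)
print(count_nodes(parse_tree), "nodes reduced to", count_nodes(syntax_tree))
```

The syntax tree keeps only the names, the constant and the three operators; the nonterminal nodes that recorded how the grammar was applied are gone.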
Semantic Analysis
This task is often done as part of syntax analysis or intermediate code generation, so it is sometimes not considered an independent phase. It performs two tasks:
- Checking that each series of tokens will be understood by the computer when it is fully translated to machine code.
- Converting the series of tokens one step closer to machine code.
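One simple example of a semantic check is verifying that every identifier used in a statement has been declared. The Python sketch below assumes a syntax tree written as nested (operator, left, right) tuples with strings as leaves, and a set of declared names; both the representation and the function name `check_declared` are assumptions for illustration.

```python
def check_declared(node, declared, errors):
    """Append a message to errors for each undeclared identifier in node."""
    if isinstance(node, str):
        # Leaves are either integer literals or identifiers.
        if not node.isdigit() and node not in declared:
            errors.append(f"undeclared identifier: {node}")
        return
    _op, left, right = node
    check_declared(left, declared, errors)
    check_declared(right, declared, errors)

tree = (":=", "Position", ("+", "Initial", ("*", "Rate", "60")))
errors = []
check_declared(tree, {"Position", "Initial"}, errors)
print(errors)
```

The statement is syntactically legal, yet the check reports a problem because Rate was never declared; this is the kind of error the parser alone cannot detect.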
Intermediate Code Generation
The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. The code is generated by the same subroutines that parse the input stream. The intermediate language resembles assembly language, with the primary difference that intermediate code need not specify the registers used in each operation.
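A common form of intermediate code is three-address code, in which each instruction has at most one operator and results are held in temporary names rather than registers. The Python sketch below generates such code for Position := Initial + Rate * 60; the tuple tree representation, the temporary names t1, t2 and the function names are assumptions for this example, not a format defined in these slides.

```python
def gen_code(node, code, counter):
    """Emit instructions for node; return the name that holds its value."""
    if isinstance(node, str):
        return node                      # identifier or constant
    op, left, right = node
    a = gen_code(left, code, counter)
    b = gen_code(right, code, counter)
    if op == ":=":
        code.append(f"{a} := {b}")
        return a
    counter[0] += 1
    temp = f"t{counter[0]}"              # fresh temporary, not a register
    code.append(f"{temp} := {a} {op} {b}")
    return temp

def generate(tree):
    code = []
    gen_code(tree, code, [0])
    return code

tree = (":=", "Position", ("+", "Initial", ("*", "Rate", "60")))
for line in generate(tree):
    print(line)
```

For this tree the sketch prints three instructions: t1 := Rate * 60, then t2 := Initial + t1, then Position := t2.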
Code Optimization
Code optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and/or space.
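One of the simplest optimizations is constant folding: an operation whose operands are both constants is computed at compile time instead of at run time. The Python sketch below applies it to a tuple-based expression tree, which is a simplification chosen for the example; a real optimizer would typically work on the intermediate code itself, and the names `OPS` and `fold` are hypothetical.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if isinstance(node, (int, str)):
        return node                      # constant or variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in OPS and isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)

print(fold((":=", "Position", ("+", "Initial", ("*", 2, 30)))))
```

The folded tree does the same job as the original but needs one fewer multiplication when the program runs.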
Object Code Generation
This phase takes the intermediate code produced by the optimizer and generates machine code. It is the machine-dependent part of compilation: each type of computer and operating system expects machine code in a different form, so the code generator must be different for each type of computer.
Object Code Generation…
If the program is free from syntactic errors, code generation should take place without any problem. When the code generator has finished, the code produced is machine code, but it is not yet in an executable format. It is in a format (a .OBJ file in our case) that is ready to go to a linker, which will create an executable (*.EXE or *.COM) file from the machine code the compiler has generated.
Symbol-table Manager
The table management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each. The data structure used to store this information is called a symbol table.
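A symbol table can be sketched as a dictionary keyed by name. In the Python example below, the recorded fields (kind, type and a storage slot number) and the class and method names are assumptions chosen for illustration; what a real compiler records depends on the language and the target machine.

```python
class SymbolTable:
    """Maps each declared name to the information recorded about it."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, kind, type_name):
        if name in self._entries:
            raise KeyError(f"duplicate declaration of {name!r}")
        self._entries[name] = {
            "kind": kind,
            "type": type_name,
            "slot": len(self._entries),  # position in declaration order
        }

    def lookup(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"undeclared identifier {name!r}") from None

table = SymbolTable()
for name in ("Position", "Initial", "Rate"):
    table.declare(name, kind="variable", type_name="INTEGER")
print(table.lookup("Rate"))
```

Every phase can consult the same table: the scanner or parser enters names, the semantic routines look up types, and the code generator uses the slot to decide where each variable lives.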
Error Handler
The error handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic and adjust the information being passed from phase to phase so that each phase can proceed. Both the table management and error handling routines interact with all phases of a compiler.
7. LINKER AND LOADER
Linking is an operation that combines separately translated program units into one module or program that can be executed. Linking is done by a system program called the linker.
Linking is needed in two cases:
- When a reference is called.
- When there is an external reference to a label.
Linking can occur at:
- Coding time
- Compile/assemble time
- Load time (after translation)
- Execution time
Linking Process
[Figure: each source module is translated by a compiler/assembler into an object module; the linker combines the object modules into a load module; the loader places the load module in main memory.]
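The core of linking, resolving external references between separately translated modules, can be sketched in Python as two passes. The module description used here (a dictionary with name, size, defines and uses) is a made-up format for the example, not a real object-file layout, and the sizes and labels are illustrative.

```python
def link(modules):
    """Assign each module a base address and resolve external references."""
    base, bases, symbols = 0, {}, {}
    # Pass 1: lay the modules out one after another and collect their labels.
    for module in modules:
        bases[module["name"]] = base
        for label, offset in module["defines"].items():
            if label in symbols:
                raise ValueError(f"label {label!r} defined twice")
            symbols[label] = base + offset
        base += module["size"]
    # Pass 2: replace each external reference with the label's address.
    resolved = {}
    for module in modules:
        for label in module["uses"]:
            if label not in symbols:
                raise ValueError(f"undefined external reference {label!r}")
            resolved[(module["name"], label)] = symbols[label]
    return bases, resolved

modules = [
    {"name": "A", "size": 40, "defines": {"LA1": 20}, "uses": ["LB1"]},
    {"name": "B", "size": 10, "defines": {"LB1": 5}, "uses": []},
]
bases, resolved = link(modules)
print(bases)
print(resolved)
```

The addresses produced are still relative to the start of the load module; turning them into physical addresses is the loader's job.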
7.1 Loader
The loader is system software that processes the instructions in the object file and places the object file at a suitable location in physical memory; this is where loading occurs. In addition, the loader performs binding, which turns relative addresses into actual addresses.
[Figure: an object program of N words with absolute address A is loaded from the object file into RAM, where it occupies memory locations A to A+N-1.]
Binding can occur at:
- Coding time
- Translation time
- Loading time
- Execution time
The loader's tasks include:
- Allocation: preparing and allocating space in memory for the executable module.
- Relocation: rechecking and changing address locations.
- Loading: placing machine instructions into memory.
- Address binding: completing and linking addresses.
7.2 Types of Loader
A loader is software that performs loading functions. Types of loader include:
- Absolute loader
- Relocatable loader
- Dynamic loader
- Linking loader
- Linkage loader
- Bootstrap loader
7.3 Design & Relocation of Loader & Linker
With a relocatable loader, physical addresses are not bound during program translation; binding is done when the program is loaded into memory. The addresses produced by the translation process are relative addresses.
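Relocation at load time amounts to adding the load address to every relative address in the module. The Python sketch below assumes a simplified object format in which each word is (opcode, operand, is_address), with the flag marking operands that are relative addresses; this format and the function name `relocate` are assumptions for illustration.

```python
def relocate(module, base):
    """Load module at address base, adjusting relative addresses."""
    memory = {}
    for offset, (opcode, operand, is_address) in enumerate(module):
        if is_address:
            operand += base              # relative address -> physical address
        memory[base + offset] = (opcode, operand)
    return memory

# BSR branches to relative address 30; DC.L holds the constant 5.
module = [("BSR", 30, True), ("DC.L", 5, False)]
print(relocate(module, 500))
```

Loaded at address 500, the branch target 30 becomes 530 while the constant 5 is left unchanged, which is why the loader needs to know which operands are addresses.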
Relocatable Loader
[Figure: modules A, B and C, each with relative addresses, are combined by the linker into a load module and placed in memory by the relocating loader. With the program loaded at absolute address 500, the instruction BSR 30 in module A becomes BSR 530.]