Introduction to Compilers
Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software engineering
Compilers A compiler is a program that reads a program written in one language – the source language – and translate it into an equivalent program in another language - the target language. Early compilers ’s
Machine Language, Assembly Language, High-Level Language Machine language is the native language of the computer on which the program is run. Native code It consists of bit strings which are interpreted by the mechanism inside the computer. Example in IBM 370 Binary: Hexadecimal: 1835 Copy the content of Register 5 into Register 3 LR 3, 5 Assembler, assembly language
Machine Language, Assembly Language, High-Level Language Example High-level language X := Y + Z; Assembly language L 3, Y ; Load the working register with Y A 3, Z ; Add Z ST 3, X ; Store the result in X
Terminology Source language Java, C, C++ Object language Machine language Object code Object file, object module Target machine The computer on which the program is to be run
Terminology Cross compiler A compiler that generates code for a machine that is different from the machine on which the compiler runs. Example: A compiler which can be run on a IBM PC but which compiles to the machine language of a special- purpose embedded system.
Compilers and Interpreters Compiler Translates the high-level program to the target program. Interpreter Executes the program.
The Environment of the Compiler
Example COMP myprog ; Compiles the program LINK myprog ; Links the program RUN myprog ; Runs the program
Phases of a Compiler Five (six) phases of compilation Lexical analysis Syntactic analysis (Semantic analysis) Intermediate code generation Optimization Object code generation
Phases of a Compiler Language Processing System
Phases of a Compiler
Two Parts of Compilation Analysis breaks up the source program creates an intermediate representation Synthesis constructs the desired target program from the intermediate representation
Analysis Lexical Analysis linear analysis, scanning Syntax Analysis parsing, hierarchical analysis Semantic Analysis Intermediate Code Generation Advantage of dividing analysis simple design compiler efficiency compiler portability
Analysis of the Source Program Lexical analysis (linear analysis) the streams of characters making up the source program is read from left-to-right and grouped into tokens Syntax analysis (hierarchical analysis) characters or tokens are grouped hierarchically into nested collections with collective meaning Semantic analysis certain checks are performed to ensure that the components of a program fit together meaningfully
Lexical Analysis Linear analysis, scanning Reads the stream of characters in the source program from left to right, and groups into tokens Tokens are sequences of characters having a collective meaning
Example: Lexical Analysis position := initial + rate * 60 id1 := id2 + id3 * 60
Syntax Analysis Parsing Hierarchical analysis Groups the tokens of the source program into grammatical phrases represented by parse tree that are used by the compiler to synthesize output.
Example: Syntax Analysis
Semantic Analysis Checks the source program for semantic errors and gathers type or semantic information for the subsequent code generation phase
Example: Semantic Analysis
Error Handler When each phases of compilation encounters error, a phase must somehow deal with that error. Error in Lexical Phase The characters in the input do not form any token of the language. Error in Syntax Phase The token stream violates the structure rules (syntax) of the language. Error in Semantic Phase Constructs have the right syntactic structure, but no meaning to the operation involved.
S/W Tools Performing Analysis Structure editors a sequence of command => a source program Pretty printers indentation, fonts Static checkers a program => discover bugs without run Interpreters performing operations
Performing Analysis Text formatters typeset text Silicon compilers circuit design Query interpreters DB
Intermediate Code Generation Explicit intermediate representation A program for an abstract machine Two properties of intermediate code Easy to produce Easy to translate into the target program Intermediate form Three address form (quadruples, triples) Two address form
Three-Address Code Has at most three operands Each three-address instruction has at most one operator in addition to the assignment The compiler must generate a temporary name to hold the value computed by each instruction May have fewer than 3 operands
Example: Three-Address Code
Example: Intermediate Code
Synthesis Part The synthesis part constructs the desired target program from the intermediate representation.
Code Optimization Improve the intermediate code to get the fast- running machine code Optimizing compiler
Example: Code Optimization
Code Generation Generates the target codes re-locatable machine code assembly code
Example: Code Generation
System Support There is a certain amount of supporting code to be supplied to the compilation. Symbol table management Error handling
System Support Symbol table handler The central repository of information about the names or identifiers in the program Error handling Implements the compiler’s response to errors in the code it is compiling. Diagnostics Where the error was found and what kind of error it was
Passes, Front End, Back End The compiler makes one or more passes through the program. A pass consists of reading a version of the program from a file and writing a new version of it to an output file. A pass normally comprises more than one phase, but the number of passes, and the phases they cover, varies.
Passes, Front End, Back End Front End Dependent on the source language and have little or no concern with the target machine Lexical analysis (Semantic analysis) Intermediate code generation Back End Machine-dependent Code optimization Target code generation
Writing a Compiler The first compiler was written in assembly language; there was no other alternative. High-level language compilers Cross compiler Useful tools – “compiler compilers” Lex Yacc
Retargetable Compilers In many cases, a compiler writer will want to adapt a compiler for use with a new target A compiler that can be modified in this way is said to be retargetable. Cross compiler Alternative approaches Distinction between Front End and Back End Compiler for imaginary machine (virtual machine)
Cousins of the Compiler Preprocessors Assemblers Loaders and Link-Editors
Preprocessor A preprocessor is a simple translator that is applied to the source program before it is submitted to the compiler. Before the program is compiled it is passed through the preprocessor, which replaces all occurrences of the pre-defined expression with the defined sequence of instructions. Example
Functions of Preprocessors Macro processing File inclusion Language extensions DB query languages embedded in highlevel languages
Example: Preprocessors The C Programming Language
Assemblers Two-pass assembly Loaders and Link-Editors
Assemblers Assembly code a mnemonic version of machine code, in which names are used instead of binary codes for operations, and names are also given to memory addresses
Two-Pass Assembly(1) In the first pass, All the identifiers that denote storage locations are found and stored in a symbol table Identifiers are assigned storage locations as they are encountered for the first time Example: b := a + 2
Two-Pass Assembly (2) In the second pass, The assembler scans the input again. It translates each operation code into the sequence of bits representing that operation in machine language It translates each identifier representing a location into the address given for that identifier in the symbol table The output of the second pass is usually relocatable machine code
Example: Relocatable Addresses Altering the relocatable address to absolute or unrelocatable machine code: Suppose that the address space containing the data is to be loaded starting at location “L = ” L must be added to the address of the instruction
Loaders and Link-Editors Loader Performs the two functions of loading and link-editing Loading Consists of taking relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations
Link-Editors Link-editor Allows us to make a single program from several files of relocatable machine code. These files may have been the result of several different compilations, and One or more may be library files. External references In which the code of one file refers to a location in another file.
Summary A quick overall picture of what a compiler does, what goes into it, and how it is organized