CS510 Compiler Lecture 1
Sources
- Lecture notes
- Book 1: “Compiler Construction: Principles and Practice”, Kenneth C. Louden
- Book 2: “Compilers: Principles, Techniques, & Tools”, 2nd edition, AHO LAM …
- Stanford University PDFs
Course Grades
    Compiler Labs    20
    Assignment       10
    Final Exam       70
    ---------------------------------
    Total           100
Why Study Compilers?
- Build a large, ambitious software system.
- See theory come to life.
- Learn how to build programming languages.
- Learn how programming languages work.
- Learn tradeoffs in language design.
A Short History of Compilers
- First, there was nothing.
- Then, there was machine code.
- Then, there were assembly languages.
- Programming was expensive: 50% of costs for machines went into programming.
Definition
A compiler is a special form of translator.
A translator is a program, or a system, that converts an input text in some language into a text in another language with the same meaning. The two main kinds of translator are:
- Compiler
- Interpreter
Compiler and Interpreter
Both compilers and interpreters make it possible to run a program written in a high-level language on a machine that only understands machine language.
Interpreter
An interpreter does the same job as a compiler, with one major difference: instead of producing a machine-language program, it translates the high-level program into an intermediate code and executes that code itself, rather than handing it to the processor directly.
What is a compiler?
A compiler translates (or compiles) a program written in a high-level programming language, which is suitable for human programmers, into the low-level machine language that is required by computers. A high-level language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. This is where the compiler comes in.
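- An interpreter works on-line: it translates and executes the program as it runs.
- A compiler works off-line: it translates the whole program before the program is executed.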
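The on-line/off-line distinction can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the lecture: the expression format (nested tuples), the `PUSH`/`ADD`/`MUL` stack instructions, and the function names `interpret`, `compile_expr`, and `run` are all hypothetical.

```python
# Hypothetical mini-language: an expression is an int or a tuple (op, left, right).
EXPR = ("+", 4, ("*", 2, 3))


def interpret(node):
    """On-line: walk the tree and compute the value immediately."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_expr(node):
    """Off-line: translate the tree into stack-machine instructions without running them."""
    if isinstance(node, int):
        return [("PUSH", node)]
    op, left, right = node
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(code):
    """Stand-in for the target machine that executes the translated program later."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()
```

`interpret(EXPR)` produces the value in one step, while `compile_expr(EXPR)` only produces a program; the value appears when that program is later passed to `run`.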
What Do Compilers Do?
A compiler acts as a translator, transforming human-oriented programming languages into computer-oriented machine languages. It lets the programmer ignore machine-dependent details.
    Programming Language (Source) → Compiler → Machine Language (Target)
The Structure of a Compiler (1)
The Phases of a Compiler
- A compiler consists of a number of steps, or phases, that perform distinct logical operations.
- The phases of a compiler are shown in the given figure, together with three supporting components that interact with some or all of the phases.
[Figure: the phases of a compiler]
    Source Code → Scanner → Tokens → Parser → Syntax Tree → Semantic Analyzer → Annotated Tree → Source Code Optimizer → Intermediate Code → Code Generator → Target Code → Target Code Optimizer → Target Code
    Supporting components: Literal Table, Symbol Table, Error Handler
The Structure of a Compiler (2)
[Figure: the structure of a compiler]
    Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code
    Symbol and Attribute Tables (used by all phases of the compiler)
The Structure of a Compiler (3)
Scanner
- The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
- Topics:
  - RE (Regular Expressions)
  - NFA (Non-deterministic Finite Automata)
  - DFA (Deterministic Finite Automata)
  - LEX
[Figure: compiler structure diagram, as on slide (2)]
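To make the DFA idea concrete, here is a minimal sketch of a table-driven automaton that decides whether a group of characters is an identifier or a number. The state names, the character classes, and the function name `classify` are assumptions made for this illustration, not definitions from the course.

```python
def char_class(ch):
    """Map a character to the input class the automaton works with."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table of the DFA: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}

# Accepting states and the token class each one recognizes.
ACCEPTING = {"in_id": "ID", "in_num": "Num"}


def classify(lexeme):
    """Run the DFA over the lexeme; return the token class, or None if it is rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: the DFA rejects
            return None
    return ACCEPTING.get(state)
```

A deterministic automaton has at most one next state for each (state, input) pair, which is why a plain dictionary lookup is enough to drive it.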
Extended Example For Scanner
The Scanner: This piece of the compiler performs what is called lexical analysis: it receives the source code in the form of a stream of characters and divides it up into meaningful units called tokens.

Source code:
    a[index] = 4 + 2
Tokens:
    ID Lbracket ID Rbracket AssignOp Num AddOp Num

A scanner may perform other operations along with the recognition of tokens. For example, it may enter identifiers into the symbol table, and it may enter literals (numeric constants and quoted strings) into the literal table.
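The tokenization of `a[index] = 4 + 2` can be reproduced with a small regular-expression scanner. This is a sketch, not the course's scanner: the token names come from the slide, but the regular expressions, the `SKIP` pseudo-token for whitespace, and the function name `scan` are assumptions.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("Num", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbracket", r"\["),
    ("Rbracket", r"\]"),
    ("AssignOp", r"="),
    ("AddOp", r"\+"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not a token itself
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Group the character stream into (token kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("a[index] = 4 + 2")
kinds = [kind for kind, _ in tokens]
# Identifiers the scanner could enter into the symbol table.
identifiers = sorted({text for kind, text in tokens if kind == "ID"})
```

Here `kinds` holds the token sequence listed on the slide, and `identifiers` collects the names `a` and `index`.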
The Structure of a Compiler (4)
Parser
- Given a formal syntax specification (typically a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
- As syntactic structure is recognized, the parser either calls the corresponding semantic routines directly or builds a syntax tree.
- Topics:
  - CFG (Context-Free Grammar)
  - BNF (Backus-Naur Form)
  - GAA (Grammar Analysis Algorithms)
  - LL, LR, SLR, LALR Parsers
  - YACC
[Figure: compiler structure diagram, as on slide (2)]
Extended Example For Parser
The Parser: This piece of the compiler performs what is called syntax analysis: it receives the source code in the form of tokens and determines the structure of the source code, represented as a parse tree or a syntax tree.
May 20, 2018 Prof. Abdelaziz Khamis
Extended Example For Parser (Continued)
A parse tree is a useful aid to visualizing the syntax of a program or program element, but it is inefficient in its representation of that structure. Parsers tend to generate a syntax tree instead, which is a condensed version of the parse tree containing only the essential information.
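As an illustration, a small recursive-descent parser can turn the token list for `a[index] = 4 + 2` into a syntax tree. The grammar in the comments, the tuple-based tree shape, and the names `Parser`, `assign`, `target`, `additive`, and `operand` are assumptions made for this sketch; the course's own parsers may be organized quite differently.

```python
# Tokens for a[index] = 4 + 2, written out by hand as (kind, lexeme) pairs.
TOKENS = [("ID", "a"), ("Lbracket", "["), ("ID", "index"), ("Rbracket", "]"),
          ("AssignOp", "="), ("Num", "4"), ("AddOp", "+"), ("Num", "2")]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the next token, or "EOF" when the input is exhausted."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume a token of the given kind and return its lexeme."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def assign(self):  # assign -> target AssignOp additive
        target = self.target()
        self.expect("AssignOp")
        tree = ("assign", target, self.additive())
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected {self.peek()} after statement")
        return tree

    def target(self):  # target -> ID [ Lbracket additive Rbracket ]
        name = ("id", self.expect("ID"))
        if self.peek() == "Lbracket":
            self.expect("Lbracket")
            index = self.additive()
            self.expect("Rbracket")
            return ("subscript", name, index)
        return name

    def additive(self):  # additive -> operand { AddOp operand }
        left = self.operand()
        while self.peek() == "AddOp":
            self.expect("AddOp")
            left = ("add", left, self.operand())
        return left

    def operand(self):  # operand -> Num | target
        if self.peek() == "Num":
            return ("num", int(self.expect("Num")))
        return self.target()


tree = Parser(TOKENS).assign()
```

The resulting tree keeps only the essential information: brackets and the assignment symbol are no longer nodes, they are implied by the `subscript` and `assign` node kinds.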
The Structure of a Compiler (5)
Semantic Analyzer
- Performs two functions:
  - Checks the static semantics of each construct
  - Does the actual translation
- The heart of a compiler
- Topics:
  - Syntax Directed Translation
  - Semantic Processing Techniques
  - IR (Intermediate Representation)
[Figure: compiler structure diagram, as on slide (2), with Semantic Analyzer in place of Semantic Routines]
Extended Example For Semantic Analyzer
The Semantic Analyzer: This piece of the compiler performs what is called semantic analysis: it computes additional information, called attributes, needed for compilation once the syntactic structure of a program is known. These attributes, such as data types, are often added to the syntax tree. They may also be entered into the symbol table.
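A minimal sketch of attribute computation: the function below derives the data type of each node of a syntax tree for `a[index] = 4 + 2` and rejects ill-typed programs. The tuple tree format, the contents of `SYMTAB` (the assumed declarations of `a` and `index`), the type names `"int"` and `"int[]"`, and the function name `type_of` are all hypothetical. A real analyzer would typically store the computed type on the tree node instead of recomputing it.

```python
# Assumed syntax tree for a[index] = 4 + 2.
TREE = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))

# Assumed declarations, as they might appear in the symbol table.
SYMTAB = {"a": "int[]", "index": "int"}


def type_of(node, symtab):
    """Compute the data-type attribute of a node, checking static semantics on the way."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in symtab:
            raise NameError(f"undeclared identifier {node[1]}")
        return symtab[node[1]]
    if kind == "subscript":
        if type_of(node[1], symtab) != "int[]":
            raise TypeError("subscripted value is not an array")
        if type_of(node[2], symtab) != "int":
            raise TypeError("array index is not an integer")
        return "int"
    if kind == "add":
        if type_of(node[1], symtab) != "int" or type_of(node[2], symtab) != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign":
        left = type_of(node[1], symtab)
        if left != type_of(node[2], symtab):
            raise TypeError("assignment type mismatch")
        return left
    raise ValueError(f"unknown node kind {kind}")
```

With the declarations above the statement is well typed; if `a` were declared as a plain `int`, the subscript check would fail.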
The Structure of a Compiler (6)
Optimizer
- The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
- This phase can be very complex and slow.
- Typical optimizations: peephole optimization, loop optimization, register allocation, code scheduling
- Topics:
  - Register and Temporary Management
  - Peephole Optimization
[Figure: compiler structure diagram, as on slide (2)]
Extended Example For Code Optimizer
The Source Code Optimizer: This piece of the compiler generates intermediate code/representation (such as three-address code) from the syntax tree, in a form that more closely resembles target code. In our example, three-address code for the original C expression might look like this:

    t = 4 + 2
    a[index] = t

Now the optimizer would improve this code in two steps, first computing the result of the addition and then replacing the temporary variable t by its value, to get the three-address code

    a[index] = 6
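The two steps, usually called constant folding and constant propagation, can be sketched as follows. The instruction format `(dest, op, arg1, arg2)`, the `"copy"` operation name, and the function name `optimize` are assumptions for this illustration, and the sketch only handles straight-line code in which each temporary is assigned once.

```python
def optimize(code, temporaries):
    """Fold constant additions and propagate constant temporaries (straight-line code only)."""
    constants = {}  # temporaries known to hold a constant value
    result = []
    for dest, op, x, y in code:
        # Constant propagation: replace a temporary by its known value.
        x = constants.get(x, x)
        y = constants.get(y, y)
        # Constant folding: evaluate an addition whose operands are both constants.
        if op == "+" and isinstance(x, int) and isinstance(y, int):
            op, x, y = "copy", x + y, None
        # A temporary that just holds a constant needs no instruction of its own.
        if op == "copy" and dest in temporaries and isinstance(x, int):
            constants[dest] = x
            continue
        result.append((dest, op, x, y))
    return result


# t = 4 + 2 ; a[index] = t
code = [("t", "+", 4, 2), ("a[index]", "copy", "t", None)]
improved = optimize(code, temporaries={"t"})
```

After the call, `improved` contains the single instruction that copies the constant 6 into `a[index]`.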
The Structure of a Compiler (7)
Code Generator
- Topics:
  - Interpretive Code Generation
  - Generating Code from Tree/Dag
  - Grammar-Based Code Generator
[Figure: compiler structure diagram, as on slide (2)]
Extended Example For Code Generator
The Code Generator: This piece of the compiler takes the intermediate code/representation and generates code for the target machine. In this course we will write target code in assembly language form for ease of understanding, although most compilers generate object code directly. A possible sample code sequence for the given expression might be (in a hypothetical assembly language, and assuming that each integer occupies two bytes, which is why the index is doubled):

    MOV R0, index   ;; value of index → R0
    MUL R0, 2       ;; double value in R0
    MOV R1, &a      ;; address of a → R1
    ADD R1, R0      ;; add R0 to R1
    MOV *R1, 6      ;; constant 6 → address in R1
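A code generator for this one kind of statement can be sketched as a function that fills in an instruction template. The function name `gen_array_store`, its parameters, and the fixed use of registers `R0` and `R1` are assumptions for illustration; the default element size of 2 mirrors the doubling in the sample sequence.

```python
def gen_array_store(array, index, value, element_size=2):
    """Emit hypothetical assembly for: array[index] = value."""
    return [
        f"MOV R0, {index}",         # value of index -> R0
        f"MUL R0, {element_size}",  # scale the index by the element size
        f"MOV R1, &{array}",        # address of the array -> R1
        "ADD R1, R0",               # R1 now holds the address of the element
        f"MOV *R1, {value}",        # store the value at that address
    ]


target_code = gen_array_store("a", "index", 6)
```

A real code generator would also choose registers and handle arbitrary expressions; the template only covers a constant stored into an array element.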
Extended Example (Continued)
The Target Code Optimizer: This piece of the compiler improves the target code by eliminating redundant or unnecessary operations, replacing slow instructions by faster ones, and choosing addressing modes to improve performance.
In the sample target code given, there are a number of improvements possible:
- Use a shift instruction to replace the multiplication
- Use indexed addressing to perform the array store
With these two optimizations, our target code becomes

    MOV R0, index   ;; value of index → R0
    SHL R0          ;; double value in R0
    MOV &a[R0], 6   ;; constant 6 → address of a + R0
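Both improvements can be expressed as pattern rewrites over a short window of instructions, which is the idea behind peephole optimization. The sketch below is an assumption-laden illustration: the function name `peephole` and the exact instruction spellings it matches are hypothetical, and the second rewrite is only safe if the address register is not used again afterwards, which the sketch does not check.

```python
import re


def peephole(code):
    """Rewrite instruction patterns in a list of hypothetical assembly lines."""
    out = []
    i = 0
    while i < len(code):
        # Strength reduction: multiplying by 2 becomes a one-bit left shift.
        m = re.fullmatch(r"MUL (R\d+), 2", code[i])
        if m:
            out.append(f"SHL {m.group(1)}")
            i += 1
            continue
        # Indexed addressing: "load base, add offset, store through pointer" becomes one store.
        # Assumes the base register (group 1) is dead after the store.
        window = "\n".join(code[i:i + 3])
        m = re.fullmatch(r"MOV (R\d+), &(\w+)\nADD \1, (R\d+)\nMOV \*\1, (\w+)", window)
        if m:
            out.append(f"MOV &{m.group(2)}[{m.group(3)}], {m.group(4)}")
            i += 3
            continue
        out.append(code[i])
        i += 1
    return out


before = ["MOV R0, index", "MUL R0, 2", "MOV R1, &a", "ADD R1, R0", "MOV *R1, 6"]
after = peephole(before)
```

Applied to the five-instruction sequence from the code generator example, the result is the three-instruction sequence shown above.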
For Lab Part
The TINY Sample Language
A course on compiler construction would be incomplete without examples for each step in the compilation process. In this course we will use a compiler for a small language, called TINY, as a running example for the techniques that will be discussed in the compilation phases.
A program in TINY has a very simple structure:
- A sequence of statements separated by semicolons
- There are no declarations and no procedures
- All variables are integer variables
- There are only two control statements: if and repeat
- An if-statement must be terminated by the keyword end
- There are read and write statements that perform input/output
- Expressions are limited to Boolean and integer arithmetic expressions
TINY Example
We will use the following sample program as a running example throughout the course:

    read x;
    if 0 < x then
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact
    end
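For readers who want to check what the sample program computes, here is a hand translation into Python. It is a sketch under assumptions: `read` is modeled as the function argument, `write` as appending to a list, and the function name `tiny_factorial` is hypothetical. The `while True` loop with a test at the end mirrors `repeat ... until`, whose body always runs at least once.

```python
def tiny_factorial(x):
    """Hand translation of the TINY sample program; returns the list of written values."""
    output = []
    if 0 < x:
        fact = 1
        while True:  # repeat ... until: the test comes after the body
            fact = fact * x
            x = x - 1
            if x == 0:
                break
        output.append(fact)  # write fact
    return output
```

For a positive input the program writes the factorial of that input; for zero or a negative input it writes nothing, because the whole computation sits inside the if-statement.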
The TINY Compiler
The TINY compiler consists of the following files. The source code for these files is listed in Appendix B of the textbook.

    globals.h    main.c
    util.h       util.c
    scan.h       scan.c
    parse.h      parse.c
    symtab.h     symtab.c
    analyze.h    analyze.c
    code.h       code.c
    cgen.h       cgen.c
The TM Machine
We simplify the target language of the TINY compiler to be the assembly language for a simple hypothetical machine, which we will call the TM machine.
The source code for a TM simulator is listed in Appendix C of the textbook. It reads from a file the target code produced by the TINY compiler and executes it.
Exercise: Compile the TINY compiler and the TM machine simulator. Use the TINY compiler to compile the TINY sample program, and then use the machine simulator to execute it.
C-Minus: A Language for a Compiler Project
A more extensive language than TINY, suitable for a compiler project, is described in Appendix A of the textbook. It is a significantly restricted subset of C, which we will call C-Minus.
C-Minus has the following features:
- It contains integers, integer arrays, and functions
- It has local and global declarations and recursive functions
- It has an if-statement and a while-statement
- A program consists of a sequence of function and variable declarations
- A main function must be declared last
Appendix A provides guidance on how to modify and extend the TINY compiler to C-Minus.
C-Minus Example
The following is a sample program in C-Minus. More examples exist in Appendix A of the textbook.

    int fact( int x )
    {
      if (x > 1)
        return x * fact(x-1);
      else
        return 1;
    }

    void main( void )
    {
      int x;
      x = read();
      if (x > 0) write( fact(x) );
    }