Published by Marjorie McCarthy. Modified over 8 years ago.
1
Compiler Design, Chapter 1: Introduction, Structure of a Compiler
2
Textbook: Compilers: Principles, Techniques, and Tools, Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Addison-Wesley
References: Programming Language Processors in Java: Compilers and Interpreters, D.A. Watt and D.F. Brown, Pearson Education Ltd.
3
Objectives
* To introduce principles, techniques, and tools for compiler construction
* To understand what a compiler does and how to build one
4
Course Outline
1. Introduction, Structure of a Compiler
2. Lexical Analysis: Tokens, Regular Expressions
3. Parsing: Context-free grammars, predictive
4. Abstract Syntax: Semantic actions, abstract parse trees
5. Semantic Analysis: Symbol tables, bindings, type-checking
6. Stack Frames: Representation and Abstraction
7. Intermediate Code: Representation trees, translation
8. Basic Blocks: Canonical trees, conditional branches
9. Instruction Selection: Algorithms for selection
10. Liveness Analysis: Solution of dataflow equations
11. Register Allocation: Spilling
5
Why do we need to know compilers?
* Seeing how a compiler is developed gives you a feeling for how programs work.
* Compilers are a great example of the interplay between theory and practice.
* Many algorithms and models used in compilers are fundamental and will be useful to you elsewhere:
* automata, regular expressions (lexing)
* context-free grammars, trees (parsing)
* hash tables (symbol table)
* dynamic programming, graph coloring (code generation)
6
Computer Organization
Applications
Compiler
Operating System
Hardware
7
History, Programming Languages
Machine coding (first generation)
* Binary programming (punching holes)
* The computer’s ‘native language’: binary digits (0s and 1s)
Assembly language (second generation)
* One-to-one correspondence to machine language
    MOV AX, 5h
    MOV DX, 3h
    ADD AX, DX
* Assembler: translates assembly language programs into machine language
8
History, Programming Languages (High-Level Languages)
Procedural Languages (third generation)
* Each instruction translates into one or more machine language instructions
* Uses common words rather than abbreviated mnemonics
* Examples: C, C++, Java, Fortran, QuickBasic
    A = 3
    B = A * 2 - 1
    D = A / B + A^5
* Compiler: translates the entire program at once
* Interpreter: translates and executes one source program statement at a time
9
History, Programming Languages (High-Level Languages)
Nonprocedural Languages (fourth generation)
* Allow the user to specify the desired result without having to specify the detailed procedures needed to achieve it
* Example: Structured Query Language (SQL)
Natural Language Programming Languages (fifth generation, "intelligent" languages)
* Translate natural languages into a structured, machine-readable form
10
High-Level Languages
* Expressions: built from operators such as +, -, *, /
* Data Types: simple types (e.g. Boolean, int, float) as well as composite structures (records) and arrays, which can be defined by the programmer
* Control Structures: allow programming of selective computation as well as iterative computation
* Declarations: introduce identifiers to denote constant values, variables, procedures, etc.
* Abstraction: separation of concerns, i.e. break a problem up and deal with its sub-parts
* Encapsulation (data abstraction): grouping related data and operations and selectively hiding specific information (e.g. classes)
11
Language Processors
* Editor: used to enter text
* Translator: translates text from one language to another
* Compiler: translates from a high-level language to a low-level language
* Interpreter: takes a text (in a particular language) and runs it immediately
* Assembler: translates from an assembly language into the corresponding machine code
Programming language (source code) -> Translator program (Assembler, Compiler, or Interpreter) -> Machine language (object code)
12
What is a Compiler?
A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
source program -> Compiler -> target program
The compiler also reports error messages.
13
Compiler
Source programs: many possible source languages, from traditional to application-specific languages
* Programming languages (high-level)
* Modeling languages
* Document description languages
* Database query languages
Target programs: another programming language, often the machine language of a particular computer system
* High-level programming language
* Low-level programming language (assembler or machine code)
* Application-specific target language
Error messages: essential for program development
14
Do we need Compilers?
* Machines understand only 1’s and 0’s. High-level languages make programming easier for the user, but they are not directly understood by the machine.
* Once the programmer has written and edited the program (in an editor), it needs to be translated into machine language (1’s and 0’s) before it can be executed.
* Compilers are used to do this conversion.
15
Where are compilers used?
* Implementation of programming languages: C, C++, Java, Lisp, Prolog, SML, Haskell, Ada, Fortran
* Document processing: DVI -> PostScript, Word documents -> PDF
* Natural language processing: NL -> database query language -> database commands
* Hardware design: silicon compilers, CAD data -> machine operations, equipment lists
* Report generation: CAD data -> list of parts
* All kinds of input/output translations: various UNIX text filters, ...
16
Compilers & Interpreters
* Interpreters are another class of translators
* Compiler: translates a program once and for all into the target language (e.g. C++)
* Interpreter: effectively translates a source program every time it is run (e.g. Basic)
* Compilers and interpreters can be used together in a hybrid scheme (e.g. Java): Java is compiled into Java byte code, and the byte code is interpreted by a Java Virtual Machine (JVM)
17
Compiler / Translator and Interpreter
* A translator produces an “equivalent” program in another language (e.g. from C to Pascal)
* A compiler is a translator that generally takes in a higher-level language (e.g. C) and transforms it into a low-level language (usually object or machine code)
* A compiler/translator produces the entire output code before execution starts
* An interpreter translates and executes one statement at a time before moving on to the next statement
18
Analysis-Synthesis Model of Compilation
There are two parts of compilation:
Part 1, Analysis: breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
Part 2, Synthesis: constructs the desired target program from the intermediate representation. Of the two parts, synthesis requires the most specialized techniques.
19
The phases of a compiler
Source Program
-> Lexical Analyser
-> Syntax Analyser
-> Semantic Analyser
-> Intermediate Code Generator
-> Code Optimizer
-> Code Generator
-> Target Program
The Symbol Table Manager and the Error Handler interact with all of the phases.
20
The phases of a compiler: what each phase does
Source Program
* Lexical Analyzer: decomposes statements into tokens
* Syntax Analyzer: parsing, checks the order of tokens against the grammar, creates the Abstract Syntax Tree
* Semantic Analyzer: type checking, identifies operators and operands
* Intermediate Code Generator: first translation, creates temporary sub-result variables
* Code Optimizer: improves speed and efficiency
* Code Generator: generates the final assembly code
Target Program
* Symbol Table Manager: stores a record for each identifier and its attributes
* Error Handler: detects errors, reports errors
21
Part1, Analysis of the Source Program
Analysis consists of three phases:
* Lexical Analysis (Linear Analysis or Scanning): the stream of characters is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.
* Syntax Analysis (Hierarchical Analysis or Parsing): characters or tokens are grouped hierarchically into nested collections with collective meaning.
* Semantic Analysis: certain checks are performed to ensure that the components of a program fit together meaningfully.
22
Lexical Analysis (Linear Analysis/ Scanning)
Translates the input program, entered as a sequence of characters, into a sequence of words or symbols (tokens). For example, the keyword for should be treated as a single entity, not as a 3-character string.
    position := initial + rate * 60
The assignment statement would be grouped into the following tokens:
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign +
5. The identifier rate
6. The multiplication sign *
7. The number 60
Note: the blanks separating these tokens would normally be eliminated during lexical analysis.
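The grouping into tokens described above can be sketched with regular expressions. This is a minimal illustration, not the method of any particular compiler: the token class names (`ID`, `ASSIGN`, `NUMBER`, ...) and the function `tokenize` are assumptions introduced for this example.

```python
import re

# Hypothetical token classes for the slide's example; the names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ASSIGN", r":="),
    ("ID",     r"[A-Za-z_]\w*"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),          # blanks are recognized and then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Scan text left to right and return a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("position := initial + rate * 60")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

The scanner yields seven tokens for the example statement, matching the numbered list, and the blanks never reach the later phases.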
23
Syntax Analysis (Hierarchical Analysis or Parsing)
Groups the tokens of the source program into grammatical phrases that the compiler uses to synthesize output. The grammatical phrases are usually represented by a parse tree, as shown on the following slide. This phase determines the structure of the program: it identifies the components of each statement and expression and checks for syntax errors.
24
position := initial + rate * 60
Parse Tree
assignment statement
    identifier: position
    :=
    expression
        expression
            identifier: initial
        +
        expression
            expression
                identifier: rate
            *
            expression
                number: 60
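A small recursive-descent parser can build such a tree for the example statement. The sketch below assumes whitespace-separated tokens and a tiny grammar (`expr -> term ('+' term)*`, `term -> factor ('*' factor)*`); the function name `parse_assignment` and the nested-tuple tree layout are choices made for this illustration only.

```python
def parse_assignment(tokens):
    """Parse `ID := expr` and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok.isdigit():
            return ("number", int(tok))
        if tok.isidentifier():
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # '*' binds tighter than '+', so it is handled one level deeper.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance() if tokens else None
    if target is None or not target.isidentifier():
        raise SyntaxError("expected an identifier on the left-hand side")
    if peek() != ":=":
        raise SyntaxError("expected ':='")
    advance()
    tree = (":=", ("id", target), expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


tree = parse_assignment("position := initial + rate * 60".split())
print(tree)
```

Because `term` is called from `expr`, the multiplication `rate * 60` ends up nested below the addition, which is the hierarchy the parse tree expresses.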
25
Semantic Analysis
* Checks the source program for semantic errors and gathers type information for the subsequent code generation phase
* Uses the hierarchical structure determined by the syntax-analysis phase
* Checks that the program is reasonable, for example that it does not include references to undefined variables
* An important component of semantic analysis is type checking
Syntax tree after type checking, with the conversion inttoreal inserted:
:=
    position
    +
        initial
        *
            rate
            inttoreal
                60
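The type-checking step can be sketched as a walk over the syntax tree that inserts `inttoreal` wherever an integer operand meets a real one. The nested-tuple tree layout, the function `check`, and the declaration table `SYMBOL_TYPES` are assumptions for this example; in particular, the three identifiers are assumed to be declared as reals, which the slide's tree implies but does not state.

```python
# Assumed declarations: the slide does not list them, but the inserted
# inttoreal conversion implies that the identifiers are reals.
SYMBOL_TYPES = {"position": "real", "initial": "real", "rate": "real"}


def check(node):
    """Return (typed_node, type), inserting inttoreal where an int meets a real."""
    kind = node[0]
    if kind == "number":
        return node, ("int" if isinstance(node[1], int) else "real")
    if kind == "id":
        name = node[1]
        if name not in SYMBOL_TYPES:
            raise NameError(f"undefined variable {name!r}")
        return node, SYMBOL_TYPES[name]

    # Remaining node kinds are binary: ':=', '+', '*'.
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if op == ":=":
        if left_type == "real" and right_type == "int":
            right = ("inttoreal", right)
        elif left_type != right_type:
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type
    if left_type != right_type:
        # Mixed int/real arithmetic: widen the integer operand.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


tree = (":=", ("id", "position"),
        ("+", ("id", "initial"), ("*", ("id", "rate"), ("number", 60))))
typed_tree, result_type = check(tree)
print(typed_tree)
```

The same walk also catches the undefined-variable case mentioned above, since any identifier missing from the table raises an error.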
26
Part 2, Synthesis
* Intermediate Code Generation: produces an intermediate representation that can be thought of as a program for an abstract machine. It should be easy to produce and easy to translate into the target program.
* Code Optimization: attempts to improve the intermediate code so that faster-running target code results.
* Code Generation: memory locations are selected for each of the variables used by the program. Intermediate instructions are each translated into a sequence of machine instructions that perform the same task. A crucial aspect is the assignment of variables to registers.
27
Intermediate Code Generator
    temp1 := inttoreal(60)
    temp2 := rate * temp1
    temp3 := initial + temp2
    position := temp3
Code Optimizer
    temp1 := rate * 60.0
    position := initial + temp1
Code Generator
    MOVF rate, R2
    MULF #60.0, R2
    MOVF initial, R1
    ADDF R2, R1
    MOVF R1, position
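The first two steps of this example can be reproduced with a short sketch. It assumes a nested-tuple syntax tree as input and represents each three-address instruction as a tuple `(dest, op, arg1, arg2)`; the names `gen_tac`, `optimize`, and `show` are hypothetical. The optimizer performs only two simple improvements (folding `inttoreal` of a constant and removing the final copy), which is far less than a real optimizer does.

```python
from itertools import count


def gen_tac(tree):
    """Translate an assignment tree into three-address instructions."""
    code, temps = [], count(1)

    def walk(node):
        kind = node[0]
        if kind in ("id", "number"):
            return str(node[1])
        if kind == "inttoreal":
            arg = walk(node[1])
            dest = f"temp{next(temps)}"
            code.append((dest, "inttoreal", arg, None))
        else:
            left, right = walk(node[1]), walk(node[2])
            dest = f"temp{next(temps)}"          # one temporary per sub-result
            code.append((dest, kind, left, right))
        return dest

    _, target, expr = tree
    code.append((target[1], "copy", walk(expr), None))
    return code


def optimize(code):
    """Fold inttoreal of integer constants and drop the trailing copy."""
    constants, out = {}, []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = str(float(a))      # done at compile time: "60" -> "60.0"
            continue
        out.append((dest, op, a, b))
    # Safe here because this generator uses each temporary exactly once.
    if len(out) >= 2 and out[-1][1] == "copy" and out[-1][2] == out[-2][0]:
        _, op, a, b = out[-2]
        out[-2:] = [(out[-1][0], op, a, b)]
    return out


def show(instr):
    dest, op, a, b = instr
    if op == "copy":
        return f"{dest} := {a}"
    if op == "inttoreal":
        return f"{dest} := inttoreal({a})"
    return f"{dest} := {a} {op} {b}"


tree = (":=", ("id", "position"),
        ("+", ("id", "initial"),
         ("*", ("id", "rate"), ("inttoreal", ("number", 60)))))
for instr in gen_tac(tree):
    print(show(instr))
print("after optimization:")
for instr in optimize(gen_tac(tree)):
    print(show(instr))
```

The unoptimized output matches the four intermediate instructions above. The optimized output has the same two-instruction shape as the slide, but the surviving temporary keeps its original name `temp2`, because this sketch does not renumber temporaries.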
28
Symbol table, Error handler
Symbol table: a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
Error handler: each phase can encounter errors. After detecting an error, a phase must deal with that error so that compilation can proceed, allowing further errors in the source program to be detected.
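A hash table is one common way to get the fast lookup described above. The sketch below uses a Python dictionary for that purpose; the class names `SymbolRecord` and `SymbolTable` and the particular attribute fields (`type`, `address`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SymbolRecord:
    """One record per identifier; the attribute fields are illustrative."""
    name: str
    type: Optional[str] = None
    address: Optional[int] = None


class SymbolTable:
    def __init__(self):
        # A dict is a hash table, so insert and lookup take expected constant time.
        self._records: Dict[str, SymbolRecord] = {}

    def insert(self, name):
        """Return the record for name, creating it the first time it is seen."""
        return self._records.setdefault(name, SymbolRecord(name))

    def lookup(self, name):
        """Return the record for name, or None if the identifier is unknown."""
        return self._records.get(name)


table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.insert(name)          # e.g. done as identifiers are first encountered

table.lookup("rate").type = "real"   # a later phase fills in attributes
print(table.lookup("rate"))
```

Later phases can then store or retrieve attributes through the same record without rescanning the source.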
29
Inside the Compiler
sequence of characters
    -> scanner (Lexical Analysis)
sequence of tokens
    -> parser (Syntactic Analysis / Parsing)
Abstract Syntax Tree (AST)
    -> checker (Contextual Analysis / checking Static Semantics)
verified / annotated AST
    -> optimizer / code generator (Optimization and Code Generation)
target code
30
Language Processing System
skeletal source program -> Pre-processor -> source program -> Compiler -> target assembly program -> Assembler -> relocatable machine code -> Loader/Linker
31
Language Processing System
skeletal source program
* Pre-processor: performs macro-processing, file inclusion, “rational” preprocessing, and language extension
source program
* Compiler: split into 6 phases; produces assembly code. Some compilers include the assembler too.
target assembly program
* Assembler: converts mnemonics (assembly code) into object code. A two-pass assembler works as follows:
1. denote storage locations for identifiers in the symbol table
2. translate code into machine code, translate locations into addresses
relocatable machine code
* Loader/Linker: links other object and library files with the object code; reads the file, placing relocatable addresses into proper locations in memory
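The two passes of the assembler can be sketched for a toy machine. Everything machine-specific here is invented for illustration: the mnemonics, the opcode numbers in `OPCODES`, the one-word instruction size, and the `label:` and `;` comment syntax do not correspond to any real instruction set.

```python
# Hypothetical opcode numbers for a made-up machine; not a real instruction set.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4}


def assemble(source):
    """Return (program, symbols) for a toy assembly text."""
    lines = [line.split(";")[0].strip() for line in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: record the storage location of every label in the symbol table.
    symbols = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1                 # each instruction occupies one word here

    # Pass 2: translate mnemonics into opcodes and labels into addresses.
    program = []
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else "0"
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic {mnemonic!r}")
        if operand in symbols:
            value = symbols[operand]
        elif operand.isdigit():
            value = int(operand)
        else:
            raise NameError(f"undefined label {operand!r}")
        program.append((OPCODES[mnemonic], value))
    return program, symbols


SOURCE = """
start:
    LOAD 10
    ADD 11
    STORE 12
    JUMP end      ; forward reference, resolved in pass 2
    ADD 11
end:
    HALT
"""
program, symbols = assemble(SOURCE)
print(symbols)
print(program)
```

Two passes are needed because `JUMP end` refers to a label defined later in the text; its address is only known once the first pass has walked the whole program.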